未验证 提交 ba956b48 编写于 作者: M MissPenguin 提交者: GitHub

Merge pull request #1380 from LDOUBLEV/dyg_db

update doc of db and installtion
...@@ -14,7 +14,7 @@ wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_l ...@@ -14,7 +14,7 @@ wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/train_icdar2015_l
wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt wget -P ./train_data/ https://paddleocr.bj.bcebos.com/dataset/test_icdar2015_label.txt
``` ```
PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `train_data/gen_label.py`, 这里以训练集为例: PaddleOCR 也提供了数据格式转换脚本,可以将官网 label 转换支持的数据格式。 数据转换工具在 `ppocr/utils/gen_label.py`, 这里以训练集为例:
``` ```
# 将官网下载的标签文件转换为 train_icdar2015_label.txt # 将官网下载的标签文件转换为 train_icdar2015_label.txt
...@@ -73,25 +73,30 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model ...@@ -73,25 +73,30 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
*如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false* *如果您安装的是cpu版本,请将配置文件中的 `use_gpu` 字段修改为false*
```shell ```shell
# 训练 mv3_db 模型,并将训练日志保存为 tain_det.log # 单机单卡训练 mv3_db 模型
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml \ python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/ \ -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
2>&1 | tee train_det.log # 单机多卡训练,通过 --select_gpus 参数设置使用的GPU ID;
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/
``` ```
上述指令中,通过-c 选择训练使用configs/det/det_db_mv3_v1.1.yml配置文件。
上述指令中,通过-c 选择训练使用configs/det/det_db_mv3.yml配置文件。
有关配置文件的详细解释,请参考[链接](./config.md) 有关配置文件的详细解释,请参考[链接](./config.md)
您也可以通过-o参数在不需要修改yml文件的情况下,改变训练的参数,比如,调整训练的学习率为0.0001 您也可以通过-o参数在不需要修改yml文件的情况下,改变训练的参数,比如,调整训练的学习率为0.0001
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001 python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
``` ```
#### 断点训练 #### 断点训练
如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径: 如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径:
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
``` ```
**注意**`Global.checkpoints`的优先级高于`Global.pretrain_weights`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrain_weights`指定的模型。 **注意**`Global.checkpoints`的优先级高于`Global.pretrain_weights`的优先级,即同时指定两个参数时,优先加载`Global.checkpoints`指定的模型,如果`Global.checkpoints`指定的模型路径有误,会加载`Global.pretrain_weights`指定的模型。
...@@ -100,17 +105,17 @@ python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints= ...@@ -100,17 +105,17 @@ python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=
PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean。 PaddleOCR计算三个OCR检测相关的指标,分别是:Precision、Recall、Hmean。
运行如下代码,根据配置文件`det_db_mv3_v1.1.yml``save_res_path`指定的测试集检测结果文件,计算评估指标。 运行如下代码,根据配置文件`det_db_mv3.yml``save_res_path`指定的测试集检测结果文件,计算评估指标。
评估时设置后处理参数`box_thresh=0.6``unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化 评估时设置后处理参数`box_thresh=0.6``unclip_ratio=1.5`,使用不同数据集、不同模型训练,可调整这两个参数进行优化
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。 训练中模型参数默认保存在`Global.save_model_dir`目录下。在评估指标时,需要设置`Global.checkpoints`指向保存的参数文件。
比如: 比如:
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
* 注:`box_thresh``unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置 * 注:`box_thresh``unclip_ratio`是DB后处理所需要的参数,在评估EAST模型时不需要设置
...@@ -119,16 +124,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints= ...@@ -119,16 +124,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=
测试单张图像的检测效果 测试单张图像的检测效果
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
测试DB模型时,调整后处理阈值, 测试DB模型时,调整后处理阈值,
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
测试文件夹下所有图像的检测效果 测试文件夹下所有图像的检测效果
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glic 2.23 经测试PaddleOCR可在glibc 2.23上运行,您也可以测试其他glibc版本或安装glic 2.23
PaddleOCR 工作环境 PaddleOCR 工作环境
- PaddlePaddle 1.8+ ,推荐使用 PaddlePaddle 2.0.0.beta - PaddlePaddle 2.0rc0+ ,推荐使用 PaddlePaddle 2.0rc0
- python3.7 - python3.7
- glibc 2.23 - glibc 2.23
- cuDNN 7.6+ (GPU) - cuDNN 7.6+ (GPU)
...@@ -19,44 +19,27 @@ cd /home/Projects ...@@ -19,44 +19,27 @@ cd /home/Projects
# 创建一个名字为ppocr的docker容器,并将当前目录映射到容器的/paddle目录下 # 创建一个名字为ppocr的docker容器,并将当前目录映射到容器的/paddle目录下
如果您希望在CPU环境下使用docker,使用docker而不是nvidia-docker创建docker 如果您希望在CPU环境下使用docker,使用docker而不是nvidia-docker创建docker
sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
如果使用CUDA9,请运行以下命令创建容器 如果使用CUDA10,请运行以下命令创建容器,设置docker容器共享内存shm-size为64G,建议设置32G以上
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
如果使用CUDA10,请运行以下命令创建容器
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash
您也可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。 您也可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。
# ctrl+P+Q可退出docker,重新进入docker使用如下命令 # ctrl+P+Q可退出docker 容器,重新进入docker 容器使用如下命令
sudo docker container exec -it ppocr /bin/bash sudo docker container exec -it ppocr /bin/bash
``` ```
注意:如果docker pull过慢,可以按照如下步骤手动下载后加载docker,以cuda9 docker为例,使用cuda10 docker只需要将cuda9改为cuda10即可。
```
# 下载CUDA9 docker的压缩文件,并解压
wget https://paddleocr.bj.bcebos.com/docker/docker_pdocr_cuda9.tar.gz
# 为减少下载时间,上传的docker image是压缩过的,需要解压使用
tar zxf docker_pdocr_cuda9.tar.gz
# 创建image
docker load < docker_pdocr_cuda9.tar
# 完成上述步骤后通过docker images检查是否加载了下载的镜像
docker images
# 执行docker images后如果有下面的输出,即可按照按照 步骤1 创建docker环境。
hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829
```
**2. 安装PaddlePaddle Fluid v2.0** **2. 安装PaddlePaddle Fluid v2.0**
``` ```
pip3 install --upgrade pip pip3 install --upgrade pip
如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装 如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple python3 -m pip install paddlepaddle-gpu==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
如果您的机器是CPU,请运行以下命令安装 如果您的机器是CPU,请运行以下命令安装
python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple python3 -m pip install paddlepaddle==2.0.0rc0 -i https://mirror.baidu.com/pypi/simple
更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。 更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
``` ```
......
...@@ -64,7 +64,7 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model ...@@ -64,7 +64,7 @@ tar -xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_model
#### START TRAINING #### START TRAINING
*If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.* *If CPU version installed, please set the parameter `use_gpu` to `false` in the configuration.*
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml 2>&1 | tee train_det.log python3 tools/train.py -c configs/det/det_mv3_db.yml
``` ```
In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file. In the above instruction, use `-c` to select the training to use the `configs/det/det_db_mv3.yml` configuration file.
...@@ -72,7 +72,12 @@ For a detailed explanation of the configuration file, please refer to [config](. ...@@ -72,7 +72,12 @@ For a detailed explanation of the configuration file, please refer to [config](.
You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001 You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Optimizer.base_lr=0.0001 # single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
# multi-GPU training
# Set the GPU ID used by the '--select_gpus' parameter;
python3 -m paddle.distributed.launch --selected_gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
``` ```
#### load trained model and continue training #### load trained model and continue training
...@@ -80,7 +85,7 @@ If you expect to load trained model and continue the training again, you can spe ...@@ -80,7 +85,7 @@ If you expect to load trained model and continue the training again, you can spe
For example: For example:
```shell ```shell
python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=./your/trained/model python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model
``` ```
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded. **Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
...@@ -90,18 +95,18 @@ python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints= ...@@ -90,18 +95,18 @@ python3 tools/train.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=
PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean. PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean.
Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3_v1.1.yml` Run the following code to calculate the evaluation indicators. The result will be saved in the test result file specified by `save_res_path` in the configuration file `det_db_mv3.yml`
When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result. When evaluating, set post-processing parameters `box_thresh=0.6`, `unclip_ratio=1.5`. If you use different datasets, different models for training, these two parameters should be adjusted for better result.
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="{path/to/weights}/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file. The model parameters during training are saved in the `Global.save_model_dir` directory by default. When evaluating indicators, you need to set `Global.checkpoints` to point to the saved parameter file.
Such as: Such as:
```shell ```shell
python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
* Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model. * Note: `box_thresh` and `unclip_ratio` are parameters required for DB post-processing, and not need to be set when evaluating the EAST model.
...@@ -110,16 +115,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints= ...@@ -110,16 +115,16 @@ python3 tools/eval.py -c configs/det/det_mv3_db_v1.1.yml -o Global.checkpoints=
Test the detection result on a single image: Test the detection result on a single image:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
When testing the DB model, adjust the post-processing threshold: When testing the DB model, adjust the post-processing threshold:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5 python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.checkpoints="./output/det_db/best_accuracy" PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5
``` ```
Test the detection result on all images in the folder: Test the detection result on all images in the folder:
```shell ```shell
python3 tools/infer_det.py -c configs/det/det_mv3_db_v1.1.yml -o TestReader.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy" python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/" Global.checkpoints="./output/det_db/best_accuracy"
``` ```
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility. After testing, paddleocr can run on glibc 2.23. You can also test other glibc versions or install glic 2.23 for the best compatibility.
PaddleOCR working environment: PaddleOCR working environment:
- PaddlePaddle1.8+, Recommend PaddlePaddle 2.0.0.beta - PaddlePaddle1.8+, Recommend PaddlePaddle 2.0rc0
- python3.7 - python3.7
- glibc 2.23 - glibc 2.23
...@@ -11,7 +11,7 @@ It is recommended to use the docker provided by us to run PaddleOCR, please refe ...@@ -11,7 +11,7 @@ It is recommended to use the docker provided by us to run PaddleOCR, please refe
*If you want to directly run the prediction code on mac or windows, you can start from step 2.* *If you want to directly run the prediction code on mac or windows, you can start from step 2.*
**1. (Recommended) Prepare a docker environment. The first time you use this image, it will be downloaded automatically. Please be patient.** **1. (Recommended) Prepare a docker environment. The first time you use this docker image, it will be downloaded automatically. Please be patient.**
``` ```
# Switch to the working directory # Switch to the working directory
cd /home/Projects cd /home/Projects
...@@ -19,15 +19,13 @@ cd /home/Projects ...@@ -19,15 +19,13 @@ cd /home/Projects
# Create a docker container named ppocr and map the current directory to the /paddle directory of the container # Create a docker container named ppocr and map the current directory to the /paddle directory of the container
#If using CPU, use docker instead of nvidia-docker to create docker #If using CPU, use docker instead of nvidia-docker to create docker
sudo docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash sudo docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
``` ```
If using CUDA9, please run the following command to create a container:
``` If using CUDA10, please run the following command to create a container.
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev /bin/bash It is recommended to set a shared memory greater than or equal to 32G through the --shm-size parameter:
```
If using CUDA10, please run the following command to create a container:
``` ```
sudo nvidia-docker run --name ppocr -v $PWD:/paddle --network=host -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev /bin/bash sudo nvidia-docker run --name ppocr -v $PWD:/paddle --shm-size=64G --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
``` ```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that fits your machine. You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that fits your machine.
``` ```
...@@ -35,29 +33,15 @@ You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags ...@@ -35,29 +33,15 @@ You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags
sudo docker container exec -it ppocr /bin/bash sudo docker container exec -it ppocr /bin/bash
``` ```
Note: If the docker pull is too slow, you can download and load the docker image manually according to the following steps. Take cuda9 docker for example, you only need to change cuda9 to cuda10 to use cuda10 docker:
```
# Download the CUDA9 docker compressed file and unzip it
wget https://paddleocr.bj.bcebos.com/docker/docker_pdocr_cuda9.tar.gz
# To reduce download time, the uploaded docker image is compressed and needs to be decompressed
tar zxf docker_pdocr_cuda9.tar.gz
# Create image
docker load < docker_pdocr_cuda9.tar
# After completing the above steps, check whether the downloaded image is loaded through docker images
docker images
# If you have the following output after executing docker images, you can follow step 1 to create a docker environment.
hub.baidubce.com/paddlepaddle/paddle latest-gpu-cuda9.0-cudnn7-dev f56310dcc829
```
**2. Install PaddlePaddle Fluid v2.0** **2. Install PaddlePaddle Fluid v2.0**
``` ```
pip3 install --upgrade pip pip3 install --upgrade pip
# If you have cuda9 or cuda10 installed on your machine, please run the following command to install # If you have cuda9 or cuda10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==2.0.0b0 -i https://mirror.baidu.com/pypi/simple python3 -m pip install paddlepaddle-gpu==2.0rc0 -i https://mirror.baidu.com/pypi/simple
# If you only have cpu on your machine, please run the following command to install # If you only have cpu on your machine, please run the following command to install
python3 -m pip install paddlepaddle==2.0.0b0 -i https://mirror.baidu.com/pypi/simple python3 -m pip install paddlepaddle==2.0rc0 -i https://mirror.baidu.com/pypi/simple
``` ```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation. For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
......
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import argparse
import json
def gen_rec_label(input_path, out_label):
with open(out_label, 'w') as out_file:
with open(input_path, 'r') as f:
for line in f.readlines():
tmp = line.strip('\n').replace(" ", "").split(',')
img_path, label = tmp[0], tmp[1]
label = label.replace("\"", "")
out_file.write(img_path + '\t' + label + '\n')
def gen_det_label(root_path, input_dir, out_label):
with open(out_label, 'w') as out_file:
for label_file in os.listdir(input_dir):
img_path = root_path + label_file[3:-4] + ".jpg"
label = []
with open(os.path.join(input_dir, label_file), 'r') as f:
for line in f.readlines():
tmp = line.strip("\n\r").replace("\xef\xbb\xbf",
"").split(',')
points = tmp[:8]
s = []
for i in range(0, len(points), 2):
b = points[i:i + 2]
b = [int(t) for t in b]
s.append(b)
result = {"transcription": tmp[8], "points": s}
label.append(result)
out_file.write(img_path + '\t' + json.dumps(
label, ensure_ascii=False) + '\n')
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--mode',
type=str,
default="rec",
help='Generate rec_label or det_label, can be set rec or det')
parser.add_argument(
'--root_path',
type=str,
default=".",
help='The root directory of images.Only takes effect when mode=det ')
parser.add_argument(
'--input_path',
type=str,
default=".",
help='Input_label or input path to be converted')
parser.add_argument(
'--output_label',
type=str,
default="out_label.txt",
help='Output file name')
args = parser.parse_args()
if args.mode == "rec":
print("Generate rec label")
gen_rec_label(args.input_path, args.output_label)
elif args.mode == "det":
gen_det_label(args.root_path, args.input_path, args.output_label)
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册