未验证 提交 006d84bf 编写于 作者: 崔浩 提交者: GitHub

Merge branch 'PaddlePaddle:dygraph' into dygraph

......@@ -34,10 +34,10 @@ PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, w
pip3 install --upgrade pip
# If you have cuda9 or cuda10 installed on your machine, please run the following command to install
python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
# If you only have cpu on your machine, please run the following command to install
python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
......
......@@ -37,11 +37,11 @@ PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,内置P
pip3 install --upgrade pip
如果您的机器安装的是CUDA9或CUDA10,请运行以下命令安装
python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
如果您的机器是CPU,请运行以下命令安装
python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
更多的版本需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
......
#!/bin/bash
function func_parser_key(){
strs=$1
IFS=":"
array=(${strs})
tmp=${array[0]}
echo ${tmp}
}
function func_parser_value(){
strs=$1
IFS=":"
array=(${strs})
tmp=${array[1]}
echo ${tmp}
}
function func_set_params(){
key=$1
value=$2
if [ ${key}x = "null"x ];then
echo " "
elif [[ ${value} = "null" ]] || [[ ${value} = " " ]] || [ ${#value} -le 0 ];then
echo " "
else
echo "${key}=${value}"
fi
}
function func_parser_params(){
strs=$1
IFS=":"
array=(${strs})
key=${array[0]}
tmp=${array[1]}
IFS="|"
res=""
for _params in ${tmp[*]}; do
IFS="="
array=(${_params})
mode=${array[0]}
value=${array[1]}
if [[ ${mode} = ${MODE} ]]; then
IFS="|"
#echo $(func_set_params "${mode}" "${value}")
echo $value
break
fi
IFS="|"
done
echo ${res}
}
function status_check(){
last_status=$1 # the exit code
run_command=$2
run_log=$3
if [ $last_status -eq 0 ]; then
echo -e "\033[33m Run successfully with command - ${run_command}! \033[0m" | tee -a ${run_log}
else
echo -e "\033[33m Run failed with command - ${run_command}! \033[0m" | tee -a ${run_log}
fi
}
import numpy as np
import os
import subprocess
import json
import argparse
import glob
def init_args():
parser = argparse.ArgumentParser()
# params for testing assert allclose
parser.add_argument("--atol", type=float, default=1e-3)
parser.add_argument("--rtol", type=float, default=1e-3)
parser.add_argument("--gt_file", type=str, default="")
parser.add_argument("--log_file", type=str, default="")
parser.add_argument("--precision", type=str, default="fp32")
return parser
def parse_args():
parser = init_args()
return parser.parse_args()
def run_shell_command(cmd):
p = subprocess.Popen(
cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out, err = p.communicate()
if p.returncode == 0:
return out.decode('utf-8')
else:
return None
def parser_results_from_log_by_name(log_path, names_list):
if not os.path.exists(log_path):
raise ValueError("The log file {} does not exists!".format(log_path))
if names_list is None or len(names_list) < 1:
return []
parser_results = {}
for name in names_list:
cmd = "grep {} {}".format(name, log_path)
outs = run_shell_command(cmd)
outs = outs.split("\n")[0]
result = outs.split("{}".format(name))[-1]
try:
result = json.loads(result)
except:
result = np.array([int(r) for r in result.split()]).reshape(-1, 4)
parser_results[name] = result
return parser_results
def load_gt_from_file(gt_file):
if not os.path.exists(gt_file):
raise ValueError("The log file {} does not exists!".format(gt_file))
with open(gt_file, 'r') as f:
data = f.readlines()
f.close()
parser_gt = {}
for line in data:
image_name, result = line.strip("\n").split("\t")
image_name = image_name.split('/')[-1]
try:
result = json.loads(result)
except:
result = np.array([int(r) for r in result.split()]).reshape(-1, 4)
parser_gt[image_name] = result
return parser_gt
def load_gt_from_txts(gt_file):
gt_list = glob.glob(gt_file)
gt_collection = {}
for gt_f in gt_list:
gt_dict = load_gt_from_file(gt_f)
basename = os.path.basename(gt_f)
if "fp32" in basename:
gt_collection["fp32"] = [gt_dict, gt_f]
elif "fp16" in basename:
gt_collection["fp16"] = [gt_dict, gt_f]
elif "int8" in basename:
gt_collection["int8"] = [gt_dict, gt_f]
else:
continue
return gt_collection
def collect_predict_from_logs(log_path, key_list):
log_list = glob.glob(log_path)
pred_collection = {}
for log_f in log_list:
pred_dict = parser_results_from_log_by_name(log_f, key_list)
key = os.path.basename(log_f)
pred_collection[key] = pred_dict
return pred_collection
def testing_assert_allclose(dict_x, dict_y, atol=1e-7, rtol=1e-7):
for k in dict_x:
np.testing.assert_allclose(
np.array(dict_x[k]), np.array(dict_y[k]), atol=atol, rtol=rtol)
if __name__ == "__main__":
# Usage:
# python3.7 tests/compare_results.py --gt_file=./tests/results/*.txt --log_file=./tests/output/infer_*.log
args = parse_args()
gt_collection = load_gt_from_txts(args.gt_file)
key_list = gt_collection["fp32"][0].keys()
pred_collection = collect_predict_from_logs(args.log_file, key_list)
for filename in pred_collection.keys():
if "fp32" in filename:
gt_dict, gt_filename = gt_collection["fp32"]
elif "fp16" in filename:
gt_dict, gt_filename = gt_collection["fp16"]
elif "int8" in filename:
gt_dict, gt_filename = gt_collection["int8"]
else:
continue
pred_dict = pred_collection[filename]
try:
testing_assert_allclose(
gt_dict, pred_dict, atol=args.atol, rtol=args.rtol)
print(
"Assert allclose passed! The results of {} and {} are consistent!".
format(filename, gt_filename))
except Exception as E:
print(E)
raise ValueError(
"The results of {} and the results of {} are inconsistent!".
format(filename, gt_filename))
Global:
use_gpu: false
epoch_num: 5
log_smooth_window: 20
print_batch_step: 1
save_model_dir: ./output/db_mv3/
save_epoch_step: 1200
# evaluation is run every 2000 iterations
eval_batch_step: [0, 400]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_db/predicts_db.txt
Architecture:
model_type: det
algorithm: DB
Transform:
Backbone:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: False
Neck:
name: DBFPN
out_channels: 256
Head:
name: DBHead
k: 50
Loss:
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
name: Adam #Momentum
#momentum: 0.9
beta1: 0.9
beta2: 0.999
lr:
learning_rate: 0.001
regularizer:
name: 'L2'
factor: 0
PostProcess:
name: DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Metric:
name: DetMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [1.0]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- Resize:
size: [640, 640]
- MakeBorderMap:
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- MakeShrinkMap:
shrink_ratio: 0.4
min_text_size: 8
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1
num_workers: 0
use_shared_memory: False
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- DetResizeForTest:
image_shape: [736, 1280]
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 0
use_shared_memory: False
Global:
use_gpu: false
epoch_num: 5
log_smooth_window: 20
print_batch_step: 1
save_model_dir: ./output/db_mv3/
save_epoch_step: 1200
# evaluation is run every 2000 iterations
eval_batch_step: [0, 400]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_db/predicts_db.txt
Architecture:
model_type: det
algorithm: DB
Transform:
Backbone:
name: ResNet #MobileNetV3
layers: 50
Neck:
name: DBFPN
out_channels: 256
Head:
name: DBHead
k: 50
Loss:
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5 #5
beta: 10 #10
ohem_ratio: 3
Optimizer:
name: Adam #Momentum
#momentum: 0.9
beta1: 0.9
beta2: 0.999
lr:
learning_rate: 0.001
regularizer:
name: 'L2'
factor: 0
PostProcess:
name: DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Metric:
name: DetMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [1.0]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- Resize:
# size: [640, 640]
- MakeBorderMap:
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- MakeShrinkMap:
shrink_ratio: 0.4
min_text_size: 8
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1
num_workers: 0
use_shared_memory: False
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- DetResizeForTest:
image_shape: [736, 1280]
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 0
use_shared_memory: False
===========================train_params===========================
model_name:ocr_det
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:null
Global.epoch_num:lite_train_infer=1|whole_train_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_infer=2|whole_train_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:norm_train|pact_train|fpgm_train
norm_train:tools/train.py -c tests/configs/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
pact_train:deploy/slim/quantization/quant.py -c tests/configs/det_mv3_db.yml -o
fpgm_train:deploy/slim/prune/sensitivity_anal.py -c tests/configs/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/det_mv3_db_v2.0_train/best_accuracy
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:tools/export_model.py -c tests/configs/det_mv3_db.yml -o
quant_export:deploy/slim/quantization/export_model.py -c tests/configs/det_mv3_db.yml -o
fpgm_export:deploy/slim/prune/export_prune_model.py -c tests/configs/det_mv3_db.yml -o
distill_export:null
export1:null
export2:null
##
train_model:./inference/ch_ppocr_mobile_v2.0_det_train/best_accuracy
infer_export:tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
null:null
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr det
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
===========================serving_params===========================
model_name:ocr_det
python:python3.7
trans_model:-m paddle_serving_client.convert
--dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--serving_server:./deploy/pdserving/ppocr_det_mobile_2.0_serving/
--serving_client:./deploy/pdserving/ppocr_det_mobile_2.0_client/
serving_dir:./deploy/pdserving
web_service:web_service_det.py --config=config.yml --opt op.det.concurrency=1
op.det.local_service_conf.devices:null|0
op.det.local_service_conf.use_mkldnn:True|False
op.det.local_service_conf.thread_num:1|6
op.det.local_service_conf.use_trt:False|True
op.det.local_service_conf.precision:fp32|fp16|int8
pipline:pipeline_http_client.py --image_dir=../../doc/imgs
===========================kl_quant_params===========================
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_export:tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml -o
infer_quant:True
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
null:null
null:null
===========================train_params===========================
model_name:ocr_server_det
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:null
Global.epoch_num:lite_train_infer=2|whole_train_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_infer=2|whole_train_infer=4
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
null:null
##
trainer:norm_train|pact_train|fpgm_export
norm_train:tools/train.py -c tests/configs/det_r50_vd_db.yml -o
quant_export:deploy/slim/quantization/export_model.py -c tests/configs/det_r50_vd_db.yml -o
fpgm_export:deploy/slim/prune/export_prune_model.py -c tests/configs/det_r50_vd_db.yml -o
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c tests/configs/det_r50_vd_db.yml -o
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:tools/export_model.py -c tests/configs/det_r50_vd_db.yml -o
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
##
train_model:./inference/ch_ppocr_server_v2.0_det_train/best_accuracy
infer_export:tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
--save_log_path:null
--benchmark:True
null:null
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_server_v2.0_det_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr det
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
null:null
--benchmark:True
===========================serving_params===========================
model_name:ocr_det_server
python:python3.7
trans_model:-m paddle_serving_client.convert
--dirname:./inference/ch_ppocr_server_v2.0_det_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--serving_server:./deploy/pdserving/ppocr_det_mobile_2.0_serving/
--serving_client:./deploy/pdserving/ppocr_det_mobile_2.0_client/
serving_dir:./deploy/pdserving
web_service:web_service_det.py --config=config.yml --opt op.det.concurrency=1
op.det.local_service_conf.devices:null|0
op.det.local_service_conf.use_mkldnn:True|False
op.det.local_service_conf.thread_num:1|6
op.det.local_service_conf.use_trt:False|True
op.det.local_service_conf.precision:fp32|fp16|int8
pipline:pipeline_http_client.py --image_dir=../../doc/imgs
===========================train_params===========================
model_name:ocr_rec
python:python3.7
gpu_list:0|2,3
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:null
Global.epoch_num:lite_train_infer=2|whole_train_infer=300
......@@ -9,7 +9,7 @@ Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_infer=128|whole_train_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/ic15_data/train
train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:norm_train|pact_train
......@@ -41,7 +41,7 @@ inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--rec_batch_num:1|6
--use_tensorrt:True|False
--precision:fp32|fp16|int8
--rec_model_dir:
......@@ -49,3 +49,35 @@ inference:tools/infer/predict_rec.py
--save_log_path:./test/output/
--benchmark:True
null:null
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_rec_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr rec
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--rec_model_dir:
--image_dir:./inference/rec_inference/
null:null
--benchmark:True
===========================serving_params===========================
model_name:ocr_rec
python:python3.7
trans_model:-m paddle_serving_client.convert
--dirname:./inference/ch_ppocr_mobile_v2.0_rec_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--serving_server:./deploy/pdserving/ppocr_rec_mobile_2.0_serving/
--serving_client:./deploy/pdserving/ppocr_rec_mobile_2.0_client/
serving_dir:./deploy/pdserving
web_service:web_service_rec.py --config=config.yml --opt op.rec.concurrency=1
op.rec.local_service_conf.devices:null|0
op.rec.local_service_conf.use_mkldnn:True|False
op.rec.local_service_conf.thread_num:1|6
op.rec.local_service_conf.use_trt:False|True
op.rec.local_service_conf.precision:fp32|fp16|int8
pipline:pipeline_http_client.py --image_dir=../../doc/imgs_words_en
===========================train_params===========================
model_name:ocr_server_rec
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:null
Global.epoch_num:lite_train_infer=2|whole_train_infer=300
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_infer=128|whole_train_infer=128
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./inference/rec_inference
null:null
##
trainer:norm_train|pact_train
norm_train:tools/train.py -c tests/configs/rec_icdar15_r34_train.yml -o
pact_train:deploy/slim/quantization/quant.py -c tests/configs/rec_icdar15_r34_train.yml -o
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c tests/configs/rec_icdar15_r34_train.yml -o
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:tools/export_model.py -c tests/configs/rec_icdar15_r34_train.yml -o
quant_export:deploy/slim/quantization/export_model.py -c tests/configs/rec_icdar15_r34_train.yml -o
fpgm_export:null
distill_export:null
export1:null
export2:null
##
infer_model:./inference/ch_ppocr_server_v2.0_rec_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_rec.py
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1|6
--use_tensorrt:True|False
--precision:fp32|fp16|int8
--rec_model_dir:
--image_dir:./inference/rec_inference
--save_log_path:./test/output/
--benchmark:True
null:null
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_server_v2.0_rec_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr rec
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--rec_model_dir:
--image_dir:./inference/rec_inference/
null:null
--benchmark:True
===========================serving_params===========================
model_name:ocr_server_rec
python:python3.7
trans_model:-m paddle_serving_client.convert
--dirname:./inference/ch_ppocr_server_v2.0_rec_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--serving_server:./deploy/pdserving/ppocr_rec_mobile_2.0_serving/
--serving_client:./deploy/pdserving/ppocr_rec_mobile_2.0_client/
serving_dir:./deploy/pdserving
web_service:web_service_rec.py --config=config.yml --opt op.rec.concurrency=1
op.rec.local_service_conf.devices:null|0
op.rec.local_service_conf.use_mkldnn:True|False
op.rec.local_service_conf.thread_num:1|6
op.rec.local_service_conf.use_trt:False|True
op.rec.local_service_conf.precision:fp32|fp16|int8
pipline:pipeline_http_client.py --image_dir=../../doc/imgs_words_en
===========================train_params===========================
model_name:ocr_det
model_name:ocr_system_mobile
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
gpu_list:null
Global.use_gpu:null
Global.auto_cast:null
Global.epoch_num:lite_train_infer=2|whole_train_infer=300
Global.epoch_num:null
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_infer=2|whole_train_infer=4
Train.loader.batch_size_per_card:null
Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./train_data/icdar2015/text_localization/ch4_test_images/
train_model_name:null
train_infer_img_dir:null
null:null
##
trainer:norm_train|pact_train
norm_train:tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
pact_train:deploy/slim/quantization/quant.py -c configs/det/det_mv3_db.yml -o
trainer:
norm_train:null
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c configs/det/det_mv3_db.yml -o
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:tools/export_model.py -c configs/det/det_mv3_db.yml -o
quant_export:deploy/slim/quantization/export_model.py -c configs/det/det_mv3_db.yml -o
fpgm_export:deploy/slim/prune/export_prune_model.py
norm_export:null
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
......@@ -37,8 +37,8 @@ export2:null
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_det.py
--use_gpu:True|False
inference:tools/infer/predict_system.py
--use_gpu:True
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
......@@ -48,5 +48,20 @@ inference:tools/infer/predict_det.py
--image_dir:./inference/ch_det_data_50/all-sum-510/
--save_log_path:null
--benchmark:True
null:null
--rec_model_dir:./inference/ch_ppocr_mobile_v2.0_rec_infer/
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr system
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
--rec_model_dir:./inference/ch_ppocr_mobile_v2.0_rec_infer/
--benchmark:True
===========================train_params===========================
model_name:ocr_system_server
python:python3.7
gpu_list:null
Global.use_gpu:null
Global.auto_cast:null
Global.epoch_num:null
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:null
Global.pretrained_model:null
train_model_name:null
train_infer_img_dir:null
null:null
##
trainer:
norm_train:null
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.pretrained_model:
norm_export:null
quant_export:null
fpgm_export:null
distill_export:null
export1:null
export2:null
##
infer_model:./inference/ch_ppocr_server_v2.0_det_infer/
infer_export:null
infer_quant:False
inference:tools/infer/predict_system.py
--use_gpu:True
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16|int8
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
--save_log_path:null
--benchmark:True
--rec_model_dir:./inference/ch_ppocr_server_v2.0_rec_infer/
===========================cpp_infer_params===========================
use_opencv:True
infer_model:./inference/ch_ppocr_server_v2.0_det_infer/
infer_quant:False
inference:./deploy/cpp_infer/build/ppocr system
--use_gpu:True|False
--enable_mkldnn:True|False
--cpu_threads:1|6
--rec_batch_num:1
--use_tensorrt:False|True
--precision:fp32|fp16
--det_model_dir:
--image_dir:./inference/ch_det_data_50/all-sum-510/
--rec_model_dir:./inference/ch_ppocr_server_v2.0_rec_infer/
--benchmark:True
\ No newline at end of file
Global:
use_gpu: true
epoch_num: 72
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec/ic15/
save_epoch_step: 3
# evaluation is run every 2000 iterations
eval_batch_step: [0, 2000]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir: ./
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path: ppocr/utils/en_dict.txt
character_type: EN
max_text_length: 25
infer_mode: False
use_space_char: False
save_res_path: ./output/rec/predicts_ic15.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
learning_rate: 0.0005
regularizer:
name: 'L2'
factor: 0
Architecture:
model_type: rec
algorithm: CRNN
Transform:
Backbone:
name: ResNet
layers: 34
Neck:
name: SequenceEncoder
encoder_type: rnn
hidden_size: 256
Head:
name: CTCHead
fc_decay: 0
Loss:
name: CTCLoss
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/ic15_data/
label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- CTCLabelEncode: # Class handling label
- RecResizeImg:
image_shape: [3, 32, 100]
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: True
batch_size_per_card: 256
drop_last: True
num_workers: 8
use_shared_memory: False
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/ic15_data
label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- CTCLabelEncode: # Class handling label
- RecResizeImg:
image_shape: [3, 32, 100]
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: False
drop_last: False
batch_size_per_card: 256
num_workers: 4
use_shared_memory: False
## 环境配置
本教程适用于PTDN目录下基础功能测试的运行环境搭建。
推荐环境:
- CUDA 10.1
- CUDNN 7.6
- TensorRT 6.1.0.5 / 7.1
推荐docker镜像安装,按照如下命令创建镜像,当前目录映射到镜像中的`/paddle`目录下
```
nvidia-docker run --name paddle -it -v $PWD:/paddle paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash
cd /paddle
# 安装带TRT的paddle
pip3.7 install https://paddle-wheel.bj.bcebos.com/with-trt/2.1.3/linux-gpu-cuda10.1-cudnn7-mkl-gcc8.2-trt6-avx/paddlepaddle_gpu-2.1.3.post101-cp37-cp37m-linux_x86_64.whl
# 安装AutoLog
git clone https://github.com/LDOUBLEV/AutoLog
cd AutoLog
pip3.7 install -r requirements.txt
python3.7 setup.py bdist_wheel
pip3.7 install ./dist/auto_log-1.0.0-py3-none-any.whl
# 下载OCR代码
cd ../
git clone https://github.com/PaddlePaddle/PaddleOCR
```
安装PaddleOCR依赖:
```
cd PaddleOCR
pip3.7 install -r requirements.txt
```
## FAQ :
Q. You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found. Ignore this if TensorRT is not needed.
A. 问题一般是当前安装paddle版本带TRT,但是本地环境找不到TensorRT的预测库,需要下载TensorRT库,解压后设置环境变量LD_LIBRARY_PATH;
如:
```
export LD_LIBRARY_PATH=/usr/local/python3.7.0/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/paddle/package/TensorRT-6.0.1.5/lib
```
或者问题是下载的TensorRT版本和当前paddle中编译的TRT版本不匹配,需要下载版本相符的TRT。
# C++预测功能测试
C++预测功能测试的主程序为`test_inference_cpp.sh`,可以测试基于C++预测库的模型推理功能。
## 1. 测试结论汇总
基于训练是否使用量化,进行本测试的模型可以分为`正常模型``量化模型`,这两类模型对应的C++预测功能汇总如下:
| 模型类型 |device | batchsize | tensorrt | mkldnn | cpu多线程 |
| ---- | ---- | ---- | :----: | :----: | :----: |
| 正常模型 | GPU | 1/6 | fp32/fp16 | - | - |
| 正常模型 | CPU | 1/6 | - | fp32 | 支持 |
| 量化模型 | GPU | 1/6 | int8 | - | - |
| 量化模型 | CPU | 1/6 | - | int8 | 支持 |
## 2. 测试流程
### 2.1 功能测试
先运行`prepare.sh`准备数据和模型,然后运行`test_inference_cpp.sh`进行测试,最终在```tests/output```目录下生成`cpp_infer_*.log`后缀的日志文件。
```shell
bash tests/prepare.sh ./tests/configs/ppocr_det_mobile_params.txt "cpp_infer"
# 用法1:
bash tests/test_inference_cpp.sh ./tests/configs/ppocr_det_mobile_params.txt
# 用法2: 指定GPU卡预测,第三个传入参数为GPU卡号
bash tests/test_inference_cpp.sh ./tests/configs/ppocr_det_mobile_params.txt '1'
```
### 2.2 精度测试
使用compare_results.py脚本比较模型预测的结果是否符合预期,主要步骤包括:
- 提取日志中的预测坐标;
- 从本地文件中提取保存好的坐标结果;
- 比较上述两个结果是否符合精度预期,误差大于设置阈值时会报错。
#### 使用方式
运行命令:
```shell
python3.7 tests/compare_results.py --gt_file=./tests/results/cpp_*.txt --log_file=./tests/output/cpp_*.log --atol=1e-3 --rtol=1e-3
```
参数介绍:
- gt_file: 指向事先保存好的预测结果路径,支持*.txt 结尾,会自动索引*.txt格式的文件,文件默认保存在tests/result/ 文件夹下
- log_file: 指向运行tests/test.sh 脚本的infer模式保存的预测日志,预测日志中打印的有预测结果,比如:文本框,预测文本,类别等等,同样支持infer_*.log格式传入
- atol: 设置的绝对误差
- rtol: 设置的相对误差
#### 运行结果
正常运行效果如下图:
<img src="compare_cpp_right.png" width="1000">
出现不一致结果时的运行输出:
<img src="compare_cpp_wrong.png" width="1000">
## 3. 更多教程
本文档为功能测试用,更详细的c++预测使用教程请参考:[服务器端C++预测](https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph/deploy/cpp_infer)
# 基础训练预测功能测试
基础训练预测功能测试的主程序为`test_train_inference_python.sh`,可以测试基于Python的模型训练、评估、推理等基本功能,包括裁剪、量化、蒸馏。
## 1. 测试结论汇总
- 训练相关:
| 算法名称 | 模型名称 | 单机单卡 | 单机多卡 | 多机多卡 | 模型压缩(单机多卡) |
| :---- | :---- | :---- | :---- | :---- | :---- |
| DB | ch_ppocr_mobile_v2.0_det| 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练:FPGM裁剪、PACT量化 <br> 离线量化(无需训练) |
| DB | ch_ppocr_server_v2.0_det| 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练:FPGM裁剪、PACT量化 <br> 离线量化(无需训练) |
| CRNN | ch_ppocr_mobile_v2.0_rec| 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练:PACT量化 <br> 离线量化(无需训练) |
| CRNN | ch_ppocr_server_v2.0_rec| 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练:PACT量化 <br> 离线量化(无需训练) |
|PP-OCR| ch_ppocr_mobile_v2.0| 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | - |
|PP-OCR| ch_ppocr_server_v2.0| 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | - |
|PP-OCRv2| ch_PP-OCRv2 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | 正常训练 <br> 混合精度 | - |
- 预测相关:基于训练是否使用量化,可以将训练产出的模型可以分为`正常模型``量化模型`,这两类模型对应的预测功能汇总如下,
| 模型类型 |device | batchsize | tensorrt | mkldnn | cpu多线程 |
| ---- | ---- | ---- | :----: | :----: | :----: |
| 正常模型 | GPU | 1/6 | fp32/fp16 | - | - |
| 正常模型 | CPU | 1/6 | - | fp32 | 支持 |
| 量化模型 | GPU | 1/6 | int8 | - | - |
| 量化模型 | CPU | 1/6 | - | int8 | 支持 |
## 2. 测试流程
### 2.1 安装依赖
- 安装PaddlePaddle >= 2.0
- 安装PaddleOCR依赖
```
pip3 install -r ../requirements.txt
```
- 安装autolog(规范化日志输出工具)
```
git clone https://github.com/LDOUBLEV/AutoLog
cd AutoLog
pip3 install -r requirements.txt
python3 setup.py bdist_wheel
pip3 install ./dist/auto_log-1.0.0-py3-none-any.whl
cd ../
```
### 2.2 功能测试
先运行`prepare.sh`准备数据和模型,然后运行`test_train_inference_python.sh`进行测试,最终在```tests/output```目录下生成`python_infer_*.log`格式的日志文件。
`test_train_inference_python.sh`包含5种运行模式,每种模式的运行数据不同,分别用于测试速度和精度,分别是:
- 模式1:lite_train_infer,使用少量数据训练,用于快速验证训练到预测的走通流程,不验证精度和速度;
```shell
bash tests/prepare.sh ./tests/configs/ppocr_det_mobile_params.txt 'lite_train_infer'
bash tests/test_train_inference_python.sh ./tests/configs/ppocr_det_mobile_params.txt 'lite_train_infer'
```
- 模式2:whole_infer,使用少量数据训练,一定量数据预测,用于验证训练后的模型执行预测,预测速度是否合理;
```shell
bash tests/prepare.sh ./tests/configs/ppocr_det_mobile_params.txt 'whole_infer'
bash tests/test_train_inference_python.sh ./tests/configs/ppocr_det_mobile_params.txt 'whole_infer'
```
- 模式3:infer,不训练,全量数据预测,走通开源模型评估、动转静,检查inference model预测时间和精度;
```shell
bash tests/prepare.sh ./tests/configs/ppocr_det_mobile_params.txt 'infer'
# 用法1:
bash tests/test_train_inference_python.sh ./tests/configs/ppocr_det_mobile_params.txt 'infer'
# 用法2: 指定GPU卡预测,第三个传入参数为GPU卡号
bash tests/test_train_inference_python.sh ./tests/configs/ppocr_det_mobile_params.txt 'infer' '1'
```
- 模式4:whole_train_infer,CE: 全量数据训练,全量数据预测,验证模型训练精度,预测精度,预测速度;
```shell
bash tests/prepare.sh ./tests/configs/ppocr_det_mobile_params.txt 'whole_train_infer'
bash tests/test_train_inference_python.sh ./tests/configs/ppocr_det_mobile_params.txt 'whole_train_infer'
```
- 模式5:klquant_infer,测试离线量化;
```shell
bash tests/prepare.sh ./tests/configs/ppocr_det_mobile_params.txt 'klquant_infer'
bash tests/test_train_inference_python.sh tests/configs/ppocr_det_mobile_params.txt 'klquant_infer'
```
### 2.3 精度测试
使用compare_results.py脚本比较模型预测的结果是否符合预期,主要步骤包括:
- 提取日志中的预测坐标;
- 从本地文件中提取保存好的坐标结果;
- 比较上述两个结果是否符合精度预期,误差大于设置阈值时会报错。
#### 使用方式
运行命令:
```shell
python3.7 tests/compare_results.py --gt_file=./tests/results/python_*.txt --log_file=./tests/output/python_*.log --atol=1e-3 --rtol=1e-3
```
参数介绍:
- gt_file: 指向事先保存好的预测结果路径,支持*.txt 结尾,会自动索引*.txt格式的文件,文件默认保存在tests/result/ 文件夹下
- log_file: 指向运行tests/test.sh 脚本的infer模式保存的预测日志,预测日志中打印的有预测结果,比如:文本框,预测文本,类别等等,同样支持infer_*.log格式传入
- atol: 设置的绝对误差
- rtol: 设置的相对误差
#### 运行结果
正常运行效果如下图:
<img src="compare_right.png" width="1000">
出现不一致结果时的运行输出:
<img src="compare_wrong.png" width="1000">
## 3. 更多教程
本文档为功能测试用,更丰富的训练预测使用教程请参考:
[模型训练](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/training.md)
[基于Python预测引擎推理](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/inference.md)
#!/bin/bash
FILENAME=$1
# MODE be one of ['lite_train_infer' 'whole_infer' 'whole_train_infer', 'infer']
# MODE be one of ['lite_train_infer' 'whole_infer' 'whole_train_infer', 'infer',
# 'cpp_infer', 'serving_infer', 'klquant_infer']
MODE=$2
dataline=$(cat ${FILENAME})
......@@ -34,14 +37,19 @@ MODE=$2
if [ ${MODE} = "lite_train_infer" ];then
# pretrain lite train data
wget -nc -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar
cd ./pretrain_models/ && tar xf det_mv3_db_v2.0_train.tar && cd ../
rm -rf ./train_data/icdar2015
rm -rf ./train_data/ic15_data
wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/icdar2015_lite.tar
wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ic15_data.tar # todo change to bcebos
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/rec_inference.tar
wget -nc -P ./deploy/slim/prune https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/sen.pickle
cd ./train_data/ && tar xf icdar2015_lite.tar && tar xf ic15_data.tar
ln -s ./icdar2015_lite ./icdar2015
cd ../
cd ./inference && tar xf rec_inference.tar && cd ../
elif [ ${MODE} = "whole_train_infer" ];then
wget -nc -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
rm -rf ./train_data/icdar2015
......@@ -58,19 +66,73 @@ elif [ ${MODE} = "whole_infer" ];then
cd ./train_data/ && tar xf icdar2015_infer.tar && tar xf ic15_data.tar
ln -s ./icdar2015_infer ./icdar2015
cd ../
else
elif [ ${MODE} = "infer" ];then
if [ ${model_name} = "ocr_det" ]; then
eval_model_name="ch_ppocr_mobile_v2.0_det_infer"
eval_model_name="ch_ppocr_mobile_v2.0_det_train"
rm -rf ./train_data/icdar2015
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar
cd ./inference && tar xf ${eval_model_name}.tar && tar xf ch_det_data_50.tar && cd ../
else
elif [ ${model_name} = "ocr_server_det" ]; then
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
cd ./inference && tar xf ch_ppocr_server_v2.0_det_train.tar && tar xf ch_det_data_50.tar && cd ../
elif [ ${model_name} = "ocr_system_mobile" ]; then
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_det_data_50.tar && cd ../
elif [ ${model_name} = "ocr_system_server" ]; then
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar
cd ./inference && tar xf ch_ppocr_server_v2.0_det_infer.tar && tar xf ch_ppocr_server_v2.0_rec_infer.tar && tar xf ch_det_data_50.tar && cd ../
elif [ ${model_name} = "ocr_rec" ]; then
rm -rf ./train_data/ic15_data
eval_model_name="ch_ppocr_mobile_v2.0_rec_infer"
wget -nc -P ./train_data/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ic15_data.tar
wget -nc -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/rec_inference.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
cd ./inference && tar xf ${eval_model_name}.tar && tar xf rec_inference.tar && cd ../
elif [ ${model_name} = "ocr_server_rec" ]; then
rm -rf ./train_data/ic15_data
eval_model_name="ch_ppocr_server_v2.0_rec_infer"
wget -nc -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/rec_inference.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar
cd ./inference && tar xf ${eval_model_name}.tar && tar xf rec_inference.tar && cd ../
fi
elif [ ${MODE} = "klquant_infer" ];then
if [ ${model_name} = "ocr_det" ]; then
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_det_data_50.tar && cd ../
fi
elif [ ${MODE} = "cpp_infer" ];then
if [ ${model_name} = "ocr_det" ]; then
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_det_data_50.tar && cd ../
elif [ ${model_name} = "ocr_rec" ]; then
wget -nc -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/rec_inference.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
cd ./inference && tar xf ${eval_model_name}.tar && tar xf ic15_data.tar && cd ../
cd ./inference && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf rec_inference.tar && cd ../
elif [ ${model_name} = "ocr_system" ]; then
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/ch_det_data_50.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_det_data_50.tar && cd ../
fi
fi
if [ ${MODE} = "serving_infer" ];then
# prepare serving env
python_name=$(func_parser_value "${lines[2]}")
wget https://paddle-serving.bj.bcebos.com/chain/paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl
${python_name} -m pip install install paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl
${python_name} -m pip install paddle_serving_client==0.6.1
${python_name} -m pip install paddle-serving-app==0.6.3
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar
wget -nc -P ./inference https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar
cd ./inference && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_server_v2.0_rec_infer.tar && tar xf ch_ppocr_server_v2.0_det_infer.tar && cd ../
fi
# 推理部署导航
## 1. 简介
飞桨除了基本的模型训练和预测,还提供了支持多端多平台的高性能推理部署工具。本文档提供了PaddleOCR中所有模型的推理部署导航PTDN(Paddle Train Deploy Navigation),方便用户查阅每种模型的推理部署打通情况,并可以进行一键测试。
<div align="center">
<img src="docs/guide.png" width="1000">
</div>
## 2. 汇总信息
打通情况汇总如下,已填写的部分表示可以使用本工具进行一键测试,未填写的表示正在支持中。
**字段说明:**
- 基础训练预测:包括模型训练、Paddle Inference Python预测。
- 更多训练方式:包括多机多卡、混合精度。
- 模型压缩:包括裁剪、离线/在线量化、蒸馏。
- 其他预测部署:包括Paddle Inference C++预测、Paddle Serving部署、Paddle-Lite部署等。
更详细的mkldnn、Tensorrt等预测加速相关功能的支持情况可以查看各测试工具的[更多教程](#more)
| 算法论文 | 模型名称 | 模型类型 | 基础<br>训练预测 | 更多<br>训练方式 | 模型压缩 | 其他预测部署 |
| :--- | :--- | :----: | :--------: | :---- | :---- | :---- |
| DB |ch_ppocr_mobile_v2.0_det | 检测 | 支持 | 多机多卡 <br> 混合精度 | FPGM裁剪 <br> 离线量化| Paddle Inference: C++ <br> Paddle Serving: Python, C++ <br> Paddle-Lite: <br> (1) ARM CPU(C++) |
| DB |ch_ppocr_server_v2.0_det | 检测 | 支持 | 多机多卡 <br> 混合精度 | FPGM裁剪 <br> 离线量化| Paddle Inference: C++ <br> Paddle Serving: Python, C++ <br> Paddle-Lite: <br> (1) ARM CPU(C++) |
| DB |ch_PP-OCRv2_det | 检测 |
| CRNN |ch_ppocr_mobile_v2.0_rec | 识别 | 支持 | 多机多卡 <br> 混合精度 | PACT量化 <br> 离线量化| Paddle Inference: C++ <br> Paddle Serving: Python, C++ <br> Paddle-Lite: <br> (1) ARM CPU(C++) |
| CRNN |ch_ppocr_server_v2.0_rec | 识别 | 支持 | 多机多卡 <br> 混合精度 | PACT量化 <br> 离线量化| Paddle Inference: C++ <br> Paddle Serving: Python, C++ <br> Paddle-Lite: <br> (1) ARM CPU(C++) |
| CRNN |ch_PP-OCRv2_rec | 识别 |
| PP-OCR |ch_ppocr_mobile_v2.0 | 检测+识别 | 支持 | 多机多卡 <br> 混合精度 | - | Paddle Inference: C++ <br> Paddle Serving: Python, C++ <br> Paddle-Lite: <br> (1) ARM CPU(C++) |
| PP-OCR |ch_ppocr_server_v2.0 | 检测+识别 | 支持 | 多机多卡 <br> 混合精度 | - | Paddle Inference: C++ <br> Paddle Serving: Python, C++ <br> Paddle-Lite: <br> (1) ARM CPU(C++) |
|PP-OCRv2|ch_PP-OCRv2 | 检测+识别 |
| DB |det_mv3_db_v2.0 | 检测 |
| DB |det_r50_vd_db_v2.0 | 检测 |
| EAST |det_mv3_east_v2.0 | 检测 |
| EAST |det_r50_vd_east_v2.0 | 检测 |
| PSENet |det_mv3_pse_v2.0 | 检测 |
| PSENet |det_r50_vd_pse_v2.0 | 检测 |
| SAST |det_r50_vd_sast_totaltext_v2.0 | 检测 |
| Rosetta|rec_mv3_none_none_ctc_v2.0 | 识别 |
| Rosetta|rec_r34_vd_none_none_ctc_v2.0 | 识别 |
| CRNN |rec_mv3_none_bilstm_ctc_v2.0 | 识别 |
| CRNN |rec_r34_vd_none_bilstm_ctc_v2.0| 识别 |
| StarNet|rec_mv3_tps_bilstm_ctc_v2.0 | 识别 |
| StarNet|rec_r34_vd_tps_bilstm_ctc_v2.0 | 识别 |
| RARE |rec_mv3_tps_bilstm_att_v2.0 | 识别 |
| RARE |rec_r34_vd_tps_bilstm_att_v2.0 | 识别 |
| SRN |rec_r50fpn_vd_none_srn | 识别 |
| NRTR |rec_mtb_nrtr | 识别 |
| SAR |rec_r31_sar | 识别 |
| PGNet |rec_r34_vd_none_none_ctc_v2.0 | 端到端|
## 3. 一键测试工具使用
### 目录介绍
```shell
PTDN/
├── configs/ # 配置文件目录
├── det_mv3_db.yml # 测试mobile版ppocr检测模型训练的yml文件
├── det_r50_vd_db.yml # 测试server版ppocr检测模型训练的yml文件
├── rec_icdar15_r34_train.yml # 测试server版ppocr识别模型训练的yml文件
├── ppocr_sys_mobile_params.txt # 测试mobile版ppocr检测+识别模型串联的参数配置文件
├── ppocr_det_mobile_params.txt # 测试mobile版ppocr检测模型的参数配置文件
├── ppocr_rec_mobile_params.txt # 测试mobile版ppocr识别模型的参数配置文件
├── ppocr_sys_server_params.txt # 测试server版ppocr检测+识别模型串联的参数配置文件
├── ppocr_det_server_params.txt # 测试server版ppocr检测模型的参数配置文件
├── ppocr_rec_server_params.txt # 测试server版ppocr识别模型的参数配置文件
├── ...
├── results/ # 预先保存的预测结果,用于和实际预测结果进行精读比对
├── python_ppocr_det_mobile_results_fp32.txt # 预存的mobile版ppocr检测模型python预测fp32精度的结果
├── python_ppocr_det_mobile_results_fp16.txt # 预存的mobile版ppocr检测模型python预测fp16精度的结果
├── cpp_ppocr_det_mobile_results_fp32.txt # 预存的mobile版ppocr检测模型c++预测的fp32精度的结果
├── cpp_ppocr_det_mobile_results_fp16.txt # 预存的mobile版ppocr检测模型c++预测的fp16精度的结果
├── ...
├── prepare.sh # 完成test_*.sh运行所需要的数据和模型下载
├── test_train_inference_python.sh # 测试python训练预测的主程序
├── test_inference_cpp.sh # 测试c++预测的主程序
├── test_serving.sh # 测试serving部署预测的主程序
├── test_lite.sh # 测试lite部署预测的主程序
├── compare_results.py # 用于对比log中的预测结果与results中的预存结果精度误差是否在限定范围内
└── readme.md # 使用文档
```
### 测试流程
使用本工具,可以测试不同功能的支持情况,以及预测结果是否对齐,测试流程如下:
<div align="center">
<img src="docs/test.png" width="800">
</div>
1. 运行prepare.sh准备测试所需数据和模型;
2. 运行要测试的功能对应的测试脚本`test_*.sh`,产出log,由log可以看到不同配置是否运行成功;
3.`compare_results.py`对比log中的预测结果和预存在results目录下的结果,判断预测精度是否符合预期(在误差范围内)。
其中,有4个测试主程序,功能如下:
- `test_train_inference_python.sh`:测试基于Python的模型训练、评估、推理等基本功能,包括裁剪、量化、蒸馏。
- `test_inference_cpp.sh`:测试基于C++的模型推理。
- `test_serving.sh`:测试基于Paddle Serving的服务化部署功能。
- `test_lite.sh`:测试基于Paddle-Lite的端侧预测部署功能。
<a name="more"></a>
#### 更多教程
各功能测试中涉及混合精度、裁剪、量化等训练相关,及mkldnn、Tensorrt等多种预测相关参数配置,请点击下方相应链接了解更多细节和使用教程:
[test_train_inference_python 使用](docs/test_train_inference_python.md)
[test_inference_cpp 使用](docs/test_inference_cpp.md)
[test_serving 使用](docs/test_serving.md)
[test_lite 使用](docs/test_lite.md)
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
#!/bin/bash
source tests/common_func.sh
FILENAME=$1
dataline=$(awk 'NR==52, NR==66{print}' $FILENAME)
# parser params
IFS=$'\n'
lines=(${dataline})
# parser cpp inference model
use_opencv=$(func_parser_value "${lines[1]}")
cpp_infer_model_dir_list=$(func_parser_value "${lines[2]}")
cpp_infer_is_quant=$(func_parser_value "${lines[3]}")
# parser cpp inference
inference_cmd=$(func_parser_value "${lines[4]}")
cpp_use_gpu_key=$(func_parser_key "${lines[5]}")
cpp_use_gpu_list=$(func_parser_value "${lines[5]}")
cpp_use_mkldnn_key=$(func_parser_key "${lines[6]}")
cpp_use_mkldnn_list=$(func_parser_value "${lines[6]}")
cpp_cpu_threads_key=$(func_parser_key "${lines[7]}")
cpp_cpu_threads_list=$(func_parser_value "${lines[7]}")
cpp_batch_size_key=$(func_parser_key "${lines[8]}")
cpp_batch_size_list=$(func_parser_value "${lines[8]}")
cpp_use_trt_key=$(func_parser_key "${lines[9]}")
cpp_use_trt_list=$(func_parser_value "${lines[9]}")
cpp_precision_key=$(func_parser_key "${lines[10]}")
cpp_precision_list=$(func_parser_value "${lines[10]}")
cpp_infer_model_key=$(func_parser_key "${lines[11]}")
cpp_image_dir_key=$(func_parser_key "${lines[12]}")
cpp_infer_img_dir=$(func_parser_value "${lines[12]}")
cpp_infer_key1=$(func_parser_key "${lines[13]}")
cpp_infer_value1=$(func_parser_value "${lines[13]}")
cpp_benchmark_key=$(func_parser_key "${lines[14]}")
cpp_benchmark_value=$(func_parser_value "${lines[14]}")
LOG_PATH="./tests/output"
mkdir -p ${LOG_PATH}
status_log="${LOG_PATH}/results_cpp.log"
function func_cpp_inference(){
IFS='|'
_script=$1
_model_dir=$2
_log_path=$3
_img_dir=$4
_flag_quant=$5
# inference
for use_gpu in ${cpp_use_gpu_list[*]}; do
if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then
for use_mkldnn in ${cpp_use_mkldnn_list[*]}; do
if [ ${use_mkldnn} = "False" ] && [ ${_flag_quant} = "True" ]; then
continue
fi
for threads in ${cpp_cpu_threads_list[*]}; do
for batch_size in ${cpp_batch_size_list[*]}; do
precision="fp32"
if [ ${use_mkldnn} = "False" ] && [ ${_flag_quant} = "True" ]; then
precison="int8"
fi
_save_log_path="${_log_path}/cpp_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log"
set_infer_data=$(func_set_params "${cpp_image_dir_key}" "${_img_dir}")
set_benchmark=$(func_set_params "${cpp_benchmark_key}" "${cpp_benchmark_value}")
set_batchsize=$(func_set_params "${cpp_batch_size_key}" "${batch_size}")
set_cpu_threads=$(func_set_params "${cpp_cpu_threads_key}" "${threads}")
set_model_dir=$(func_set_params "${cpp_infer_model_key}" "${_model_dir}")
set_infer_params1=$(func_set_params "${cpp_infer_key1}" "${cpp_infer_value1}")
command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${cpp_use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 "
eval $command
last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}"
status_check $last_status "${command}" "${status_log}"
done
done
done
elif [ ${use_gpu} = "True" ] || [ ${use_gpu} = "gpu" ]; then
for use_trt in ${cpp_use_trt_list[*]}; do
for precision in ${cpp_precision_list[*]}; do
if [[ ${_flag_quant} = "False" ]] && [[ ${precision} =~ "int8" ]]; then
continue
fi
if [[ ${precision} =~ "fp16" || ${precision} =~ "int8" ]] && [ ${use_trt} = "False" ]; then
continue
fi
if [[ ${use_trt} = "False" || ${precision} =~ "int8" ]] && [ ${_flag_quant} = "True" ]; then
continue
fi
for batch_size in ${cpp_batch_size_list[*]}; do
_save_log_path="${_log_path}/cpp_infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log"
set_infer_data=$(func_set_params "${cpp_image_dir_key}" "${_img_dir}")
set_benchmark=$(func_set_params "${cpp_benchmark_key}" "${cpp_benchmark_value}")
set_batchsize=$(func_set_params "${cpp_batch_size_key}" "${batch_size}")
set_tensorrt=$(func_set_params "${cpp_use_trt_key}" "${use_trt}")
set_precision=$(func_set_params "${cpp_precision_key}" "${precision}")
set_model_dir=$(func_set_params "${cpp_infer_model_key}" "${_model_dir}")
set_infer_params1=$(func_set_params "${cpp_infer_key1}" "${cpp_infer_value1}")
command="${_script} ${cpp_use_gpu_key}=${use_gpu} ${set_tensorrt} ${set_precision} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 "
eval $command
last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}"
status_check $last_status "${command}" "${status_log}"
done
done
done
else
echo "Does not support hardware other than CPU and GPU Currently!"
fi
done
}
cd deploy/cpp_infer
if [ ${use_opencv} = "True" ]; then
if [ -d "opencv-3.4.7/opencv3/" ] && [ $(md5sum opencv-3.4.7.tar.gz | awk -F ' ' '{print $1}') = "faa2b5950f8bee3f03118e600c74746a" ];then
echo "################### build opencv skipped ###################"
else
echo "################### build opencv ###################"
rm -rf opencv-3.4.7.tar.gz opencv-3.4.7/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/opencv-3.4.7.tar.gz
tar -xf opencv-3.4.7.tar.gz
cd opencv-3.4.7/
install_path=$(pwd)/opencv3
rm -rf build
mkdir build
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_IPP=OFF \
-DBUILD_IPP_IW=OFF \
-DWITH_LAPACK=OFF \
-DWITH_EIGEN=OFF \
-DCMAKE_INSTALL_LIBDIR=lib64 \
-DWITH_ZLIB=ON \
-DBUILD_ZLIB=ON \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON \
-DWITH_PNG=ON \
-DBUILD_PNG=ON \
-DWITH_TIFF=ON \
-DBUILD_TIFF=ON
make -j
make install
cd ../
echo "################### build opencv finished ###################"
fi
fi
echo "################### build PaddleOCR demo ####################"
if [ ${use_opencv} = "True" ]; then
OPENCV_DIR=$(pwd)/opencv-3.4.7/opencv3/
else
OPENCV_DIR=''
fi
LIB_DIR=$(pwd)/Paddle/build/paddle_inference_install_dir/
CUDA_LIB_DIR=$(dirname `find /usr -name libcudart.so`)
CUDNN_LIB_DIR=$(dirname `find /usr -name libcudnn.so`)
BUILD_DIR=build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DWITH_TENSORRT=OFF \
-DOPENCV_DIR=${OPENCV_DIR} \
-DCUDNN_LIB=${CUDNN_LIB_DIR} \
-DCUDA_LIB=${CUDA_LIB_DIR} \
-DTENSORRT_DIR=${TENSORRT_DIR} \
make -j
cd ../../../
echo "################### build PaddleOCR demo finished ###################"
# set cuda device
GPUID=$2
if [ ${#GPUID} -le 0 ];then
env=" "
else
env="export CUDA_VISIBLE_DEVICES=${GPUID}"
fi
set CUDA_VISIBLE_DEVICES
eval $env
echo "################### run test ###################"
export Count=0
IFS="|"
infer_quant_flag=(${cpp_infer_is_quant})
for infer_model in ${cpp_infer_model_dir_list[*]}; do
#run inference
is_quant=${infer_quant_flag[Count]}
func_cpp_inference "${inference_cmd}" "${infer_model}" "${LOG_PATH}" "${cpp_infer_img_dir}" ${is_quant}
Count=$(($Count + 1))
done
#!/bin/bash
source tests/common_func.sh
FILENAME=$1
dataline=$(awk 'NR==67, NR==83{print}' $FILENAME)
# parser params
IFS=$'\n'
lines=(${dataline})
# parser serving
model_name=$(func_parser_value "${lines[1]}")
python=$(func_parser_value "${lines[2]}")
trans_model_py=$(func_parser_value "${lines[3]}")
infer_model_dir_key=$(func_parser_key "${lines[4]}")
infer_model_dir_value=$(func_parser_value "${lines[4]}")
model_filename_key=$(func_parser_key "${lines[5]}")
model_filename_value=$(func_parser_value "${lines[5]}")
params_filename_key=$(func_parser_key "${lines[6]}")
params_filename_value=$(func_parser_value "${lines[6]}")
serving_server_key=$(func_parser_key "${lines[7]}")
serving_server_value=$(func_parser_value "${lines[7]}")
serving_client_key=$(func_parser_key "${lines[8]}")
serving_client_value=$(func_parser_value "${lines[8]}")
serving_dir_value=$(func_parser_value "${lines[9]}")
web_service_py=$(func_parser_value "${lines[10]}")
web_use_gpu_key=$(func_parser_key "${lines[11]}")
web_use_gpu_list=$(func_parser_value "${lines[11]}")
web_use_mkldnn_key=$(func_parser_key "${lines[12]}")
web_use_mkldnn_list=$(func_parser_value "${lines[12]}")
web_cpu_threads_key=$(func_parser_key "${lines[13]}")
web_cpu_threads_list=$(func_parser_value "${lines[13]}")
web_use_trt_key=$(func_parser_key "${lines[14]}")
web_use_trt_list=$(func_parser_value "${lines[14]}")
web_precision_key=$(func_parser_key "${lines[15]}")
web_precision_list=$(func_parser_value "${lines[15]}")
pipeline_py=$(func_parser_value "${lines[16]}")
LOG_PATH="../../tests/output"
mkdir -p ./tests/output
status_log="${LOG_PATH}/results_serving.log"
function func_serving(){
IFS='|'
_python=$1
_script=$2
_model_dir=$3
# pdserving
set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}")
set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
set_serving_server=$(func_set_params "${serving_server_key}" "${serving_server_value}")
set_serving_client=$(func_set_params "${serving_client_key}" "${serving_client_value}")
trans_model_cmd="${python} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
eval $trans_model_cmd
cd ${serving_dir_value}
echo $PWD
unset https_proxy
unset http_proxy
for use_gpu in ${web_use_gpu_list[*]}; do
echo ${ues_gpu}
if [ ${use_gpu} = "null" ]; then
for use_mkldnn in ${web_use_mkldnn_list[*]}; do
if [ ${use_mkldnn} = "False" ]; then
continue
fi
for threads in ${web_cpu_threads_list[*]}; do
_save_log_path="${LOG_PATH}/server_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_1.log"
set_cpu_threads=$(func_set_params "${web_cpu_threads_key}" "${threads}")
web_service_cmd="${python} ${web_service_py} ${web_use_gpu_key}=${use_gpu} ${web_use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} &"
eval $web_service_cmd
sleep 2s
pipeline_cmd="${python} ${pipeline_py} > ${_save_log_path} 2>&1 "
eval $pipeline_cmd
last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}"
status_check $last_status "${pipeline_cmd}" "${status_log}"
PID=$!
kill $PID
sleep 2s
ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9
done
done
elif [ ${use_gpu} = "0" ]; then
for use_trt in ${web_use_trt_list[*]}; do
for precision in ${web_precision_list[*]}; do
if [[ ${_flag_quant} = "False" ]] && [[ ${precision} =~ "int8" ]]; then
continue
fi
if [[ ${precision} =~ "fp16" || ${precision} =~ "int8" ]] && [ ${use_trt} = "False" ]; then
continue
fi
if [[ ${use_trt} = "False" || ${precision} =~ "int8" ]] && [[ ${_flag_quant} = "True" ]]; then
continue
fi
_save_log_path="${LOG_PATH}/server_infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_1.log"
set_tensorrt=$(func_set_params "${web_use_trt_key}" "${use_trt}")
set_precision=$(func_set_params "${web_precision_key}" "${precision}")
web_service_cmd="${python} ${web_service_py} ${web_use_gpu_key}=${use_gpu} ${set_tensorrt} ${set_precision} & "
eval $web_service_cmd
sleep 2s
pipeline_cmd="${python} ${pipeline_py} > ${_save_log_path} 2>&1"
eval $pipeline_cmd
last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}"
status_check $last_status "${pipeline_cmd}" "${status_log}"
PID=$!
kill $PID
sleep 2s
ps ux | grep -E 'web_service|pipeline' | awk '{print $2}' | xargs kill -s 9
done
done
else
echo "Does not support hardware other than CPU and GPU Currently!"
fi
done
}
# set cuda device
GPUID=$2
if [ ${#GPUID} -le 0 ];then
env=" "
else
env="export CUDA_VISIBLE_DEVICES=${GPUID}"
fi
set CUDA_VISIBLE_DEVICES
eval $env
echo "################### run test ###################"
export Count=0
IFS="|"
func_serving "${web_service_cmd}"
#!/bin/bash
source tests/common_func.sh
FILENAME=$1
# MODE be one of ['lite_train_infer' 'whole_infer' 'whole_train_infer', 'infer']
# MODE be one of ['lite_train_infer' 'whole_infer' 'whole_train_infer', 'infer', 'klquant_infer']
MODE=$2
dataline=$(cat ${FILENAME})
dataline=$(awk 'NR==1, NR==51{print}' $FILENAME)
# parser params
IFS=$'\n'
lines=(${dataline})
function func_parser_key(){
strs=$1
IFS=":"
array=(${strs})
tmp=${array[0]}
echo ${tmp}
}
function func_parser_value(){
strs=$1
IFS=":"
array=(${strs})
tmp=${array[1]}
echo ${tmp}
}
function func_set_params(){
key=$1
value=$2
if [ ${key} = "null" ];then
echo " "
elif [[ ${value} = "null" ]] || [[ ${value} = " " ]] || [ ${#value} -le 0 ];then
echo " "
else
echo "${key}=${value}"
fi
}
function func_parser_params(){
strs=$1
IFS=":"
array=(${strs})
key=${array[0]}
tmp=${array[1]}
IFS="|"
res=""
for _params in ${tmp[*]}; do
IFS="="
array=(${_params})
mode=${array[0]}
value=${array[1]}
if [[ ${mode} = ${MODE} ]]; then
IFS="|"
#echo $(func_set_params "${mode}" "${value}")
echo $value
break
fi
IFS="|"
done
echo ${res}
}
function status_check(){
last_status=$1 # the exit code
run_command=$2
run_log=$3
if [ $last_status -eq 0 ]; then
echo -e "\033[33m Run successfully with command - ${run_command}! \033[0m" | tee -a ${run_log}
else
echo -e "\033[33m Run failed with command - ${run_command}! \033[0m" | tee -a ${run_log}
fi
}
IFS=$'\n'
# The training params
model_name=$(func_parser_value "${lines[1]}")
python=$(func_parser_value "${lines[2]}")
......@@ -145,9 +87,41 @@ benchmark_value=$(func_parser_value "${lines[49]}")
infer_key1=$(func_parser_key "${lines[50]}")
infer_value1=$(func_parser_value "${lines[50]}")
# parser klquant_infer
if [ ${MODE} = "klquant_infer" ]; then
dataline=$(awk 'NR==82, NR==98{print}' $FILENAME)
lines=(${dataline})
# parser inference model
infer_model_dir_list=$(func_parser_value "${lines[1]}")
infer_export_list=$(func_parser_value "${lines[2]}")
infer_is_quant=$(func_parser_value "${lines[3]}")
# parser inference
inference_py=$(func_parser_value "${lines[4]}")
use_gpu_key=$(func_parser_key "${lines[5]}")
use_gpu_list=$(func_parser_value "${lines[5]}")
use_mkldnn_key=$(func_parser_key "${lines[6]}")
use_mkldnn_list=$(func_parser_value "${lines[6]}")
cpu_threads_key=$(func_parser_key "${lines[7]}")
cpu_threads_list=$(func_parser_value "${lines[7]}")
batch_size_key=$(func_parser_key "${lines[8]}")
batch_size_list=$(func_parser_value "${lines[8]}")
use_trt_key=$(func_parser_key "${lines[9]}")
use_trt_list=$(func_parser_value "${lines[9]}")
precision_key=$(func_parser_key "${lines[10]}")
precision_list=$(func_parser_value "${lines[10]}")
infer_model_key=$(func_parser_key "${lines[11]}")
image_dir_key=$(func_parser_key "${lines[12]}")
infer_img_dir=$(func_parser_value "${lines[12]}")
save_log_key=$(func_parser_key "${lines[13]}")
benchmark_key=$(func_parser_key "${lines[14]}")
benchmark_value=$(func_parser_value "${lines[14]}")
infer_key1=$(func_parser_key "${lines[15]}")
infer_value1=$(func_parser_value "${lines[15]}")
fi
LOG_PATH="./tests/output"
mkdir -p ${LOG_PATH}
status_log="${LOG_PATH}/results.log"
status_log="${LOG_PATH}/results_python.log"
function func_inference(){
......@@ -167,18 +141,28 @@ function func_inference(){
fi
for threads in ${cpu_threads_list[*]}; do
for batch_size in ${batch_size_list[*]}; do
_save_log_path="${_log_path}/infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_${batch_size}.log"
set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}")
set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}")
set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}")
set_cpu_threads=$(func_set_params "${cpu_threads_key}" "${threads}")
set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}")
set_infer_params1=$(func_set_params "${infer_key1}" "${infer_value1}")
command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 "
eval $command
last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}"
status_check $last_status "${command}" "${status_log}"
for precision in ${precision_list[*]}; do
if [ ${use_mkldnn} = "False" ] && [ ${precision} = "fp16" ]; then
continue
fi # skip when enable fp16 but disable mkldnn
if [ ${_flag_quant} = "True" ] && [ ${precision} != "int8" ]; then
continue
fi # skip when quant model inference but precision is not int8
set_precision=$(func_set_params "${precision_key}" "${precision}")
_save_log_path="${_log_path}/python_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log"
set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}")
set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}")
set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}")
set_cpu_threads=$(func_set_params "${cpu_threads_key}" "${threads}")
set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}")
set_infer_params1=$(func_set_params "${infer_key1}" "${infer_value1}")
command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_precision} ${set_infer_params1} > ${_save_log_path} 2>&1 "
eval $command
last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}"
status_check $last_status "${command}" "${status_log}"
done
done
done
done
......@@ -195,7 +179,7 @@ function func_inference(){
continue
fi
for batch_size in ${batch_size_list[*]}; do
_save_log_path="${_log_path}/infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log"
_save_log_path="${_log_path}/python_infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log"
set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}")
set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}")
set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}")
......@@ -218,7 +202,7 @@ function func_inference(){
done
}
if [ ${MODE} = "infer" ]; then
if [ ${MODE} = "infer" ] || [ ${MODE} = "klquant_infer" ]; then
GPUID=$3
if [ ${#GPUID} -le 0 ];then
env=" "
......@@ -237,21 +221,23 @@ if [ ${MODE} = "infer" ]; then
save_infer_dir=$(dirname $infer_model)
set_export_weight=$(func_set_params "${export_weight}" "${infer_model}")
set_save_infer_key=$(func_set_params "${save_infer_key}" "${save_infer_dir}")
export_cmd="${python} ${norm_export} ${set_export_weight} ${set_save_infer_key}"
export_cmd="${python} ${infer_run_exports[Count]} ${set_export_weight} ${set_save_infer_key}"
echo ${infer_run_exports[Count]}
echo $export_cmd
eval $export_cmd
status_export=$?
if [ ${status_export} = 0 ];then
status_check $status_export "${export_cmd}" "${status_log}"
fi
status_check $status_export "${export_cmd}" "${status_log}"
else
save_infer_dir=${infer_model}
fi
#run inference
is_quant=${infer_quant_flag[Count]}
if [ ${MODE} = "klquant_infer" ]; then
is_quant="True"
fi
func_inference "${python}" "${inference_py}" "${save_infer_dir}" "${LOG_PATH}" "${infer_img_dir}" ${is_quant}
Count=$(($Count + 1))
done
else
IFS="|"
export Count=0
......@@ -363,3 +349,4 @@ else
done # done with: for autocast in ${autocast_list[*]}; do
done # done with: for gpu in ${gpu_list[*]}; do
fi # end if [ ${MODE} = "infer" ]; then
English | [简体中文](README_ch.md)
<p align="center">
<img src="./doc/PaddleOCR_log.png" align="middle" width = "600"/>
<p align="center">
------------------------------------------------------------------------------------------
<p align="left">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleOCR?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
## Introduction
PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.
## Notice
PaddleOCR supports both dynamic graph and static graph programming paradigm
- Dynamic graph: dygraph branch (default), **supported by paddle 2.0.0 ([installation](./doc/doc_en/installation_en.md))**
- Static graph: develop branch
**Recent updates**
- 2021.1.21 update more than 25+ multilingual recognition models [models list](./doc/doc_en/models_list_en.md), including:English, Chinese, German, French, Japanese,Spanish,Portuguese Russia Arabic and so on. Models for more languages will continue to be updated [Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
- 2020.12.15 update Data synthesis tool, i.e., [Style-Text](./StyleText/README.md),easy to synthesize a large number of images which are similar to the target scene image.
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](./PPOCRLabel/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- PaddleOCR R&D team would like to share the key points of PP-OCRv2, at 20:15 pm on September 8th, [Live Address](https://live.bilibili.com/21689802).
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server in CPU device. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 released PaddleOCR v2.2, add a new structured documents analysis toolkit, i.e., [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), support layout analysis and table recognition (One-key to export chart images to Excel files).
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md);release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized.
- [more](./doc/doc_en/update_en.md)
## Features
- PPOCR series of high-quality pre-trained models, comparable to commercial effects
- Ultra lightweight ppocr_mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General ppocr_server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- PP-OCR series of high-quality pre-trained models, comparable to commercial effects
- Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition 8.5M) = 13.0M
- Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Rich toolkits related to the OCR areas
......@@ -64,39 +82,45 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr
<a name="Supported-Chinese-model-list"></a>
## PP-OCR 2.0 series model list(Update on Dec 15)
**Note** : Compared with [models 1.1](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/models_list_en.md), which are trained with static graph programming paradigm, models 2.0 are the dynamic graph trained version and achieve close performance.
## PP-OCR Series Model List(Update on September 8th)
| Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
| ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Chinese and English ultra-lightweight OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| Chinese and English general OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_traingit.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
| Chinese and English ultra-lightweight PP-OCRv2 model(11.6M) | ch_PP-OCRv2_xx |Mobile&Server|[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/ch/ch_PP-OCRv2_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)|
| Chinese and English ultra-lightweight PP-OCR model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| Chinese and English general PP-OCR model (143.4M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_traingit.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
For more model downloads (including multiple languages), please refer to [PP-OCR v2.0 series model downloads](./doc/doc_en/models_list_en.md).
For more model downloads (including multiple languages), please refer to [PP-OCR series model downloads](./doc/doc_en/models_list_en.md).
For a new language request, please refer to [Guideline for new language_requests](#language_requests).
## Tutorials
- [Installation](./doc/doc_en/installation_en.md)
- [Environment Preparation](./doc/doc_en/environment_en.md)
- [Quick Start](./doc/doc_en/quickstart_en.md)
- [Code Structure](./doc/doc_en/tree_en.md)
- Algorithm Introduction
- [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
- [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
- [PP-OCR Pipeline](#PP-OCR-Pipeline)
- Model Training/Evaluation
- [Text Detection](./doc/doc_en/detection_en.md)
- [Text Recognition](./doc/doc_en/recognition_en.md)
- [Direction Classification](./doc/doc_en/angle_class_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
- Inference and Deployment
- [Quick Inference Based on PIP](./doc/doc_en/whl_en.md)
- [PaddleOCR Overview and Installation](./doc/doc_en/paddleOCR_overview_en.md)
- PP-OCR Industry Landing: from Training to Deployment
- [PP-OCR Model and Configuration](./doc/doc_en/models_and_config_en.md)
- [PP-OCR Model Download](./doc/doc_en/models_list_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
- [Python Inference for PP-OCR Model Library](./doc/doc_en/inference_ppocr_en.md)
- [PP-OCR Training](./doc/doc_en/training_en.md)
- [Text Detection](./doc/doc_en/detection_en.md)
- [Text Recognition](./doc/doc_en/recognition_en.md)
- [Text Direction Classification](./doc/doc_en/angle_class_en.md)
- [Yml Configuration](./doc/doc_en/config_en.md)
- Inference and Deployment
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
- [Serving](./deploy/pdserving/README.md)
- [Mobile](./deploy/lite/readme_en.md)
- [Benchmark](./doc/doc_en/benchmark_en.md)
- [PP-Structure: Information Extraction](./ppstructure/README.md)
- [Layout Parser](./ppstructure/layout/README.md)
- [Table Recognition](./ppstructure/table/README.md)
- Academic Circles
- [Two-stage Algorithm](./doc/doc_en/algorithm_overview_en.md)
- [PGNet Algorithm](./doc/doc_en/algorithm_overview_en.md)
- [Python Inference](./doc/doc_en/inference_en.md)
- [C++ Inference](./deploy/cpp_infer/readme_en.md)
- [Serving](./deploy/pdserving/README.md)
- [Mobile](./deploy/lite/readme_en.md)
- [Benchmark](./doc/doc_en/benchmark_en.md)
- Data Annotation and Synthesis
- [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
- [Data Synthesis Tool: Style-Text](./StyleText/README.md)
......@@ -114,17 +138,18 @@ For a new language request, please refer to [Guideline for new language_requests
- [License](#LICENSE)
- [Contribution](#CONTRIBUTION)
<a name="PP-OCRv2"></a>
## PP-OCRv2 Pipeline
<div align="center">
<img src="./doc/ppocrv2_framework.jpg" width="800">
</div>
<a name="PP-OCR-Pipeline"></a>
[1] PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection frame correction and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
## PP-OCR Pipeline
[2] On the basis of PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts CML(Collaborative Mutual Learning) knowledge distillation strategy and CopyPaste data expansion strategy. The recognition model adopts LCNet lightweight backbone network, U-DML knowledge distillation strategy and enhanced CTC loss function improvement (as shown in the red box above), which further improves the inference speed and prediction effect. For more details, please refer to the technical report of PP-OCRv2 (arXiv link is coming soon).
<div align="center">
<img src="./doc/ppocr_framework.png" width="800">
</div>
PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection[2], detection frame correction and CRNN text recognition[7]. The system adopts 19 effective strategies from 8 aspects including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization to optimize and slim down the models of each module. The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English digital OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941). Besides, The implementation of the FPGM Pruner [8] and PACT quantization [9] is based on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
## Visualization [more](./doc/doc_en/visualization_en.md)
......@@ -149,7 +174,7 @@ PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of thr
<a name="language_requests"></a>
## Guideline for new language requests
## Guideline for New Language Requests
If you want to request a new language support, a PR with 2 following files are needed:
......
[English](README.md) | 简体中文
<p align="center">
<img src="./doc/PaddleOCR_log.png" align="middle" width = "600"/>
<p align="center">
------------------------------------------------------------------------------------------
<p align="left">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleOCR?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
## 简介
PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力使用者训练出更好的模型,并应用落地。
## 注意
PaddleOCR同时支持动态图与静态图两种编程范式
- 动态图版本:dygraph分支(默认),需将paddle版本升级至2.0.0([快速安装](./doc/doc_ch/installation.md)
- 静态图版本:develop分支
**近期更新**
- 2021.4.8 release 2.1版本,新增AAAI 2021论文[端到端识别算法PGNet](./doc/doc_ch/pgnet.md)开源,[多语言模型](./doc/doc_ch/multi_languages.md)支持种类增加到80+。
- 2021.2.1 [FAQ](./doc/doc_ch/FAQ.md)新增5个高频问题,总数162个,每周一都会更新,欢迎大家持续关注。
- 2021.1.21 更新多语言识别模型,目前支持语种超过27种,包括中文简体、中文繁体、英文、法文、德文、韩文、日文、意大利文、西班牙文、葡萄牙文、俄罗斯文、阿拉伯文等,后续计划可以参考[多语言研发计划](https://github.com/PaddlePaddle/PaddleOCR/issues/1048)
- 2020.12.15 更新数据合成工具[Style-Text](./StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。
- 2020.11.25 更新半自动标注工具[PPOCRLabel](./PPOCRLabel/README_ch.md),辅助开发者高效完成标注任务,输出格式与PP-OCR训练任务完美衔接。
- 2020.9.22 更新PP-OCR技术文章,https://arxiv.org/abs/2009.09941
- [More](./doc/doc_ch/update.md)
- PaddleOCR研发团队对最新发版内容技术深入解读,9月8日晚上20:15,[直播地址](https://live.bilibili.com/21689802)
- 2021.9.7 发布PaddleOCR v2.3,发布[PP-OCRv2](#PP-OCRv2),CPU推理速度相比于PP-OCR server提升220%;效果相比于PP-OCR mobile 提升7%。
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包,支持版面分析与表格识别(含Excel导出)。
- 2021.6.29 [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md)新增5个高频问题,总数248个,每周一都会更新,欢迎大家持续关注。
- 2021.4.8 release 2.1版本,新增AAAI 2021论文[端到端识别算法PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md)开源,[多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md)支持种类增加到80+。
- [More](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/update.md)
## 特性
- PPOCR系列高质量预训练模型,准确的识别效果
- 超轻量ppocr_mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
- 通用ppocr_server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
- PP-OCR系列高质量预训练模型,准确的识别效果
- 超轻量PP-OCRv2系列:检测(3.1M)+ 方向分类器(1.4M)+ 识别(8.5M)= 13.0M
- 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
- 通用PPOCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
- 支持中英文数字组合识别、竖排文本识别、长文本识别
- 支持多语言识别:韩语、日语、德语、法语
- 丰富易用的OCR相关工具组件
- 半自动数据标注工具PPOCRLabel:支持快速高效的数据标注
- 数据合成工具Style-Text:批量合成大量与目标场景类似的图像
- 文档分析能力PP-Structure:版面分析与表格识别
- 支持用户自定义训练,提供丰富的预测推理部署方案
- 支持PIP快速安装使用
- 可运行于Linux、Windows、MacOS等多种系统
......@@ -39,7 +54,7 @@ PaddleOCR同时支持动态图与静态图两种编程范式
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
上图是通用ppocr_server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)
上图是通用PP-OCR server模型效果展示,更多效果图请见[效果展示页面](./doc/doc_ch/visualization.md)
<a name="欢迎加入PaddleOCR技术交流群"></a>
## 欢迎加入PaddleOCR技术交流群
......@@ -62,71 +77,79 @@ PaddleOCR同时支持动态图与静态图两种编程范式
- 代码体验:从[快速安装](./doc/doc_ch/quickstart.md) 开始
<a name="模型下载"></a>
## PP-OCR 2.0系列模型列表(更新中)
**说明** :2.0版模型和[1.1版模型](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/models_list.md)的主要区别在于动态图训练vs.静态图训练,模型性能上无明显差距。
## PP-OCR系列模型列表(更新中)
| 模型简介 | 模型名称 |推荐场景 | 检测模型 | 方向分类器 | 识别模型 |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| 中英文超轻量OCR模型(9.4M) | ch_ppocr_mobile_v2.0_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| 中英文通用OCR模型(143.4M) |ch_ppocr_server_v2.0_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
| 中英文超轻量PP-OCRv2模型(13.0M) | ch_PP-OCRv2_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| [推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)|
| 中英文超轻量PP-OCR mobile模型(9.4M) | ch_ppocr_mobile_v2.0_xx |移动端&服务器端|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar) |
| 中英文通用PP-OCR server模型(143.4M) |ch_ppocr_server_v2.0_xx|服务器端 |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[推理模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar) |
更多模型下载(包括多语言),可以参考[PP-OCR v2.0 系列模型下载](./doc/doc_ch/models_list.md)
更多模型下载(包括多语言),可以参考[PP-OCR 系列模型下载](./doc/doc_ch/models_list.md)
## 文档教程
- [快速安装](./doc/doc_ch/installation.md)
- [中文OCR模型快速使用](./doc/doc_ch/quickstart.md)
- [多语言OCR模型快速使用](./doc/doc_ch/multi_languages.md)
- [代码组织结构](./doc/doc_ch/tree.md)
- 算法介绍
- [文本检测](./doc/doc_ch/algorithm_overview.md)
- [文本识别](./doc/doc_ch/algorithm_overview.md)
- [PP-OCR Pipeline](#PP-OCR)
- [运行环境准备](./doc/doc_ch/environment.md)
- [快速开始(中英文/多语言/文档分析)](./doc/doc_ch/quickstart.md)
- [PaddleOCR全景图与项目克隆](./doc/doc_ch/paddleOCR_overview.md)
- PP-OCR产业落地:从训练到部署
- [PP-OCR模型与配置文件](./doc/doc_ch/models_and_config.md)
- [PP-OCR模型下载](./doc/doc_ch/models_list.md)
- [配置文件内容与生成](./doc/doc_ch/config.md)
- [PP-OCR模型库快速推理](./doc/doc_ch/inference_ppocr.md)
- [PP-OCR模型训练](./doc/doc_ch/training.md)
- [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md)
- [文本方向分类器](./doc/doc_ch/angle_class.md)
- [配置文件内容与生成](./doc/doc_ch/config.md)
- PP-OCR模型推理部署
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
- [服务化部署](./deploy/pdserving/README_CN.md)
- [端侧部署](./deploy/lite/readme.md)
- [Benchmark](./doc/doc_ch/benchmark.md)
- [PP-Structure信息提取](./ppstructure/README_ch.md)
- [版面分析](./ppstructure/layout/README_ch.md)
- [表格识别](./ppstructure/table/README_ch.md)
- 数据标注与合成
- [半自动标注工具PPOCRLabel](./PPOCRLabel/README_ch.md)
- [数据合成工具Style-Text](./StyleText/README_ch.md)
- [其它数据标注工具](./doc/doc_ch/data_annotation.md)
- [其它数据合成工具](./doc/doc_ch/data_synthesis.md)
- OCR学术圈
- [两阶段模型介绍与下载](./doc/doc_ch/algorithm_overview.md)
- [端到端PGNet算法](./doc/doc_ch/pgnet.md)
- 模型训练/评估
- [文本检测](./doc/doc_ch/detection.md)
- [文本识别](./doc/doc_ch/recognition.md)
- [方向分类器](./doc/doc_ch/angle_class.md)
- [yml参数配置文件介绍](./doc/doc_ch/config.md)
- 预测部署
- [基于pip安装whl包快速推理](./doc/doc_ch/whl.md)
- [基于Python脚本预测引擎推理](./doc/doc_ch/inference.md)
- [基于C++预测引擎推理](./deploy/cpp_infer/readme.md)
- [服务化部署](./deploy/pdserving/README_CN.md)
- [端侧部署](./deploy/lite/readme.md)
- [Benchmark](./doc/doc_ch/benchmark.md)
- 数据集
- [通用中英文OCR数据集](./doc/doc_ch/datasets.md)
- [手写中文OCR数据集](./doc/doc_ch/handwritten_datasets.md)
- [垂类多语言OCR数据集](./doc/doc_ch/vertical_and_multilingual_datasets.md)
- 数据标注与合成
- [半自动标注工具PPOCRLabel](./PPOCRLabel/README_ch.md)
- [数据合成工具Style-Text](./StyleText/README_ch.md)
- [其它数据标注工具](./doc/doc_ch/data_annotation.md)
- [其它数据合成工具](./doc/doc_ch/data_synthesis.md)
- [效果展示](#效果展示)
- FAQ
- [【精选】OCR精选10个问题](./doc/doc_ch/FAQ.md)
- [【理论篇】OCR通用32个问题](./doc/doc_ch/FAQ.md)
- [【实战篇】PaddleOCR实战110个问题](./doc/doc_ch/FAQ.md)
- [【理论篇】OCR通用50个问题](./doc/doc_ch/FAQ.md)
- [【实战篇】PaddleOCR实战183个问题](./doc/doc_ch/FAQ.md)
- [技术交流群](#欢迎加入PaddleOCR技术交流群)
- [参考文献](./doc/doc_ch/reference.md)
- [许可证书](#许可证书)
- [贡献代码](#贡献代码)
- [代码组织结构](./doc/doc_ch/tree.md)
<a name="PP-OCRv2"></a>
<a name="PP-OCR"></a>
## PP-OCR Pipeline
## PP-OCRv2 Pipeline
<div align="center">
<img src="./doc/ppocr_framework.png" width="800">
<img src="./doc/ppocrv2_framework.jpg" width="800">
</div>
PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测[2]、检测框矫正和CRNN文本识别三部分组成[7]。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身,最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941 。其中FPGM裁剪器[8]和PACT量化[9]的实现可以参考[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
[1] PP-OCR是一个实用的超轻量OCR系统。主要由DB文本检测、检测框矫正和CRNN文本识别三部分组成。该系统从骨干网络选择和调整、预测头部的设计、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型自动裁剪量化8个方面,采用19个有效策略,对各个模块的模型进行效果调优和瘦身(如绿框所示),最终得到整体大小为3.5M的超轻量中英文OCR和2.8M的英文数字OCR。更多细节请参考PP-OCR技术方案 https://arxiv.org/abs/2009.09941
[2] PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和Enhanced CTC loss损失函数改进(如上图红框所示),进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCR技术方案(arxiv链接生成中)。
<a name="效果展示"></a>
## 效果展示 [more](./doc/doc_ch/visualization.md)
- 中文模型
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00015504.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
</div>
......
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import argparse
import json
import os
import re
import traceback
def parse_args():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--filename", type=str, help="The name of log which need to analysis.")
parser.add_argument(
"--log_with_profiler", type=str, help="The path of train log with profiler")
parser.add_argument(
"--profiler_path", type=str, help="The path of profiler timeline log.")
parser.add_argument(
"--keyword", type=str, help="Keyword to specify analysis data")
parser.add_argument(
"--separator", type=str, default=None, help="Separator of different field in log")
parser.add_argument(
'--position', type=int, default=None, help='The position of data field')
parser.add_argument(
'--range', type=str, default="", help='The range of data field to intercept')
parser.add_argument(
'--base_batch_size', type=int, help='base_batch size on gpu')
parser.add_argument(
'--skip_steps', type=int, default=0, help='The number of steps to be skipped')
parser.add_argument(
'--model_mode', type=int, default=-1, help='Analysis mode, default value is -1')
parser.add_argument(
'--ips_unit', type=str, default=None, help='IPS unit')
parser.add_argument(
'--model_name', type=str, default=0, help='training model_name, transformer_base')
parser.add_argument(
'--mission_name', type=str, default=0, help='training mission name')
parser.add_argument(
'--direction_id', type=int, default=0, help='training direction_id')
parser.add_argument(
'--run_mode', type=str, default="sp", help='multi process or single process')
parser.add_argument(
'--index', type=int, default=1, help='{1: speed, 2:mem, 3:profiler, 6:max_batch_size}')
parser.add_argument(
'--gpu_num', type=int, default=1, help='nums of training gpus')
args = parser.parse_args()
args.separator = None if args.separator == "None" else args.separator
return args
def _is_number(num):
pattern = re.compile(r'^[-+]?[-0-9]\d*\.\d*|[-+]?\.?[0-9]\d*$')
result = pattern.match(num)
if result:
return True
else:
return False
class TimeAnalyzer(object):
def __init__(self, filename, keyword=None, separator=None, position=None, range="-1"):
if filename is None:
raise Exception("Please specify the filename!")
if keyword is None:
raise Exception("Please specify the keyword!")
self.filename = filename
self.keyword = keyword
self.separator = separator
self.position = position
self.range = range
self.records = None
self._distil()
def _distil(self):
self.records = []
with open(self.filename, "r") as f_object:
lines = f_object.readlines()
for line in lines:
if self.keyword not in line:
continue
try:
result = None
# Distil the string from a line.
line = line.strip()
line_words = line.split(self.separator) if self.separator else line.split()
if args.position:
result = line_words[self.position]
else:
# Distil the string following the keyword.
for i in range(len(line_words) - 1):
if line_words[i] == self.keyword:
result = line_words[i + 1]
break
# Distil the result from the picked string.
if not self.range:
result = result[0:]
elif _is_number(self.range):
result = result[0: int(self.range)]
else:
result = result[int(self.range.split(":")[0]): int(self.range.split(":")[1])]
self.records.append(float(result))
except Exception as exc:
print("line is: {}; separator={}; position={}".format(line, self.separator, self.position))
print("Extract {} records: separator={}; position={}".format(len(self.records), self.separator, self.position))
def _get_fps(self, mode, batch_size, gpu_num, avg_of_records, run_mode, unit=None):
if mode == -1 and run_mode == 'sp':
assert unit, "Please set the unit when mode is -1."
fps = gpu_num * avg_of_records
elif mode == -1 and run_mode == 'mp':
assert unit, "Please set the unit when mode is -1."
fps = gpu_num * avg_of_records #temporarily, not used now
print("------------this is mp")
elif mode == 0:
# s/step -> samples/s
fps = (batch_size * gpu_num) / avg_of_records
unit = "samples/s"
elif mode == 1:
# steps/s -> steps/s
fps = avg_of_records
unit = "steps/s"
elif mode == 2:
# s/step -> steps/s
fps = 1 / avg_of_records
unit = "steps/s"
elif mode == 3:
# steps/s -> samples/s
fps = batch_size * gpu_num * avg_of_records
unit = "samples/s"
elif mode == 4:
# s/epoch -> s/epoch
fps = avg_of_records
unit = "s/epoch"
else:
ValueError("Unsupported analysis mode.")
return fps, unit
def analysis(self, batch_size, gpu_num=1, skip_steps=0, mode=-1, run_mode='sp', unit=None):
if batch_size <= 0:
print("base_batch_size should larger than 0.")
return 0, ''
if len(self.records) <= skip_steps: # to address the condition which item of log equals to skip_steps
print("no records")
return 0, ''
sum_of_records = 0
sum_of_records_skipped = 0
skip_min = self.records[skip_steps]
skip_max = self.records[skip_steps]
count = len(self.records)
for i in range(count):
sum_of_records += self.records[i]
if i >= skip_steps:
sum_of_records_skipped += self.records[i]
if self.records[i] < skip_min:
skip_min = self.records[i]
if self.records[i] > skip_max:
skip_max = self.records[i]
avg_of_records = sum_of_records / float(count)
avg_of_records_skipped = sum_of_records_skipped / float(count - skip_steps)
fps, fps_unit = self._get_fps(mode, batch_size, gpu_num, avg_of_records, run_mode, unit)
fps_skipped, _ = self._get_fps(mode, batch_size, gpu_num, avg_of_records_skipped, run_mode, unit)
if mode == -1:
print("average ips of %d steps, skip 0 step:" % count)
print("\tAvg: %.3f %s" % (avg_of_records, fps_unit))
print("\tFPS: %.3f %s" % (fps, fps_unit))
if skip_steps > 0:
print("average ips of %d steps, skip %d steps:" % (count, skip_steps))
print("\tAvg: %.3f %s" % (avg_of_records_skipped, fps_unit))
print("\tMin: %.3f %s" % (skip_min, fps_unit))
print("\tMax: %.3f %s" % (skip_max, fps_unit))
print("\tFPS: %.3f %s" % (fps_skipped, fps_unit))
elif mode == 1 or mode == 3:
print("average latency of %d steps, skip 0 step:" % count)
print("\tAvg: %.3f steps/s" % avg_of_records)
print("\tFPS: %.3f %s" % (fps, fps_unit))
if skip_steps > 0:
print("average latency of %d steps, skip %d steps:" % (count, skip_steps))
print("\tAvg: %.3f steps/s" % avg_of_records_skipped)
print("\tMin: %.3f steps/s" % skip_min)
print("\tMax: %.3f steps/s" % skip_max)
print("\tFPS: %.3f %s" % (fps_skipped, fps_unit))
elif mode == 0 or mode == 2:
print("average latency of %d steps, skip 0 step:" % count)
print("\tAvg: %.3f s/step" % avg_of_records)
print("\tFPS: %.3f %s" % (fps, fps_unit))
if skip_steps > 0:
print("average latency of %d steps, skip %d steps:" % (count, skip_steps))
print("\tAvg: %.3f s/step" % avg_of_records_skipped)
print("\tMin: %.3f s/step" % skip_min)
print("\tMax: %.3f s/step" % skip_max)
print("\tFPS: %.3f %s" % (fps_skipped, fps_unit))
return round(fps_skipped, 3), fps_unit
if __name__ == "__main__":
args = parse_args()
run_info = dict()
run_info["log_file"] = args.filename
run_info["model_name"] = args.model_name
run_info["mission_name"] = args.mission_name
run_info["direction_id"] = args.direction_id
run_info["run_mode"] = args.run_mode
run_info["index"] = args.index
run_info["gpu_num"] = args.gpu_num
run_info["FINAL_RESULT"] = 0
run_info["JOB_FAIL_FLAG"] = 0
try:
if args.index == 1:
if args.gpu_num == 1:
run_info["log_with_profiler"] = args.log_with_profiler
run_info["profiler_path"] = args.profiler_path
analyzer = TimeAnalyzer(args.filename, args.keyword, args.separator, args.position, args.range)
run_info["FINAL_RESULT"], run_info["UNIT"] = analyzer.analysis(
batch_size=args.base_batch_size,
gpu_num=args.gpu_num,
skip_steps=args.skip_steps,
mode=args.model_mode,
run_mode=args.run_mode,
unit=args.ips_unit)
try:
if int(os.getenv('job_fail_flag')) == 1 or int(run_info["FINAL_RESULT"]) == 0:
run_info["JOB_FAIL_FLAG"] = 1
except:
pass
elif args.index == 3:
run_info["FINAL_RESULT"] = {}
records_fo_total = TimeAnalyzer(args.filename, 'Framework overhead', None, 3, '').records
records_fo_ratio = TimeAnalyzer(args.filename, 'Framework overhead', None, 5).records
records_ct_total = TimeAnalyzer(args.filename, 'Computation time', None, 3, '').records
records_gm_total = TimeAnalyzer(args.filename, 'GpuMemcpy Calls', None, 4, '').records
records_gm_ratio = TimeAnalyzer(args.filename, 'GpuMemcpy Calls', None, 6).records
records_gmas_total = TimeAnalyzer(args.filename, 'GpuMemcpyAsync Calls', None, 4, '').records
records_gms_total = TimeAnalyzer(args.filename, 'GpuMemcpySync Calls', None, 4, '').records
run_info["FINAL_RESULT"]["Framework_Total"] = records_fo_total[0] if records_fo_total else 0
run_info["FINAL_RESULT"]["Framework_Ratio"] = records_fo_ratio[0] if records_fo_ratio else 0
run_info["FINAL_RESULT"]["ComputationTime_Total"] = records_ct_total[0] if records_ct_total else 0
run_info["FINAL_RESULT"]["GpuMemcpy_Total"] = records_gm_total[0] if records_gm_total else 0
run_info["FINAL_RESULT"]["GpuMemcpy_Ratio"] = records_gm_ratio[0] if records_gm_ratio else 0
run_info["FINAL_RESULT"]["GpuMemcpyAsync_Total"] = records_gmas_total[0] if records_gmas_total else 0
run_info["FINAL_RESULT"]["GpuMemcpySync_Total"] = records_gms_total[0] if records_gms_total else 0
else:
print("Not support!")
except Exception:
traceback.print_exc()
print("{}".format(json.dumps(run_info))) # it's required, for the log file path insert to the database
# PaddleOCR DB/EAST 算法训练benchmark测试
PaddleOCR/benchmark目录下的文件用于获取并分析训练日志。
训练采用icdar2015数据集,包括1000张训练图像和500张测试图像。模型配置采用resnet18_vd作为backbone,分别训练batch_size=8和batch_size=16的情况。
## 运行训练benchmark
benchmark/run_det.sh 中包含了三个过程:
- 安装依赖
- 下载数据
- 执行训练
- 日志分析获取IPS
在执行训练部分,会执行单机单卡(默认0号卡)单机多卡训练,并分别执行batch_size=8和batch_size=16的情况。所以执行完后,每种模型会得到4个日志文件。
run_det.sh 执行方式如下:
```
# cd PaddleOCR/
bash benchmark/run_det.sh
```
以DB为例,将得到四个日志文件,如下:
```
det_res18_db_v2.0_sp_bs16_fp32_1
det_res18_db_v2.0_sp_bs8_fp32_1
det_res18_db_v2.0_mp_bs16_fp32_1
det_res18_db_v2.0_mp_bs8_fp32_1
```
#!/usr/bin/env bash
set -xe
# 运行示例:CUDA_VISIBLE_DEVICES=0 bash run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 500 ${model_mode}
# 参数说明
function _set_params(){
run_mode=${1:-"sp"} # 单卡sp|多卡mp
batch_size=${2:-"64"}
fp_item=${3:-"fp32"} # fp32|fp16
max_iter=${4:-"500"} # 可选,如果需要修改代码提前中断
model_name=${5:-"model_name"}
run_log_path=${TRAIN_LOG_DIR:-$(pwd)} # TRAIN_LOG_DIR 后续QA设置该参数
# 以下不用修改
device=${CUDA_VISIBLE_DEVICES//,/ }
arr=(${device})
num_gpu_devices=${#arr[*]}
log_file=${run_log_path}/${model_name}_${run_mode}_bs${batch_size}_${fp_item}_${num_gpu_devices}
}
function _train(){
echo "Train on ${num_gpu_devices} GPUs"
echo "current CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES, gpus=$num_gpu_devices, batch_size=$batch_size"
train_cmd="-c configs/det/${model_name}.yml -o Train.loader.batch_size_per_card=${batch_size} Global.epoch_num=${max_iter} "
case ${run_mode} in
sp)
train_cmd="python3.7 tools/train.py "${train_cmd}""
;;
mp)
train_cmd="python3.7 -m paddle.distributed.launch --log_dir=./mylog --gpus=$CUDA_VISIBLE_DEVICES tools/train.py ${train_cmd}"
;;
*) echo "choose run_mode(sp or mp)"; exit 1;
esac
# 以下不用修改
timeout 15m ${train_cmd} > ${log_file} 2>&1
if [ $? -ne 0 ];then
echo -e "${model_name}, FAIL"
export job_fail_flag=1
else
echo -e "${model_name}, SUCCESS"
export job_fail_flag=0
fi
kill -9 `ps -ef|grep 'python3.7'|awk '{print $2}'`
if [ $run_mode = "mp" -a -d mylog ]; then
rm ${log_file}
cp mylog/workerlog.0 ${log_file}
fi
# run log analysis
analysis_cmd="python3.7 benchmark/analysis.py --filename ${log_file} --mission_name ${model_name} --run_mode ${mode} --direction_id 0 --keyword 'ips:' --base_batch_size ${batch_szie} --skip_steps 1 --gpu_num ${num_gpu_devices} --index 1 --model_mode=-1 --ips_unit=samples/sec"
eval $analysis_cmd
}
_set_params $@
_train
# 提供可稳定复现性能的脚本,默认在标准docker环境内py37执行: paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7 paddle=2.1.2 py=37
# 执行目录: ./PaddleOCR
# 1 安装该模型需要的依赖 (如需开启优化策略请注明)
python3.7 -m pip install -r requirements.txt
# 2 拷贝该模型需要数据、预训练模型
wget -c -p ./tain_data/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/icdar2015.tar && cd train_data && tar xf icdar2015.tar && cd ../
wget -c -p ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
# 3 批量运行(如不方便批量,1,2需放到单个模型中)
model_mode_list=(det_res18_db_v2.0 det_r50_vd_east)
fp_item_list=(fp32)
bs_list=(8 16)
for model_mode in ${model_mode_list[@]}; do
for fp_item in ${fp_item_list[@]}; do
for bs_item in ${bs_list[@]}; do
echo "index is speed, 1gpus, begin, ${model_name}"
run_mode=sp
CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark_det.sh ${run_mode} ${bs_item} ${fp_item} 10 ${model_mode} # (5min)
sleep 60
echo "index is speed, 8gpus, run_mode is multi_process, begin, ${model_name}"
run_mode=mp
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark_det.sh ${run_mode} ${bs_item} ${fp_item} 10 ${model_mode}
sleep 60
done
done
done
......@@ -8,7 +8,7 @@ Global:
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [3000, 2000]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
pretrained_model: ./pretrain_models/ch_PP-OCRv2_det_distill_train/best_accuracy
checkpoints:
save_inference_dir:
use_visualdl: False
......@@ -19,30 +19,26 @@ Architecture:
name: DistillationModel
algorithm: Distillation
Models:
Student:
pretrained: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
freeze_params: false
Teacher:
freeze_params: true
return_all_feats: false
model_type: det
algorithm: DB
Transform:
Backbone:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
name: ResNet
layers: 18
Neck:
name: DBFPN
out_channels: 96
out_channels: 256
Head:
name: DBHead
k: 50
Student2:
pretrained: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
Student:
freeze_params: false
return_all_feats: false
model_type: det
algorithm: DB
Transform:
Backbone:
name: MobileNetV3
scale: 0.5
......@@ -54,23 +50,24 @@ Architecture:
Head:
name: DBHead
k: 50
Teacher:
pretrained: ./pretrain_models/ch_ppocr_server_v2.0_det_train/best_accuracy
freeze_params: true
Student2:
freeze_params: false
return_all_feats: false
model_type: det
algorithm: DB
Transform:
Backbone:
name: ResNet
layers: 18
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
Neck:
name: DBFPN
out_channels: 256
out_channels: 96
Head:
name: DBHead
k: 50
Loss:
name: CombinedLoss
loss_config_list:
......@@ -144,6 +141,7 @@ Train:
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- CopyPaste:
- IaaAugment:
augmenter_args:
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
......
......@@ -68,8 +68,7 @@ Loss:
ohem_ratio: 3
- DistillationDBLoss:
weight: 1.0
model_name_list: ["Student", "Teacher"]
# key: maps
model_name_list: ["Student"]
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
......@@ -116,6 +115,7 @@ Train:
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- CopyPaste:
- IaaAugment:
augmenter_args:
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
......
......@@ -118,6 +118,7 @@ Train:
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- CopyPaste:
- IaaAugment:
augmenter_args:
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
......
Global:
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/ch_db_mv3/
save_epoch_step: 1200
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [0, 400]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/student.pdparams
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_db/predicts_db.txt
Architecture:
model_type: det
algorithm: DB
Transform:
Backbone:
name: MobileNetV3
scale: 0.5
model_name: large
disable_se: True
Neck:
name: DBFPN
out_channels: 96
Head:
name: DBHead
k: 50
Loss:
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Cosine
learning_rate: 0.001
warmup_epoch: 2
regularizer:
name: 'L2'
factor: 0
PostProcess:
name: DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Metric:
name: DetMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [1.0]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- IaaAugment:
augmenter_args:
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
- { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
- { 'type': Resize, 'args': { 'size': [0.5, 3] } }
- EastRandomCropData:
size: [960, 960]
max_tries: 50
keep_ratio: true
- MakeBorderMap:
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- MakeShrinkMap:
shrink_ratio: 0.4
min_text_size: 8
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- DetResizeForTest:
# image_shape: [736, 1280]
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 2
......@@ -128,4 +128,4 @@ Eval:
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 8
use_shared_memory: False
\ No newline at end of file
use_shared_memory: False
Global:
use_gpu: true
epoch_num: 600
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/det_mv3_pse/
save_epoch_step: 600
# evaluation is run every 63 iterations
eval_batch_step: [ 0,63 ]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/MobileNetV3_large_x0_5_pretrained
checkpoints: #./output/det_r50_vd_pse_batch8_ColorJitter/best_accuracy
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_pse/predicts_pse.txt
Architecture:
model_type: det
algorithm: PSE
Transform: null
Backbone:
name: MobileNetV3
scale: 0.5
model_name: large
Neck:
name: FPN
out_channels: 96
Head:
name: PSEHead
hidden_dim: 96
out_channels: 7
Loss:
name: PSELoss
alpha: 0.7
ohem_ratio: 3
kernel_sample_mask: pred
reduction: none
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Step
learning_rate: 0.001
step_size: 200
gamma: 0.1
regularizer:
name: 'L2'
factor: 0.0005
PostProcess:
name: PSEPostProcess
thresh: 0
box_thresh: 0.85
min_area: 16
box_type: box # 'box' or 'poly'
scale: 1
Metric:
name: DetMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- ColorJitter:
brightness: 0.12549019607843137
saturation: 0.5
- IaaAugment:
augmenter_args:
- { 'type': Resize, 'args': { 'size': [ 0.5, 3 ] } }
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
- { 'type': Affine, 'args': { 'rotate': [ -10, 10 ] } }
- MakePseGt:
kernel_num: 7
min_shrink_ratio: 0.4
size: 640
- RandomCropImgMask:
size: [ 640,640 ]
main_key: gt_text
crop_keys: [ 'image', 'gt_text', 'gt_kernels', 'mask' ]
- NormalizeImage:
scale: 1./255.
mean: [ 0.485, 0.456, 0.406 ]
std: [ 0.229, 0.224, 0.225 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: [ 'image', 'gt_text', 'gt_kernels', 'mask' ] # the order of the dataloader list
loader:
shuffle: True
drop_last: False
batch_size_per_card: 16
num_workers: 8
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- DetResizeForTest:
limit_side_len: 736
limit_type: min
- NormalizeImage:
scale: 1./255.
mean: [ 0.485, 0.456, 0.406 ]
std: [ 0.229, 0.224, 0.225 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: [ 'image', 'shape', 'polys', 'ignore_tags' ]
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 8
\ No newline at end of file
......@@ -98,7 +98,7 @@ Train:
shuffle: True
drop_last: False
batch_size_per_card: 16
num_workers: 8
num_workers: 4
Eval:
dataset:
......@@ -125,4 +125,4 @@ Eval:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 8
\ No newline at end of file
num_workers: 8
......@@ -8,7 +8,7 @@ Global:
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [4000, 5000]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/ResNet50_vd_pretrained/
pretrained_model: ./pretrain_models/ResNet50_vd_pretrained
checkpoints:
save_inference_dir:
use_visualdl: False
......
Global:
use_gpu: true
epoch_num: 600
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/det_r50_vd_pse/
save_epoch_step: 600
# evaluation is run every 125 iterations
eval_batch_step: [ 0,125 ]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/ResNet50_vd_ssld_pretrained
checkpoints: #./output/det_r50_vd_pse_batch8_ColorJitter/best_accuracy
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_pse/predicts_pse.txt
Architecture:
model_type: det
algorithm: PSE
Transform:
Backbone:
name: ResNet
layers: 50
Neck:
name: FPN
out_channels: 256
Head:
name: PSEHead
hidden_dim: 256
out_channels: 7
Loss:
name: PSELoss
alpha: 0.7
ohem_ratio: 3
kernel_sample_mask: pred
reduction: none
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Step
learning_rate: 0.0001
step_size: 200
gamma: 0.1
regularizer:
name: 'L2'
factor: 0.0005
PostProcess:
name: PSEPostProcess
thresh: 0
box_thresh: 0.85
min_area: 16
box_type: box # 'box' or 'poly'
scale: 1
Metric:
name: DetMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- ColorJitter:
brightness: 0.12549019607843137
saturation: 0.5
- IaaAugment:
augmenter_args:
- { 'type': Resize, 'args': { 'size': [ 0.5, 3 ] } }
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
- { 'type': Affine, 'args': { 'rotate': [ -10, 10 ] } }
- MakePseGt:
kernel_num: 7
min_shrink_ratio: 0.4
size: 640
- RandomCropImgMask:
size: [ 640,640 ]
main_key: gt_text
crop_keys: [ 'image', 'gt_text', 'gt_kernels', 'mask' ]
- NormalizeImage:
scale: 1./255.
mean: [ 0.485, 0.456, 0.406 ]
std: [ 0.229, 0.224, 0.225 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: [ 'image', 'gt_text', 'gt_kernels', 'mask' ] # the order of the dataloader list
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 8
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- DetResizeForTest:
limit_side_len: 736
limit_type: min
- NormalizeImage:
scale: 1./255.
mean: [ 0.485, 0.456, 0.406 ]
std: [ 0.229, 0.224, 0.225 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: [ 'image', 'shape', 'polys', 'ignore_tags' ]
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 8
\ No newline at end of file
Global:
use_gpu: true
epoch_num: 1200
log_smooth_window: 20
print_batch_step: 2
save_model_dir: ./output/ch_db_res18/
save_epoch_step: 1200
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [3000, 2000]
cal_metric_during_train: False
pretrained_model: ./pretrain_models/ResNet18_vd_pretrained
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_en/img_10.jpg
save_res_path: ./output/det_db/predicts_db.txt
Architecture:
model_type: det
algorithm: DB
Transform:
Backbone:
name: ResNet
layers: 18
disable_se: True
Neck:
name: DBFPN
out_channels: 256
Head:
name: DBHead
k: 50
Loss:
name: DBLoss
balance_loss: true
main_loss_type: DiceLoss
alpha: 5
beta: 10
ohem_ratio: 3
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Cosine
learning_rate: 0.001
warmup_epoch: 2
regularizer:
name: 'L2'
factor: 0
PostProcess:
name: DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
Metric:
name: DetMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/train_icdar2015_label.txt
ratio_list: [1.0]
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- IaaAugment:
augmenter_args:
- { 'type': Fliplr, 'args': { 'p': 0.5 } }
- { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
- { 'type': Resize, 'args': { 'size': [0.5, 3] } }
- EastRandomCropData:
size: [960, 960]
max_tries: 50
keep_ratio: true
- MakeBorderMap:
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- MakeShrinkMap:
shrink_ratio: 0.4
min_text_size: 8
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/icdar2015/text_localization/
label_file_list:
- ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- DetLabelEncode: # Class handling label
- DetResizeForTest:
# image_shape: [736, 1280]
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 2
......@@ -94,7 +94,7 @@ Eval:
label_file_list: [./train_data/total_text/test/test.txt]
transforms:
- DecodeImage: # load image
img_mode: RGB
img_mode: BGR
channel_first: False
- E2ELabelEncodeTest:
- E2EResizeForTest:
......@@ -111,4 +111,4 @@ Eval:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 2
\ No newline at end of file
num_workers: 2
Global:
debug: false
use_gpu: true
epoch_num: 800
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_mobile_pp-OCRv2
save_epoch_step: 3
eval_batch_step: [0, 2000]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
max_text_length: 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_mobile_pp-OCRv2.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Piecewise
decay_epochs : [700, 800]
values : [0.001, 0.0001]
warmup_epoch: 5
regularizer:
name: L2
factor: 2.0e-05
Architecture:
model_type: rec
algorithm: CRNN
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
Neck:
name: SequenceEncoder
encoder_type: rnn
hidden_size: 64
Head:
name: CTCHead
mid_channels: 96
fc_decay: 0.00002
Loss:
name: CTCLoss
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/
label_file_list:
- ./train_data/train_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecAug:
- CTCLabelEncode:
- RecResizeImg:
image_shape: [3, 32, 320]
- KeepKeys:
keep_keys:
- image
- label
- length
loader:
shuffle: true
batch_size_per_card: 128
drop_last: true
num_workers: 8
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data
label_file_list:
- ./train_data/val_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- CTCLabelEncode:
- RecResizeImg:
image_shape: [3, 32, 320]
- KeepKeys:
keep_keys:
- image
- label
- length
loader:
shuffle: false
drop_last: false
batch_size_per_card: 128
num_workers: 8
......@@ -4,7 +4,7 @@ Global:
epoch_num: 800
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_chinese_lite_distillation_v2.1
save_model_dir: ./output/rec_pp-OCRv2_distillation
save_epoch_step: 3
eval_batch_step: [0, 2000]
cal_metric_during_train: true
......@@ -14,12 +14,11 @@ Global:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
character_type: ch
max_text_length: 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_chinese_lite_distillation_v2.1.txt
save_res_path: ./output/rec/predicts_pp-OCRv2_distillation.txt
Optimizer:
......@@ -88,6 +87,7 @@ Loss:
- DistillationDMLLoss:
weight: 1.0
act: "softmax"
use_log: true
model_name_pairs:
- ["Student", "Teacher"]
key: head_out
......
Global:
debug: false
use_gpu: true
epoch_num: 800
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_mobile_pp-OCRv2_enhanced_ctc_loss
save_epoch_step: 3
eval_batch_step: [0, 2000]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
max_text_length: 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_mobile_pp-OCRv2_enhanced_ctc_loss.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Piecewise
decay_epochs : [700, 800]
values : [0.001, 0.0001]
warmup_epoch: 5
regularizer:
name: L2
factor: 2.0e-05
Architecture:
model_type: rec
algorithm: CRNN
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
Neck:
name: SequenceEncoder
encoder_type: rnn
hidden_size: 64
Head:
name: CTCHead
mid_channels: 96
fc_decay: 0.00002
return_feats: true
Loss:
name: CombinedLoss
loss_config_list:
- CTCLoss:
use_focal_loss: false
weight: 1.0
- CenterLoss:
weight: 0.05
num_classes: 6625
feat_dim: 96
init_center: false
center_file_path: "./train_center.pkl"
# you can also try to add ace loss on your own dataset
# - ACELoss:
# weight: 0.1
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/
label_file_list:
- ./train_data/train_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecAug:
- CTCLabelEncode:
- RecResizeImg:
image_shape: [3, 32, 320]
- KeepKeys:
keep_keys:
- image
- label
- length
- label_ace
loader:
shuffle: true
batch_size_per_card: 128
drop_last: true
num_workers: 8
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data
label_file_list:
- ./train_data/val_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- CTCLabelEncode:
- RecResizeImg:
image_shape: [3, 32, 320]
- KeepKeys:
keep_keys:
- image
- label
- length
loader:
shuffle: false
drop_last: false
batch_size_per_card: 128
num_workers: 8
......@@ -15,7 +15,6 @@ Global:
infer_img: doc/imgs_words/ch/word_1.jpg
# for data or label process
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
character_type: ch
max_text_length: 25
infer_mode: False
use_space_char: True
......
......@@ -15,7 +15,6 @@ Global:
infer_img: doc/imgs_words/ch/word_1.jpg
# for data or label process
character_dict_path: ppocr/utils/ppocr_keys_v1.txt
character_type: ch
max_text_length: 25
infer_mode: False
use_space_char: True
......
......@@ -15,7 +15,6 @@ Global:
use_visualdl: false
infer_img: null
character_dict_path: ppocr/utils/dict/arabic_dict.txt
character_type: arabic
max_text_length: 25
infer_mode: false
use_space_char: true
......
......@@ -15,7 +15,6 @@ Global:
use_visualdl: false
infer_img: null
character_dict_path: ppocr/utils/dict/cyrillic_dict.txt
character_type: cyrillic
max_text_length: 25
infer_mode: false
use_space_char: true
......
......@@ -15,7 +15,6 @@ Global:
use_visualdl: false
infer_img: null
character_dict_path: ppocr/utils/dict/devanagari_dict.txt
character_type: devanagari
max_text_length: 25
infer_mode: false
use_space_char: true
......
......@@ -16,7 +16,6 @@ Global:
infer_img:
# for data or label process
character_dict_path: ppocr/utils/en_dict.txt
character_type: EN
max_text_length: 25
infer_mode: False
use_space_char: True
......
......@@ -16,7 +16,6 @@ Global:
infer_img:
# for data or label process
character_dict_path: ppocr/utils/dict/french_dict.txt
character_type: french
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -16,7 +16,6 @@ Global:
infer_img:
# for data or label process
character_dict_path: ppocr/utils/dict/german_dict.txt
character_type: german
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -16,7 +16,6 @@ Global:
infer_img:
# for data or label process
character_dict_path: ppocr/utils/dict/japan_dict.txt
character_type: japan
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -16,7 +16,6 @@ Global:
infer_img:
# for data or label process
character_dict_path: ppocr/utils/dict/korean_dict.txt
character_type: korean
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -15,7 +15,6 @@ Global:
use_visualdl: false
infer_img: null
character_dict_path: ppocr/utils/dict/latin_dict.txt
character_type: latin
max_text_length: 25
infer_mode: false
use_space_char: true
......
......@@ -14,8 +14,7 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path: ppocr/utils/ic15_dict.txt
character_type: ch
character_dict_path: ppocr/utils/en_dict.txt
max_text_length: 25
infer_mode: False
use_space_char: False
......
Global:
use_gpu: True
epoch_num: 21
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec/nrtr/
save_epoch_step: 1
# evaluation is run every 2000 iterations
eval_batch_step: [0, 2000]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path: ppocr/utils/EN_symbol_dict.txt
max_text_length: 25
infer_mode: False
use_space_char: False
save_res_path: ./output/rec/predicts_nrtr.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.99
clip_norm: 5.0
lr:
name: Cosine
learning_rate: 0.0005
warmup_epoch: 2
regularizer:
name: 'L2'
factor: 0.
Architecture:
model_type: rec
algorithm: NRTR
in_channels: 1
Transform:
Backbone:
name: MTB
cnn_num: 2
Head:
name: Transformer
d_model: 512
num_encoder_layers: 6
beam_size: -1 # When Beam size is greater than 0, it means to use beam search when evaluation.
Loss:
name: NRTRLoss
smoothing: True
PostProcess:
name: NRTRLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: LMDBDataSet
data_dir: ./train_data/data_lmdb_release/training/
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- NRTRLabelEncode: # Class handling label
- NRTRRecResizeImg:
image_shape: [100, 32]
resize_type: PIL # PIL or OpenCV
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: True
batch_size_per_card: 512
drop_last: True
num_workers: 8
Eval:
dataset:
name: LMDBDataSet
data_dir: ./train_data/data_lmdb_release/evaluation/
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- NRTRLabelEncode: # Class handling label
- NRTRRecResizeImg:
image_shape: [100, 32]
resize_type: PIL # PIL or OpenCV
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: False
drop_last: False
batch_size_per_card: 256
num_workers: 1
use_shared_memory: False
......@@ -14,8 +14,7 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path:
character_type: en
character_dict_path:
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -15,7 +15,6 @@ Global:
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path:
character_type: en
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -14,8 +14,7 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words/ch/word_1.jpg
# for data or label process
character_dict_path:
character_type: en
character_dict_path:
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -15,7 +15,6 @@ Global:
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path:
character_type: en
max_text_length: 25
infer_mode: False
use_space_char: False
......
Global:
use_gpu: true
epoch_num: 5
log_smooth_window: 20
print_batch_step: 20
save_model_dir: ./sar_rec
save_epoch_step: 1
# evaluation is run every 2000 iterations
eval_batch_step: [0, 2000]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img:
# for data or label process
character_dict_path: ppocr/utils/dict90.txt
max_text_length: 30
infer_mode: False
use_space_char: False
rm_symbol: True
save_res_path: ./output/rec/predicts_sar.txt
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Piecewise
decay_epochs: [3, 4]
values: [0.001, 0.0001, 0.00001]
regularizer:
name: 'L2'
factor: 0
Architecture:
model_type: rec
algorithm: SAR
Transform:
Backbone:
name: ResNet31
Head:
name: SARHead
Loss:
name: SARLoss
PostProcess:
name: SARLabelDecode
Metric:
name: RecMetric
Train:
dataset:
name: SimpleDataSet
label_file_list: ['./train_data/train_list.txt']
data_dir: ./train_data/
ratio_list: 1.0
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- SARLabelEncode: # Class handling label
- SARRecResizeImg:
image_shape: [3, 48, 48, 160] # h:48 w:[48,160]
width_downsample_ratio: 0.25
- KeepKeys:
keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
loader:
shuffle: True
batch_size_per_card: 64
drop_last: True
num_workers: 8
use_shared_memory: False
Eval:
dataset:
name: LMDBDataSet
data_dir: ./train_data/data_lmdb_release/evaluation/
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- SARLabelEncode: # Class handling label
- SARRecResizeImg:
image_shape: [3, 48, 48, 160]
width_downsample_ratio: 0.25
- KeepKeys:
keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
loader:
shuffle: False
drop_last: False
batch_size_per_card: 64
num_workers: 4
use_shared_memory: False
......@@ -14,8 +14,7 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path:
character_type: en
character_dict_path:
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -15,7 +15,6 @@ Global:
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path:
character_type: en
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -14,8 +14,7 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words/ch/word_1.jpg
# for data or label process
character_dict_path:
character_type: en
character_dict_path:
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -14,8 +14,7 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path:
character_type: en
character_dict_path:
max_text_length: 25
infer_mode: False
use_space_char: False
......
......@@ -14,8 +14,7 @@ Global:
use_visualdl: False
infer_img: doc/imgs_words/ch/word_1.jpg
# for data or label process
character_dict_path:
character_type: en
character_dict_path:
max_text_length: 25
num_heads: 8
infer_mode: False
......
Global:
use_gpu: True
epoch_num: 400
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec/seed
save_epoch_step: 3
# evaluation is run every 5000 iterations after the 4000th iteration
eval_batch_step: [0, 2000]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
# for data or label process
character_dict_path: ppocr/utils/EN_symbol_dict.txt
max_text_length: 100
infer_mode: False
use_space_char: False
save_res_path: ./output/rec/predicts_seed.txt
Optimizer:
name: Adadelta
weight_deacy: 0.0
momentum: 0.9
lr:
name: Piecewise
decay_epochs: [4,5,8]
values: [1.0, 0.1, 0.01]
regularizer:
name: 'L2'
factor: 2.0e-05
Architecture:
model_type: rec
algorithm: SEED
Transform:
name: STN_ON
tps_inputsize: [32, 64]
tps_outputsize: [32, 100]
num_control_points: 20
tps_margins: [0.05,0.05]
stn_activation: none
Backbone:
name: ResNet_ASTER
Head:
name: AsterHead # AttentionHead
sDim: 512
attDim: 512
max_len_labels: 100
Loss:
name: AsterLoss
PostProcess:
name: SEEDLabelDecode
Metric:
name: RecMetric
main_indicator: acc
is_filter: True
Train:
dataset:
name: LMDBDataSet
data_dir: ./train_data/data_lmdb_release/training/
transforms:
- Fasttext:
path: "./cc.en.300.bin"
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- SEEDLabelEncode: # Class handling label
- RecResizeImg:
character_type: en
image_shape: [3, 64, 256]
padding: False
- KeepKeys:
keep_keys: ['image', 'label', 'length', 'fast_label'] # dataloader will return list in this order
loader:
shuffle: True
batch_size_per_card: 256
drop_last: True
num_workers: 6
Eval:
dataset:
name: LMDBDataSet
data_dir: ./train_data/data_lmdb_release/evaluation/
transforms:
- DecodeImage: # load image
img_mode: BGR
channel_first: False
- SEEDLabelEncode: # Class handling label
- RecResizeImg:
character_type: en
image_shape: [3, 64, 256]
padding: False
- KeepKeys:
keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
loader:
shuffle: False
drop_last: True
batch_size_per_card: 256
num_workers: 4
project(ppocr CXX C)
cmake_minimum_required(VERSION 3.14)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
......@@ -206,9 +207,12 @@ endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
include(FetchContent)
include(external-cmake/auto-log.cmake)
include_directories(${FETCHCONTENT_BASE_DIR}/extern_autolog-src)
AUX_SOURCE_DIRECTORY(./src SRCS)
add_executable(${DEMO_NAME} ${SRCS})
target_link_libraries(${DEMO_NAME} ${DEPS})
if (WIN32 AND WITH_MKL)
......
因为 它太大了无法显示 image diff 。你可以改为 查看blob
......@@ -5,20 +5,20 @@ PaddleOCR在Windows 平台下基于`Visual Studio 2019 Community` 进行了测
## 前置条件
* Visual Studio 2019
* CUDA 9.0 / CUDA 10.0,cudnn 7+ (仅在使用GPU版本的预测库时需要)
* CUDA 10.2,cudnn 7+ (仅在使用GPU版本的预测库时需要)
* CMake 3.0+
请确保系统已经安装好上述基本软件,我们使用的是`VS2019`的社区版。
**下面所有示例以工作目录为 `D:\projects`演示**
### Step1: 下载PaddlePaddle C++ 预测库 fluid_inference
### Step1: 下载PaddlePaddle C++ 预测库 paddle_inference
PaddlePaddle C++ 预测库针对不同的`CPU``CUDA`版本提供了不同的预编译版本,请根据实际情况下载: [C++预测库下载列表](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html#windows)
解压后`D:\projects\fluid_inference`目录包含内容为:
解压后`D:\projects\paddle_inference`目录包含内容为:
```
fluid_inference
paddle_inference
├── paddle # paddle核心库和头文件
|
├── third_party # 第三方依赖库和头文件
......@@ -46,13 +46,13 @@ fluid_inference
![step2.2](https://paddleseg.bj.bcebos.com/inference/vs2019_step3.png)
3. 点击:`项目`->`cpp_inference_demo的CMake设置`
3. 点击:`项目`->`CMake设置`
![step3](https://paddleseg.bj.bcebos.com/inference/vs2019_step4.png)
4. 点击`浏览`分别设置编译选项指定`CUDA``CUDNN_LIB``OpenCV``Paddle预测库`的路径
4. 分别设置编译选项指定`CUDA``CUDNN_LIB``OpenCV``Paddle预测库`的路径
三个编译参数的含义说明如下(带`*`表示仅在使用**GPU版本**预测库时指定, 其中CUDA库版本尽量对齐**使用9.0、10.0版本,不使用9.2、10.1等版本CUDA库**):
三个编译参数的含义说明如下(带`*`表示仅在使用**GPU版本**预测库时指定, 其中CUDA库版本尽量对齐):
| 参数名 | 含义 |
| ---- | ---- |
......@@ -67,6 +67,11 @@ fluid_inference
![step4](https://paddleseg.bj.bcebos.com/inference/vs2019_step5.png)
下面给出with GPU的配置示例:
![step5](./vs2019_build_withgpu_config.png)
**注意:**
CMAKE_BACKWARDS的版本要根据平台安装cmake的版本进行设置。
**设置完成后**, 点击上图中`保存并生成CMake缓存以加载变量`
5. 点击`生成`->`全部生成`
......@@ -74,24 +79,34 @@ fluid_inference
![step6](https://paddleseg.bj.bcebos.com/inference/vs2019_step6.png)
### Step4: 预测及可视化
### Step4: 预测
上述`Visual Studio 2019`编译产出的可执行文件在`out\build\x64-Release`目录下,打开`cmd`,并切换到该目录
上述`Visual Studio 2019`编译产出的可执行文件在`out\build\x64-Release\Release`目录下,打开`cmd`,并切换到`D:\projects\PaddleOCR\deploy\cpp_infer\`
```
cd D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release
cd D:\projects\PaddleOCR\deploy\cpp_infer
```
可执行文件`ocr_system.exe`即为样例的预测程序,其主要使用方法如下
可执行文件`ppocr.exe`即为样例的预测程序,其主要使用方法如下,更多使用方法可以参考[说明文档](../readme.md)`运行demo`部分。
```shell
#预测图片 `D:\projects\PaddleOCR\doc\imgs\10.jpg`
.\ocr_system.exe D:\projects\PaddleOCR\deploy\cpp_infer\tools\config.txt D:\projects\PaddleOCR\doc\imgs\10.jpg
#识别中文图片 `D:\projects\PaddleOCR\doc\imgs_words\ch\`
.\out\build\x64-Release\Release\ppocr.exe rec --rec_model_dir=D:\projects\PaddleOCR\ch_ppocr_mobile_v2.0_rec_infer --image_dir=D:\projects\PaddleOCR\doc\imgs_words\ch\
#识别英文图片 'D:\projects\PaddleOCR\doc\imgs_words\en\'
.\out\build\x64-Release\Release\ppocr.exe rec --rec_model_dir=D:\projects\PaddleOCR\inference\rec_mv3crnn --image_dir=D:\projects\PaddleOCR\doc\imgs_words\en\ --char_list_file=D:\projects\PaddleOCR\ppocr\utils\dict\en_dict.txt
```
第一个参数为配置文件路径,第二个参数为需要预测的图片路径。
第一个参数为配置文件路径,第二个参数为需要预测的图片路径,第三个参数为配置文本识别的字典。
### 注意
### FQA
* 在Windows下的终端中执行文件exe时,可能会发生乱码的现象,此时需要在终端中输入`CHCP 65001`,将终端的编码方式由GBK编码(默认)改为UTF-8编码,更加具体的解释可以参考这篇博客:[https://blog.csdn.net/qq_35038153/article/details/78430359](https://blog.csdn.net/qq_35038153/article/details/78430359)。
* 编译时,如果报错`错误:C1083 无法打开包括文件:"dirent.h":No such file or directory`,可以参考该[文档](https://blog.csdn.net/Dora_blank/article/details/117740837#41_C1083_direnthNo_such_file_or_directory_54),新建`dirent.h`文件,并添加到`VC++`的包含目录中。
* 编译时,如果报错`错误:C1083 无法打开包括文件:"dirent.h":No such file or directory`,可以参考该[文档](https://blog.csdn.net/Dora_blank/article/details/117740837#41_C1083_direnthNo_such_file_or_directory_54),新建`dirent.h`文件,并添加到`utility.cpp`的头文件引用中。同时修改`utility.cpp`70行:`lstat`改成`stat`。
* 编译时,如果报错`Autolog未定义`,新建`autolog.h`文件,内容为:[autolog.h](https://github.com/LDOUBLEV/AutoLog/blob/main/auto_log/autolog.h),并添加到`main.cpp`的头文件引用中,再次编译。
* 运行时,如果弹窗报错找不到`paddle_inference.dll`或者`openblas.dll`,在`D:\projects\paddle_inference`预测库内找到这两个文件,复制到`D:\projects\PaddleOCR\deploy\cpp_infer\out\build\x64-Release\Release`目录下。不用重新编译,再次运行即可。
* 运行时,弹窗报错提示`应用程序无法正常启动(0xc0000142)`,并且`cmd`窗口内提示`You are using Paddle compiled with TensorRT, but TensorRT dynamic library is not found.`,把tensort目录下的lib里面的所有dll文件复制到release目录下,再次运行即可。
find_package(Git REQUIRED)
include(FetchContent)
set(FETCHCONTENT_BASE_DIR "${CMAKE_CURRENT_BINARY_DIR}/third-party")
FetchContent_Declare(
extern_Autolog
PREFIX autolog
GIT_REPOSITORY https://github.com/LDOUBLEV/AutoLog.git
GIT_TAG main
)
FetchContent_MakeAvailable(extern_Autolog)
......@@ -4,15 +4,32 @@
C++在性能计算上优于python,因此,在大多数CPU、GPU部署场景,多采用C++的部署方式,本节将介绍如何在Linux\Windows (CPU\GPU)环境下配置C++环境并完成
PaddleOCR模型部署。
* [1. 准备环境](#1)
+ [1.0 运行准备](#10)
+ [1.1 编译opencv库](#11)
+ [1.2 下载或者编译Paddle预测库](#12)
- [1.2.1 直接下载安装](#121)
- [1.2.2 预测库源码编译](#122)
* [2 开始运行](#2)
+ [2.1 将模型导出为inference model](#21)
+ [2.2 编译PaddleOCR C++预测demo](#22)
+ [2.3运行demo](#23)
<a name="1"></a>
## 1. 准备环境
### 运行准备
<a name="10"></a>
### 1.0 运行准备
- Linux环境,推荐使用docker。
- Windows环境,目前支持基于`Visual Studio 2019 Community`进行编译。
* 该文档主要介绍基于Linux环境的PaddleOCR C++预测流程,如果需要在Windows下基于预测库进行C++预测,具体编译方法请参考[Windows下编译教程](./docs/windows_vs2019_build.md)
<a name="11"></a>
### 1.1 编译opencv库
* 首先需要从opencv官网上下载在Linux环境下源码编译的包,以opencv3.4.7为例,下载命令如下。
......@@ -71,6 +88,8 @@ opencv3/
|-- share
```
<a name="12"></a>
### 1.2 下载或者编译Paddle预测库
* 有2种方式获取Paddle预测库,下面进行详细介绍。
......@@ -132,9 +151,12 @@ build/paddle_inference_install_dir/
其中`paddle`就是C++预测所需的Paddle库,`version.txt`中包含当前预测库的版本信息。
<a name="2"></a>
## 2 开始运行
<a name="21"></a>
### 2.1 将模型导出为inference model
* 可以参考[模型预测章节](../../doc/doc_ch/inference.md),导出inference model,用于模型预测。模型导出之后,假设放在`inference`目录下,则目录结构如下。
......@@ -149,6 +171,7 @@ inference/
| |--inference.pdmodel
```
<a name="22"></a>
### 2.2 编译PaddleOCR C++预测demo
......@@ -172,13 +195,14 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
* 编译完成之后,会在`build`文件夹下生成一个名为`ppocr`的可执行文件。
<a name="23"></a>
### 运行demo
### 2.3 运行demo
运行方式:
```shell
./build/ppocr <mode> [--param1] [--param2] [...]
```
```
其中,`mode`为必选参数,表示选择的功能,取值范围['det', 'rec', 'system'],分别表示调用检测、识别、检测识别串联(包括方向分类器)。具体命令如下:
##### 1. 只调用检测:
......@@ -258,6 +282,4 @@ CUDNN_LIB_DIR=/your_cudnn_lib_dir
</div>
### 2.3 注意
* 在使用Paddle预测库时,推荐使用2.0.0版本的预测库。
**注意:在使用Paddle预测库时,推荐使用2.0.0版本的预测库。**
# Server-side C++ inference
# Server-side C++ Inference
This chapter introduces the C++ deployment method of the PaddleOCR model, and the corresponding python predictive deployment method refers to [document](../../doc/doc_ch/inference.md).
C++ is better than python in terms of performance calculation. Therefore, in most CPU and GPU deployment scenarios, C++ deployment is mostly used.
......@@ -6,14 +6,14 @@ This section will introduce how to configure the C++ environment and complete it
PaddleOCR model deployment.
## 1. Prepare the environment
## 1. Prepare the Environment
### Environment
- Linux, docker is recommended.
### 1.1 Compile opencv
### 1.1 Compile OpenCV
* First of all, you need to download the source code compiled package in the Linux environment from the opencv official website. Taking opencv3.4.7 as an example, the download command is as follows.
......@@ -73,7 +73,7 @@ opencv3/
|-- share
```
### 1.2 Compile or download or the Paddle inference library
### 1.2 Compile or Download or the Paddle Inference Library
* There are 2 ways to obtain the Paddle inference library, described in detail below.
......@@ -136,7 +136,7 @@ build/paddle_inference_install_dir/
Among them, `paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library.
## 2. Compile and run the demo
## 2. Compile and Run the Demo
### 2.1 Export the inference model
......@@ -183,7 +183,7 @@ or the generated Paddle inference library path (`build/paddle_inference_install_
Execute the built executable file:
```shell
./build/ppocr <mode> [--param1] [--param2] [...]
```
```
Here, `mode` is a required parameter,and the value range is ['det', 'rec', 'system'], representing using detection only, using recognition only and using the end-to-end system respectively. Specifically,
##### 1. run det demo:
......
......@@ -35,15 +35,16 @@
#include <sys/stat.h>
#include <gflags/gflags.h>
#include "auto_log/autolog.h"
DEFINE_bool(use_gpu, false, "Infering with GPU or CPU.");
DEFINE_int32(gpu_id, 0, "Device id of GPU to execute.");
DEFINE_int32(gpu_mem, 4000, "GPU id when infering with GPU.");
DEFINE_int32(cpu_math_library_num_threads, 10, "Num of threads with CPU.");
DEFINE_bool(use_mkldnn, false, "Whether use mkldnn with CPU.");
DEFINE_int32(cpu_threads, 10, "Num of threads with CPU.");
DEFINE_bool(enable_mkldnn, false, "Whether use mkldnn with CPU.");
DEFINE_bool(use_tensorrt, false, "Whether use tensorrt.");
DEFINE_string(precision, "fp32", "Precision be one of fp32/fp16/int8");
DEFINE_bool(benchmark, true, "Whether use benchmark.");
DEFINE_bool(benchmark, false, "Whether use benchmark.");
DEFINE_string(save_log_path, "./log_output/", "Save benchmark log path.");
// detection related
DEFINE_string(image_dir, "", "Dir of input image.");
......@@ -60,6 +61,7 @@ DEFINE_string(cls_model_dir, "", "Path of cls inference model.");
DEFINE_double(cls_thresh, 0.9, "Threshold of cls_thresh.");
// recognition related
DEFINE_string(rec_model_dir, "", "Path of rec inference model.");
DEFINE_int32(rec_batch_num, 1, "rec_batch_num.");
DEFINE_string(char_list_file, "../../ppocr/utils/ppocr_keys_v1.txt", "Path of dictionary.");
......@@ -68,34 +70,6 @@ using namespace cv;
using namespace PaddleOCR;
void PrintBenchmarkLog(std::string model_name,
int batch_size,
std::string input_shape,
std::vector<double> time_info,
int img_num){
LOG(INFO) << "----------------------- Config info -----------------------";
LOG(INFO) << "runtime_device: " << (FLAGS_use_gpu ? "gpu" : "cpu");
LOG(INFO) << "ir_optim: " << "True";
LOG(INFO) << "enable_memory_optim: " << "True";
LOG(INFO) << "enable_tensorrt: " << FLAGS_use_tensorrt;
LOG(INFO) << "enable_mkldnn: " << (FLAGS_use_mkldnn ? "True" : "False");
LOG(INFO) << "cpu_math_library_num_threads: " << FLAGS_cpu_math_library_num_threads;
LOG(INFO) << "----------------------- Data info -----------------------";
LOG(INFO) << "batch_size: " << batch_size;
LOG(INFO) << "input_shape: " << input_shape;
LOG(INFO) << "data_num: " << img_num;
LOG(INFO) << "----------------------- Model info -----------------------";
LOG(INFO) << "model_name: " << model_name;
LOG(INFO) << "precision: " << FLAGS_precision;
LOG(INFO) << "----------------------- Perf info ------------------------";
LOG(INFO) << "Total time spent(ms): "
<< std::accumulate(time_info.begin(), time_info.end(), 0);
LOG(INFO) << "preprocess_time(ms): " << time_info[0] / img_num
<< ", inference_time(ms): " << time_info[1] / img_num
<< ", postprocess_time(ms): " << time_info[2] / img_num;
}
static bool PathExists(const std::string& path){
#ifdef _WIN32
struct _stat buffer;
......@@ -110,14 +84,14 @@ static bool PathExists(const std::string& path){
int main_det(std::vector<cv::String> cv_all_img_names) {
std::vector<double> time_info = {0, 0, 0};
DBDetector det(FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_math_library_num_threads,
FLAGS_use_mkldnn, FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_gpu_mem, FLAGS_cpu_threads,
FLAGS_enable_mkldnn, FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_det_db_box_thresh, FLAGS_det_db_unclip_ratio,
FLAGS_use_polygon_score, FLAGS_visualize,
FLAGS_use_tensorrt, FLAGS_precision);
for (int i = 0; i < cv_all_img_names.size(); ++i) {
LOG(INFO) << "The predict img: " << cv_all_img_names[i];
// LOG(INFO) << "The predict img: " << cv_all_img_names[i];
cv::Mat srcimg = cv::imread(cv_all_img_names[i], cv::IMREAD_COLOR);
if (!srcimg.data) {
......@@ -132,10 +106,30 @@ int main_det(std::vector<cv::String> cv_all_img_names) {
time_info[0] += det_times[0];
time_info[1] += det_times[1];
time_info[2] += det_times[2];
if (FLAGS_benchmark) {
cout << cv_all_img_names[i] << '\t';
for (int n = 0; n < boxes.size(); n++) {
for (int m = 0; m < boxes[n].size(); m++) {
cout << boxes[n][m][0] << ' ' << boxes[n][m][1] << ' ';
}
}
cout << endl;
}
}
if (FLAGS_benchmark) {
PrintBenchmarkLog("det", 1, "dynamic", time_info, cv_all_img_names.size());
AutoLogger autolog("ocr_det",
FLAGS_use_gpu,
FLAGS_use_tensorrt,
FLAGS_enable_mkldnn,
FLAGS_cpu_threads,
1,
"dynamic",
FLAGS_precision,
time_info,
cv_all_img_names.size());
autolog.report();
}
return 0;
}
......@@ -143,9 +137,15 @@ int main_det(std::vector<cv::String> cv_all_img_names) {
int main_rec(std::vector<cv::String> cv_all_img_names) {
std::vector<double> time_info = {0, 0, 0};
std::string char_list_file = FLAGS_char_list_file;
if (FLAGS_benchmark)
char_list_file = FLAGS_char_list_file.substr(6);
cout << "label file: " << char_list_file << endl;
CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_math_library_num_threads,
FLAGS_use_mkldnn, FLAGS_char_list_file,
FLAGS_gpu_mem, FLAGS_cpu_threads,
FLAGS_enable_mkldnn, char_list_file,
FLAGS_use_tensorrt, FLAGS_precision);
for (int i = 0; i < cv_all_img_names.size(); ++i) {
......@@ -164,19 +164,31 @@ int main_rec(std::vector<cv::String> cv_all_img_names) {
time_info[1] += rec_times[1];
time_info[2] += rec_times[2];
}
if (FLAGS_benchmark) {
PrintBenchmarkLog("rec", 1, "dynamic", time_info, cv_all_img_names.size());
AutoLogger autolog("ocr_rec",
FLAGS_use_gpu,
FLAGS_use_tensorrt,
FLAGS_enable_mkldnn,
FLAGS_cpu_threads,
1,
"dynamic",
FLAGS_precision,
time_info,
cv_all_img_names.size());
autolog.report();
}
return 0;
}
int main_system(std::vector<cv::String> cv_all_img_names) {
std::vector<double> time_info_det = {0, 0, 0};
std::vector<double> time_info_rec = {0, 0, 0};
DBDetector det(FLAGS_det_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_math_library_num_threads,
FLAGS_use_mkldnn, FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_gpu_mem, FLAGS_cpu_threads,
FLAGS_enable_mkldnn, FLAGS_max_side_len, FLAGS_det_db_thresh,
FLAGS_det_db_box_thresh, FLAGS_det_db_unclip_ratio,
FLAGS_use_polygon_score, FLAGS_visualize,
FLAGS_use_tensorrt, FLAGS_precision);
......@@ -184,22 +196,25 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
Classifier *cls = nullptr;
if (FLAGS_use_angle_cls) {
cls = new Classifier(FLAGS_cls_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_math_library_num_threads,
FLAGS_use_mkldnn, FLAGS_cls_thresh,
FLAGS_gpu_mem, FLAGS_cpu_threads,
FLAGS_enable_mkldnn, FLAGS_cls_thresh,
FLAGS_use_tensorrt, FLAGS_precision);
}
std::string char_list_file = FLAGS_char_list_file;
if (FLAGS_benchmark)
char_list_file = FLAGS_char_list_file.substr(6);
cout << "label file: " << char_list_file << endl;
CRNNRecognizer rec(FLAGS_rec_model_dir, FLAGS_use_gpu, FLAGS_gpu_id,
FLAGS_gpu_mem, FLAGS_cpu_math_library_num_threads,
FLAGS_use_mkldnn, FLAGS_char_list_file,
FLAGS_gpu_mem, FLAGS_cpu_threads,
FLAGS_enable_mkldnn, char_list_file,
FLAGS_use_tensorrt, FLAGS_precision);
auto start = std::chrono::system_clock::now();
for (int i = 0; i < cv_all_img_names.size(); ++i) {
LOG(INFO) << "The predict img: " << cv_all_img_names[i];
cv::Mat srcimg = cv::imread(FLAGS_image_dir, cv::IMREAD_COLOR);
cv::Mat srcimg = cv::imread(cv_all_img_names[i], cv::IMREAD_COLOR);
if (!srcimg.data) {
std::cerr << "[ERROR] image read failed! image path: " << cv_all_img_names[i] << endl;
exit(1);
......@@ -209,7 +224,10 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
std::vector<double> rec_times;
det.Run(srcimg, boxes, &det_times);
time_info_det[0] += det_times[0];
time_info_det[1] += det_times[1];
time_info_det[2] += det_times[2];
cv::Mat crop_img;
for (int j = 0; j < boxes.size(); j++) {
crop_img = Utility::GetRotateCropImage(srcimg, boxes[j]);
......@@ -218,18 +236,36 @@ int main_system(std::vector<cv::String> cv_all_img_names) {
crop_img = cls->Run(crop_img);
}
rec.Run(crop_img, &rec_times);
time_info_rec[0] += rec_times[0];
time_info_rec[1] += rec_times[1];
time_info_rec[2] += rec_times[2];
}
auto end = std::chrono::system_clock::now();
auto duration =
std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "Cost "
<< double(duration.count()) *
std::chrono::microseconds::period::num /
std::chrono::microseconds::period::den
<< "s" << std::endl;
}
if (FLAGS_benchmark) {
AutoLogger autolog_det("ocr_det",
FLAGS_use_gpu,
FLAGS_use_tensorrt,
FLAGS_enable_mkldnn,
FLAGS_cpu_threads,
1,
"dynamic",
FLAGS_precision,
time_info_det,
cv_all_img_names.size());
AutoLogger autolog_rec("ocr_rec",
FLAGS_use_gpu,
FLAGS_use_tensorrt,
FLAGS_enable_mkldnn,
FLAGS_cpu_threads,
1,
"dynamic",
FLAGS_precision,
time_info_rec,
cv_all_img_names.size());
autolog_det.report();
std::cout << endl;
autolog_rec.report();
}
return 0;
}
......
......@@ -112,12 +112,16 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) {
1 << 20, 10, 3,
precision,
false, false);
std::map<std::string, std::vector<int>> min_input_shape = {
{"x", {1, 3, 32, 10}}};
{"x", {1, 3, 32, 10}},
{"lstm_0.tmp_0", {10, 1, 96}}};
std::map<std::string, std::vector<int>> max_input_shape = {
{"x", {1, 3, 32, 2000}}};
{"x", {1, 3, 32, 2000}},
{"lstm_0.tmp_0", {1000, 1, 96}}};
std::map<std::string, std::vector<int>> opt_input_shape = {
{"x", {1, 3, 32, 320}}};
{"x", {1, 3, 32, 320}},
{"lstm_0.tmp_0", {25, 1, 96}}};
config.SetTRTDynamicShapeInfo(min_input_shape, max_input_shape,
opt_input_shape);
......@@ -139,7 +143,7 @@ void CRNNRecognizer::LoadModel(const std::string &model_dir) {
config.SwitchIrOptim(true);
config.EnableMemoryOptim();
config.DisableGlogInfo();
// config.DisableGlogInfo();
this->predictor_ = CreatePredictor(config);
}
......
......@@ -13,7 +13,7 @@ def read_params():
#params for text detector
cfg.det_algorithm = "DB"
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v2.0_det_infer/"
cfg.det_model_dir = "./inference/ch_PP-OCRv2_det_infer/"
cfg.det_limit_side_len = 960
cfg.det_limit_type = 'max'
......
......@@ -13,7 +13,7 @@ def read_params():
#params for text recognizer
cfg.rec_algorithm = "CRNN"
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v2.0_rec_infer/"
cfg.rec_model_dir = "./inference/ch_PP-OCRv2_rec_infer/"
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
......
......@@ -13,7 +13,7 @@ def read_params():
#params for text detector
cfg.det_algorithm = "DB"
cfg.det_model_dir = "./inference/ch_ppocr_mobile_v2.0_det_infer/"
cfg.det_model_dir = "./inference/ch_PP-OCRv2_det_infer/"
cfg.det_limit_side_len = 960
cfg.det_limit_type = 'max'
......@@ -31,7 +31,7 @@ def read_params():
#params for text recognizer
cfg.rec_algorithm = "CRNN"
cfg.rec_model_dir = "./inference/ch_ppocr_mobile_v2.0_rec_infer/"
cfg.rec_model_dir = "./inference/ch_PP-OCRv2_rec_infer/"
cfg.rec_image_shape = "3, 32, 320"
cfg.rec_char_type = 'ch'
......
......@@ -34,10 +34,10 @@ pip3 install paddlehub==2.1.0 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/sim
```
### 2. 下载推理模型
安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是v2.0版的超轻量模型,默认模型路径为:
安装服务模块前,需要准备推理模型并放到正确路径。默认使用的是PP-OCRv2模型,默认模型路径为:
```
检测模型:./inference/ch_ppocr_mobile_v2.0_det_infer/
识别模型:./inference/ch_ppocr_mobile_v2.0_rec_infer/
检测模型:./inference/ch_PP-OCRv2_det_infer/
识别模型:./inference/ch_PP-OCRv2_rec_infer/
方向分类器:./inference/ch_ppocr_mobile_v2.0_cls_infer/
```
......
......@@ -35,10 +35,10 @@ pip3 install paddlehub==2.1.0 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/sim
```
### 2. Download inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the ultra lightweight model of v2.0 is used, and the default model path is:
Before installing the service module, you need to prepare the inference model and put it in the correct path. By default, the PP-OCRv2 models are used, and the default model path is:
```
detection model: ./inference/ch_ppocr_mobile_v2.0_det_infer/
recognition model: ./inference/ch_ppocr_mobile_v2.0_rec_infer/
detection model: ./inference/ch_PP-OCRv2_det_infer/
recognition model: ./inference/ch_PP-OCRv2_rec_infer/
text direction classifier: ./inference/ch_ppocr_mobile_v2.0_cls_infer/
```
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册