Commit 88964dc9, authored by Jethong

add visual png

Parent 97111112
...@@ -18,11 +18,13 @@ Global: ...@@ -18,11 +18,13 @@ Global:
save_inference_dir: save_inference_dir:
use_visualdl: False use_visualdl: False
infer_img: infer_img:
valid_set: totaltext #two mode: totaltext valid curved words, partvgg valid non-curved words valid_set: totaltext # two mode: totaltext valid curved words, partvgg valid non-curved words
save_res_path: ./output/pgnet_r50_vd_totaltext/predicts_pgnet.txt save_res_path: ./output/pgnet_r50_vd_totaltext/predicts_pgnet.txt
character_dict_path: ppocr/utils/pgnet_dict.txt character_dict_path: ppocr/utils/ic15_dict.txt
character_type: EN character_type: EN
max_text_length: 50 max_text_length: 50 # the max length in seq
max_text_nums: 30 # the max seq nums in a pic
tcl_len: 64
Architecture: Architecture:
model_type: e2e model_type: e2e
...@@ -33,13 +35,15 @@ Architecture: ...@@ -33,13 +35,15 @@ Architecture:
layers: 50 layers: 50
Neck: Neck:
name: PGFPN name: PGFPN
model_name: large
Head: Head:
name: PGHead name: PGHead
model_name: large
Loss: Loss:
name: PGLoss name: PGLoss
tcl_bs: 64
max_text_length: 50 # the same as Global: max_text_length
max_text_nums: 30 # the same as Global:max_text_nums
pad_num: 36 # the length of dict for pad
Optimizer: Optimizer:
name: Adam name: Adam
...@@ -54,10 +58,10 @@ Optimizer: ...@@ -54,10 +58,10 @@ Optimizer:
PostProcess: PostProcess:
name: PGPostProcess name: PGPostProcess
score_thresh: 0.8 score_thresh: 0.5
Metric: Metric:
name: E2EMetric name: E2EMetric
character_dict_path: ppocr/utils/pgnet_dict.txt character_dict_path: ppocr/utils/ic15_dict.txt
main_indicator: f_score_e2e main_indicator: f_score_e2e
Train: Train:
......
...@@ -9,8 +9,10 @@ ...@@ -9,8 +9,10 @@
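After extracting the dataset and downloading the annotation file, PaddleOCR/train_data/part_vgg_synth/train/ contains one folder and one file: After extracting the dataset and downloading the annotation file, PaddleOCR/train_data/part_vgg_synth/train/ contains one folder and one file: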
``` ```
/PaddleOCR/train_data/part_vgg_synth/train/ /PaddleOCR/train_data/part_vgg_synth/train/
└─ image/ training data of the partvgg dataset |- image/ training data of the partvgg dataset
└─ train_annotation_info.txt annotations of the partvgg dataset |- 119_nile_110_31.png
| ...
|- train_annotation_info.txt annotations of the partvgg dataset
``` ```
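The provided annotation file has the following format, with fields separated by "\t": The provided annotation file has the following format, with fields separated by "\t":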
...@@ -18,7 +20,7 @@ ...@@ -18,7 +20,7 @@
" 图像文件名 图像标注信息--四点标注 图像标注信息--识别标注 " 图像文件名 图像标注信息--四点标注 图像标注信息--识别标注
119_nile_110_31 140.2 222.5 266.0 194.6 278.7 251.8 152.9 279.7 Path: 32.9 133.1 106.0 130.8 106.4 143.8 33.3 146.1 were 21.8 81.9 106.9 80.4 107.7 123.2 22.6 124.7 why 119_nile_110_31 140.2 222.5 266.0 194.6 278.7 251.8 152.9 279.7 Path: 32.9 133.1 106.0 130.8 106.4 143.8 33.3 146.1 were 21.8 81.9 106.9 80.4 107.7 123.2 22.6 124.7 why
``` ```
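In the annotation txt file, each line represents one sample. Taking the first line as an example: the first field is the file name under the sibling directory image/, and every following group of 9 fields is one annotation; the first 8 are the (x, y) coordinates of the four corners of the text box, ordered clockwise starting from the top-left corner. In the annotation txt file, each line represents one sample. Taking the first line as an example: the first field is the file name prefix under the sibling directory image/, and every following group of 9 fields is one annotation; the first 8 are the (x, y) coordinates of the four corners of the text box, ordered clockwise starting from the top-left corner.
The last field is the transcription; **when it is "###", the text box is invalid and is skipped during training.** The last field is the transcription; **when it is "###", the text box is invalid and is skipped during training.**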
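Reviewer note: the line format described above can be sketched as a small parser. This is a minimal illustration, not the loader used in the repository; the function name `parse_partvgg_line` is hypothetical, and it assumes every field (file name prefix, each coordinate, each transcription) is separated by a tab, as the text states.

```python
def parse_partvgg_line(line):
    """Split one annotation line into (name, boxes, texts).

    Assumption: all fields are tab-separated. After the file name prefix,
    fields come in groups of 9: 8 coordinates plus one transcription.
    """
    fields = line.rstrip("\n").split("\t")
    name, rest = fields[0], fields[1:]
    if len(rest) % 9 != 0:
        raise ValueError("expected groups of 9 fields after the file name")

    boxes, texts = [], []
    for start in range(0, len(rest), 9):
        coords = [float(v) for v in rest[start:start + 8]]
        text = rest[start + 8]
        if text == "###":
            # Invalid text box: skipped, mirroring the training behaviour
            # described in the text.
            continue
        # Four (x, y) corners, clockwise from the top-left corner.
        boxes.append([(coords[j], coords[j + 1]) for j in range(0, 8, 2)])
        texts.append(text)
    return name, boxes, texts


# Sample adapted from the documentation; the second transcription is
# replaced by "###" here only to exercise the skip branch.
sample = "\t".join([
    "119_nile_110_31",
    "140.2", "222.5", "266.0", "194.6", "278.7", "251.8", "152.9", "279.7", "Path:",
    "32.9", "133.1", "106.0", "130.8", "106.4", "143.8", "33.3", "146.1", "###",
])
name, boxes, texts = parse_partvgg_line(sample)
```

Because the first field is only a prefix, a caller would still need to append the image extension (for example `.png`, as in the directory listing) before opening the file.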
...@@ -26,8 +28,12 @@ ...@@ -26,8 +28,12 @@
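After extracting the dataset and downloading the annotation files, PaddleOCR/train_data/total_text/train/ contains two folders: After extracting the dataset and downloading the annotation files, PaddleOCR/train_data/total_text/train/ contains two folders: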
``` ```
/PaddleOCR/train_data/total_text/train/ /PaddleOCR/train_data/total_text/train/
└─ rgb/ training data of the total_text dataset |- rgb/ training data of the total_text dataset
└─ poly/ annotations of the total_text dataset |- gt_0.png
| ...
|-poly/ annotations of the total_text dataset
|- gt_0.txt
| ...
``` ```
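The provided annotation file has the following format, with fields separated by "\t": The provided annotation file has the following format, with fields separated by "\t":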
...@@ -36,7 +42,7 @@ ...@@ -36,7 +42,7 @@
1004.0,689.0,1019.0,698.0,1034.0,708.0,1049.0,718.0,1064.0,728.0,1079.0,738.0,1095.0,748.0,1094.0,774.0,1079.0,765.0,1065.0,756.0,1050.0,747.0,1036.0,738.0,1021.0,729.0,1007.0,721.0 EST 1004.0,689.0,1019.0,698.0,1034.0,708.0,1049.0,718.0,1064.0,728.0,1079.0,738.0,1095.0,748.0,1094.0,774.0,1079.0,765.0,1065.0,756.0,1050.0,747.0,1036.0,738.0,1021.0,729.0,1007.0,721.0 EST
1102.0,755.0,1116.0,764.0,1131.0,773.0,1146.0,783.0,1161.0,792.0,1176.0,801.0,1191.0,811.0,1193.0,837.0,1178.0,828.0,1164.0,819.0,1150.0,810.0,1135.0,801.0,1121.0,792.0,1107.0,784.0 1972 1102.0,755.0,1116.0,764.0,1131.0,773.0,1146.0,783.0,1161.0,792.0,1176.0,801.0,1191.0,811.0,1193.0,837.0,1178.0,828.0,1164.0,819.0,1150.0,810.0,1135.0,801.0,1121.0,792.0,1107.0,784.0 1972
``` ```
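Among the annotation files, each txt file represents one sample, and the file name corresponds to a file name under the sibling directory rgb/. Taking the first line as an example: the first 28 values are the (x, y) coordinates of the 14 points of the text polygon, ordered clockwise starting from the top-left point. Among the annotation files, each txt file represents one sample, and the file name is the same as the file name under the sibling directory rgb/. Taking the first line as an example: the first 28 values are the (x, y) coordinates of the 14 points of the text polygon, ordered clockwise starting from the top-left point.
The last field is the transcription; **when it is "###", the text box is invalid and is skipped during training.** The last field is the transcription; **when it is "###", the text box is invalid and is skipped during training.**
To train on another dataset, build the annotation files in the format described above. To train on another dataset, build the annotation files in the format described above.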
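Reviewer note: a minimal sketch of reading one Total-Text style line, for anyone building annotation files for another dataset. The helper name `parse_total_text_line` is hypothetical, and it assumes the 28 comma-separated coordinates are separated from the transcription by a single tab.

```python
def parse_total_text_line(line):
    """Return (points, text, is_valid) for one annotation line.

    Assumption: "x1,y1,...,x14,y14<TAB>transcription".
    """
    coords_part, text = line.rstrip("\n").split("\t", 1)
    values = [float(v) for v in coords_part.split(",")]
    if len(values) != 28:
        raise ValueError("expected 28 coordinates (14 points)")
    # Pair up x and y values: 14 polygon points, clockwise from top-left.
    points = list(zip(values[0::2], values[1::2]))
    # "###" marks a box that training skips.
    return points, text, text != "###"


sample = (
    "1004.0,689.0,1019.0,698.0,1034.0,708.0,1049.0,718.0,1064.0,728.0,"
    "1079.0,738.0,1095.0,748.0,1094.0,774.0,1079.0,765.0,1065.0,756.0,"
    "1050.0,747.0,1036.0,738.0,1021.0,729.0,1007.0,721.0\tEST"
)
points, text, is_valid = parse_total_text_line(sample)
```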
......
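...@@ -29,7 +29,7 @@ inference model (model saved by `paddle.jit.save`) ...@@ -29,7 +29,7 @@ inference model (model saved by `paddle.jit.save`)
- [5. Multilingual model inference](#多语言模型的推理) - [5. Multilingual model inference](#多语言模型的推理)
- [IV. End-to-end model inference](#端到端模型推理) - [IV. End-to-end model inference](#端到端模型推理)
- [1. PGNet end-to-end model inference](#SAST文本检测模型推理) - [1. PGNet end-to-end model inference](#PGNet端到端模型推理)
- [V. Angle classification model inference](#方向识别模型推理) - [V. Angle classification model inference](#方向识别模型推理)
- [1. Angle classification model inference](#方向分类模型推理) - [1. Angle classification model inference](#方向分类模型推理)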
...@@ -366,7 +366,7 @@ Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904) ...@@ -366,7 +366,7 @@ Predicts of ./doc/imgs_words/korean/1.jpg:('바탕으로', 0.9948904)
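## IV. End-to-end model inference ## IV. End-to-end model inference
End-to-end inference uses the PGNet configuration parameters by default. When a model other than PGNet is used, the corresponding parameters must be passed at inference time to adapt the algorithm; see below for details. End-to-end inference uses the PGNet configuration parameters by default. When a model other than PGNet is used, the corresponding parameters must be passed at inference time to adapt the algorithm; see below for details.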
<a name="SAST文本检测模型推理"></a> <a name="PGNet端到端模型推理"></a>
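### 1. PGNet end-to-end model inference ### 1. PGNet end-to-end model inference
#### (1). Quadrilateral text detection model (ICDAR2015) #### (1). Quadrilateral text detection model (ICDAR2015)
First, convert the model saved during PGNet end-to-end training into an inference model. Taking the model with the Resnet50_vd backbone trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)), the conversion can be done with the following command: First, convert the model saved during PGNet end-to-end training into an inference model. Taking the model with the Resnet50_vd backbone trained on the ICDAR2015 English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)), the conversion can be done with the following command: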
...@@ -375,28 +375,26 @@ python3 tools/export_model.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.pretrai ...@@ -375,28 +375,26 @@ python3 tools/export_model.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.pretrai
``` ```
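**PGNet end-to-end inference requires setting the parameter `--e2e_algorithm="PGNet"`**; run the following command: **PGNet end-to-end inference requires setting the parameter `--e2e_algorithm="PGNet"`**; run the following command: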
``` ```
python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img_10.jpg" --e2e_model_dir="./inference/e2e_pgnet_ic15/" python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img_10.jpg" --e2e_model_dir="./inference/e2e/" --e2e_pgnet_polygon=False
``` ```
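The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'e2e_res'. An example result: The visualized text detection results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'e2e_res'. An example result: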
![](../imgs_results/det_res_img_10_sast.jpg) ![](../imgs_results/e2e_res_img_10_pgnet.jpg)
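#### (2). Curved text detection model (Total-Text) #### (2). Curved text detection model (Total-Text)
First, convert the model saved during PGNet end-to-end training into an inference model. Taking the model with the Resnet50_vd backbone trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)), the conversion can be done with the following command: First, convert the model saved during PGNet end-to-end training into an inference model. Taking the model with the Resnet50_vd backbone trained on the Total-Text English dataset as an example ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_totaltext_v2.0_train.tar)), the conversion can be done with the following command: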
``` ```
python3 tools/export_model.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/e2e_pgnet_tt python3 tools/export_model.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/e2e
``` ```
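**PGNet end-to-end inference requires setting the parameter `--e2e_algorithm="PGNet"` and additionally passing `--e2e_pgnet_polygon=True`;** run the following command: **PGNet end-to-end inference requires setting the parameter `--e2e_algorithm="PGNet"` and additionally passing `--e2e_pgnet_polygon=True`;** run the following command: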
``` ```
python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img623.jpg" --e2e_model_dir="./inference/e2e_pgnet_tt/" --e2e_pgnet_polygon=True python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img623.jpg" --e2e_model_dir="./inference/e2e/" --e2e_pgnet_polygon=True
``` ```
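The visualized end-to-end text results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'e2e_res'. An example result: The visualized end-to-end text results are saved to the `./inference_results` folder by default, and the result file names are prefixed with 'e2e_res'. An example result: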
![](../imgs_results/e2e_res_img623_pg.jpg) ![](../imgs_results/e2e_res_img623_pgnet.jpg)
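**Note**: In this code base, the Locality-Aware NMS used in SAST post-processing has a Python and a C++ implementation, and the C++ version is significantly faster. Because of a build-version issue with the C++ NMS, it is only called under Python 3.5; in all other environments the Python NMS is used.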
<a name="方向分类模型推理"></a> <a name="方向分类模型推理"></a>
......
...@@ -197,17 +197,17 @@ class E2ELabelEncode(BaseRecLabelEncode): ...@@ -197,17 +197,17 @@ class E2ELabelEncode(BaseRecLabelEncode):
super(E2ELabelEncode, super(E2ELabelEncode,
self).__init__(max_text_length, character_dict_path, self).__init__(max_text_length, character_dict_path,
character_type, use_space_char) character_type, use_space_char)
self.pad_num = len(self.dict) # the length to pad
def __call__(self, data): def __call__(self, data):
texts = data['strs'] texts = data['strs']
temp_texts = [] temp_texts = []
for text in texts: for text in texts:
text = text.upper() text = text.lower()
text = self.encode(text) text = self.encode(text)
if text is None: if text is None:
return None return None
text = text + [36] * (self.max_text_len - len(text) text = text + [self.pad_num] * (self.max_text_len - len(text))
) # use 36 to pad
temp_texts.append(text) temp_texts.append(text)
data['strs'] = np.array(temp_texts) data['strs'] = np.array(temp_texts)
return data return data
......
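Reviewer note: the change above replaces the hard-coded pad value 36 with `len(self.dict)` and lowercases the text before encoding. A minimal standalone sketch of that idea follows. It does not use `BaseRecLabelEncode`; the helper `encode_and_pad` is hypothetical, the 36-character charset (digits plus lowercase letters) is an assumed stand-in for the dictionary file, and the handling of unknown characters and over-long text is an assumption.

```python
import string


def encode_and_pad(text, char_to_idx, max_text_length):
    """Encode text as dictionary indices and right-pad to a fixed length.

    The pad value is the dictionary size, i.e. the first index that no
    real character uses.
    """
    pad_num = len(char_to_idx)
    # Assumption: characters outside the dictionary are dropped.
    ids = [char_to_idx[ch] for ch in text.lower() if ch in char_to_idx]
    if not ids or len(ids) > max_text_length:
        return None  # assumption: such samples are rejected
    return ids + [pad_num] * (max_text_length - len(ids))


# Assumed charset: "0-9" followed by "a-z", 36 entries in total.
charset = string.digits + string.ascii_lowercase
char_to_idx = {ch: idx for idx, ch in enumerate(charset)}

padded = encode_and_pad("Ab1", char_to_idx, max_text_length=5)
```

Deriving the pad value from the dictionary keeps the encoder consistent when the dictionary file is swapped, which is what the diff does by moving from `pgnet_dict.txt` to `ic15_dict.txt`.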
...@@ -22,16 +22,23 @@ __all__ = ['PGProcessTrain'] ...@@ -22,16 +22,23 @@ __all__ = ['PGProcessTrain']
class PGProcessTrain(object): class PGProcessTrain(object):
def __init__(self, def __init__(self,
character_dict_path, character_dict_path,
max_text_length,
max_text_nums,
tcl_len,
batch_size=14, batch_size=14,
min_crop_size=24, min_crop_size=24,
min_text_size=10, min_text_size=10,
max_text_size=512, max_text_size=512,
**kwargs): **kwargs):
self.tcl_len = tcl_len
self.max_text_length = max_text_length
self.max_text_nums = max_text_nums
self.batch_size = batch_size self.batch_size = batch_size
self.min_crop_size = min_crop_size self.min_crop_size = min_crop_size
self.min_text_size = min_text_size self.min_text_size = min_text_size
self.max_text_size = max_text_size self.max_text_size = max_text_size
self.Lexicon_Table = self.get_dict(character_dict_path) self.Lexicon_Table = self.get_dict(character_dict_path)
self.pad_num = len(self.Lexicon_Table)
self.img_id = 0 self.img_id = 0
def get_dict(self, character_dict_path): def get_dict(self, character_dict_path):
...@@ -290,7 +297,7 @@ class PGProcessTrain(object): ...@@ -290,7 +297,7 @@ class PGProcessTrain(object):
height_list.append(quad_h) height_list.append(quad_h)
norm_width = max(sum(width_list) / n_char, 1.0) norm_width = max(sum(width_list) / n_char, 1.0)
average_height = max(sum(height_list) / len(height_list), 1.0) average_height = max(sum(height_list) / len(height_list), 1.0)
k = 1
for quad in poly_quads: for quad in poly_quads:
direct_vector_full = ( direct_vector_full = (
(quad[1] + quad[2]) - (quad[0] + quad[3])) / 2.0 (quad[1] + quad[2]) - (quad[0] + quad[3])) / 2.0
...@@ -302,6 +309,8 @@ class PGProcessTrain(object): ...@@ -302,6 +309,8 @@ class PGProcessTrain(object):
cv2.fillPoly(direction_map, cv2.fillPoly(direction_map,
quad.round().astype(np.int32)[np.newaxis, :, :], quad.round().astype(np.int32)[np.newaxis, :, :],
direction_label) direction_label)
cv2.imwrite("output/{}.png".format(k), direction_map * 255.0)
k += 1
return direction_map return direction_map
def calculate_average_height(self, poly_quads): def calculate_average_height(self, poly_quads):
...@@ -371,7 +380,6 @@ class PGProcessTrain(object): ...@@ -371,7 +380,6 @@ class PGProcessTrain(object):
continue continue
if tag: if tag:
# continue
cv2.fillPoly(training_mask, cv2.fillPoly(training_mask,
poly.astype(np.int32)[np.newaxis, :, :], 0.15) poly.astype(np.int32)[np.newaxis, :, :], 0.15)
else: else:
...@@ -577,7 +585,7 @@ class PGProcessTrain(object): ...@@ -577,7 +585,7 @@ class PGProcessTrain(object):
Prepare text label by given Lexicon_Table. Prepare text label by given Lexicon_Table.
""" """
if len(Lexicon_Table) == 36: if len(Lexicon_Table) == 36:
return label_str.upper() return label_str.lower()
else: else:
return label_str return label_str
...@@ -846,23 +854,23 @@ class PGProcessTrain(object): ...@@ -846,23 +854,23 @@ class PGProcessTrain(object):
return None return None
pos_list_temp = np.zeros([64, 3]) pos_list_temp = np.zeros([64, 3])
pos_mask_temp = np.zeros([64, 1]) pos_mask_temp = np.zeros([64, 1])
label_list_temp = np.zeros([50, 1]) + 36 label_list_temp = np.zeros([self.max_text_length, 1]) + self.pad_num
for i, label in enumerate(label_list): for i, label in enumerate(label_list):
n = len(label) n = len(label)
if n > 50: if n > self.max_text_length:
label_list[i] = label[:50] label_list[i] = label[:self.max_text_length]
continue continue
while n < 50: while n < self.max_text_length:
label.append([36]) label.append([self.pad_num])
n += 1 n += 1
for i in range(len(label_list)): for i in range(len(label_list)):
label_list[i] = np.array(label_list[i]) label_list[i] = np.array(label_list[i])
if len(pos_list) <= 0 or len(pos_list) > 30: # at most 30 text lines per image if len(pos_list) <= 0 or len(pos_list) > self.max_text_nums:
return None return None
for __ in range(30 - len(pos_list), 0, -1): for __ in range(self.max_text_nums - len(pos_list), 0, -1):
pos_list.append(pos_list_temp) pos_list.append(pos_list_temp)
pos_mask.append(pos_mask_temp) pos_mask.append(pos_mask_temp)
label_list.append(label_list_temp) label_list.append(label_list_temp)
......
...@@ -156,6 +156,7 @@ class PGDataSet(Dataset): ...@@ -156,6 +156,7 @@ class PGDataSet(Dataset):
img = f.read() img = f.read()
data['image'] = img data['image'] = img
outs = transform(data, self.ops) outs = transform(data, self.ops)
except Exception as e: except Exception as e:
self.logger.error( self.logger.error(
"When parsing line {}, error happened with msg: {}".format( "When parsing line {}, error happened with msg: {}".format(
......
...@@ -18,102 +18,26 @@ from __future__ import print_function ...@@ -18,102 +18,26 @@ from __future__ import print_function
from paddle import nn from paddle import nn
import paddle import paddle
import numpy as np
import copy
from .det_basic_loss import DiceLoss from .det_basic_loss import DiceLoss
from ppocr.utils.e2e_utils.extract_batchsize import *
class PGLoss(nn.Layer): class PGLoss(nn.Layer):
def __init__(self, eps=1e-6, **kwargs): def __init__(self,
tcl_bs,
max_text_length,
max_text_nums,
pad_num,
eps=1e-6,
**kwargs):
super(PGLoss, self).__init__() super(PGLoss, self).__init__()
self.tcl_bs = tcl_bs
self.max_text_nums = max_text_nums
self.max_text_length = max_text_length
self.pad_num = pad_num
self.dice_loss = DiceLoss(eps=eps) self.dice_loss = DiceLoss(eps=eps)
def org_tcl_rois(self, batch_size, pos_lists, pos_masks, label_lists):
"""
"""
pos_lists_, pos_masks_, label_lists_ = [], [], []
img_bs = batch_size
tcl_bs = 64
ngpu = int(batch_size / img_bs)
img_ids = np.array(pos_lists, dtype=np.int32)[:, 0, 0].copy()
pos_lists_split, pos_masks_split, label_lists_split = [], [], []
for i in range(ngpu):
pos_lists_split.append([])
pos_masks_split.append([])
label_lists_split.append([])
for i in range(img_ids.shape[0]):
img_id = img_ids[i]
gpu_id = int(img_id / img_bs)
img_id = img_id % img_bs
pos_list = pos_lists[i].copy()
pos_list[:, 0] = img_id
pos_lists_split[gpu_id].append(pos_list)
pos_masks_split[gpu_id].append(pos_masks[i].copy())
label_lists_split[gpu_id].append(copy.deepcopy(label_lists[i]))
# repeat or delete
for i in range(ngpu):
vp_len = len(pos_lists_split[i])
if vp_len <= tcl_bs:
for j in range(0, tcl_bs - vp_len):
pos_list = pos_lists_split[i][j].copy()
pos_lists_split[i].append(pos_list)
pos_mask = pos_masks_split[i][j].copy()
pos_masks_split[i].append(pos_mask)
label_list = copy.deepcopy(label_lists_split[i][j])
label_lists_split[i].append(label_list)
else:
for j in range(0, vp_len - tcl_bs):
c_len = len(pos_lists_split[i])
pop_id = np.random.permutation(c_len)[0]
pos_lists_split[i].pop(pop_id)
pos_masks_split[i].pop(pop_id)
label_lists_split[i].pop(pop_id)
# merge
for i in range(ngpu):
pos_lists_.extend(pos_lists_split[i])
pos_masks_.extend(pos_masks_split[i])
label_lists_.extend(label_lists_split[i])
return pos_lists_, pos_masks_, label_lists_
def pre_process(self, label_list, pos_list, pos_mask):
max_len = 30 # the max texts in a single image
max_str_len = 50 # the max len in a single text
pad_num = 36 # padding num
label_list = label_list.numpy()
batch, _, _, _ = label_list.shape
pos_list = pos_list.numpy()
pos_mask = pos_mask.numpy()
pos_list_t = []
pos_mask_t = []
label_list_t = []
for i in range(batch):
for j in range(max_len):
if pos_mask[i, j].any():
pos_list_t.append(pos_list[i][j])
pos_mask_t.append(pos_mask[i][j])
label_list_t.append(label_list[i][j])
pos_list, pos_mask, label_list = self.org_tcl_rois(
batch, pos_list_t, pos_mask_t, label_list_t)
label = []
tt = [l.tolist() for l in label_list]
for i in range(batch):
k = 0
for j in range(max_str_len):
if tt[i][j][0] != pad_num:
k += 1
else:
break
label.append(k)
label = paddle.to_tensor(label)
label = paddle.cast(label, dtype='int64')
pos_list = paddle.to_tensor(pos_list)
pos_mask = paddle.to_tensor(pos_mask)
label_list = paddle.squeeze(paddle.to_tensor(label_list), axis=2)
label_list = paddle.cast(label_list, dtype='int32')
return pos_list, pos_mask, label_list, label
def border_loss(self, f_border, l_border, l_score, l_mask): def border_loss(self, f_border, l_border, l_score, l_mask):
l_border_split, l_border_norm = paddle.tensor.split( l_border_split, l_border_norm = paddle.tensor.split(
l_border, num_or_sections=[4, 1], axis=1) l_border, num_or_sections=[4, 1], axis=1)
...@@ -183,7 +107,7 @@ class PGLoss(nn.Layer): ...@@ -183,7 +107,7 @@ class PGLoss(nn.Layer):
labels=tcl_label, labels=tcl_label,
input_lengths=input_lengths, input_lengths=input_lengths,
label_lengths=label_t, label_lengths=label_t,
blank=36, blank=self.pad_num,
reduction='none') reduction='none')
cost = cost.mean() cost = cost.mean()
return cost return cost
...@@ -192,12 +116,14 @@ class PGLoss(nn.Layer): ...@@ -192,12 +116,14 @@ class PGLoss(nn.Layer):
images, tcl_maps, tcl_label_maps, border_maps \ images, tcl_maps, tcl_label_maps, border_maps \
, direction_maps, training_masks, label_list, pos_list, pos_mask = labels , direction_maps, training_masks, label_list, pos_list, pos_mask = labels
# for all the batch_size # for all the batch_size
pos_list, pos_mask, label_list, label_t = self.pre_process( pos_list, pos_mask, label_list, label_t = pre_process(
label_list, pos_list, pos_mask) label_list, pos_list, pos_mask, self.max_text_length,
self.max_text_nums, self.pad_num, self.tcl_bs)
f_score, f_boder, f_direction, f_char = predicts f_score, f_border, f_direction, f_char = predicts['f_score'], predicts['f_border'], predicts['f_direction'], \
predicts['f_char']
score_loss = self.dice_loss(f_score, tcl_maps, training_masks) score_loss = self.dice_loss(f_score, tcl_maps, training_masks)
border_loss = self.border_loss(f_boder, border_maps, tcl_maps, border_loss = self.border_loss(f_border, border_maps, tcl_maps,
training_masks) training_masks)
direction_loss = self.direction_loss(f_direction, direction_maps, direction_loss = self.direction_loss(f_direction, direction_maps,
tcl_maps, training_masks) tcl_maps, training_masks)
......
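Reviewer note: after this change the same limits appear twice in the YAML (`Global` and `Loss`), and the config comments say they must match. A minimal sketch of a sanity check is shown below. The function `check_pgnet_config` is hypothetical and not part of the repository; it takes the config as a plain dict and the dictionary size as an integer.

```python
def check_pgnet_config(config, dict_size):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    global_cfg, loss_cfg = config["Global"], config["Loss"]

    # Loss.max_text_length / max_text_nums are documented as
    # "the same as Global".
    for key in ("max_text_length", "max_text_nums"):
        if global_cfg[key] != loss_cfg[key]:
            problems.append(
                "{}: Global={} but Loss={}".format(
                    key, global_cfg[key], loss_cfg[key]))

    # Loss.pad_num is documented as "the length of dict for pad".
    if loss_cfg["pad_num"] != dict_size:
        problems.append(
            "pad_num={} but dictionary has {} entries".format(
                loss_cfg["pad_num"], dict_size))
    return problems


config = {
    "Global": {"max_text_length": 50, "max_text_nums": 30, "tcl_len": 64},
    "Loss": {"tcl_bs": 64, "max_text_length": 50,
             "max_text_nums": 30, "pad_num": 36},
}
ok = check_pgnet_config(config, dict_size=36)

broken = dict(config, Loss=dict(config["Loss"], max_text_nums=20))
issues = check_pgnet_config(broken, dict_size=36)
```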
...@@ -66,9 +66,8 @@ class PGHead(nn.Layer): ...@@ -66,9 +66,8 @@ class PGHead(nn.Layer):
""" """
""" """
def __init__(self, in_channels, model_name, **kwargs): def __init__(self, in_channels, **kwargs):
super(PGHead, self).__init__() super(PGHead, self).__init__()
self.model_name = model_name
self.conv_f_score1 = ConvBNLayer( self.conv_f_score1 = ConvBNLayer(
in_channels=in_channels, in_channels=in_channels,
out_channels=64, out_channels=64,
......
...@@ -23,8 +23,7 @@ __dir__ = os.path.dirname(__file__) ...@@ -23,8 +23,7 @@ __dir__ = os.path.dirname(__file__)
sys.path.append(__dir__) sys.path.append(__dir__)
sys.path.append(os.path.join(__dir__, '..')) sys.path.append(os.path.join(__dir__, '..'))
from ppocr.utils.e2e_utils.extract_textpoint import * from ppocr.utils.e2e_utils.extract_textpoint import get_dict, generate_pivot_list, restore_poly
from ppocr.utils.e2e_utils.visual import *
import paddle import paddle
...@@ -34,16 +33,10 @@ class PGPostProcess(object): ...@@ -34,16 +33,10 @@ class PGPostProcess(object):
""" """
def __init__(self, character_dict_path, valid_set, score_thresh, **kwargs): def __init__(self, character_dict_path, valid_set, score_thresh, **kwargs):
self.Lexicon_Table = get_dict(character_dict_path) self.Lexicon_Table = get_dict(character_dict_path)
self.valid_set = valid_set self.valid_set = valid_set
self.score_thresh = score_thresh self.score_thresh = score_thresh
# c++ la-nms is faster, but only support python 3.5
self.is_python35 = False
if sys.version_info.major == 3 and sys.version_info.minor == 5:
self.is_python35 = True
def __call__(self, outs_dict, shape_list): def __call__(self, outs_dict, shape_list):
p_score = outs_dict['f_score'] p_score = outs_dict['f_score']
p_border = outs_dict['f_border'] p_border = outs_dict['f_border']
...@@ -61,96 +54,15 @@ class PGPostProcess(object): ...@@ -61,96 +54,15 @@ class PGPostProcess(object):
p_char = p_char[0] p_char = p_char[0]
src_h, src_w, ratio_h, ratio_w = shape_list[0] src_h, src_w, ratio_h, ratio_w = shape_list[0]
is_curved = self.valid_set == "totaltext" instance_yxs_list, seq_strs = generate_pivot_list(
instance_yxs_list = generate_pivot_list(
p_score, p_score,
p_char, p_char,
p_direction, p_direction,
score_thresh=self.score_thresh, self.Lexicon_Table,
is_backbone=True, score_thresh=self.score_thresh)
is_curved=is_curved) poly_list, keep_str_list = restore_poly(instance_yxs_list, seq_strs,
p_char = np.expand_dims(p_char, axis=0) p_border, ratio_w, ratio_h,
p_char = paddle.to_tensor(p_char) src_w, src_h, self.valid_set)
char_seq_idx_set = []
for i in range(len(instance_yxs_list)):
gather_info_lod = paddle.to_tensor(instance_yxs_list[i])
f_char_map = paddle.transpose(p_char, [0, 2, 3, 1])
featyre_seq = paddle.gather_nd(f_char_map, gather_info_lod)
featyre_seq = np.expand_dims(featyre_seq.numpy(), axis=0)
t = len(featyre_seq[0])
featyre_seq = paddle.to_tensor(featyre_seq)
l = np.array([[t]]).astype(np.int64)
length = paddle.to_tensor(l)
seq_pred = paddle.fluid.layers.ctc_greedy_decoder(
input=featyre_seq, blank=36, input_length=length)
seq_pred1 = seq_pred[0].numpy().tolist()[0]
seq_len = seq_pred[1].numpy()[0][0]
temp_t = []
for x in seq_pred1[:seq_len]:
temp_t.append(x)
char_seq_idx_set.append(temp_t)
seq_strs = []
for char_idx_set in char_seq_idx_set:
pr_str = ''.join([self.Lexicon_Table[pos] for pos in char_idx_set])
seq_strs.append(pr_str)
poly_list = []
keep_str_list = []
all_point_list = []
all_point_pair_list = []
for yx_center_line, keep_str in zip(instance_yxs_list, seq_strs):
if len(yx_center_line) == 1:
yx_center_line.append(yx_center_line[-1])
offset_expand = 1.0
if self.valid_set == 'totaltext':
offset_expand = 1.2
point_pair_list = []
for batch_id, y, x in yx_center_line:
offset = p_border[:, y, x].reshape(2, 2)
if offset_expand != 1.0:
offset_length = np.linalg.norm(
offset, axis=1, keepdims=True)
expand_length = np.clip(
offset_length * (offset_expand - 1),
a_min=0.5,
a_max=3.0)
offset_detal = offset / offset_length * expand_length
offset = offset + offset_detal
ori_yx = np.array([y, x], dtype=np.float32)
point_pair = (ori_yx + offset)[:, ::-1] * 4.0 / np.array(
[ratio_w, ratio_h]).reshape(-1, 2)
point_pair_list.append(point_pair)
all_point_list.append([
int(round(x * 4.0 / ratio_w)),
int(round(y * 4.0 / ratio_h))
])
all_point_pair_list.append(point_pair.round().astype(np.int32)
.tolist())
detected_poly, pair_length_info = point_pair2poly(point_pair_list)
detected_poly = expand_poly_along_width(
detected_poly, shrink_ratio_of_width=0.2)
detected_poly[:, 0] = np.clip(
detected_poly[:, 0], a_min=0, a_max=src_w)
detected_poly[:, 1] = np.clip(
detected_poly[:, 1], a_min=0, a_max=src_h)
if len(keep_str) < 2:
continue
keep_str_list.append(keep_str)
if self.valid_set == 'partvgg':
middle_point = len(detected_poly) // 2
detected_poly = detected_poly[
[0, middle_point - 1, middle_point, -1], :]
poly_list.append(detected_poly)
elif self.valid_set == 'totaltext':
poly_list.append(detected_poly)
else:
print('--> Not supported format.')
exit(-1)
data = { data = {
'points': poly_list, 'points': poly_list,
'strs': keep_str_list, 'strs': keep_str_list,
......
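Reviewer note: the removed post-processing code called `paddle.fluid.layers.ctc_greedy_decoder` with `blank=36`; in the new version the decoding happens inside `generate_pivot_list`, whose body is not shown in this diff. As a reference for what greedy CTC decoding does in general, here is a NumPy sketch. It is a stand-in written for illustration, not the implementation in `extract_textpoint`, and the function name `ctc_greedy_decode` is hypothetical.

```python
import numpy as np


def ctc_greedy_decode(logits, blank):
    """Greedy CTC decoding for one sequence.

    logits: array of shape (time_steps, num_classes).
    Takes the best class per step, collapses consecutive repeats,
    then drops the blank class.
    """
    best = np.argmax(logits, axis=1)
    decoded = []
    previous = None
    for idx in best:
        if idx != previous and idx != blank:
            decoded.append(int(idx))
        previous = idx
    return decoded


# Toy example with 4 classes, class 3 acting as blank.
steps = [1, 1, 3, 1, 2, 2]
logits = np.eye(4)[steps]
decoded = ctc_greedy_decode(logits, blank=3)
```

With a real dictionary the blank index would be the dictionary size (the `pad_num` introduced in this commit), and each decoded index would be mapped back through `Lexicon_Table`.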
import paddle
import numpy as np
import copy
def org_tcl_rois(batch_size, pos_lists, pos_masks, label_lists, tcl_bs):
"""
"""
pos_lists_, pos_masks_, label_lists_ = [], [], []
img_bs = batch_size
ngpu = int(batch_size / img_bs)
img_ids = np.array(pos_lists, dtype=np.int32)[:, 0, 0].copy()
pos_lists_split, pos_masks_split, label_lists_split = [], [], []
for i in range(ngpu):
pos_lists_split.append([])
pos_masks_split.append([])
label_lists_split.append([])
for i in range(img_ids.shape[0]):
img_id = img_ids[i]
gpu_id = int(img_id / img_bs)
img_id = img_id % img_bs
pos_list = pos_lists[i].copy()
pos_list[:, 0] = img_id
pos_lists_split[gpu_id].append(pos_list)
pos_masks_split[gpu_id].append(pos_masks[i].copy())
label_lists_split[gpu_id].append(copy.deepcopy(label_lists[i]))
# repeat or delete
for i in range(ngpu):
vp_len = len(pos_lists_split[i])
if vp_len <= tcl_bs:
for j in range(0, tcl_bs - vp_len):
pos_list = pos_lists_split[i][j].copy()
pos_lists_split[i].append(pos_list)
pos_mask = pos_masks_split[i][j].copy()
pos_masks_split[i].append(pos_mask)
label_list = copy.deepcopy(label_lists_split[i][j])
label_lists_split[i].append(label_list)
else:
for j in range(0, vp_len - tcl_bs):
c_len = len(pos_lists_split[i])
pop_id = np.random.permutation(c_len)[0]
pos_lists_split[i].pop(pop_id)
pos_masks_split[i].pop(pop_id)
label_lists_split[i].pop(pop_id)
# merge
for i in range(ngpu):
pos_lists_.extend(pos_lists_split[i])
pos_masks_.extend(pos_masks_split[i])
label_lists_.extend(label_lists_split[i])
return pos_lists_, pos_masks_, label_lists_
def pre_process(label_list, pos_list, pos_mask, max_text_length, max_text_nums,
pad_num, tcl_bs):
label_list = label_list.numpy()
batch, _, _, _ = label_list.shape
pos_list = pos_list.numpy()
pos_mask = pos_mask.numpy()
pos_list_t = []
pos_mask_t = []
label_list_t = []
for i in range(batch):
for j in range(max_text_nums):
if pos_mask[i, j].any():
pos_list_t.append(pos_list[i][j])
pos_mask_t.append(pos_mask[i][j])
label_list_t.append(label_list[i][j])
pos_list, pos_mask, label_list = org_tcl_rois(batch, pos_list_t, pos_mask_t,
label_list_t, tcl_bs)
label = []
tt = [l.tolist() for l in label_list]
for i in range(tcl_bs):
k = 0
for j in range(max_text_length):
if tt[i][j][0] != pad_num:
k += 1
else:
break
label.append(k)
label = paddle.to_tensor(label)
label = paddle.cast(label, dtype='int64')
pos_list = paddle.to_tensor(pos_list)
pos_mask = paddle.to_tensor(pos_mask)
label_list = paddle.squeeze(paddle.to_tensor(label_list), axis=2)
label_list = paddle.cast(label_list, dtype='int32')
return pos_list, pos_mask, label_list, label
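Reviewer note: the nested loop in `pre_process` counts, for each label row, how many entries come before the first `pad_num`. The same quantity can be computed without Python loops. The sketch below is an alternative shown for comparison, not the code in this commit; `label_lengths` is a hypothetical name and the input is assumed to have shape `(n, max_text_length, 1)`.

```python
import numpy as np


def label_lengths(label_array, pad_num):
    """Length of each label, i.e. the position of the first pad value."""
    labels = label_array[:, :, 0]
    is_pad = labels == pad_num
    # argmax returns the first True index; rows without any pad are full.
    first_pad = np.where(is_pad.any(axis=1),
                         is_pad.argmax(axis=1),
                         labels.shape[1])
    return first_pad.astype(np.int64)


labels = np.array([
    [1, 2, 36, 36],
    [3, 36, 5, 36],   # counting stops at the first pad, as in the loop
    [1, 2, 3, 4],     # no padding at all
])[:, :, np.newaxis]
lengths = label_lengths(labels, pad_num=36)
```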
0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
\ No newline at end of file
...@@ -39,10 +39,7 @@ class TextE2e(object): ...@@ -39,10 +39,7 @@ class TextE2e(object):
self.args = args self.args = args
self.e2e_algorithm = args.e2e_algorithm self.e2e_algorithm = args.e2e_algorithm
pre_process_list = [{ pre_process_list = [{
'E2EResizeForTest': { 'E2EResizeForTest': {}
'max_side_len': 768,
'valid_set': 'totaltext'
}
}, { }, {
'NormalizeImage': { 'NormalizeImage': {
'std': [0.229, 0.224, 0.225], 'std': [0.229, 0.224, 0.225],
...@@ -70,12 +67,6 @@ class TextE2e(object): ...@@ -70,12 +67,6 @@ class TextE2e(object):
postprocess_params["character_dict_path"] = args.e2e_char_dict_path postprocess_params["character_dict_path"] = args.e2e_char_dict_path
postprocess_params["valid_set"] = args.e2e_pgnet_valid_set postprocess_params["valid_set"] = args.e2e_pgnet_valid_set
self.e2e_pgnet_polygon = args.e2e_pgnet_polygon self.e2e_pgnet_polygon = args.e2e_pgnet_polygon
if self.e2e_pgnet_polygon:
postprocess_params["expand_scale"] = 1.2
postprocess_params["shrink_ratio_of_width"] = 0.2
else:
postprocess_params["expand_scale"] = 1.0
postprocess_params["shrink_ratio_of_width"] = 0.3
else: else:
logger.info("unknown e2e_algorithm:{}".format(self.e2e_algorithm)) logger.info("unknown e2e_algorithm:{}".format(self.e2e_algorithm))
sys.exit(0) sys.exit(0)
...@@ -102,6 +93,7 @@ class TextE2e(object): ...@@ -102,6 +93,7 @@ class TextE2e(object):
return dt_boxes return dt_boxes
def __call__(self, img): def __call__(self, img):
ori_im = img.copy() ori_im = img.copy()
data = {'image': img} data = {'image': img}
data = transform(data, self.preprocess_op) data = transform(data, self.preprocess_op)
...@@ -109,7 +101,6 @@ class TextE2e(object): ...@@ -109,7 +101,6 @@ class TextE2e(object):
if img is None: if img is None:
return None, 0 return None, 0
img = np.expand_dims(img, axis=0) img = np.expand_dims(img, axis=0)
print(img.shape)
shape_list = np.expand_dims(shape_list, axis=0) shape_list = np.expand_dims(shape_list, axis=0)
img = img.copy() img = img.copy()
starttime = time.time() starttime = time.time()
...@@ -123,13 +114,12 @@ class TextE2e(object): ...@@ -123,13 +114,12 @@ class TextE2e(object):
preds = {} preds = {}
if self.e2e_algorithm == 'PGNet': if self.e2e_algorithm == 'PGNet':
preds['f_score'] = outputs[0] preds['f_border'] = outputs[0]
preds['f_border'] = outputs[1] preds['f_char'] = outputs[1]
preds['f_direction'] = outputs[2] preds['f_direction'] = outputs[2]
preds['f_char'] = outputs[3] preds['f_score'] = outputs[3]
else: else:
raise NotImplementedError raise NotImplementedError
post_result = self.postprocess_op(preds, shape_list) post_result = self.postprocess_op(preds, shape_list)
points, strs = post_result['points'], post_result['strs'] points, strs = post_result['points'], post_result['strs']
dt_boxes = self.filter_tag_det_res_only_clip(points, ori_im.shape) dt_boxes = self.filter_tag_det_res_only_clip(points, ori_im.shape)
......
...@@ -83,11 +83,9 @@ def parse_args(): ...@@ -83,11 +83,9 @@ def parse_args():
# PGNet params # PGNet params
parser.add_argument("--e2e_pgnet_score_thresh", type=float, default=0.5) parser.add_argument("--e2e_pgnet_score_thresh", type=float, default=0.5)
parser.add_argument( parser.add_argument(
"--e2e_char_dict_path", "--e2e_char_dict_path", type=str, default="./ppocr/utils/ic15_dict.txt")
type=str,
default="./ppocr/utils/pgnet_dict.txt")
parser.add_argument("--e2e_pgnet_valid_set", type=str, default='totaltext') parser.add_argument("--e2e_pgnet_valid_set", type=str, default='totaltext')
parser.add_argument("--e2e_pgnet_polygon", type=bool, default=False) parser.add_argument("--e2e_pgnet_polygon", type=bool, default=True)
# params for text classifier # params for text classifier
parser.add_argument("--use_angle_cls", type=str2bool, default=False) parser.add_argument("--use_angle_cls", type=str2bool, default=False)
......
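Reviewer note: `--e2e_pgnet_polygon` is declared with `type=bool`, while the updated docs pass `--e2e_pgnet_polygon=False` on the command line. With `argparse`, `type=bool` calls `bool()` on the raw string, and any non-empty string (including `"False"`) is truthy. The sketch below demonstrates the difference using a small converter; the `str2bool` defined here is written for illustration and only assumed to behave like the helper of the same name already used for `--use_angle_cls`.

```python
import argparse


def str2bool(value):
    """Interpret common textual spellings of true; everything else is False."""
    return value.lower() in ("true", "t", "1")


parser = argparse.ArgumentParser()
# Same pattern as --e2e_pgnet_polygon in the diff.
parser.add_argument("--plain_bool", type=bool, default=True)
# Same pattern as --use_angle_cls in the diff.
parser.add_argument("--parsed_bool", type=str2bool, default=True)

args = parser.parse_args(["--plain_bool=False", "--parsed_bool=False"])
# args.plain_bool stays True because bool("False") is True;
# args.parsed_bool becomes False.
```

Declaring the flag with the converter instead of `bool` would make the documented `--e2e_pgnet_polygon=False` command behave as written.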