Unverified · Commit fca0693f authored by BreezeDeus, committed by GitHub

Merge pull request #176 from breezedeus/dev

fix: dependencies for onnx and onnxruntime
@@ -33,7 +33,7 @@ doc:
package:
python setup.py sdist bdist_wheel
-VERSION = 2.1.1
+VERSION = 2.1.1.1
upload:
python -m twine upload dist/cnocr-$(VERSION)* --verbose
@@ -17,4 +17,4 @@
# specific language governing permissions and limitations
# under the License.
-__version__ = '2.1.1'
+__version__ = '2.1.1.1'
@@ -28,7 +28,6 @@ from operator import itemgetter
from pathlib import Path
import click
-import Levenshtein
from torchvision import transforms as T
import torch
@@ -275,6 +274,12 @@ def evaluate(
verbose,
):
"""评估模型效果"""
+    try:
+        import Levenshtein
+    except Exception as e:
+        logger.error(e)
+        logger.error('try to install the package with `pip install python-Levenshtein`')
+        return
ocr = CnOcr(model_name=model_name, model_fp=pretrained_model_fp, context=context)
fn_labels_list = read_input_file(eval_index_fp)
# Release Notes
### Update 2022.05.15: released cnocr V2.1.1.1
Major changes:
- Added support for **ONNX** models, covering the **`*-fc`** models and improving prediction speed;
- Added the parameters `model_backend` and `vocab_fp` to `CnOcr`'s initialization (see the sketch below); details in [Usage](usage.md);
- Added the `cnocr export-onnx` command, which exports a trained PyTorch model to ONNX;
- Removed the hard dependency on the `python-Levenshtein` package.
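A minimal sketch of using the new ONNX backend, assuming an exported or downloaded `*-fc` model; the model name and image path below are illustrative placeholders, so check [Usage](usage.md) for the values actually shipped:

```python
from cnocr import CnOcr

# 'densenet_lite_136-fc' and the image path are assumptions for illustration;
# model_backend='onnx' selects the new ONNX runtime path.
ocr = CnOcr(model_name='densenet_lite_136-fc', model_backend='onnx')
res = ocr.ocr('examples/huochepiao.jpeg')
print(res)
```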
### Update 2021.11.06: released cnocr V2.1.0
Major changes:
@@ -9,7 +19,6 @@
* More pretrained models are provided;
* Added the `cnocr evaluate` command for evaluating model performance.
### Update 2021.09.21: released cnocr V2.0.1
Major changes:
@@ -17,7 +26,6 @@
* Retrained the models, with slightly better recognition accuracy;
* Added a `batch_size` parameter to `CnOcr.ocr_for_single_lines(img_list, batch_size=1)` (see the sketch below).
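A minimal sketch of batched single-line recognition, assuming each image file contains exactly one text line; the file names are placeholders:

```python
import numpy as np
from PIL import Image
from cnocr import CnOcr

ocr = CnOcr()
# each element is one cropped text line; the paths are hypothetical
img_list = [np.asarray(Image.open(fp)) for fp in ('line1.png', 'line2.png')]
# a larger batch_size trades memory for throughput
results = ocr.ocr_for_single_lines(img_list, batch_size=8)
for line_res in results:
    print(line_res)  # per-line result; the exact format varies across versions
```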
### Update 2021.08.26: released cnocr V2.0.0
Major changes:
@@ -29,9 +37,6 @@
* Improved recognition of scene text;
* The user-facing interfaces changed slightly; upgrade with caution.
### Update 2021.08.24: released cnocr V1.2.3
Major changes:
@@ -39,7 +44,6 @@
* Changed the default download URLs for the models;
* Removed the numpy version constraint from the dependencies.
### Update 2020.05.29: released cnocr V1.2.2
Major changes:
@@ -48,8 +52,6 @@
* bugfix:
    * Fixed the error raised when several instances are initialized at the same time.
### Update 2020.05.25: released cnocr V1.2.1
Major changes:
@@ -57,8 +59,6 @@
* bugfix:
    * Fixed a typo in the zip file name.
### Update 2020.05.25: released cnocr V1.2.0
Major changes:
@@ -74,14 +74,10 @@
* Fixed the exception raised when the input image is very narrow;
* Removed `f-print`.
### Update 2020.04.21: released cnocr V1.1.0
V1.1.0 reworked the code substantially: most of the training code was rewritten, and more (and harder) training and test data were generated. The newly trained models are clearly more accurate than previous versions, especially on English words.
The major changes are listed below:
* Updated the training code: data is first converted to a binary format with mxnet's `recordio`, which speeds up subsequent training (a packing sketch follows this list). Real-time image augmentation is supported during training, and more parameters can be passed in.
@@ -100,17 +96,13 @@
* Upgraded the mxnet dependency: many users reported that mxnet `1.4.1` often could not be found and installed, so the requirement is now `>=1.5.0,<1.7.0`.
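The `recordio` conversion mentioned above roughly follows the pattern below. This is a generic packing sketch, not the project's actual conversion script; `samples`, the file names, and the label ids are placeholders, and OpenCV is assumed for image reading:

```python
import cv2
import mxnet as mx

# placeholder data: (image path, numeric label ids) pairs
samples = [('img1.jpg', [3, 14, 15]), ('img2.jpg', [9, 2, 6])]

record = mx.recordio.MXIndexedRecordIO('train.idx', 'train.rec', 'w')
for i, (img_fp, label_ids) in enumerate(samples):
    # IRHeader carries the labels; pack_img JPEG-encodes the image into the record
    header = mx.recordio.IRHeader(flag=0, label=label_ids, id=i, id2=0)
    img = cv2.imread(img_fp)
    record.write_idx(i, mx.recordio.pack_img(header, img, quality=95, img_fmt='.jpg'))
record.close()
```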
### Update 2019.07.25: released cnocr V1.0.0
`cnocr` v1.0.0 brings much more efficient prediction. **Models of the new version are incompatible with those of previous versions**, so if you are upgrading, you need to re-download the latest model files. Details below (the procedure is unchanged).
The main changes are:
- **The crnn model supports variable-length prediction, improving prediction efficiency**
- Existing models can be fine-tuned (training continued) with your own data
- Fixed bugs, e.g. `accuracy` staying at `0` during training
- Bumped the required `mxnet` version from `1.3.1` to `1.4.1`
# A Powerful Combination: [CnStd](https://github.com/breezedeus/cnstd) + CnOcr
For why you would combine [CnStd](https://github.com/breezedeus/cnstd) with CnOcr, see [Scene Text Recognition](std_ocr.md).
@@ -17,7 +16,7 @@ box_infos = std.detect('examples/taobao.jpg')
for box_info in box_infos['detected_texts']:
    cropped_img = box_info['cropped_img']
    ocr_res = cn_ocr.ocr_for_single_line(cropped_img)
-    print('ocr result: %s' % str(ocr_out))
+    print('ocr result: %s' % str(ocr_res))
```
Note: running the example above requires installing **[cnstd](https://github.com/breezedeus/cnstd)** first:
@@ -29,4 +28,3 @@ pip install cnstd
For more usage instructions for **[cnstd](https://github.com/breezedeus/cnstd)**, see its project page.
You can check the combined CnStd + CnOcr results in the [online demo](demo.md); a self-contained version of the snippet above follows.
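For convenience, here is a self-contained version of the snippet above; the default `CnStd()`/`CnOcr()` construction and the example image path are taken from the fragment shown and may need adjusting for your setup:

```python
from cnstd import CnStd
from cnocr import CnOcr

std = CnStd()     # text detector
cn_ocr = CnOcr()  # text recognizer

# detect text boxes, then recognize each cropped line
box_infos = std.detect('examples/taobao.jpg')
for box_info in box_infos['detected_texts']:
    cropped_img = box_info['cropped_img']  # image patch of the detected line
    ocr_res = cn_ocr.ocr_for_single_line(cropped_img)
    print('ocr result: %s' % str(ocr_res))
```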
@@ -5,6 +5,6 @@ torchvision>=0.9.0
numpy
pytorch-lightning
pillow>=5.3.0
-python-Levenshtein
+#python-Levenshtein
onnx
onnxruntime
\ No newline at end of file
@@ -34,7 +34,6 @@ pyasn1-modules==0.2.8 # via google-auth
pyasn1==0.4.8 # via pyasn1-modules, rsa
pydeprecate==0.3.1 # via pytorch-lightning
pyparsing==2.4.7 # via packaging
-python-levenshtein==0.12.0 # via -r requirements.in
pytorch-lightning==1.4.4 # via -r requirements.in
pyyaml==5.4.1 # via pytorch-lightning
requests-oauthlib==1.3.0 # via google-auth-oauthlib
# coding: utf-8
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
""" An example of predicting CAPTCHA image data with a LSTM network pre-trained with a CTC loss"""
from __future__ import print_function
import sys
import os
import time
import logging
import argparse
from operator import itemgetter
from pathlib import Path
from collections import Counter
import mxnet as mx
import Levenshtein
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from cnocr import CnOcr
from cnocr.utils import set_logger, gen_context
logger = set_logger(log_level=logging.INFO)
def evaluate():
parser = argparse.ArgumentParser()
parser.add_argument(
"--model-name", help="model name", type=str, default='densenet-lite-lstm'
)
parser.add_argument("--model-epoch", type=int, default=None, help="model epoch")
parser.add_argument(
"--gpu",
help="Number of GPUs for training [Default 0, means using cpu]"
"目前限制gpu <= 1,因为 gpu > 1时预测结果有问题,与 gpu = 1时不同,暂未发现原因。",
type=int,
default=0,
)
parser.add_argument(
"-i",
"--input-fp",
default='test.txt',
help="the file path with image names and labels",
)
parser.add_argument(
"--image-prefix-dir", default='.', help="图片所在文件夹,相对于索引文件中记录的图片位置"
)
parser.add_argument("--batch-size", type=int, default=128, help="batch size")
parser.add_argument(
"-v",
"--verbose",
action='store_true',
help="whether to print details to screen",
)
parser.add_argument(
"-o",
"--output-dir",
        default='eval_results',
help="the output directory which records the analysis results",
)
args = parser.parse_args()
assert args.gpu <= 1
context = gen_context(args.gpu)
ocr = CnOcr(
model_name=args.model_name, model_epoch=args.model_epoch, context=context
)
alphabet = ocr._vocab
fn_labels_list = read_input_file(args.input_fp)
miss_cnt, redundant_cnt = Counter(), Counter()
model_time_cost = 0.0
start_idx = 0
bad_cnt = 0
badcases = []
while start_idx < len(fn_labels_list):
logger.info('start_idx: %d', start_idx)
batch = fn_labels_list[start_idx : start_idx + args.batch_size]
batch_img_fns = []
batch_labels = []
batch_imgs = []
for fn, labels in batch:
batch_labels.append(labels)
img_fp = os.path.join(args.image_prefix_dir, fn)
batch_img_fns.append(img_fp)
img = mx.image.imread(img_fp, 1).asnumpy()
batch_imgs.append(img)
start_time = time.time()
batch_preds = ocr.ocr_for_single_lines(batch_imgs)
model_time_cost += time.time() - start_time
for bad_info in compare_preds_to_reals(
batch_preds, batch_labels, batch_img_fns, alphabet
):
if args.verbose:
logger.info('\t'.join(bad_info))
distance = Levenshtein.distance(bad_info[1], bad_info[2])
bad_info.insert(0, distance)
badcases.append(bad_info)
miss_cnt.update(list(bad_info[-2]))
redundant_cnt.update(list(bad_info[-1]))
bad_cnt += 1
start_idx += args.batch_size
badcases.sort(key=itemgetter(0), reverse=True)
output_dir = Path(args.output_dir)
if not output_dir.exists():
os.makedirs(output_dir)
with open(output_dir / 'badcases.txt', 'w') as f:
f.write(
'\t'.join(
[
'distance',
'image_fp',
'real_words',
'pred_words',
'miss_words',
'redundant_words',
]
)
+ '\n'
)
for bad_info in badcases:
f.write('\t'.join(map(str, bad_info)) + '\n')
with open(output_dir / 'miss_words_stat.txt', 'w') as f:
for word, num in miss_cnt.most_common():
f.write('\t'.join([word, str(num)]) + '\n')
with open(output_dir / 'redundant_words_stat.txt', 'w') as f:
for word, num in redundant_cnt.most_common():
f.write('\t'.join([word, str(num)]) + '\n')
logger.info(
"number of total cases: %d, number of bad cases: %d, acc: %.4f, time cost per image: %f"
% (
len(fn_labels_list),
bad_cnt,
            (len(fn_labels_list) - bad_cnt) / len(fn_labels_list),
model_time_cost / len(fn_labels_list),
)
)
def read_input_file(in_fp):
fn_labels_list = []
with open(in_fp) as f:
for line in f:
fields = line.strip().split()
fn_labels_list.append((fields[0], fields[1:]))
return fn_labels_list
def compare_preds_to_reals(batch_preds, batch_reals, batch_img_fns, alphabet):
for preds, reals, img_fn in zip(batch_preds, batch_reals, batch_img_fns):
reals = [alphabet[int(_id)] for _id in reals if _id != '0'] # '0' is padding id
if preds == reals:
continue
preds_set, reals_set = set(preds), set(reals)
miss_words = reals_set.difference(preds_set)
redundant_words = preds_set.difference(reals_set)
yield [
img_fn,
''.join(reals),
''.join(preds),
''.join(miss_words),
''.join(redundant_words),
]
if __name__ == '__main__':
evaluate()
# coding: utf-8
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
""" An example of predicting CAPTCHA image data with a LSTM network pre-trained with a CTC loss"""
from __future__ import print_function
import sys
import os
import time
import glob
import logging
import argparse
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from cnocr import CnOcr
from cnocr.utils import set_logger
logger = set_logger(log_level=logging.INFO)
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--model_name", help="model name", type=str, default='conv-lite-fc'
)
parser.add_argument("--model_epoch", type=int, default=None, help="model epoch")
parser.add_argument(
"--context",
help="使用cpu还是gpu运行代码。默认为cpu",
type=str,
choices=['cpu', 'gpu'],
default='cpu',
)
parser.add_argument("-f", "--file", help="Path to the image file or dir")
parser.add_argument(
"-s",
"--single-line",
        action='store_true',
help="Whether the image only includes one-line characters",
)
args = parser.parse_args()
ocr = CnOcr(
model_name=args.model_name, model_epoch=args.model_epoch, context=args.context
)
ocr_func = ocr.ocr_for_single_line if args.single_line else ocr.ocr
fp_list = []
if os.path.isfile(args.file):
fp_list.append(args.file)
elif os.path.isdir(args.file):
        fn_list = glob.glob1(args.file, '*g')  # file names ending in 'g': .png, .jpg, .jpeg
fp_list = [os.path.join(args.file, fn) for fn in fn_list]
for fp in fp_list:
start_time = time.time()
res = ocr_func(fp)
logger.info('\n' + '=' * 10 + fp + '=' * 10)
if not args.single_line:
res = '\n'.join([''.join(line_p) for line_p in res])
else:
res = ''.join(res)
logger.info('\n' + res)
logger.info('time cost: %f' % (time.time() - start_time))
if __name__ == '__main__':
main()
# coding: utf-8
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from __future__ import print_function
import argparse
import logging
import os
import sys
import mxnet as mx
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from cnocr.consts import EMB_MODEL_TYPES, SEQ_MODEL_TYPES, MODEL_VERSION
from cnocr.utils import data_dir, set_logger
from cnocr.hyperparams.cn_hyperparams import CnHyperparams
from cnocr.data_utils.data_iter import GrayImageIter
from cnocr.data_utils.aug import FgBgFlipAug
from cnocr.symbols.crnn import gen_network
from cnocr.fit.ctc_metrics import CtcMetrics
from cnocr.fit.fit import fit
logger = set_logger(log_level=logging.INFO)
def parse_args():
# Parse command line arguments
parser = argparse.ArgumentParser()
parser.add_argument(
"--emb_model_type",
help="which embedding model to use",
choices=EMB_MODEL_TYPES,
type=str,
default='conv-lite',
)
parser.add_argument(
"--seq_model_type",
help='which sequence model to use',
default='fc',
type=str,
choices=SEQ_MODEL_TYPES,
)
parser.add_argument(
"--train_file",
help="Path to train txt file",
type=str,
default='data/sample-data-lst/train.txt',
)
parser.add_argument(
"--test_file",
help="Path to test txt file",
type=str,
default='data/sample-data-lst/test.txt',
)
parser.add_argument(
"--use_train_image_aug",
action='store_true',
help="Whether to use image augmentation for training",
)
parser.add_argument(
"--gpu",
help="Number of GPUs for training [Default 0, means using cpu]",
type=int,
default=0,
)
parser.add_argument(
"--optimizer",
help="optimizer for training [Default: Adam]",
type=str,
default='Adam',
)
parser.add_argument(
'--batch_size',
type=int,
default=128,
help='batch size for each device [Default: 128]',
)
parser.add_argument(
'--epoch', type=int, default=20, help='train epochs [Default: 20]'
)
parser.add_argument(
'--load_epoch',
type=int,
help='load the model on an epoch using the model-load-prefix '
'[Default: no trained model will be loaded]',
)
parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
parser.add_argument(
'--dropout', type=float, default=0.5, help='dropout ratio [Default: 0.5]'
)
parser.add_argument(
'--wd', type=float, default=0.0, help='weight decay factor [Default: 0.0]'
)
parser.add_argument(
'--clip_gradient',
type=float,
default=None,
        help='value for gradient clipping [Default: None, meaning no gradient clipping]',
)
parser.add_argument(
"--out_model_dir",
help='output model directory',
default=os.path.join(data_dir(), MODEL_VERSION),
)
return parser.parse_args()
def train_cnocr(args):
head = '%(asctime)-15s %(message)s'
logging.basicConfig(level=logging.DEBUG, format=head)
args.model_name = args.emb_model_type + '-' + args.seq_model_type
out_dir = os.path.join(args.out_model_dir, args.model_name)
logger.info('save models to dir: %s' % out_dir)
if not os.path.exists(out_dir):
os.makedirs(out_dir)
args.prefix = os.path.join(
out_dir, 'cnocr-v{}-{}'.format(MODEL_VERSION, args.model_name)
)
hp = CnHyperparams()
hp = _update_hp(hp, args)
network, hp = gen_network(args.model_name, hp)
metrics = CtcMetrics(hp.seq_length)
data_train, data_val = _gen_iters(
hp, args.train_file, args.test_file, args.use_train_image_aug
)
data_names = ['data']
fit(
network=network,
data_train=data_train,
data_val=data_val,
metrics=metrics,
args=args,
hp=hp,
data_names=data_names,
)
def _update_hp(hp, args):
hp.seq_model_type = args.seq_model_type
hp._num_epoch = args.epoch
hp.optimizer = args.optimizer
hp._batch_size = args.batch_size
hp._learning_rate = args.lr
hp._drop_out = args.dropout
hp.wd = args.wd
hp.clip_gradient = args.clip_gradient
return hp
def _gen_iters(hp, train_fp_prefix, val_fp_prefix, use_train_image_aug):
height, width = hp.img_height, hp.img_width
augs = None
if use_train_image_aug:
augs = mx.image.CreateAugmenter(
data_shape=(3, height, width),
resize=0,
rand_crop=False,
rand_resize=False,
rand_mirror=False,
mean=None,
std=None,
brightness=0.001,
contrast=0.001,
saturation=0.001,
hue=0.05,
pca_noise=0.1,
inter_method=2,
)
augs.append(FgBgFlipAug(p=0.2))
train_iter = GrayImageIter(
batch_size=hp.batch_size,
data_shape=(3, height, width),
label_width=hp.num_label,
dtype='int32',
shuffle=True,
path_imgrec=str(train_fp_prefix) + ".rec",
path_imgidx=str(train_fp_prefix) + ".idx",
aug_list=augs,
)
val_iter = GrayImageIter(
batch_size=hp.batch_size,
data_shape=(3, height, width),
label_width=hp.num_label,
dtype='int32',
path_imgrec=str(val_fp_prefix) + ".rec",
path_imgidx=str(val_fp_prefix) + ".idx",
)
return train_iter, val_iter
if __name__ == '__main__':
args = parse_args()
train_cnocr(args)
#!/usr/bin/env bash
# -*- coding: utf-8 -*-
cd `dirname $0`/../
## train the captcha model
#python scripts/cnocr_train.py --cpu 2 --num_proc 2 --loss ctc --dataset captcha --font_path /Users/king/Documents/WhatIHaveDone/Test/text_renderer/data/fonts/chn/msyh.ttf
# train the Chinese OCR model (crnn)
python scripts/cnocr_train.py --cpu 2 --num_proc 4 --loss ctc --dataset cn_ocr
## GPU version
#python scripts/cnocr_train.py --gpu 1 --num_proc 8 --loss ctc --dataset cn_ocr --data_root /jfs/jinlong/data/ocr/outer/images \
# --train_file /jfs/jinlong/data/ocr/outer/train.txt --test_file /jfs/jinlong/data/ocr/outer/test.txt
## predict on Chinese images
#python scripts/cnocr_predict.py --file examples/rand_cn1.png
\ No newline at end of file
@@ -44,7 +44,8 @@ required = [
'numpy',
"pytorch-lightning",
"pillow>=5.3.0",
"python-Levenshtein",
"onnx",
"onnxruntime",
]
extras_require = {
"dev": ["pip-tools", "pytest", "python-Levenshtein"],
@@ -25,7 +25,6 @@ import pytest
import numpy as np
from PIL import Image
-import Levenshtein
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.insert(1, os.path.dirname(os.path.abspath(__file__)))
@@ -99,6 +98,8 @@ def print_preds(pred):
def cal_score(preds, expected):
+    import Levenshtein
if len(preds) != len(expected):
return 0
total_cnt = 0