Commit 8cb8f8c8 authored by wuzewu

Merge branch 'release/v1.7' into develop

@@ -21,8 +21,13 @@ env:
  - PYTHONPATH=${PWD}
install:
  - if [[ $TRAVIS_OS_NAME == osx ]]; then
      pip3 install --upgrade paddlepaddle;
      pip3 install -r requirements.txt;
    else
      pip install --upgrade paddlepaddle;
      pip install -r requirements.txt;
    fi
notifications:
  email:
......
@@ -50,7 +50,7 @@ PaddleHub is built around pre-trained model applications and has the following features:
### Installation
Before installing PaddleHub, please install the PaddlePaddle deep learning framework first. For more installation instructions, see [PaddlePaddle Quick Install](https://www.paddlepaddle.org.cn/install/quick)

```shell
pip install paddlehub
```
@@ -66,6 +66,18 @@ PaddleHub adopts a model-as-software design concept
After installing PaddleHub, run the [hub run](./docs/tutorial/cmdintro.md) command to try code-free, one-click prediction:
* Use the lightweight Chinese OCR model chinese_ocr_db_crnn_mobile from [Text Recognition](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=TextRecognition) to recognize the text in an image in one step.
```shell
$ wget https://paddlehub.bj.bcebos.com/model/image/ocr/test_ocr.jpg
$ hub run chinese_ocr_db_crnn_mobile --input_path test_ocr.jpg --visualization=True
```
The prediction result images are saved in the ocr_result folder under the current working directory, as shown below.
<p align="center">
<img src="./docs/imgs/ocr_res.jpg" width='70%' align="middle">
</p>
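
The same OCR module can also be called from Python. Below is a minimal sketch of the equivalent API call; the `recognize_text` method name and its parameters are assumed from this module family's usual interface and may need adjusting to the installed version:

```python
import cv2
import paddlehub as hub

# load the lightweight Chinese OCR module (downloaded on first use)
ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")

# recognize text in one image; visualization=True also writes annotated
# result images (by default into ./ocr_result, as described above)
results = ocr.recognize_text(
    images=[cv2.imread("test_ocr.jpg")],
    visualization=True)
print(results)
```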
* Use the [Object Detection](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=ObjectDetection) model pyramidbox_lite_mobile_mask to run mask detection on an image:
```shell
$ wget https://paddlehub.bj.bcebos.com/resources/test_mask_detection.jpg
```
@@ -192,5 +204,5 @@ $ hub uninstall ernie
## Release History

PaddleHub v1.7 has been released!

For more details on the upgrade, see the [Release History](./RELEASE.md).

## `v1.7.0`

* Richer pre-trained models and better usability
  * Added the VENUS series of vision pre-trained models, [yolov3_darknet53_venus](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_venus&en_category=ObjectDetection) and [faster_rcnn_resnet50_fpn_venus](https://www.paddlepaddle.org.cn/hubdetail?name=faster_rcnn_resnet50_fpn_venus&en_category=ObjectDetection), which can substantially improve fine-tuning results on image classification and object detection tasks
  * Added the industrial-grade short-video classification model [videotag_tsn_lstm](https://paddlepaddle.org.cn/hubdetail?name=videotag_tsn_lstm&en_category=VideoClassification), supporting 3000 Chinese labels
  * Added the lightweight Chinese OCR models [chinese_ocr_db_rcnn](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_rcnn&en_category=TextRecognition) and [chinese_text_detection_db](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_text_detection_db&en_category=TextRecognition), supporting one-step OCR
  * Added industrial-grade models for pedestrian detection, vehicle detection, animal recognition, and more
* Fine-tune API upgrades
  * Added 6 predefined networks for text classification tasks, including CNN, BOW, LSTM, BiLSTM, and DPCNN, among others
  * Training and evaluation metrics can be visualized with VisualDL

## `v1.6.2`

* Fixed an image classification error on Windows

## `v1.6.1`

* Fixed the missing config.json file when installing PaddleHub on Windows

## `v1.6.0`

* Comprehensively upgraded NLP Modules for better usability and flexibility
......
# DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks
## Introduction

This page implements the [DELTA](https://arxiv.org/abs/1901.09229) algorithm in [PaddlePaddle](https://www.paddlepaddle.org.cn).

> Li, Xingjian, et al. "DELTA: Deep learning transfer using feature map with attention for convolutional networks." ICLR 2019.

## Preparation of Data and Pre-trained Model

- Download transfer learning target datasets, like [Caltech-256](http://www.vision.caltech.edu/Image_Datasets/Caltech256/), [CUB_200_2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) or others. Arrange the dataset in this way:
```
@@ -23,7 +25,7 @@ This page implements the [DELTA](https://arxiv.org/abs/1901.09229) algorithm in
- Download [the pretrained models](https://github.com/PaddlePaddle/models/tree/release/1.7/PaddleCV/image_classification#resnet-series). We give the results of ResNet-101 below.

## Running Scripts

Modify `global_data_path` in `datasets/data_path` to the root path where the dataset is located.
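
For example, assuming `datasets/data_path` is a plain Python module that exposes the dataset root as a single variable (a hypothetical layout; check the actual file), the edit could look like this:

```python
# datasets/data_path.py -- hypothetical sketch of the config module
# Point this at the directory that contains Caltech-256, CUB_200_2011, etc.
global_data_path = "/home/work/datasets"
```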
......
@@ -18,7 +18,7 @@ parser.add_argument(
    default="mobilenet",
    help="Module used as feature extractor.")

# the names of the hyper-parameters to be searched should match those in hparam.py
parser.add_argument(
    "--batch_size",
    type=int,
@@ -27,7 +27,7 @@ parser.add_argument(
parser.add_argument(
    "--learning_rate", type=float, default=1e-4, help="learning_rate.")

# saved_params_dir and model_path are needed by auto fine-tune
parser.add_argument(
    "--saved_params_dir",
    type=str,
@@ -76,7 +76,7 @@ def finetune(args):
    img = input_dict["image"]
    feed_list = [img.name]

    # Select fine-tune strategy, setup config and fine-tune
    strategy = hub.DefaultFinetuneStrategy(learning_rate=args.learning_rate)
    config = hub.RunConfig(
        use_cuda=True,
@@ -100,7 +100,7 @@ def finetune(args):
        task.load_parameters(args.model_path)
        logger.info("PaddleHub has loaded model from %s" % args.model_path)

    # Fine-tune by PaddleHub's API
    task.finetune()
    # Evaluate by PaddleHub's API
    run_states = task.eval()
@@ -114,7 +114,7 @@ def finetune(args):
        shutil.copytree(best_model_dir, args.saved_params_dir)
        shutil.rmtree(config.checkpoint_dir)

    # acc on dev will be used by auto fine-tune
    hub.report_final_result(eval_avg_score["acc"])
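
The two markers above are the whole contract an AutoDL Finetuner entry script must satisfy: every searched hyper-parameter is exposed as a CLI argument, and the final evaluation metric is handed back through `hub.report_final_result`. A minimal sketch of that contract (a hypothetical standalone script; only `report_final_result` and the argument style are taken from the code above):

```python
import argparse

import paddlehub as hub

parser = argparse.ArgumentParser(__doc__)
# every hyper-parameter the tuner searches over arrives as a CLI argument
parser.add_argument("--learning_rate", type=float, default=1e-4)
args = parser.parse_args()

# ... build the task and fine-tune with args.learning_rate here ...

eval_acc = 0.90  # placeholder for the real accuracy measured on the dev set
# report the metric that the tuner maximizes across trials
hub.report_final_result(eval_acc)
```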
......
@@ -13,7 +13,7 @@ from paddlehub.common.logger import logger
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--epochs", type=int, default=3, help="epochs.")

# the names of the hyper-parameters to be searched should match those in hparam.py
parser.add_argument("--batch_size", type=int, default=32, help="batch_size.")
parser.add_argument(
    "--learning_rate", type=float, default=5e-5, help="learning_rate.")
@@ -33,7 +33,7 @@ parser.add_argument(
    default=None,
    help="Directory to model checkpoint")

# saved_params_dir and model_path are needed by auto fine-tune
parser.add_argument(
    "--saved_params_dir",
    type=str,
@@ -82,14 +82,14 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Select fine-tune strategy, setup config and fine-tune
    strategy = hub.AdamWeightDecayStrategy(
        warmup_proportion=args.warmup_prop,
        learning_rate=args.learning_rate,
        weight_decay=args.weight_decay,
        lr_scheduler="linear_decay")

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        checkpoint_dir=args.checkpoint_dir,
        use_cuda=True,
@@ -98,7 +98,7 @@ if __name__ == '__main__':
        enable_memory_optim=True,
        strategy=strategy)

    # Define a classification fine-tune task by PaddleHub's API
    cls_task = hub.TextClassifierTask(
        data_reader=reader,
        feature=pooled_output,
@@ -125,5 +125,5 @@ if __name__ == '__main__':
        shutil.copytree(best_model_dir, args.saved_params_dir)
        shutil.rmtree(config.checkpoint_dir)

    # acc on dev will be used by auto fine-tune
    hub.report_final_result(eval_avg_score["acc"])
@@ -14,7 +14,7 @@ parser.add_argument("--use_gpu", type=ast.literal_eval, default=True
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt", help="Path to save log data.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--module", type=str, default="resnet50", help="Module used as feature extractor.")
parser.add_argument("--dataset", type=str, default="flowers", help="Dataset to fine-tune.")
parser.add_argument("--use_data_parallel", type=ast.literal_eval, default=True, help="Whether to use data parallel.")
# yapf: enable.
@@ -60,7 +60,7 @@ def finetune(args):
    # Setup feed list for data feeder
    feed_list = [input_dict["image"].name]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=args.use_data_parallel,
        use_cuda=args.use_gpu,
@@ -69,7 +69,7 @@ def finetune(args):
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.finetune.strategy.DefaultFinetuneStrategy())

    # Define an image classification task by PaddleHub Fine-tune API
    task = hub.ImageClassifierTask(
        data_reader=data_reader,
        feed_list=feed_list,
@@ -77,7 +77,7 @@ def finetune(args):
        num_classes=dataset.num_labels,
        config=config)

    # Fine-tune by PaddleHub's API
    task.finetune_and_eval()
......
@@ -13,7 +13,7 @@ parser.add_argument("--use_gpu", type=ast.literal_eval, default=True
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt", help="Path to save log data.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--module", type=str, default="resnet50", help="Module used as a feature extractor.")
parser.add_argument("--dataset", type=str, default="flowers", help="Dataset to fine-tune.")
# yapf: enable.
module_map = {
@@ -58,7 +58,7 @@ def predict(args):
    # Setup feed list for data feeder
    feed_list = [input_dict["image"].name]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
@@ -66,7 +66,7 @@ def predict(args):
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.finetune.strategy.DefaultFinetuneStrategy())

    # Define an image classification task by PaddleHub Fine-tune API
    task = hub.ImageClassifierTask(
        data_reader=data_reader,
        feed_list=feed_list,
......
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
import argparse
import ast
@@ -23,7 +23,7 @@ import paddlehub as hub
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--warmup_proportion", type=float, default=0.1, help="Warmup proportion params for warmup strategy")
@@ -56,13 +56,13 @@ if __name__ == '__main__':
    # Use "pooled_output" for classification tasks on an entire sentence.
    pooled_output = outputs["pooled_output"]

    # Select fine-tune strategy, setup config and fine-tune
    strategy = hub.AdamWeightDecayStrategy(
        warmup_proportion=args.warmup_proportion,
        weight_decay=args.weight_decay,
        learning_rate=args.learning_rate)

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_cuda=args.use_gpu,
        num_epoch=args.num_epoch,
@@ -70,7 +70,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=strategy)

    # Define a classification fine-tune task by PaddleHub's API
    multi_label_cls_task = hub.MultiLabelClassifierTask(
        data_reader=reader,
        feature=pooled_output,
@@ -78,6 +78,6 @@ if __name__ == '__main__':
        num_classes=dataset.num_labels,
        config=config)

    # Fine-tune and evaluate by PaddleHub's API
    # will finish training, evaluation, testing and model saving automatically
    multi_label_cls_task.finetune_and_eval()
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
from __future__ import absolute_import
from __future__ import division
@@ -35,7 +35,7 @@ parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.")
parser.add_argument("--max_seq_len", type=int, default=128, help="Number of words of the longest sequence.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether to use GPU for fine-tuning; input should be True or False")
args = parser.parse_args()
# yapf: enable.
@@ -65,7 +65,7 @@ if __name__ == '__main__':
    # Use "sequence_output" for token-level output.
    pooled_output = outputs["pooled_output"]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
@@ -73,7 +73,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.finetune.strategy.DefaultFinetuneStrategy())

    # Define a classification fine-tune task by PaddleHub's API
    multi_label_cls_task = hub.MultiLabelClassifierTask(
        data_reader=reader,
        feature=pooled_output,
......
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
import argparse
import ast
@@ -23,7 +23,7 @@ import paddlehub as hub
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--warmup_proportion", type=float, default=0.0, help="Warmup proportion params for warmup strategy")
@@ -61,13 +61,13 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Select fine-tune strategy, setup config and fine-tune
    strategy = hub.AdamWeightDecayStrategy(
        warmup_proportion=args.warmup_proportion,
        weight_decay=args.weight_decay,
        learning_rate=args.learning_rate)

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=args.use_data_parallel,
        use_cuda=args.use_gpu,
@@ -76,7 +76,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=strategy)

    # Define a classification fine-tune task by PaddleHub's API
    cls_task = hub.TextClassifierTask(
        data_reader=reader,
        feature=pooled_output,
@@ -84,6 +84,6 @@ if __name__ == '__main__':
        num_classes=dataset.num_labels,
        config=config)

    # Fine-tune and evaluate by PaddleHub's API
    # will finish training, evaluation, testing and model saving automatically
    cls_task.finetune_and_eval()
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
from __future__ import absolute_import
from __future__ import division
@@ -33,7 +33,7 @@ parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.")
parser.add_argument("--max_seq_len", type=int, default=128, help="Number of words of the longest sequence.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether to use GPU for fine-tuning; input should be True or False")
args = parser.parse_args()
# yapf: enable.
@@ -63,7 +63,7 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
@@ -71,7 +71,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.finetune.strategy.DefaultFinetuneStrategy())

    # Define a classification fine-tune task by PaddleHub's API
    cls_task = hub.TextClassifierTask(
        data_reader=reader,
        feature=pooled_output,
......
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
from __future__ import absolute_import
from __future__ import division
@@ -28,7 +28,7 @@ hub.common.logger.logger.setLevel("INFO")
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint.")
parser.add_argument("--max_seq_len", type=int, default=384, help="Number of words of the longest sequence.")
parser.add_argument("--batch_size", type=int, default=8, help="Total examples' number in batch for training.")
@@ -64,7 +64,7 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
@@ -72,7 +72,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.AdamWeightDecayStrategy())

    # Define a reading comprehension fine-tune task by PaddleHub's API
    reading_comprehension_task = hub.ReadingComprehensionTask(
        data_reader=reader,
        feature=seq_output,
......
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
import argparse
import ast
@@ -25,7 +25,7 @@ hub.common.logger.logger.setLevel("INFO")
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--learning_rate", type=float, default=3e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--warmup_proportion", type=float, default=0.0, help="Warmup proportion params for warmup strategy")
@@ -64,13 +64,13 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Select fine-tune strategy, setup config and fine-tune
    strategy = hub.AdamWeightDecayStrategy(
        weight_decay=args.weight_decay,
        learning_rate=args.learning_rate,
        warmup_proportion=args.warmup_proportion)

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        eval_interval=300,
        use_data_parallel=args.use_data_parallel,
@@ -80,7 +80,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=strategy)

    # Define a reading comprehension fine-tune task by PaddleHub's API
    reading_comprehension_task = hub.ReadingComprehensionTask(
        data_reader=reader,
        feature=seq_output,
@@ -89,5 +89,5 @@ if __name__ == '__main__':
        sub_task="squad",
    )

    # Fine-tune by PaddleHub's API
    reading_comprehension_task.finetune_and_eval()
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
from __future__ import absolute_import
from __future__ import division
@@ -33,7 +33,7 @@ parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest sequence.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether to use GPU for fine-tuning; input should be True or False")
args = parser.parse_args()
# yapf: enable.
@@ -64,7 +64,7 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
@@ -72,7 +72,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.AdamWeightDecayStrategy())

    # Define a regression fine-tune task by PaddleHub's API
    reg_task = hub.RegressionTask(
        data_reader=reader,
        feature=pooled_output,
......
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on classification task """
import argparse
import ast
@@ -23,7 +23,7 @@ import paddlehub as hub
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--warmup_proportion", type=float, default=0.1, help="Warmup proportion params for warmup strategy")
@@ -62,13 +62,13 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Select fine-tune strategy, setup config and fine-tune
    strategy = hub.AdamWeightDecayStrategy(
        warmup_proportion=args.warmup_proportion,
        weight_decay=args.weight_decay,
        learning_rate=args.learning_rate)

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        eval_interval=300,
        use_data_parallel=args.use_data_parallel,
@@ -78,13 +78,13 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=strategy)

    # Define a regression fine-tune task by PaddleHub's API
    reg_task = hub.RegressionTask(
        data_reader=reader,
        feature=pooled_output,
        feed_list=feed_list,
        config=config)

    # Fine-tune and evaluate by PaddleHub's API
    # will finish training, evaluation, testing and model saving automatically
    reg_task.finetune_and_eval()
@@ -16,7 +16,7 @@ import paddlehub as hub
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch when the program predicts.")
args = parser.parse_args()
# yapf: enable.
@@ -37,7 +37,7 @@ if __name__ == '__main__':
    # Must feed all the tensors that senta's module needs
    feed_list = [inputs["words"].name]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
@@ -45,7 +45,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.AdamWeightDecayStrategy())

    # Define a classification fine-tune task by PaddleHub's API
    cls_task = hub.TextClassifierTask(
        data_reader=reader,
        feature=sent_feature,
......
@@ -8,7 +8,7 @@ import paddlehub as hub
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--batch_size", type=int, default=32, help="Total examples' number in batch for training.")
args = parser.parse_args()
@@ -30,7 +30,7 @@ if __name__ == '__main__':
    # Must feed all the tensors that senta's module needs
    feed_list = [inputs["words"].name]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_cuda=args.use_gpu,
        use_pyreader=False,
@@ -40,7 +40,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.AdamWeightDecayStrategy())

    # Define a classification fine-tune task by PaddleHub's API
    cls_task = hub.TextClassifierTask(
        data_reader=reader,
        feature=sent_feature,
@@ -48,6 +48,6 @@ if __name__ == '__main__':
        num_classes=dataset.num_labels,
        config=config)

    # Fine-tune and evaluate by PaddleHub's API
    # will finish training, evaluation, testing and model saving automatically
    cls_task.finetune_and_eval()
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on sequence labeling task """
from __future__ import absolute_import
from __future__ import division
@@ -27,14 +27,13 @@ import time
import paddle
import paddle.fluid as fluid
import paddlehub as hub

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest sequence.")
parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether to use GPU for fine-tuning; input should be True or False")
args = parser.parse_args()
# yapf: enable.
@@ -67,7 +66,7 @@ if __name__ == '__main__':
        inputs["input_mask"].name,
    ]

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=False,
        use_cuda=args.use_gpu,
@@ -75,7 +74,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=hub.finetune.strategy.DefaultFinetuneStrategy())

    # Define a sequence labeling fine-tune task by PaddleHub's API
    # if add_crf is True, the network uses CRF as its decoder
    seq_label_task = hub.SequenceLabelTask(
        data_reader=reader,
@@ -84,7 +83,7 @@ if __name__ == '__main__':
        max_seq_len=args.max_seq_len,
        num_classes=dataset.num_labels,
        config=config,
        add_crf=False)

    # Data to be predicted
    # If using python 2, prefix "u" is necessary
......
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fine-tuning on sequence labeling task."""
import argparse
import ast
@@ -23,7 +23,7 @@ import paddlehub as hub
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epochs for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether to use GPU for fine-tuning; input should be True or False")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--warmup_proportion", type=float, default=0.1, help="Warmup proportion params for warmup strategy")
@@ -60,13 +60,13 @@ if __name__ == '__main__':
        inputs["segment_ids"].name, inputs["input_mask"].name
    ]

    # Select a fine-tune strategy
    strategy = hub.AdamWeightDecayStrategy(
        warmup_proportion=args.warmup_proportion,
        weight_decay=args.weight_decay,
        learning_rate=args.learning_rate)

    # Setup RunConfig for PaddleHub Fine-tune API
    config = hub.RunConfig(
        use_data_parallel=args.use_data_parallel,
        use_cuda=args.use_gpu,
@@ -75,7 +75,7 @@ if __name__ == '__main__':
        checkpoint_dir=args.checkpoint_dir,
        strategy=strategy)

    # Define a sequence labeling fine-tune task by PaddleHub's API
    # If add_crf is True, the network uses CRF as its decoder
    seq_label_task = hub.SequenceLabelTask(
        data_reader=reader,
@@ -84,8 +84,8 @@ if __name__ == '__main__':
        max_seq_len=args.max_seq_len,
        num_classes=dataset.num_labels,
        config=config,
        add_crf=False)

    # Fine-tune and evaluate model by PaddleHub's API
    # will finish training, evaluation, testing and model saving automatically
    seq_label_task.finetune_and_eval()
#coding:utf-8
import os
import paddlehub as hub
import cv2

if __name__ == "__main__":
    ssd = hub.Module(name="ssd_mobilenet_v1_pascal")
    test_img_path = os.path.join("test", "test_img_bird.jpg")

    # execute predict and print the result
    results = ssd.object_detection(images=[cv2.imread(test_img_path)])
    for result in results:
        print(result)
@@ -2,9 +2,31 @@
This example shows how to use the PaddleHub Fine-tune API with Transformer-style pre-trained models (ERNIE/BERT/RoBERTa) to complete a classification task.

**PaddleHub 1.7.0 and above support appending a predefined network (bow, bilstm, cnn, dpcnn, gru, lstm) after a Transformer-style pre-trained model to complete a text classification task.**

## Directory Structure
```
text_classification
├── finetuned_model_to_module     # how to convert a model fine-tuned with PaddleHub into a Module, for deployment with PaddleHub Serving
│   ├── __init__.py
│   └── module.py
├── predict_predefine_net.py      # prediction script with a predefined network
├── predict.py                    # prediction script without a predefined network (plain fc)
├── README.md                     # documentation for text classification transfer learning
├── run_cls_predefine_net.sh      # training launcher with a predefined network
├── run_cls.sh                    # training launcher without a predefined network (plain fc)
├── run_predict_predefine_net.sh  # prediction launcher with a predefined network
├── run_predict.sh                # prediction launcher without a predefined network (plain fc)
├── text_classifier_dygraph.py    # dygraph-mode training script
├── text_cls_predefine_net.py     # training script with a predefined network
└── text_cls.py                   # training script without a predefined network (plain fc)
```

## How to Start Fine-tuning

The following example completes the text classification task without a predefined network, to show how PaddleHub performs transfer learning; the steps with a predefined network are similar.

After installing PaddlePaddle and PaddleHub, run the script `sh run_cls.sh` to start fine-tuning ERNIE on the ChnSentiCorp dataset.

The script parameters are described below:
@@ -164,9 +186,27 @@ cls_task = hub.TextClassifierTask(
cls_task.finetune_and_eval()
```

**NOTE:**
1. `outputs["pooled_output"]` returns the [CLS] vector of the Transformer-style pre-trained model, which can be used as a feature representation of a sentence or sentence pair.
2. The inputs in `feed_list` specify the order of the pre-trained model's input tensors, consistent with the results returned by ClassifyReader.
3. Given the input features, labels, and the number of target classes, `hub.TextClassifierTask` builds a transfer task `TextClassifierTask` suited to text classification.
4. The feature passed to `hub.TextClassifierTask` differs depending on whether a predefined network is used; the two cases are distinguished by the `feature` and `token_feature` parameters.
`feature` should be a sentence-level feature with shape [-1, emb_size]; `token_feature` is a token-level feature with shape [-1, max_seq_len, emb_size].
With a predefined network, take the sequence_output feature of the Transformer-style pre-trained model (`outputs["sequence_output"]`) and pass `hub.TextClassifierTask(token_feature=outputs["sequence_output"])`.
Without a predefined network, classifying directly through an fc layer, take the pooled_output feature (`outputs["pooled_output"]`) and pass `hub.TextClassifierTask(feature=outputs["pooled_output"])`; see the sketch after the code block below.
5. With a predefined network, the `network` parameter of `hub.TextClassifierTask` selects the network structure. The code below appends a bilstm network after the Transformer-style pre-trained model.
PaddleHub's predefined networks for text classification support BOW, Bi-LSTM, CNN, DPCNN, GRU, and LSTM; `network` should be one of them.
The DPCNN network follows [ACL2017-Deep Pyramid Convolutional Neural Networks for Text Categorization](https://www.aclweb.org/anthology/P17-1052.pdf).
```python
cls_task = hub.TextClassifierTask(
data_reader=reader,
token_feature=outputs["sequence_output"],
feed_list=feed_list,
network='bilstm',
num_classes=dataset.num_labels,
config=config,
metrics_choices=metrics_choices)
```
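作为对照,不使用预置网络时的最小示意如下(沿用上文的reader、feed_list等变量,完整脚本见`text_cls.py`):
```python
# 不使用预置网络:直接以pooled_output作为sentence-level特征,经fc网络分类
cls_task = hub.TextClassifierTask(
    data_reader=reader,
    feature=outputs["pooled_output"],
    feed_list=feed_list,
    num_classes=dataset.num_labels,
    config=config,
    metrics_choices=metrics_choices)
```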
#### 自定义迁移任务 #### 自定义迁移任务
...@@ -190,29 +230,9 @@ python predict.py --checkpoint_dir $CKPT_DIR --max_seq_len 128 ...@@ -190,29 +230,9 @@ python predict.py --checkpoint_dir $CKPT_DIR --max_seq_len 128
``` ```
其中CKPT_DIR为Fine-tune API保存最佳模型的路径, max_seq_len是ERNIE模型的最大序列长度,*请与训练时配置的参数保持一致* 其中CKPT_DIR为Fine-tune API保存最佳模型的路径, max_seq_len是ERNIE模型的最大序列长度,*请与训练时配置的参数保持一致*
参数配置正确后,请执行脚本`sh run_predict.sh`,即可看到以下文本分类预测结果, 以及最终准确率。 参数配置正确后,请执行脚本`sh run_predict.sh`,即可看到文本分类预测结果。
如需了解更多预测步骤,请参考`predict.py`
```
这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般 predict=0
交通方便;环境很好;服务态度很好 房间较小 predict=1
19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~ predict=0
```
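预测阶段的核心调用大致如下(沿用训练时构建的`cls_task`,Fine-tune API会自动从CKPT_DIR加载最佳模型;data写法仅作示意,请以`predict.py`的实际实现为准):
```python
import numpy as np

# 待预测文本,每条样本为一个list
data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"],
        ["交通方便;环境很好;服务态度很好 房间较小"]]
run_states = cls_task.predict(data=data)
results = [run_state.run_results for run_state in run_states]
for batch_results in results:
    predictions = np.argmax(batch_results[0], axis=1)  # 取概率最大的类别
    print(predictions)
```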
我们在AI Studio上提供了IPython NoteBook形式的demo,您可以直接在平台上在线体验,链接如下: 我们在AI Studio上提供了IPython NoteBook形式的demo,点击[PaddleHub教程合集](https://aistudio.baidu.com/aistudio/projectdetail/231146),可使用AI Studio平台提供的GPU算力进行快速尝试。
|预训练模型|任务类型|数据集|AIStudio链接|备注|
|-|-|-|-|-|
|ResNet|图像分类|猫狗数据集DogCat|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/147010)||
|ERNIE|文本分类|中文情感分类数据集ChnSentiCorp|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/147006)||
|ERNIE|文本分类|中文新闻分类数据集THUNEWS|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/221999)|本教程讲述了如何将自定义数据集加载,并利用Fine-tune API完成文本分类迁移学习。|
|ERNIE|序列标注|中文序列标注数据集MSRA_NER|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/147009)||
|ERNIE|序列标注|中文快递单数据集Express|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/184200)|本教程讲述了如何将自定义数据集加载,并利用Fine-tune API完成序列标注迁移学习。|
|ERNIE Tiny|文本分类|中文情感分类数据集ChnSentiCorp|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/186443)||
|Senta|文本分类|中文情感分类数据集ChnSentiCorp|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/216846)|本教程讲述了如何利用Senta和Fine-tune API完成情感分类迁移学习。|
|Senta|情感分析预测|N/A|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/215814)||
|LAC|词法分析|N/A|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/215711)||
|Ultra-Light-Fast-Generic-Face-Detector-1MB|人脸检测|N/A|[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/215962)||
## 超参优化AutoDL Finetuner ## 超参优化AutoDL Finetuner
......
...@@ -12,7 +12,7 @@ ...@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""Finetuning on classification task """ """Fine-tuning on classification task """
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -32,7 +32,7 @@ parser = argparse.ArgumentParser(__doc__) ...@@ -32,7 +32,7 @@ parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint") parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.") parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.") parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether use GPU for finetuning, input should be True or False") parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether use GPU for fine-tuning, input should be True or False")
parser.add_argument("--use_data_parallel", type=ast.literal_eval, default=False, help="Whether use data parallel.") parser.add_argument("--use_data_parallel", type=ast.literal_eval, default=False, help="Whether use data parallel.")
args = parser.parse_args() args = parser.parse_args()
# yapf: enable. # yapf: enable.
...@@ -70,7 +70,7 @@ if __name__ == '__main__': ...@@ -70,7 +70,7 @@ if __name__ == '__main__':
inputs["input_mask"].name, inputs["input_mask"].name,
] ]
# Setup runing config for PaddleHub Finetune API # Setup RunConfig for PaddleHub Fine-tune API
config = hub.RunConfig( config = hub.RunConfig(
use_data_parallel=args.use_data_parallel, use_data_parallel=args.use_data_parallel,
use_cuda=args.use_gpu, use_cuda=args.use_gpu,
...@@ -78,7 +78,7 @@ if __name__ == '__main__': ...@@ -78,7 +78,7 @@ if __name__ == '__main__':
checkpoint_dir=args.checkpoint_dir, checkpoint_dir=args.checkpoint_dir,
strategy=hub.AdamWeightDecayStrategy()) strategy=hub.AdamWeightDecayStrategy())
# Define a classification finetune task by PaddleHub's API # Define a classification fine-tune task by PaddleHub's API
cls_task = hub.TextClassifierTask( cls_task = hub.TextClassifierTask(
data_reader=reader, data_reader=reader,
feature=pooled_output, feature=pooled_output,
......
...@@ -12,7 +12,7 @@ ...@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""Finetuning on classification task """ """Fine-tuning on classification task """
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -32,7 +32,7 @@ parser = argparse.ArgumentParser(__doc__) ...@@ -32,7 +32,7 @@ parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint") parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.") parser.add_argument("--batch_size", type=int, default=1, help="Total examples' number in batch for training.")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.") parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether use GPU for finetuning, input should be True or False") parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether use GPU for fine-tuning, input should be True or False")
parser.add_argument("--use_data_parallel", type=ast.literal_eval, default=False, help="Whether use data parallel.") parser.add_argument("--use_data_parallel", type=ast.literal_eval, default=False, help="Whether use data parallel.")
parser.add_argument("--network", type=str, default='bilstm', help="Pre-defined network which was connected after Transformer model, such as ERNIE, BERT ,RoBERTa and ELECTRA.") parser.add_argument("--network", type=str, default='bilstm', help="Pre-defined network which was connected after Transformer model, such as ERNIE, BERT ,RoBERTa and ELECTRA.")
args = parser.parse_args() args = parser.parse_args()
...@@ -71,7 +71,7 @@ if __name__ == '__main__': ...@@ -71,7 +71,7 @@ if __name__ == '__main__':
inputs["input_mask"].name, inputs["input_mask"].name,
] ]
# Setup runing config for PaddleHub Finetune API # Setup RunConfig for PaddleHub Fine-tune API
config = hub.RunConfig( config = hub.RunConfig(
use_data_parallel=args.use_data_parallel, use_data_parallel=args.use_data_parallel,
use_cuda=args.use_gpu, use_cuda=args.use_gpu,
...@@ -79,7 +79,7 @@ if __name__ == '__main__': ...@@ -79,7 +79,7 @@ if __name__ == '__main__':
checkpoint_dir=args.checkpoint_dir, checkpoint_dir=args.checkpoint_dir,
strategy=hub.AdamWeightDecayStrategy()) strategy=hub.AdamWeightDecayStrategy())
# Define a classification finetune task by PaddleHub's API # Define a classification fine-tune task by PaddleHub's API
# network choice: bilstm, bow, cnn, dpcnn, gru, lstm (PaddleHub pre-defined network) # network choice: bilstm, bow, cnn, dpcnn, gru, lstm (PaddleHub pre-defined network)
# If you wanna add network after ERNIE/BERT/RoBERTa/ELECTRA module, # If you wanna add network after ERNIE/BERT/RoBERTa/ELECTRA module,
# you must use the outputs["sequence_output"] as the token_feature of TextClassifierTask, # you must use the outputs["sequence_output"] as the token_feature of TextClassifierTask,
......
...@@ -12,7 +12,7 @@ ...@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""Finetuning on classification task """ """Fine-tuning on classification task """
import argparse import argparse
import ast import ast
...@@ -21,7 +21,7 @@ import paddlehub as hub ...@@ -21,7 +21,7 @@ import paddlehub as hub
# yapf: disable # yapf: disable
parser = argparse.ArgumentParser(__doc__) parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.") parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether use GPU for finetuning, input should be True or False") parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether use GPU for fine-tuning, input should be True or False")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.") parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.") parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--warmup_proportion", type=float, default=0.1, help="Warmup proportion params for warmup strategy") parser.add_argument("--warmup_proportion", type=float, default=0.1, help="Warmup proportion params for warmup strategy")
...@@ -68,13 +68,13 @@ if __name__ == '__main__': ...@@ -68,13 +68,13 @@ if __name__ == '__main__':
inputs["input_mask"].name, inputs["input_mask"].name,
] ]
# Select finetune strategy, setup config and finetune # Select fine-tune strategy, setup config and fine-tune
strategy = hub.AdamWeightDecayStrategy( strategy = hub.AdamWeightDecayStrategy(
warmup_proportion=args.warmup_proportion, warmup_proportion=args.warmup_proportion,
weight_decay=args.weight_decay, weight_decay=args.weight_decay,
learning_rate=args.learning_rate) learning_rate=args.learning_rate)
# Setup runing config for PaddleHub Finetune API # Setup RunConfig for PaddleHub Fine-tune API
config = hub.RunConfig( config = hub.RunConfig(
use_data_parallel=args.use_data_parallel, use_data_parallel=args.use_data_parallel,
use_cuda=args.use_gpu, use_cuda=args.use_gpu,
...@@ -83,7 +83,7 @@ if __name__ == '__main__': ...@@ -83,7 +83,7 @@ if __name__ == '__main__':
checkpoint_dir=args.checkpoint_dir, checkpoint_dir=args.checkpoint_dir,
strategy=strategy) strategy=strategy)
# Define a classification finetune task by PaddleHub's API # Define a classification fine-tune task by PaddleHub's API
cls_task = hub.TextClassifierTask( cls_task = hub.TextClassifierTask(
data_reader=reader, data_reader=reader,
feature=pooled_output, feature=pooled_output,
...@@ -92,6 +92,6 @@ if __name__ == '__main__': ...@@ -92,6 +92,6 @@ if __name__ == '__main__':
config=config, config=config,
metrics_choices=metrics_choices) metrics_choices=metrics_choices)
# Finetune and evaluate by PaddleHub's API # Fine-tune and evaluate by PaddleHub's API
# will finish training, evaluation, testing, save model automatically # will finish training, evaluation, testing, save model automatically
cls_task.finetune_and_eval() cls_task.finetune_and_eval()
...@@ -12,7 +12,7 @@ ...@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""Finetuning on classification task """ """Fine-tuning on classification task """
import argparse import argparse
import ast import ast
...@@ -21,7 +21,7 @@ import paddlehub as hub ...@@ -21,7 +21,7 @@ import paddlehub as hub
# yapf: disable # yapf: disable
parser = argparse.ArgumentParser(__doc__) parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.") parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether use GPU for finetuning, input should be True or False") parser.add_argument("--use_gpu", type=ast.literal_eval, default=True, help="Whether use GPU for fine-tuning, input should be True or False")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.") parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.") parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--warmup_proportion", type=float, default=0.1, help="Warmup proportion params for warmup strategy") parser.add_argument("--warmup_proportion", type=float, default=0.1, help="Warmup proportion params for warmup strategy")
...@@ -69,13 +69,13 @@ if __name__ == '__main__': ...@@ -69,13 +69,13 @@ if __name__ == '__main__':
inputs["input_mask"].name, inputs["input_mask"].name,
] ]
# Select finetune strategy, setup config and finetune # Select fine-tune strategy, setup config and fine-tune
strategy = hub.AdamWeightDecayStrategy( strategy = hub.AdamWeightDecayStrategy(
warmup_proportion=args.warmup_proportion, warmup_proportion=args.warmup_proportion,
weight_decay=args.weight_decay, weight_decay=args.weight_decay,
learning_rate=args.learning_rate) learning_rate=args.learning_rate)
# Setup runing config for PaddleHub Finetune API # Setup RunConfig for PaddleHub Fine-tune API
config = hub.RunConfig( config = hub.RunConfig(
use_data_parallel=args.use_data_parallel, use_data_parallel=args.use_data_parallel,
use_cuda=args.use_gpu, use_cuda=args.use_gpu,
...@@ -84,7 +84,7 @@ if __name__ == '__main__': ...@@ -84,7 +84,7 @@ if __name__ == '__main__':
checkpoint_dir=args.checkpoint_dir, checkpoint_dir=args.checkpoint_dir,
strategy=strategy) strategy=strategy)
# Define a classification finetune task by PaddleHub's API # Define a classification fine-tune task by PaddleHub's API
# network choice: bilstm, bow, cnn, dpcnn, gru, lstm (PaddleHub pre-defined network) # network choice: bilstm, bow, cnn, dpcnn, gru, lstm (PaddleHub pre-defined network)
# If you wanna add network after ERNIE/BERT/RoBERTa/ELECTRA module, # If you wanna add network after ERNIE/BERT/RoBERTa/ELECTRA module,
# you must use the outputs["sequence_output"] as the token_feature of TextClassifierTask, # you must use the outputs["sequence_output"] as the token_feature of TextClassifierTask,
...@@ -98,6 +98,6 @@ if __name__ == '__main__': ...@@ -98,6 +98,6 @@ if __name__ == '__main__':
config=config, config=config,
metrics_choices=metrics_choices) metrics_choices=metrics_choices)
# Finetune and evaluate by PaddleHub's API # Fine-tune and evaluate by PaddleHub's API
# will finish training, evaluation, testing, save model automatically # will finish training, evaluation, testing, save model automatically
cls_task.finetune_and_eval() cls_task.finetune_and_eval()
...@@ -13,22 +13,22 @@ Task的基本方法和属性参见[BaseTask](base_task.md)。 ...@@ -13,22 +13,22 @@ Task的基本方法和属性参见[BaseTask](base_task.md)。
PaddleHub预置了常见任务的Task,每种Task都有自己特有的应用场景以及提供了对应的度量指标,用于适应用户的不同需求。预置的任务类型如下: PaddleHub预置了常见任务的Task,每种Task都有自己特有的应用场景以及提供了对应的度量指标,用于适应用户的不同需求。预置的任务类型如下:
* 图像分类任务 * 图像分类任务
[ImageClassifierTask]() [ImageClassifierTask](image_classify_task.md)
* 文本分类任务 * 文本分类任务
[TextClassifierTask]() [TextClassifierTask](text_classify_task.md)
* 序列标注任务 * 序列标注任务
[SequenceLabelTask]() [SequenceLabelTask](sequence_label_task.md)
* 多标签分类任务 * 多标签分类任务
[MultiLabelClassifierTask]() [MultiLabelClassifierTask](multi_lable_classify_task.md)
* 回归任务 * 回归任务
[RegressionTask]() [RegressionTask](regression_task.md)
* 阅读理解任务 * 阅读理解任务
[ReadingComprehensionTask]() [ReadingComprehensionTask](reading_comprehension_task.md)
## 自定义Task ## 自定义Task
如果这些Task不支持您的特定需求,您也可以通过继承BasicTask来实现自己的任务,具体实现细节参见[自定义Task]() 如果这些Task不支持您的特定需求,您也可以通过继承BasicTask来实现自己的任务,具体实现细节参见[自定义Task](../../tutorial/how_to_define_task.md)以及[修改Task中的模型网络](../../tutorial/define_task_example.md)
## 修改Task内置方法 ## 修改Task内置方法
如果Task内置方法不满足您的需求,您可以通过Task支持的Hook机制修改方法实现,详细信息参见[修改Task内置方法]() 如果Task内置方法不满足您的需求,您可以通过Task支持的Hook机制修改方法实现,详细信息参见[修改Task内置方法](../../tutorial/hook.md)
...@@ -2,23 +2,28 @@ ...@@ -2,23 +2,28 @@
文本分类任务Task,继承自[BaseTask](base_task.md),该Task基于输入的特征,添加一个Dropout层,以及一个或多个全连接层来创建一个文本分类任务用于finetune,度量指标为准确率,损失函数为交叉熵Loss。 文本分类任务Task,继承自[BaseTask](base_task.md),该Task基于输入的特征,添加一个Dropout层,以及一个或多个全连接层来创建一个文本分类任务用于finetune,度量指标为准确率,损失函数为交叉熵Loss。
```python ```python
hub.TextClassifierTask( hub.TextClassifierTask(
feature,
num_classes, num_classes,
feed_list, feed_list,
data_reader, data_reader,
feature=None,
token_feature=None,
startup_program=None, startup_program=None,
config=None, config=None,
hidden_units=None, hidden_units=None,
network=None,
metrics_choices="default"): metrics_choices="default"):
``` ```
**参数** **参数**
* feature (fluid.Variable): 输入的特征矩阵。
* num_classes (int): 分类任务的类别数量 * num_classes (int): 分类任务的类别数量
* feed_list (list): 待feed变量的名字列表 * feed_list (list): 待feed变量的名字列表
* data_reader: 提供数据的Reader * data_reader: 提供数据的Reader,可选为ClassifyReader和LACClassifyReader。
* feature(fluid.Variable): 输入的sentence-level特征矩阵,shape应为[-1, emb_size]。默认为None。
* token_feature(fluid.Variable): 输入的token-level特征矩阵,shape应为[-1, seq_len, emb_size]。默认为None。feature和token_feature须指定其中一个。
* network(str): 文本分类任务PaddleHub预置网络,支持BOW,Bi-LSTM,CNN,DPCNN,GRU,LSTM。如果指定network,则应使用token_feature作为输入特征。其中DPCNN网络实现为[ACL2017-Deep Pyramid Convolutional Neural Networks for Text Categorization](https://www.aclweb.org/anthology/P17-1052.pdf)
* startup_program (fluid.Program): 存储了模型参数初始化op的Program,如果未提供,则使用fluid.default_startup_program() * startup_program (fluid.Program): 存储了模型参数初始化op的Program,如果未提供,则使用fluid.default_startup_program()
* config ([RunConfig](../config.md)): 运行配置 * config ([RunConfig](../config.md)): 运行配置,如设置batch_size,epoch,learning_rate等。
* hidden_units (list): TextClassifierTask最终的全连接层输出维度为label_size,是每个label的概率值。在这个全连接层之前可以设置额外的全连接层,并指定它们的输出维度,例如hidden_units=[4,2]表示先经过一层输出维度为4的全连接层,再输入一层输出维度为2的全连接层,最后再输入输出维度为label_size的全连接层。 * hidden_units (list): TextClassifierTask最终的全连接层输出维度为label_size,是每个label的概率值。在这个全连接层之前可以设置额外的全连接层,并指定它们的输出维度,例如hidden_units=[4,2]表示先经过一层输出维度为4的全连接层,再输入一层输出维度为2的全连接层,最后再输入输出维度为label_size的全连接层。
* metrics_choices("default" or list ⊂ ["acc", "f1", "matthews"]): 任务训练过程中需要计算的评估指标,默认为“default”,此时等效于["acc"]。metrics_choices支持训练过程中同时评估多个指标,其中指定的第一个指标将被作为主指标用于判断当前得分是否为最佳分值,例如["matthews", "acc"],"matthews"将作为主指标,参与最佳模型的判断中;“acc”只计算并输出,不参与最佳模型的判断。 * metrics_choices("default" or list ⊂ ["acc", "f1", "matthews"]): 任务训练过程中需要计算的评估指标,默认为“default”,此时等效于["acc"]。metrics_choices支持训练过程中同时评估多个指标,其中指定的第一个指标将被作为主指标用于判断当前得分是否为最佳分值,例如["matthews", "acc"],"matthews"将作为主指标,参与最佳模型的判断中;“acc”只计算并输出,不参与最佳模型的判断。
...@@ -28,4 +33,4 @@ hub.TextClassifierTask( ...@@ -28,4 +33,4 @@ hub.TextClassifierTask(
**示例** **示例**
[文本分类](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.4/demo/text_classification/text_classifier.py) [文本分类](../../../demo/text_classification/text_cls.py)
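基于上述参数说明,一个最小的构建示意如下(其中dataset、reader、feed_list、outputs、config等变量需按Fine-tune流程事先准备,仅作示意):
```python
import paddlehub as hub

# 方式一:sentence-level特征 + fc分类(不使用预置网络)
cls_task = hub.TextClassifierTask(
    num_classes=dataset.num_labels,
    feed_list=feed_list,
    data_reader=reader,
    feature=outputs["pooled_output"],
    config=config)

# 方式二:token-level特征 + 预置网络(如bilstm)
cls_task = hub.TextClassifierTask(
    num_classes=dataset.num_labels,
    feed_list=feed_list,
    data_reader=reader,
    token_feature=outputs["sequence_output"],
    network='bilstm',
    config=config)
```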
# 更新历史 # 更新历史
## `v1.7.0`
* 丰富预训练模型,提升应用性
* 新增VENUS系列视觉预训练模型[yolov3_darknet53_venus](https://www.paddlepaddle.org.cn/hubdetail?name=yolov3_darknet53_venus&en_category=ObjectDetection)[faster_rcnn_resnet50_fpn_venus](https://www.paddlepaddle.org.cn/hubdetail?name=faster_rcnn_resnet50_fpn_venus&en_category=ObjectDetection),可大幅度提升图像分类和目标检测任务的Fine-tune效果
* 新增工业级短视频分类模型[videotag_tsn_lstm](https://paddlepaddle.org.cn/hubdetail?name=videotag_tsn_lstm&en_category=VideoClassification),支持3000类中文标签识别
* 新增轻量级中文OCR模型[chinese_ocr_db_rcnn](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_rcnn&en_category=TextRecognition)[chinese_text_detection_db](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_text_detection_db&en_category=TextRecognition),支持一键快速OCR识别
* 新增行人检测、车辆检测、动物识别、Object等工业级模型
* Fine-tune API升级
* 文本分类任务新增6个预置网络,包括CNN, BOW, LSTM, BiLSTM, DPCNN等
* 使用VisualDL可视化训练评估性能数据
## `v1.6.2`
* 修复图像分类在windows下运行错误
## `v1.6.1`
* 修复windows下安装PaddleHub缺失config.json文件
## `v1.6.0`
* NLP Module全面升级,提升应用性和灵活性
* lac、senta系列(bow、cnn、bilstm、gru、lstm)、simnet_bow、porn_detection系列(cnn、gru、lstm)升级高性能预测,性能提升高达50%
* ERNIE、BERT、RoBERTa等Transformer类语义模型新增获取预训练embedding接口get_embedding,方便接入下游任务,提升应用性
* 新增RoBERTa通过模型结构压缩得到的3层Transformer模型[rbt3](https://www.paddlepaddle.org.cn/hubdetail?name=rbt3&en_category=SemanticModel)[rbtl3](https://www.paddlepaddle.org.cn/hubdetail?name=rbtl3&en_category=SemanticModel)
* Task predict接口增加高性能预测模式accelerate_mode,性能提升高达90%
* PaddleHub Module创建流程开放,支持Fine-tune模型转化,全面提升应用性和灵活性
* [预训练模型转化为PaddleHub Module教程](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.6/docs/contribution/contri_pretrained_model.md)
* [Fine-tune模型转化为PaddleHub Module教程](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.6/docs/tutorial/finetuned_model_to_module.md)
* [PaddleHub Serving](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.6/docs/tutorial/serving.md)优化启动方式,支持更加灵活的参数配置
## `v1.5.2` ## `v1.5.2`
* 优化pyramidbox_lite_server_mask、pyramidbox_lite_mobile_mask模型的服务化部署性能 * 优化pyramidbox_lite_server_mask、pyramidbox_lite_mobile_mask模型的服务化部署性能
......
...@@ -95,7 +95,7 @@ label_list.txt的格式如下 ...@@ -95,7 +95,7 @@ label_list.txt的格式如下
``` ```
示例: 示例:
[DogCat数据集](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub-API:-Dataset#class-hubdatasetdogcatdataset)为示例,train_list.txt/test_list.txt/validate_list.txt内容如下示例 [DogCat数据集](../reference/dataset.md#class-hubdatasetdogcatdataset)为示例,train_list.txt/test_list.txt/validate_list.txt内容如下示例
``` ```
cat/3270.jpg 0 cat/3270.jpg 0
cat/646.jpg 0 cat/646.jpg 0
......
...@@ -175,7 +175,7 @@ class EfficientNetB0ImageNet(hub.Module): ...@@ -175,7 +175,7 @@ class EfficientNetB0ImageNet(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class FixResnext10132x48dwslImagenet(hub.Module): ...@@ -161,7 +161,7 @@ class FixResnext10132x48dwslImagenet(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
if not self.predictor_set: if not self.predictor_set:
......
...@@ -161,7 +161,7 @@ class MobileNetV2Animals(hub.Module): ...@@ -161,7 +161,7 @@ class MobileNetV2Animals(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class MobileNetV2Dishes(hub.Module): ...@@ -161,7 +161,7 @@ class MobileNetV2Dishes(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -184,7 +184,7 @@ class MobileNetV2ImageNetSSLD(hub.Module): ...@@ -184,7 +184,7 @@ class MobileNetV2ImageNetSSLD(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class MobileNetV3Large(hub.Module): ...@@ -161,7 +161,7 @@ class MobileNetV3Large(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class MobileNetV3Small(hub.Module): ...@@ -161,7 +161,7 @@ class MobileNetV3Small(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class ResNet18vdImageNet(hub.Module): ...@@ -161,7 +161,7 @@ class ResNet18vdImageNet(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
if not self.predictor_set: if not self.predictor_set:
......
...@@ -161,7 +161,7 @@ class ResNet50vdAnimals(hub.Module): ...@@ -161,7 +161,7 @@ class ResNet50vdAnimals(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class ResNet50vdDishes(hub.Module): ...@@ -161,7 +161,7 @@ class ResNet50vdDishes(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class ResNet50vdDishes(hub.Module): ...@@ -161,7 +161,7 @@ class ResNet50vdDishes(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class ResNet50vdWildAnimals(hub.Module): ...@@ -161,7 +161,7 @@ class ResNet50vdWildAnimals(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_data = list() all_data = list()
......
...@@ -161,7 +161,7 @@ class SEResNet18vdImageNet(hub.Module): ...@@ -161,7 +161,7 @@ class SEResNet18vdImageNet(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
if not self.predictor_set: if not self.predictor_set:
......
...@@ -83,7 +83,7 @@ class PyramidBoxFaceDetection(hub.Module): ...@@ -83,7 +83,7 @@ class PyramidBoxFaceDetection(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -28,6 +28,7 @@ class PyramidBoxLiteMobile(hub.Module): ...@@ -28,6 +28,7 @@ class PyramidBoxLiteMobile(hub.Module):
self.default_pretrained_model_path = os.path.join( self.default_pretrained_model_path = os.path.join(
self.directory, "pyramidbox_lite_mobile_face_detection") self.directory, "pyramidbox_lite_mobile_face_detection")
self._set_config() self._set_config()
self.processor = self
def _set_config(self): def _set_config(self):
""" """
...@@ -81,7 +82,7 @@ class PyramidBoxLiteMobile(hub.Module): ...@@ -81,7 +82,7 @@ class PyramidBoxLiteMobile(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
...@@ -130,6 +131,9 @@ class PyramidBoxLiteMobile(hub.Module): ...@@ -130,6 +131,9 @@ class PyramidBoxLiteMobile(hub.Module):
program, feeded_var_names, target_vars = fluid.io.load_inference_model( program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.default_pretrained_model_path, executor=exe) dirname=self.default_pretrained_model_path, executor=exe)
var = program.global_block().vars['detection_output_0.tmp_1']
var.desc.set_dtype(fluid.core.VarDesc.VarType.INT32)
fluid.io.save_inference_model( fluid.io.save_inference_model(
dirname=dirname, dirname=dirname,
main_program=program, main_program=program,
......
...@@ -37,6 +37,7 @@ class PyramidBoxLiteMobileMask(hub.Module): ...@@ -37,6 +37,7 @@ class PyramidBoxLiteMobileMask(hub.Module):
else: else:
self.face_detector = face_detector_module self.face_detector = face_detector_module
self._set_config() self._set_config()
self.processor = self
def _set_config(self): def _set_config(self):
""" """
...@@ -107,7 +108,7 @@ class PyramidBoxLiteMobileMask(hub.Module): ...@@ -107,7 +108,7 @@ class PyramidBoxLiteMobileMask(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -94,12 +94,12 @@ def draw_bounding_box_on_image(save_im_path, output_data): ...@@ -94,12 +94,12 @@ def draw_bounding_box_on_image(save_im_path, output_data):
box_fill = (255) box_fill = (255)
text_fill = (0) text_fill = (0)
draw.rectangle( draw.rectangle(
xy=(bbox['left'], bbox['top'] - (textsize_height + 5), xy=(bbox['left'], bbox['top'] - (textsize_height + 5),
bbox['left'] + textsize_width + 10, bbox['top'] - 3), bbox['left'] + textsize_width + 10, bbox['top'] - 3),
fill=box_fill) fill=box_fill)
draw.text( draw.text(
xy=(bbox['left'], bbox['top'] - 15), text=text, fill=text_fill) xy=(bbox['left'], bbox['top'] - 15), text=text, fill=text_fill)
image.save(save_im_path) image.save(save_im_path)
......
...@@ -28,6 +28,7 @@ class PyramidBoxLiteServer(hub.Module): ...@@ -28,6 +28,7 @@ class PyramidBoxLiteServer(hub.Module):
self.default_pretrained_model_path = os.path.join( self.default_pretrained_model_path = os.path.join(
self.directory, "pyramidbox_lite_server_face_detection") self.directory, "pyramidbox_lite_server_face_detection")
self._set_config() self._set_config()
self.processor = self
def _set_config(self): def _set_config(self):
""" """
...@@ -81,7 +82,7 @@ class PyramidBoxLiteServer(hub.Module): ...@@ -81,7 +82,7 @@ class PyramidBoxLiteServer(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -37,6 +37,7 @@ class PyramidBoxLiteServerMask(hub.Module): ...@@ -37,6 +37,7 @@ class PyramidBoxLiteServerMask(hub.Module):
else: else:
self.face_detector = face_detector_module self.face_detector = face_detector_module
self._set_config() self._set_config()
self.processor = self
def _set_config(self): def _set_config(self):
""" """
...@@ -106,7 +107,7 @@ class PyramidBoxLiteServerMask(hub.Module): ...@@ -106,7 +107,7 @@ class PyramidBoxLiteServerMask(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -94,12 +94,12 @@ def draw_bounding_box_on_image(save_im_path, output_data): ...@@ -94,12 +94,12 @@ def draw_bounding_box_on_image(save_im_path, output_data):
box_fill = (255) box_fill = (255)
text_fill = (0) text_fill = (0)
draw.rectangle( draw.rectangle(
xy=(bbox['left'], bbox['top'] - (textsize_height + 5), xy=(bbox['left'], bbox['top'] - (textsize_height + 5),
bbox['left'] + textsize_width + 10, bbox['top'] - 3), bbox['left'] + textsize_width + 10, bbox['top'] - 3),
fill=box_fill) fill=box_fill)
draw.text( draw.text(
xy=(bbox['left'], bbox['top'] - 15), text=text, fill=text_fill) xy=(bbox['left'], bbox['top'] - 15), text=text, fill=text_fill)
image.save(save_im_path) image.save(save_im_path)
......
...@@ -107,7 +107,7 @@ class FaceDetector320(hub.Module): ...@@ -107,7 +107,7 @@ class FaceDetector320(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -106,7 +106,7 @@ class FaceDetector640(hub.Module): ...@@ -106,7 +106,7 @@ class FaceDetector640(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -133,7 +133,7 @@ class FaceLandmarkLocalization(hub.Module): ...@@ -133,7 +133,7 @@ class FaceLandmarkLocalization(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# get all data # get all data
......
...@@ -323,7 +323,7 @@ class FasterRCNNResNet50(hub.Module): ...@@ -323,7 +323,7 @@ class FasterRCNNResNet50(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
if data and 'image' in data: if data and 'image' in data:
......
...@@ -333,7 +333,7 @@ class FasterRCNNResNet50RPN(hub.Module): ...@@ -333,7 +333,7 @@ class FasterRCNNResNet50RPN(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
......
...@@ -246,7 +246,7 @@ class RetinaNetResNet50FPN(hub.Module): ...@@ -246,7 +246,7 @@ class RetinaNetResNet50FPN(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
all_images = list() all_images = list()
......
...@@ -21,7 +21,7 @@ from ssd_mobilenet_v1_pascal.data_feed import reader ...@@ -21,7 +21,7 @@ from ssd_mobilenet_v1_pascal.data_feed import reader
@moduleinfo( @moduleinfo(
name="ssd_mobilenet_v1_pascal", name="ssd_mobilenet_v1_pascal",
version="1.1.0", version="1.1.1",
type="cv/object_detection", type="cv/object_detection",
summary="SSD with backbone MobileNet_V1, trained with dataset Pasecal VOC.", summary="SSD with backbone MobileNet_V1, trained with dataset Pasecal VOC.",
author="paddlepaddle", author="paddlepaddle",
...@@ -194,7 +194,7 @@ class SSDMobileNetv1(hub.Module): ...@@ -194,7 +194,7 @@ class SSDMobileNetv1(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -275,7 +275,7 @@ class SSDMobileNetv1(hub.Module): ...@@ -275,7 +275,7 @@ class SSDMobileNetv1(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from ssd_vgg16_300_coco2017.data_feed import reader ...@@ -21,7 +21,7 @@ from ssd_vgg16_300_coco2017.data_feed import reader
@moduleinfo( @moduleinfo(
name="ssd_vgg16_300_coco2017", name="ssd_vgg16_300_coco2017",
version="1.0.0", version="1.0.1",
type="cv/object_detection", type="cv/object_detection",
summary="SSD with backbone VGG16, trained with dataset COCO.", summary="SSD with backbone VGG16, trained with dataset COCO.",
author="paddlepaddle", author="paddlepaddle",
...@@ -264,7 +264,7 @@ class SSDVGG16(hub.Module): ...@@ -264,7 +264,7 @@ class SSDVGG16(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from ssd_vgg16_512_coco2017.data_feed import reader ...@@ -21,7 +21,7 @@ from ssd_vgg16_512_coco2017.data_feed import reader
@moduleinfo( @moduleinfo(
name="ssd_vgg16_512_coco2017", name="ssd_vgg16_512_coco2017",
version="1.0.0", version="1.0.1",
type="cv/object_detection", type="cv/object_detection",
summary="SSD with backbone VGG16, trained with dataset COCO.", summary="SSD with backbone VGG16, trained with dataset COCO.",
author="paddlepaddle", author="paddlepaddle",
...@@ -200,7 +200,7 @@ class SSDVGG16_512(hub.Module): ...@@ -200,7 +200,7 @@ class SSDVGG16_512(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -278,7 +278,7 @@ class SSDVGG16_512(hub.Module): ...@@ -278,7 +278,7 @@ class SSDVGG16_512(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from yolov3_darknet53_coco2017.yolo_head import MultiClassNMS, YOLOv3Head ...@@ -21,7 +21,7 @@ from yolov3_darknet53_coco2017.yolo_head import MultiClassNMS, YOLOv3Head
@moduleinfo( @moduleinfo(
name="yolov3_darknet53_coco2017", name="yolov3_darknet53_coco2017",
version="1.1.0", version="1.1.1",
type="CV/object_detection", type="CV/object_detection",
summary= summary=
"Baidu's YOLOv3 model for object detection, with backbone DarkNet53, trained with dataset coco2017.", "Baidu's YOLOv3 model for object detection, with backbone DarkNet53, trained with dataset coco2017.",
...@@ -186,7 +186,7 @@ class YOLOv3DarkNet53Coco2017(hub.Module): ...@@ -186,7 +186,7 @@ class YOLOv3DarkNet53Coco2017(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -270,7 +270,7 @@ class YOLOv3DarkNet53Coco2017(hub.Module): ...@@ -270,7 +270,7 @@ class YOLOv3DarkNet53Coco2017(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from yolov3_darknet53_pedestrian.yolo_head import MultiClassNMS, YOLOv3Head ...@@ -21,7 +21,7 @@ from yolov3_darknet53_pedestrian.yolo_head import MultiClassNMS, YOLOv3Head
@moduleinfo( @moduleinfo(
name="yolov3_darknet53_pedestrian", name="yolov3_darknet53_pedestrian",
version="1.0.0", version="1.0.1",
type="CV/object_detection", type="CV/object_detection",
summary= summary=
"Baidu's YOLOv3 model for pedestrian detection, with backbone DarkNet53.", "Baidu's YOLOv3 model for pedestrian detection, with backbone DarkNet53.",
...@@ -199,7 +199,7 @@ class YOLOv3DarkNet53Pedestrian(hub.Module): ...@@ -199,7 +199,7 @@ class YOLOv3DarkNet53Pedestrian(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -280,7 +280,7 @@ class YOLOv3DarkNet53Pedestrian(hub.Module): ...@@ -280,7 +280,7 @@ class YOLOv3DarkNet53Pedestrian(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from yolov3_darknet53_vehicles.yolo_head import MultiClassNMS, YOLOv3Head ...@@ -21,7 +21,7 @@ from yolov3_darknet53_vehicles.yolo_head import MultiClassNMS, YOLOv3Head
@moduleinfo( @moduleinfo(
name="yolov3_darknet53_vehicles", name="yolov3_darknet53_vehicles",
version="1.0.0", version="1.0.1",
type="CV/object_detection", type="CV/object_detection",
summary= summary=
"Baidu's YOLOv3 model for vehicles detection, with backbone DarkNet53.", "Baidu's YOLOv3 model for vehicles detection, with backbone DarkNet53.",
...@@ -199,7 +199,7 @@ class YOLOv3DarkNet53Vehicles(hub.Module): ...@@ -199,7 +199,7 @@ class YOLOv3DarkNet53Vehicles(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -280,7 +280,7 @@ class YOLOv3DarkNet53Vehicles(hub.Module): ...@@ -280,7 +280,7 @@ class YOLOv3DarkNet53Vehicles(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from yolov3_mobilenet_v1_coco2017.yolo_head import MultiClassNMS, YOLOv3Head ...@@ -21,7 +21,7 @@ from yolov3_mobilenet_v1_coco2017.yolo_head import MultiClassNMS, YOLOv3Head
@moduleinfo( @moduleinfo(
name="yolov3_mobilenet_v1_coco2017", name="yolov3_mobilenet_v1_coco2017",
version="1.0.0", version="1.0.1",
type="CV/object_detection", type="CV/object_detection",
summary= summary=
"Baidu's YOLOv3 model for object detection with backbone MobileNet_V1, trained with dataset COCO2017.", "Baidu's YOLOv3 model for object detection with backbone MobileNet_V1, trained with dataset COCO2017.",
...@@ -189,7 +189,7 @@ class YOLOv3MobileNetV1Coco2017(hub.Module): ...@@ -189,7 +189,7 @@ class YOLOv3MobileNetV1Coco2017(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -270,7 +270,7 @@ class YOLOv3MobileNetV1Coco2017(hub.Module): ...@@ -270,7 +270,7 @@ class YOLOv3MobileNetV1Coco2017(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from yolov3_resnet34_coco2017.yolo_head import MultiClassNMS, YOLOv3Head ...@@ -21,7 +21,7 @@ from yolov3_resnet34_coco2017.yolo_head import MultiClassNMS, YOLOv3Head
@moduleinfo( @moduleinfo(
name="yolov3_resnet34_coco2017", name="yolov3_resnet34_coco2017",
version="1.0.0", version="1.0.1",
type="CV/object_detection", type="CV/object_detection",
summary= summary=
"Baidu's YOLOv3 model for object detection with backbone ResNet34, trained with dataset coco2017.", "Baidu's YOLOv3 model for object detection with backbone ResNet34, trained with dataset coco2017.",
...@@ -191,7 +191,7 @@ class YOLOv3ResNet34Coco2017(hub.Module): ...@@ -191,7 +191,7 @@ class YOLOv3ResNet34Coco2017(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -272,7 +272,7 @@ class YOLOv3ResNet34Coco2017(hub.Module): ...@@ -272,7 +272,7 @@ class YOLOv3ResNet34Coco2017(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -21,7 +21,7 @@ from yolov3_resnet50_vd_coco2017.yolo_head import MultiClassNMS, YOLOv3Head ...@@ -21,7 +21,7 @@ from yolov3_resnet50_vd_coco2017.yolo_head import MultiClassNMS, YOLOv3Head
@moduleinfo( @moduleinfo(
name="yolov3_resnet50_vd_coco2017", name="yolov3_resnet50_vd_coco2017",
version="1.0.0", version="1.0.1",
type="CV/object_detection", type="CV/object_detection",
summary= summary=
"Baidu's YOLOv3 model for object detection with backbone ResNet50, trained with dataset coco2017.", "Baidu's YOLOv3 model for object detection with backbone ResNet50, trained with dataset coco2017.",
...@@ -193,7 +193,7 @@ class YOLOv3ResNet50Coco2017(hub.Module): ...@@ -193,7 +193,7 @@ class YOLOv3ResNet50Coco2017(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -274,7 +274,7 @@ class YOLOv3ResNet50Coco2017(hub.Module): ...@@ -274,7 +274,7 @@ class YOLOv3ResNet50Coco2017(hub.Module):
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
results = self.face_detection( results = self.object_detection(
paths=[args.input_path], paths=[args.input_path],
batch_size=args.batch_size, batch_size=args.batch_size,
use_gpu=args.use_gpu, use_gpu=args.use_gpu,
......
...@@ -86,7 +86,7 @@ class ACE2P(hub.Module): ...@@ -86,7 +86,7 @@ class ACE2P(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -82,7 +82,7 @@ class DeeplabV3pXception65HumanSeg(hub.Module): ...@@ -82,7 +82,7 @@ class DeeplabV3pXception65HumanSeg(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
# compatibility with older versions # compatibility with older versions
......
...@@ -104,7 +104,7 @@ class StyleProjection(hub.Module): ...@@ -104,7 +104,7 @@ class StyleProjection(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly." "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
) )
im_output = [] im_output = []
......
## 概述
chinese_ocr_db_crnn_mobile Module用于识别图片当中的汉字。其基于[chinese_text_detection_db_mobile Module](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_text_detection_db_mobile&en_category=TextRecognition)检测得到的文本框,继续识别文本框中的中文文字。识别文字算法采用CRNN(Convolutional Recurrent Neural Network,卷积循环神经网络)。CRNN是DCNN和RNN的组合,专门用于识别图像中的序列式对象,与CTC loss配合进行文字识别,可以直接从词级或行级的文本标注中学习,不需要详细的字符级标注。该Module是一个超轻量级中文OCR模型,支持直接预测。
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/model/image/ocr/rcnn.png" hspace='10'/> <br />
</p>
更多详情参考[An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 命令行预测
```shell
$ hub run chinese_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
```
**该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。**
## API
```python
def recognize_text(images=[],
paths=[],
use_gpu=False,
output_dir='ocr_result',
visualization=False,
box_thresh=0.5,
text_thresh=0.5)
```
预测API,识别输入图片中的中文文字及其位置。
**参数**
* paths (list\[str\]): 图片的路径;
* images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
* use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
* box\_thresh (float): 检测文本框置信度的阈值;
* text\_thresh (float): 识别中文文本置信度的阈值;
* visualization (bool): 是否将识别结果保存为图片文件;
* output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
**返回**
* res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
* data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
* text(str): 识别得到的文本
* confidence(float): 识别文本结果置信度
* text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标
如果无识别结果则data为\[\]
* save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
### 代码示例
```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
* Sample result:
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/model/image/ocr/ocr_res.jpg" hspace='10'/> <br />
</p>
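For reference, a returned result has roughly the following shape (the values below are illustrative placeholders, not actual model output):
```python
[{
    'save_path': 'ocr_result/ndarray_1591000000.12.jpg',
    'data': [{
        'text': '示例文本',
        'confidence': 0.96,
        'text_box_position': [[25, 313], [412, 313], [412, 280], [25, 280]]
    }]
}]
```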
## Service Deployment
PaddleHub Serving can deploy an online OCR service.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m chinese_ocr_db_crnn_mobile
```
This deploys the OCR service API, listening on port 8866 by default.
**NOTE:** To use the GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
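For example, to serve from the first GPU (assuming device 0 exists):
```shell
$ export CUDA_VISIBLE_DEVICES=0
$ hub serving start -m chinese_ocr_db_crnn_mobile
```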
### Step 2: Send a prediction request
With the server up, the following few lines of code send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
## View the Code
https://github.com/PaddlePaddle/PaddleOCR
### Dependencies
paddlepaddle >= 1.7.2
paddlehub >= 1.6.0
shapely
pyclipper
## Release History
* 1.0.0
  Initial release
* 1.0.1
  Fixed a failure when the model is called through the online service
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type == "ch":
character_dict_path = config['character_dict_path']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n")
self.character_str += line
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
        assert self.character_str is not None, \
            "Unsupported character type: {}".format(self.character_type)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
[sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
length: length of each text. [batch_size]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[text_index[idx]])
text = ''.join(char_list)
return text
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupport type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
def cal_predicts_accuracy(char_ops,
preds,
preds_lod,
labels,
labels_lod,
is_remove_duplicate=False):
acc_num = 0
img_num = 0
for ino in range(len(labels_lod) - 1):
beg_no = preds_lod[ino]
end_no = preds_lod[ino + 1]
preds_text = preds[beg_no:end_no].reshape(-1)
preds_text = char_ops.decode(preds_text, is_remove_duplicate)
beg_no = labels_lod[ino]
end_no = labels_lod[ino + 1]
labels_text = labels[beg_no:end_no].reshape(-1)
labels_text = char_ops.decode(labels_text, is_remove_duplicate)
img_num += 1
if preds_text == labels_text:
acc_num += 1
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
img_num = preds.shape[0]
target_lod = [0]
convert_ids = []
for ino in range(img_num):
end_pos = np.where(preds[ino, :] == 1)[0]
if len(end_pos) <= 1:
text_list = preds[ino, 1:]
else:
text_list = preds[ino, 1:end_pos[1]]
target_lod.append(target_lod[ino] + len(text_list))
convert_ids = convert_ids + list(text_list)
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
def convert_rec_label_to_lod(ori_labels):
img_num = len(ori_labels)
target_lod = [0]
convert_ids = []
for ino in range(img_num):
target_lod.append(target_lod[ino] + len(ori_labels[ino]))
convert_ids = convert_ids + list(ori_labels[ino])
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import copy
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
from chinese_ocr_db_crnn_mobile.character import CharacterOps
from chinese_ocr_db_crnn_mobile.utils import base64_to_cv2, draw_ocr, get_image_ext, sorted_boxes
@moduleinfo(
name="chinese_ocr_db_crnn_mobile",
version="1.0.1",
summary=
"The module can recognize the chinese texts in an image. Firstly, it will detect the text box positions based on the differentiable_binarization_chn module. Then it recognizes the chinese texts. ",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class ChineseOCRDBCRNN(hub.Module):
def _initialize(self, text_detector_module=None):
"""
initialize with the necessary elements
"""
self.character_dict_path = os.path.join(self.directory, 'assets',
'ppocr_keys_v1.txt')
char_ops_params = {
'character_type': 'ch',
'character_dict_path': self.character_dict_path,
'loss_type': 'ctc'
}
self.char_ops = CharacterOps(char_ops_params)
self.rec_image_shape = [3, 32, 320]
self._text_detector_module = text_detector_module
self.font_file = os.path.join(self.directory, 'assets', 'simfang.ttf')
self.pretrained_model_path = os.path.join(self.directory,
'inference_model')
self._set_config()
def _set_config(self):
"""
predictor config setting
"""
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
config = AnalysisConfig(model_file_path, params_file_path)
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
use_gpu = True
except:
use_gpu = False
if use_gpu:
config.enable_use_gpu(8000, 0)
else:
config.disable_gpu()
config.disable_glog_info()
# use zero copy
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.switch_use_feed_fetch_ops(False)
self.predictor = create_paddle_predictor(config)
input_names = self.predictor.get_input_names()
self.input_tensor = self.predictor.get_input_tensor(input_names[0])
output_names = self.predictor.get_output_names()
self.output_tensors = []
for output_name in output_names:
output_tensor = self.predictor.get_output_tensor(output_name)
self.output_tensors.append(output_tensor)
@property
def text_detector_module(self):
"""
text detect module
"""
if not self._text_detector_module:
self._text_detector_module = hub.Module(
name='chinese_text_detection_db_mobile')
return self._text_detector_module
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def get_rotate_crop_image(self, img, points):
img_height, img_width = img.shape[0:2]
left = int(np.min(points[:, 0]))
right = int(np.max(points[:, 0]))
top = int(np.min(points[:, 1]))
bottom = int(np.max(points[:, 1]))
img_crop = img[top:bottom, left:right, :].copy()
points[:, 0] = points[:, 0] - left
points[:, 1] = points[:, 1] - top
img_crop_width = int(np.linalg.norm(points[0] - points[1]))
img_crop_height = int(np.linalg.norm(points[0] - points[3]))
pts_std = np.float32([[0, 0], [img_crop_width, 0],\
[img_crop_width, img_crop_height], [0, img_crop_height]])
M = cv2.getPerspectiveTransform(points, pts_std)
dst_img = cv2.warpPerspective(
img_crop,
M, (img_crop_width, img_crop_height),
borderMode=cv2.BORDER_REPLICATE)
dst_img_height, dst_img_width = dst_img.shape[0:2]
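        # rotate tall crops by 90 degrees so vertical text reads horizontally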
if dst_img_height * 1.0 / dst_img_width >= 1.5:
dst_img = np.rot90(dst_img)
return dst_img
def resize_norm_img(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape
imgW = int(32 * max_wh_ratio)
h = img.shape[0]
w = img.shape[1]
ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW:
resized_w = imgW
else:
resized_w = int(math.ceil(imgH * ratio))
resized_image = cv2.resize(img, (resized_w, imgH))
resized_image = resized_image.astype('float32')
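        # HWC -> CHW, then scale pixel values from [0, 255] to [-1, 1]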
resized_image = resized_image.transpose((2, 0, 1)) / 255
resized_image -= 0.5
resized_image /= 0.5
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
return padding_im
def recognize_text(self,
images=[],
paths=[],
use_gpu=False,
output_dir='ocr_result',
visualization=False,
box_thresh=0.5,
text_thresh=0.5):
"""
Get the chinese texts in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
use_gpu (bool): Whether to use gpu.
batch_size(int): the program deals once with one
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
box_thresh(float): the threshold of the detected text box's confidence
text_thresh(float): the threshold of the recognize chinese texts' confidence
Returns:
res (list): The result of chinese texts and save path of images.
"""
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
self.use_gpu = use_gpu
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
        assert predicted_data != [], "There is no image to predict. Please check the input data."
detection_results = self.text_detector_module.detect_text(
images=predicted_data, use_gpu=self.use_gpu, box_thresh=box_thresh)
boxes = [
np.array(item['data']).astype(np.float32)
for item in detection_results
]
all_results = []
for index, img_boxes in enumerate(boxes):
original_image = predicted_data[index].copy()
result = {'save_path': ''}
if img_boxes is None:
result['data'] = []
else:
img_crop_list = []
boxes = sorted_boxes(img_boxes)
for num_box in range(len(boxes)):
tmp_box = copy.deepcopy(boxes[num_box])
img_crop = self.get_rotate_crop_image(
original_image, tmp_box)
img_crop_list.append(img_crop)
rec_results = self._recognize_text(img_crop_list)
# if the recognized text confidence score is lower than text_thresh, then drop it
rec_res_final = []
for index, res in enumerate(rec_results):
text, score = res
if score >= text_thresh:
rec_res_final.append({
'text':
text,
'confidence':
float(score),
'text_box_position':
boxes[index].astype(np.int).tolist()
})
result['data'] = rec_res_final
if visualization and result['data']:
result['save_path'] = self.save_result_image(
original_image, boxes, rec_results, output_dir,
text_thresh)
all_results.append(result)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
def save_result_image(self,
original_image,
detection_boxes,
rec_results,
output_dir='ocr_result',
text_thresh=0.5):
image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
txts = [item[0] for item in rec_results]
scores = [item[1] for item in rec_results]
draw_img = draw_ocr(
image,
detection_boxes,
txts,
scores,
font_file=self.font_file,
draw_txt=True,
drop_score=text_thresh)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
ext = get_image_ext(original_image)
saved_name = 'ndarray_{}{}'.format(time.time(), ext)
save_file_path = os.path.join(output_dir, saved_name)
cv2.imwrite(save_file_path, draw_img[:, :, ::-1])
return save_file_path
def _recognize_text(self, image_list):
img_num = len(image_list)
batch_num = 30
rec_res = []
predict_time = 0
for beg_img_no in range(0, img_num, batch_num):
end_img_no = min(img_num, beg_img_no + batch_num)
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = image_list[ino].shape[0:2]
wh_ratio = w / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
norm_img = self.resize_norm_img(image_list[ino], max_wh_ratio)
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
norm_img_batch = np.concatenate(norm_img_batch)
norm_img_batch = norm_img_batch.copy()
self.input_tensor.copy_from_cpu(norm_img_batch)
self.predictor.zero_copy_run()
rec_idx_batch = self.output_tensors[0].copy_to_cpu()
rec_idx_lod = self.output_tensors[0].lod()[0]
predict_batch = self.output_tensors[1].copy_to_cpu()
predict_lod = self.output_tensors[1].lod()[0]
for rno in range(len(rec_idx_lod) - 1):
beg = rec_idx_lod[rno]
end = rec_idx_lod[rno + 1]
rec_idx_tmp = rec_idx_batch[beg:end, 0]
preds_text = self.char_ops.decode(rec_idx_tmp)
beg = predict_lod[rno]
end = predict_lod[rno + 1]
probs = predict_batch[beg:end, :]
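                # confidence: mean of the best-class probability over non-blank timesteps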
ind = np.argmax(probs, axis=1)
blank = probs.shape[1]
valid_ind = np.where(ind != (blank - 1))[0]
score = np.mean(probs[valid_ind, ind[valid_ind]])
rec_res.append([preds_text, score])
return rec_res
def save_inference_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
detector_dir = os.path.join(dirname, 'text_detector')
recognizer_dir = os.path.join(dirname, 'text_recognizer')
self._save_detector_model(detector_dir, model_filename, params_filename,
combined)
self._save_recognizer_model(recognizer_dir, model_filename,
params_filename, combined)
logger.info("The inference model has been saved in the path {}".format(
os.path.realpath(dirname)))
def _save_detector_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
self.text_detector_module.save_inference_model(
dirname, model_filename, params_filename, combined)
def _save_recognizer_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined:
model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename
place = fluid.CPUPlace()
exe = fluid.Executor(place)
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.pretrained_model_path,
model_filename=model_file_path,
params_filename=params_file_path,
executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=program,
executor=exe,
feeded_var_names=feeded_var_names,
target_vars=target_vars,
model_filename=model_filename,
params_filename=params_filename)
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.recognize_text(
paths=[args.input_path],
use_gpu=args.use_gpu,
output_dir=args.output_dir,
visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument(
'--output_dir',
type=str,
default='ocr_result',
help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options
"""
self.arg_input_group.add_argument(
            '--input_path', type=str, default=None, help="path to the input image")
if __name__ == '__main__':
ocr = ChineseOCRDBCRNN()
image_path = [
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/11.jpg',
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg',
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg'
]
res = ocr.recognize_text(paths=image_path, visualization=True)
ocr.save_inference_model('save')
print(res)
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np
def draw_ocr(image,
boxes,
txts,
scores,
font_file,
draw_txt=True,
drop_score=0.5):
"""
Visualize the results of OCR detection and recognition
args:
image(Image|array): RGB image
boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts
scores(list): txxs corresponding scores
draw_txt(bool): whether draw text or not
drop_score(float): only scores greater than drop_threshold will be visualized
return(array):
the visualized img
"""
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score or math.isnan(score):
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if draw_txt:
img = np.array(resize_img(image, input_size=600))
txt_img = text_visual(
txts,
scores,
font_file,
img_h=img.shape[0],
img_w=600,
threshold=drop_score)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img
return image
def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
"""
create new blank img and draw txt on it
args:
texts(list): the text will be draw
scores(list|None): corresponding score of each txt
img_h(int): the height of blank img
img_w(int): the width of blank img
return(array):
"""
if scores is not None:
assert len(texts) == len(
scores), "The number of txts and corresponding scores must match"
def create_blank_img():
blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255
blank_img[:, img_w - 1:] = 0
blank_img = Image.fromarray(blank_img).convert("RGB")
draw_txt = ImageDraw.Draw(blank_img)
return blank_img, draw_txt
blank_img, draw_txt = create_blank_img()
font_size = 20
txt_color = (0, 0, 0)
font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
gap = font_size + 5
txt_img_list = []
count, index = 1, 0
for idx, txt in enumerate(texts):
index += 1
if scores[idx] < threshold or math.isnan(scores[idx]):
index -= 1
continue
first_line = True
while str_count(txt) >= img_w // font_size - 4:
tmp = txt
txt = tmp[:img_w // font_size - 4]
if first_line:
new_txt = str(index) + ': ' + txt
first_line = False
else:
new_txt = ' ' + txt
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
txt = tmp[img_w // font_size - 4:]
if count >= img_h // gap - 1:
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
if first_line:
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
else:
new_txt = " " + txt + " " + '%.3f' % (scores[idx])
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
# whether add new blank img or not
if count >= img_h // gap - 1 and idx + 1 < len(texts):
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
txt_img_list.append(np.array(blank_img))
if len(txt_img_list) == 1:
blank_img = np.array(txt_img_list[0])
else:
blank_img = np.concatenate(txt_img_list, axis=1)
return np.array(blank_img)
def str_count(s):
"""
Count the number of Chinese characters,
a single English character and a single number
equal to half the length of Chinese characters.
args:
s(string): the input of string
return(int):
the number of Chinese characters
"""
import string
count_zh = count_pu = 0
s_len = len(s)
en_dg_count = 0
for c in s:
if c in string.ascii_letters or c.isdigit() or c.isspace():
en_dg_count += 1
elif c.isalpha():
count_zh += 1
else:
count_pu += 1
return s_len - math.ceil(en_dg_count / 2)
def resize_img(img, input_size=600):
img = np.array(img)
im_shape = img.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(input_size) / float(im_size_max)
im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
return im
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def sorted_boxes(dt_boxes):
"""
Sort text boxes in order from top to bottom, left to right
args:
dt_boxes(array):detected text boxes with shape [4, 2]
return:
sorted boxes(array) with shape [4, 2]
"""
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: x[0][1])
_boxes = list(sorted_boxes)
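    # one bubble pass: boxes on roughly the same line (y gap < 10 px) are ordered left to right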
for i in range(num_boxes - 1):
if abs(_boxes[i+1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
## Overview
The chinese_ocr_db_crnn_server module recognizes Chinese characters in images. Starting from the text boxes detected by the [chinese_text_detection_db_server Module](https://www.paddlepaddle.org.cn/hubdetail?name=chinese_text_detection_db_server&en_category=TextRecognition), it recognizes the Chinese text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of a DCNN and an RNN designed for recognizing sequence-like objects in images. Trained with the CTC loss, it learns directly from word- or line-level annotations and needs no detailed character-level labels. This module is a general-purpose OCR model that supports prediction out of the box.
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/model/image/ocr/rcnn.png" hspace='10'/> <br />
</p>
For more details, see [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf).
## Command-Line Prediction
```shell
$ hub run chinese_ocr_db_crnn_server --input_path "/PATH/TO/IMAGE"
```
**This module depends on the third-party libraries shapely and pyclipper. Please install both before using this module.**
## API
```python
def recognize_text(images=[],
paths=[],
use_gpu=False,
output_dir='ocr_result',
visualization=False,
box_thresh=0.5,
text_thresh=0.5)
```
Prediction API that locates and recognizes all Chinese text in the input images.
**Parameters**
* paths (list\[str\]): paths of the images;
* images (list\[numpy.ndarray\]): image data, with ndarray.shape \[H, W, C\] in BGR format;
* use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**;
* box\_thresh (float): confidence threshold for detected text boxes;
* text\_thresh (float): confidence threshold for recognized Chinese text;
* visualization (bool): whether to save the recognition results as image files;
* output\_dir (str): directory in which to save the images, ocr\_result by default;
**Returns**
* res (list\[dict\]): list of recognition results, one dict per image, with the fields:
    * data (list\[dict\]): recognized text results, one dict per text box, with the fields:
        * text(str): the recognized text
        * confidence(float): confidence of the recognized text
        * text_box_position(list): pixel coordinates of the text box in the original image, a 4\*2 matrix whose rows are the bottom-left, bottom-right, top-right, and top-left corners, in that order
      If nothing is recognized, data is \[\]
    * save_path (str, optional): path of the saved result image, '' if no image is saved
### Code Example
```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="chinese_ocr_db_crnn_server")
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
* Sample result:
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/model/image/ocr/ocr_res.jpg" hspace='10'/> <br />
</p>
## Service Deployment
PaddleHub Serving can deploy an online OCR service.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m chinese_ocr_db_crnn_server
```
This deploys the OCR service API, listening on port 8866 by default.
**NOTE:** To use the GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
### Step 2: Send a prediction request
With the server up, the following few lines of code send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_ocr_db_crnn_server"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
## View the Code
https://github.com/PaddlePaddle/PaddleOCR
### Dependencies
paddlepaddle >= 1.7.2
paddlehub >= 1.6.0
shapely
pyclipper
## Release History
* 1.0.0
  Initial release
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type == "ch":
character_dict_path = config['character_dict_path']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n")
self.character_str += line
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
        assert self.character_str is not None, \
            "Unsupported character type: {}".format(self.character_type)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
[sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
length: length of each text. [batch_size]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[text_index[idx]])
text = ''.join(char_list)
return text
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupport type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
def cal_predicts_accuracy(char_ops,
preds,
preds_lod,
labels,
labels_lod,
is_remove_duplicate=False):
acc_num = 0
img_num = 0
for ino in range(len(labels_lod) - 1):
beg_no = preds_lod[ino]
end_no = preds_lod[ino + 1]
preds_text = preds[beg_no:end_no].reshape(-1)
preds_text = char_ops.decode(preds_text, is_remove_duplicate)
beg_no = labels_lod[ino]
end_no = labels_lod[ino + 1]
labels_text = labels[beg_no:end_no].reshape(-1)
labels_text = char_ops.decode(labels_text, is_remove_duplicate)
img_num += 1
if preds_text == labels_text:
acc_num += 1
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
img_num = preds.shape[0]
target_lod = [0]
convert_ids = []
for ino in range(img_num):
end_pos = np.where(preds[ino, :] == 1)[0]
if len(end_pos) <= 1:
text_list = preds[ino, 1:]
else:
text_list = preds[ino, 1:end_pos[1]]
target_lod.append(target_lod[ino] + len(text_list))
convert_ids = convert_ids + list(text_list)
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
def convert_rec_label_to_lod(ori_labels):
img_num = len(ori_labels)
target_lod = [0]
convert_ids = []
for ino in range(img_num):
target_lod.append(target_lod[ino] + len(ori_labels[ino]))
convert_ids = convert_ids + list(ori_labels[ino])
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import copy
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
from chinese_ocr_db_crnn_server.character import CharacterOps
from chinese_ocr_db_crnn_server.utils import base64_to_cv2, draw_ocr, get_image_ext, sorted_boxes
@moduleinfo(
name="chinese_ocr_db_crnn_server",
version="1.0.0",
summary=
"The module can recognize the chinese texts in an image. Firstly, it will detect the text box positions based on the differentiable_binarization_chn module. Then it recognizes the chinese texts. ",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class ChineseOCRDBCRNNServer(hub.Module):
def _initialize(self, text_detector_module=None):
"""
initialize with the necessary elements
"""
self.character_dict_path = os.path.join(self.directory, 'assets',
'ppocr_keys_v1.txt')
char_ops_params = {
'character_type': 'ch',
'character_dict_path': self.character_dict_path,
'loss_type': 'ctc'
}
self.char_ops = CharacterOps(char_ops_params)
self.rec_image_shape = [3, 32, 320]
self._text_detector_module = text_detector_module
self.font_file = os.path.join(self.directory, 'assets', 'simfang.ttf')
self.pretrained_model_path = os.path.join(self.directory, 'assets',
'ch_rec_r34_vd_crnn')
self._set_config()
def _set_config(self):
"""
predictor config setting
"""
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
config = AnalysisConfig(model_file_path, params_file_path)
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
use_gpu = True
except:
use_gpu = False
if use_gpu:
config.enable_use_gpu(8000, 0)
else:
config.disable_gpu()
config.disable_glog_info()
# use zero copy
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.switch_use_feed_fetch_ops(False)
self.predictor = create_paddle_predictor(config)
input_names = self.predictor.get_input_names()
self.input_tensor = self.predictor.get_input_tensor(input_names[0])
output_names = self.predictor.get_output_names()
self.output_tensors = []
for output_name in output_names:
output_tensor = self.predictor.get_output_tensor(output_name)
self.output_tensors.append(output_tensor)
@property
def text_detector_module(self):
"""
text detect module
"""
if not self._text_detector_module:
self._text_detector_module = hub.Module(
name='chinese_text_detection_db_server')
return self._text_detector_module
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def get_rotate_crop_image(self, img, points):
img_height, img_width = img.shape[0:2]
left = int(np.min(points[:, 0]))
right = int(np.max(points[:, 0]))
top = int(np.min(points[:, 1]))
bottom = int(np.max(points[:, 1]))
img_crop = img[top:bottom, left:right, :].copy()
points[:, 0] = points[:, 0] - left
points[:, 1] = points[:, 1] - top
img_crop_width = int(np.linalg.norm(points[0] - points[1]))
img_crop_height = int(np.linalg.norm(points[0] - points[3]))
pts_std = np.float32([[0, 0], [img_crop_width, 0],\
[img_crop_width, img_crop_height], [0, img_crop_height]])
M = cv2.getPerspectiveTransform(points, pts_std)
dst_img = cv2.warpPerspective(
img_crop,
M, (img_crop_width, img_crop_height),
borderMode=cv2.BORDER_REPLICATE)
dst_img_height, dst_img_width = dst_img.shape[0:2]
if dst_img_height * 1.0 / dst_img_width >= 1.5:
dst_img = np.rot90(dst_img)
return dst_img
def resize_norm_img(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape
imgW = int(32 * max_wh_ratio)
h = img.shape[0]
w = img.shape[1]
ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW:
resized_w = imgW
else:
resized_w = int(math.ceil(imgH * ratio))
resized_image = cv2.resize(img, (resized_w, imgH))
resized_image = resized_image.astype('float32')
resized_image = resized_image.transpose((2, 0, 1)) / 255
resized_image -= 0.5
resized_image /= 0.5
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
return padding_im
def recognize_text(self,
images=[],
paths=[],
use_gpu=False,
output_dir='ocr_result',
visualization=False,
box_thresh=0.5,
text_thresh=0.5):
"""
Get the chinese texts in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
use_gpu (bool): Whether to use gpu.
batch_size(int): the program deals once with one
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
box_thresh(float): the threshold of the detected text box's confidence
text_thresh(float): the threshold of the recognize chinese texts' confidence
Returns:
res (list): The result of chinese texts and save path of images.
"""
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
self.use_gpu = use_gpu
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
        assert predicted_data != [], "There is no image to predict. Please check the input data."
detection_results = self.text_detector_module.detect_text(
images=predicted_data, use_gpu=self.use_gpu, box_thresh=box_thresh)
boxes = [
np.array(item['data']).astype(np.float32)
for item in detection_results
]
all_results = []
for index, img_boxes in enumerate(boxes):
original_image = predicted_data[index].copy()
result = {'save_path': ''}
if img_boxes is None:
result['data'] = []
else:
img_crop_list = []
boxes = sorted_boxes(img_boxes)
for num_box in range(len(boxes)):
tmp_box = copy.deepcopy(boxes[num_box])
img_crop = self.get_rotate_crop_image(
original_image, tmp_box)
img_crop_list.append(img_crop)
rec_results = self._recognize_text(img_crop_list)
# if the recognized text confidence score is lower than text_thresh, then drop it
rec_res_final = []
for index, res in enumerate(rec_results):
text, score = res
if score >= text_thresh:
rec_res_final.append({
'text':
text,
'confidence':
float(score),
'text_box_position':
boxes[index].astype(np.int).tolist()
})
result['data'] = rec_res_final
if visualization and result['data']:
result['save_path'] = self.save_result_image(
original_image, boxes, rec_results, output_dir,
text_thresh)
all_results.append(result)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
def save_result_image(self,
original_image,
detection_boxes,
rec_results,
output_dir='ocr_result',
text_thresh=0.5):
image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
txts = [item[0] for item in rec_results]
scores = [item[1] for item in rec_results]
draw_img = draw_ocr(
image,
detection_boxes,
txts,
scores,
font_file=self.font_file,
draw_txt=True,
drop_score=text_thresh)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
ext = get_image_ext(original_image)
saved_name = 'ndarray_{}{}'.format(time.time(), ext)
save_file_path = os.path.join(output_dir, saved_name)
cv2.imwrite(save_file_path, draw_img[:, :, ::-1])
return save_file_path
def _recognize_text(self, image_list):
img_num = len(image_list)
batch_num = 30
rec_res = []
predict_time = 0
for beg_img_no in range(0, img_num, batch_num):
end_img_no = min(img_num, beg_img_no + batch_num)
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = image_list[ino].shape[0:2]
wh_ratio = w / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
norm_img = self.resize_norm_img(image_list[ino], max_wh_ratio)
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
norm_img_batch = np.concatenate(norm_img_batch)
norm_img_batch = norm_img_batch.copy()
self.input_tensor.copy_from_cpu(norm_img_batch)
self.predictor.zero_copy_run()
rec_idx_batch = self.output_tensors[0].copy_to_cpu()
rec_idx_lod = self.output_tensors[0].lod()[0]
predict_batch = self.output_tensors[1].copy_to_cpu()
predict_lod = self.output_tensors[1].lod()[0]
for rno in range(len(rec_idx_lod) - 1):
beg = rec_idx_lod[rno]
end = rec_idx_lod[rno + 1]
rec_idx_tmp = rec_idx_batch[beg:end, 0]
preds_text = self.char_ops.decode(rec_idx_tmp)
beg = predict_lod[rno]
end = predict_lod[rno + 1]
probs = predict_batch[beg:end, :]
ind = np.argmax(probs, axis=1)
blank = probs.shape[1]
valid_ind = np.where(ind != (blank - 1))[0]
score = np.mean(probs[valid_ind, ind[valid_ind]])
rec_res.append([preds_text, score])
return rec_res
def save_inference_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
detector_dir = os.path.join(dirname, 'text_detector')
recognizer_dir = os.path.join(dirname, 'text_recognizer')
self._save_detector_model(detector_dir, model_filename, params_filename,
combined)
self._save_recognizer_model(recognizer_dir, model_filename,
params_filename, combined)
logger.info("The inference model has been saved in the path {}".format(
os.path.realpath(dirname)))
def _save_detector_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
self.text_detector_module.save_inference_model(
dirname, model_filename, params_filename, combined)
def _save_recognizer_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined:
model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename
place = fluid.CPUPlace()
exe = fluid.Executor(place)
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.pretrained_model_path,
model_filename=model_file_path,
params_filename=params_file_path,
executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=program,
executor=exe,
feeded_var_names=feeded_var_names,
target_vars=target_vars,
model_filename=model_filename,
params_filename=params_filename)
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.recognize_text(
paths=[args.input_path],
use_gpu=args.use_gpu,
output_dir=args.output_dir,
visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument(
'--output_dir',
type=str,
default='ocr_result',
help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options
"""
self.arg_input_group.add_argument(
            '--input_path', type=str, default=None, help="path to the input image")
if __name__ == '__main__':
ocr = ChineseOCRDBCRNNServer()
print(ocr.name)
image_path = [
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/11.jpg',
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg',
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg'
]
res = ocr.recognize_text(paths=image_path, visualization=True)
ocr.save_inference_model('save')
print(res)
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np
def draw_ocr(image,
boxes,
txts,
scores,
font_file,
draw_txt=True,
drop_score=0.5):
"""
Visualize the results of OCR detection and recognition
args:
image(Image|array): RGB image
boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts
scores(list): txxs corresponding scores
draw_txt(bool): whether draw text or not
drop_score(float): only scores greater than drop_threshold will be visualized
return(array):
the visualized img
"""
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score or math.isnan(score):
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if draw_txt:
img = np.array(resize_img(image, input_size=600))
txt_img = text_visual(
txts,
scores,
font_file,
img_h=img.shape[0],
img_w=600,
threshold=drop_score)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img
return image
def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
"""
create new blank img and draw txt on it
args:
texts(list): the text will be draw
scores(list|None): corresponding score of each txt
img_h(int): the height of blank img
img_w(int): the width of blank img
return(array):
"""
if scores is not None:
assert len(texts) == len(
scores), "The number of txts and corresponding scores must match"
def create_blank_img():
blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255
blank_img[:, img_w - 1:] = 0
blank_img = Image.fromarray(blank_img).convert("RGB")
draw_txt = ImageDraw.Draw(blank_img)
return blank_img, draw_txt
blank_img, draw_txt = create_blank_img()
font_size = 20
txt_color = (0, 0, 0)
font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
gap = font_size + 5
txt_img_list = []
count, index = 1, 0
for idx, txt in enumerate(texts):
index += 1
if scores[idx] < threshold or math.isnan(scores[idx]):
index -= 1
continue
first_line = True
while str_count(txt) >= img_w // font_size - 4:
tmp = txt
txt = tmp[:img_w // font_size - 4]
if first_line:
new_txt = str(index) + ': ' + txt
first_line = False
else:
new_txt = ' ' + txt
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
txt = tmp[img_w // font_size - 4:]
if count >= img_h // gap - 1:
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
if first_line:
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
else:
new_txt = " " + txt + " " + '%.3f' % (scores[idx])
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
# whether add new blank img or not
if count >= img_h // gap - 1 and idx + 1 < len(texts):
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
txt_img_list.append(np.array(blank_img))
if len(txt_img_list) == 1:
blank_img = np.array(txt_img_list[0])
else:
blank_img = np.concatenate(txt_img_list, axis=1)
return np.array(blank_img)
def str_count(s):
"""
Count the number of Chinese characters,
a single English character and a single number
equal to half the length of Chinese characters.
args:
s(string): the input of string
return(int):
the number of Chinese characters
"""
import string
count_zh = count_pu = 0
s_len = len(s)
en_dg_count = 0
for c in s:
if c in string.ascii_letters or c.isdigit() or c.isspace():
en_dg_count += 1
elif c.isalpha():
count_zh += 1
else:
count_pu += 1
return s_len - math.ceil(en_dg_count / 2)
def resize_img(img, input_size=600):
img = np.array(img)
im_shape = img.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(input_size) / float(im_size_max)
im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
return im
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def sorted_boxes(dt_boxes):
"""
Sort text boxes in order from top to bottom, left to right
args:
dt_boxes(array):detected text boxes with shape [4, 2]
return:
sorted boxes(array) with shape [4, 2]
"""
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: x[0][1])
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i+1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
## Overview
Differentiable Binarization (DB) is a segmentation-based text detection algorithm. Among text detection approaches, segmentation-based ones handle irregularly shaped text, such as curved text, better and therefore often achieve better detection results. Their drawback is the complex and time-consuming post-processing needed to turn segmentation maps into detection boxes. DB makes the binarization threshold part of training, which yields more accurate detection boundaries and simplifies post-processing. This module is an ultra-lightweight text detection model that supports prediction out of the box.
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/model/image/ocr/db_algo.png" hspace='10'/> <br />
</p>
For more details, see [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf).
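The core trick fits in a few lines: instead of a hard step function, DB binarizes with a steep sigmoid so the threshold map can be learned end to end. Below is a minimal numpy sketch of that formula, not this module's implementation; `differentiable_binarization` is a name invented for the example, k=50 follows the paper, and the probability and threshold maps are stand-ins for real network outputs.
```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))), as in the DB paper."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

# stand-in network outputs: a probability map P and a learned per-pixel threshold map T
P = np.array([[0.9, 0.2], [0.6, 0.4]])
T = np.array([[0.3, 0.3], [0.5, 0.5]])
print(differentiable_binarization(P, T))  # close to 1 where P > T, close to 0 elsewhere
```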
## Command-Line Prediction
```shell
$ hub run chinese_text_detection_db_mobile --input_path "/PATH/TO/IMAGE"
```
**This module depends on the third-party libraries shapely and pyclipper. Please install both before using this module.**
## API
```python
def detect_text(paths=[],
images=[],
use_gpu=False,
output_dir='detection_result',
box_thresh=0.5,
visualization=False)
```
Prediction API that detects the positions of all Chinese text in the input images.
**Parameters**
* paths (list\[str\]): paths of the images;
* images (list\[numpy.ndarray\]): image data, with ndarray.shape \[H, W, C\] in BGR format;
* use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**;
* box\_thresh (float): confidence threshold for detected text boxes;
* visualization (bool): whether to save the detection results as image files;
* output\_dir (str): directory in which to save the images, detection\_result by default;
**Returns**
* res (list\[dict\]): list of detection results, one dict per image, with the fields:
    * data (list): detected text boxes; each box is given by its pixel coordinates in the original image as a 4\*2 matrix whose rows are the bottom-left, bottom-right, top-right, and top-left corners, in that order
    * save_path (str): path of the saved result image, '' if no image is saved
### Code Example
```python
import paddlehub as hub
import cv2
text_detector = hub.Module(name="chinese_text_detection_db_mobile")
result = text_detector.detect_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result =text_detector.detect_text(paths=['/PATH/TO/IMAGE'])
```
## Service Deployment
PaddleHub Serving can deploy an online text detection service.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m chinese_text_detection_db_mobile
```
This deploys the text detection service API, listening on port 8866 by default.
**NOTE:** To use the GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
### Step 2: Send a prediction request
With the server up, the following few lines of code send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_text_detection_db_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
## View the Code
https://github.com/PaddlePaddle/PaddleOCR
## Dependencies
paddlepaddle >= 1.7.2
paddlehub >= 1.6.0
shapely
pyclipper
## Release History
* 1.0.0
  Initial release
* 1.0.1
  Fixed a failure when the model is called through the online service
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import base64
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
@moduleinfo(
name="chinese_text_detection_db_mobile",
version="1.0.1",
summary=
"The module aims to detect chinese text position in the image, which is based on differentiable_binarization algorithm.",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class ChineseTextDetectionDB(hub.Module):
def _initialize(self):
"""
initialize with the necessary elements
"""
self.pretrained_model_path = os.path.join(self.directory,
'inference_model')
self._set_config()
def check_requirements(self):
try:
import shapely, pyclipper
except:
            print(
                'This module requires the shapely and pyclipper packages. The running environment does not meet the requirements. Please install these two packages.'
            )
exit()
def _set_config(self):
"""
predictor config setting
"""
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
config = AnalysisConfig(model_file_path, params_file_path)
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
use_gpu = True
except:
use_gpu = False
if use_gpu:
config.enable_use_gpu(8000, 0)
else:
config.disable_gpu()
config.disable_glog_info()
# use zero copy
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.switch_use_feed_fetch_ops(False)
self.predictor = create_paddle_predictor(config)
input_names = self.predictor.get_input_names()
self.input_tensor = self.predictor.get_input_tensor(input_names[0])
output_names = self.predictor.get_output_names()
self.output_tensors = []
for output_name in output_names:
output_tensor = self.predictor.get_output_tensor(output_name)
self.output_tensors.append(output_tensor)
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def filter_tag_det_res(self, dt_boxes, image_shape):
img_height, img_width = image_shape[0:2]
dt_boxes_new = []
for box in dt_boxes:
box = self.order_points_clockwise(box)
left = int(np.min(box[:, 0]))
right = int(np.max(box[:, 0]))
top = int(np.min(box[:, 1]))
bottom = int(np.max(box[:, 1]))
bbox_height = bottom - top
bbox_width = right - left
diffh = math.fabs(box[0, 1] - box[1, 1])
diffw = math.fabs(box[0, 0] - box[3, 0])
rect_width = int(np.linalg.norm(box[0] - box[1]))
rect_height = int(np.linalg.norm(box[0] - box[3]))
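            # drop boxes whose width or height is 10 px or less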
if rect_width <= 10 or rect_height <= 10:
continue
dt_boxes_new.append(box)
dt_boxes = np.array(dt_boxes_new)
return dt_boxes
def order_points_clockwise(self, pts):
"""
reference from: https://github.com/jrosebr1/imutils/blob/master/imutils/perspective.py
# sort the points based on their x-coordinates
"""
xSorted = pts[np.argsort(pts[:, 0]), :]
        # grab the left-most and right-most points from the sorted
        # x-coordinate points
leftMost = xSorted[:2, :]
rightMost = xSorted[2:, :]
# now, sort the left-most coordinates according to their
# y-coordinates so we can grab the top-left and bottom-left
# points, respectively
leftMost = leftMost[np.argsort(leftMost[:, 1]), :]
(tl, bl) = leftMost
rightMost = rightMost[np.argsort(rightMost[:, 1]), :]
(tr, br) = rightMost
rect = np.array([tl, tr, br, bl], dtype="float32")
return rect
def detect_text(self,
images=[],
paths=[],
use_gpu=False,
output_dir='detection_result',
visualization=False,
box_thresh=0.5):
"""
Get the text box in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
paths (list[str]): The paths of images. If paths not images
use_gpu (bool): Whether to use gpu. Default false.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
box_thresh(float): the threshold of the detected text box's confidence
Returns:
res (list): The result of text detection box and save path of images.
"""
self.check_requirements()
from chinese_text_detection_db_mobile.processor import DBPreProcess, DBPostProcess, draw_boxes, get_image_ext
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
        assert predicted_data != [], "There is no image to predict. Please check the input data."
preprocessor = DBPreProcess()
postprocessor = DBPostProcess(box_thresh)
all_imgs = []
all_ratios = []
all_results = []
for original_image in predicted_data:
im, ratio_list = preprocessor(original_image)
res = {'save_path': ''}
if im is None:
res['data'] = []
else:
im = im.copy()
starttime = time.time()
self.input_tensor.copy_from_cpu(im)
self.predictor.zero_copy_run()
data_out = self.output_tensors[0].copy_to_cpu()
dt_boxes_list = postprocessor(data_out, [ratio_list])
boxes = self.filter_tag_det_res(dt_boxes_list[0],
original_image.shape)
res['data'] = boxes.astype(np.int).tolist()
all_imgs.append(im)
all_ratios.append(ratio_list)
if visualization:
img = Image.fromarray(
cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
draw_img = draw_boxes(img, boxes)
draw_img = np.array(draw_img)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
ext = get_image_ext(original_image)
saved_name = 'ndarray_{}{}'.format(time.time(), ext)
cv2.imwrite(
os.path.join(output_dir, saved_name),
draw_img[:, :, ::-1])
res['save_path'] = os.path.join(output_dir, saved_name)
all_results.append(res)
return all_results
def save_inference_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined:
model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename
place = fluid.CPUPlace()
exe = fluid.Executor(place)
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.pretrained_model_path,
model_filename=model_file_path,
params_filename=params_file_path,
executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=program,
executor=exe,
feeded_var_names=feeded_var_names,
target_vars=target_vars,
model_filename=model_filename,
params_filename=params_filename)
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.detect_text(images=images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.detect_text(
paths=[args.input_path],
use_gpu=args.use_gpu,
output_dir=args.output_dir,
visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument(
'--output_dir',
type=str,
default='detection_result',
help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options
"""
self.arg_input_group.add_argument(
'--input_path', type=str, default=None, help="path to the input image")
if __name__ == '__main__':
db = ChineseTextDetectionDB()
image_path = [
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/11.jpg',
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg',
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg'
]
res = db.detect_text(paths=image_path, visualization=True)
db.save_inference_model('save')
print(res)
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
from PIL import Image, ImageDraw, ImageFont
from shapely.geometry import Polygon
import cv2
import numpy as np
import pyclipper
class DBPreProcess(object):
def __init__(self, max_side_len=960):
self.max_side_len = max_side_len
def resize_image_type(self, im):
"""
resize image to a size multiple of 32 which is required by the network
"""
h, w, _ = im.shape
resize_w = w
resize_h = h
# limit the max side
if max(resize_h, resize_w) > self.max_side_len:
if resize_h > resize_w:
ratio = float(self.max_side_len) / resize_h
else:
ratio = float(self.max_side_len) / resize_w
else:
ratio = 1.
resize_h = int(resize_h * ratio)
resize_w = int(resize_w * ratio)
if resize_h % 32 == 0:
resize_h = resize_h
elif resize_h // 32 <= 1:
resize_h = 32
else:
resize_h = (resize_h // 32 - 1) * 32
if resize_w % 32 == 0:
resize_w = resize_w
elif resize_w // 32 <= 1:
resize_w = 32
else:
resize_w = (resize_w // 32 - 1) * 32
try:
if int(resize_w) <= 0 or int(resize_h) <= 0:
return None, (None, None)
im = cv2.resize(im, (int(resize_w), int(resize_h)))
except:
print(im.shape, resize_w, resize_h)
sys.exit(0)
ratio_h = resize_h / float(h)
ratio_w = resize_w / float(w)
return im, (ratio_h, ratio_w)
def normalize(self, im):
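# Scale pixels to [0, 1], normalize with the ImageNet channel mean/std,
# then transpose HWC -> CHW for the network.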
img_mean = [0.485, 0.456, 0.406]
img_std = [0.229, 0.224, 0.225]
im = im.astype(np.float32, copy=False)
im = im / 255
im -= img_mean
im /= img_std
channel_swap = (2, 0, 1)
im = im.transpose(channel_swap)
return im
def __call__(self, im):
im, (ratio_h, ratio_w) = self.resize_image_type(im)
im = self.normalize(im)
im = im[np.newaxis, :]
return [im, (ratio_h, ratio_w)]
class DBPostProcess(object):
"""
The post process for Differentiable Binarization (DB).
"""
def __init__(self, thresh=0.3, box_thresh=0.5, max_candidates=1000):
self.thresh = thresh
self.box_thresh = box_thresh
self.max_candidates = max_candidates
self.min_size = 3
def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
'''
_bitmap: single map with shape (1, H, W),
whose values are binarized as {0, 1}
'''
bitmap = _bitmap
height, width = bitmap.shape
outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST,
cv2.CHAIN_APPROX_SIMPLE)
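# cv2.findContours returns (image, contours, hierarchy) in OpenCV 3.x but
# (contours, hierarchy) in OpenCV 4.x, so handle both layouts.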
if len(outs) == 3:
img, contours, _ = outs[0], outs[1], outs[2]
elif len(outs) == 2:
contours, _ = outs[0], outs[1]
num_contours = min(len(contours), self.max_candidates)
boxes = np.zeros((num_contours, 4, 2), dtype=np.int16)
scores = np.zeros((num_contours, ), dtype=np.float32)
for index in range(num_contours):
contour = contours[index]
points, sside = self.get_mini_boxes(contour)
if sside < self.min_size:
continue
points = np.array(points)
score = self.box_score_fast(pred, points.reshape(-1, 2))
if self.box_thresh > score:
continue
box = self.unclip(points).reshape(-1, 1, 2)
box, sside = self.get_mini_boxes(box)
if sside < self.min_size + 2:
continue
box = np.array(box)
if not isinstance(dest_width, int):
dest_width = dest_width.item()
dest_height = dest_height.item()
box[:, 0] = np.clip(
np.round(box[:, 0] / width * dest_width), 0, dest_width)
box[:, 1] = np.clip(
np.round(box[:, 1] / height * dest_height), 0, dest_height)
boxes[index, :, :] = box.astype(np.int16)
scores[index] = score
return boxes, scores
def unclip(self, box, unclip_ratio=2.0):
poly = Polygon(box)
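# Dilate the shrunk text kernel back to full size via a polygon offset; the
# offset distance D = A * r / L (area * unclip_ratio / perimeter) follows the
# DB paper.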
distance = poly.area * unclip_ratio / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.array(offset.Execute(distance))
return expanded
def get_mini_boxes(self, contour):
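# Fit a minimum-area rotated rectangle to the contour; return its corners
# ordered [top-left, top-right, bottom-right, bottom-left] together with the
# length of the rectangle's shorter side.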
bounding_box = cv2.minAreaRect(contour)
points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
index_1, index_2, index_3, index_4 = 0, 1, 2, 3
if points[1][1] > points[0][1]:
index_1 = 0
index_4 = 1
else:
index_1 = 1
index_4 = 0
if points[3][1] > points[2][1]:
index_2 = 2
index_3 = 3
else:
index_2 = 3
index_3 = 2
box = [
points[index_1], points[index_2], points[index_3], points[index_4]
]
return box, min(bounding_box[1])
def box_score_fast(self, bitmap, _box):
h, w = bitmap.shape[:2]
box = _box.copy()
xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1)
xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1)
ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1)
ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1)
mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
box[:, 0] = box[:, 0] - xmin
box[:, 1] = box[:, 1] - ymin
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
def __call__(self, predictions, ratio_list):
pred = predictions[:, 0, :, :]
segmentation = pred > self.thresh
boxes_batch = []
for batch_index in range(pred.shape[0]):
height, width = pred.shape[-2:]
tmp_boxes, tmp_scores = self.boxes_from_bitmap(
pred[batch_index], segmentation[batch_index], width, height)
boxes = []
for k in range(len(tmp_boxes)):
if tmp_scores[k] > self.box_thresh:
boxes.append(tmp_boxes[k])
if len(boxes) > 0:
boxes = np.array(boxes)
ratio_h, ratio_w = ratio_list[batch_index]
boxes[:, :, 0] = boxes[:, :, 0] / ratio_w
boxes[:, :, 1] = boxes[:, :, 1] / ratio_h
boxes_batch.append(boxes)
return boxes_batch
def draw_boxes(image, boxes, scores=None, drop_score=0.5):
img = image.copy()
draw = ImageDraw.Draw(img)
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score:
continue
draw.line([(box[0][0], box[0][1]), (box[1][0], box[1][1])], fill='red')
draw.line([(box[1][0], box[1][1]), (box[2][0], box[2][1])], fill='red')
draw.line([(box[2][0], box[2][1]), (box[3][0], box[3][1])], fill='red')
draw.line([(box[3][0], box[3][1]), (box[0][0], box[0][1])], fill='red')
draw.line([(box[0][0] - 1, box[0][1] + 1),
(box[1][0] - 1, box[1][1] + 1)],
fill='red')
draw.line([(box[1][0] - 1, box[1][1] + 1),
(box[2][0] - 1, box[2][1] + 1)],
fill='red')
draw.line([(box[2][0] - 1, box[2][1] + 1),
(box[3][0] - 1, box[3][1] + 1)],
fill='red')
draw.line([(box[3][0] - 1, box[3][1] + 1),
(box[0][0] - 1, box[0][1] + 1)],
fill='red')
return img
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
## Overview
Differentiable Binarization (DB) is a segmentation-based text detection algorithm. Among text detection algorithms, segmentation-based ones handle irregularly shaped text, such as curved text, better and therefore often achieve better detection results. Their drawback is that the post-processing step of converting segmentation maps into detection boxes is complex and time-consuming. DB instead learns the binarization threshold during training, which yields more accurate detection boundaries and greatly simplifies post-processing. This module is a general-purpose text detection model that supports direct prediction.
<p align="center">
<img src="https://bj.bcebos.com/paddlehub/model/image/ocr/db_algo.png" hspace='10'/> <br />
</p>
For more details, see [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf).
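The core trick is easy to state in code. Below is a minimal sketch of DB's approximate binarization, assuming the amplification factor k=50 used in the paper; `prob_map` and `thresh_map` are hypothetical stand-ins for the probability and threshold maps predicted by the network:
```python
import numpy as np

def approximate_binarization(prob_map, thresh_map, k=50):
    # B = 1 / (1 + exp(-k * (P - T))): a steep sigmoid that behaves like a
    # step function yet stays differentiable, so the threshold map T can be
    # learned jointly with the probability map P during training.
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))
```
At inference time the learned threshold is no longer needed: the post-processing in this module simply binarizes the probability map with a fixed threshold (0.3 by default in `DBPostProcess`) before extracting boxes.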
## Command-Line Prediction
```shell
$ hub run chinese_text_detection_db_server --input_path "/PATH/TO/IMAGE"
```
**This module depends on the third-party libraries shapely and pyclipper; please install both before using it.**
## API
```python
def detect_text(paths=[],
images=[],
use_gpu=False,
output_dir='detection_result',
box_thresh=0.5,
visualization=False)
```
Prediction API that detects the positions of all Chinese text in the input images.
**Parameters**
* paths (list\[str\]): paths of the input images;
* images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
* use\_gpu (bool): whether to use GPU; **if so, set the CUDA_VISIBLE_DEVICES environment variable first**;
* box\_thresh (float): confidence threshold for detected text boxes;
* visualization (bool): whether to save the detection results as image files;
* output\_dir (str): directory in which to save the images, detection\_result by default;
**Returns**
* res (list\[dict\]): list of detection results; each element is a dict with the following fields:
  * data (list): detected text boxes, given as pixel coordinates in the original image; each box is a 4\*2 matrix listing the top-left, top-right, bottom-right, and bottom-left vertices in order
  * save_path (str): save path of the visualized result, or '' if no image is saved
### Code Example
```python
import paddlehub as hub
import cv2
text_detector = hub.Module(name="chinese_text_detection_db_server")
result = text_detector.detect_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = text_detector.detect_text(paths=['/PATH/TO/IMAGE'])
```
## Serving Deployment
PaddleHub Serving can deploy an online text detection service.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m chinese_text_detection_db_server
```
This deploys a text detection service API, listening on port 8866 by default.
**NOTE:** To predict with GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setup is needed.
### Step 2: Send a Prediction Request
With the server up, the few lines of code below send a prediction request and fetch the result:
```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_text_detection_db_server"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
## View Code
https://github.com/PaddlePaddle/PaddleOCR
## Dependencies
paddlepaddle >= 1.7.2
paddlehub >= 1.6.0
shapely
pyclipper
## Update History
* 1.0.0
First release
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import base64
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
@moduleinfo(
name="chinese_text_detection_db_server",
version="1.0.0",
summary=
"The module aims to detect chinese text position in the image, which is based on differentiable_binarization algorithm.",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
type="cv/text_recognition")
class ChineseTextDetectionDBServer(hub.Module):
def _initialize(self):
"""
initialize with the necessary elements
"""
self.pretrained_model_path = os.path.join(self.directory,
'ch_det_r50_vd_db')
self._set_config()
def check_requirements(self):
try:
import shapely, pyclipper
except:
print(
'This module requires the shapely and pyclipper packages. The running environment does not meet the requirements. Please install the two packages.'
)
exit()
def _set_config(self):
"""
predictor config setting
"""
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
config = AnalysisConfig(model_file_path, params_file_path)
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
use_gpu = True
except:
use_gpu = False
if use_gpu:
config.enable_use_gpu(8000, 0)
else:
config.disable_gpu()
config.disable_glog_info()
# use zero copy
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.switch_use_feed_fetch_ops(False)
self.predictor = create_paddle_predictor(config)
input_names = self.predictor.get_input_names()
self.input_tensor = self.predictor.get_input_tensor(input_names[0])
output_names = self.predictor.get_output_names()
self.output_tensors = []
for output_name in output_names:
output_tensor = self.predictor.get_output_tensor(output_name)
self.output_tensors.append(output_tensor)
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def filter_tag_det_res(self, dt_boxes, image_shape):
img_height, img_width = image_shape[0:2]
dt_boxes_new = []
for box in dt_boxes:
box = self.order_points_clockwise(box)
left = int(np.min(box[:, 0]))
right = int(np.max(box[:, 0]))
top = int(np.min(box[:, 1]))
bottom = int(np.max(box[:, 1]))
bbox_height = bottom - top
bbox_width = right - left
diffh = math.fabs(box[0, 1] - box[1, 1])
diffw = math.fabs(box[0, 0] - box[3, 0])
rect_width = int(np.linalg.norm(box[0] - box[1]))
rect_height = int(np.linalg.norm(box[0] - box[3]))
if rect_width <= 10 or rect_height <= 10:
continue
dt_boxes_new.append(box)
dt_boxes = np.array(dt_boxes_new)
return dt_boxes
def order_points_clockwise(self, pts):
"""
reference from: https://github.com/jrosebr1/imutils/blob/master/imutils/perspective.py
# sort the points based on their x-coordinates
"""
xSorted = pts[np.argsort(pts[:, 0]), :]
# grab the left-most and right-most points from the sorted
# x-coordinate points
leftMost = xSorted[:2, :]
rightMost = xSorted[2:, :]
# now, sort the left-most coordinates according to their
# y-coordinates so we can grab the top-left and bottom-left
# points, respectively
leftMost = leftMost[np.argsort(leftMost[:, 1]), :]
(tl, bl) = leftMost
rightMost = rightMost[np.argsort(rightMost[:, 1]), :]
(tr, br) = rightMost
rect = np.array([tl, tr, br, bl], dtype="float32")
return rect
def detect_text(self,
images=[],
paths=[],
use_gpu=False,
output_dir='detection_result',
visualization=False,
box_thresh=0.5):
"""
Detect the text boxes in the input images.
Args:
images (list[numpy.ndarray]): image data, the shape of each is [H, W, C] in BGR order. Set either images or paths, not both.
paths (list[str]): the paths of input images. Set either paths or images, not both.
use_gpu (bool): whether to use GPU. Default False.
output_dir (str): the directory to store output images.
visualization (bool): whether to save the visualized results as images.
box_thresh (float): the confidence threshold of detected text boxes.
Returns:
res (list): the detected text boxes of each image and, if visualization is enabled, the save path of the rendered image.
"""
self.check_requirements()
from chinese_text_detection_db_server.processor import DBPreProcess, DBPostProcess, draw_boxes, get_image_ext
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is no image to be predicted. Please check the input data."
preprocessor = DBPreProcess()
postprocessor = DBPostProcess(box_thresh)
all_imgs = []
all_ratios = []
all_results = []
for original_image in predicted_data:
im, ratio_list = preprocessor(original_image)
res = {'save_path': ''}
if im is None:
res['data'] = []
else:
im = im.copy()
starttime = time.time()
self.input_tensor.copy_from_cpu(im)
self.predictor.zero_copy_run()
data_out = self.output_tensors[0].copy_to_cpu()
dt_boxes_list = postprocessor(data_out, [ratio_list])
boxes = self.filter_tag_det_res(dt_boxes_list[0],
original_image.shape)
res['data'] = boxes.astype(np.int).tolist()
all_imgs.append(im)
all_ratios.append(ratio_list)
if visualization:
img = Image.fromarray(
cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
draw_img = draw_boxes(img, boxes)
draw_img = np.array(draw_img)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
ext = get_image_ext(original_image)
saved_name = 'ndarray_{}{}'.format(time.time(), ext)
cv2.imwrite(
os.path.join(output_dir, saved_name),
draw_img[:, :, ::-1])
res['save_path'] = os.path.join(output_dir, saved_name)
all_results.append(res)
return all_results
def save_inference_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined:
model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename
place = fluid.CPUPlace()
exe = fluid.Executor(place)
model_file_path = os.path.join(self.pretrained_model_path, 'model')
params_file_path = os.path.join(self.pretrained_model_path, 'params')
program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.pretrained_model_path,
model_filename=model_file_path,
params_filename=params_file_path,
executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=program,
executor=exe,
feeded_var_names=feeded_var_names,
target_vars=target_vars,
model_filename=model_filename,
params_filename=params_filename)
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.detect_text(images=images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.detect_text(
paths=[args.input_path],
use_gpu=args.use_gpu,
output_dir=args.output_dir,
visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument(
'--output_dir',
type=str,
default='detection_result',
help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options
"""
self.arg_input_group.add_argument(
'--input_path', type=str, default=None, help="path to the input image")
if __name__ == '__main__':
db = ChineseTextDetectionDBServer()
image_path = [
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/11.jpg',
'/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg'
]
res = db.detect_text(paths=image_path, visualization=True)
db.save_inference_model('save')
print(res)
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
from PIL import Image, ImageDraw, ImageFont
from shapely.geometry import Polygon
import cv2
import numpy as np
import pyclipper
class DBPreProcess(object):
def __init__(self, max_side_len=960):
self.max_side_len = max_side_len
def resize_image_type(self, im):
"""
resize image to a size multiple of 32 which is required by the network
"""
h, w, _ = im.shape
resize_w = w
resize_h = h
# limit the max side
if max(resize_h, resize_w) > self.max_side_len:
if resize_h > resize_w:
ratio = float(self.max_side_len) / resize_h
else:
ratio = float(self.max_side_len) / resize_w
else:
ratio = 1.
resize_h = int(resize_h * ratio)
resize_w = int(resize_w * ratio)
if resize_h % 32 == 0:
resize_h = resize_h
elif resize_h // 32 <= 1:
resize_h = 32
else:
resize_h = (resize_h // 32 - 1) * 32
if resize_w % 32 == 0:
resize_w = resize_w
elif resize_w // 32 <= 1:
resize_w = 32
else:
resize_w = (resize_w // 32 - 1) * 32
try:
if int(resize_w) <= 0 or int(resize_h) <= 0:
return None, (None, None)
im = cv2.resize(im, (int(resize_w), int(resize_h)))
except:
print(im.shape, resize_w, resize_h)
sys.exit(0)
ratio_h = resize_h / float(h)
ratio_w = resize_w / float(w)
return im, (ratio_h, ratio_w)
def normalize(self, im):
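# Scale pixels to [0, 1], normalize with the ImageNet channel mean/std,
# then transpose HWC -> CHW for the network.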
img_mean = [0.485, 0.456, 0.406]
img_std = [0.229, 0.224, 0.225]
im = im.astype(np.float32, copy=False)
im = im / 255
im -= img_mean
im /= img_std
channel_swap = (2, 0, 1)
im = im.transpose(channel_swap)
return im
def __call__(self, im):
im, (ratio_h, ratio_w) = self.resize_image_type(im)
im = self.normalize(im)
im = im[np.newaxis, :]
return [im, (ratio_h, ratio_w)]
class DBPostProcess(object):
"""
The post process for Differentiable Binarization (DB).
"""
def __init__(self, thresh=0.3, box_thresh=0.5, max_candidates=1000):
self.thresh = thresh
self.box_thresh = box_thresh
self.max_candidates = max_candidates
self.min_size = 3
def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
'''
_bitmap: single map with shape (1, H, W),
whose values are binarized as {0, 1}
'''
bitmap = _bitmap
height, width = bitmap.shape
outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST,
cv2.CHAIN_APPROX_SIMPLE)
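# cv2.findContours returns (image, contours, hierarchy) in OpenCV 3.x but
# (contours, hierarchy) in OpenCV 4.x, so handle both layouts.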
if len(outs) == 3:
img, contours, _ = outs[0], outs[1], outs[2]
elif len(outs) == 2:
contours, _ = outs[0], outs[1]
num_contours = min(len(contours), self.max_candidates)
boxes = np.zeros((num_contours, 4, 2), dtype=np.int16)
scores = np.zeros((num_contours, ), dtype=np.float32)
for index in range(num_contours):
contour = contours[index]
points, sside = self.get_mini_boxes(contour)
if sside < self.min_size:
continue
points = np.array(points)
score = self.box_score_fast(pred, points.reshape(-1, 2))
if self.box_thresh > score:
continue
box = self.unclip(points).reshape(-1, 1, 2)
box, sside = self.get_mini_boxes(box)
if sside < self.min_size + 2:
continue
box = np.array(box)
if not isinstance(dest_width, int):
dest_width = dest_width.item()
dest_height = dest_height.item()
box[:, 0] = np.clip(
np.round(box[:, 0] / width * dest_width), 0, dest_width)
box[:, 1] = np.clip(
np.round(box[:, 1] / height * dest_height), 0, dest_height)
boxes[index, :, :] = box.astype(np.int16)
scores[index] = score
return boxes, scores
def unclip(self, box, unclip_ratio=2.0):
poly = Polygon(box)
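# Dilate the shrunk text kernel back to full size via a polygon offset; the
# offset distance D = A * r / L (area * unclip_ratio / perimeter) follows the
# DB paper.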
distance = poly.area * unclip_ratio / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.array(offset.Execute(distance))
return expanded
def get_mini_boxes(self, contour):
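# Fit a minimum-area rotated rectangle to the contour; return its corners
# ordered [top-left, top-right, bottom-right, bottom-left] together with the
# length of the rectangle's shorter side.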
bounding_box = cv2.minAreaRect(contour)
points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
index_1, index_2, index_3, index_4 = 0, 1, 2, 3
if points[1][1] > points[0][1]:
index_1 = 0
index_4 = 1
else:
index_1 = 1
index_4 = 0
if points[3][1] > points[2][1]:
index_2 = 2
index_3 = 3
else:
index_2 = 3
index_3 = 2
box = [
points[index_1], points[index_2], points[index_3], points[index_4]
]
return box, min(bounding_box[1])
def box_score_fast(self, bitmap, _box):
h, w = bitmap.shape[:2]
box = _box.copy()
xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1)
xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1)
ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1)
ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1)
mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
box[:, 0] = box[:, 0] - xmin
box[:, 1] = box[:, 1] - ymin
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
def __call__(self, predictions, ratio_list):
pred = predictions[:, 0, :, :]
segmentation = pred > self.thresh
boxes_batch = []
for batch_index in range(pred.shape[0]):
height, width = pred.shape[-2:]
tmp_boxes, tmp_scores = self.boxes_from_bitmap(
pred[batch_index], segmentation[batch_index], width, height)
boxes = []
for k in range(len(tmp_boxes)):
if tmp_scores[k] > self.box_thresh:
boxes.append(tmp_boxes[k])
if len(boxes) > 0:
boxes = np.array(boxes)
ratio_h, ratio_w = ratio_list[batch_index]
boxes[:, :, 0] = boxes[:, :, 0] / ratio_w
boxes[:, :, 1] = boxes[:, :, 1] / ratio_h
boxes_batch.append(boxes)
return boxes_batch
def draw_boxes(image, boxes, scores=None, drop_score=0.5):
img = image.copy()
draw = ImageDraw.Draw(img)
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score:
continue
draw.line([(box[0][0], box[0][1]), (box[1][0], box[1][1])], fill='red')
draw.line([(box[1][0], box[1][1]), (box[2][0], box[2][1])], fill='red')
draw.line([(box[2][0], box[2][1]), (box[3][0], box[3][1])], fill='red')
draw.line([(box[3][0], box[3][1]), (box[0][0], box[0][1])], fill='red')
draw.line([(box[0][0] - 1, box[0][1] + 1),
(box[1][0] - 1, box[1][1] + 1)],
fill='red')
draw.line([(box[1][0] - 1, box[1][1] + 1),
(box[2][0] - 1, box[2][1] + 1)],
fill='red')
draw.line([(box[2][0] - 1, box[2][1] + 1),
(box[3][0] - 1, box[3][1] + 1)],
fill='red')
draw.line([(box[3][0] - 1, box[3][1] + 1),
(box[0][0] - 1, box[0][1] + 1)],
fill='red')
return img
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
...@@ -96,7 +96,7 @@ embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text ...@@ -96,7 +96,7 @@ embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text
# Use "get_params_layer" to get params layer and used to ULMFiTStrategy. # Use "get_params_layer" to get params layer and used to ULMFiTStrategy.
params_layer = module.get_params_layer() params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(params_layer=params_layer) strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
``` ```
......
...@@ -96,7 +96,7 @@ embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text ...@@ -96,7 +96,7 @@ embedding_result = module.get_embedding(texts=[["Sample1_text_a"],["Sample2_text
# Use "get_params_layer" to get params layer and used to ULMFiTStrategy. # Use "get_params_layer" to get params layer and used to ULMFiTStrategy.
params_layer = module.get_params_layer() params_layer = module.get_params_layer()
strategy = hub.finetune.strategy.ULMFiTStrategy(params_layer=params_layer) strategy = hub.finetune.strategy.ULMFiTStrategy(frz_params_layer=params_layer, dis_params_layer=params_layer)
``` ```
## 查看代码 ## 查看代码
......
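For reference, here is a minimal sketch of the updated call recorded in the hunks above, assuming a PaddleHub v1.7 environment; the module name "ernie" is only an illustrative choice:
```python
import paddlehub as hub

module = hub.Module(name="ernie")  # illustrative module choice
params_layer = module.get_params_layer()
# v1.7 splits the former params_layer argument into a freezing variant and a
# discriminative-learning-rate variant; the hunks pass the same layer to both.
strategy = hub.finetune.strategy.ULMFiTStrategy(
    frz_params_layer=params_layer, dis_params_layer=params_layer)
```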