Commit 7dbbef9c · Author: zhangxuefei

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleHub into develop

......@@ -87,12 +87,13 @@ if __name__ == '__main__':
add_crf=True)
# Data to be predicted
# If using python 2, prefix "u" is necessary
data = [
["我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
["为了跟踪国际最新食品工艺、流行趋势,大量搜集海外专业书刊资料是提高技艺的捷径。"],
["其中线装古籍逾千册;民国出版物几百种;珍本四册、稀见本四百余册,出版时间跨越三百余年。"],
["有的古木交柯,春机荣欣,从诗人句中得之,而入画中,观之令人心驰。"],
["不过重在晋趣,略增明人气息,妙在集古有道、不露痕迹罢了。"],
[u"我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
[u"为了跟踪国际最新食品工艺、流行趋势,大量搜集海外专业书刊资料是提高技艺的捷径。"],
[u"其中线装古籍逾千册;民国出版物几百种;珍本四册、稀见本四百余册,出版时间跨越三百余年。"],
[u"有的古木交柯,春机荣欣,从诗人句中得之,而入画中,观之令人心驰。"],
[u"不过重在晋趣,略增明人气息,妙在集古有道、不露痕迹罢了。"],
]
# Add 0x02 between characters to match the format of training data,
......
......@@ -8,11 +8,11 @@ PaddleHub Serving是基于PaddleHub的一键模型服务部署工具,能够通
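PaddleHub Serving provides two main capabilities: serving embeddings through Bert Service, and serving predictions from pretrained models. Support for serving models produced with the PaddleHub Fine-tune API is planned.
## Bert Service
Bert Service is a remote computation service built on the [Paddle Serving](https://github.com/PaddlePaddle/Serving) framework for rapid model deployment. It exposes the embedding step through an API call, which reduces the dependence on local machine resources. With PaddleHub, `Bert Service` can be deployed on a server with a single command, and an ordinary machine can then fetch the embeddings for a text through the client interface.
`Bert Service` is a remote computation service built on the [Paddle Serving](https://github.com/PaddlePaddle/Serving) framework for rapid model deployment. It exposes the embedding step through an API call, which reduces the dependence on local machine resources. With PaddleHub, `Bert Service` can be deployed on a server with a single command, and an ordinary machine can then fetch the embeddings for a text through the client interface.
For details and a demo of Bert Service, see [Bert Service](../../tutorial/bert_service.md)
For details and a demo, see [Bert Service](../../tutorial/bert_service.md)
This example shows how to use Bert Service to deploy a remote embedding service, run online prediction, and retrieve the text embeddings.
This example shows how to use `Bert Service` to deploy a remote embedding service, run online prediction, and retrieve the text embeddings.
## One-click serving of pretrained models
One-click serving of pretrained models is a PaddleHub-based solution for quickly deploying pretrained models as a service, exposing model prediction through an API.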
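......@@ -53,4 +53,4 @@ Bert Service是基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)
&emsp;&emsp;This example shows how to use senta_lstm to deploy a Chinese sentiment analysis service, run online prediction, and retrieve the sentiment analysis results.
For details on one-click serving of pretrained models with Paddle Serving, see [serving](module_serving)
For details on one-click serving of pretrained models with Paddle Serving, see [Module Serving](module_serving)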
......@@ -68,5 +68,4 @@ Paddle Inference Server exit successfully!
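In this way, we deployed `Bert Service` on a single GPU machine and tested it from another ordinary machine, which shows that `Bert Service` makes it easy to quickly deploy an online embedding service.
## One-click serving of pretrained models
In addition to `Bert Service`, PaddleHub
Serving also provides one-click serving of pretrained models, which quickly brings a pretrained model online as a reliable prediction service. For details, see [Module Serving](../../../tutorial/serving.md)
In addition to `Bert Service`, PaddleHub Serving also provides one-click serving of pretrained models, which quickly brings a pretrained model online as a reliable prediction service. For details, see [Module Serving](../../../tutorial/serving.md)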
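# PaddleHub Serving: One-Click Model Serving
## Introduction
### Why use one-click serving
PaddleHub makes transfer learning and model prediction fast, but developers often need to bring a trained model online. Whether they are opening a service port to the outside or setting up a prediction service on a LAN, PaddleHub needs to be able to deploy a model prediction service quickly. PaddleHub Serving, a one-click model serving tool, was created for this purpose. Developers get a prediction service API with a single command, without having to choose or implement a web framework.
PaddleHub makes model prediction fast, but developers often need to move a local prediction workflow online. Whether they are opening a service port to the outside or setting up a prediction service on a LAN, PaddleHub needs to be able to deploy a model prediction service quickly. PaddleHub Serving, a one-click model serving tool, was created for this purpose. Developers can start an online model prediction service with a single command, without having to choose or implement a web framework.
### What is one-click serving
PaddleHub Serving is a one-click model serving tool built on PaddleHub. It starts an online model prediction service from the simple Hub command-line tool. The front end handles network requests with Flask and Gunicorn, and the back end calls the PaddleHub prediction interface directly. It can also run in multi-process mode to use multiple cores and increase concurrency, which safeguards the performance of the prediction service.
### Supported models
PaddleHub Serving currently supports serving every PaddleHub model that can be used directly for prediction, including nlp models such as `lac` and `senta_bilstm` and cv models such as `yolov3_coco2017` and `vgg16_imagenet`. Serving models produced with the PaddleHub Fine-tune API is planned.
**NOTE:** For details on PaddleHub Serving one-click serving, see [PaddleHub Servung](../../../tutorial/serving.md)
## Demo: deploying an online lac word segmentation service
### Step1: Deploy the lac online service
Now we will deploy an online lac service so that the word segmentation of a text can be obtained through an API.
First, as described in Section 2.1, the PaddleHub Serving server can be started in either of two ways: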
```shell
$ hub serving start -m lac
```
```shell
$ hub serving start -c serving_config.json
```
The content of `serving_config.json` is as follows:
```json
{
  "modules_info": [
    {
      "module": "lac",
      "version": "1.0.0",
      "batch_size": 1
    }
  ],
  "use_gpu": false,
  "port": 8866,
  "use_multiprocess": false
}
```
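The same configuration can also be generated from a script, which may help when the module name or port has to be templated. The following is a minimal sketch using only the Python standard library. The helper name `build_serving_config` is hypothetical and not part of PaddleHub, and the keys simply mirror the `serving_config.json` shown above.

```python
import json


def build_serving_config(module, version, port=8866, use_gpu=False,
                         use_multiprocess=False, batch_size=1):
    """Hypothetical helper: build a dict shaped like serving_config.json."""
    return {
        "modules_info": [
            {"module": module, "version": version, "batch_size": batch_size}
        ],
        "use_gpu": use_gpu,
        "port": port,
        "use_multiprocess": use_multiprocess,
    }


config = build_serving_config("lac", "1.0.0")
config_text = json.dumps(config, indent=2)
print(config_text)
```

Writing `config_text` to `serving_config.json` gives a file that can be passed to `hub serving start -c serving_config.json`.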
A successful start looks like this:
<p align="center">
<img src="../demo/serving/module_serving/img/start_serving_lac.png" width="100%" />
</p>
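This deploys the lac online word segmentation service on port 8866.
*The warning shown here comes from Flask and does not affect usage.*
### Step2: Call the lac prediction API
Once the service is deployed, we can test it. The test texts are `今天是个好日子` and `天气预报说今天要下雨`.
The client code is as follows: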
```python
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Specify the texts to predict and build the dict {"text": [text_1, text_2, ... ]}
    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
    text = {"text": text_list}
    # Set the prediction method to lac and send a POST request
    url = "http://127.0.0.1:8866/predict/text/lac"
    r = requests.post(url=url, data=text)

    # Print the prediction result
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
```
Running it produces the following result:
```python
{
    "results": [
        {
            "tag": [
                "TIME", "v", "q", "n"
            ],
            "word": [
                "今天", "是", "个", "好日子"
            ]
        },
        {
            "tag": [
                "n", "v", "TIME", "v", "v"
            ],
            "word": [
                "天气预报", "说", "今天", "要", "下雨"
            ]
        }
    ]
}
```
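The response keeps words and tags in two parallel lists, so a client usually has to pair them up before use. The sketch below shows one way to do that under Python 3. The `response` dict is copied from the sample output above, and `pair_words_with_tags` is a hypothetical helper, not a PaddleHub API.

```python
# Sample response copied from the output shown above.
response = {
    "results": [
        {"tag": ["TIME", "v", "q", "n"],
         "word": ["今天", "是", "个", "好日子"]},
        {"tag": ["n", "v", "TIME", "v", "v"],
         "word": ["天气预报", "说", "今天", "要", "下雨"]},
    ]
}


def pair_words_with_tags(response):
    """Hypothetical helper: one list of (word, tag) tuples per input text."""
    return [list(zip(item["word"], item["tag"]))
            for item in response["results"]]


pairs = pair_words_with_tags(response)
for sentence in pairs:
    # Print each sentence as word/tag tokens separated by spaces.
    print(" ".join("%s/%s" % (word, tag) for word, tag in sentence))
```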
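## Demo: one-click serving for other models
For more one-click serving scenarios with PaddleHub Serving, see the following demos:
* [Image classification with vgg11_imagenet](../demo/serving/module_serving/classification_vgg11_imagenet)
PaddleHub Serving currently supports serving every PaddleHub model that can be used directly for prediction, including NLP models such as `lac` and `senta_bilstm` and CV models such as `yolov3_darknet53_coco2017` and `vgg16_imagenet`. Serving models produced with the PaddleHub Fine-tune API is planned.
**NOTE:** For details on PaddleHub Serving one-click serving, see [PaddleHub Serving](../../../tutorial/serving.md)
## Demo
For one-click serving scenarios with PaddleHub Serving, see the following demos:
* [Image classification with vgg11_imagenet](../module_serving/classification_vgg11_imagenet)
&emsp;&emsp;This example shows how to use vgg11_imagenet to deploy an image classification service, run online prediction, and retrieve the classification results.
* [Image generation with stgan_celeba](../demo/serving/module_serving/GAN_stgan_celeba)
* [Image generation with stgan_celeba](../module_serving/GAN_stgan_celeba)
&emsp;&emsp;This example shows how to use stgan_celeba to deploy an image generation service, run online prediction, and retrieve generated images in a specified style.
* [Text censorship with porn_detection_lstm](../demo/serving/module_serving/text_censorship_porn_detection_lstm)
* [Text censorship with porn_detection_lstm](../module_serving/text_censorship_porn_detection_lstm)
&emsp;&emsp;This example shows how to use porn_detection_lstm to deploy a service that detects pornographic content in Chinese text, run online prediction, and retrieve whether the text is sensitive together with its confidence.
* [Chinese lexical analysis with lac](../demo/serving/module_serving/lexical_analysis_lac)
* [Chinese lexical analysis with lac](../module_serving/lexical_analysis_lac)
&emsp;&emsp;This example shows how to use lac to deploy a Chinese word segmentation service, run online prediction, and retrieve the segmentation results, which can be adjusted with a user-defined dictionary.
* [Object detection with yolov3_darknet53_coco2017](.../demo/serving/serving/object_detection_yolov3_darknet53_coco2017)
* [Object detection with yolov3_darknet53_coco2017](../module_serving/object_detection_yolov3_darknet53_coco2017)
&emsp;&emsp;This example shows how to use yolov3_darknet53_coco2017 to deploy an object detection service, run online prediction, and retrieve the detection results along with images overlaid with bounding boxes.
* [Chinese semantic analysis with simnet_bow](../demo/serving/module_serving/semantic_model_simnet_bow)
* [Chinese semantic analysis with simnet_bow](../module_serving/semantic_model_simnet_bow)
&emsp;&emsp;This example shows how to use simnet_bow to deploy a Chinese text similarity service, run online prediction, and retrieve the similarity between texts.
* [Image segmentation with deeplabv3p_xception65_humanseg](../demo/serving/module_serving/semantic_segmentation_deeplabv3p_xception65_humanseg)
* [Image segmentation with deeplabv3p_xception65_humanseg](../module_serving/semantic_segmentation_deeplabv3p_xception65_humanseg)
&emsp;&emsp;This example shows how to use deeplabv3p_xception65_humanseg to deploy an image segmentation service, run online prediction, and retrieve the recognition results and the segmented images.
* [Chinese sentiment analysis with simnet_bow](../demo/serving/module_serving/semantic_model_simnet_bow)
* [Chinese sentiment analysis with simnet_bow](../module_serving/semantic_model_simnet_bow)
&emsp;&emsp;This example shows how to use senta_lstm to deploy a Chinese sentiment analysis service, run online prediction, and retrieve the sentiment analysis results.
## Bert Service
In addition to one-click serving of pretrained models, PaddleHub Serving provides `Bert Service`, which supports rapid deployment of models such as ernie_tiny and bert as a reliable online embedding service. For details, see [Bert Service](../../../tutorial/bert_service.md)
In addition to one-click serving of pretrained models, PaddleHub Serving provides `Bert Service`, which supports rapid deployment of models such as ernie_tiny and bert as a reliable online embedding service. For details, see [Bert Service](../../../tutorial/bert_service.md)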
......@@ -6,7 +6,7 @@
Here we walk through deploying an online lexical analysis service with PaddleHub Serving in a few simple steps.
## 2 Start PaddleHub Serving
## Step1: Start PaddleHub Serving
The start command is as follows
```shell
$ hub serving start -m lac
......
......@@ -3,7 +3,7 @@ import requests
import json
if __name__ == "__main__":
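# Specify the texts used for for prediction and build the dict {"text": [text_1, text_2, ... ]}
# Specify the texts used for prediction and build the dict {"text": [text_1, text_2, ... ]}
text_list = ["今天是个好日子", "天气预报说今天要下雨"]
text = {"text": text_list}
# Set the prediction method to lac and send a POST request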
......
......@@ -3,7 +3,7 @@ import requests
import json
if __name__ == "__main__":
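# Specify the texts used for for prediction and build the dict {"text": [text_1, text_2, ... ]}
# Specify the texts used for prediction and build the dict {"text": [text_1, text_2, ... ]}
text_list = ["今天是个好日子", "天气预报说今天要下雨"]
text = {"text": text_list}
# Specify the custom dictionary {"user_dict": dict.txt}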
......
......@@ -3,7 +3,7 @@ import requests
import json
if __name__ == "__main__":
# Specify the texts used for for matching and build the dict {"text_1": [text_a1, text_a2, ... ]
# Specify the texts used for matching and build the dict {"text_1": [text_a1, text_a2, ... ]
# "text_2": [text_b1, text_b2, ... ]}
text = {
"text_1": ["这道题太难了", "这道题太难了", "这道题太难了"],
......
......@@ -3,7 +3,7 @@ import requests
import json
if __name__ == "__main__":
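# Specify the texts used for for prediction and build the dict {"text": [text_1, text_2, ... ]}
# Specify the texts used for prediction and build the dict {"text": [text_1, text_2, ... ]}
text_list = ["我不爱吃甜食", "我喜欢躺在床上看电影"]
text = {"text": text_list}
# Set the prediction method to senta_lstm and send a POST request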
......
......@@ -3,7 +3,7 @@ import requests
import json
if __name__ == "__main__":
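# Specify the texts used for for prediction and build the dict {"text": [text_1, text_2, ... ]}
# Specify the texts used for prediction and build the dict {"text": [text_1, text_2, ... ]}
text_list = ["黄片下载", "中国黄页"]
text = {"text": text_list}
# Set the prediction method to lac and send a POST request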
......
......@@ -26,14 +26,22 @@ import paddlehub as hub
from paddlehub.commands.base_command import BaseCommand, ENTRY
from paddlehub.serving import app_single as app
import multiprocessing
import gunicorn.app.base
if platform.system() == "Windows":
def number_of_workers():
return (multiprocessing.cpu_count() * 2) + 1
class StandaloneApplication(object):
def __init__(self):
pass
def load_config(self):
pass
def load(self):
pass
else:
import gunicorn.app.base
class StandaloneApplication(gunicorn.app.base.BaseApplication):
class StandaloneApplication(gunicorn.app.base.BaseApplication):
def __init__(self, app, options=None):
self.options = options or {}
self.application = app
......@@ -52,6 +60,10 @@ class StandaloneApplication(gunicorn.app.base.BaseApplication):
return self.application
def number_of_workers():
return (multiprocessing.cpu_count() * 2) + 1
class ServingCommand(BaseCommand):
name = "serving"
module_list = []
......
......@@ -29,7 +29,7 @@ import tarfile
from paddlehub.common import utils
from paddlehub.common.logger import logger
__all__ = ['Downloader']
__all__ = ['Downloader', 'progress']
FLUSH_INTERVAL = 0.1
lasttime = time.time()
......
......@@ -26,7 +26,7 @@ from paddlehub.common.downloader import default_downloader
from paddlehub.common.logger import logger
class BaseCVDatast(BaseDataset):
class BaseCVDataset(BaseDataset):
def __init__(self,
base_path,
train_list_file=None,
......@@ -35,7 +35,7 @@ class BaseCVDatast(BaseDataset):
predict_list_file=None,
label_list_file=None,
label_list=None):
super(BaseCVDatast, self).__init__(
super(BaseCVDataset, self).__init__(
base_path=base_path,
train_file=train_list_file,
dev_file=validate_list_file,
......@@ -65,7 +65,7 @@ class BaseCVDatast(BaseDataset):
return data
# discarded. please use BaseCVDatast
# discarded. please use BaseCVDataset
class ImageClassificationDataset(object):
def __init__(self):
logger.warning(
......
......@@ -21,9 +21,10 @@ import io
import csv
from paddlehub.dataset import InputExample, BaseDataset
from paddlehub.common.logger import logger
class BaseNLPDatast(BaseDataset):
class BaseNLPDataset(BaseDataset):
def __init__(self,
base_path,
train_file=None,
......@@ -32,11 +33,11 @@ class BaseNLPDatast(BaseDataset):
predict_file=None,
label_file=None,
label_list=None,
train_file_with_head=False,
dev_file_with_head=False,
test_file_with_head=False,
predict_file_with_head=False):
super(BaseNLPDatast, self).__init__(
train_file_with_header=False,
dev_file_with_header=False,
test_file_with_header=False,
predict_file_with_header=False):
super(BaseNLPDataset, self).__init__(
base_path=base_path,
train_file=train_file,
dev_file=dev_file,
......@@ -44,25 +45,24 @@ class BaseNLPDatast(BaseDataset):
predict_file=predict_file,
label_file=label_file,
label_list=label_list,
train_file_with_head=train_file_with_head,
dev_file_with_head=dev_file_with_head,
test_file_with_head=test_file_with_head,
predict_file_with_head=predict_file_with_head)
train_file_with_header=train_file_with_header,
dev_file_with_header=dev_file_with_header,
test_file_with_header=test_file_with_header,
predict_file_with_header=predict_file_with_header)
def _read_file(self, input_file, phase=None):
"""Reads a tab separated value file."""
has_warned = False
with io.open(input_file, "r", encoding="UTF-8") as file:
reader = csv.reader(file, delimiter="\t", quotechar=None)
examples = []
for (i, line) in enumerate(reader):
if i == 0:
ncol = len(line)
if self.if_file_with_head[phase]:
if self.if_file_with_header[phase]:
continue
if ncol == 1:
if phase != "predict":
example = InputExample(guid=i, text_a=line[0])
else:
if ncol == 1:
raise Exception(
"the %s file: %s only has one column but it is not a predict file"
% (phase, input_file))
......@@ -71,10 +71,28 @@ class BaseNLPDatast(BaseDataset):
guid=i, text_a=line[0], label=line[1])
elif ncol == 3:
example = InputExample(
guid=i, text_a=line[0], text_b=line[1], label=line[2])
guid=i,
text_a=line[0],
text_b=line[1],
label=line[2])
else:
raise Exception(
"the %s file: %s has too many columns (should <=3)"
% (phase, input_file))
else:
if ncol == 1:
example = InputExample(guid=i, text_a=line[0])
elif ncol == 2:
if not has_warned:
logger.warning(
"the predict file: %s has 2 columns, as it is a predict file, the second one will be regarded as text_b"
% (input_file))
has_warned = True
example = InputExample(
guid=i, text_a=line[0], text_b=line[1])
else:
raise Exception(
"the %s file: %s has too many columns (should <=3)" %
(phase, input_file))
"the predict file: %s has too many columns (should <=2)"
% (input_file))
examples.append(example)
return examples
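The column rules in this hunk can be summarized as: train, dev and test files carry a label in the last column, while a predict file carries only text. The sketch below restates those rules in a self-contained Python 3 form. It is a simplification, not the PaddleHub implementation: `Example` is a stand-in for `InputExample` with assumed fields, `read_tsv` is a hypothetical function, and the column count is checked per row rather than once on the first row.

```python
import csv
import io
from collections import namedtuple

# Stand-in for paddlehub.dataset.InputExample (assumed fields, illustration only).
Example = namedtuple("Example", ["guid", "text_a", "text_b", "label"])


def read_tsv(text, phase, with_header=False):
    """Parse tab-separated text into examples using the column rules above."""
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    examples = []
    for i, line in enumerate(reader):
        if i == 0 and with_header:
            continue  # skip the header row
        ncol = len(line)
        if phase != "predict":
            # Labeled data: the last column is the label.
            if ncol == 2:
                example = Example(i, line[0], None, line[1])
            elif ncol == 3:
                example = Example(i, line[0], line[1], line[2])
            else:
                raise ValueError(
                    "%s data needs 2 or 3 columns, got %d" % (phase, ncol))
        else:
            # Predict data: text only, a second column is treated as text_b.
            if ncol == 1:
                example = Example(i, line[0], None, None)
            elif ncol == 2:
                example = Example(i, line[0], line[1], None)
            else:
                raise ValueError(
                    "predict data allows at most 2 columns, got %d" % ncol)
        examples.append(example)
    return examples


train = read_tsv("text_a\tlabel\n这道题太难了\t0\n", phase="train",
                 with_header=True)
predict = read_tsv("今天是个好日子\n", phase="predict")
print(train)
print(predict)
```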
......@@ -20,10 +20,10 @@ from __future__ import print_function
import os
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
class BQ(BaseNLPDatast):
class BQ(BaseNLPDataset):
def __init__(self):
dataset_dir = os.path.join(DATA_HOME, "bq")
base_path = self._download_dataset(
......
......@@ -23,10 +23,10 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
class ChnSentiCorp(BaseNLPDatast):
class ChnSentiCorp(BaseNLPDataset):
"""
ChnSentiCorp (by Tan Songbo at ICT of Chinese Academy of Sciences, and for
opinion mining)
......
......@@ -20,7 +20,7 @@ import os
from paddlehub.reader import tokenization
from paddlehub.common.dir import DATA_HOME
from paddlehub.common.logger import logger
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/cmrc2018.tar.gz"
SPIECE_UNDERLINE = '▁'
......@@ -62,7 +62,7 @@ class CMRC2018Example(object):
return s
class CMRC2018(BaseNLPDatast):
class CMRC2018(BaseNLPDataset):
"""A single set of features of data."""
def __init__(self):
......
......@@ -64,10 +64,10 @@ class BaseDataset(object):
predict_file=None,
label_file=None,
label_list=None,
train_file_with_head=False,
dev_file_with_head=False,
test_file_with_head=False,
predict_file_with_head=False):
train_file_with_header=False,
dev_file_with_header=False,
test_file_with_header=False,
predict_file_with_header=False):
if not (train_file or dev_file or test_file):
raise ValueError("At least one file should be assigned")
self.base_path = base_path
......@@ -83,11 +83,11 @@ class BaseDataset(object):
self.test_examples = []
self.predict_examples = []
self.if_file_with_head = {
"train": train_file_with_head,
"dev": dev_file_with_head,
"test": test_file_with_head,
"predict": predict_file_with_head
self.if_file_with_header = {
"train": train_file_with_header,
"dev": dev_file_with_header,
"test": test_file_with_header,
"predict": predict_file_with_header
}
if train_file:
......@@ -128,7 +128,7 @@ class BaseDataset(object):
def num_labels(self):
return len(self.label_list)
# To compatibility with the usage of ImageClassificationDataset
# To be compatible with ImageClassificationDataset
def label_dict(self):
return {index: key for index, key in enumerate(self.label_list)}
......
......@@ -20,10 +20,10 @@ from __future__ import print_function
import os
import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDatast
from paddlehub.dataset.base_cv_dataset import BaseCVDataset
class DogCatDataset(BaseCVDatast):
class DogCatDataset(BaseCVDataset):
def __init__(self):
dataset_path = os.path.join(hub.common.dir.DATA_HOME, "dog-cat")
base_path = self._download_dataset(
......
......@@ -20,7 +20,7 @@ import os
from paddlehub.reader import tokenization
from paddlehub.common.dir import DATA_HOME
from paddlehub.common.logger import logger
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/drcd.tar.gz"
SPIECE_UNDERLINE = '▁'
......@@ -62,7 +62,7 @@ class DRCDExample(object):
return s
class DRCD(BaseNLPDatast):
class DRCD(BaseNLPDataset):
"""A single set of features of data."""
def __init__(self):
......
......@@ -20,10 +20,10 @@ from __future__ import print_function
import os
import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDatast
from paddlehub.dataset.base_cv_dataset import BaseCVDataset
class FlowersDataset(BaseCVDatast):
class FlowersDataset(BaseCVDataset):
def __init__(self):
dataset_path = os.path.join(hub.common.dir.DATA_HOME, "flower_photos")
base_path = self._download_dataset(
......
......@@ -20,10 +20,10 @@ from __future__ import print_function
import os
import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDatast
from paddlehub.dataset.base_cv_dataset import BaseCVDataset
class Food101Dataset(BaseCVDatast):
class Food101Dataset(BaseCVDataset):
def __init__(self):
dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101",
"images")
......
......@@ -24,12 +24,12 @@ import io
from paddlehub.dataset import InputExample
from paddlehub.common.logger import logger
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/glue_data.tar.gz"
class GLUE(BaseNLPDatast):
class GLUE(BaseNLPDataset):
"""
Please refer to
https://gluebenchmark.com
......
......@@ -22,12 +22,12 @@ import os
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/iflytek.tar.gz"
class IFLYTEK(BaseNLPDatast):
class IFLYTEK(BaseNLPDataset):
def __init__(self):
dataset_dir = os.path.join(DATA_HOME, "iflytek")
base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
......
......@@ -20,10 +20,10 @@ from __future__ import print_function
import os
import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDatast
from paddlehub.dataset.base_cv_dataset import BaseCVDataset
class Indoor67Dataset(BaseCVDatast):
class Indoor67Dataset(BaseCVDataset):
def __init__(self):
dataset_path = os.path.join(hub.common.dir.DATA_HOME, "Indoor67")
base_path = self._download_dataset(
......
......@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/inews.tar.gz"
class INews(BaseNLPDatast):
class INews(BaseNLPDataset):
"""
INews is a sentiment analysis dataset for Internet News
"""
......
......@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/lcqmc.tar.gz"
class LCQMC(BaseNLPDatast):
class LCQMC(BaseNLPDataset):
def __init__(self):
dataset_dir = os.path.join(DATA_HOME, "lcqmc")
base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
......
......@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/msra_ner.tar.gz"
class MSRA_NER(BaseNLPDatast):
class MSRA_NER(BaseNLPDataset):
"""
A set of manually annotated Chinese word-segmentation data and
specifications for training and testing a Chinese word-segmentation system
......
......@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/nlpcc-dbqa.tar.gz"
class NLPCC_DBQA(BaseNLPDatast):
class NLPCC_DBQA(BaseNLPDataset):
"""
Please refer to
http://tcci.ccf.org.cn/conference/2017/dldoc/taskgline05.pdf
......
......@@ -20,7 +20,7 @@ import os
from paddlehub.reader import tokenization
from paddlehub.common.dir import DATA_HOME
from paddlehub.common.logger import logger
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/squad.tar.gz"
......@@ -65,7 +65,7 @@ class SquadExample(object):
return s
class SQUAD(BaseNLPDatast):
class SQUAD(BaseNLPDataset):
"""A single set of features of data."""
def __init__(self, version_2_with_negative=False):
......
......@@ -20,10 +20,10 @@ from __future__ import print_function
import os
import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDatast
from paddlehub.dataset.base_cv_dataset import BaseCVDataset
class StanfordDogsDataset(BaseCVDatast):
class StanfordDogsDataset(BaseCVDataset):
def __init__(self):
dataset_path = os.path.join(hub.common.dir.DATA_HOME,
"StanfordDogs-120")
......
......@@ -22,12 +22,12 @@ import os
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/thucnews.tar.gz"
class THUCNEWS(BaseNLPDatast):
class THUCNEWS(BaseNLPDataset):
def __init__(self):
dataset_dir = os.path.join(DATA_HOME, "thucnews")
base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
......
......@@ -22,12 +22,12 @@ import pandas as pd
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/toxic.tar.gz"
class Toxic(BaseNLPDatast):
class Toxic(BaseNLPDataset):
"""
The kaggle Toxic dataset:
https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
......
......@@ -25,12 +25,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/XNLI-lan.tar.gz"
class XNLI(BaseNLPDatast):
class XNLI(BaseNLPDataset):
"""
Please refer to
https://arxiv.org/pdf/1809.05053.pdf
......
......@@ -24,7 +24,12 @@ import copy
import logging
import inspect
from functools import partial
from collections import OrderedDict
import six
if six.PY2:
from inspect import getargspec as get_args
else:
from inspect import getfullargspec as get_args
import numpy as np
import paddle.fluid as fluid
from tb_paddle import SummaryWriter
......@@ -84,44 +89,44 @@ class RunEnv(object):
class TaskHooks():
def __init__(self):
self._registered_hooks = {
"build_env_start": {},
"build_env_end": {},
"finetune_start": {},
"finetune_end": {},
"predict_start": {},
"predict_end": {},
"eval_start": {},
"eval_end": {},
"log_interval": {},
"save_ckpt_interval": {},
"eval_interval": {},
"run_step": {},
"build_env_start_event": OrderedDict(),
"build_env_end_event": OrderedDict(),
"finetune_start_event": OrderedDict(),
"finetune_end_event": OrderedDict(),
"predict_start_event": OrderedDict(),
"predict_end_event": OrderedDict(),
"eval_start_event": OrderedDict(),
"eval_end_event": OrderedDict(),
"log_interval_event": OrderedDict(),
"save_ckpt_interval_event": OrderedDict(),
"eval_interval_event": OrderedDict(),
"run_step_event": OrderedDict(),
}
self._hook_params_num = {
"build_env_start": 1,
"build_env_end": 1,
"finetune_start": 1,
"finetune_end": 2,
"predict_start": 1,
"predict_end": 2,
"eval_start": 1,
"eval_end": 2,
"log_interval": 2,
"save_ckpt_interval": 1,
"eval_interval": 1,
"run_step": 2,
"build_env_start_event": 1,
"build_env_end_event": 1,
"finetune_start_event": 1,
"finetune_end_event": 2,
"predict_start_event": 1,
"predict_end_event": 2,
"eval_start_event": 1,
"eval_end_event": 2,
"log_interval_event": 2,
"save_ckpt_interval_event": 1,
"eval_interval_event": 1,
"run_step_event": 2,
}
def add(self, hook_type, name=None, func=None):
if not func or not callable(func):
raise TypeError(
"The hook function is empty or it is not a function")
if name and not isinstance(name, str):
raise TypeError("The hook name must be a string")
if not name:
if name is None:
name = "hook_%s" % id(func)
# check validity
if not isinstance(name, str) or name.strip() == "":
raise TypeError("The hook name must be a non-empty string")
if hook_type not in self._registered_hooks:
raise ValueError("hook_type: %s does not exist" % (hook_type))
if name in self._registered_hooks[hook_type]:
......@@ -129,7 +134,7 @@ class TaskHooks():
"name: %s has existed in hook_type:%s, use modify method to modify it"
% (name, hook_type))
else:
args_num = len(inspect.getfullargspec(func).args)
args_num = len(get_args(func).args)
if args_num != self._hook_params_num[hook_type]:
raise ValueError(
"The number of parameters to the hook hook_type:%s should be %i"
......@@ -163,13 +168,13 @@ class TaskHooks():
else:
return True
def info(self, only_customized=True):
def info(self, show_default=False):
# formatted output the source code
ret = ""
for hook_type, hooks in self._registered_hooks.items():
already_print_type = False
for name, func in hooks.items():
if name == "default" and only_customized:
if name == "default" and not show_default:
continue
if not already_print_type:
ret += "hook_type: %s{\n" % hook_type
......@@ -182,7 +187,7 @@ class TaskHooks():
if already_print_type:
ret += "}\n"
if not ret:
ret = "Not any hooks when only_customized=%s" % only_customized
ret = "Not any customized hooks have been defined, you can set show_default=True to see the default hooks information"
return ret
def __getitem__(self, hook_type):
......@@ -259,8 +264,8 @@ class BaseTask(object):
self._hooks = TaskHooks()
for hook_type, event_hooks in self._hooks._registered_hooks.items():
self._hooks.add(hook_type, "default",
eval("self._default_%s_event" % hook_type))
setattr(BaseTask, "_%s_event" % hook_type,
eval("self._default_%s" % hook_type))
setattr(BaseTask, "_%s" % hook_type,
self.create_event_function(hook_type))
# accelerate predict
......@@ -581,13 +586,18 @@ class BaseTask(object):
return self._hooks.info(only_customized)
def add_hook(self, hook_type, name=None, func=None):
if name is None:
name = "hook_%s" % id(func)
self._hooks.add(hook_type, name=name, func=func)
logger.info("Add hook %s:%s successfully" % (hook_type, name))
def delete_hook(self, hook_type, name):
self._hooks.delete(hook_type, name)
logger.info("Delete hook %s:%s successfully" % (hook_type, name))
def modify_hook(self, hook_type, name, func):
self._hooks.modify(hook_type, name, func)
logger.info("Modify hook %s:%s successfully" % (hook_type, name))
def _default_build_env_start_event(self):
pass
......
......@@ -142,7 +142,7 @@ class ClassifierTask(BaseTask):
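The hook mechanism touched by these hunks boils down to an ordered registry per event, with a check that each hook takes the expected number of arguments. The following is a minimal Python 3 sketch of that idea, not the PaddleHub code: `HookRegistry` and its `run` method are hypothetical names, and the event names and parameter counts are passed in by the caller.

```python
import inspect
from collections import OrderedDict


class HookRegistry(object):
    """Simplified stand-in for TaskHooks: ordered hooks with an arity check."""

    def __init__(self, params_num):
        # params_num maps each hook type to the number of arguments it expects.
        self._params_num = dict(params_num)
        self._hooks = {name: OrderedDict() for name in params_num}

    def add(self, hook_type, name=None, func=None):
        if not callable(func):
            raise TypeError("The hook must be callable")
        if name is None:
            name = "hook_%s" % id(func)
        if hook_type not in self._hooks:
            raise ValueError("hook_type: %s does not exist" % hook_type)
        if name in self._hooks[hook_type]:
            raise ValueError("name: %s already exists" % name)
        args_num = len(inspect.getfullargspec(func).args)
        expected = self._params_num[hook_type]
        if args_num != expected:
            raise ValueError(
                "expected %d parameters, got %d" % (expected, args_num))
        self._hooks[hook_type][name] = func
        return name

    def run(self, hook_type, *args):
        # Hooks fire in registration order because OrderedDict preserves it.
        return [func(*args) for func in self._hooks[hook_type].values()]


registry = HookRegistry({"finetune_start_event": 1, "finetune_end_event": 2})
registry.add("finetune_start_event", "banner", lambda task: "start %s" % task)
registry.add("finetune_start_event", "timer", lambda task: "timing %s" % task)
results = registry.run("finetune_start_event", "demo-task")
print(results)
```

Keeping the hooks in an `OrderedDict` makes the firing order deterministic, which matters once several hooks are attached to the same event.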
}
except:
raise Exception(
"ImageClassificationDataset does not support postprocessing, please use BaseCVDatast instead"
"ImageClassificationDataset does not support postprocessing, please use BaseCVDataset instead"
)
results = []
for batch_state in run_states:
......
......@@ -26,6 +26,7 @@ import json
from collections import OrderedDict
import io
import numpy as np
import paddle.fluid as fluid
from .base_task import BaseTask
......@@ -517,13 +518,13 @@ class ReadingComprehensionTask(BaseTask):
null_score_diff_threshold=self.null_score_diff_threshold,
is_english=self.is_english)
if self.phase == 'val' or self.phase == 'dev':
with open(
with io.open(
self.data_reader.dataset.dev_path, 'r',
encoding="utf8") as dataset_file:
dataset_json = json.load(dataset_file)
dataset = dataset_json['data']
elif self.phase == 'test':
with open(
with io.open(
self.data_reader.dataset.test_path, 'r',
encoding="utf8") as dataset_file:
dataset_json = json.load(dataset_file)
......
......@@ -168,8 +168,7 @@ class LocalModuleManager(object):
with tarfile.open(module_package, "r:gz") as tar:
file_names = tar.getnames()
size = len(file_names) - 1
module_dir = os.path.split(file_names[0])[0]
module_dir = os.path.join(_dir, module_dir)
module_dir = os.path.join(_dir, file_names[0])
for index, file_name in enumerate(file_names):
tar.extract(file_name, _dir)
......@@ -195,7 +194,7 @@ class LocalModuleManager(object):
save_path = os.path.join(MODULE_HOME, module_name)
if os.path.exists(save_path):
shutil.move(save_path)
shutil.rmtree(save_path)
if from_user_dir:
shutil.copytree(module_dir, save_path)
else:
......
......@@ -37,6 +37,7 @@ from paddlehub.common.lock import lock
from paddlehub.common.logger import logger
from paddlehub.common.hub_server import CacheUpdater
from paddlehub.common import tmp_dir
from paddlehub.common.downloader import progress
from paddlehub.module import module_desc_pb2
from paddlehub.module.manager import default_module_manager
from paddlehub.module.checker import ModuleChecker
......@@ -99,10 +100,22 @@ def create_module(directory, name, author, email, module_type, summary,
_cwd = os.getcwd()
os.chdir(base_dir)
for dirname, _, files in os.walk(module_dir):
for file in files:
tar.add(os.path.join(dirname, file).replace(base_dir, "."))
module_dir = module_dir.replace(base_dir, ".")
tar.add(module_dir, recursive=False)
files = []
for dirname, _, subfiles in os.walk(module_dir):
for file in subfiles:
files.append(os.path.join(dirname, file))
total_length = len(files)
print("Create Module {}-{}".format(name, version))
for index, file in enumerate(files):
done = int(float(index) / total_length * 50)
progress("[%-50s] %.2f%%" % ('=' * done,
float(index) / total_length * 100))
tar.add(file)
progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
print("Module package saved as {}".format(save_file))
os.chdir(_cwd)
......
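The packaging loop in this hunk prints a fixed-width text bar followed by a percentage. The sketch below isolates that formatting so it can be checked on its own. `render_progress` is a hypothetical helper, not part of PaddleHub, and it converts to float before dividing so the percentage is also correct under Python 2 integer division.

```python
def render_progress(index, total, width=50):
    """Return a text bar such as '[==  ] 50.00%' for item `index` of `total`."""
    fraction = float(index) / total  # float() first, so Python 2 does not truncate
    done = int(fraction * width)
    # %-*s left-justifies the bar in a field whose width comes from the arguments.
    return "[%-*s] %.2f%%" % (width, "=" * done, fraction * 100)


bars = [render_progress(i, 4) for i in range(5)]
for bar in bars:
    print(bar)
```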
......@@ -170,7 +170,7 @@ class WSSPTokenizer(object):
self.inv_vocab = {v: k for k, v in self.vocab.items()}
self.ws = ws
self.lower = lower
self.dict = pickle.load(open(word_dict, 'rb'), encoding='utf8')
self.dict = pickle.load(open(word_dict, 'rb'))
self.sp_model = spm.SentencePieceProcessor()
self.window_size = 5
self.sp_model.Load(sp_model_dir)
......
......@@ -30,7 +30,7 @@
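Setting up a service with Bert Service takes three main steps:
## Step1: Environment preparation
## Step1: Prepare the environment
### Environment requirements
The table below lists the environment requirements for `Bert Service`. Items marked with * are optional dependencies that can be installed as needed.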
......@@ -40,8 +40,7 @@
|PaddleHub|>=1.4.0|None|
|PaddlePaddle|>=1.6.1|Use the PaddlePaddle-gpu build for GPU computation|
|GCC|>=4.8|None|
|CUDA*|>=8|CUDA 8 or later is required when using a GPU|
|paddle-gpu-serving*|>=0.8.0|Required on the `Bert Service` server side|
|paddle-gpu-serving*|>=0.8.2|Required on the `Bert Service` server side|
|ujson*|>=1.35|Required on the `Bert Service` client side|
### Installation steps
......@@ -84,7 +83,7 @@ $ pip install ujson
|[bert_chinese_L-12_H-768_A-12](https://paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)|BERT|
## Step2: Server
## Step2: Start the server
### Overview
The server receives the data sent by the client, runs the model computation, and returns the result to the client.
......@@ -130,7 +129,7 @@ Paddle Inference Server exit successfully!
```
## Step3: Client side (client)
## Step3: Start the client
### Introduction
The client takes text data as input and obtains from the server the embedding computed by the model.
......@@ -197,11 +196,11 @@ input_text = [["西风吹老洞庭波"], ["一夜湘君白发多"], ["醉后不
```python
result = bc.get_result(input_text=input_text)
```
Finally, the embedding result is obtained (only part of the result is shown here).
This yields the embedding result (only part of the result is shown here).
```python
[[0.9993321895599361, 0.9994612336158751, 0.9999646544456481, 0.732795298099517, -0.34387934207916204, ... ]]
```
For the client demo code, see the [example](../paddlehub/serving/bert_serving/bert_service.py).
For the client demo code, see the [example](../demo/serving/bert_service/bert_service_client.py).
Run it with the following command:
```shell
$ python bert_service_client.py
......
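# PaddleHub Serving: one-click model service deployment
## Introduction
### Why use one-click service deployment
PaddleHub makes transfer learning and model prediction fast, but developers often need to deploy a trained model online. Whether exposing a service port publicly or setting up a prediction service on a LAN, PaddleHub needs to be able to deploy model prediction services quickly. PaddleHub Serving, a one-click model service deployment tool, was created to meet this need. With a single command, developers get a prediction service API without having to choose or implement a web framework.
PaddleHub makes model prediction fast, but developers often need to move a local prediction workflow online. Whether exposing a service port publicly or setting up a prediction service on a LAN, PaddleHub needs to be able to deploy model prediction services quickly. PaddleHub Serving, a one-click model service deployment tool, was created to meet this need. With a single command, developers can quickly start an online model prediction service without having to choose or implement a web framework.
### What is one-click service deployment
PaddleHub Serving is a one-click model service deployment tool built on PaddleHub. It starts an online model prediction service through the simple Hub command-line tool. The front end handles network requests with Flask and Gunicorn, and the back end calls the PaddleHub prediction interface directly. It also supports a multi-process mode that uses multiple cores to increase concurrency and keep the prediction service performant.
### Supported models
PaddleHub Serving currently supports service deployment for all PaddleHub models that can be used directly for prediction, including nlp models such as `lac` and `senta_bilstm` and cv models such as `yolov3_coco2017` and `vgg16_imagenet`. Support for quick service deployment of models obtained with the PaddleHub Fine-tune API is planned.
### Required environment
The table below lists the environment requirements and notes for PaddleHub Serving.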
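|Item|Recommended version|Notes|
|:-:|:-:|:-:|
|Operating system|Linux/Darwin/Windows|Linux or Darwin is recommended; they have better support for the multi-threaded startup mode|
|PaddleHub|>=1.4.0|None|
|PaddlePaddle|>=1.6.1|For GPU computation, use the corresponding PaddlePaddle-gpu version|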
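PaddleHub Serving currently supports service deployment for all PaddleHub models that can be used directly for prediction, including NLP models such as `lac` and `senta_bilstm` and CV models such as `yolov3_darknet53_coco2017` and `vgg16_imagenet`. For more models, see the [list of models supported by PaddleHub](https://paddlepaddle.org.cn/hublist). Support for quick service deployment of models obtained with the PaddleHub Fine-tune API is planned.
## Usage
### Step1: Start the server deployment
PaddleHub Serving can be started in two ways: with a command-line command or with a configuration file.
PaddleHub Serving can be started in two ways: from the command line or with a configuration file.
#### Starting with a command-line command
Start command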
......@@ -37,7 +28,7 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
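|--modules/-m|Models preinstalled for PaddleHub Serving, listed as multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*|
|--port/-p|Service port; defaults to 8866|
|--use_gpu|Use the GPU for prediction; requires paddlepaddle-gpu to be installed|
|--use_multiprocess|Whether to enable concurrent mode; defaults to single-process mode|
|--use_multiprocess|Whether to enable concurrent mode; defaults to single-process mode. Recommended on multi-core CPU machines<br>*`Windows supports single-process mode only`*|
#### Starting with a configuration file
Start command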
......@@ -60,8 +51,8 @@ $ hub serving start --config config.json
"batch_size": "BATCH_SIZE_2"
}
],
"use_gpu": false,
"port": 8866,
"use_gpu": false,
"use_multiprocess": false
}
```
......@@ -70,10 +61,10 @@ $ hub serving start --config config.json
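|Parameter|Purpose|
|-|-|
|--modules_info|Models preinstalled for PaddleHub Serving, listed as a list of dictionaries, where:<br>`module` is the name of the model used by the prediction service<br>`version` is the version of the prediction model<br>`batch_size` is the prediction batch size
|--use_gpu|Use the GPU for prediction; requires paddlepaddle-gpu to be installed|
|--port/-p|Service port; defaults to 8866|
|--use_multiprocess|Whether to enable concurrent mode; defaults to single-process mode. Recommended on multi-core CPU machines|
|modules_info|Models preinstalled for PaddleHub Serving, listed as a list of dictionaries, where:<br>`module` is the name of the model used by the prediction service<br>`version` is the version of the prediction model<br>`batch_size` is the prediction batch size
|port|Service port; defaults to 8866|
|use_gpu|Use the GPU for prediction; requires paddlepaddle-gpu to be installed|
|use_multiprocess|Whether to enable concurrent mode; defaults to single-process mode. Recommended on multi-core CPU machines<br>*`Windows supports single-process mode only`*|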
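The configuration file described by this table can also be generated from a script. The sketch below builds the dictionary in Python and writes it as JSON. The key names come from the table; the `build_config` helper, the rule that only `module` is mandatory, and the sample `lac` entry with version `1.0.0` and batch size 20 are illustrative assumptions.

```python
import json


def build_config(modules_info, port=8866, use_gpu=False,
                 use_multiprocess=False):
    """Assemble a config dict with the keys listed in the table."""
    for entry in modules_info:
        # Assumption: every entry must at least name its module.
        if "module" not in entry:
            raise ValueError("each modules_info entry needs 'module'")
    return {
        "modules_info": list(modules_info),
        "port": port,
        "use_gpu": use_gpu,
        "use_multiprocess": use_multiprocess,
    }


if __name__ == "__main__":
    # Illustrative values only; check the module's real version.
    config = build_config(
        [{"module": "lac", "version": "1.0.0", "batch_size": 20}])
    with open("config.json", "w") as f:
        json.dump(config, f, indent=2)
```

The resulting file would then be passed to `hub serving start --config config.json`.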
### Step2: Access the server
......@@ -99,7 +90,7 @@ http://0.0.0.0:8866/predict/<CATEGORY\>/\<MODULE>
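### Step1: Deploy the lac online service
Now we will deploy a lac online service so that word segmentation results for text can be obtained through an interface.
First, as described in section 2.1, the two ways to start the PaddleHub Serving server are:
First, choose either startup method. The two methods are: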
```shell
$ hub serving start -m lac
```
......@@ -148,7 +139,7 @@ if __name__ == "__main__":
text_list = ["今天是个好日子", "天气预报说今天要下雨"]
text = {"text": text_list}
# Set the prediction method to lac and send a POST request
url = "http://127.0.0.1:8866/predict/text/lac"
url = "http://0.0.0.0:8866/predict/text/lac"
r = requests.post(url=url, data=text)
# Print the prediction result
......@@ -180,6 +171,8 @@ if __name__ == "__main__":
}
```
For details and code for this demo, see [LAC Serving](../demo/serving/module_serving/lexical_analysis_lac). Some other one-click service deployment demos are shown below.
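The client code in this walkthrough posts a form-encoded body with one `text` field per input sentence. As a rough illustration of what goes over the wire, the sketch below builds an equivalent request with only the Python 3 standard library and does not send it. `build_lac_request` is a hypothetical helper, and the loopback address `127.0.0.1` is used on the assumption that the server runs on the same machine.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_lac_request(text_list, host="127.0.0.1", port=8866):
    """Build (but do not send) a POST request for the lac endpoint."""
    url = "http://%s:%d/predict/text/lac" % (host, port)
    # doseq=True repeats the key once per list item: text=a&text=b
    body = urlencode({"text": text_list}, doseq=True).encode("utf-8")
    return Request(url, data=body, method="POST")


req = build_lac_request(["今天是个好日子", "天气预报说今天要下雨"])
print(req.full_url)
# Decode the body again to confirm both sentences survive encoding.
print(parse_qs(req.data.decode("utf-8"))["text"])
```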
## Demo: one-click service deployment for other models
For other PaddleHub Serving one-click service deployment scenarios, see the following demos
......@@ -217,4 +210,4 @@ if __name__ == "__main__":
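&emsp;&emsp;This example shows how to use senta_lstm to deploy Chinese text sentiment analysis as a service, run online prediction, and obtain sentiment analysis results for text.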
## Bert Service
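In addition to one-click service deployment of pretrained models, PaddleHub Serving also provides `Bert Service`, which supports fast deployment of models such as ernie_tiny and bert and offers a reliable online embedding service. For details, see [Bert Service](./bert_service.md).
In addition to one-click service deployment of pretrained models, PaddleHub Serving also provides `Bert Service`, which supports fast deployment of models such as ernie_tiny and bert and offers a reliable online embedding service. For details, see [Bert Service](./bert_service.md).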