Commit 7dbbef9c authored by: Z zhangxuefei

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleHub into develop

...@@ -87,12 +87,13 @@ if __name__ == '__main__':
        add_crf=True)
    # Data to be predicted
    # If using python 2, prefix "u" is necessary
    data = [
        [u"我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
        [u"为了跟踪国际最新食品工艺、流行趋势,大量搜集海外专业书刊资料是提高技艺的捷径。"],
        [u"其中线装古籍逾千册;民国出版物几百种;珍本四册、稀见本四百余册,出版时间跨越三百余年。"],
        [u"有的古木交柯,春机荣欣,从诗人句中得之,而入画中,观之令人心驰。"],
        [u"不过重在晋趣,略增明人气息,妙在集古有道、不露痕迹罢了。"],
    ]
    # Add 0x02 between characters to match the format of training data,
......
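The comment above refers to inserting the 0x02 byte between characters so the input matches the sequence-labeling training-data format. A minimal sketch of that preprocessing step (the helper name is hypothetical, not from the diff):

```python
# Sketch: join the characters of a sentence with the 0x02 control
# character, as the comment above describes for the training-data format.
def add_separators(text):
    # "\x02" is the 0x02 separator inserted between characters
    return "\x02".join(list(text))

sample = add_separators("今天天气好")
print(sample)
```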
...@@ -8,11 +8,11 @@ PaddleHub Serving is a one-command model serving tool built on PaddleHub
PaddleHub Serving provides two major capabilities: embedding-as-a-service via Bert Service, and serving pre-trained models for prediction via inference models. Support for serving models produced with the PaddleHub Fine-tune API is planned.
## Bert Service
`Bert Service` is a solution for rapidly deploying a model as a remote computing service, built on the [Paddle Serving](https://github.com/PaddlePaddle/Serving) framework. It exposes the embedding process through an API, reducing dependence on local machine resources. With PaddleHub, a `Bert Service` can be deployed on a server with one command, and any ordinary machine can then easily fetch text embeddings through the client interface.
For details and a demo, see [Bert Service](../../tutorial/bert_service.md).
This example shows how to use `Bert Service` to deploy a remote embedding service and run online prediction, obtaining embedding results for text.
## One-Command Serving for Pre-trained Models
One-command serving deployment is a PaddleHub-based solution for quickly deploying pre-trained models as services, exposing model prediction through an API.
...@@ -53,4 +53,4 @@ Bert Service is based on the [Paddle Serving](https://github.com/PaddlePaddle/Serving) framework
&emsp;&emsp;This example shows how to use senta_lstm to deploy a Chinese sentiment analysis service and run online prediction, obtaining sentiment results for text.
For details about PaddleHub Serving one-command deployment of pre-trained models, see [Module Serving](module_serving).
...@@ -68,5 +68,4 @@ Paddle Inference Server exit successfully!
With that, we deployed `Bert Service` on a single GPU machine and tested it from another ordinary machine, showing that `Bert Service` makes it easy to quickly stand up an online embedding service.
## One-Command Serving for Pre-trained Models
Besides `Bert Service`, PaddleHub Serving also supports one-command serving deployment of pre-trained models, letting you quickly bring a pre-trained model online as a reliable prediction service. For details, see [Module Serving](../../../tutorial/serving.md).
# PaddleHub Serving: One-Command Model Serving
## Introduction
### Why one-command serving deployment
PaddleHub makes model prediction fast, but developers often need to move a local prediction workflow online. Whether exposing a public service port or setting up a prediction service on a LAN, PaddleHub needs the ability to deploy a model prediction service quickly. Against this background the one-command model serving tool, PaddleHub Serving, was born: a single command gives developers an online model prediction service, with no need to choose or implement a web framework.
### What is one-command serving deployment
PaddleHub Serving is a one-command model serving tool built on PaddleHub. A simple Hub command-line invocation launches an online model prediction service: the front end handles network requests with Flask and Gunicorn, the back end calls PaddleHub's prediction interface directly, and multi-process mode can use multiple cores to improve concurrency and keep prediction performance high.
### Supported models
PaddleHub Serving currently supports serving any PaddleHub model that can be used directly for prediction, including NLP models such as `lac` and `senta_bilstm`, and CV models such as `yolov3_darknet53_coco2017` and `vgg16_imagenet`. Support for serving models produced with the PaddleHub Fine-tune API is planned.
**NOTE:** For details about PaddleHub Serving one-command deployment, see [PaddleHub Serving](../../../tutorial/serving.md).
## Demo: deploy an online lac word-segmentation service
### Step 1: deploy the lac service
Now we will deploy an online lac service so that word-segmentation results can be obtained through an API.
First, as described earlier, the two ways to start the PaddleHub Serving server are:
```shell
$ hub serving start -m lac
```
```shell
$ hub serving start -c serving_config.json
```
where the content of `serving_config.json` is as follows:
```json
{
    "modules_info": [
        {
            "module": "lac",
            "version": "1.0.0",
            "batch_size": 1
        }
    ],
    "use_gpu": false,
    "port": 8866,
    "use_multiprocess": false
}
```
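As a quick sanity check, a configuration like the one above can be parsed with the standard `json` module before handing it to `hub serving start -c`; a small sketch:

```python
import json

# The same configuration as above, held as a string for illustration.
config_text = """
{
    "modules_info": [
        {"module": "lac", "version": "1.0.0", "batch_size": 1}
    ],
    "use_gpu": false,
    "port": 8866,
    "use_multiprocess": false
}
"""

config = json.loads(config_text)
# Inspect the fields the server will use.
print(config["port"])
print(config["modules_info"][0]["module"])
```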
A successful start looks like this:
<p align="center">
<img src="../demo/serving/module_serving/img/start_serving_lac.png" width="100%" />
</p>
This deploys an online lac word-segmentation service on port 8866.
*The warning shown here comes from Flask and does not affect usage.*
### Step 2: call the lac prediction API
Once the service is deployed, we can test it. The test texts are `今天是个好日子` and `天气预报说今天要下雨`.
The client code is as follows:
```python
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Specify the texts to predict and build the dict {"text": [text_1, text_2, ... ]}
    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
    text = {"text": text_list}
    # Send a POST request to the lac prediction endpoint
    url = "http://127.0.0.1:8866/predict/text/lac"
    r = requests.post(url=url, data=text)
    # Print the prediction results
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
```
Running it yields:
```json
{
    "results": [
        {
            "tag": ["TIME", "v", "q", "n"],
            "word": ["今天", "是", "个", "好日子"]
        },
        {
            "tag": ["n", "v", "TIME", "v", "v"],
            "word": ["天气预报", "说", "今天", "要", "下雨"]
        }
    ]
}
```
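One property worth noting: concatenating the `word` segments reproduces the original sentence, which is a quick way to check a segmentation result. A small sketch over the first item of the response above (the `result` dict below just restates that item):

```python
# First result item from the lac response above.
result = {
    "word": ["今天", "是", "个", "好日子"],
    "tag": ["TIME", "v", "q", "n"],
}

# The segments reassemble the original input sentence.
assert "".join(result["word"]) == "今天是个好日子"

# Pair each word with its part-of-speech tag for readability.
pairs = list(zip(result["word"], result["tag"]))
print(pairs)
```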
## Demo: one-command deployment of other models
For more PaddleHub Serving one-command deployment examples, see the following demos:
* [Image classification with vgg11_imagenet](../module_serving/classification_vgg11_imagenet)
&emsp;&emsp;This example shows how to use vgg11_imagenet to deploy an image classification service and run online prediction, obtaining classification results for images.
* [Image generation with stgan_celeba](../module_serving/GAN_stgan_celeba)
&emsp;&emsp;This example shows how to use stgan_celeba to deploy an image generation service and run online prediction, obtaining generated images in the specified style.
* [Text censorship with porn_detection_lstm](../module_serving/text_censorship_porn_detection_lstm)
&emsp;&emsp;This example shows how to use porn_detection_lstm to deploy a service that detects pornographic content in Chinese text and run online prediction, obtaining whether a text is sensitive and the confidence.
* [Chinese lexical analysis with lac](../module_serving/lexical_analysis_lac)
&emsp;&emsp;This example shows how to use lac to deploy a Chinese word-segmentation service and run online prediction, obtaining segmentation results that can be adjusted through a user-defined dictionary.
* [Object detection with yolov3_darknet53_coco2017](../module_serving/object_detection_yolov3_darknet53_coco2017)
&emsp;&emsp;This example shows how to use yolov3_darknet53_coco2017 to deploy an object detection service and run online prediction, obtaining detection results and images overlaid with bounding boxes.
* [Chinese semantic analysis with simnet_bow](../module_serving/semantic_model_simnet_bow)
&emsp;&emsp;This example shows how to use simnet_bow to deploy a Chinese text-similarity service and run online prediction, obtaining the similarity between texts.
* [Image segmentation with deeplabv3p_xception65_humanseg](../module_serving/semantic_segmentation_deeplabv3p_xception65_humanseg)
&emsp;&emsp;This example shows how to use deeplabv3p_xception65_humanseg to deploy an image segmentation service and run online prediction, obtaining recognition results and segmented images.
* [Chinese sentiment analysis with senta_lstm](../module_serving/semantic_model_simnet_bow)
&emsp;&emsp;This example shows how to use senta_lstm to deploy a Chinese sentiment analysis service and run online prediction, obtaining sentiment results for text.
## Bert Service
Besides one-command serving deployment of pre-trained models, PaddleHub Serving also provides `Bert Service`, which supports quick deployment of models such as ernie_tiny and bert to provide reliable online embedding services. For details, see [Bert Service](../../../tutorial/bert_service.md).
...@@ -6,7 +6,7 @@
Here we walk through deploying a lexical-analysis online service with PaddleHub Serving in a few simple steps.
## Step 1: start PaddleHub Serving
The start command is as follows:
```shell
$ hub serving start -m lac
```
......
...@@ -3,7 +3,7 @@ import requests
import json

if __name__ == "__main__":
    # Specify the texts to predict and build the dict {"text": [text_1, text_2, ... ]}
    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
    text = {"text": text_list}
    # Send a POST request to the lac prediction endpoint
......
...@@ -3,7 +3,7 @@ import requests
import json

if __name__ == "__main__":
    # Specify the texts to predict and build the dict {"text": [text_1, text_2, ... ]}
    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
    text = {"text": text_list}
    # Specify the custom dictionary {"user_dict": dict.txt}
......
...@@ -3,7 +3,7 @@ import requests
import json

if __name__ == "__main__":
    # Specify the texts to match and build the dict {"text_1": [text_a1, text_a2, ... ],
    #                                                "text_2": [text_b1, text_b2, ... ]}
    text = {
        "text_1": ["这道题太难了", "这道题太难了", "这道题太难了"],
......
...@@ -3,7 +3,7 @@ import requests
import json

if __name__ == "__main__":
    # Specify the texts to predict and build the dict {"text": [text_1, text_2, ... ]}
    text_list = ["我不爱吃甜食", "我喜欢躺在床上看电影"]
    text = {"text": text_list}
    # Send a POST request to the senta_lstm prediction endpoint
......
...@@ -3,7 +3,7 @@ import requests
import json

if __name__ == "__main__":
    # Specify the texts to predict and build the dict {"text": [text_1, text_2, ... ]}
    text_list = ["黄片下载", "中国黄页"]
    text = {"text": text_list}
    # Send a POST request to the porn_detection_lstm prediction endpoint
......
...@@ -26,14 +26,22 @@ import paddlehub as hub
from paddlehub.commands.base_command import BaseCommand, ENTRY
from paddlehub.serving import app_single as app
import multiprocessing

if platform.system() == "Windows":

    class StandaloneApplication(object):
        def __init__(self):
            pass

        def load_config(self):
            pass

        def load(self):
            pass
else:
    import gunicorn.app.base

    class StandaloneApplication(gunicorn.app.base.BaseApplication):
        def __init__(self, app, options=None):
            self.options = options or {}
            self.application = app
...@@ -52,6 +60,10 @@ class StandaloneApplication(gunicorn.app.base.BaseApplication):
            return self.application


def number_of_workers():
    return (multiprocessing.cpu_count() * 2) + 1


class ServingCommand(BaseCommand):
    name = "serving"
    module_list = []
......
...@@ -29,7 +29,7 @@ import tarfile
from paddlehub.common import utils
from paddlehub.common.logger import logger

__all__ = ['Downloader', 'progress']
FLUSH_INTERVAL = 0.1

lasttime = time.time()
......
...@@ -26,7 +26,7 @@ from paddlehub.common.downloader import default_downloader
from paddlehub.common.logger import logger


class BaseCVDataset(BaseDataset):
    def __init__(self,
                 base_path,
                 train_list_file=None,
...@@ -35,7 +35,7 @@ class BaseCVDataset(BaseDataset):
                 predict_list_file=None,
                 label_list_file=None,
                 label_list=None):
        super(BaseCVDataset, self).__init__(
            base_path=base_path,
            train_file=train_list_file,
            dev_file=validate_list_file,
...@@ -65,7 +65,7 @@ class BaseCVDataset(BaseDataset):
        return data


# Deprecated. Please use BaseCVDataset instead.
class ImageClassificationDataset(object):
    def __init__(self):
        logger.warning(
......
...@@ -21,9 +21,10 @@ import io
import csv

from paddlehub.dataset import InputExample, BaseDataset
from paddlehub.common.logger import logger


class BaseNLPDataset(BaseDataset):
    def __init__(self,
                 base_path,
                 train_file=None,
...@@ -32,11 +33,11 @@ class BaseNLPDataset(BaseDataset):
                 predict_file=None,
                 label_file=None,
                 label_list=None,
                 train_file_with_header=False,
                 dev_file_with_header=False,
                 test_file_with_header=False,
                 predict_file_with_header=False):
        super(BaseNLPDataset, self).__init__(
            base_path=base_path,
            train_file=train_file,
            dev_file=dev_file,
...@@ -44,25 +45,24 @@ class BaseNLPDataset(BaseDataset):
            predict_file=predict_file,
            label_file=label_file,
            label_list=label_list,
            train_file_with_header=train_file_with_header,
            dev_file_with_header=dev_file_with_header,
            test_file_with_header=test_file_with_header,
            predict_file_with_header=predict_file_with_header)

    def _read_file(self, input_file, phase=None):
        """Reads a tab separated value file."""
        has_warned = False
        with io.open(input_file, "r", encoding="UTF-8") as file:
            reader = csv.reader(file, delimiter="\t", quotechar=None)
            examples = []
            for (i, line) in enumerate(reader):
                if i == 0:
                    ncol = len(line)
                    if self.if_file_with_header[phase]:
                        continue
                if phase != "predict":
                    if ncol == 1:
                        raise Exception(
                            "the %s file: %s only has one column but it is not a predict file"
                            % (phase, input_file))
...@@ -71,10 +71,28 @@ class BaseNLPDataset(BaseDataset):
                            guid=i, text_a=line[0], label=line[1])
                    elif ncol == 3:
                        example = InputExample(
                            guid=i,
                            text_a=line[0],
                            text_b=line[1],
                            label=line[2])
                    else:
                        raise Exception(
                            "the %s file: %s has too many columns (should <=3)"
                            % (phase, input_file))
                else:
                    if ncol == 1:
                        example = InputExample(guid=i, text_a=line[0])
                    elif ncol == 2:
                        if not has_warned:
                            logger.warning(
                                "the predict file: %s has 2 columns, as it is a predict file, the second one will be regarded as text_b"
                                % (input_file))
                            has_warned = True
                        example = InputExample(
                            guid=i, text_a=line[0], text_b=line[1])
                    else:
                        raise Exception(
                            "the predict file: %s has too many columns (should <=2)"
                            % (input_file))
                examples.append(example)
            return examples
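The reader above consumes tab-separated files via `csv.reader` with `delimiter="\t"` and `quotechar=None`. A minimal sketch of the format it parses, with a header row and two-column (text, label) body; the file contents are illustrative:

```python
import csv
import io

# Illustrative two-column TSV: text and label, with a header row,
# as _read_file above would see for a train/dev/test split.
tsv_text = "text_a\tlabel\n这部电影很好看\t1\n情节太拖沓了\t0\n"

reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quotechar=None)
rows = list(reader)
header, body = rows[0], rows[1:]
print(header)
print(body)
```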
...@@ -20,10 +20,10 @@ from __future__ import print_function
import os

from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset


class BQ(BaseNLPDataset):
    def __init__(self):
        dataset_dir = os.path.join(DATA_HOME, "bq")
        base_path = self._download_dataset(
......
...@@ -23,10 +23,10 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset


class ChnSentiCorp(BaseNLPDataset):
    """
    ChnSentiCorp (by Tan Songbo at ICT of Chinese Academy of Sciences, and for
    opinion mining)
......
...@@ -20,7 +20,7 @@ import os
from paddlehub.reader import tokenization
from paddlehub.common.dir import DATA_HOME
from paddlehub.common.logger import logger
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/cmrc2018.tar.gz"
SPIECE_UNDERLINE = '▁'
...@@ -62,7 +62,7 @@ class CMRC2018Example(object):
        return s


class CMRC2018(BaseNLPDataset):
    """A single set of features of data."""

    def __init__(self):
......
...@@ -64,10 +64,10 @@ class BaseDataset(object):
                 predict_file=None,
                 label_file=None,
                 label_list=None,
                 train_file_with_header=False,
                 dev_file_with_header=False,
                 test_file_with_header=False,
                 predict_file_with_header=False):
        if not (train_file or dev_file or test_file):
            raise ValueError("At least one file should be assigned")
        self.base_path = base_path
...@@ -83,11 +83,11 @@ class BaseDataset(object):
        self.test_examples = []
        self.predict_examples = []

        self.if_file_with_header = {
            "train": train_file_with_header,
            "dev": dev_file_with_header,
            "test": test_file_with_header,
            "predict": predict_file_with_header
        }

        if train_file:
...@@ -128,7 +128,7 @@ class BaseDataset(object):
    def num_labels(self):
        return len(self.label_list)

    # To be compatible with the usage of ImageClassificationDataset
    def label_dict(self):
        return {index: key for index, key in enumerate(self.label_list)}
......
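The dict comprehension in `label_dict` above maps label indices to label names. What it produces for an illustrative two-class label list:

```python
# Sketch: what label_dict yields for a hypothetical two-class label list.
label_list = ["negative", "positive"]
label_dict = {index: key for index, key in enumerate(label_list)}
print(label_dict)  # {0: 'negative', 1: 'positive'}
```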
...@@ -20,10 +20,10 @@ from __future__ import print_function
import os

import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDataset


class DogCatDataset(BaseCVDataset):
    def __init__(self):
        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "dog-cat")
        base_path = self._download_dataset(
......
...@@ -20,7 +20,7 @@ import os
from paddlehub.reader import tokenization
from paddlehub.common.dir import DATA_HOME
from paddlehub.common.logger import logger
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/drcd.tar.gz"
SPIECE_UNDERLINE = '▁'
...@@ -62,7 +62,7 @@ class DRCDExample(object):
        return s


class DRCD(BaseNLPDataset):
    """A single set of features of data."""

    def __init__(self):
......
...@@ -20,10 +20,10 @@ from __future__ import print_function
import os

import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDataset


class FlowersDataset(BaseCVDataset):
    def __init__(self):
        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "flower_photos")
        base_path = self._download_dataset(
......
...@@ -20,10 +20,10 @@ from __future__ import print_function
import os

import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDataset


class Food101Dataset(BaseCVDataset):
    def __init__(self):
        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101",
                                    "images")
......
...@@ -24,12 +24,12 @@ import io
from paddlehub.dataset import InputExample
from paddlehub.common.logger import logger
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/glue_data.tar.gz"


class GLUE(BaseNLPDataset):
    """
    Please refer to
    https://gluebenchmark.com
......
...@@ -22,12 +22,12 @@ import os
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/iflytek.tar.gz"


class IFLYTEK(BaseNLPDataset):
    def __init__(self):
        dataset_dir = os.path.join(DATA_HOME, "iflytek")
        base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
......
...@@ -20,10 +20,10 @@ from __future__ import print_function
import os

import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDataset


class Indoor67Dataset(BaseCVDataset):
    def __init__(self):
        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "Indoor67")
        base_path = self._download_dataset(
......
...@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/inews.tar.gz"


class INews(BaseNLPDataset):
    """
    INews is a sentiment analysis dataset for Internet News
    """
......
...@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/lcqmc.tar.gz"


class LCQMC(BaseNLPDataset):
    def __init__(self):
        dataset_dir = os.path.join(DATA_HOME, "lcqmc")
        base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
......
...@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/msra_ner.tar.gz"


class MSRA_NER(BaseNLPDataset):
    """
    A set of manually annotated Chinese word-segmentation data and
    specifications for training and testing a Chinese word-segmentation system
......
...@@ -23,12 +23,12 @@ import csv
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/nlpcc-dbqa.tar.gz"


class NLPCC_DBQA(BaseNLPDataset):
    """
    Please refer to
    http://tcci.ccf.org.cn/conference/2017/dldoc/taskgline05.pdf
......
...@@ -20,7 +20,7 @@ import os
from paddlehub.reader import tokenization
from paddlehub.common.dir import DATA_HOME
from paddlehub.common.logger import logger
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/squad.tar.gz"
...@@ -65,7 +65,7 @@ class SquadExample(object):
        return s


class SQUAD(BaseNLPDataset):
    """A single set of features of data."""

    def __init__(self, version_2_with_negative=False):
......
...@@ -20,10 +20,10 @@ from __future__ import print_function
import os

import paddlehub as hub
from paddlehub.dataset.base_cv_dataset import BaseCVDataset


class StanfordDogsDataset(BaseCVDataset):
    def __init__(self):
        dataset_path = os.path.join(hub.common.dir.DATA_HOME,
                                    "StanfordDogs-120")
......
...@@ -22,12 +22,12 @@ import os
from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/thucnews.tar.gz"


class THUCNEWS(BaseNLPDataset):
    def __init__(self):
        dataset_dir = os.path.join(DATA_HOME, "thucnews")
        base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
......
...@@ -22,12 +22,12 @@ import pandas as pd ...@@ -22,12 +22,12 @@ import pandas as pd
from paddlehub.dataset import InputExample from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/toxic.tar.gz" _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/toxic.tar.gz"
class Toxic(BaseNLPDatast): class Toxic(BaseNLPDataset):
""" """
The kaggle Toxic dataset: The kaggle Toxic dataset:
https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
......
...@@ -25,12 +25,12 @@ import csv ...@@ -25,12 +25,12 @@ import csv
from paddlehub.dataset import InputExample from paddlehub.dataset import InputExample
from paddlehub.common.dir import DATA_HOME from paddlehub.common.dir import DATA_HOME
from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset
_DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/XNLI-lan.tar.gz" _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/XNLI-lan.tar.gz"
class XNLI(BaseNLPDatast): class XNLI(BaseNLPDataset):
""" """
Please refer to Please refer to
https://arxiv.org/pdf/1809.05053.pdf https://arxiv.org/pdf/1809.05053.pdf
......
@@ -24,7 +24,12 @@ import copy
 import logging
 import inspect
 from functools import partial
+from collections import OrderedDict
+import six
+if six.PY2:
+    from inspect import getargspec as get_args
+else:
+    from inspect import getfullargspec as get_args
 import numpy as np
 import paddle.fluid as fluid
 from tb_paddle import SummaryWriter
@@ -84,44 +89,44 @@ class RunEnv(object):
 class TaskHooks():
     def __init__(self):
         self._registered_hooks = {
-            "build_env_start": {},
-            "build_env_end": {},
-            "finetune_start": {},
-            "finetune_end": {},
-            "predict_start": {},
-            "predict_end": {},
-            "eval_start": {},
-            "eval_end": {},
-            "log_interval": {},
-            "save_ckpt_interval": {},
-            "eval_interval": {},
-            "run_step": {},
+            "build_env_start_event": OrderedDict(),
+            "build_env_end_event": OrderedDict(),
+            "finetune_start_event": OrderedDict(),
+            "finetune_end_event": OrderedDict(),
+            "predict_start_event": OrderedDict(),
+            "predict_end_event": OrderedDict(),
+            "eval_start_event": OrderedDict(),
+            "eval_end_event": OrderedDict(),
+            "log_interval_event": OrderedDict(),
+            "save_ckpt_interval_event": OrderedDict(),
+            "eval_interval_event": OrderedDict(),
+            "run_step_event": OrderedDict(),
         }
         self._hook_params_num = {
-            "build_env_start": 1,
-            "build_env_end": 1,
-            "finetune_start": 1,
-            "finetune_end": 2,
-            "predict_start": 1,
-            "predict_end": 2,
-            "eval_start": 1,
-            "eval_end": 2,
-            "log_interval": 2,
-            "save_ckpt_interval": 1,
-            "eval_interval": 1,
-            "run_step": 2,
+            "build_env_start_event": 1,
+            "build_env_end_event": 1,
+            "finetune_start_event": 1,
+            "finetune_end_event": 2,
+            "predict_start_event": 1,
+            "predict_end_event": 2,
+            "eval_start_event": 1,
+            "eval_end_event": 2,
+            "log_interval_event": 2,
+            "save_ckpt_interval_event": 1,
+            "eval_interval_event": 1,
+            "run_step_event": 2,
         }
     def add(self, hook_type, name=None, func=None):
         if not func or not callable(func):
             raise TypeError(
                 "The hook function is empty or it is not a function")
-        if name and not isinstance(name, str):
-            raise TypeError("The hook name must be a string")
-        if not name:
+        if name == None:
             name = "hook_%s" % id(func)
         # check validity
+        if not isinstance(name, str) or name.strip() == "":
+            raise TypeError("The hook name must be a non-empty string")
         if hook_type not in self._registered_hooks:
             raise ValueError("hook_type: %s does not exist" % (hook_type))
         if name in self._registered_hooks[hook_type]:
@@ -129,7 +134,7 @@ class TaskHooks():
                 "name: %s has existed in hook_type:%s, use modify method to modify it"
                 % (name, hook_type))
         else:
-            args_num = len(inspect.getfullargspec(func).args)
+            args_num = len(get_args(func).args)
             if args_num != self._hook_params_num[hook_type]:
                 raise ValueError(
                     "The number of parameters to the hook hook_type:%s should be %i"
@@ -163,13 +168,13 @@ class TaskHooks():
         else:
             return True
-    def info(self, only_customized=True):
+    def info(self, show_default=False):
         # formatted output the source code
         ret = ""
         for hook_type, hooks in self._registered_hooks.items():
             already_print_type = False
             for name, func in hooks.items():
-                if name == "default" and only_customized:
+                if name == "default" and not show_default:
                     continue
                 if not already_print_type:
                     ret += "hook_type: %s{\n" % hook_type
@@ -182,7 +187,7 @@ class TaskHooks():
             if already_print_type:
                 ret += "}\n"
         if not ret:
-            ret = "Not any hooks when only_customized=%s" % only_customized
+            ret = "Not any customized hooks have been defined, you can set show_default=True to see the default hooks information"
         return ret
     def __getitem__(self, hook_type):
@@ -259,8 +264,8 @@ class BaseTask(object):
         self._hooks = TaskHooks()
         for hook_type, event_hooks in self._hooks._registered_hooks.items():
             self._hooks.add(hook_type, "default",
-                            eval("self._default_%s_event" % hook_type))
-            setattr(BaseTask, "_%s_event" % hook_type,
+                            eval("self._default_%s" % hook_type))
+            setattr(BaseTask, "_%s" % hook_type,
                     self.create_event_function(hook_type))
         # accelerate predict
@@ -581,13 +586,18 @@ class BaseTask(object):
         return self._hooks.info(only_customized)
     def add_hook(self, hook_type, name=None, func=None):
+        if name == None:
+            name = "hook_%s" % id(func)
         self._hooks.add(hook_type, name=name, func=func)
+        logger.info("Add hook %s:%s successfully" % (hook_type, name))
     def delete_hook(self, hook_type, name):
         self._hooks.delete(hook_type, name)
+        logger.info("Delete hook %s:%s successfully" % (hook_type, name))
     def modify_hook(self, hook_type, name, func):
         self._hooks.modify(hook_type, name, func)
+        logger.info("Modify hook %s:%s successfully" % (hook_type, name))
     def _default_build_env_start_event(self):
         pass
...
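The compatibility alias introduced above (`getargspec` on Python 2, `getfullargspec` on Python 3) drives the registry's parameter-count check: a hook is rejected unless its positional-argument count matches the expected number for its hook type. A minimal self-contained sketch of that check (the hook name and two-argument signature follow the `log_interval_event` entry in the diff; `six` is replaced by a stdlib version check for illustration):

```python
import sys

# The diff uses six.PY2; a plain version check is equivalent and stdlib-only.
if sys.version_info[0] == 2:
    from inspect import getargspec as get_args
else:
    from inspect import getfullargspec as get_args

def custom_log_hook(task, run_states):
    """A hypothetical two-argument hook for "log_interval_event"."""
    pass

# TaskHooks.add() rejects hooks whose positional-argument count does not
# match the expected number for the hook type (2 for log_interval_event).
args_num = len(get_args(custom_log_hook).args)
assert args_num == 2, "log_interval_event hooks must take exactly 2 arguments"
```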
@@ -142,7 +142,7 @@ class ClassifierTask(BaseTask):
             }
         except:
             raise Exception(
-                "ImageClassificationDataset does not support postprocessing, please use BaseCVDatast instead"
+                "ImageClassificationDataset does not support postprocessing, please use BaseCVDataset instead"
             )
         results = []
         for batch_state in run_states:
...
@@ -26,6 +26,7 @@ import json
 from collections import OrderedDict
+import io
 import numpy as np
 import paddle.fluid as fluid
 from .base_task import BaseTask
@@ -517,13 +518,13 @@ class ReadingComprehensionTask(BaseTask):
             null_score_diff_threshold=self.null_score_diff_threshold,
             is_english=self.is_english)
         if self.phase == 'val' or self.phase == 'dev':
-            with open(
+            with io.open(
                     self.data_reader.dataset.dev_path, 'r',
                     encoding="utf8") as dataset_file:
                 dataset_json = json.load(dataset_file)
                 dataset = dataset_json['data']
         elif self.phase == 'test':
-            with open(
+            with io.open(
                     self.data_reader.dataset.test_path, 'r',
                     encoding="utf8") as dataset_file:
                 dataset_json = json.load(dataset_file)
...
@@ -168,8 +168,7 @@ class LocalModuleManager(object):
         with tarfile.open(module_package, "r:gz") as tar:
             file_names = tar.getnames()
             size = len(file_names) - 1
-            module_dir = os.path.split(file_names[0])[0]
-            module_dir = os.path.join(_dir, module_dir)
+            module_dir = os.path.join(_dir, file_names[0])
             for index, file_name in enumerate(file_names):
                 tar.extract(file_name, _dir)
@@ -195,7 +194,7 @@ class LocalModuleManager(object):
         save_path = os.path.join(MODULE_HOME, module_name)
         if os.path.exists(save_path):
-            shutil.move(save_path)
+            shutil.rmtree(save_path)
         if from_user_dir:
             shutil.copytree(module_dir, save_path)
         else:
...
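The one-line replacement above assumes the first tar member is the package's top-level directory, so joining it to the extraction root yields the extracted module directory. A small self-contained check of that assumption (toy archive in temporary directories only; this is an illustration, not PaddleHub's actual packaging code):

```python
import os
import tarfile
import tempfile

# Build a toy module package: a top-level "lac" folder with one stub file.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "lac"))
with open(os.path.join(src, "lac", "module_desc.pb"), "w") as f:
    f.write("stub")
archive = os.path.join(src, "lac.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(os.path.join(src, "lac"), arcname="lac")

# Extraction mirrors the new code path: file_names[0] is the archive's
# top-level directory entry, so joining it to the target gives module_dir.
_dir = tempfile.mkdtemp()
with tarfile.open(archive, "r:gz") as tar:
    file_names = tar.getnames()
    module_dir = os.path.join(_dir, file_names[0])
    for file_name in file_names:
        tar.extract(file_name, _dir)

assert file_names[0] == "lac"
assert os.path.isdir(module_dir)
```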
@@ -37,6 +37,7 @@ from paddlehub.common.lock import lock
 from paddlehub.common.logger import logger
 from paddlehub.common.hub_server import CacheUpdater
 from paddlehub.common import tmp_dir
+from paddlehub.common.downloader import progress
 from paddlehub.module import module_desc_pb2
 from paddlehub.module.manager import default_module_manager
 from paddlehub.module.checker import ModuleChecker
@@ -99,10 +100,22 @@ def create_module(directory, name, author, email, module_type, summary,
         _cwd = os.getcwd()
         os.chdir(base_dir)
-        for dirname, _, files in os.walk(module_dir):
-            for file in files:
-                tar.add(os.path.join(dirname, file).replace(base_dir, "."))
+        module_dir = module_dir.replace(base_dir, ".")
+        tar.add(module_dir, recursive=False)
+        files = []
+        for dirname, _, subfiles in os.walk(module_dir):
+            for file in subfiles:
+                files.append(os.path.join(dirname, file))
+        total_length = len(files)
+        print("Create Module {}-{}".format(name, version))
+        for index, file in enumerate(files):
+            done = int(float(index) / total_length * 50)
+            progress("[%-50s] %.2f%%" % ('=' * done,
+                                         float(index / total_length * 100)))
+            tar.add(file)
+        progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
+        print("Module package saved as {}".format(save_file))
         os.chdir(_cwd)
...
@@ -170,7 +170,7 @@ class WSSPTokenizer(object):
         self.inv_vocab = {v: k for k, v in self.vocab.items()}
         self.ws = ws
         self.lower = lower
-        self.dict = pickle.load(open(word_dict, 'rb'), encoding='utf8')
+        self.dict = pickle.load(open(word_dict, 'rb'))
         self.sp_model = spm.SentencePieceProcessor()
         self.window_size = 5
         self.sp_model.Load(sp_model_dir)
...
@@ -30,7 +30,7 @@
 Building a service with Bert Service involves the following three steps:
-## Step1: Environment preparation
+## Step1: Prepare the environment
 ### Environment requirements
 The table below lists the environment requirements for `Bert Service`; items marked with * are optional dependencies that can be installed as needed.
@@ -40,8 +40,7 @@
 |PaddleHub|>=1.4.0|None|
 |PaddlePaddle|>=1.6.1|Use the PaddlePaddle-gpu build if computing on GPU|
 |GCC|>=4.8|None|
-|CUDA*|>=8|CUDA 8 or above is required when using GPU|
-|paddle-gpu-serving*|>=0.8.0|Required on the `Bert Service` server side|
+|paddle-gpu-serving*|>=0.8.2|Required on the `Bert Service` server side|
 |ujson*|>=1.35|Required on the `Bert Service` client side|
 ### Installation steps
@@ -84,7 +83,7 @@ $ pip install ujson
 |[bert_chinese_L-12_H-768_A-12](https://paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)|BERT|
-## Step2: Server
+## Step2: Start the server
 ### Overview
 The server receives data sent by the client, runs the model computation, and returns the results to the client.
@@ -130,7 +129,7 @@ Paddle Inference Server exit successfully!
 ```
-## Step3: Client
+## Step3: Start the client
 ### Overview
 The client receives text data and obtains the embedding results computed by the model on the server.
@@ -197,11 +196,11 @@ input_text = [["西风吹老洞庭波"], ["一夜湘君白发多"], ["醉后不
 ```python
 result = bc.get_result(input_text=input_text)
 ```
-The embedding results are finally obtained (only part of the results is shown here).
+This yields the embedding results (only part of the results is shown here).
 ```python
 [[0.9993321895599361, 0.9994612336158751, 0.9999646544456481, 0.732795298099517, -0.34387934207916204, ... ]]
 ```
-The client demo code is available at [example](../paddlehub/serving/bert_serving/bert_service.py)
+The client demo code is available at [example](../demo/serving/bert_service/bert_service_client.py)
 Run it with:
 ```shell
 $ python bert_service_client.py
...
 # PaddleHub Serving: One-Command Model Service Deployment
 ## Introduction
 ### Why one-command service deployment
-With PaddleHub, transfer learning and model prediction are fast, but developers often need to deploy a trained model online. Whether opening a service port to the public or setting up a prediction service on a local network, PaddleHub must be able to deploy model prediction services quickly. Against this background, the one-command model service deployment tool PaddleHub Serving was born: with a single command, developers obtain a prediction service API without worrying about web framework selection or implementation.
+With PaddleHub, model prediction is fast, but developers often need to move a local prediction workflow online. Whether opening a service port to the public or setting up a prediction service on a local network, PaddleHub must be able to deploy model prediction services quickly. Against this background, the one-command model service deployment tool PaddleHub Serving was born: with a single command, developers can quickly launch an online model prediction service without worrying about web framework selection or implementation.
 ### What is one-command service deployment
 PaddleHub Serving is PaddleHub's one-command model service deployment tool. It launches an online model prediction service via a simple Hub command-line call: the front end handles network requests with Flask and Gunicorn, while the back end invokes the PaddleHub prediction interface directly. It also supports a multi-process mode that uses multiple cores for higher concurrency, ensuring prediction-service performance.
 ### Supported models
-Currently PaddleHub Serving supports service deployment for all PaddleHub models that can be used directly for prediction, including NLP models such as `lac` and `senta_bilstm`, and CV models such as `yolov3_coco2017` and `vgg16_imagenet`. In the future it will also support fast deployment of models obtained with the PaddleHub Fine-tune API.
+Currently PaddleHub Serving supports service deployment for all PaddleHub models that support direct prediction, including NLP models such as `lac` and `senta_bilstm`, and CV models such as `yolov3_darknet53_coco2017` and `vgg16_imagenet`; for more models, see the [PaddleHub model list](https://paddlepaddle.org.cn/hublist). In the future it will also support fast deployment of models obtained with the PaddleHub Fine-tune API.
-### Required environment
-The table below lists the environment requirements and notes for using PaddleHub Serving.
-|Item|Recommended version|Notes|
-|:-:|:-:|:-:|
-|OS|Linux/Darwin/Windows|Linux or Darwin recommended, with better support for the multi-threaded launch mode|
-|PaddleHub|>=1.4.0|None|
-|PaddlePaddle|>=1.6.1|Use the PaddlePaddle-gpu build if computing on GPU|
 ## Usage
 ### Step1: Launch the server deployment
-PaddleHub Serving can be started in two ways: with a command-line command, or with a configuration file.
+PaddleHub Serving can be started in two ways: from the command line, or with a configuration file.
 #### Starting from the command line
 Start command
@@ -37,7 +28,7 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
 |--modules/-m|Models preloaded by PaddleHub Serving, listed as multiple Module==Version key-value pairs<br>*`When Version is omitted, the latest version is selected by default`*|
 |--port/-p|Service port, 8866 by default|
 |--use_gpu|Use GPU for prediction; paddlepaddle-gpu must be installed|
-|--use_multiprocess|Whether to enable concurrent mode; single-process by default|
+|--use_multiprocess|Whether to enable concurrent mode; single-process by default, concurrent mode recommended on multi-core CPU machines<br>*`Windows supports the single-process mode only`*|
 #### Starting with a configuration file
 Start command
@@ -60,8 +51,8 @@ $ hub serving start --config config.json
             "batch_size": "BATCH_SIZE_2"
         }
     ],
-    "use_gpu": false,
     "port": 8866,
+    "use_gpu": false,
    "use_multiprocess": false
 }
 ```
@@ -70,10 +61,10 @@ $ hub serving start --config config.json
 |Parameter|Purpose|
 |-|-|
-|--modules_info|Models preloaded by PaddleHub Serving, listed as a list of dicts, where:<br>`module` is the model name used by the prediction service<br>`version` is the model version<br>`batch_size` is the prediction batch size|
-|--use_gpu|Use GPU for prediction; paddlepaddle-gpu must be installed|
-|--port/-p|Service port, 8866 by default|
-|--use_multiprocess|Whether to enable concurrent mode; single-process by default, concurrent mode recommended on multi-core CPU machines|
+|modules_info|Models preloaded by PaddleHub Serving, listed as a list of dicts, where:<br>`module` is the model name used by the prediction service<br>`version` is the model version<br>`batch_size` is the prediction batch size|
+|port|Service port, 8866 by default|
+|use_gpu|Use GPU for prediction; paddlepaddle-gpu must be installed|
+|use_multiprocess|Whether to enable concurrent mode; single-process by default, concurrent mode recommended on multi-core CPU machines<br>*`Windows supports the single-process mode only`*|
 ### Step2: Access the server
@@ -99,7 +90,7 @@ http://0.0.0.0:8866/predict/<CATEGORY\>/\<MODULE>
 ### Step1: Deploy the lac online service
 Now we will deploy a lac online service to obtain word segmentation results for text through an API.
-First, as described in Section 2.1, the two ways to start the PaddleHub Serving server are:
+First, choose either launch method; the two options are:
 ```shell
 $ hub serving start -m lac
 ```
@@ -148,7 +139,7 @@ if __name__ == "__main__":
     text_list = ["今天是个好日子", "天气预报说今天要下雨"]
     text = {"text": text_list}
     # Specify lac as the prediction method and send a POST request
-    url = "http://127.0.0.1:8866/predict/text/lac"
+    url = "http://0.0.0.0:8866/predict/text/lac"
     r = requests.post(url=url, data=text)
     # Print the prediction results
@@ -180,6 +171,8 @@ if __name__ == "__main__":
 }
 ```
+For details and code of this demo, see [LAC Serving](../demo/serving/module_serving/lexical_analysis_lac). In addition, some other one-command service deployment demos are shown below.
 ## Demo: one-command service deployment for other models
 For more examples of PaddleHub Serving one-command deployment scenarios, see the demos below
@@ -217,4 +210,4 @@ if __name__ == "__main__":
 &emsp;&emsp;This example shows using senta_lstm for Chinese text sentiment analysis service deployment and online prediction, obtaining sentiment analysis results for text.
 ## Bert Service
 Besides one-command deployment of pretrained models, PaddleHub Serving also provides the `Bert Service` feature, which supports fast deployment of models such as ernie_tiny and bert to provide a reliable online embedding service. For details, see [Bert Service](./bert_service.md).
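The endpoint pattern used throughout the serving docs (`http://<host>:8866/predict/<category>/<module>`) can be wrapped in a small helper; a sketch with hypothetical names (the form-payload shape matches the lac demo above, and no network request is actually sent here):

```python
def build_predict_request(host, category, module, texts, port=8866):
    """Build the URL and form payload for a PaddleHub Serving predict call."""
    url = "http://%s:%d/predict/%s/%s" % (host, port, category, module)
    payload = {"text": texts}
    return url, payload

url, payload = build_predict_request("127.0.0.1", "text", "lac",
                                     ["今天是个好日子"])
# The pair can then be passed to requests.post(url=url, data=payload).
```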