PaddlePaddle / PaddleHub
Commit 7dbbef9c
Authored Jan 13, 2020 by zhangxuefei

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleHub into develop

Parents: 09835004, 1491d0cb
Showing 42 changed files with 247 additions and 288 deletions (+247 -288)
Changed files:

- demo/sequence_labeling/predict.py (+6 -5)
- demo/serving/README.md (+4 -4)
- demo/serving/bert_service/README.md (+1 -2)
- demo/serving/module_serving/README.md (+18 -104)
- demo/serving/module_serving/lexical_analysis_lac/README.md (+1 -1)
- demo/serving/module_serving/lexical_analysis_lac/lac_serving_demo.py (+1 -1)
- demo/serving/module_serving/lexical_analysis_lac/lac_with_dict_serving_demo.py (+1 -1)
- demo/serving/module_serving/semantic_model_simnet_bow/simnet_bow_serving_demo.py (+1 -1)
- demo/serving/module_serving/sentiment_analysis_senta_lstm/senta_lstm_serving_demo.py (+1 -1)
- demo/serving/module_serving/text_censorship_porn_detection_lstm/porn_detection_lstm_serving_demo.py (+1 -1)
- paddlehub/commands/serving.py (+30 -18)
- paddlehub/common/downloader.py (+1 -1)
- paddlehub/dataset/base_cv_dataset.py (+3 -3)
- paddlehub/dataset/base_nlp_dataset.py (+42 -24)
- paddlehub/dataset/bq.py (+2 -2)
- paddlehub/dataset/chnsenticorp.py (+2 -2)
- paddlehub/dataset/cmrc2018.py (+2 -2)
- paddlehub/dataset/dataset.py (+10 -10)
- paddlehub/dataset/dogcat.py (+2 -2)
- paddlehub/dataset/drcd.py (+2 -2)
- paddlehub/dataset/flowers.py (+2 -2)
- paddlehub/dataset/food101.py (+2 -2)
- paddlehub/dataset/glue.py (+2 -2)
- paddlehub/dataset/iflytek.py (+2 -2)
- paddlehub/dataset/indoor67.py (+2 -2)
- paddlehub/dataset/inews.py (+2 -2)
- paddlehub/dataset/lcqmc.py (+2 -2)
- paddlehub/dataset/msra_ner.py (+2 -2)
- paddlehub/dataset/nlpcc_dbqa.py (+2 -2)
- paddlehub/dataset/squad.py (+2 -2)
- paddlehub/dataset/stanford_dogs.py (+2 -2)
- paddlehub/dataset/thucnews.py (+2 -2)
- paddlehub/dataset/toxic.py (+2 -2)
- paddlehub/dataset/xnli.py (+2 -2)
- paddlehub/finetune/task/base_task.py (+44 -34)
- paddlehub/finetune/task/classifier_task.py (+1 -1)
- paddlehub/finetune/task/reading_comprehension_task.py (+3 -2)
- paddlehub/module/manager.py (+2 -3)
- paddlehub/module/module.py (+17 -4)
- paddlehub/reader/tokenization.py (+1 -1)
- tutorial/bert_service.md (+6 -7)
- tutorial/serving.md (+14 -21)
demo/sequence_labeling/predict.py

```diff
@@ -87,12 +87,13 @@ if __name__ == '__main__':
         add_crf=True)

     # Data to be predicted
+    # If using python 2, prefix "u" is necessary
     data = [
-        ["我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
-        ["为了跟踪国际最新食品工艺、流行趋势,大量搜集海外专业书刊资料是提高技艺的捷径。"],
-        ["其中线装古籍逾千册;民国出版物几百种;珍本四册、稀见本四百余册,出版时间跨越三百余年。"],
-        ["有的古木交柯,春机荣欣,从诗人句中得之,而入画中,观之令人心驰。"],
-        ["不过重在晋趣,略增明人气息,妙在集古有道、不露痕迹罢了。"],
+        [u"我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
+        [u"为了跟踪国际最新食品工艺、流行趋势,大量搜集海外专业书刊资料是提高技艺的捷径。"],
+        [u"其中线装古籍逾千册;民国出版物几百种;珍本四册、稀见本四百余册,出版时间跨越三百余年。"],
+        [u"有的古木交柯,春机荣欣,从诗人句中得之,而入画中,观之令人心驰。"],
+        [u"不过重在晋趣,略增明人气息,妙在集古有道、不露痕迹罢了。"],
     ]
     # Add 0x02 between characters to match the format of training data,
```
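The new `u` prefixes matter only under Python 2, where a bare string literal is a byte string. A version-agnostic alternative (a sketch, not what this commit does) is to enable `unicode_literals` for the whole file:

```python
# Sketch: with this future-import, every bare literal in the file is already
# unicode under Python 2, so explicit u"..." prefixes become unnecessary.
from __future__ import unicode_literals

data = [
    ["我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
]
```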
demo/serving/README.md

```diff
@@ -8,11 +8,11 @@
 PaddleHub Serving covers two major features: embedding serving via Bert Service, and prediction serving for pre-trained models via their inference models. Serving models built with the PaddleHub Fine-tune API will be supported in the future.

 ## Bert Service
-Bert Service is a fast-deployment solution for remote model computation built on the [Paddle Serving](https://github.com/PaddlePaddle/Serving) framework. It turns the embedding step into an API call, reducing the dependence on local machine resources. With PaddleHub, a `Bert Service` server can be deployed with one command, and any ordinary machine can fetch text embeddings through the client interface.
+`Bert Service` is a fast-deployment solution for remote model computation built on the [Paddle Serving](https://github.com/PaddlePaddle/Serving) framework. It turns the embedding step into an API call, reducing the dependence on local machine resources. With PaddleHub, a `Bert Service` server can be deployed with one command, and any ordinary machine can fetch text embeddings through the client interface.
-For details about Bert Service and a demo, see [Bert Service](../../tutorial/bert_service.md)
+For its details and a demo, see [Bert Service](../../tutorial/bert_service.md)
-The example shows how to deploy a remote embedding service with Bert Service and run online prediction to obtain text embeddings.
+The example shows how to deploy a remote embedding service with `Bert Service` and run online prediction to obtain text embeddings.

 ## One-command serving of pre-trained models
 One-command serving is PaddleHub's fast serving solution for pre-trained models, exposing model prediction as an API.
@@ -53,4 +53,4 @@
 The example deploys senta_lstm as a Chinese sentiment-analysis service and runs online prediction to obtain sentiment results.
-For details about one-command serving of pre-trained models, see [serving](module_serving)
+For details about one-command serving of pre-trained models, see [Module Serving](module_serving).
```
demo/serving/bert_service/README.md

```diff
@@ -68,5 +68,4 @@ Paddle Inference Server exit successfully!
 With that, a single GPU machine completes the `Bert Service` deployment, tested from another ordinary machine; `Bert Service` makes it easy to stand up a fast online embedding service.

 ## One-command serving of pre-trained models
-Besides `Bert Service`, PaddleHub
-Serving also offers one-command serving of pre-trained models, which puts a pre-trained model online quickly as a reliable prediction service; see [Module Serving](../../../tutorial/serving.md).
+Besides `Bert Service`, PaddleHub Serving also offers one-command serving of pre-trained models, which puts a pre-trained model online quickly as a reliable prediction service; see [Module Serving](../../../tutorial/serving.md).
```

The fix joins a sentence that was broken across two lines.
demo/serving/module_serving/README.md

```diff
 # PaddleHub Serving: one-command model serving
 ## Introduction
 ### Why one-command serving
-With PaddleHub you can quickly run transfer learning and model prediction, but developers often need to put a trained model online. Whether the service port is public or the prediction service runs on a LAN, PaddleHub must be able to deploy a prediction service quickly. Hence the one-command serving tool, PaddleHub Serving: one command gives the developer a prediction-service API, with no need to choose or implement a web framework.
+With PaddleHub you can quickly run model prediction, but developers often need to move a local prediction workflow online. Whether the service port is public or the prediction service runs on a LAN, PaddleHub must be able to deploy a prediction service quickly. Hence the one-command serving tool, PaddleHub Serving: a single command starts an online model-prediction service, with no need to choose or implement a web framework.
 ### What is one-command serving
 PaddleHub Serving is PaddleHub's one-command model-serving tool: a simple Hub command starts an online prediction service. The front end handles requests via Flask and Gunicorn, the back end calls the PaddleHub prediction interface directly, and multi-process mode can use multiple cores for higher concurrency.
 ### Supported models
-PaddleHub Serving can serve every PaddleHub model that is directly usable for prediction, including nlp models such as `lac` and `senta_bilstm` and cv models such as `yolov3_coco2017` and `vgg16_imagenet`; models produced with the PaddleHub Fine-tune API will be supported in the future.
-**NOTE:** for details on PaddleHub Serving one-command deployment, see [PaddleHub Servung](../../../tutorial/serving.md).
-## Demo: deploy an online lac word-segmentation service
-### Step1: deploy the lac service
-We now deploy an online lac service to obtain word-segmentation results through an API.
-First, as described in section 2.1, the two ways to start the PaddleHub Serving server are:
-```shell
-$ hub serving start -m lac
-```
-or
-```shell
-$ hub serving start -c serving_config.json
-```
-where `serving_config.json` contains:
-```json
-{
-  "modules_info": [
-    {
-      "module": "lac",
-      "version": "1.0.0",
-      "batch_size": 1
-    }
-  ],
-  "use_gpu": false,
-  "port": 8866,
-  "use_multiprocess": false
-}
-```
-On success the console looks like:
-<p align="center"><img src="../demo/serving/module_serving/img/start_serving_lac.png" width="100%" /></p>
-This deploys the online lac segmentation service on port 8866.
-*The warning here comes from Flask and does not affect use.*
-### Step2: query the lac prediction API
-With the service deployed we can test it, using the texts `今天是个好日子` and `天气预报说今天要下雨`.
-Client code:
-```python
-# coding: utf8
-import requests
-import json
-
-if __name__ == "__main__":
-    # Specify the texts to predict and build the dict {"text": [text_1, text_2, ...]}
-    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
-    text = {"text": text_list}
-    # Target the lac predictor and send a POST request
-    url = "http://127.0.0.1:8866/predict/text/lac"
-    r = requests.post(url=url, data=text)
-    # Print the prediction result
-    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
-```
-Running it yields:
-```python
-{
-    "results": [
-        {"tag": ["TIME", "v", "q", "n"], "word": ["今天", "是", "个", "好日子"]},
-        {"tag": ["n", "v", "TIME", "v", "v"], "word": ["天气预报", "说", "今天", "要", "下雨"]}
-    ]
-}
-```
-## Demo: one-command deployment of other models
-For other PaddleHub Serving deployment examples, see the demos below:
-* [Image classification with vgg11_imagent](../demo/serving/module_serving/classification_vgg11_imagenet)
+PaddleHub Serving can serve every PaddleHub model that is directly usable for prediction, including NLP models such as `lac` and `senta_bilstm` and CV models such as `yolov3_darknett53_coco2017` and `vgg16_imagenet`; models produced with the PaddleHub Fine-tune API will be supported in the future.
+**NOTE:** for details on PaddleHub Serving one-command deployment, see [PaddleHub Serving](../../../tutorial/serving.md).
+## Demo
+For PaddleHub Serving deployment examples, see the demos below:
+* [Image classification with vgg11_imagent](../module_serving/classification_vgg11_imagenet)
   The example deploys vgg11_imagent as an image-classification service and runs online prediction to obtain classification results.
-* [Image generation with stgan_celeba](../demo/serving/module_serving/GAN_stgan_celeba)
+* [Image generation with stgan_celeba](../module_serving/GAN_stgan_celeba)
   The example deploys stgan_celeba as an image-generation service and runs online prediction to obtain generated images in a given style.
-* [Text censorship with porn_detection_lstm](../demo/serving/module_serving/text_censorship_porn_detection_lstm)
+* [Text censorship with porn_detection_lstm](../module_serving/text_censorship_porn_detection_lstm)
   The example deploys porn_detection_lstm to detect pornographic content in Chinese text and returns whether the text is sensitive together with a confidence score.
-* [Chinese lexical analysis with lac](../demo/serving/module_serving/lexical_analysis_lac)
+* [Chinese lexical analysis with lac](../module_serving/lexical_analysis_lac)
   The example deploys lac as a Chinese word-segmentation service and runs online prediction; a user-defined dictionary can be used to adjust the segmentation.
-* [Object detection with yolov3_darknet53_coco2017](../demo/serving/serving/object_detection_yolov3_darknet53_coco2017)
+* [Object detection with yolov3_darknet53_coco2017](../module_serving/object_detection_yolov3_darknet53_coco2017)
   The example deploys yolov3_darknet53_coco2017 as an object-detection service and runs online prediction to obtain detections and images with bounding boxes drawn.
-* [Chinese semantic analysis with simnet_bow](../demo/serving/module_serving/semantic_model_simnet_bow)
+* [Chinese semantic analysis with simnet_bow](../module_serving/semantic_model_simnet_bow)
   The example deploys simnet_bow as a Chinese text-similarity service and runs online prediction to obtain similarity scores.
-* [Image segmentation with deeplabv3p_xception65_humanseg](../demo/serving/module_serving/semantic_segmentation_deeplabv3p_xception65_humanseg)
+* [Image segmentation with deeplabv3p_xception65_humanseg](../module_serving/semantic_segmentation_deeplabv3p_xception65_humanseg)
   The example deploys deeplabv3p_xception65_humanseg as an image-segmentation service and runs online prediction to obtain the recognition result and the segmented image.
-* [Chinese sentiment analysis with simnet_bow](../demo/serving/module_serving/semantic_model_simnet_bow)
+* [Chinese sentiment analysis with simnet_bow](../module_serving/semantic_model_simnet_bow)
   The example deploys senta_lstm as a Chinese sentiment-analysis service and runs online prediction to obtain sentiment results.
 ## Bert Service
-除了预训练模型一键服务部署功能外外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](../../../tutorial/bert_service.md)。
+除了预训练模型一键服务部署功能之外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](../../../tutorial/bert_service.md)。
```

Two remaining nits in the new text: the module name `yolov3_darknett53_coco2017` is misspelled (the demo link below uses `yolov3_darknet53_coco2017`), and the last list item is labeled simnet_bow while its description and link concern sentiment analysis with senta_lstm. The final pair of lines fixes a duplicated character (功能外外 becomes 功能之外): "Besides one-command serving of pre-trained models, PaddleHub Serving also offers `Bert Service`, which quickly deploys models such as ernie_tiny and bert to provide a reliable online embedding service."
demo/serving/module_serving/lexical_analysis_lac/README.md

```diff
@@ -6,7 +6,7 @@
 Here we use PaddleHub Serving to deploy a lexical-analysis service in a few simple steps.

-## 2 Start PaddleHub Serving
+## Step1: Start PaddleHub Serving
 The start command is
 ```shell
 $ hub serving start -m lac
```
demo/serving/module_serving/lexical_analysis_lac/lac_serving_demo.py

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["今天是个好日子", "天气预报说今天要下雨"]
     text = {"text": text_list}
     # 指定预测方法为lac并发送post请求
```

The fix removes a duplicated 用于 from the comment ("specify the texts used for prediction and build the dict").
demo/serving/module_serving/lexical_analysis_lac/lac_with_dict_serving_demo.py

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["今天是个好日子", "天气预报说今天要下雨"]
     text = {"text": text_list}
     # 指定自定义词典{"user_dict": dict.txt}
```

Same duplicated-word fix as above.
demo/serving/module_serving/semantic_model_simnet_bow/simnet_bow_serving_demo.py

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于匹配的文本并生成字典{"text_1": [text_a1, text_a2, ... ]
+    # 指定用于匹配的文本并生成字典{"text_1": [text_a1, text_a2, ... ]
     #                             "text_2": [text_b1, text_b2, ... ]}
     text = {
         "text_1": ["这道题太难了", "这道题太难了", "这道题太难了"],
```

Same duplicated-word fix, here in the comment about the texts used for matching.
demo/serving/module_serving/sentiment_analysis_senta_lstm/senta_lstm_serving_demo.py

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["我不爱吃甜食", "我喜欢躺在床上看电影"]
     text = {"text": text_list}
     # 指定预测方法为senta_lstm并发送post请求
```

Same duplicated-word fix as above.
demo/serving/module_serving/text_censorship_porn_detection_lstm/porn_detection_lstm_serving_demo.py

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["黄片下载", "中国黄页"]
     text = {"text": text_list}
     # 指定预测方法为lac并发送post请求
```

Same duplicated-word fix. (The trailing context comment still says lac; it should name porn_detection_lstm, a leftover worth fixing separately.)
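The five client demos above differ only in the module name and input payload. A parameterized sketch of the shared pattern (the helper name and defaults are ours, not part of the repo):

```python
# coding: utf8
# Sketch: the request pattern shared by the serving client demos above.
import json

import requests


def predict(module_name, texts, host="127.0.0.1", port=8866):
    # Build {"text": [text_1, text_2, ...]} and POST it to the module's route.
    url = "http://%s:%d/predict/text/%s" % (host, port, module_name)
    r = requests.post(url=url, data={"text": texts})
    return r.json()


if __name__ == "__main__":
    results = predict("lac", ["今天是个好日子", "天气预报说今天要下雨"])
    print(json.dumps(results, indent=4, ensure_ascii=False))
```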
paddlehub/commands/serving.py

```diff
@@ -26,30 +26,42 @@ import paddlehub as hub
 from paddlehub.commands.base_command import BaseCommand, ENTRY
 from paddlehub.serving import app_single as app
 import multiprocessing
-import gunicorn.app.base
-
-
-def number_of_workers():
-    return (multiprocessing.cpu_count() * 2) + 1
-
-
-class StandaloneApplication(gunicorn.app.base.BaseApplication):
-    def __init__(self, app, options=None):
-        self.options = options or {}
-        self.application = app
-        super(StandaloneApplication, self).__init__()
-
-    def load_config(self):
-        config = {
-            key: value
-            for key, value in self.options.items()
-            if key in self.cfg.settings and value is not None
-        }
-        for key, value in config.items():
-            self.cfg.set(key.lower(), value)
-
-    def load(self):
-        return self.application
+
+if platform.system() == "Windows":
+
+    class StandaloneApplication(object):
+        def __init__(self):
+            pass
+
+        def load_config(self):
+            pass
+
+        def load(self):
+            pass
+else:
+    import gunicorn.app.base
+
+    class StandaloneApplication(gunicorn.app.base.BaseApplication):
+        def __init__(self, app, options=None):
+            self.options = options or {}
+            self.application = app
+            super(StandaloneApplication, self).__init__()
+
+        def load_config(self):
+            config = {
+                key: value
+                for key, value in self.options.items()
+                if key in self.cfg.settings and value is not None
+            }
+            for key, value in config.items():
+                self.cfg.set(key.lower(), value)
+
+        def load(self):
+            return self.application
+
+
+def number_of_workers():
+    return (multiprocessing.cpu_count() * 2) + 1


 class ServingCommand(BaseCommand):
```
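Gunicorn is POSIX-only, which is why the commit guards the import and substitutes a stub class on Windows. The non-Windows branch follows Gunicorn's documented custom-application pattern; a minimal sketch of driving such a wrapper (the `flask_app` and option values are stand-ins, not code from this commit):

```python
# Sketch (POSIX only): serve a Flask app through the StandaloneApplication
# wrapper defined above, mirroring what PaddleHub Serving does internally.
from flask import Flask

flask_app = Flask(__name__)

options = {
    "bind": "0.0.0.0:8866",          # same default port as PaddleHub Serving
    "workers": number_of_workers(),  # (2 * CPU cores) + 1, as in the diff
}
StandaloneApplication(flask_app, options).run()  # run() comes from BaseApplication
```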
paddlehub/common/downloader.py

```diff
@@ -29,7 +29,7 @@ import tarfile
 from paddlehub.common import utils
 from paddlehub.common.logger import logger

-__all__ = ['Downloader']
+__all__ = ['Downloader', 'progress']

 FLUSH_INTERVAL = 0.1

 lasttime = time.time()
```
paddlehub/dataset/base_cv_dataset.py

```diff
@@ -26,7 +26,7 @@ from paddlehub.common.downloader import default_downloader
 from paddlehub.common.logger import logger


-class BaseCVDatast(BaseDataset):
+class BaseCVDataset(BaseDataset):
     def __init__(self,
                  base_path,
                  train_list_file=None,
@@ -35,7 +35,7 @@ class BaseCVDatast(BaseDataset):
                  predict_list_file=None,
                  label_list_file=None,
                  label_list=None):
-        super(BaseCVDatast, self).__init__(
+        super(BaseCVDataset, self).__init__(
             base_path=base_path,
             train_file=train_list_file,
             dev_file=validate_list_file,
@@ -65,7 +65,7 @@ class BaseCVDatast(BaseDataset):
         return data


-# discarded. please use BaseCVDatast
+# discarded. please use BaseCVDataset
 class ImageClassificationDataset(object):
     def __init__(self):
         logger.warning(
```
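Renaming a public class silently breaks any external code that imports the old name; this commit updates all in-repo call sites (the dataset files below). Had a transition period been wanted, a backward-compatibility shim could look like this sketch (not part of the commit; it assumes placement in `base_cv_dataset.py` next to the renamed class):

```python
# Sketch: keep the old misspelled name importable while steering users
# to the corrected one.
import warnings


class BaseCVDatast(BaseCVDataset):
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "BaseCVDatast is a deprecated misspelling; use BaseCVDataset",
            DeprecationWarning,
            stacklevel=2)
        super(BaseCVDatast, self).__init__(*args, **kwargs)
```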
paddlehub/dataset/base_nlp_dataset.py

```diff
@@ -21,9 +21,10 @@ import io
 import csv

 from paddlehub.dataset import InputExample, BaseDataset
+from paddlehub.common.logger import logger


-class BaseNLPDatast(BaseDataset):
+class BaseNLPDataset(BaseDataset):
     def __init__(self,
                  base_path,
                  train_file=None,
@@ -32,11 +33,11 @@ class BaseNLPDatast(BaseDataset):
                  predict_file=None,
                  label_file=None,
                  label_list=None,
-                 train_file_with_head=False,
-                 dev_file_with_head=False,
-                 test_file_with_head=False,
-                 predict_file_with_head=False):
-        super(BaseNLPDatast, self).__init__(
+                 train_file_with_header=False,
+                 dev_file_with_header=False,
+                 test_file_with_header=False,
+                 predict_file_with_header=False):
+        super(BaseNLPDataset, self).__init__(
             base_path=base_path,
             train_file=train_file,
             dev_file=dev_file,
@@ -44,37 +45,54 @@ class BaseNLPDatast(BaseDataset):
             predict_file=predict_file,
             label_file=label_file,
             label_list=label_list,
-            train_file_with_head=train_file_with_head,
-            dev_file_with_head=dev_file_with_head,
-            test_file_with_head=test_file_with_head,
-            predict_file_with_head=predict_file_with_head)
+            train_file_with_header=train_file_with_header,
+            dev_file_with_header=dev_file_with_header,
+            test_file_with_header=test_file_with_header,
+            predict_file_with_header=predict_file_with_header)

     def _read_file(self, input_file, phase=None):
         """Reads a tab separated value file."""
+        has_warned = False
         with io.open(input_file, "r", encoding="UTF-8") as file:
             reader = csv.reader(file, delimiter="\t", quotechar=None)
             examples = []
             for (i, line) in enumerate(reader):
                 if i == 0:
                     ncol = len(line)
-                    if self.if_file_with_head[phase]:
+                    if self.if_file_with_header[phase]:
                         continue
-                if ncol == 1:
-                    example = InputExample(guid=i, text_a=line[0])
-                elif ncol == 2:
-                    example = InputExample(
-                        guid=i, text_a=line[0], label=line[1])
-                elif ncol == 3:
-                    example = InputExample(
-                        guid=i, text_a=line[0], text_b=line[1], label=line[2])
+                if phase != "predict":
+                    if ncol == 1:
+                        raise Exception(
+                            "the %s file: %s only has one column but it is not a predict file"
+                            % (phase, input_file))
+                    elif ncol == 2:
+                        example = InputExample(
+                            guid=i, text_a=line[0], label=line[1])
+                    elif ncol == 3:
+                        example = InputExample(
+                            guid=i, text_a=line[0], text_b=line[1],
+                            label=line[2])
+                    else:
+                        raise Exception(
+                            "the %s file: %s has too many columns (should <=3)"
+                            % (phase, input_file))
                 else:
-                    raise Exception(
-                        "the %s file: %s has too many columns (should <=3)"
-                        % (phase, input_file))
+                    if ncol == 1:
+                        example = InputExample(guid=i, text_a=line[0])
+                    elif ncol == 2:
+                        if not has_warned:
+                            logger.warning(
+                                "the predict file: %s has 2 columns, as it is a predict file, the second one will be regarded as text_b"
+                                % (input_file))
+                            has_warned = True
+                        example = InputExample(
+                            guid=i, text_a=line[0], text_b=line[1])
+                    else:
+                        raise Exception(
+                            "the predict file: %s has too many columns (should <=2)"
+                            % (input_file))
                 examples.append(example)
             return examples
```
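With the renamed class and keyword arguments, a user-defined dataset now reads like the sketch below (the paths and labels are made up for illustration):

```python
# Sketch: a custom text-classification dataset whose TSV files carry a
# header row, using the renamed class and *_file_with_header arguments.
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset


class MyDataset(BaseNLPDataset):
    def __init__(self):
        super(MyDataset, self).__init__(
            base_path="/path/to/my_dataset",  # assumed layout
            train_file="train.tsv",           # columns: text_a<TAB>label
            dev_file="dev.tsv",
            predict_file="predict.tsv",       # one column: text_a
            label_list=["0", "1"],
            train_file_with_header=True,
            dev_file_with_header=True,
            predict_file_with_header=True)
```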
The remaining dataset files apply the same rename (`BaseNLPDatast`/`BaseCVDatast` to `BaseNLPDataset`/`BaseCVDataset`) at their import and class-definition sites.

paddlehub/dataset/bq.py

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset


-class BQ(BaseNLPDatast):
+class BQ(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "bq")
         base_path = self._download_dataset(
```

paddlehub/dataset/chnsenticorp.py

```diff
@@ -23,10 +23,10 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset


-class ChnSentiCorp(BaseNLPDatast):
+class ChnSentiCorp(BaseNLPDataset):
     """
     ChnSentiCorp (by Tan Songbo at ICT of Chinese Academy of Sciences, and for
     opinion mining)
```

paddlehub/dataset/cmrc2018.py

```diff
@@ -20,7 +20,7 @@ import os
 from paddlehub.reader import tokenization
 from paddlehub.common.dir import DATA_HOME
 from paddlehub.common.logger import logger
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/cmrc2018.tar.gz"
 SPIECE_UNDERLINE = '▁'
@@ -62,7 +62,7 @@ class CMRC2018Example(object):
         return s


-class CMRC2018(BaseNLPDatast):
+class CMRC2018(BaseNLPDataset):
     """A single set of features of data."""

     def __init__(self):
```
paddlehub/dataset/dataset.py

```diff
@@ -64,10 +64,10 @@ class BaseDataset(object):
                  predict_file=None,
                  label_file=None,
                  label_list=None,
-                 train_file_with_head=False,
-                 dev_file_with_head=False,
-                 test_file_with_head=False,
-                 predict_file_with_head=False):
+                 train_file_with_header=False,
+                 dev_file_with_header=False,
+                 test_file_with_header=False,
+                 predict_file_with_header=False):
         if not (train_file or dev_file or test_file):
             raise ValueError("At least one file should be assigned")
         self.base_path = base_path
@@ -83,11 +83,11 @@ class BaseDataset(object):
         self.test_examples = []
         self.predict_examples = []

-        self.if_file_with_head = {
-            "train": train_file_with_head,
-            "dev": dev_file_with_head,
-            "test": test_file_with_head,
-            "predict": predict_file_with_head
+        self.if_file_with_header = {
+            "train": train_file_with_header,
+            "dev": dev_file_with_header,
+            "test": test_file_with_header,
+            "predict": predict_file_with_header
         }

         if train_file:
@@ -128,7 +128,7 @@ class BaseDataset(object):
     def num_labels(self):
         return len(self.label_list)

-    # To compatibility with the usage of ImageClassificationDataset
+    # To be compatible with ImageClassificationDataset
     def label_dict(self):
         return {index: key for index, key in enumerate(self.label_list)}
```
paddlehub/dataset/dogcat.py

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class DogCatDataset(BaseCVDatast):
+class DogCatDataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "dog-cat")
         base_path = self._download_dataset(
```

paddlehub/dataset/drcd.py

```diff
@@ -20,7 +20,7 @@ import os
 from paddlehub.reader import tokenization
 from paddlehub.common.dir import DATA_HOME
 from paddlehub.common.logger import logger
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/drcd.tar.gz"
 SPIECE_UNDERLINE = '▁'
@@ -62,7 +62,7 @@ class DRCDExample(object):
         return s


-class DRCD(BaseNLPDatast):
+class DRCD(BaseNLPDataset):
     """A single set of features of data."""

     def __init__(self):
```

paddlehub/dataset/flowers.py

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class FlowersDataset(BaseCVDatast):
+class FlowersDataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "flower_photos")
         base_path = self._download_dataset(
```

paddlehub/dataset/food101.py

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class Food101Dataset(BaseCVDatast):
+class Food101Dataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101",
                                     "images")
```
paddlehub/dataset/glue.py

```diff
@@ -24,12 +24,12 @@ import io
 from paddlehub.dataset import InputExample
 from paddlehub.common.logger import logger
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/glue_data.tar.gz"


-class GLUE(BaseNLPDatast):
+class GLUE(BaseNLPDataset):
     """
     Please refer to
     https://gluebenchmark.com
```

paddlehub/dataset/iflytek.py

```diff
@@ -22,12 +22,12 @@ import os
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/iflytek.tar.gz"


-class IFLYTEK(BaseNLPDatast):
+class IFLYTEK(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "iflytek")
         base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
```

paddlehub/dataset/indoor67.py

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class Indoor67Dataset(BaseCVDatast):
+class Indoor67Dataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "Indoor67")
         base_path = self._download_dataset(
```

paddlehub/dataset/inews.py

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/inews.tar.gz"


-class INews(BaseNLPDatast):
+class INews(BaseNLPDataset):
     """
     INews is a sentiment analysis dataset for Internet News
     """
```
paddlehub/dataset/lcqmc.py

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/lcqmc.tar.gz"


-class LCQMC(BaseNLPDatast):
+class LCQMC(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "lcqmc")
         base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
```

paddlehub/dataset/msra_ner.py

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/msra_ner.tar.gz"


-class MSRA_NER(BaseNLPDatast):
+class MSRA_NER(BaseNLPDataset):
     """
     A set of manually annotated Chinese word-segmentation data and
     specifications for training and testing a Chinese word-segmentation system
```

paddlehub/dataset/nlpcc_dbqa.py

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/nlpcc-dbqa.tar.gz"


-class NLPCC_DBQA(BaseNLPDatast):
+class NLPCC_DBQA(BaseNLPDataset):
     """
     Please refer to
     http://tcci.ccf.org.cn/conference/2017/dldoc/taskgline05.pdf
```

paddlehub/dataset/squad.py

```diff
@@ -20,7 +20,7 @@ import os
 from paddlehub.reader import tokenization
 from paddlehub.common.dir import DATA_HOME
 from paddlehub.common.logger import logger
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/squad.tar.gz"
@@ -65,7 +65,7 @@ class SquadExample(object):
         return s


-class SQUAD(BaseNLPDatast):
+class SQUAD(BaseNLPDataset):
     """A single set of features of data."""

     def __init__(self, version_2_with_negative=False):
```
paddlehub/dataset/stanford_dogs.py

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class StanfordDogsDataset(BaseCVDatast):
+class StanfordDogsDataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME,
                                     "StanfordDogs-120")
```

paddlehub/dataset/thucnews.py

```diff
@@ -22,12 +22,12 @@ import os
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/thucnews.tar.gz"


-class THUCNEWS(BaseNLPDatast):
+class THUCNEWS(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "thucnews")
         base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
```

paddlehub/dataset/toxic.py

```diff
@@ -22,12 +22,12 @@ import pandas as pd
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/toxic.tar.gz"


-class Toxic(BaseNLPDatast):
+class Toxic(BaseNLPDataset):
     """
     The kaggle Toxic dataset:
     https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
```

paddlehub/dataset/xnli.py

```diff
@@ -25,12 +25,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/XNLI-lan.tar.gz"


-class XNLI(BaseNLPDatast):
+class XNLI(BaseNLPDataset):
     """
     Please refer to
     https://arxiv.org/pdf/1809.05053.pdf
```
paddlehub/finetune/task/base_task.py

```diff
@@ -24,7 +24,12 @@ import copy
 import logging
 import inspect
 from functools import partial
 from collections import OrderedDict
+import six
+
+if six.PY2:
+    from inspect import getargspec as get_args
+else:
+    from inspect import getfullargspec as get_args

 import numpy as np
 import paddle.fluid as fluid
 from tb_paddle import SummaryWriter
@@ -84,44 +89,44 @@ class RunEnv(object):
 class TaskHooks():
     def __init__(self):
         self._registered_hooks = {
-            "build_env_start": {},
-            "build_env_end": {},
-            "finetune_start": {},
-            "finetune_end": {},
-            "predict_start": {},
-            "predict_end": {},
-            "eval_start": {},
-            "eval_end": {},
-            "log_interval": {},
-            "save_ckpt_interval": {},
-            "eval_interval": {},
-            "run_step": {},
+            "build_env_start_event": OrderedDict(),
+            "build_env_end_event": OrderedDict(),
+            "finetune_start_event": OrderedDict(),
+            "finetune_end_event": OrderedDict(),
+            "predict_start_event": OrderedDict(),
+            "predict_end_event": OrderedDict(),
+            "eval_start_event": OrderedDict(),
+            "eval_end_event": OrderedDict(),
+            "log_interval_event": OrderedDict(),
+            "save_ckpt_interval_event": OrderedDict(),
+            "eval_interval_event": OrderedDict(),
+            "run_step_event": OrderedDict(),
         }
         self._hook_params_num = {
-            "build_env_start": 1,
-            "build_env_end": 1,
-            "finetune_start": 1,
-            "finetune_end": 2,
-            "predict_start": 1,
-            "predict_end": 2,
-            "eval_start": 1,
-            "eval_end": 2,
-            "log_interval": 2,
-            "save_ckpt_interval": 1,
-            "eval_interval": 1,
-            "run_step": 2,
+            "build_env_start_event": 1,
+            "build_env_end_event": 1,
+            "finetune_start_event": 1,
+            "finetune_end_event": 2,
+            "predict_start_event": 1,
+            "predict_end_event": 2,
+            "eval_start_event": 1,
+            "eval_end_event": 2,
+            "log_interval_event": 2,
+            "save_ckpt_interval_event": 1,
+            "eval_interval_event": 1,
+            "run_step_event": 2,
         }

     def add(self, hook_type, name=None, func=None):
         if not func or not callable(func):
             raise TypeError(
                 "The hook function is empty or it is not a function")
-        if name and not isinstance(name, str):
-            raise TypeError("The hook name must be a string")
-        if not name:
+        if name == None:
             name = "hook_%s" % id(func)
+        # check validity
+        if not isinstance(name, str) or name.strip() == "":
+            raise TypeError("The hook name must be a non-empty string")
         if hook_type not in self._registered_hooks:
             raise ValueError("hook_type: %s does not exist" % (hook_type))
         if name in self._registered_hooks[hook_type]:
@@ -129,7 +134,7 @@ class TaskHooks():
                 "name: %s has existed in hook_type:%s, use modify method to modify it"
                 % (name, hook_type))
         else:
-            args_num = len(inspect.getfullargspec(func).args)
+            args_num = len(get_args(func).args)
             if args_num != self._hook_params_num[hook_type]:
                 raise ValueError(
                     "The number of parameters to the hook hook_type:%s should be %i"
@@ -163,13 +168,13 @@ class TaskHooks():
         else:
             return True

-    def info(self, only_customized=True):
+    def info(self, show_default=False):
         # formatted output the source code
         ret = ""
         for hook_type, hooks in self._registered_hooks.items():
             already_print_type = False
             for name, func in hooks.items():
-                if name == "default" and only_customized:
+                if name == "default" and not show_default:
                     continue
                 if not already_print_type:
                     ret += "hook_type: %s{\n" % hook_type
@@ -182,7 +187,7 @@ class TaskHooks():
         if already_print_type:
             ret += "}\n"
         if not ret:
-            ret = "Not any hooks when only_customized=%s" % only_customized
+            ret = "Not any customized hooks have been defined, you can set show_default=True to see the default hooks information"
         return ret

     def __getitem__(self, hook_type):
@@ -259,8 +264,8 @@ class BaseTask(object):
         self._hooks = TaskHooks()
         for hook_type, event_hooks in self._hooks._registered_hooks.items():
             self._hooks.add(hook_type, "default",
-                            eval("self._default_%s_event" % hook_type))
-            setattr(BaseTask, "_%s_event" % hook_type,
+                            eval("self._default_%s" % hook_type))
+            setattr(BaseTask, "_%s" % hook_type,
                     self.create_event_function(hook_type))

         # accelerate predict
@@ -581,13 +586,18 @@ class BaseTask(object):
         return self._hooks.info(only_customized)

     def add_hook(self, hook_type, name=None, func=None):
+        if name == None:
+            name = "hook_%s" % id(func)
         self._hooks.add(hook_type, name=name, func=func)
+        logger.info("Add hook %s:%s successfully" % (hook_type, name))

     def delete_hook(self, hook_type, name):
         self._hooks.delete(hook_type, name)
+        logger.info("Delete hook %s:%s successfully" % (hook_type, name))

     def modify_hook(self, hook_type, name, func):
         self._hooks.modify(hook_type, name, func)
+        logger.info("Modify hook %s:%s successfully" % (hook_type, name))

     def _default_build_env_start_event(self):
         pass
```

One leftover worth flagging: `hooks_info` (context line above) still forwards `only_customized` into the renamed `info(show_default=...)` parameter, which appears to invert the intended default behavior.
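With the `_event`-suffixed hook types, registering a custom hook looks roughly like the sketch below; `cls_task` is a stand-in for any `BaseTask` subclass instance, and the two-argument signature follows `_hook_params_num["finetune_end_event"] = 2` in the diff above:

```python
# Sketch: attach a customized hook to a task's finetune_end event.
def my_finetune_end(task, run_states):
    # run_states carries the statistics of the final run; here we only log.
    print("fine-tuning finished, %d run states collected" % len(run_states))


cls_task.add_hook("finetune_end_event", name="my_hook", func=my_finetune_end)
print(cls_task.hooks_info())  # print the registered hooks
```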
paddlehub/finetune/task/classifier_task.py

```diff
@@ -142,7 +142,7 @@ class ClassifierTask(BaseTask):
             }
         except:
             raise Exception(
-                "ImageClassificationDataset does not support postprocessing, please use BaseCVDatast instead"
+                "ImageClassificationDataset does not support postprocessing, please use BaseCVDataset instead"
             )
         results = []
         for batch_state in run_states:
```
paddlehub/finetune/task/reading_comprehension_task.py

```diff
@@ -26,6 +26,7 @@ import json
 from collections import OrderedDict
+import io

 import numpy as np
 import paddle.fluid as fluid

 from .base_task import BaseTask
@@ -517,13 +518,13 @@ class ReadingComprehensionTask(BaseTask):
             null_score_diff_threshold=self.null_score_diff_threshold,
             is_english=self.is_english)
         if self.phase == 'val' or self.phase == 'dev':
-            with open(
+            with io.open(
                     self.data_reader.dataset.dev_path, 'r',
                     encoding="utf8") as dataset_file:
                 dataset_json = json.load(dataset_file)
                 dataset = dataset_json['data']
         elif self.phase == 'test':
-            with open(
+            with io.open(
                     self.data_reader.dataset.test_path, 'r',
                     encoding="utf8") as dataset_file:
                 dataset_json = json.load(dataset_file)
```
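The switch to `io.open` is a Python 2 compatibility fix: the Python 2 builtin `open()` does not accept an `encoding` argument, while `io.open` does on both major versions (on Python 3 it is an alias of `open`). In isolation:

```python
# Sketch: io.open is the portable spelling of open(..., encoding=...)
# across Python 2 and 3; "dev.json" is a stand-in path.
import io
import json

with io.open("dev.json", "r", encoding="utf8") as dataset_file:
    dataset_json = json.load(dataset_file)
```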
paddlehub/module/manager.py

```diff
@@ -168,8 +168,7 @@ class LocalModuleManager(object):
             with tarfile.open(module_package, "r:gz") as tar:
                 file_names = tar.getnames()
                 size = len(file_names) - 1
-                module_dir = os.path.split(file_names[0])[0]
-                module_dir = os.path.join(_dir, module_dir)
+                module_dir = os.path.join(_dir, file_names[0])
                 for index, file_name in enumerate(file_names):
                     tar.extract(file_name, _dir)
@@ -195,7 +194,7 @@ class LocalModuleManager(object):
                 save_path = os.path.join(MODULE_HOME, module_name)
                 if os.path.exists(save_path):
-                    shutil.move(save_path)
+                    shutil.rmtree(save_path)
                 if from_user_dir:
                     shutil.copytree(module_dir, save_path)
                 else:
```
paddlehub/module/module.py

```diff
@@ -37,6 +37,7 @@ from paddlehub.common.lock import lock
 from paddlehub.common.logger import logger
 from paddlehub.common.hub_server import CacheUpdater
 from paddlehub.common import tmp_dir
+from paddlehub.common.downloader import progress
 from paddlehub.module import module_desc_pb2
 from paddlehub.module.manager import default_module_manager
 from paddlehub.module.checker import ModuleChecker
@@ -99,10 +100,22 @@ def create_module(directory, name, author, email, module_type, summary,
         _cwd = os.getcwd()
         os.chdir(base_dir)
-        for dirname, _, files in os.walk(module_dir):
-            for file in files:
-                tar.add(os.path.join(dirname, file).replace(base_dir, "."))
+        module_dir = module_dir.replace(base_dir, ".")
+        tar.add(module_dir, recursive=False)
+        files = []
+        for dirname, _, subfiles in os.walk(module_dir):
+            for file in subfiles:
+                files.append(os.path.join(dirname, file))
+        total_length = len(files)
+        print("Create Module {}-{}".format(name, version))
+        for index, file in enumerate(files):
+            done = int(float(index) / total_length * 50)
+            progress("[%-50s] %.2f%%" % ('=' * done,
+                                         float(index / total_length * 100)))
+            tar.add(file)
+        progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
+        print("Module package saved as {}".format(save_file))
         os.chdir(_cwd)
```
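One nit in the added code: `float(index / total_length * 100)` divides two integers before converting, so under Python 2 the displayed percentage truncates to 0 until the loop ends; `float(index) / total_length`, as used for `done`, avoids that. The bar arithmetic in isolation:

```python
# Sketch of the progress arithmetic used above: map index/total onto a
# 50-character bar. float(index) must come before the division on Python 2.
def bar(index, total, width=50):
    done = int(float(index) / total * width)
    return "[%-*s] %.2f%%" % (width, "=" * done, float(index) / total * 100)


print(bar(21, 42))  # 25 of the 50 cells filled, reading "50.00%"
```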
paddlehub/reader/tokenization.py

```diff
@@ -170,7 +170,7 @@ class WSSPTokenizer(object):
         self.inv_vocab = {v: k for k, v in self.vocab.items()}
         self.ws = ws
         self.lower = lower
-        self.dict = pickle.load(open(word_dict, 'rb'), encoding='utf8')
+        self.dict = pickle.load(open(word_dict, 'rb'))
         self.sp_model = spm.SentencePieceProcessor()
         self.window_size = 5
         self.sp_model.Load(sp_model_dir)
```
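Dropping `encoding='utf8'` is another Python 2 fix: `pickle.load` only grew its `encoding` parameter in Python 3. A version-guarded load that keeps the utf8 handling on Python 3 would be (a sketch, not what the commit does):

```python
# Sketch: keep pickle.load working on both interpreters.
import pickle

import six


def load_word_dict(path):
    with open(path, "rb") as f:
        if six.PY2:
            return pickle.load(f)  # Python 2: no encoding parameter
        return pickle.load(f, encoding="utf8")
```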
tutorial/bert_service.md

```diff
@@ -30,7 +30,7 @@
 Setting up a service with Bert Service takes the three steps below:

-## Step1: environment setup
+## Step1: prepare the environment
 ### Requirements
 The table below lists the requirements for `Bert Service`; items marked * are optional dependencies that can be installed as needed.
@@ -40,8 +40,7 @@
 |PaddleHub|>=1.4.0|-|
 |PaddlePaddle|>=1.6.1|use the PaddlePaddle-gpu build for GPU computation|
 |GCC|>=4.8|-|
-|CUDA*|>=8|CUDA 8 or newer is required for GPU use|
-|paddle-gpu-serving*|>=0.8.0|required on the `Bert Service` server side|
+|paddle-gpu-serving*|>=0.8.2|required on the `Bert Service` server side|
 |ujson*|>=1.35|required on the `Bert Service` client side|
 ### Installation
@@ -84,7 +83,7 @@ $ pip install ujson
 |[bert_chinese_L-12_H-768_A-12](https://paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)|BERT|

-## Step2: the server
+## Step2: start the server
 ### Overview
 The server receives the data sent by the client, runs the model computation, and returns the result to the client.
@@ -130,7 +129,7 @@ Paddle Inference Server exit successfully!
 ```

-## Step3: the client
+## Step3: start the client
 ### Overview
 The client takes text data and receives the embedding computed by the server.
@@ -197,11 +196,11 @@ input_text = [["西风吹老洞庭波"], ["一夜湘君白发多"], ["醉后不
 ```python
 result = bc.get_result(input_text=input_text)
 ```
-Finally, the embedding result is obtained (only part of it is shown here).
+With this, the embedding result is obtained (only part of it is shown here).
 ```python
 [[0.9993321895599361, 0.9994612336158751, 0.9999646544456481, 0.732795298099517, -0.34387934207916204, ...]]
 ```
-The client demo file is at [示例](../paddlehub/serving/bert_serving/bert_service.py).
+The client demo file is at [示例](../demo/serving/bert_service/bert_service_client.py).
 Run it with:
 ```shell
 $ python bert_service_client.py
```
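Pieced together from the fragments in this tutorial diff, the client flow is roughly the sketch below; `bc` is the Bert Service client object the tutorial constructs in an earlier, unchanged section (not shown in this diff):

```python
# Sketch assembled from the tutorial fragments above; `bc` is assumed to be
# the already-connected Bert Service client.
input_text = [["西风吹老洞庭波"], ["一夜湘君白发多"]]
result = bc.get_result(input_text=input_text)
print(result[0][:5])  # first few dimensions of the first text's embedding
```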
tutorial/serving.md

```diff
 # PaddleHub Serving: one-command model serving
 ## Introduction
 ### Why one-command serving
-With PaddleHub you can quickly run transfer learning and model prediction, but developers often need to put a trained model online. Whether the service port is public or the prediction service runs on a LAN, PaddleHub must be able to deploy a prediction service quickly. Hence the one-command serving tool, PaddleHub Serving: one command gives the developer a prediction-service API, with no need to choose or implement a web framework.
+With PaddleHub you can quickly run model prediction, but developers often need to move a local prediction workflow online. Whether the service port is public or the prediction service runs on a LAN, PaddleHub must be able to deploy a prediction service quickly. Hence the one-command serving tool, PaddleHub Serving: a single command starts an online model-prediction service, with no need to choose or implement a web framework.
 ### What is one-command serving
 PaddleHub Serving is PaddleHub's one-command model-serving tool: a simple Hub command starts an online prediction service. The front end handles requests via Flask and Gunicorn, the back end calls the PaddleHub prediction interface directly, and multi-process mode can use multiple cores for higher concurrency.
 ### Supported models
-PaddleHub Serving can serve every PaddleHub model that is directly usable for prediction, including nlp models such as `lac` and `senta_bilstm` and cv models such as `yolov3_coco2017` and `vgg16_imagenet`; models produced with the PaddleHub Fine-tune API will be supported in the future.
-### Required environment
-The table below lists the requirements and notes for PaddleHub Serving.
-|Item|Recommended version|Notes|
-|:-:|:-:|:-:|
-|OS|Linux/Darwin/Windows|Linux or Darwin recommended for better multi-process support|
-|PaddleHub|>=1.4.0|-|
-|PaddlePaddle|>=1.6.1|use the PaddlePaddle-gpu build for GPU computation|
+PaddleHub Serving can serve every PaddleHub model that is directly usable for prediction, including NLP models such as `lac` and `senta_bilstm` and CV models such as `yolov3_darknet53_coco2017` and `vgg16_imagenet`; see the [PaddleHub model list](https://paddlepaddle.org.cn/hublist) for more. Models produced with the PaddleHub Fine-tune API will be supported in the future.

 ## Usage
 ### Step1: start the server
-PaddleHub Serving starts in two ways: from the command-line command, or from a config file.
+PaddleHub Serving starts in two ways: from the command line, or from a config file.
 #### Starting from the command line
 Start command
@@ -37,7 +28,7 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
 |--modules/-m|models to pre-install, as Module==Version pairs<br>*`when Version is omitted the latest version is used`*|
 |--port/-p|service port, 8866 by default|
 |--use_gpu|predict on GPU; requires paddlepaddle-gpu|
-|--use_multiprocess|enable concurrent (multi-process) mode; single-process by default|
+|--use_multiprocess|enable concurrent (multi-process) mode; single-process by default; recommended on multi-core CPU machines<br>*`Windows supports single-process mode only`*|
 #### Starting from a config file
 Start command
@@ -60,8 +51,8 @@ $ hub serving start --config config.json
             "batch_size": "BATCH_SIZE_2"
         }
     ],
-    "use_gpu": false,
     "port": 8866,
+    "use_gpu": false,
     "use_multiprocess": false
 }
 ```
@@ -70,10 +61,10 @@ $ hub serving start --config config.json
 |Parameter|Purpose|
 |-|-|
-|--modules_info|models to pre-install, as a list of dicts:<br>`module` model name<br>`version` model version<br>`batch_size` prediction batch size|
-|--use_gpu|predict on GPU; requires paddlepaddle-gpu|
-|--port/-p|service port, 8866 by default|
-|--use_multiprocess|enable concurrent mode; single-process by default; recommended on multi-core CPU machines|
+|modules_info|models to pre-install, as a list of dicts:<br>`module` model name<br>`version` model version<br>`batch_size` prediction batch size|
+|port|service port, 8866 by default|
+|use_gpu|predict on GPU; requires paddlepaddle-gpu|
+|use_multiprocess|enable concurrent mode; single-process by default; recommended on multi-core CPU machines<br>*`Windows supports single-process mode only`*|

 ### Step2: access the server
@@ -99,7 +90,7 @@ http://0.0.0.0:8866/predict/<CATEGORY\>/\<MODULE>
 ### Step1: deploy the lac service
 We now deploy an online lac service to obtain word-segmentation results through an API.
-First, as described in section 2.1, the two ways to start the PaddleHub Serving server are:
+First, pick either start method; the two ways are:
 ```shell
 $ hub serving start -m lac
 ```
@@ -148,7 +139,7 @@ if __name__ == "__main__":
     text_list = ["今天是个好日子", "天气预报说今天要下雨"]
     text = {"text": text_list}
     # 指定预测方法为lac并发送post请求
-    url = "http://127.0.0.1:8866/predict/text/lac"
+    url = "http://0.0.0.0:8866/predict/text/lac"
     r = requests.post(url=url, data=text)
     # 打印预测结果
@@ -180,6 +171,8 @@ if __name__ == "__main__":
 }
 ```

+For the full details and code of this demo, see [LAC Serving](../demo/serving/module_serving/lexical_analysis_lac). Some other one-command serving demos are shown below.
+
 ## Demo: one-command deployment of other models
 For other PaddleHub Serving deployment examples, see the demos below
@@ -217,4 +210,4 @@ if __name__ == "__main__":
 The example deploys senta_lstm as a Chinese sentiment-analysis service and runs online prediction to obtain sentiment results.
 ## Bert Service
-除了预训练模型一键服务部署功能外外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](./bert_service.md)。
+除了预训练模型一键服务部署功能之外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](./bert_service.md)。
```

Two notes: the client URL change to `0.0.0.0` is unusual for a client target (it works on Linux, but `127.0.0.1` is the portable choice); and the final pair again fixes the duplicated character 功能外外 into 功能之外, the same sentence as in demo/serving/module_serving/README.md.