# Commit `7dbbef9c` in PaddlePaddle/PaddleHub

- **Author:** zhangxuefei, January 13, 2020
- **Message:** Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleHub into develop
- **Parents:** `09835004`, `1491d0cb`
- **Scope:** 42 changed files, 247 additions, 288 deletions
## Changed files (42 files, +247 / -288)

| File | + | - |
|---|---:|---:|
| demo/sequence_labeling/predict.py | 6 | 5 |
| demo/serving/README.md | 4 | 4 |
| demo/serving/bert_service/README.md | 1 | 2 |
| demo/serving/module_serving/README.md | 18 | 104 |
| demo/serving/module_serving/lexical_analysis_lac/README.md | 1 | 1 |
| demo/serving/module_serving/lexical_analysis_lac/lac_serving_demo.py | 1 | 1 |
| demo/serving/module_serving/lexical_analysis_lac/lac_with_dict_serving_demo.py | 1 | 1 |
| demo/serving/module_serving/semantic_model_simnet_bow/simnet_bow_serving_demo.py | 1 | 1 |
| demo/serving/module_serving/sentiment_analysis_senta_lstm/senta_lstm_serving_demo.py | 1 | 1 |
| demo/serving/module_serving/text_censorship_porn_detection_lstm/porn_detection_lstm_serving_demo.py | 1 | 1 |
| paddlehub/commands/serving.py | 30 | 18 |
| paddlehub/common/downloader.py | 1 | 1 |
| paddlehub/dataset/base_cv_dataset.py | 3 | 3 |
| paddlehub/dataset/base_nlp_dataset.py | 42 | 24 |
| paddlehub/dataset/bq.py | 2 | 2 |
| paddlehub/dataset/chnsenticorp.py | 2 | 2 |
| paddlehub/dataset/cmrc2018.py | 2 | 2 |
| paddlehub/dataset/dataset.py | 10 | 10 |
| paddlehub/dataset/dogcat.py | 2 | 2 |
| paddlehub/dataset/drcd.py | 2 | 2 |
| paddlehub/dataset/flowers.py | 2 | 2 |
| paddlehub/dataset/food101.py | 2 | 2 |
| paddlehub/dataset/glue.py | 2 | 2 |
| paddlehub/dataset/iflytek.py | 2 | 2 |
| paddlehub/dataset/indoor67.py | 2 | 2 |
| paddlehub/dataset/inews.py | 2 | 2 |
| paddlehub/dataset/lcqmc.py | 2 | 2 |
| paddlehub/dataset/msra_ner.py | 2 | 2 |
| paddlehub/dataset/nlpcc_dbqa.py | 2 | 2 |
| paddlehub/dataset/squad.py | 2 | 2 |
| paddlehub/dataset/stanford_dogs.py | 2 | 2 |
| paddlehub/dataset/thucnews.py | 2 | 2 |
| paddlehub/dataset/toxic.py | 2 | 2 |
| paddlehub/dataset/xnli.py | 2 | 2 |
| paddlehub/finetune/task/base_task.py | 44 | 34 |
| paddlehub/finetune/task/classifier_task.py | 1 | 1 |
| paddlehub/finetune/task/reading_comprehension_task.py | 3 | 2 |
| paddlehub/module/manager.py | 2 | 3 |
| paddlehub/module/module.py | 17 | 4 |
| paddlehub/reader/tokenization.py | 1 | 1 |
| tutorial/bert_service.md | 6 | 7 |
| tutorial/serving.md | 14 | 21 |
## Diffs

### demo/sequence_labeling/predict.py (+6, -5)

Adds a note that the `u` prefix is required under Python 2 and prefixes the sample prediction strings accordingly.

```diff
@@ -87,12 +87,13 @@ if __name__ == '__main__':
         add_crf=True)

     # Data to be predicted
+    # If using python 2, prefix "u" is necessary
     data = [
-        ["我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
-        ["为了跟踪国际最新食品工艺、流行趋势,大量搜集海外专业书刊资料是提高技艺的捷径。"],
-        ["其中线装古籍逾千册;民国出版物几百种;珍本四册、稀见本四百余册,出版时间跨越三百余年。"],
-        ["有的古木交柯,春机荣欣,从诗人句中得之,而入画中,观之令人心驰。"],
-        ["不过重在晋趣,略增明人气息,妙在集古有道、不露痕迹罢了。"],
+        [u"我们变而以书会友,以书结缘,把欧美、港台流行的食品类图谱、画册、工具书汇集一堂。"],
+        [u"为了跟踪国际最新食品工艺、流行趋势,大量搜集海外专业书刊资料是提高技艺的捷径。"],
+        [u"其中线装古籍逾千册;民国出版物几百种;珍本四册、稀见本四百余册,出版时间跨越三百余年。"],
+        [u"有的古木交柯,春机荣欣,从诗人句中得之,而入画中,观之令人心驰。"],
+        [u"不过重在晋趣,略增明人气息,妙在集古有道、不露痕迹罢了。"],
     ]
     # Add 0x02 between characters to match the format of training data,
```
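As a quick aside (not part of the commit): the `u` prefix only changes behavior on Python 2, where a bare literal is a byte string. A two-line illustration that runs on both interpreters:

```python
# On Python 2, "..." is a byte string and u"..." is unicode text; on
# Python 3 the u prefix is accepted and redundant.
import sys

text = u"我们变而以书会友"
assert isinstance(text, str if sys.version_info[0] == 3 else unicode)  # noqa: F821
print(type(text))
```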
### demo/serving/README.md (+4, -4)

Wording polish: `Bert Service` is consistently set in code font, a repeated name is replaced with a pronoun, and the closing link text is corrected from "serving" to "Module Serving" with the missing full stop added.

```diff
@@ -8,11 +8,11 @@ PaddleHub Serving是基于PaddleHub的一键模型服务部署工具,能够通
 PaddleHub Serving主要包括利用Bert Service实现embedding服务化,以及利用预测模型实现预训练模型预测服务化两大功能,未来还将支持开发者使用PaddleHub Fine-tune API的模型服务化。

 ## Bert Service
-Bert Service是基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)框架的快速部署模型远程计算服务方案,可将embedding过程通过调用API接口的方式实现,减少了对机器资源的依赖。使用PaddleHub可在服务器上一键部署`Bert Service`服务,在另外的普通机器上通过客户端接口即可轻松的获取文本对应的embedding数据。
+`Bert Service`是基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)框架的快速部署模型远程计算服务方案,可将embedding过程通过调用API接口的方式实现,减少了对机器资源的依赖。使用PaddleHub可在服务器上一键部署`Bert Service`服务,在另外的普通机器上通过客户端接口即可轻松的获取文本对应的embedding数据。

-关于Bert Service的具体信息和demo请参见[Bert Service](../../tutorial/bert_service.md)
+关于其具体信息和demo请参见[Bert Service](../../tutorial/bert_service.md)

-该示例展示了利用Bert Service进行远程embedding服务化部署和在线预测,并获取文本embedding结果。
+该示例展示了利用`Bert Service`进行远程embedding服务化部署和在线预测,并获取文本embedding结果。

 ## 预训练模型一键服务部署
 预训练模型一键服务部署是基于PaddleHub的预训练模型快速部署的服务化方案,能够将模型预测以API接口的方式实现。
@@ -53,4 +53,4 @@ Bert Service是基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)
   该示例展示了利用senta_lstm完成中文文本情感分析服务化部署和在线预测,获取文本的情感分析结果。

-关于Paddle Serving预训练模型一键服务部署功能的具体信息请参见[serving](module_serving)
+关于Paddle Serving预训练模型一键服务部署功能的具体信息请参见[Module Serving](module_serving)。
```
### demo/serving/bert_service/README.md (+1, -2)

Joins a sentence that had been wrapped mid-phrase ("PaddleHub / Serving") into one line.

```diff
@@ -68,5 +68,4 @@ Paddle Inference Server exit successfully!
 这样,我们就利用一台GPU机器就完成了`Bert Service`的部署,并利用另一台普通机器进行了测试,可见通过`Bert Service`能够方便地进行在线embedding服务的快速部署。

 ## 预训练模型一键服务部署
-除了`Bert Service`外,PaddleHub
-Serving还具有预训练模型一键服务部署功能,能够将预训练模型快捷部署上线,对外提供可靠的在线预测服务,具体信息请参见[Module Serving](../../../tutorial/serving.md)。
+除了`Bert Service`外,PaddleHub Serving还具有预训练模型一键服务部署功能,能够将预训练模型快捷部署上线,对外提供可靠的在线预测服务,具体信息请参见[Module Serving](../../../tutorial/serving.md)。
```
### demo/serving/module_serving/README.md (+18, -104)

The largest documentation change: the intro is reworded, typos are fixed ("Servung" becomes "Serving", nlp/cv become NLP/CV, the yolov3 module name is updated), the inline lac walkthrough is deleted in favor of the demo list (the walkthrough lives in tutorial/serving.md), and every demo link is repointed from `../demo/serving/module_serving/…` to `../module_serving/…`.

````diff
 # PaddleHub Serving模型一键服务部署
 ## 简介
 ### 为什么使用一键服务部署
-使用PaddleHub能够快速进行迁移学习和模型预测,但开发者常面临将训练好的模型部署上线的需求,无论是对外开放服务端口,还是在局域网中搭建预测服务,都需要PaddleHub具有快速部署模型预测服务的能力。在这个背景下,模型一键服务部署工具——PaddleHub Serving应运而生。开发者通过一句命令快速得到一个预测服务API,而无需关注网络框架选择和实现。
+使用PaddleHub能够快速进行模型预测,但开发者常面临本地预测过程迁移线上的需求。无论是对外开放服务端口,还是在局域网中搭建预测服务,都需要PaddleHub具有快速部署模型预测服务的能力。在这个背景下,模型一键服务部署工具——PaddleHub Serving应运而生。开发者通过一行命令即可快速启动一个模型预测在线服务,而无需关注网络框架选择和实现。
 ### 什么是一键服务部署
 PaddleHub Serving是基于PaddleHub的一键模型服务部署工具,能够通过简单的Hub命令行工具轻松启动一个模型预测在线服务,前端通过Flask和Gunicorn完成网络请求的处理,后端直接调用PaddleHub预测接口,同时支持使用多进程方式利用多核提高并发能力,保证预测服务的性能。
 ### 支持模型
-目前PaddleHub Serving支持PaddleHub所有可直接用于预测的模型进行服务部署,包括`lac`、`senta_bilstm`等nlp类模型,以及`yolov3_coco2017`、`vgg16_imagenet`等cv类模型,未来还将支持开发者使用PaddleHub Fine-tune API得到的模型用于快捷服务部署。
+目前PaddleHub Serving支持PaddleHub所有可直接用于预测的模型进行服务部署,包括`lac`、`senta_bilstm`等NLP类模型,以及`yolov3_darknett53_coco2017`、`vgg16_imagenet`等CV类模型,未来还将支持开发者使用PaddleHub Fine-tune API得到的模型用于快捷服务部署。

-**NOTE:** 关于PaddleHub Serving一键服务部署的具体信息请参见[PaddleHub Servung](../../../tutorial/serving.md)。
+**NOTE:** 关于PaddleHub Serving一键服务部署的具体信息请参见[PaddleHub Serving](../../../tutorial/serving.md)。

-## Demo——部署一个在线lac分词服务
-### Step1:部署lac在线服务
-现在,我们要部署一个lac在线服务,以通过接口获取文本的分词结果。
-首先,根据2.1节所述,启动PaddleHub Serving服务端的两种方式分别为:
-```shell
-$ hub serving start -m lac
-```
-或
-```shell
-$ hub serving start -c serving_config.json
-```
-其中`serving_config.json`的内容如下:
-```json
-{
-  "modules_info": [
-    {
-      "module": "lac",
-      "version": "1.0.0",
-      "batch_size": 1
-    }
-  ],
-  "use_gpu": false,
-  "port": 8866,
-  "use_multiprocess": false
-}
-```
-启动成功界面如图:
-<p align="center">
-<img src="../demo/serving/module_serving/img/start_serving_lac.png" width="100%" />
-</p>
-这样我们就在8866端口部署了lac的在线分词服务。
-*此处warning为Flask提示,不影响使用*
-### Step2:访问lac预测接口
-在服务部署好之后,我们可以进行测试,用来测试的文本为`今天是个好日子`和`天气预报说今天要下雨`。
-客户端代码如下:
-```python
-# coding: utf8
-import requests
-import json
-
-if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
-    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
-    text = {"text": text_list}
-    # 指定预测方法为lac并发送post请求
-    url = "http://127.0.0.1:8866/predict/text/lac"
-    r = requests.post(url=url, data=text)
-    # 打印预测结果
-    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
-```
-运行后得到结果:
-```python
-{
-    "results": [
-        {
-            "tag": ["TIME", "v", "q", "n"],
-            "word": ["今天", "是", "个", "好日子"]
-        },
-        {
-            "tag": ["n", "v", "TIME", "v", "v"],
-            "word": ["天气预报", "说", "今天", "要", "下雨"]
-        }
-    ]
-}
-```
-## Demo——其他模型的一键部署服务
-获取其他PaddleHub Serving的一键服务部署场景示例,可参见下列demo
+## Demo
+获取PaddleHub Serving的一键服务部署场景示例,可参见下列demo:

-* [图像分类-基于vgg11_imagent](../demo/serving/module_serving/classification_vgg11_imagenet)
+* [图像分类-基于vgg11_imagent](../module_serving/classification_vgg11_imagenet)

   该示例展示了利用vgg11_imagent完成图像分类服务化部署和在线预测,获取图像分类结果。

-* [图像生成-基于stgan_celeba](../demo/serving/module_serving/GAN_stgan_celeba)
+* [图像生成-基于stgan_celeba](../module_serving/GAN_stgan_celeba)

   该示例展示了利用stgan_celeba生成图像服务化部署和在线预测,获取指定风格的生成图像。

-* [文本审核-基于porn_detection_lstm](../demo/serving/module_serving/text_censorship_porn_detection_lstm)
+* [文本审核-基于porn_detection_lstm](../module_serving/text_censorship_porn_detection_lstm)

   该示例展示了利用porn_detection_lstm完成中文文本黄色敏感信息鉴定的服务化部署和在线预测,获取文本是否敏感及其置信度。

-* [中文词法分析-基于lac](../demo/serving/module_serving/lexical_analysis_lac)
+* [中文词法分析-基于lac](../module_serving/lexical_analysis_lac)

   该示例展示了利用lac完成中文文本分词服务化部署和在线预测,获取文本的分词结果,并可通过用户自定义词典干预分词结果。

-* [目标检测-基于yolov3_darknet53_coco2017](.././demo/serving/serving/object_detection_yolov3_darknet53_coco2017)
+* [目标检测-基于yolov3_darknet53_coco2017](../module_serving/object_detection_yolov3_darknet53_coco2017)

   该示例展示了利用yolov3_darknet53_coco2017完成目标检测服务化部署和在线预测,获取检测结果和覆盖识别框的图片。

-* [中文语义分析-基于simnet_bow](../demo/serving/module_serving/semantic_model_simnet_bow)
+* [中文语义分析-基于simnet_bow](../module_serving/semantic_model_simnet_bow)

   该示例展示了利用simnet_bow完成中文文本相似度检测服务化部署和在线预测,获取文本的相似程度。

-* [图像分割-基于deeplabv3p_xception65_humanseg](../demo/serving/module_serving/semantic_segmentation_deeplabv3p_xception65_humanseg)
+* [图像分割-基于deeplabv3p_xception65_humanseg](../module_serving/semantic_segmentation_deeplabv3p_xception65_humanseg)

   该示例展示了利用deeplabv3p_xception65_humanseg完成图像分割服务化部署和在线预测,获取识别结果和分割后的图像。

-* [中文情感分析-基于simnet_bow](../demo/serving/module_serving/semantic_model_simnet_bow)
+* [中文情感分析-基于simnet_bow](../module_serving/semantic_model_simnet_bow)

   该示例展示了利用senta_lstm完成中文文本情感分析服务化部署和在线预测,获取文本的情感分析结果。

 ## Bert Service
-除了预训练模型一键服务部署功能外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](../../../tutorial/bert_service.md)。
+除了预训练模型一键服务部署功能之外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](../../../tutorial/bert_service.md)。
````
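For reference while the walkthrough above is being deleted, the client pattern it documented is simply a POST against the serving endpoint. A minimal runnable version, assuming a local service started with `hub serving start -m lac` on the default port 8866:

```python
# coding: utf8
# Minimal client for a PaddleHub Serving lac deployment; endpoint scheme
# is http://<host>:<port>/predict/text/<module>.
import json

import requests

if __name__ == "__main__":
    url = "http://127.0.0.1:8866/predict/text/lac"
    # Texts to segment, wrapped as {"text": [...]} per the demo above.
    data = {"text": ["今天是个好日子", "天气预报说今天要下雨"]}
    r = requests.post(url=url, data=data)
    # Pretty-print the JSON response without escaping Chinese characters.
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
```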
### demo/serving/module_serving/lexical_analysis_lac/README.md (+1, -1)

Renames a stray numeric heading to the Step style used elsewhere.

````diff
@@ -6,7 +6,7 @@
 这里就带领大家使用PaddleHub Serving,通过简单几步部署一个词法分析在线服务。

-## 2 启动PaddleHub Serving
+## Step1:启动PaddleHub Serving
 启动命令如下
 ```shell
 $ hub serving start -m lac
````
### demo/serving/module_serving/lexical_analysis_lac/lac_serving_demo.py (+1, -1)

Removes a duplicated 用于 in the comment.

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["今天是个好日子", "天气预报说今天要下雨"]
     text = {"text": text_list}
     # 指定预测方法为lac并发送post请求
```
### demo/serving/module_serving/lexical_analysis_lac/lac_with_dict_serving_demo.py (+1, -1)

Same duplicated-用于 fix.

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["今天是个好日子", "天气预报说今天要下雨"]
     text = {"text": text_list}
     # 指定自定义词典{"user_dict": dict.txt}
```
### demo/serving/module_serving/semantic_model_simnet_bow/simnet_bow_serving_demo.py (+1, -1)

Same duplicated-用于 fix.

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于匹配的文本并生成字典{"text_1": [text_a1, text_a2, ... ]
+    # 指定用于匹配的文本并生成字典{"text_1": [text_a1, text_a2, ... ]
     #                       "text_2": [text_b1, text_b2, ... ]}
     text = {
         "text_1": ["这道题太难了", "这道题太难了", "这道题太难了"],
```
### demo/serving/module_serving/sentiment_analysis_senta_lstm/senta_lstm_serving_demo.py (+1, -1)

Same duplicated-用于 fix.

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["我不爱吃甜食", "我喜欢躺在床上看电影"]
     text = {"text": text_list}
     # 指定预测方法为senta_lstm并发送post请求
```
### demo/serving/module_serving/text_censorship_porn_detection_lstm/porn_detection_lstm_serving_demo.py (+1, -1)

Same duplicated-用于 fix (the following comment still says "lac" where it means "porn_detection_lstm", but that line is untouched context here).

```diff
@@ -3,7 +3,7 @@ import requests
 import json

 if __name__ == "__main__":
-    # 指定用于用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
+    # 指定用于预测的文本并生成字典{"text": [text_1, text_2, ... ]}
     text_list = ["黄片下载", "中国黄页"]
     text = {"text": text_list}
     # 指定预测方法为lac并发送post请求
```
### paddlehub/commands/serving.py (+30, -18)

Windows compatibility: gunicorn does not run on Windows, so its import now sits behind a platform check with a stub `StandaloneApplication` for the Windows branch; `number_of_workers()` moves below the class definitions.

```diff
@@ -26,14 +26,22 @@ import paddlehub as hub
 from paddlehub.commands.base_command import BaseCommand, ENTRY
 from paddlehub.serving import app_single as app
 import multiprocessing
-import gunicorn.app.base


-def number_of_workers():
-    return (multiprocessing.cpu_count() * 2) + 1
+if platform.system() == "Windows":

+    class StandaloneApplication(object):
+        def __init__(self):
+            pass

-class StandaloneApplication(gunicorn.app.base.BaseApplication):
-    def __init__(self, app, options=None):
-        self.options = options or {}
-        self.application = app
+        def load_config(self):
+            pass
+
+        def load(self):
+            pass
+else:
+    import gunicorn.app.base
+
+    class StandaloneApplication(gunicorn.app.base.BaseApplication):
+        def __init__(self, app, options=None):
+            self.options = options or {}
+            self.application = app
@@ -52,6 +60,10 @@ class StandaloneApplication(gunicorn.app.base.BaseApplication):
         return self.application


+def number_of_workers():
+    return (multiprocessing.cpu_count() * 2) + 1
+
+
 class ServingCommand(BaseCommand):
     name = "serving"
     module_list = []
```
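For context (not part of the commit), this is the standard gunicorn "custom application" embedding pattern the non-Windows branch relies on. A self-contained sketch; the WSGI app and bind address below are illustrative stand-ins for PaddleHub Serving's Flask app and port:

```python
# Sketch of embedding gunicorn via gunicorn.app.base.BaseApplication.
import multiprocessing

import gunicorn.app.base


def number_of_workers():
    # Common gunicorn heuristic: two workers per CPU core, plus one.
    return (multiprocessing.cpu_count() * 2) + 1


def app(environ, start_response):
    # Minimal WSGI application, used only to make the sketch runnable.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


class StandaloneApplication(gunicorn.app.base.BaseApplication):
    def __init__(self, application, options=None):
        self.options = options or {}
        self.application = application
        super(StandaloneApplication, self).__init__()

    def load_config(self):
        # Forward recognized options into gunicorn's configuration.
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)

    def load(self):
        return self.application


if __name__ == "__main__":
    opts = {"bind": "0.0.0.0:8866", "workers": number_of_workers()}
    StandaloneApplication(app, opts).run()
```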
### paddlehub/common/downloader.py (+1, -1)

Exports the `progress` helper.

```diff
@@ -29,7 +29,7 @@ import tarfile
 from paddlehub.common import utils
 from paddlehub.common.logger import logger

-__all__ = ['Downloader']
+__all__ = ['Downloader', 'progress']
 FLUSH_INTERVAL = 0.1

 lasttime = time.time()
```
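With `progress` exported, other modules can import it as a public name; the format string below mirrors the call sites added to paddlehub/module/module.py later in this commit (a usage sketch, assuming PaddleHub at this commit is installed):

```python
# progress() renders an in-place textual progress bar; end=True finishes
# the line.
from paddlehub.common.downloader import progress

progress("[%-50s] %.2f%%" % ("=" * 25, 50.0))
progress("[%-50s] %.2f%%" % ("=" * 50, 100.0), end=True)
```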
### paddlehub/dataset/base_cv_dataset.py (+3, -3)

Fixes the misspelled class name `BaseCVDatast` to `BaseCVDataset` (definition, `super()` call, and deprecation comment).

```diff
@@ -26,7 +26,7 @@ from paddlehub.common.downloader import default_downloader
 from paddlehub.common.logger import logger


-class BaseCVDatast(BaseDataset):
+class BaseCVDataset(BaseDataset):
     def __init__(self,
                  base_path,
                  train_list_file=None,
@@ -35,7 +35,7 @@ class BaseCVDatast(BaseDataset):
                  predict_list_file=None,
                  label_list_file=None,
                  label_list=None):
-        super(BaseCVDatast, self).__init__(
+        super(BaseCVDataset, self).__init__(
             base_path=base_path,
             train_file=train_list_file,
             dev_file=validate_list_file,
@@ -65,7 +65,7 @@ class BaseCVDatast(BaseDataset):
         return data


-# discarded. please use BaseCVDatast
+# discarded. please use BaseCVDataset
 class ImageClassificationDataset(object):
     def __init__(self):
         logger.warning(
```
### paddlehub/dataset/base_nlp_dataset.py (+42, -24)

Three changes in one: the misspelled `BaseNLPDatast` becomes `BaseNLPDataset`, the `*_file_with_head` parameters become `*_file_with_header`, and `_read_file` is restructured around the phase so that predict files may carry one column (`text_a`) or two columns (`text_a`/`text_b`, with a one-time warning), while labeled files keep the 2- or 3-column requirement.

```diff
@@ -21,9 +21,10 @@ import io
 import csv

 from paddlehub.dataset import InputExample, BaseDataset
+from paddlehub.common.logger import logger


-class BaseNLPDatast(BaseDataset):
+class BaseNLPDataset(BaseDataset):
     def __init__(self,
                  base_path,
                  train_file=None,
@@ -32,11 +33,11 @@ class BaseNLPDatast(BaseDataset):
                  predict_file=None,
                  label_file=None,
                  label_list=None,
-                 train_file_with_head=False,
-                 dev_file_with_head=False,
-                 test_file_with_head=False,
-                 predict_file_with_head=False):
-        super(BaseNLPDatast, self).__init__(
+                 train_file_with_header=False,
+                 dev_file_with_header=False,
+                 test_file_with_header=False,
+                 predict_file_with_header=False):
+        super(BaseNLPDataset, self).__init__(
             base_path=base_path,
             train_file=train_file,
             dev_file=dev_file,
@@ -44,35 +45,52 @@ class BaseNLPDatast(BaseDataset):
             predict_file=predict_file,
             label_file=label_file,
             label_list=label_list,
-            train_file_with_head=train_file_with_head,
-            dev_file_with_head=dev_file_with_head,
-            test_file_with_head=test_file_with_head,
-            predict_file_with_head=predict_file_with_head)
+            train_file_with_header=train_file_with_header,
+            dev_file_with_header=dev_file_with_header,
+            test_file_with_header=test_file_with_header,
+            predict_file_with_header=predict_file_with_header)

     def _read_file(self, input_file, phase=None):
         """Reads a tab separated value file."""
+        has_warned = False
         with io.open(input_file, "r", encoding="UTF-8") as file:
             reader = csv.reader(file, delimiter="\t", quotechar=None)
             examples = []
             for (i, line) in enumerate(reader):
                 if i == 0:
                     ncol = len(line)
-                    if self.if_file_with_head[phase]:
+                    if self.if_file_with_header[phase]:
                         continue
-                if ncol == 1:
-                    if phase != "predict":
+                if phase != "predict":
+                    if ncol == 1:
                         raise Exception(
                             "the %s file: %s only has one column but it is not a predict file"
                             % (phase, input_file))
-                    example = InputExample(guid=i, text_a=line[0])
-                elif ncol == 2:
-                    example = InputExample(
-                        guid=i, text_a=line[0], label=line[1])
-                elif ncol == 3:
-                    example = InputExample(
-                        guid=i, text_a=line[0], text_b=line[1], label=line[2])
-                else:
-                    raise Exception(
-                        "the %s file: %s has too many columns (should <=3)" %
-                        (phase, input_file))
+                    elif ncol == 2:
+                        example = InputExample(
+                            guid=i, text_a=line[0], label=line[1])
+                    elif ncol == 3:
+                        example = InputExample(
+                            guid=i, text_a=line[0], text_b=line[1], label=line[2])
+                    else:
+                        raise Exception(
+                            "the %s file: %s has too many columns (should <=3)"
+                            % (phase, input_file))
+                else:
+                    if ncol == 1:
+                        example = InputExample(guid=i, text_a=line[0])
+                    elif ncol == 2:
+                        if not has_warned:
+                            logger.warning(
+                                "the predict file: %s has 2 columns, as it is a predict file, the second one will be regarded as text_b"
+                                % (input_file))
+                            has_warned = True
+                        example = InputExample(
+                            guid=i, text_a=line[0], text_b=line[1])
+                    else:
+                        raise Exception(
+                            "the predict file: %s has too many columns (should <=2)"
+                            % (input_file))
                 examples.append(example)
         return examples
```
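A standalone sketch of the new predict-file rule, decoupled from PaddleHub (file name, warning text, and dict-based examples are illustrative): one column becomes `text_a`, two columns become `text_a`/`text_b` with a one-time warning, more than two is an error.

```python
# Parse a TSV predict file per the rule introduced above.
import csv
import io
import warnings

with io.open("predict.tsv", "w", encoding="UTF-8") as f:
    f.write(u"今天是个好日子\t天气预报说今天要下雨\n")

examples, has_warned = [], False
with io.open("predict.tsv", "r", encoding="UTF-8") as f:
    for i, line in enumerate(csv.reader(f, delimiter="\t", quotechar=None)):
        ncol = len(line)
        if ncol == 1:
            examples.append({"guid": i, "text_a": line[0]})
        elif ncol == 2:
            if not has_warned:
                warnings.warn("second column is treated as text_b")
                has_warned = True
            examples.append({"guid": i, "text_a": line[0], "text_b": line[1]})
        else:
            raise Exception("predict file has too many columns (should <=2)")
print(examples)
```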
### paddlehub/dataset/bq.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset` (import and base class).

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset


-class BQ(BaseNLPDatast):
+class BQ(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "bq")
         base_path = self._download_dataset(
```
### paddlehub/dataset/chnsenticorp.py (+2, -2)

Same rename.

```diff
@@ -23,10 +23,10 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset


-class ChnSentiCorp(BaseNLPDatast):
+class ChnSentiCorp(BaseNLPDataset):
     """
     ChnSentiCorp (by Tan Songbo at ICT of Chinese Academy of Sciences, and for
     opinion mining)
```
### paddlehub/dataset/cmrc2018.py (+2, -2)

Same rename.

```diff
@@ -20,7 +20,7 @@ import os
 from paddlehub.reader import tokenization
 from paddlehub.common.dir import DATA_HOME
 from paddlehub.common.logger import logger
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/cmrc2018.tar.gz"
 SPIECE_UNDERLINE = '▁'
@@ -62,7 +62,7 @@ class CMRC2018Example(object):
         return s


-class CMRC2018(BaseNLPDatast):
+class CMRC2018(BaseNLPDataset):
     """A single set of features of data."""

     def __init__(self):
```
### paddlehub/dataset/dataset.py (+10, -10)

`*_file_with_head` becomes `*_file_with_header` throughout `BaseDataset`, and a garbled comment is fixed.

```diff
@@ -64,10 +64,10 @@ class BaseDataset(object):
                  predict_file=None,
                  label_file=None,
                  label_list=None,
-                 train_file_with_head=False,
-                 dev_file_with_head=False,
-                 test_file_with_head=False,
-                 predict_file_with_head=False):
+                 train_file_with_header=False,
+                 dev_file_with_header=False,
+                 test_file_with_header=False,
+                 predict_file_with_header=False):
         if not (train_file or dev_file or test_file):
             raise ValueError("At least one file should be assigned")
         self.base_path = base_path
@@ -83,11 +83,11 @@ class BaseDataset(object):
         self.test_examples = []
         self.predict_examples = []

-        self.if_file_with_head = {
-            "train": train_file_with_head,
-            "dev": dev_file_with_head,
-            "test": test_file_with_head,
-            "predict": predict_file_with_head
+        self.if_file_with_header = {
+            "train": train_file_with_header,
+            "dev": dev_file_with_header,
+            "test": test_file_with_header,
+            "predict": predict_file_with_header
         }

         if train_file:
@@ -128,7 +128,7 @@ class BaseDataset(object):
     def num_labels(self):
         return len(self.label_list)

-    # To compatibility with the usage of ImageClassificationDataset
+    # To be compatible with ImageClassificationDataset
     def label_dict(self):
         return {index: key for index, key in enumerate(self.label_list)}
```
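For reference, the `label_dict` helper kept for `ImageClassificationDataset` compatibility simply enumerates the label list; the labels below are illustrative:

```python
# label_dict() maps positional indices to labels.
label_list = ["negative", "positive"]
label_dict = {index: key for index, key in enumerate(label_list)}
assert label_dict == {0: "negative", 1: "positive"}
```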
### paddlehub/dataset/dogcat.py (+2, -2)

`BaseCVDatast` becomes `BaseCVDataset`.

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class DogCatDataset(BaseCVDatast):
+class DogCatDataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "dog-cat")
         base_path = self._download_dataset(
```
### paddlehub/dataset/drcd.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -20,7 +20,7 @@ import os
 from paddlehub.reader import tokenization
 from paddlehub.common.dir import DATA_HOME
 from paddlehub.common.logger import logger
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/drcd.tar.gz"
 SPIECE_UNDERLINE = '▁'
@@ -62,7 +62,7 @@ class DRCDExample(object):
         return s


-class DRCD(BaseNLPDatast):
+class DRCD(BaseNLPDataset):
     """A single set of features of data."""

     def __init__(self):
```
### paddlehub/dataset/flowers.py (+2, -2)

`BaseCVDatast` becomes `BaseCVDataset`.

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class FlowersDataset(BaseCVDatast):
+class FlowersDataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "flower_photos")
         base_path = self._download_dataset(
```
### paddlehub/dataset/food101.py (+2, -2)

`BaseCVDatast` becomes `BaseCVDataset`.

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class Food101Dataset(BaseCVDatast):
+class Food101Dataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101",
                                     "images")
```
### paddlehub/dataset/glue.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -24,12 +24,12 @@ import io
 from paddlehub.dataset import InputExample
 from paddlehub.common.logger import logger
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/glue_data.tar.gz"


-class GLUE(BaseNLPDatast):
+class GLUE(BaseNLPDataset):
     """
     Please refer to
     https://gluebenchmark.com
```
### paddlehub/dataset/iflytek.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -22,12 +22,12 @@ import os
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/iflytek.tar.gz"


-class IFLYTEK(BaseNLPDatast):
+class IFLYTEK(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "iflytek")
         base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
```
### paddlehub/dataset/indoor67.py (+2, -2)

`BaseCVDatast` becomes `BaseCVDataset`.

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class Indoor67Dataset(BaseCVDatast):
+class Indoor67Dataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME, "Indoor67")
         base_path = self._download_dataset(
```
### paddlehub/dataset/inews.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/inews.tar.gz"


-class INews(BaseNLPDatast):
+class INews(BaseNLPDataset):
     """
     INews is a sentiment analysis dataset for Internet News
     """
```
### paddlehub/dataset/lcqmc.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/lcqmc.tar.gz"


-class LCQMC(BaseNLPDatast):
+class LCQMC(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "lcqmc")
         base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
```
### paddlehub/dataset/msra_ner.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/msra_ner.tar.gz"


-class MSRA_NER(BaseNLPDatast):
+class MSRA_NER(BaseNLPDataset):
     """
     A set of manually annotated Chinese word-segmentation data and
     specifications for training and testing a Chinese word-segmentation system
```
### paddlehub/dataset/nlpcc_dbqa.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -23,12 +23,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/nlpcc-dbqa.tar.gz"


-class NLPCC_DBQA(BaseNLPDatast):
+class NLPCC_DBQA(BaseNLPDataset):
     """
     Please refer to
     http://tcci.ccf.org.cn/conference/2017/dldoc/taskgline05.pdf
```
### paddlehub/dataset/squad.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -20,7 +20,7 @@ import os
 from paddlehub.reader import tokenization
 from paddlehub.common.dir import DATA_HOME
 from paddlehub.common.logger import logger
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/squad.tar.gz"
@@ -65,7 +65,7 @@ class SquadExample(object):
         return s


-class SQUAD(BaseNLPDatast):
+class SQUAD(BaseNLPDataset):
     """A single set of features of data."""

     def __init__(self, version_2_with_negative=False):
```
### paddlehub/dataset/stanford_dogs.py (+2, -2)

`BaseCVDatast` becomes `BaseCVDataset`.

```diff
@@ -20,10 +20,10 @@ from __future__ import print_function
 import os

 import paddlehub as hub
-from paddlehub.dataset.base_cv_dataset import BaseCVDatast
+from paddlehub.dataset.base_cv_dataset import BaseCVDataset


-class StanfordDogsDataset(BaseCVDatast):
+class StanfordDogsDataset(BaseCVDataset):
     def __init__(self):
         dataset_path = os.path.join(hub.common.dir.DATA_HOME,
                                     "StanfordDogs-120")
```
### paddlehub/dataset/thucnews.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -22,12 +22,12 @@ import os
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/thucnews.tar.gz"


-class THUCNEWS(BaseNLPDatast):
+class THUCNEWS(BaseNLPDataset):
     def __init__(self):
         dataset_dir = os.path.join(DATA_HOME, "thucnews")
         base_path = self._download_dataset(dataset_dir, url=_DATA_URL)
```
### paddlehub/dataset/toxic.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -22,12 +22,12 @@ import pandas as pd
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/toxic.tar.gz"


-class Toxic(BaseNLPDatast):
+class Toxic(BaseNLPDataset):
     """
     The kaggle Toxic dataset:
     https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
```
### paddlehub/dataset/xnli.py (+2, -2)

`BaseNLPDatast` becomes `BaseNLPDataset`.

```diff
@@ -25,12 +25,12 @@ import csv
 from paddlehub.dataset import InputExample
 from paddlehub.common.dir import DATA_HOME
-from paddlehub.dataset.base_nlp_dataset import BaseNLPDatast
+from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

 _DATA_URL = "https://bj.bcebos.com/paddlehub-dataset/XNLI-lan.tar.gz"


-class XNLI(BaseNLPDatast):
+class XNLI(BaseNLPDataset):
     """
     Please refer to
     https://arxiv.org/pdf/1809.05053.pdf
```
### paddlehub/finetune/task/base_task.py (+44, -34)

Several related changes: Python 2 compatibility for argspec introspection (via `six` and a `getargspec`/`getfullargspec` alias), hook types renamed with an `_event` suffix and stored in `OrderedDict`s, stricter hook-name validation, `info(only_customized=True)` replaced by `info(show_default=False)` with a clearer empty-result message, and `add_hook`/`delete_hook`/`modify_hook` now log success. Note that the call `return self._hooks.info(only_customized)`, visible as unchanged context at the top of the last hunk, is not updated by this commit.

```diff
@@ -24,7 +24,12 @@ import copy
 import logging
 import inspect
 from functools import partial
+from collections import OrderedDict

+import six
+if six.PY2:
+    from inspect import getargspec as get_args
+else:
+    from inspect import getfullargspec as get_args
 import numpy as np
 import paddle.fluid as fluid
 from tb_paddle import SummaryWriter
@@ -84,44 +89,44 @@ class RunEnv(object):
 class TaskHooks():
     def __init__(self):
         self._registered_hooks = {
-            "build_env_start": {},
-            "build_env_end": {},
-            "finetune_start": {},
-            "finetune_end": {},
-            "predict_start": {},
-            "predict_end": {},
-            "eval_start": {},
-            "eval_end": {},
-            "log_interval": {},
-            "save_ckpt_interval": {},
-            "eval_interval": {},
-            "run_step": {},
+            "build_env_start_event": OrderedDict(),
+            "build_env_end_event": OrderedDict(),
+            "finetune_start_event": OrderedDict(),
+            "finetune_end_event": OrderedDict(),
+            "predict_start_event": OrderedDict(),
+            "predict_end_event": OrderedDict(),
+            "eval_start_event": OrderedDict(),
+            "eval_end_event": OrderedDict(),
+            "log_interval_event": OrderedDict(),
+            "save_ckpt_interval_event": OrderedDict(),
+            "eval_interval_event": OrderedDict(),
+            "run_step_event": OrderedDict(),
         }
         self._hook_params_num = {
-            "build_env_start": 1,
-            "build_env_end": 1,
-            "finetune_start": 1,
-            "finetune_end": 2,
-            "predict_start": 1,
-            "predict_end": 2,
-            "eval_start": 1,
-            "eval_end": 2,
-            "log_interval": 2,
-            "save_ckpt_interval": 1,
-            "eval_interval": 1,
-            "run_step": 2,
+            "build_env_start_event": 1,
+            "build_env_end_event": 1,
+            "finetune_start_event": 1,
+            "finetune_end_event": 2,
+            "predict_start_event": 1,
+            "predict_end_event": 2,
+            "eval_start_event": 1,
+            "eval_end_event": 2,
+            "log_interval_event": 2,
+            "save_ckpt_interval_event": 1,
+            "eval_interval_event": 1,
+            "run_step_event": 2,
         }

     def add(self, hook_type, name=None, func=None):
         if not func or not callable(func):
             raise TypeError(
                 "The hook function is empty or it is not a function")
-        if name and not isinstance(name, str):
-            raise TypeError("The hook name must be a string")
-        if not name:
+        if name == None:
             name = "hook_%s" % id(func)
         # check validity
+        if not isinstance(name, str) or name.strip() == "":
+            raise TypeError("The hook name must be a non-empty string")
         if hook_type not in self._registered_hooks:
             raise ValueError("hook_type: %s does not exist" % (hook_type))
         if name in self._registered_hooks[hook_type]:
@@ -129,7 +134,7 @@ class TaskHooks():
                 "name: %s has existed in hook_type:%s, use modify method to modify it"
                 % (name, hook_type))
         else:
-            args_num = len(inspect.getfullargspec(func).args)
+            args_num = len(get_args(func).args)
             if args_num != self._hook_params_num[hook_type]:
                 raise ValueError(
                     "The number of parameters to the hook hook_type:%s should be %i"
@@ -163,13 +168,13 @@ class TaskHooks():
         else:
             return True

-    def info(self, only_customized=True):
+    def info(self, show_default=False):
         # formatted output the source code
         ret = ""
         for hook_type, hooks in self._registered_hooks.items():
             already_print_type = False
             for name, func in hooks.items():
-                if name == "default" and only_customized:
+                if name == "default" and not show_default:
                     continue
                 if not already_print_type:
                     ret += "hook_type: %s{\n" % hook_type
@@ -182,7 +187,7 @@ class TaskHooks():
         if already_print_type:
             ret += "}\n"
         if not ret:
-            ret = "Not any hooks when only_customized=%s" % only_customized
+            ret = "Not any customized hooks have been defined, you can set show_default=True to see the default hooks information"
         return ret

     def __getitem__(self, hook_type):
@@ -259,8 +264,8 @@ class BaseTask(object):
         self._hooks = TaskHooks()
         for hook_type, event_hooks in self._hooks._registered_hooks.items():
             self._hooks.add(hook_type, "default",
-                            eval("self._default_%s_event" % hook_type))
-            setattr(BaseTask, "_%s_event" % hook_type,
+                            eval("self._default_%s" % hook_type))
+            setattr(BaseTask, "_%s" % hook_type,
                     self.create_event_function(hook_type))

         # accelerate predict
@@ -581,13 +586,18 @@ class BaseTask(object):
         return self._hooks.info(only_customized)

     def add_hook(self, hook_type, name=None, func=None):
+        if name == None:
+            name = "hook_%s" % id(func)
         self._hooks.add(hook_type, name=name, func=func)
+        logger.info("Add hook %s:%s successfully" % (hook_type, name))

     def delete_hook(self, hook_type, name):
         self._hooks.delete(hook_type, name)
+        logger.info("Delete hook %s:%s successfully" % (hook_type, name))

     def modify_hook(self, hook_type, name, func):
         self._hooks.modify(hook_type, name, func)
+        logger.info("Modify hook %s:%s successfully" % (hook_type, name))

     def _default_build_env_start_event(self):
         pass
```
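A hypothetical usage sketch of the renamed hook registry (not part of the commit): it assumes PaddleHub at this commit is importable under the path implied by the file, and that a one-argument hook matches the `"finetune_start_event": 1` entry in the `_hook_params_num` table above.

```python
# Register, inspect, and remove a custom hook on the renamed registry.
from paddlehub.finetune.task.base_task import TaskHooks


def announce(task):
    # Single parameter, matching _hook_params_num["finetune_start_event"].
    print("fine-tuning is about to start")


hooks = TaskHooks()
hooks.add("finetune_start_event", name="announce", func=announce)
print(hooks.info())  # customized hooks only; show_default=True adds defaults
hooks.delete("finetune_start_event", "announce")
```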
### paddlehub/finetune/task/classifier_task.py (+1, -1)

Fixes the class-name typo inside an error message.

```diff
@@ -142,7 +142,7 @@ class ClassifierTask(BaseTask):
             }
         except:
             raise Exception(
-                "ImageClassificationDataset does not support postprocessing, please use BaseCVDatast instead"
+                "ImageClassificationDataset does not support postprocessing, please use BaseCVDataset instead"
             )
         results = []
         for batch_state in run_states:
```
### paddlehub/finetune/task/reading_comprehension_task.py (+3, -2)

Switches `open` to `io.open` so the `encoding` keyword also works on Python 2.

```diff
@@ -26,6 +26,7 @@ import json
 from collections import OrderedDict
+import io

 import numpy as np
 import paddle.fluid as fluid
 from .base_task import BaseTask
@@ -517,13 +518,13 @@ class ReadingComprehensionTask(BaseTask):
             null_score_diff_threshold=self.null_score_diff_threshold,
             is_english=self.is_english)
         if self.phase == 'val' or self.phase == 'dev':
-            with open(
+            with io.open(
                     self.data_reader.dataset.dev_path, 'r',
                     encoding="utf8") as dataset_file:
                 dataset_json = json.load(dataset_file)
                 dataset = dataset_json['data']
         elif self.phase == 'test':
-            with open(
+            with io.open(
                     self.data_reader.dataset.test_path, 'r',
                     encoding="utf8") as dataset_file:
                 dataset_json = json.load(dataset_file)
```
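The point of the change: `io.open` accepts `encoding` on both Python 2 and 3, while the Python 2 builtin `open` does not. A minimal self-contained illustration with a throwaway example file:

```python
# io.open gives encoding-aware text IO on both Python major versions.
import io
import json

with io.open("dev.json", "w", encoding="utf8") as f:
    f.write(u'{"data": []}')
with io.open("dev.json", "r", encoding="utf8") as dataset_file:
    dataset = json.load(dataset_file)["data"]
print(dataset)
```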
### paddlehub/module/manager.py (+2, -3)

Simplifies the module-directory computation during extraction and fixes a latent bug: `shutil.move(save_path)` was missing its destination argument, so it could never have worked; the intent (clearing an existing install) is `shutil.rmtree`.

```diff
@@ -168,8 +168,7 @@ class LocalModuleManager(object):
         with tarfile.open(module_package, "r:gz") as tar:
             file_names = tar.getnames()
             size = len(file_names) - 1
-            module_dir = os.path.split(file_names[0])[0]
-            module_dir = os.path.join(_dir, module_dir)
+            module_dir = os.path.join(_dir, file_names[0])
             for index, file_name in enumerate(file_names):
                 tar.extract(file_name, _dir)
@@ -195,7 +194,7 @@ class LocalModuleManager(object):
         save_path = os.path.join(MODULE_HOME, module_name)
         if os.path.exists(save_path):
-            shutil.move(save_path)
+            shutil.rmtree(save_path)
         if from_user_dir:
             shutil.copytree(module_dir, save_path)
         else:
```
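A self-contained sketch of the corrected install flow, with placeholder temp paths; `shutil.move` requires a destination, so the one-argument form raised `TypeError` whenever the path existed:

```python
# Replace an existing module directory before copying a new one in.
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
module_dir = os.path.join(root, "unpacked")
save_path = os.path.join(root, "installed")
os.makedirs(module_dir)
os.makedirs(save_path)  # simulate a previously installed module

if os.path.exists(save_path):
    shutil.rmtree(save_path)  # shutil.move(save_path) would raise TypeError
shutil.copytree(module_dir, save_path)
print(os.listdir(root))
```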
### paddlehub/module/module.py (+17, -4)

`create_module` now collects the file list up front and shows a textual progress bar (via the newly exported `progress` helper) while packaging the module.

```diff
@@ -37,6 +37,7 @@ from paddlehub.common.lock import lock
 from paddlehub.common.logger import logger
 from paddlehub.common.hub_server import CacheUpdater
 from paddlehub.common import tmp_dir
+from paddlehub.common.downloader import progress
 from paddlehub.module import module_desc_pb2
 from paddlehub.module.manager import default_module_manager
 from paddlehub.module.checker import ModuleChecker
@@ -99,10 +100,22 @@ def create_module(directory, name, author, email, module_type, summary,
             _cwd = os.getcwd()
             os.chdir(base_dir)
-            for dirname, _, files in os.walk(module_dir):
-                for file in files:
-                    tar.add(os.path.join(dirname, file).replace(base_dir, "."))
+            module_dir = module_dir.replace(base_dir, ".")
+            tar.add(module_dir, recursive=False)
+            files = []
+            for dirname, _, subfiles in os.walk(module_dir):
+                for file in subfiles:
+                    files.append(os.path.join(dirname, file))
+            total_length = len(files)
+            print("Create Module {}-{}".format(name, version))
+            for index, file in enumerate(files):
+                done = int(float(index) / total_length * 50)
+                progress("[%-50s] %.2f%%" %
+                         ('=' * done, float(index / total_length * 100)))
+                tar.add(file)
+            progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
+            print("Module package saved as {}".format(save_file))
             os.chdir(_cwd)
```
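A self-contained sketch of the same 50-slot bar, with a local stand-in for `paddlehub.common.downloader.progress` and a placeholder file list:

```python
# Emit an in-place textual progress bar while iterating over files.
import sys


def progress(s, end=False):
    # Local stand-in: rewrite the current terminal line.
    sys.stdout.write("\r" + s + ("\n" if end else ""))
    sys.stdout.flush()


files = ["file_%d" % i for i in range(200)]  # placeholder file list
total_length = len(files)
for index, name in enumerate(files):
    done = int(float(index) / total_length * 50)
    progress("[%-50s] %.2f%%" % ("=" * done, float(index) / total_length * 100))
progress("[%-50s] %.2f%%" % ("=" * 50, 100), end=True)
```

One detail worth noting about the committed code: `float(index / total_length * 100)` performs integer division first on Python 2, so the percentage prints as 0 there; the sketch divides a float instead.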
### paddlehub/reader/tokenization.py (+1, -1)

Drops the `encoding` argument from `pickle.load`, which does not exist on Python 2.

```diff
@@ -170,7 +170,7 @@ class WSSPTokenizer(object):
         self.inv_vocab = {v: k for k, v in self.vocab.items()}
         self.ws = ws
         self.lower = lower
-        self.dict = pickle.load(open(word_dict, 'rb'), encoding='utf8')
+        self.dict = pickle.load(open(word_dict, 'rb'))
         self.sp_model = spm.SentencePieceProcessor()
         self.window_size = 5
         self.sp_model.Load(sp_model_dir)
```
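If the pickled dictionary ever needs text decoding on Python 3 while staying loadable on Python 2, a version guard is one option; a sketch (not what this commit does), with a throwaway example file:

```python
# pickle.load() only grew its `encoding` parameter in Python 3.
import pickle
import sys

with open("word_dict.pkl", "wb") as f:  # throwaway example dict
    pickle.dump({"词": 1}, f, protocol=2)

with open("word_dict.pkl", "rb") as f:
    if sys.version_info[0] == 2:
        word_dict = pickle.load(f)
    else:
        word_dict = pickle.load(f, encoding="utf8")
print(word_dict)
```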
### tutorial/bert_service.md (+6, -7)

Heading wording is made parallel ("启动服务端/客户端"), the CUDA row is dropped from the requirements table, the paddle-gpu-serving floor is lowered to 0.8.0, and the demo link is corrected to the actual client file.

````diff
@@ -30,7 +30,7 @@
 使用Bert Service搭建服务主要分为下面三个步骤:

-## Step1:环境准备
+## Step1:准备环境
 ### 环境要求
 下表是使用`Bert Service`的环境要求,带有*号标志项为非必需依赖,可根据实际使用需求选择安装。
@@ -40,8 +40,7 @@
 |PaddleHub|>=1.4.0|无|
 |PaddlePaddle|>=1.6.1|若使用GPU计算,则对应使用PaddlePaddle-gpu版本|
 |GCC|>=4.8|无|
-|CUDA*|>=8|若使用GPU,需使用CUDA8以上版本|
-|paddle-gpu-serving*|>=0.8.2|在`Bert Service`服务端需依赖此包|
+|paddle-gpu-serving*|>=0.8.0|在`Bert Service`服务端需依赖此包|
 |ujson*|>=1.35|在`Bert Service`客户端需依赖此包|

 ### 安装步骤
@@ -84,7 +83,7 @@ $ pip install ujson
 |[bert_chinese_L-12_H-768_A-12](https://paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)|BERT|

-## Step2:服务端(server)
+## Step2:启动服务端(server)
 ### 简介
 server端接收client端发送的数据,执行模型计算过程并将计算结果返回给client端。
@@ -130,7 +129,7 @@ Paddle Inference Server exit successfully!
 ```

-## Step3:客户端(client)
+## Step3:启动客户端(client)
 ### 简介
 client端接收文本数据,并获取server端返回的模型计算的embedding结果。
@@ -197,11 +196,11 @@ input_text = [["西风吹老洞庭波"], ["一夜湘君白发多"], ["醉后不
 ```python
 result = bc.get_result(input_text=input_text)
 ```
-最后即可得到embedding结果(此处只展示部分结果)。
+这样,就得到了embedding结果(此处只展示部分结果)。
 ```python
 [[0.9993321895599361, 0.9994612336158751, 0.9999646544456481, 0.732795298099517, -0.34387934207916204, ...]]
 ```
-客户端代码demo文件见[示例](../paddlehub/serving/bert_serving/bert_service.py)。
+客户端代码demo文件见[示例](../demo/serving/bert_service/bert_service_client.py)。
 运行命令如下:
 ```shell
 $ python bert_service_client.py
````
### tutorial/serving.md (+14, -21)

The tutorial's intro and supported-model list get the same rewording as the demo README (now with a link to the model list), the environment-requirements table is removed, the parameter tables are corrected (the config-file table loses its spurious `--` prefixes and documents the Windows single-process limitation), the demo client URL is aligned with the documented `http://0.0.0.0:8866/predict/...` scheme, and a pointer to the LAC Serving demo is added.

````diff
 # PaddleHub Serving模型一键服务部署
 ## 简介
 ### 为什么使用一键服务部署
-使用PaddleHub能够快速进行迁移学习和模型预测,但开发者常面临将训练好的模型部署上线的需求,无论是对外开放服务端口,还是在局域网中搭建预测服务,都需要PaddleHub具有快速部署模型预测服务的能力。在这个背景下,模型一键服务部署工具——PaddleHub Serving应运而生。开发者通过一句命令快速得到一个预测服务API,而无需关注网络框架选择和实现。
+使用PaddleHub能够快速进行模型预测,但开发者常面临本地预测过程迁移线上的需求。无论是对外开放服务端口,还是在局域网中搭建预测服务,都需要PaddleHub具有快速部署模型预测服务的能力。在这个背景下,模型一键服务部署工具——PaddleHub Serving应运而生。开发者通过一行命令即可快速启动一个模型预测在线服务,而无需关注网络框架选择和实现。
 ### 什么是一键服务部署
 PaddleHub Serving是基于PaddleHub的一键模型服务部署工具,能够通过简单的Hub命令行工具轻松启动一个模型预测在线服务,前端通过Flask和Gunicorn完成网络请求的处理,后端直接调用PaddleHub预测接口,同时支持使用多进程方式利用多核提高并发能力,保证预测服务的性能。
 ### 支持模型
-目前PaddleHub Serving支持PaddleHub所有可直接用于预测的模型进行服务部署,包括`lac`、`senta_bilstm`等nlp类模型,以及`yolov3_coco2017`、`vgg16_imagenet`等cv类模型,未来还将支持开发者使用PaddleHub Fine-tune API得到的模型用于快捷服务部署。
-
-### 所需环境
-下表是使用PaddleHub Serving的环境要求及注意事项。
-
-|项目|建议版本|说明|
-|:-:|:-:|:-:|
-|操作系统|Linux/Darwin/Windows|建议使用Linux或Darwin,对多线程启动方式支持性较好|
-|PaddleHub|>=1.4.0|无|
-|PaddlePaddle|>=1.6.1|若使用GPU计算,则对应使用PaddlePaddle-gpu版本|
+目前PaddleHub Serving支持对PaddleHub所有可直接预测的模型进行服务部署,包括`lac`、`senta_bilstm`等NLP类模型,以及`yolov3_darknet53_coco2017`、`vgg16_imagenet`等CV类模型,更多模型请参见[PaddleHub支持模型列表](https://paddlepaddle.org.cn/hublist)。未来还将支持开发者使用PaddleHub Fine-tune API得到的模型用于快捷服务部署。

 ## 使用
 ### Step1:启动服务端部署
-PaddleHub Serving有两种启动方式,分别是使用命令行命令启动,以及使用配置文件启动。
+PaddleHub Serving有两种启动方式,分别是使用命令行启动,以及使用配置文件启动。

 #### 命令行命令启动
 启动命令
@@ -37,7 +28,7 @@ $ hub serving start --modules [Module1==Version1, Module2==Version2, ...] \
 |--modules/-m|PaddleHub Serving预安装模型,以多个Module==Version键值对的形式列出<br>*`当不指定Version时,默认选择最新版本`*|
 |--port/-p|服务端口,默认为8866|
 |--use_gpu|使用GPU进行预测,必须安装paddlepaddle-gpu|
-|--use_multiprocess|是否启用并发方式,默认为单进程方式|
+|--use_multiprocess|是否启用并发方式,默认为单进程方式,推荐多核CPU机器使用此方式<br>*`Windows操作系统只支持单进程方式`*|

 #### 配置文件启动
 启动命令
@@ -60,8 +51,8 @@ $ hub serving start --config config.json
       "batch_size": "BATCH_SIZE_2"
     }
   ],
-  "use_gpu": false,
   "port": 8866,
+  "use_gpu": false,
   "use_multiprocess": false
 }
 ```
@@ -70,10 +61,10 @@ $ hub serving start --config config.json
 |参数|用途|
 |-|-|
-|--modules_info|PaddleHub Serving预安装模型,以字典列表形式列出,其中:<br>`module`为预测服务使用的模型名<br>`version`为预测模型的版本<br>`batch_size`为预测批次大小|
-|--use_gpu|使用GPU进行预测,必须安装paddlepaddle-gpu|
-|--port/-p|服务端口,默认为8866|
-|--use_multiprocess|是否启用并发方式,默认为单进程方式,推荐多核CPU机器使用此方式|
+|modules_info|PaddleHub Serving预安装模型,以字典列表形式列出,其中:<br>`module`为预测服务使用的模型名<br>`version`为预测模型的版本<br>`batch_size`为预测批次大小|
+|port|服务端口,默认为8866|
+|use_gpu|使用GPU进行预测,必须安装paddlepaddle-gpu|
+|use_multiprocess|是否启用并发方式,默认为单进程方式,推荐多核CPU机器使用此方式<br>*`Windows操作系统只支持单进程方式`*|

 ### Step2:访问服务端
@@ -99,7 +90,7 @@ http://0.0.0.0:8866/predict/<CATEGORY\>/\<MODULE>
 ### Step1:部署lac在线服务
 现在,我们要部署一个lac在线服务,以通过接口获取文本的分词结果。

-首先,根据2.1节所述,启动PaddleHub Serving服务端的两种方式分别为:
+首先,任意选择一种启动方式,两种方式分别为:
 ```shell
 $ hub serving start -m lac
 ```
@@ -148,7 +139,7 @@ if __name__ == "__main__":
     text_list = ["今天是个好日子", "天气预报说今天要下雨"]
     text = {"text": text_list}
     # 指定预测方法为lac并发送post请求
-    url = "http://127.0.0.1:8866/predict/text/lac"
+    url = "http://0.0.0.0:8866/predict/text/lac"
     r = requests.post(url=url, data=text)
     # 打印预测结果
@@ -180,6 +171,8 @@ if __name__ == "__main__":
 }
 ```

+此Demo的具体信息和代码请参见[LAC Serving](../demo/serving/module_serving/lexical_analysis_lac)。另外,下面展示了一些其他的一键服务部署Demo。
+
 ## Demo——其他模型的一键部署服务
 获取其他PaddleHub Serving的一键服务部署场景示例,可参见下列demo
@@ -217,4 +210,4 @@ if __name__ == "__main__":
   该示例展示了利用senta_lstm完成中文文本情感分析服务化部署和在线预测,获取文本的情感分析结果。

 ## Bert Service
-除了预训练模型一键服务部署功能外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](./bert_service.md)。
+除了预训练模型一键服务部署功能之外,PaddleHub Serving还具有`Bert Service`功能,支持ernie_tiny、bert等模型快速部署,对外提供可靠的在线embedding服务,具体信息请参见[Bert Service](./bert_service.md)。
````
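As a closing aside (not part of the commit), the config-file path can be scripted; the sketch below writes a `config.json` matching the corrected parameter table and the key order of the updated example (values are illustrative):

```python
# Generate a serving_config for `hub serving start --config config.json`.
import json

config = {
    "modules_info": [
        {"module": "lac", "version": "1.0.0", "batch_size": 1},
    ],
    "port": 8866,
    "use_gpu": False,
    "use_multiprocess": False,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
# then run: hub serving start --config config.json
```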