Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
PaddlePaddle
PaddleHub
提交
76e77f74
P
PaddleHub
项目概览
PaddlePaddle
/
PaddleHub
大约 1 年 前同步成功
通知
282
Star
12117
Fork
2091
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
200
列表
看板
标记
里程碑
合并请求
4
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
PaddleHub
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
200
Issue
200
列表
看板
标记
里程碑
合并请求
4
合并请求
4
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
未验证
提交
76e77f74
编写于
3月 24, 2020
作者:
S
Steffy-zxf
提交者:
GitHub
3月 24, 2020
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
update docs (#472)
* add finetune to module tutorial; add 1.6 release note
上级
4c58fe5e
变更
9
隐藏空白更改
内联
并排
Showing
9 changed file
with
633 addition
and
26 deletion
+633
-26
README.md
README.md
+9
-2
RELEASE.md
RELEASE.md
+15
-0
demo/text_classification/README.md
demo/text_classification/README.md
+6
-0
demo/text_classification/finetuned_model_to_module/__init__.py
...text_classification/finetuned_model_to_module/__init__.py
+0
-0
demo/text_classification/finetuned_model_to_module/module.py
demo/text_classification/finetuned_model_to_module/module.py
+124
-0
docs/contribution/contri_pretrained_model.md
docs/contribution/contri_pretrained_model.md
+202
-23
docs/imgs/humanseg_test_res.png
docs/imgs/humanseg_test_res.png
+0
-0
docs/tutorial/finetuned_model_to_module.md
docs/tutorial/finetuned_model_to_module.md
+275
-0
docs/tutorial/tutorial_index.rst
docs/tutorial/tutorial_index.rst
+2
-1
未找到文件。
README.md
浏览文件 @
76e77f74
...
@@ -99,8 +99,15 @@ $ wget https://paddlehub.bj.bcebos.com/resources/test_image.jpg
...
@@ -99,8 +99,15 @@ $ wget https://paddlehub.bj.bcebos.com/resources/test_image.jpg
$
hub run ace2p
--input_path
test_image.jpg
$
hub run ace2p
--input_path
test_image.jpg
$
hub run deeplabv3p_xception65_humanseg
--input_path
test_image.jpg
$
hub run deeplabv3p_xception65_humanseg
--input_path
test_image.jpg
```
```
<p
align=
"center"
>
<img src="./docs/imgs/img_seg_result.jpeg" align="middle"
<p
align=
"center"
>
<img
src=
"./docs/imgs/img_seg_result.jpeg"
width=
"35%"
/>
<img
src=
"./docs/imgs/humanseg_test_res.png"
width=
"35%"
/>
</p>
<p
align=
'center'
>
         
ace2p分割结果展示
                
humanseg分割结果展示
   
</p>
</p>
PaddleHub还提供图像分类、语义模型、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型,更多模型介绍,请前往
[
https://www.paddlepaddle.org.cn/hub
](
https://www.paddlepaddle.org.cn/hub
)
查看
PaddleHub还提供图像分类、语义模型、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型,更多模型介绍,请前往
[
https://www.paddlepaddle.org.cn/hub
](
https://www.paddlepaddle.org.cn/hub
)
查看
...
...
RELEASE.md
浏览文件 @
76e77f74
# `v1.6.0`
*
NLP Module全面升级,提升应用性和灵活性
*
lac、senta系列(bow、cnn、bilstm、gru、lstm)、simnet_bow、porn_detection系列(cnn、gru、lstm)升级高性能预测,性能提升高达50%
*
ERNIE、BERT、RoBERTa等Transformer类语义模型新增获取预训练embedding接口get_embedding,方便接入下游任务,提升应用性
*
新增RoBERTa通过模型结构压缩得到的3层Transformer模型
[
rbt3
](
https://www.paddlepaddle.org.cn/hubdetail?name=rbt3&en_category=SemanticModel
)
、
[
rbtl3
](
https://www.paddlepaddle.org.cn/hubdetail?name=rbtl3&en_category=SemanticModel
)
*
Task predict接口增加高性能预测模式accelerate_mode,性能提升高达90%
*
PaddleHub Module创建流程开放,支持Fine-tune模型转化,全面提升应用性和灵活性
*
[
预训练模型转化为PaddleHub Module教程
](
./docs/contribution/contri_pretrained_model.md
)
*
[
Fine-tune模型转化为PaddleHub Module教程
](
./docs/tutorial/finetuned_model_to_module.md
)
*
[
PaddleHub Serving
](
/docs/tutorial/serving.md
)
优化启动方式,支持更加灵活的参数配置
# `v1.5.4`
# `v1.5.4`
*
修复Fine-tune中断,checkpoint文件恢复训练失败的问题
*
修复Fine-tune中断,checkpoint文件恢复训练失败的问题
...
...
demo/text_classification/README.md
浏览文件 @
76e77f74
...
@@ -218,3 +218,9 @@ python predict.py --checkpoint_dir $CKPT_DIR --max_seq_len 128
...
@@ -218,3 +218,9 @@ python predict.py --checkpoint_dir $CKPT_DIR --max_seq_len 128
## 超参优化AutoDL Finetuner
## 超参优化AutoDL Finetuner
PaddleHub还提供了超参优化(Hyperparameter Tuning)功能, 自动搜索最优模型超参得到更好的模型效果。详细信息参见
[
AutoDL Finetuner超参优化功能教程
](
../../docs/tutorial/autofinetune.md
)
。
PaddleHub还提供了超参优化(Hyperparameter Tuning)功能, 自动搜索最优模型超参得到更好的模型效果。详细信息参见
[
AutoDL Finetuner超参优化功能教程
](
../../docs/tutorial/autofinetune.md
)
。
## Fine-tune之后保存的模型转化为PaddleHub Module
代码详见
[
finetuned_model_to_module
](
./finetuned_model_to_module
)
文件夹下
Fine-tune之后保存的模型转化为PaddleHub Module
[
教程
](
../../docs/tutorial/finetuned_model_to_module.md
)
demo/text_classification/finetuned_model_to_module/__init__.py
0 → 100644
浏览文件 @
76e77f74
demo/text_classification/finetuned_model_to_module/module.py
0 → 100644
浏览文件 @
76e77f74
# -*- coding:utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Finetuning on classification task """
from
__future__
import
absolute_import
from
__future__
import
division
from
__future__
import
print_function
import
os
import
numpy
as
np
from
paddlehub.common.logger
import
logger
from
paddlehub.module.module
import
moduleinfo
,
serving
import
paddlehub
as
hub
@
moduleinfo
(
name
=
"ernie_tiny_finetuned"
,
version
=
"1.0.0"
,
summary
=
"ERNIE tiny which was fine-tuned on the chnsenticorp dataset."
,
author
=
"anonymous"
,
author_email
=
""
,
type
=
"nlp/semantic_model"
)
class
ERNIETinyFinetuned
(
hub
.
Module
):
def
_initialize
(
self
,
ckpt_dir
=
"ckpt_chnsenticorp"
,
num_class
=
2
,
max_seq_len
=
128
,
use_gpu
=
False
,
batch_size
=
1
):
self
.
ckpt_dir
=
os
.
path
.
join
(
self
.
directory
,
ckpt_dir
)
self
.
num_class
=
num_class
self
.
MAX_SEQ_LEN
=
max_seq_len
# Load Paddlehub ERNIE Tiny pretrained model
self
.
module
=
hub
.
Module
(
name
=
"ernie_tiny"
)
inputs
,
outputs
,
program
=
self
.
module
.
context
(
trainable
=
True
,
max_seq_len
=
max_seq_len
)
self
.
vocab_path
=
self
.
module
.
get_vocab_path
()
# Download dataset and use accuracy as metrics
# Choose dataset: GLUE/XNLI/ChinesesGLUE/NLPCC-DBQA/LCQMC
# metric should be acc, f1 or matthews
metrics_choices
=
[
"acc"
]
# For ernie_tiny, it use sub-word to tokenize chinese sentence
# If not ernie tiny, sp_model_path and word_dict_path should be set None
reader
=
hub
.
reader
.
ClassifyReader
(
vocab_path
=
self
.
module
.
get_vocab_path
(),
max_seq_len
=
max_seq_len
,
sp_model_path
=
self
.
module
.
get_spm_path
(),
word_dict_path
=
self
.
module
.
get_word_dict_path
())
# Construct transfer learning network
# Use "pooled_output" for classification tasks on an entire sentence.
# Use "sequence_output" for token-level output.
pooled_output
=
outputs
[
"pooled_output"
]
# Setup feed list for data feeder
# Must feed all the tensor of module need
feed_list
=
[
inputs
[
"input_ids"
].
name
,
inputs
[
"position_ids"
].
name
,
inputs
[
"segment_ids"
].
name
,
inputs
[
"input_mask"
].
name
,
]
# Setup runing config for PaddleHub Finetune API
config
=
hub
.
RunConfig
(
use_data_parallel
=
False
,
use_cuda
=
use_gpu
,
batch_size
=
batch_size
,
checkpoint_dir
=
self
.
ckpt_dir
,
strategy
=
hub
.
AdamWeightDecayStrategy
())
# Define a classfication finetune task by PaddleHub's API
self
.
cls_task
=
hub
.
TextClassifierTask
(
data_reader
=
reader
,
feature
=
pooled_output
,
feed_list
=
feed_list
,
num_classes
=
self
.
num_class
,
config
=
config
,
metrics_choices
=
metrics_choices
)
def
predict
(
self
,
data
,
return_result
=
False
,
accelerate_mode
=
True
):
"""
Get prediction results
"""
run_states
=
self
.
cls_task
.
predict
(
data
=
data
,
return_result
=
return_result
,
accelerate_mode
=
accelerate_mode
)
return
run_states
if
__name__
==
"__main__"
:
ernie_tiny
=
ERNIETinyFinetuned
(
ckpt_dir
=
"../ckpt_chnsenticorp"
,
num_class
=
2
)
# Data to be prdicted
data
=
[[
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"
],
[
"交通方便;环境很好;服务态度很好 房间较小"
],
[
"19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"
]]
index
=
0
run_states
=
ernie_tiny
.
predict
(
data
=
data
)
results
=
[
run_state
.
run_results
for
run_state
in
run_states
]
for
batch_result
in
results
:
# get predict index
batch_result
=
np
.
argmax
(
batch_result
,
axis
=
2
)[
0
]
for
result
in
batch_result
:
print
(
"%s
\t
predict=%s"
%
(
data
[
index
][
0
],
result
))
index
+=
1
docs/contribution/contri_pretrained_model.md
浏览文件 @
76e77f74
#
贡献预训练模型
#
如何编写一个PaddleHub Module
我们非常欢迎开发者贡献预训练模型到PaddleHub中,如果你想要贡献预训练模型,请提供以下资源:
## 模型基本信息
## 模型
我们准备编写一个PaddleHub Module,Module的基本信息如下:
```
yaml
name
:
senta_test
version
:
1.0.0
summary
:
This is a PaddleHub Module. Just for test.
author
:
anonymous
author_email
:
type
:
nlp/sentiment_analysis
```
请提供相应的网络结构和参数文件,除了PaddlePaddle的模型外,我们也支持将其他主流框架的模型转换到PaddleHub中,包括:
**本示例代码可以参考[senta_module_sample](../../demo/senta_module_sample/)**
*
tensorflow
*
pytorch
*
mxnet
*
caffe
*
onnx
您可以直接使用
[
**x2paddle**
](
https://github.com/PaddlePaddle/X2Paddle
)
进行转换,也可以将相应模型提供给我们,由我们进行转换
Module存在一个接口sentiment_classify,用于接收传入文本,并给出文本的情感倾向(正面/负面),支持python接口调用和命令行调用。
```
python
import
paddlehub
as
hub
## 相关代码
senta_test
=
hub
.
Module
(
name
=
"senta_test"
)
senta_test
.
sentiment_classify
(
texts
=
[
"这部电影太差劲了"
])
```
```
cmd
hub run senta_test --input_text 这部电影太差劲了
```
*
支持预测的模型,请提供相应的预测脚本以及测试样例
<br/>
*
支持finetune的模型,请提供相应的finetune demo
##
相应的介绍资料
##
策略
|资料|是否必选|
为了示例代码简单起见,我们使用一个非常简单的情感判断策略,当输入文本中带有词表中指定单词时,则判断文本倾向为负向,否则为正向
<br/>
## Module创建
### step 1. 创建必要的目录与文件
创建一个senta_test的目录,并在senta_test目录下分别创建__init__.py、module.py、processor.py、vocab.list,其中
|文件名|用途|
|-|-|
|-|-|
|模型结构|√|
|
\_\_
init
\_\_
.py|空文件|
|预训练的数据集|√|
|module.py|主模块,提供Module的实现代码|
|模型介绍文案|√|
|processor.py|辅助模块,提供词表加载的方法|
|源代码链接||
|vocab.list|存放词表|
|模型结构图||
|第三方库依赖||
```
cmd
➜ tree senta_test
senta_test/
├── vocab.list
├── __init__.py
├── module.py
└── processor.py
```
### step 2. 实现辅助模块processor
在processor.py中实现一个load_vocab接口用于读取词表
```
python
def
load_vocab
(
vocab_path
):
with
open
(
vocab_path
)
as
file
:
return
file
.
read
().
split
()
```
### step 3. 编写Module处理代码
module.py文件为Module的入口代码所在,我们需要在其中实现预测逻辑。
#### step 3_1. 引入必要的头文件
```
python
import
argparse
import
os
import
paddlehub
as
hub
from
paddlehub.module.module
import
runnable
,
moduleinfo
from
senta_test.processor
import
load_vocab
```
**NOTE:**
当引用Module中模块时,需要输入全路径,如senta_test.processor
#### step 3_2. 定义SentaTest类
module.py中需要有一个继承了hub.Module的类存在,该类负责实现预测逻辑,并使用moduleinfo填写基本信息。当使用hub.Module(name="senta_test")加载Module时,PaddleHub会自动创建SentaTest的对象并返回。
```
python
@
moduleinfo
(
name
=
"senta_test"
,
version
=
"1.0.0"
,
summary
=
"This is a PaddleHub Module. Just for test."
,
author
=
"anonymous"
,
author_email
=
""
,
type
=
"nlp/sentiment_analysis"
,
)
class
SentaTest
(
hub
.
Module
):
...
```
#### step 3_3. 执行必要的初始化
```
python
def
_initialize
(
self
):
# add arg parser
self
.
parser
=
argparse
.
ArgumentParser
(
description
=
"Run the senta_test module."
,
prog
=
'hub run senta_test'
,
usage
=
'%(prog)s'
,
add_help
=
True
)
self
.
parser
.
add_argument
(
'--input_text'
,
type
=
str
,
default
=
None
,
help
=
"text to predict"
)
# load word dict
vocab_path
=
os
.
path
.
join
(
self
.
directory
,
"vocab.list"
)
self
.
vocab
=
load_vocab
(
vocab_path
)
```
`注意`
:执行类的初始化不能使用默认的__init__接口,而是应该重载实现_initialize接口。对象默认内置了directory属性,可以直接获取到Module所在路径
#### step 3_4. 完善预测逻辑
```
python
def
sentiment_classify
(
self
,
texts
):
results
=
[]
for
text
in
texts
:
sentiment
=
"positive"
for
word
in
self
.
vocab
:
if
word
in
text
:
sentiment
=
"negative"
break
results
.
append
({
"text"
:
text
,
"sentiment"
:
sentiment
})
return
results
```
#### step 3_5. 支持命令行调用
如果希望Module可以支持命令行调用,则需要提供一个经过runnable修饰的接口,接口负责解析传入数据并进行预测,将结果返回。
如果不需要提供命令行预测功能,则可以不实现该接口,PaddleHub在用命令行执行时,会自动发现该Module不支持命令行方式,并给出提示。
```
python
@
runnable
def
run_cmd
(
self
,
argvs
):
args
=
self
.
parser
.
parse_args
(
argvs
)
texts
=
[
args
.
input_text
]
return
self
.
sentiment_classify
(
texts
)
```
#### step 3_6. 支持serving调用
如果希望Module可以支持PaddleHub Serving部署预测服务,则需要提供一个经过serving修饰的接口,接口负责解析传入数据并进行预测,将结果返回。
如果不需要提供PaddleHub Serving部署预测服务,则可以不需要加上serving修饰。
```
python
@
serving
def
sentiment_classify
(
self
,
texts
):
results
=
[]
for
text
in
texts
:
sentiment
=
"positive"
for
word
in
self
.
vocab
:
if
word
in
text
:
sentiment
=
"negative"
break
results
.
append
({
"text"
:
text
,
"sentiment"
:
sentiment
})
return
results
```
### 完整代码
*
[
module.py
](
./senta_test/module.py
)
*
[
processor.py
](
./senta_test/module.py
)
<br/>
## 测试步骤
完成Module编写后,我们可以通过以下方式测试该Module
### 调用方法1
将Module安装到本机中,再通过Hub.Module(name=...)加载
```
shell
hub
install
senta_test
```
```
python
import
paddlehub
as
hub
senta_test
=
hub
.
Module
(
name
=
"senta_test"
)
senta_test
.
sentiment_classify
(
texts
=
[
"这部电影太差劲了"
])
```
### 调用方法2
直接通过Hub.Module(directory=...)加载
```
python
import
paddlehub
as
hub
senta_test
=
hub
.
Module
(
directory
=
"senta_test/"
)
senta_test
.
sentiment_classify
(
texts
=
[
"这部电影太差劲了"
])
```
### 调用方法3
将senta_test作为路径加到环境变量中,直接加载SentaTest对象
```
shell
export
PYTHONPATH
=
senta_test:
$PYTHONPATH
```
```
python
from
senta_test.module
import
SentaTest
SentaTest
.
sentiment_classify
(
texts
=
[
"这部电影太差劲了"
])
```
**NOTE:**
### 调用方法4
将Module安装到本机中,再通过hub run运行
*
为了保证使用体验,请确保模型在python 2.7/3.x下均可正常运行
```
shell
hub
install
senta_test
hub run senta_test
--input_text
"这部电影太差劲了"
```
docs/imgs/humanseg_test_res.png
0 → 100644
浏览文件 @
76e77f74
35.0 KB
docs/tutorial/finetuned_model_to_module.md
0 → 100644
浏览文件 @
76e77f74
# Fine-tune保存的模型如何转化为一个PaddleHub Module
## 模型基本信息
本示例以模型ERNIE Tiny在数据集ChnSentiCorp上完成情感分类Fine-tune任务后保存的模型转化为一个PaddleHub Module,Module的基本信息如下:
```
yaml
name
:
ernie_tiny_finetuned
version
:
1.0.0
summary
:
ERNIE tiny which was fine-tuned on the chnsenticorp dataset.
author
:
anonymous
author_email
:
type
:
nlp/semantic_model
```
**本示例代码可以参考[finetuned_model_to_module](../../demo/text_classification/finetuned_model_to_module/)**
Module存在一个接口predict,用于接收带预测,并给出文本的情感倾向(正面/负面),支持python接口调用和命令行调用。
```
python
import
paddlehub
as
hub
ernie_tiny_finetuned
=
hub
.
Module
(
name
=
"ernie_tiny_finetuned"
)
ernie_tiny_finetuned
.
predcit
(
data
=
[[
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"
],
[
"交通方便;环境很好;服务态度很好 房间较小"
],
[
"19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"
]])
```
## Module创建
### step 1. 创建必要的目录与文件
创建一个finetuned_model_to_module的目录,并在finetuned_model_to_module目录下分别创建__init__.py、module.py,其中
|文件名|用途|
|-|-|
|
\_\_
init
\_\_
.py|空文件|
|module.py|主模块,提供Module的实现代码|
|ckpt文件|利用PaddleHub Fine-tune得到的ckpt文件夹,其中必须包含best_model文件|
```
cmd
➜ tree finetuned_model_to_module
finetuned_model_to_module/
├── __init__.py
├── ckpt_chnsenticorp
│ ├── ***
│ ├── best_model
│ │ ├── ***
└── module.py
```
### step 2. 编写Module处理代码
module.py文件为Module的入口代码所在,我们需要在其中实现预测逻辑。
#### step 2_1. 引入必要的头文件
```
python
import
os
import
numpy
as
np
from
paddlehub.common.logger
import
logger
from
paddlehub.module.module
import
moduleinfo
,
serving
import
paddlehub
as
hub
```
#### step 2_2. 定义ERNIE_Tiny_Finetuned类
module.py中需要有一个继承了hub.Module的类存在,该类负责实现预测逻辑,并使用moduleinfo填写基本信息。当使用hub.Module(name="ernie_tiny_finetuned")加载Module时,PaddleHub会自动创建ERNIE_Tiny_Finetuned的对象并返回。
```
python
@
moduleinfo
(
name
=
"ernie_tiny_finetuned"
,
version
=
"1.0.0"
,
summary
=
"ERNIE tiny which was fine-tuned on the chnsenticorp dataset."
,
author
=
"anonymous"
,
author_email
=
""
,
type
=
"nlp/semantic_model"
)
class
ERNIETinyFinetuned
(
hub
.
Module
):
...
```
#### step 2_3. 执行必要的初始化
```
python
def
_initialize
(
self
,
ckpt_dir
=
"ckpt_chnsenticorp"
,
num_class
=
2
,
max_seq_len
=
128
,
use_gpu
=
False
,
batch_size
=
1
):
self
.
ckpt_dir
=
os
.
path
.
join
(
self
.
directory
,
ckpt_dir
)
self
.
num_class
=
num_class
self
.
MAX_SEQ_LEN
=
max_seq_len
self
.
params_path
=
os
.
path
.
join
(
self
.
ckpt_dir
,
'best_model'
)
if
not
os
.
path
.
exists
(
self
.
params_path
):
logger
.
error
(
"%s doesn't contain the best_model file which saves the best parameters as fietuning."
)
exit
()
# Load Paddlehub ERNIE Tiny pretrained model
self
.
module
=
hub
.
Module
(
name
=
"ernie_tiny"
)
inputs
,
outputs
,
program
=
self
.
module
.
context
(
trainable
=
True
,
max_seq_len
=
max_seq_len
)
self
.
vocab_path
=
self
.
module
.
get_vocab_path
()
# Download dataset and use accuracy as metrics
# Choose dataset: GLUE/XNLI/ChinesesGLUE/NLPCC-DBQA/LCQMC
# metric should be acc, f1 or matthews
metrics_choices
=
[
"acc"
]
# For ernie_tiny, it use sub-word to tokenize chinese sentence
# If not ernie tiny, sp_model_path and word_dict_path should be set None
reader
=
hub
.
reader
.
ClassifyReader
(
vocab_path
=
self
.
module
.
get_vocab_path
(),
max_seq_len
=
max_seq_len
,
sp_model_path
=
self
.
module
.
get_spm_path
(),
word_dict_path
=
self
.
module
.
get_word_dict_path
())
# Construct transfer learning network
# Use "pooled_output" for classification tasks on an entire sentence.
# Use "sequence_output" for token-level output.
pooled_output
=
outputs
[
"pooled_output"
]
# Setup feed list for data feeder
# Must feed all the tensor of module need
feed_list
=
[
inputs
[
"input_ids"
].
name
,
inputs
[
"position_ids"
].
name
,
inputs
[
"segment_ids"
].
name
,
inputs
[
"input_mask"
].
name
,
]
# Setup runing config for PaddleHub Finetune API
config
=
hub
.
RunConfig
(
use_data_parallel
=
False
,
use_cuda
=
use_gpu
,
batch_size
=
batch_size
,
checkpoint_dir
=
self
.
ckpt_dir
,
strategy
=
hub
.
AdamWeightDecayStrategy
())
# Define a classfication finetune task by PaddleHub's API
self
.
cls_task
=
hub
.
TextClassifierTask
(
data_reader
=
reader
,
feature
=
pooled_output
,
feed_list
=
feed_list
,
num_classes
=
self
.
num_class
,
config
=
config
,
metrics_choices
=
metrics_choices
)
```
初始化过程即为Fine-tune时创建Task的过程。
**NOTE:**
执行类的初始化不能使用默认的__init__接口,而是应该重载实现_initialize接口。对象默认内置了directory属性,可以直接获取到Module所在路径
#### step 3_4. 完善预测逻辑
```
python
def
predict
(
self
,
data
,
return_result
=
False
,
accelerate_mode
=
True
):
"""
Get prediction results
"""
run_states
=
self
.
cls_task
.
predict
(
data
=
data
,
return_result
=
return_result
,
accelerate_mode
=
accelerate_mode
)
return
run_states
```
#### step 3_5. 支持serving调用
如果希望Module可以支持PaddleHub Serving部署预测服务,则需要将预测接口predcit加上serving修饰(
`@serving`
),接口负责解析传入数据并进行预测,将结果返回。
如果不需要提供PaddleHub Serving部署预测服务,则可以不需要加上serving修饰。
```
python
@
serving
def
predict
(
self
,
data
,
return_result
=
False
,
accelerate_mode
=
True
):
"""
Get prediction results
"""
run_states
=
self
.
cls_task
.
predict
(
data
=
data
,
return_result
=
return_result
,
accelerate_mode
=
accelerate_mode
)
return
run_states
```
### 完整代码
*
[
module.py
](
../../demo/text_classification/finetuned_model_to_module/module.py
)
*
[
__init__.py
](
../../demo/text_classification/finetuned_model_to_module/__init__.py
)
**NOTE:**
`__init__.py`
是空文件
## 测试步骤
完成Module编写后,我们可以通过以下方式测试该Module
### 调用方法1
将Module安装到本机中,再通过Hub.Module(name=...)加载
```
shell
hub
install
finetuned_model_to_module
```
安装成功会显示
**Successfully installed ernie_tiny_finetuned**
```
python
import
paddlehub
as
hub
import
numpy
as
np
ernie_tiny
=
hub
.
Module
(
name
=
"ernie_tiny_finetuned"
)
# Data to be prdicted
data
=
[[
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"
],
[
"交通方便;环境很好;服务态度很好 房间较小"
],
[
"19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"
]]
index
=
0
run_states
=
ernie_tiny
.
predict
(
data
=
data
)
results
=
[
run_state
.
run_results
for
run_state
in
run_states
]
for
batch_result
in
results
:
# get predict index
batch_result
=
np
.
argmax
(
batch_result
,
axis
=
2
)[
0
]
for
result
in
batch_result
:
print
(
"%s
\t
predict=%s"
%
(
data
[
index
][
0
],
result
))
index
+=
1
```
### 调用方法2
直接通过Hub.Module(directory=...)加载
```
python
import
paddlehub
as
hub
import
numpy
as
np
ernie_tiny_finetuned
=
hub
.
Module
(
directory
=
"finetuned_model_to_module/"
)
# Data to be prdicted
data
=
[[
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"
],
[
"交通方便;环境很好;服务态度很好 房间较小"
],
[
"19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"
]]
index
=
0
run_states
=
ernie_tiny
.
predict
(
data
=
data
)
results
=
[
run_state
.
run_results
for
run_state
in
run_states
]
for
batch_result
in
results
:
# get predict index
batch_result
=
np
.
argmax
(
batch_result
,
axis
=
2
)[
0
]
for
result
in
batch_result
:
print
(
"%s
\t
predict=%s"
%
(
data
[
index
][
0
],
result
))
index
+=
1
```
### 调用方法3
将finetuned_model_to_module作为路径加到环境变量中,直接加载ERNIETinyFinetuned对象
```
shell
export
PYTHONPATH
=
finetuned_model_to_module:
$PYTHONPATH
```
```
python
from
finetuned_model_to_module.module
import
ERNIETinyFinetuned
import
numpy
as
np
# Data to be prdicted
data
=
[[
"这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"
],
[
"交通方便;环境很好;服务态度很好 房间较小"
],
[
"19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"
]]
run_states
=
ERNIETinyFinetuned
.
predict
(
data
=
data
)
index
=
0
results
=
[
run_state
.
run_results
for
run_state
in
run_states
]
for
batch_result
in
results
:
# get predict index
batch_result
=
np
.
argmax
(
batch_result
,
axis
=
2
)[
0
]
for
result
in
batch_result
:
print
(
"%s
\t
predict=%s"
%
(
data
[
index
][
0
],
result
))
index
+=
1
```
docs/tutorial/tutorial_index.rst
浏览文件 @
76e77f74
...
@@ -11,10 +11,11 @@
...
@@ -11,10 +11,11 @@
命令行工具<cmdintro>
命令行工具<cmdintro>
自定义数据<how_to_load_data>
自定义数据<how_to_load_data>
Fine-tune模型转化为PaddleHub Module<finetuned_model_to_module.md>
自定义任务<how_to_define_task>
自定义任务<how_to_define_task>
服务化部署<serving>
服务化部署<serving>
文本Embedding服务<bert_service>
文本Embedding服务<bert_service>
语义相似度计算<sentence_sim>
语义相似度计算<sentence_sim>
ULMFit优化策略<strategy_exp>
ULMFit优化策略<strategy_exp>
超参优化<autofinetune>
超参优化<autofinetune>
Hook机制<hook>
Hook机制<hook>
\ No newline at end of file
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录