PaddlePaddle / PaddleHub — Commit 22c4494f

Authored on May 11, 2020 by Steffy-zxf

    Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleHub into add-preset-net

Parents: b3b8cb0f, b7e8230f

Showing 23 changed files with 826 additions and 111 deletions (+826 −111)
Changed files:

README.md (+18 −16)
demo/image_classification/img_classifier_dygraph.py (+89 −0)
demo/sequence_labeling/sequence_label_dygraph.py (+107 −0)
demo/text_classification/finetuned_model_to_module/module.py (+12 −10)
demo/text_classification/text_classifier_dygraph.py (+98 −0)
docs/pretrained_models.md (+178 −0)
docs/reference/config.md (+4 −4)
docs/reference/task/base_task.md (+0 −9)
docs/tutorial/define_task_example.md (+84 −0)
docs/tutorial/finetuned_model_to_module.md (+63 −30)
docs/tutorial/how_to_load_data.md (+3 −0)
hub_module/scripts/configs/faster_rcnn_resnet50_fpn_venus.yml (+5 −5)
paddlehub/__init__.py (+1 −0)
paddlehub/common/downloader.py (+74 −17)
paddlehub/common/hub_server.py (+2 −1)
paddlehub/common/logger.py (+2 −1)
paddlehub/common/paddle_helper.py (+4 −2)
paddlehub/dataset/food101.py (+3 −3)
paddlehub/module/manager.py (+9 −5)
paddlehub/module/module.py (+7 −2)
paddlehub/module/nlp_module.py (+61 −4)
paddlehub/reader/cv_reader.py (+1 −1)
paddlehub/version.py (+1 −1)
README.md (+18 −16)

````diff
@@ -8,18 +8,18 @@
 PaddleHub is the pre-trained model application tool of the PaddlePaddle ecosystem: developers can conveniently combine high-quality pre-trained models with the Fine-tune API to complete the whole workflow from model transfer to deployment. The pre-trained models provided by PaddleHub cover mainstream models for image classification, object detection, lexical analysis, semantic models, sentiment analysis, video classification, image generation, image segmentation, text moderation, keypoint detection, and more. See the official site for details: https://www.paddlepaddle.org.cn/hub
 Centered on pre-trained model application, PaddleHub has the following features:
 * **[模型即软件](#模型即软件)**: models are invoked through the Python API or the command line, so PaddlePaddle's signature pre-trained models can be tried out or integrated quickly.
-* **[易用的迁移学习](#迁移学习)**: the Fine-tune API ships with multiple optimization strategies, so fine-tuning a pre-trained model takes only a small amount of code.
+* **[易用的迁移学习](#易用的迁移学习)**: the Fine-tune API ships with multiple optimization strategies, so fine-tuning a pre-trained model takes only a small amount of code.
-* **[一键模型转服务](#服务化部署paddlehub-serving)**: a single command sets up your own deep-learning-model API service and completes deployment.
+* **[一键模型转服务](#一键模型转服务)**: a single command sets up your own deep-learning-model API service and completes deployment.
-* **[自动超参优化](#超参优化autodl-finetuner)**: the built-in AutoDL Finetuner launches automated hyperparameter search with one command.
+* **[自动超参优化](#自动超参优化)**: the built-in AutoDL Finetuner launches automated hyperparameter search with one command.
 <p align="center">
@@ -66,7 +66,7 @@
 After installing PaddleHub, run the [hub run](./docs/tutorial/cmdintro.md) command to try code-free, one-line prediction:
-* Use the [object detection](http://www.paddlepaddle.org.cn/hub?filter=category&value=ObjectDetection) model pyramidbox_lite_mobile_mask to detect face masks in an image
+* Use the [object detection](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=ObjectDetection) model pyramidbox_lite_mobile_mask to detect face masks in an image
 ```shell
 $ wget https://paddlehub.bj.bcebos.com/resources/test_mask_detection.jpg
 $ hub run pyramidbox_lite_mobile_mask --input_path test_mask_detection.jpg
@@ -75,19 +75,22 @@ $ hub run pyramidbox_lite_mobile_mask --input_path test_mask_detection.jpg
 <img src="./docs/imgs/test_mask_detection_result.jpg" align="middle"
 </p>
-* Use the [lexical analysis](http://www.paddlepaddle.org.cn/hub?filter=category&value=LexicalAnalysis) model LAC for word segmentation
+* Use the [lexical analysis](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=LexicalAnalysis) model LAC for word segmentation
 ```shell
-$ hub run lac --input_text "今天是个好日子"
+$ hub run lac --input_text "现在,慕尼黑再保险公司不仅是此类行动的倡议者,更是将其大量气候数据整合进保险产品中,并与公众共享大量天气信息,参与到新能源领域的保障中。"
-[{'word': ['今天', '是', '个', '好日子'], 'tag': ['TIME', 'v', 'q', 'n']}]
+[{'word': ['现在', ',', '慕尼黑再保险公司', '不仅', '是', '此类', '行动', '的', '倡议者', ',', '更是', '将', '其', '大量', '气候', '数据', '整合', '进', '保险', '产品', '中', ',', '并', '与', '公众', '共享', '大量', '天气', '信息', ',', '参与', '到', '新能源', '领域', '的', '保障', '中', '。'], 'tag': ['TIME', 'w', 'ORG', 'c', 'v', 'r', 'n', 'u', 'n', 'w', 'd', 'p', 'r', 'a', 'n', 'n', 'v', 'v', 'n', 'n', 'f', 'w', 'c', 'p', 'n', 'v', 'a', 'n', 'n', 'w', 'v', 'v', 'n', 'n', 'u', 'vn', 'f', 'w']}]
 ```
-* Use the [sentiment analysis](http://www.paddlepaddle.org.cn/hub?filter=category&value=SentimentAnalysis) model Senta to predict sentence sentiment
+* Use the [sentiment analysis](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=SentimentAnalysis) model Senta to predict sentence sentiment
 ```shell
 $ hub run senta_bilstm --input_text "今天天气真好"
 {'text': '今天天气真好', 'sentiment_label': 1, 'sentiment_key': 'positive', 'positive_probs': 0.9798, 'negative_probs': 0.0202}]
 ```
-* Use the [object detection](http://www.paddlepaddle.org.cn/hub?filter=category&value=ObjectDetection) model Ultra-Light-Fast-Generic-Face-Detector-1MB to detect faces in an image
+* Use the [object detection](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=ObjectDetection) model Ultra-Light-Fast-Generic-Face-Detector-1MB to detect faces in an image
 ```shell
 $ wget https://paddlehub.bj.bcebos.com/resources/test_image.jpg
 $ hub run ultra_light_fast_generic_face_detector_1mb_640 --input_path test_image.jpg
@@ -110,11 +113,11 @@ $ hub run deeplabv3p_xception65_humanseg --input_path test_image.jpg
 </p>
 <p align='center'>
-          ace2p分割结果展示                 humanseg分割结果展示
+          ACE2P人体部件分割                 HumanSeg人像分割
 </p>
-PaddleHub also provides mainstream models for image classification, semantic models, video classification, image generation, image segmentation, text moderation, keypoint detection, and more; for more model introductions, see [https://www.paddlepaddle.org.cn/hub](https://www.paddlepaddle.org.cn/hub)
+PaddleHub also provides mainstream models for image classification, semantic models, video classification, image generation, image segmentation, text moderation, keypoint detection, and more; for more model introductions, see [预训练模型介绍](./docs/pretrained_models.md) or the PaddleHub official site [https://www.paddlepaddle.org.cn/hub](https://www.paddlepaddle.org.cn/hub)
 ### 易用的迁移学习
@@ -189,6 +192,5 @@ $ hub uninstall ernie
 ## 更新历史
-PaddleHub v1.6.0 has been released!
+PaddleHub v1.6 has been released!
-For more upgrade details, see the [更新历史](./RELEASE.md)
+For details, see the [更新历史](./RELEASE.md)
````
demo/image_classification/img_classifier_dygraph.py (new file, +89 −0)

```python
#coding:utf-8
import argparse
import os

import numpy as np
import paddlehub as hub
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear
from paddle.fluid.dygraph.base import to_variable
from paddle.fluid.optimizer import AdamOptimizer

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epoches for fine-tuning.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt_dygraph", help="Path to save log data.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--log_interval", type=int, default=10, help="log interval.")
parser.add_argument("--save_interval", type=int, default=10, help="save interval.")
# yapf: enable.


class ResNet50(fluid.dygraph.Layer):
    def __init__(self, num_classes, backbone):
        super(ResNet50, self).__init__()
        self.fc = Linear(input_dim=2048, output_dim=num_classes)
        self.backbone = backbone

    def forward(self, imgs):
        feature_map = self.backbone(imgs)
        feature_map = fluid.layers.reshape(feature_map, shape=[-1, 2048])
        pred = self.fc(feature_map)
        return fluid.layers.softmax(pred)


def finetune(args):
    with fluid.dygraph.guard():
        resnet50_vd_10w = hub.Module(name="resnet50_vd_10w")
        dataset = hub.dataset.Flowers()
        resnet = ResNet50(num_classes=dataset.num_labels, backbone=resnet50_vd_10w)
        adam = AdamOptimizer(learning_rate=0.001, parameter_list=resnet.parameters())
        state_dict_path = os.path.join(args.checkpoint_dir, 'dygraph_state_dict')
        if os.path.exists(state_dict_path + '.pdparams'):
            state_dict, _ = fluid.load_dygraph(state_dict_path)
            resnet.load_dict(state_dict)

        reader = hub.reader.ImageClassificationReader(
            image_width=resnet50_vd_10w.get_expected_image_width(),
            image_height=resnet50_vd_10w.get_expected_image_height(),
            images_mean=resnet50_vd_10w.get_pretrained_images_mean(),
            images_std=resnet50_vd_10w.get_pretrained_images_std(),
            dataset=dataset)
        train_reader = reader.data_generator(batch_size=args.batch_size, phase='train')

        loss_sum = acc_sum = cnt = 0
        # Train for num_epoch epochs
        for epoch in range(args.num_epoch):
            # Read the training data and train
            for batch_id, data in enumerate(train_reader()):
                imgs = np.array(data[0][0])
                labels = np.array(data[0][1])

                pred = resnet(imgs)
                acc = fluid.layers.accuracy(pred, to_variable(labels))
                loss = fluid.layers.cross_entropy(pred, to_variable(labels))
                avg_loss = fluid.layers.mean(loss)
                avg_loss.backward()
                # Update the parameters
                adam.minimize(avg_loss)

                loss_sum += avg_loss.numpy() * imgs.shape[0]
                acc_sum += acc.numpy() * imgs.shape[0]
                cnt += imgs.shape[0]
                if batch_id % args.log_interval == 0:
                    print('epoch {}: loss {}, acc {}'.format(epoch, loss_sum / cnt, acc_sum / cnt))
                    loss_sum = acc_sum = cnt = 0

                if batch_id % args.save_interval == 0:
                    state_dict = resnet.state_dict()
                    fluid.save_dygraph(state_dict, state_dict_path)


if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
```
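A hedged sketch of restoring the checkpoint this script writes, reusing the ResNet50 class defined in the demo above (import or paste it alongside); it assumes the script has already saved `dygraph_state_dict.pdparams` under the default checkpoint directory:

```python
import os

import paddle.fluid as fluid
import paddlehub as hub

with fluid.dygraph.guard():
    backbone = hub.Module(name="resnet50_vd_10w")
    dataset = hub.dataset.Flowers()
    model = ResNet50(num_classes=dataset.num_labels, backbone=backbone)

    # Load the parameters saved by the training loop above.
    state_dict_path = os.path.join("paddlehub_finetune_ckpt_dygraph",
                                   "dygraph_state_dict")
    state_dict, _ = fluid.load_dygraph(state_dict_path)
    model.load_dict(state_dict)
    model.eval()  # switch the layer to inference mode
```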
demo/sequence_labeling/sequence_label_dygraph.py (new file, +107 −0)

```python
#coding:utf-8
import argparse
import os

import numpy as np
import paddlehub as hub
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear
from paddle.fluid.dygraph.base import to_variable
from paddle.fluid.optimizer import AdamOptimizer
from paddlehub.finetune.evaluate import chunk_eval, calculate_f1

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epoches for fine-tuning.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--log_interval", type=int, default=10, help="log interval.")
parser.add_argument("--save_interval", type=int, default=10, help="save interval.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt_dygraph", help="Path to save log data.")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
# yapf: enable.


class TransformerSequenceLabelLayer(fluid.dygraph.Layer):
    def __init__(self, num_classes, transformer):
        super(TransformerSequenceLabelLayer, self).__init__()
        self.num_classes = num_classes
        self.transformer = transformer
        self.fc = Linear(input_dim=768, output_dim=num_classes)

    def forward(self, input_ids, position_ids, segment_ids, input_mask):
        result = self.transformer(input_ids, position_ids, segment_ids, input_mask)
        pred = self.fc(result['sequence_output'])
        ret_infers = fluid.layers.reshape(
            x=fluid.layers.argmax(pred, axis=2), shape=[-1, 1])
        pred = fluid.layers.reshape(pred, shape=[-1, self.num_classes])
        return fluid.layers.softmax(pred), ret_infers


def finetune(args):
    ernie = hub.Module(name="ernie", max_seq_len=args.max_seq_len)
    with fluid.dygraph.guard():
        dataset = hub.dataset.MSRA_NER()
        ts = TransformerSequenceLabelLayer(
            num_classes=dataset.num_labels, transformer=ernie)
        adam = AdamOptimizer(learning_rate=1e-5, parameter_list=ts.parameters())
        state_dict_path = os.path.join(args.checkpoint_dir, 'dygraph_state_dict')
        if os.path.exists(state_dict_path + '.pdparams'):
            state_dict, _ = fluid.load_dygraph(state_dict_path)
            ts.load_dict(state_dict)

        reader = hub.reader.SequenceLabelReader(
            dataset=dataset,
            vocab_path=ernie.get_vocab_path(),
            max_seq_len=args.max_seq_len,
            sp_model_path=ernie.get_spm_path(),
            word_dict_path=ernie.get_word_dict_path())
        train_reader = reader.data_generator(batch_size=args.batch_size, phase='train')

        loss_sum = total_infer = total_label = total_correct = cnt = 0
        # Train for num_epoch epochs
        for epoch in range(args.num_epoch):
            # Read the training data and train
            for batch_id, data in enumerate(train_reader()):
                input_ids = np.array(data[0][0]).astype(np.int64)
                position_ids = np.array(data[0][1]).astype(np.int64)
                segment_ids = np.array(data[0][2]).astype(np.int64)
                input_mask = np.array(data[0][3]).astype(np.float32)
                labels = np.array(data[0][4]).astype(np.int64).reshape(-1, 1)
                seq_len = np.squeeze(np.array(data[0][5]).astype(np.int64), axis=1)
                pred, ret_infers = ts(input_ids, position_ids, segment_ids, input_mask)

                loss = fluid.layers.cross_entropy(pred, to_variable(labels))
                avg_loss = fluid.layers.mean(loss)
                avg_loss.backward()
                # Update the parameters
                adam.minimize(avg_loss)

                loss_sum += avg_loss.numpy() * labels.shape[0]
                label_num, infer_num, correct_num = chunk_eval(
                    labels, ret_infers.numpy(), seq_len, dataset.num_labels, 1)
                cnt += labels.shape[0]
                total_infer += infer_num
                total_label += label_num
                total_correct += correct_num

                if batch_id % args.log_interval == 0:
                    precision, recall, f1 = calculate_f1(
                        total_label, total_infer, total_correct)
                    print('epoch {}: loss {}, f1 {} recall {} precision {}'.format(
                        epoch, loss_sum / cnt, f1, recall, precision))
                    loss_sum = total_infer = total_label = total_correct = cnt = 0

                if batch_id % args.save_interval == 0:
                    state_dict = ts.state_dict()
                    fluid.save_dygraph(state_dict, state_dict_path)


if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
```
demo/text_classification/finetuned_model_to_module/module.py (+12 −10)

```diff
@@ -94,6 +94,7 @@ class ERNIETinyFinetuned(hub.Module):
             config=config,
             metrics_choices=metrics_choices)

+    @serving
     def predict(self, data, return_result=False, accelerate_mode=True):
         """
         Get prediction results
@@ -102,7 +103,14 @@ class ERNIETinyFinetuned(hub.Module):
             data=data,
             return_result=return_result,
             accelerate_mode=accelerate_mode)
-        return run_states
+        results = [run_state.run_results for run_state in run_states]
+        prediction = []
+        for batch_result in results:
+            # get predict index
+            batch_result = np.argmax(batch_result, axis=2)[0]
+            batch_result = batch_result.tolist()
+            prediction += batch_result
+        return prediction

 if __name__ == "__main__":
@@ -113,12 +121,6 @@ if __name__ == "__main__":
     data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"], ["交通方便;环境很好;服务态度很好 房间较小"],
             ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
-    index = 0
-    run_states = ernie_tiny.predict(data=data)
-    results = [run_state.run_results for run_state in run_states]
-    for batch_result in results:
-        # get predict index
-        batch_result = np.argmax(batch_result, axis=2)[0]
-        for result in batch_result:
-            print("%s\tpredict=%s" % (data[index][0], result))
-            index += 1
+    predictions = ernie_tiny.predict(data=data)
+    for index, text in enumerate(data):
+        print("%s\tpredict=%s" % (data[index][0], predictions[index]))
```
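With this change, predict() post-processes the run states itself and returns one label index per input text instead of raw run states. A short sketch of the new call pattern, reusing the module name and data from the tutorial this demo accompanies:

```python
import paddlehub as hub

# The packaged fine-tuned module from the accompanying tutorial.
ernie_tiny = hub.Module(name="ernie_tiny_finetuned")

data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"],
        ["交通方便;环境很好;服务态度很好 房间较小"]]
predictions = ernie_tiny.predict(data=data)  # now a flat list of label indices
for index, text in enumerate(data):
    print("%s\tpredict=%s" % (data[index][0], predictions[index]))
```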
demo/text_classification/text_classifier_dygraph.py (new file, +98 −0)

```python
#coding:utf-8
import argparse
import os

import numpy as np
import paddlehub as hub
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear
from paddle.fluid.dygraph.base import to_variable
from paddle.fluid.optimizer import AdamOptimizer

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epoches for fine-tuning.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--log_interval", type=int, default=10, help="log interval.")
parser.add_argument("--save_interval", type=int, default=10, help="save interval.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt_dygraph", help="Path to save log data.")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
# yapf: enable.


class TransformerClassifier(fluid.dygraph.Layer):
    def __init__(self, num_classes, transformer):
        super(TransformerClassifier, self).__init__()
        self.num_classes = num_classes
        self.transformer = transformer
        self.fc = Linear(input_dim=768, output_dim=num_classes)

    def forward(self, input_ids, position_ids, segment_ids, input_mask):
        result = self.transformer(input_ids, position_ids, segment_ids, input_mask)
        cls_feats = fluid.layers.dropout(
            result['pooled_output'],
            dropout_prob=0.1,
            dropout_implementation="upscale_in_train")
        cls_feats = fluid.layers.reshape(cls_feats, shape=[-1, 768])
        pred = self.fc(cls_feats)
        return fluid.layers.softmax(pred)


def finetune(args):
    ernie = hub.Module(name="ernie", max_seq_len=args.max_seq_len)
    with fluid.dygraph.guard():
        dataset = hub.dataset.ChnSentiCorp()
        tc = TransformerClassifier(num_classes=dataset.num_labels, transformer=ernie)
        adam = AdamOptimizer(learning_rate=1e-5, parameter_list=tc.parameters())
        state_dict_path = os.path.join(args.checkpoint_dir, 'dygraph_state_dict')
        if os.path.exists(state_dict_path + '.pdparams'):
            state_dict, _ = fluid.load_dygraph(state_dict_path)
            tc.load_dict(state_dict)

        reader = hub.reader.ClassifyReader(
            dataset=dataset,
            vocab_path=ernie.get_vocab_path(),
            max_seq_len=args.max_seq_len,
            sp_model_path=ernie.get_spm_path(),
            word_dict_path=ernie.get_word_dict_path())
        train_reader = reader.data_generator(batch_size=args.batch_size, phase='train')

        loss_sum = acc_sum = cnt = 0
        # Train for num_epoch epochs
        for epoch in range(args.num_epoch):
            # Read the training data and train
            for batch_id, data in enumerate(train_reader()):
                input_ids = np.array(data[0][0]).astype(np.int64)
                position_ids = np.array(data[0][1]).astype(np.int64)
                segment_ids = np.array(data[0][2]).astype(np.int64)
                input_mask = np.array(data[0][3]).astype(np.float32)
                labels = np.array(data[0][4]).astype(np.int64)
                pred = tc(input_ids, position_ids, segment_ids, input_mask)

                acc = fluid.layers.accuracy(pred, to_variable(labels))
                loss = fluid.layers.cross_entropy(pred, to_variable(labels))
                avg_loss = fluid.layers.mean(loss)
                avg_loss.backward()
                # Update the parameters
                adam.minimize(avg_loss)

                loss_sum += avg_loss.numpy() * labels.shape[0]
                acc_sum += acc.numpy() * labels.shape[0]
                cnt += labels.shape[0]
                if batch_id % args.log_interval == 0:
                    print('epoch {}: loss {}, acc {}'.format(epoch, loss_sum / cnt, acc_sum / cnt))
                    loss_sum = acc_sum = cnt = 0

                if batch_id % args.save_interval == 0:
                    state_dict = tc.state_dict()
                    fluid.save_dygraph(state_dict, state_dict_path)


if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
```
docs/pretrained_models.md (new file, +178 −0) — diff collapsed in the original view; content not shown here.
docs/reference/config.md (+4 −4)

```diff
@@ -8,8 +8,8 @@
 hub.RunConfig(
     log_interval=10,
     eval_interval=100,
-    use_pyreader=False,
+    use_pyreader=True,
-    use_data_parallel=False,
+    use_data_parallel=True,
     save_ckpt_interval=None,
     use_cuda=False,
     checkpoint_dir=None,
@@ -22,8 +22,8 @@ hub.RunConfig(
 * `log_interval`: interval for printing training logs; defaults to 10.
 * `eval_interval`: interval for running evaluation; defaults to 100.
-* `use_pyreader`: whether to use pyreader; defaults to False.
+* `use_pyreader`: whether to use pyreader; defaults to True.
-* `use_data_parallel`: whether to use parallel computation; defaults to False. Enabling it depends on the nccl library.
+* `use_data_parallel`: whether to use parallel computation; defaults to True. Enabling it depends on the nccl library.
 * `save_ckpt_interval`: interval for saving checkpoints; defaults to None.
 * `use_cuda`: whether to train and evaluate on GPU; defaults to False.
 * `checkpoint_dir`: directory for saving checkpoints; defaults to None, in which case a timestamped temporary directory is created under the working directory.
```
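Taken together, the new defaults mean a freshly constructed RunConfig now behaves as if both switches were enabled explicitly. A minimal sketch restating the documented defaults above (no arguments beyond those shown on this page are assumed):

```python
import paddlehub as hub

# Same effect as the new defaults: pyreader and data-parallel enabled,
# CPU training, checkpoints in a timestamped temporary directory.
config = hub.RunConfig(
    log_interval=10,
    eval_interval=100,
    use_pyreader=True,
    use_data_parallel=True,
    save_ckpt_interval=None,
    use_cuda=False,
    checkpoint_dir=None)
```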
docs/reference/task/base_task.md (+0 −9)

````diff
@@ -169,15 +169,6 @@ import paddlehub as hub
 task.predict()
 ```
-## Func `predict`
-Run predict according to the config settings
-**Example**
-```python
-import paddlehub as hub
-...
-task.predict()
-```
 ## Property `is_train_phase`
 Whether the task is in the training phase
````
docs/tutorial/define_task_example.md (new file, +84 −0)

# How to modify the model network in a Task

In applications, users need to change the transfer network structure to tune a model's performance on their dataset. Building on [如何自定义Task](./how_to_define_task.md), this tutorial shows how to modify the default network inside a Task.

Taking sequence labeling as an example: SequenceLabelTask provides two network choices, a plain FC network and an FC+CRF network. Suppose you want to add an LSTM on top of this, forming BiLSTM+CRF, a network structure commonly used for sequence labeling tasks. To do so, define a Task that inherits from SequenceLabelTask and override its _build_net() method.

The code below implements a BiLSTM+CRF network:

```python
class SequenceLabelTask_BiLSTMCRF(SequenceLabelTask):
    def _build_net(self):
        """
        Custom transfer network for sequence labeling: BiLSTM+CRF.
        """
        self.seq_len = fluid.layers.data(
            name="seq_len", shape=[1], dtype='int64', lod_level=0)

        if version_compare(paddle.__version__, "1.6"):
            self.seq_len_used = fluid.layers.squeeze(self.seq_len, axes=[1])
        else:
            self.seq_len_used = self.seq_len

        if self.add_crf:
            # Transfer network: BiLSTM+CRF
            # Remove padding
            unpad_feature = fluid.layers.sequence_unpad(
                self.feature, length=self.seq_len_used)
            # BiLSTM layer
            hid_dim = 128
            fc0 = fluid.layers.fc(input=unpad_feature, size=hid_dim * 4)
            rfc0 = fluid.layers.fc(input=unpad_feature, size=hid_dim * 4)
            lstm_h, c = fluid.layers.dynamic_lstm(
                input=fc0, size=hid_dim * 4, is_reverse=False)
            rlstm_h, c = fluid.layers.dynamic_lstm(
                input=rfc0, size=hid_dim * 4, is_reverse=True)
            # Concatenate the forward and backward LSTM outputs
            lstm_concat = fluid.layers.concat(input=[lstm_h, rlstm_h], axis=1)
            self.emission = fluid.layers.fc(
                size=self.num_classes,
                input=lstm_concat,
                param_attr=fluid.ParamAttr(
                    initializer=fluid.initializer.Uniform(low=-0.1, high=0.1),
                    regularizer=fluid.regularizer.L2DecayRegularizer(
                        regularization_coeff=1e-4)))
            size = self.emission.shape[1]
            fluid.layers.create_parameter(
                shape=[size + 2, size], dtype=self.emission.dtype, name='crfw')
            # CRF layer
            self.ret_infers = fluid.layers.crf_decoding(
                input=self.emission, param_attr=fluid.ParamAttr(name='crfw'))
            ret_infers = fluid.layers.assign(self.ret_infers)
            # Return the predictions as a list
            return [ret_infers]
        else:
            # Transfer network: FC
            self.logits = fluid.layers.fc(
                input=self.feature,
                size=self.num_classes,
                num_flatten_dims=2,
                param_attr=fluid.ParamAttr(
                    name="cls_seq_label_out_w",
                    initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
                bias_attr=fluid.ParamAttr(
                    name="cls_seq_label_out_b",
                    initializer=fluid.initializer.Constant(0.)))

            self.ret_infers = fluid.layers.reshape(
                x=fluid.layers.argmax(self.logits, axis=2), shape=[-1, 1])

            logits = self.logits
            logits = fluid.layers.flatten(logits, axis=2)
            logits = fluid.layers.softmax(logits)
            self.num_labels = logits.shape[1]
            # Return the predictions as a list
            return [logits]
```

As the code above shows, inheriting from one of PaddleHub's built-in Tasks and overriding its _build_net method is all that is needed to implement a custom transfer network structure.
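For orientation, a sketch of how such a subclass might be constructed, following the usual SequenceLabelTask wiring from PaddleHub's sequence-labeling demos; the data_reader/feature/feed_list parameter names below come from that demo style and are assumptions here, not something this document prescribes:

```python
import paddlehub as hub

module = hub.Module(name="ernie")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)

dataset = hub.dataset.MSRA_NER()
reader = hub.reader.SequenceLabelReader(
    dataset=dataset, vocab_path=module.get_vocab_path(), max_seq_len=128)

feed_list = [
    inputs["input_ids"].name, inputs["position_ids"].name,
    inputs["segment_ids"].name, inputs["input_mask"].name
]

# Drop the custom Task in exactly where SequenceLabelTask would go;
# add_crf=True routes _build_net into the BiLSTM+CRF branch above.
seq_label_task = SequenceLabelTask_BiLSTMCRF(
    data_reader=reader,
    feature=outputs["sequence_output"],
    feed_list=feed_list,
    max_seq_len=128,
    num_classes=dataset.num_labels,
    add_crf=True)
seq_label_task.finetune_and_eval()
```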
docs/tutorial/finetuned_model_to_module.md (+63 −30)

````diff
@@ -148,7 +148,9 @@ def _initialize(self,
 The initialization process is exactly the process of creating the Task for Fine-tune.
-**NOTE:** Class initialization must not use the default __init__ interface; instead, override and implement the _initialize interface. The object has a built-in directory attribute through which the Module's path can be obtained directly
+**NOTE:**
+1. Class initialization must not use the default __init__ interface; instead, override and implement the _initialize interface. The object has a built-in directory attribute through which the Module's path can be obtained directly.
+2. When predicting with a model saved by Fine-tune, there is no need to load a Dataset; that is, the dataset argument of the Reader can be None.
 #### step 3_4. Flesh out the prediction logic
 ```python
@@ -160,7 +162,14 @@ def predict(self, data, return_result=False, accelerate_mode=True):
             data=data,
             return_result=return_result,
             accelerate_mode=accelerate_mode)
-        return run_states
+        results = [run_state.run_results for run_state in run_states]
+        prediction = []
+        for batch_result in results:
+            # get predict index
+            batch_result = np.argmax(batch_result, axis=2)[0]
+            batch_result = batch_result.tolist()
+            prediction += batch_result
+        return prediction
 ```
 #### step 3_5. Support serving invocation
@@ -179,7 +188,14 @@ def predict(self, data, return_result=False, accelerate_mode=True):
             data=data,
             return_result=return_result,
             accelerate_mode=accelerate_mode)
-        return run_states
+        results = [run_state.run_results for run_state in run_states]
+        prediction = []
+        for batch_result in results:
+            # get predict index
+            batch_result = np.argmax(batch_result, axis=2)[0]
+            batch_result = batch_result.tolist()
+            prediction += batch_result
+        return prediction
 ```
 ### Full code
@@ -214,15 +230,9 @@ ernie_tiny = hub.Module(name="ernie_tiny_finetuned")
 data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"], ["交通方便;环境很好;服务态度很好 房间较小"],
         ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
-index = 0
-run_states = ernie_tiny.predict(data=data)
-results = [run_state.run_results for run_state in run_states]
-for batch_result in results:
-    # get predict index
-    batch_result = np.argmax(batch_result, axis=2)[0]
-    for result in batch_result:
-        print("%s\tpredict=%s" % (data[index][0], result))
-        index += 1
+predictions = ernie_tiny.predict(data=data)
+for index, text in enumerate(data):
+    print("%s\tpredict=%s" % (data[index][0], predictions[index]))
 ```
 ### Invocation method 2
@@ -238,15 +248,9 @@ ernie_tiny_finetuned = hub.Module(directory="finetuned_model_to_module/")
 data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"], ["交通方便;环境很好;服务态度很好 房间较小"],
         ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
-index = 0
-run_states = ernie_tiny.predict(data=data)
-results = [run_state.run_results for run_state in run_states]
-for batch_result in results:
-    # get predict index
-    batch_result = np.argmax(batch_result, axis=2)[0]
-    for result in batch_result:
-        print("%s\tpredict=%s" % (data[index][0], result))
-        index += 1
+predictions = ernie_tiny.predict(data=data)
+for index, text in enumerate(data):
+    print("%s\tpredict=%s" % (data[index][0], predictions[index]))
 ```
 ### Invocation method 3
@@ -263,13 +267,42 @@ import numpy as np
 data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"], ["交通方便;环境很好;服务态度很好 房间较小"],
         ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
-run_states = ERNIETinyFinetuned.predict(data=data)
-index = 0
-results = [run_state.run_results for run_state in run_states]
-for batch_result in results:
-    # get predict index
-    batch_result = np.argmax(batch_result, axis=2)[0]
-    for result in batch_result:
-        print("%s\tpredict=%s" % (data[index][0], result))
-        index += 1
+predictions = ERNIETinyFinetuned.predict(data=data)
+for index, text in enumerate(data):
+    print("%s\tpredict=%s" % (data[index][0], predictions[index]))
 ```
+### Invoking via PaddleHub Serving
+**Step 1: start the prediction service**
+```shell
+hub serving start -m ernie_tiny_finetuned
+```
+**Step 2: send a request and fetch the prediction results**
+A request can be sent with the following script:
+```python
+# coding: utf8
+import requests
+import json
+
+# Texts to predict
+texts = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"], ["交通方便;环境很好;服务态度很好 房间较小"],
+         ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
+# The key 'data' corresponds to the data parameter of the predict interface
+data = {'data': texts}
+# Target the ernie_tiny_finetuned model; send a POST request with application/json headers
+url = "http://127.0.0.1:8866/predict/ernie_tiny_finetuned"
+headers = {"Content-Type": "application/json"}
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+# Print the prediction results
+print(json.dumps(r.json(), indent=4, ensure_ascii=False))
+```
+For more on PaddleHub Serving, see the [Hub Serving教程](../../docs/tutorial/serving.md) and the [Demo](../../demo/serving)
````
docs/tutorial/how_to_load_data.md (+3 −0)

````diff
@@ -22,6 +22,7 @@
 If you have two input texts, text_a and text_b, the first column is the first input text_a, the second column the second input text_b, and the third column the label. Columns are separated by tabs. The first row of the dataset is `text_a text_b label` (tab-separated).
 ```text
 text_a label
 15.4寸笔记本的键盘确实爽,基本跟台式机差不多了,蛮喜欢数字小键盘,输数字特方便,样子也很美观,做工也相当不错 1
@@ -36,6 +37,7 @@ text_a label
 * The recommended encoding for dataset files is utf8.
 * If the dataset file lacks the column header described above (e.g. train.tsv has no first row `text_a label`), set train_file_with_header=False.
 * If you also have prediction data (with no label), put it in a predict.tsv file formatted like train.tsv, with the label column removed.
+* In classification tasks, dataset labels must be numbered starting from 0
 ```python
@@ -117,6 +119,7 @@ dog
 * Image paths in the train/dev/test list files must be relative to dataset_dir. For example, if an image actually lives at `/test/data/dog/dog1.jpg` and base_path is `/test/data`, the path written in the file should be `dog/dog1.jpg`.
 * If you also have prediction data (with no label), put it in a predict_list.txt file formatted like train_list.txt, with the label column removed
 * If your dataset has only a few classes, you may skip defining label_list.txt and instead define label_list=["数据集所有类别"].
+* In classification tasks, dataset labels must be numbered starting from 0
 ```python
 from paddlehub.dataset.base_cv_dataset import BaseCVDataset
````
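To make the list-file conventions concrete, here is a minimal custom dataset sketch. It subclasses the BaseCVDataset imported above; the base_path and train_list_file keywords mirror the ones visible in this commit's paddlehub/dataset/food101.py change, while the remaining keywords and `path/to/dataset` are illustrative assumptions:

```python
from paddlehub.dataset.base_cv_dataset import BaseCVDataset


class DemoDataset(BaseCVDataset):
    def __init__(self):
        # Directory holding the list files and the (relative) image paths.
        self.dataset_dir = "path/to/dataset"  # illustrative path
        super(DemoDataset, self).__init__(
            base_path=self.dataset_dir,
            train_list_file="train_list.txt",
            validate_list_file="validate_list.txt",  # assumed keyword
            test_list_file="test_list.txt",          # assumed keyword
            label_list_file="label_list.txt")        # assumed keyword
```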
hub_module/scripts/configs/faster_rcnn_resnet50_fpn_venus.yml (+5 −5)

```diff
 name: faster_rcnn_resnet50_fpn_venus
 dir: "modules/image/object_detection/faster_rcnn_resnet50_fpn_venus"
-# resources:
-#   -
-#     url: https://paddlehub.bj.bcebos.com/model/cv/faster_rcnn_resnet50_fpn_model.tar.gz
-#     dest: faster_rcnn_resnet50_fpn_model
-#     uncompress: True
+resources:
+  -
+    url: https://paddlehub.bj.bcebos.com/model/cv/faster_rcnn_resnet50_fpn_venus_model.tar.gz
+    dest: faster_rcnn_resnet50_fpn_model
+    uncompress: True
```
paddlehub/__init__.py (+1 −0)

```diff
@@ -39,6 +39,7 @@ from .common.logger import logger
 from .common.paddle_helper import connect_program
 from .common.hub_server import HubServer
 from .common.hub_server import server_check
+from .common.downloader import download, ResourceNotFoundError, ServerConnectionError
 from .module.module import Module
 from .module.base_processor import BaseProcessor
```
paddlehub/common/downloader.py (+74 −17)

```diff
-#coding:utf-8
+# coding:utf-8
 # Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
 #
-# Licensed under the Apache License, Version 2.0 (the "License"
+# Licensed under the Apache License, Version 2.0 (the 'License'
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
+# distributed under the License is distributed on an 'AS IS' BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
@@ -28,6 +28,8 @@ import tarfile
 from paddlehub.common import utils
 from paddlehub.common.logger import logger
+from paddlehub.common import tmp_dir
+import paddlehub as hub

 __all__ = ['Downloader', 'progress']
 FLUSH_INTERVAL = 0.1
@@ -38,10 +40,10 @@ lasttime = time.time()
 def progress(str, end=False):
     global lasttime
     if end:
-        str += "\n"
+        str += '\n'
         lasttime = 0
     if time.time() - lasttime >= FLUSH_INTERVAL:
-        sys.stdout.write("\r%s" % str)
+        sys.stdout.write('\r%s' % str)
         lasttime = time.time()
         sys.stdout.flush()
@@ -67,7 +69,7 @@ class Downloader(object):
             if retry_times < retry_limit:
                 retry_times += 1
             else:
-                tips = "Cannot download {0} within retry limit {1}".format(
+                tips = 'Cannot download {0} within retry limit {1}'.format(
                     url, retry_limit)
                 return False, tips, None
             r = requests.get(url, stream=True)
@@ -82,19 +84,19 @@ class Downloader(object):
                 total_length = int(total_length)
                 starttime = time.time()
                 if print_progress:
-                    print("Downloading %s" % save_name)
+                    print('Downloading %s' % save_name)
                 for data in r.iter_content(chunk_size=4096):
                     dl += len(data)
                     f.write(data)
                     if print_progress:
                         done = int(50 * dl / total_length)
                         progress(
-                            "[%-50s] %.2f%%" %
+                            '[%-50s] %.2f%%' %
                             ('=' * done, float(dl / total_length * 100)))
             if print_progress:
-                progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
+                progress('[%-50s] %.2f%%' % ('=' * 50, 100), end=True)

-        tips = "File %s download completed!" % (file_name)
+        tips = 'File %s download completed!' % (file_name)
         return True, tips, file_name

     def uncompress(self,
@@ -104,24 +106,25 @@ class Downloader(object):
                    print_progress=False):
         dirname = os.path.dirname(file) if dirname is None else dirname
         if print_progress:
-            print("Uncompress %s" % file)
+            print('Uncompress %s' % file)
-        with tarfile.open(file, "r:gz") as tar:
+        with tarfile.open(file, 'r:*') as tar:
             file_names = tar.getnames()
             size = len(file_names) - 1
             module_dir = os.path.join(dirname, file_names[0])
             for index, file_name in enumerate(file_names):
                 if print_progress:
                     done = int(50 * float(index) / size)
-                    progress("[%-50s] %.2f%%" % ('=' * done,
+                    progress('[%-50s] %.2f%%' % ('=' * done,
                                                  float(index / size * 100)))
                 tar.extract(file_name, dirname)
             if print_progress:
-                progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
+                progress('[%-50s] %.2f%%' % ('=' * 50, 100), end=True)
         if delete_file:
             os.remove(file)

-        return True, "File %s uncompress completed!" % file, module_dir
+        return True, 'File %s uncompress completed!' % file, module_dir

     def download_file_and_uncompress(self,
                                      url,
@@ -147,8 +150,62 @@ class Downloader(object):
         if save_name:
             save_name = os.path.join(save_path, save_name)
             shutil.move(file, save_name)
-            return result, "%s\n%s" % (tips_1, tips_2), save_name
-        return result, "%s\n%s" % (tips_1, tips_2), file
+            return result, '%s\n%s' % (tips_1, tips_2), save_name
+        return result, '%s\n%s' % (tips_1, tips_2), file

 default_downloader = Downloader()

+
+class ResourceNotFoundError(Exception):
+    def __init__(self, name, version=None):
+        self.name = name
+        self.version = version
+
+    def __str__(self):
+        if self.version:
+            tips = 'No resource named {} was found'.format(self.name)
+        else:
+            tips = 'No resource named {}-{} was found'.format(
+                self.name, self.version)
+        return tips
+
+
+class ServerConnectionError(Exception):
+    def __str__(self):
+        tips = 'Can\'t connect to Hub Server:{}'.format(
+            hub.HubServer().server_url[0])
+        return tips
+
+
+def download(name,
+             save_path,
+             version=None,
+             decompress=True,
+             resource_type='Model',
+             extra=None):
+    file = os.path.join(save_path, name)
+    file = os.path.realpath(file)
+    if os.path.exists(file):
+        return
+
+    if not hub.HubServer()._server_check():
+        raise ServerConnectionError
+
+    search_result = hub.HubServer().get_resource_url(
+        name, resource_type=resource_type, version=version, extra=extra)
+    if not search_result:
+        raise ResourceNotFoundError(name, version)
+
+    url = search_result['url']
+    with tmp_dir() as _dir:
+        if not os.path.exists(save_path):
+            os.makedirs(save_path)
+        _, _, savefile = default_downloader.download_file(
+            url=url, save_path=_dir, print_progress=True)
+        if tarfile.is_tarfile(savefile) and decompress:
+            _, _, savefile = default_downloader.uncompress(
+                file=savefile, print_progress=True)
+        shutil.move(savefile, file)
```
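The new module-level download() helper, exported from paddlehub/__init__.py in this same commit, can be called directly; a minimal usage sketch, assuming a reachable Hub server (the resource name below is illustrative):

```python
import paddlehub as hub

try:
    # Fetch a named resource into ./pretrained; skipped if it already exists
    # there, and tarballs are decompressed by default.
    hub.download(name="lac", save_path="./pretrained")
except hub.ResourceNotFoundError as err:
    print(err)
except hub.ServerConnectionError as err:
    print(err)
```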
paddlehub/common/hub_server.py (+2 −1)

```diff
@@ -46,7 +46,8 @@ class HubServer(object):
         config_file_path = os.path.join(CONF_HOME, 'config.json')
         if not os.path.exists(CONF_HOME):
             utils.mkdir(CONF_HOME)
-        if not os.path.exists(config_file_path):
+        if not os.path.exists(config_file_path) or 0 == os.path.getsize(
+                config_file_path):
             with open(config_file_path, 'w+') as fp:
                 lock.flock(fp, lock.LOCK_EX)
                 fp.write(json.dumps(default_server_config))
```
...
paddlehub/common/logger.py
浏览文件 @
22c4494f
...
@@ -62,7 +62,8 @@ class Logger(object):
...
@@ -62,7 +62,8 @@ class Logger(object):
self
.
logger
.
setLevel
(
logging
.
DEBUG
)
self
.
logger
.
setLevel
(
logging
.
DEBUG
)
self
.
logger
.
propagate
=
False
self
.
logger
.
propagate
=
False
if
os
.
path
.
exists
(
os
.
path
.
join
(
CONF_HOME
,
"config.json"
)):
config_path
=
os
.
path
.
join
(
CONF_HOME
,
"config.json"
)
if
os
.
path
.
exists
(
config_path
)
and
0
<
os
.
path
.
getsize
(
config_path
):
with
open
(
os
.
path
.
join
(
CONF_HOME
,
"config.json"
),
"r"
)
as
fp
:
with
open
(
os
.
path
.
join
(
CONF_HOME
,
"config.json"
),
"r"
)
as
fp
:
level
=
json
.
load
(
fp
).
get
(
"log_level"
,
"DEBUG"
)
level
=
json
.
load
(
fp
).
get
(
"log_level"
,
"DEBUG"
)
self
.
logLevel
=
level
self
.
logLevel
=
level
...
...
paddlehub/common/paddle_helper.py
浏览文件 @
22c4494f
...
@@ -19,10 +19,11 @@ from __future__ import print_function
...
@@ -19,10 +19,11 @@ from __future__ import print_function
import
copy
import
copy
import
paddle
import
paddle.fluid
as
fluid
import
paddle.fluid
as
fluid
from
paddlehub.module
import
module_desc_pb2
from
paddlehub.module
import
module_desc_pb2
from
paddlehub.common.utils
import
from_pyobj_to_module_attr
,
from_module_attr_to_pyobj
from
paddlehub.common.utils
import
from_pyobj_to_module_attr
,
from_module_attr_to_pyobj
,
version_compare
from
paddlehub.common.logger
import
logger
from
paddlehub.common.logger
import
logger
dtype_map
=
{
dtype_map
=
{
...
@@ -62,7 +63,8 @@ def get_variable_info(var):
...
@@ -62,7 +63,8 @@ def get_variable_info(var):
var_info
[
'trainable'
]
=
var
.
trainable
var_info
[
'trainable'
]
=
var
.
trainable
var_info
[
'optimize_attr'
]
=
var
.
optimize_attr
var_info
[
'optimize_attr'
]
=
var
.
optimize_attr
var_info
[
'regularizer'
]
=
var
.
regularizer
var_info
[
'regularizer'
]
=
var
.
regularizer
var_info
[
'gradient_clip_attr'
]
=
var
.
gradient_clip_attr
if
not
version_compare
(
paddle
.
__version__
,
'1.8'
):
var_info
[
'gradient_clip_attr'
]
=
var
.
gradient_clip_attr
var_info
[
'do_model_average'
]
=
var
.
do_model_average
var_info
[
'do_model_average'
]
=
var
.
do_model_average
else
:
else
:
var_info
[
'persistable'
]
=
var
.
persistable
var_info
[
'persistable'
]
=
var
.
persistable
...
...
paddlehub/dataset/food101.py (+3 −3)

```diff
@@ -25,11 +25,11 @@ from paddlehub.dataset.base_cv_dataset import BaseCVDataset
 class Food101Dataset(BaseCVDataset):
     def __init__(self):
-        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101",
-                                    "images")
-        dataset_path = self._download_dataset(
+        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101")
+        base_path = self._download_dataset(
             dataset_path=dataset_path,
             url="https://bj.bcebos.com/paddlehub-dataset/Food101.tar.gz")
+        base_path = os.path.join(dataset_path, "images")
         super(Food101Dataset, self).__init__(
             base_path=base_path,
             train_list_file="train_list.txt",
```
浏览文件 @
22c4494f
...
@@ -96,8 +96,10 @@ class LocalModuleManager(object):
...
@@ -96,8 +96,10 @@ class LocalModuleManager(object):
for
sub_dir_name
in
os
.
listdir
(
self
.
local_modules_dir
):
for
sub_dir_name
in
os
.
listdir
(
self
.
local_modules_dir
):
sub_dir_path
=
os
.
path
.
join
(
self
.
local_modules_dir
,
sub_dir_name
)
sub_dir_path
=
os
.
path
.
join
(
self
.
local_modules_dir
,
sub_dir_name
)
if
os
.
path
.
isdir
(
sub_dir_path
):
if
os
.
path
.
isdir
(
sub_dir_path
):
if
"-"
in
sub_dir_path
:
if
"-"
in
sub_dir_name
:
new_sub_dir_path
=
sub_dir_path
.
replace
(
"-"
,
"_"
)
sub_dir_name
=
sub_dir_name
.
replace
(
"-"
,
"_"
)
new_sub_dir_path
=
os
.
path
.
join
(
self
.
local_modules_dir
,
sub_dir_name
)
shutil
.
move
(
sub_dir_path
,
new_sub_dir_path
)
shutil
.
move
(
sub_dir_path
,
new_sub_dir_path
)
sub_dir_path
=
new_sub_dir_path
sub_dir_path
=
new_sub_dir_path
valid
,
info
=
self
.
check_module_valid
(
sub_dir_path
)
valid
,
info
=
self
.
check_module_valid
(
sub_dir_path
)
...
@@ -180,11 +182,13 @@ class LocalModuleManager(object):
...
@@ -180,11 +182,13 @@ class LocalModuleManager(object):
with
tarfile
.
open
(
module_package
,
"r:gz"
)
as
tar
:
with
tarfile
.
open
(
module_package
,
"r:gz"
)
as
tar
:
file_names
=
tar
.
getnames
()
file_names
=
tar
.
getnames
()
size
=
len
(
file_names
)
-
1
size
=
len
(
file_names
)
-
1
module_dir
=
os
.
path
.
join
(
_dir
,
file_names
[
0
])
module_name
=
file_names
[
0
]
module_dir
=
os
.
path
.
join
(
_dir
,
module_name
)
for
index
,
file_name
in
enumerate
(
file_names
):
for
index
,
file_name
in
enumerate
(
file_names
):
tar
.
extract
(
file_name
,
_dir
)
tar
.
extract
(
file_name
,
_dir
)
if
"-"
in
module_dir
:
if
"-"
in
module_name
:
new_module_dir
=
module_dir
.
replace
(
"-"
,
"_"
)
module_name
=
module_name
.
replace
(
"-"
,
"_"
)
new_module_dir
=
os
.
path
.
join
(
_dir
,
module_name
)
shutil
.
move
(
module_dir
,
new_module_dir
)
shutil
.
move
(
module_dir
,
new_module_dir
)
module_dir
=
new_module_dir
module_dir
=
new_module_dir
module_name
=
hub
.
Module
(
directory
=
module_dir
).
name
module_name
=
hub
.
Module
(
directory
=
module_dir
).
name
...
...
paddlehub/module/module.py (+7 −2)

```diff
@@ -89,7 +89,7 @@ def moduleinfo(name, version, author, author_email, summary, type):
     return _wrapper

-class Module(object):
+class Module(fluid.dygraph.Layer):
     def __new__(cls,
                 name=None,
                 directory=None,
@@ -121,7 +121,7 @@ class Module(object):
             module = Module.init_with_directory(
                 directory=directory, **kwargs)
         else:
-            module = object.__new__(cls)
+            module = fluid.dygraph.Layer.__new__(cls)
         return module
@@ -135,6 +135,7 @@ class Module(object):
         if "_is_initialize" in self.__dict__ and self._is_initialize:
             return
+        super(Module, self).__init__()
         _run_func_name = self._get_func_name(self.__class__,
                                              _module_runnable_func)
         self._run_func = getattr(self,
@@ -248,6 +249,10 @@ class Module(object):
     def _initialize(self):
         pass

+    def forward(self, *args, **kwargs):
+        raise RuntimeError(
+            '{} does not support dynamic graph mode yet.'.format(self.name))

 class ModuleHelper(object):
     def __init__(self, directory):
```
paddlehub/module/nlp_module.py (+61 −4)

```diff
@@ -24,13 +24,15 @@ import os
 import re
 import six

+import paddle
 import numpy as np
 import paddle.fluid as fluid
-from paddlehub.common import paddle_helper
-from paddle.fluid.core import PaddleTensor, AnalysisConfig, create_paddle_predictor
 import paddlehub as hub
+from paddle.fluid.core import PaddleTensor, AnalysisConfig, create_paddle_predictor
+from paddlehub.common import paddle_helper, tmp_dir
 from paddlehub.common.logger import logger
-from paddlehub.common.utils import sys_stdin_encoding
+from paddlehub.common.utils import sys_stdin_encoding, version_compare
 from paddlehub.io.parser import txt_parser
 from paddlehub.module.module import runnable
@@ -246,6 +248,45 @@ class TransformerModule(NLPBaseModule):
     Tranformer Module base class can be used by BERT, ERNIE, RoBERTa and so on.
     """

+    def __init__(self,
+                 name=None,
+                 directory=None,
+                 module_dir=None,
+                 version=None,
+                 max_seq_len=128,
+                 **kwargs):
+        if not directory:
+            return
+        super(TransformerModule, self).__init__(
+            name=name,
+            directory=directory,
+            module_dir=module_dir,
+            version=version,
+            **kwargs)
+
+        self.max_seq_len = max_seq_len
+        if version_compare(paddle.__version__, '1.8.0'):
+            with tmp_dir() as _dir:
+                input_dict, output_dict, program = self.context(
+                    max_seq_len=max_seq_len)
+                fluid.io.save_inference_model(
+                    dirname=_dir,
+                    main_program=program,
+                    feeded_var_names=[
+                        input_dict['input_ids'].name,
+                        input_dict['position_ids'].name,
+                        input_dict['segment_ids'].name,
+                        input_dict['input_mask'].name
+                    ],
+                    target_vars=[
+                        output_dict["pooled_output"],
+                        output_dict["sequence_output"]
+                    ],
+                    executor=fluid.Executor(fluid.CPUPlace()))
+
+            with fluid.dygraph.guard():
+                self.model_runner = fluid.dygraph.StaticModelRunner(_dir)
+
     def init_pretraining_params(self, exe, pretraining_params_path,
                                 main_program):
         assert os.path.exists(
@@ -271,7 +312,7 @@ class TransformerModule(NLPBaseModule):
     def context(
             self,
-            max_seq_len=128,
+            max_seq_len=None,
             trainable=True,
     ):
         """
@@ -287,6 +328,9 @@ class TransformerModule(NLPBaseModule):
         """
+        if not max_seq_len:
+            max_seq_len = self.max_seq_len
+
         assert max_seq_len <= self.MAX_SEQ_LEN and max_seq_len >= 1, "max_seq_len({}) should be in the range of [1, {}]".format(
             max_seq_len, self.MAX_SEQ_LEN)
@@ -431,3 +475,16 @@ class TransformerModule(NLPBaseModule):
                 "The module context has not been initialized. "
                 "Please call context() before using get_params_layer")
         return self.params_layer
+
+    def forward(self, input_ids, position_ids, segment_ids, input_mask):
+        if version_compare(paddle.__version__, '1.8.0'):
+            pooled_output, sequence_output = self.model_runner(
+                input_ids, position_ids, segment_ids, input_mask)
+            return {
+                'pooled_output': pooled_output,
+                'sequence_output': sequence_output
+            }
+        else:
+            raise RuntimeError(
+                '{} only support dynamic graph mode in paddle >= 1.8.0'.format(
+                    self.name))
```
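With forward() in place, a Transformer module behaves like a dygraph layer and can be called directly, as the dygraph demos added in this commit do. A minimal sketch; the dummy input shapes and dtypes below follow what those demo readers feed in and are assumptions here:

```python
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub

# Requires paddle >= 1.8.0; the module builds a StaticModelRunner at load time.
ernie = hub.Module(name="ernie", max_seq_len=128)

with fluid.dygraph.guard():
    batch, seq_len = 1, 128
    # Dummy inputs standing in for one batch from a hub.reader data_generator:
    # ids as int64, the mask as float32 (shapes assumed from the demos).
    input_ids = np.zeros((batch, seq_len, 1), dtype=np.int64)
    position_ids = np.zeros((batch, seq_len, 1), dtype=np.int64)
    segment_ids = np.zeros((batch, seq_len, 1), dtype=np.int64)
    input_mask = np.zeros((batch, seq_len, 1), dtype=np.float32)

    result = ernie(input_ids, position_ids, segment_ids, input_mask)
    pooled_output = result['pooled_output']      # sentence-level feature
    sequence_output = result['sequence_output']  # per-token features
```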
paddlehub/reader/cv_reader.py (+1 −1)

```diff
@@ -165,7 +165,7 @@ class ImageClassificationReader(BaseReader):
             for image_path, label in data:
                 image = preprocess(image_path)
                 images.append(image.astype('float32'))
-                labels.append([int(label)])
+                labels.append([np.int64(label)])
                 if len(images) == batch_size:
                     if return_list:
```
paddlehub/version.py (+1 −1)

```diff
@@ -13,5 +13,5 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """ PaddleHub version string """
-hub_version = "1.6.0"
+hub_version = "1.6.2"
 module_proto_version = "1.0.0"
```
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录