PaddlePaddle / PaddleHub, commit 22c4494f

Authored on May 11, 2020 by Steffy-zxf

    Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleHub into add-preset-net

Parents: b3b8cb0f, b7e8230f
Showing 23 changed files with 826 additions and 111 deletions (+826 -111).
```text
README.md                                                      +18  -16
demo/image_classification/img_classifier_dygraph.py            +89   -0
demo/sequence_labeling/sequence_label_dygraph.py               +107  -0
demo/text_classification/finetuned_model_to_module/module.py   +12  -10
demo/text_classification/text_classifier_dygraph.py            +98   -0
docs/pretrained_models.md                                      +178  -0
docs/reference/config.md                                       +4    -4
docs/reference/task/base_task.md                               +0    -9
docs/tutorial/define_task_example.md                           +84   -0
docs/tutorial/finetuned_model_to_module.md                     +63  -30
docs/tutorial/how_to_load_data.md                              +3    -0
hub_module/scripts/configs/faster_rcnn_resnet50_fpn_venus.yml  +5    -5
paddlehub/__init__.py                                          +1    -0
paddlehub/common/downloader.py                                 +74  -17
paddlehub/common/hub_server.py                                 +2    -1
paddlehub/common/logger.py                                     +2    -1
paddlehub/common/paddle_helper.py                              +4    -2
paddlehub/dataset/food101.py                                   +3    -3
paddlehub/module/manager.py                                    +9    -5
paddlehub/module/module.py                                     +7    -2
paddlehub/module/nlp_module.py                                 +61   -4
paddlehub/reader/cv_reader.py                                  +1    -1
paddlehub/version.py                                           +1    -1
```
README.md  (+18 -16)

The README edits fix the in-page anchors in the feature list, move the model-search links from the old `http://www.paddlepaddle.org.cn/hub?filter=...` URLs to `https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=...`, relabel the segmentation captions, link the new pretrained-model catalogue, and refresh the release note. The version/OS badges and the introductory paragraph (PaddleHub as the pretrained-model application tool of the PaddlePaddle ecosystem; see https://www.paddlepaddle.org.cn/hub) are unchanged context.

The feature-list anchors now match their actual section headings:

```diff
-* **[易用的迁移学习](#迁移学习)**,通过Fine-tune API,内置多种优化策略,只需少量代码即可完成预训练模型的Fine-tuning。
+* **[易用的迁移学习](#易用的迁移学习)**,通过Fine-tune API,内置多种优化策略,只需少量代码即可完成预训练模型的Fine-tuning。
-* **[一键模型转服务](#服务化部署paddlehub-serving)**,简单一行命令即可搭建属于自己的深度学习模型API服务完成部署。
+* **[一键模型转服务](#一键模型转服务)**,简单一行命令即可搭建属于自己的深度学习模型API服务完成部署。
-* **[自动超参优化](#超参优化autodl-finetuner)**,内置AutoDL Finetuner能力,一键启动自动化超参搜索。
+* **[自动超参优化](#自动超参优化)**,内置AutoDL Finetuner能力,一键启动自动化超参搜索。
```

Every "使用[...]模型..." bullet in the quick-start section gets the same URL substitution, for example:

```diff
-* 使用[目标检测](http://www.paddlepaddle.org.cn/hub?filter=category&value=ObjectDetection)模型pyramidbox_lite_mobile_mask对图片进行口罩检测
+* 使用[目标检测](https://www.paddlepaddle.org.cn/hublist?filter=en_category&value=ObjectDetection)模型pyramidbox_lite_mobile_mask对图片进行口罩检测
```

and likewise for the 词法分析 (LexicalAnalysis, model LAC), 情感分析 (SentimentAnalysis, model Senta), and second 目标检测 (ObjectDetection, model Ultra-Light-Fast-Generic-Face-Detector-1MB) bullets. The surrounding `hub run` examples are unchanged context:

```shell
$ wget https://paddlehub.bj.bcebos.com/resources/test_mask_detection.jpg
$ hub run pyramidbox_lite_mobile_mask --input_path test_mask_detection.jpg

$ hub run lac --input_text "今天是个好日子"
[{'word': ['今天', '是', '个', '好日子'], 'tag': ['TIME', 'v', 'q', 'n']}]
$ hub run lac --input_text "现在,慕尼黑再保险公司不仅是此类行动的倡议者,更是将其大量气候数据整合进保险产品中,并与公众共享大量天气信息,参与到新能源领域的保障中。"
[{'word': ['现在', ',', '慕尼黑再保险公司', '不仅', '是', '此类', '行动', '的', '倡议者', ',', '更是', '将', '其', '大量', '气候', '数据', '整合', '进', '保险', '产品', '中', ',', '并', '与', '公众', '共享', '大量', '天气', '信息', ',', '参与', '到', '新能源', '领域', '的', '保障', '中', '。'], 'tag': ['TIME', 'w', 'ORG', 'c', 'v', 'r', 'n', 'u', 'n', 'w', 'd', 'p', 'r', 'a', 'n', 'n', 'v', 'v', 'n', 'n', 'f', 'w', 'c', 'p', 'n', 'v', 'a', 'n', 'n', 'w', 'v', 'v', 'n', 'n', 'u', 'vn', 'f', 'w']}]

$ hub run senta_bilstm --input_text "今天天气真好"
[{'text': '今天天气真好', 'sentiment_label': 1, 'sentiment_key': 'positive', 'positive_probs': 0.9798, 'negative_probs': 0.0202}]

$ wget https://paddlehub.bj.bcebos.com/resources/test_image.jpg
$ hub run ultra_light_fast_generic_face_detector_1mb_640 --input_path test_image.jpg
```

The segmentation result captions are renamed, and the model-overview sentence now links the new catalogue:

```diff
-          ace2p分割结果展示                 humanseg分割结果展示
+          ACE2P人体部件分割                 HumanSeg人像分割
-PaddleHub还提供图像分类、语义模型、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型,更多模型介绍,请前往[https://www.paddlepaddle.org.cn/hub](https://www.paddlepaddle.org.cn/hub)查看
+PaddleHub还提供图像分类、语义模型、视频分类、图像生成、图像分割、文本审核、关键点检测等主流模型,更多模型介绍,请前往[预训练模型介绍](./docs/pretrained_models.md)或者PaddleHub官网[https://www.paddlepaddle.org.cn/hub](https://www.paddlepaddle.org.cn/hub)查看
```

Finally, the release note under 更新历史:

```diff
-PaddleHub v1.6.0已发布!详情参考[更新历史](./RELEASE.md)
+PaddleHub v1.6 已发布!更多升级详情参考[更新历史](./RELEASE.md)
```
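The `hub run` commands above also have a Python-API counterpart under the model-as-software (模型即软件) design. A minimal sketch for LAC, assuming the module is installed and exposes its usual `lexical_analysis` entry point (the entry-point name is an assumption here, not part of this commit):

```python
import paddlehub as hub

# Load the LAC lexical-analysis module and call it directly from Python
# instead of going through the `hub run` CLI.
lac = hub.Module(name="lac")
results = lac.lexical_analysis(texts=["今天是个好日子"])
print(results)  # e.g. [{'word': ['今天', '是', '个', '好日子'], 'tag': ['TIME', 'v', 'q', 'n']}]
```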
demo/image_classification/img_classifier_dygraph.py  (new file, mode 100644, +89 -0)

```python
#coding:utf-8
import argparse
import os

import numpy as np
import paddlehub as hub
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear
from paddle.fluid.dygraph.base import to_variable
from paddle.fluid.optimizer import AdamOptimizer

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epoches for fine-tuning.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt_dygraph", help="Path to save log data.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--log_interval", type=int, default=10, help="log interval.")
parser.add_argument("--save_interval", type=int, default=10, help="save interval.")
# yapf: enable.


class ResNet50(fluid.dygraph.Layer):
    def __init__(self, num_classes, backbone):
        super(ResNet50, self).__init__()
        self.fc = Linear(input_dim=2048, output_dim=num_classes)
        self.backbone = backbone

    def forward(self, imgs):
        feature_map = self.backbone(imgs)
        feature_map = fluid.layers.reshape(feature_map, shape=[-1, 2048])
        pred = self.fc(feature_map)
        return fluid.layers.softmax(pred)


def finetune(args):
    with fluid.dygraph.guard():
        resnet50_vd_10w = hub.Module(name="resnet50_vd_10w")
        dataset = hub.dataset.Flowers()
        resnet = ResNet50(num_classes=dataset.num_labels, backbone=resnet50_vd_10w)
        adam = AdamOptimizer(learning_rate=0.001, parameter_list=resnet.parameters())
        state_dict_path = os.path.join(args.checkpoint_dir, 'dygraph_state_dict')
        if os.path.exists(state_dict_path + '.pdparams'):
            state_dict, _ = fluid.load_dygraph(state_dict_path)
            resnet.load_dict(state_dict)

        reader = hub.reader.ImageClassificationReader(
            image_width=resnet50_vd_10w.get_expected_image_width(),
            image_height=resnet50_vd_10w.get_expected_image_height(),
            images_mean=resnet50_vd_10w.get_pretrained_images_mean(),
            images_std=resnet50_vd_10w.get_pretrained_images_std(),
            dataset=dataset)
        train_reader = reader.data_generator(batch_size=args.batch_size, phase='train')

        loss_sum = acc_sum = cnt = 0
        # Run num_epoch passes over the training data.
        for epoch in range(args.num_epoch):
            # Read the training data and train on one batch at a time.
            for batch_id, data in enumerate(train_reader()):
                imgs = np.array(data[0][0])
                labels = np.array(data[0][1])

                pred = resnet(imgs)
                acc = fluid.layers.accuracy(pred, to_variable(labels))
                loss = fluid.layers.cross_entropy(pred, to_variable(labels))
                avg_loss = fluid.layers.mean(loss)
                avg_loss.backward()
                # Update the parameters.
                adam.minimize(avg_loss)

                loss_sum += avg_loss.numpy() * imgs.shape[0]
                acc_sum += acc.numpy() * imgs.shape[0]
                cnt += imgs.shape[0]
                if batch_id % args.log_interval == 0:
                    print('epoch {}: loss {}, acc {}'.format(epoch, loss_sum / cnt, acc_sum / cnt))
                    loss_sum = acc_sum = cnt = 0

                if batch_id % args.save_interval == 0:
                    state_dict = resnet.state_dict()
                    fluid.save_dygraph(state_dict, state_dict_path)


if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
```
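Once the demo has written a checkpoint, inference follows the same dygraph pattern. A minimal sketch (not part of this commit; the checkpoint path assumes the demo's default `--checkpoint_dir`):

```python
import numpy as np
import paddlehub as hub
import paddle.fluid as fluid

from img_classifier_dygraph import ResNet50  # the class defined in the demo above

with fluid.dygraph.guard():
    backbone = hub.Module(name="resnet50_vd_10w")
    dataset = hub.dataset.Flowers()
    model = ResNet50(num_classes=dataset.num_labels, backbone=backbone)
    # Reload the state dict saved by the training loop above.
    state_dict, _ = fluid.load_dygraph("paddlehub_finetune_ckpt_dygraph/dygraph_state_dict")
    model.load_dict(state_dict)
    model.eval()

    reader = hub.reader.ImageClassificationReader(
        image_width=backbone.get_expected_image_width(),
        image_height=backbone.get_expected_image_height(),
        images_mean=backbone.get_pretrained_images_mean(),
        images_std=backbone.get_pretrained_images_std(),
        dataset=dataset)
    # Classify one preprocessed dev batch and print the predicted class ids.
    batch = next(reader.data_generator(batch_size=4, phase='dev')())
    probs = model(np.array(batch[0][0])).numpy()
    print(probs.argmax(axis=1))
```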
demo/sequence_labeling/sequence_label_dygraph.py  (new file, mode 100644, +107 -0)

```python
#coding:utf-8
import argparse
import os

import numpy as np
import paddlehub as hub
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear
from paddle.fluid.dygraph.base import to_variable
from paddle.fluid.optimizer import AdamOptimizer
from paddlehub.finetune.evaluate import chunk_eval, calculate_f1

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epoches for fine-tuning.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--log_interval", type=int, default=10, help="log interval.")
parser.add_argument("--save_interval", type=int, default=10, help="save interval.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt_dygraph", help="Path to save log data.")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
# yapf: enable.


class TransformerSequenceLabelLayer(fluid.dygraph.Layer):
    def __init__(self, num_classes, transformer):
        super(TransformerSequenceLabelLayer, self).__init__()
        self.num_classes = num_classes
        self.transformer = transformer
        self.fc = Linear(input_dim=768, output_dim=num_classes)

    def forward(self, input_ids, position_ids, segment_ids, input_mask):
        result = self.transformer(input_ids, position_ids, segment_ids, input_mask)
        pred = self.fc(result['sequence_output'])
        ret_infers = fluid.layers.reshape(
            x=fluid.layers.argmax(pred, axis=2), shape=[-1, 1])
        pred = fluid.layers.reshape(pred, shape=[-1, self.num_classes])
        return fluid.layers.softmax(pred), ret_infers


def finetune(args):
    ernie = hub.Module(name="ernie", max_seq_len=args.max_seq_len)
    with fluid.dygraph.guard():
        dataset = hub.dataset.MSRA_NER()
        ts = TransformerSequenceLabelLayer(
            num_classes=dataset.num_labels, transformer=ernie)
        adam = AdamOptimizer(learning_rate=1e-5, parameter_list=ts.parameters())
        state_dict_path = os.path.join(args.checkpoint_dir, 'dygraph_state_dict')
        if os.path.exists(state_dict_path + '.pdparams'):
            state_dict, _ = fluid.load_dygraph(state_dict_path)
            ts.load_dict(state_dict)

        reader = hub.reader.SequenceLabelReader(
            dataset=dataset,
            vocab_path=ernie.get_vocab_path(),
            max_seq_len=args.max_seq_len,
            sp_model_path=ernie.get_spm_path(),
            word_dict_path=ernie.get_word_dict_path())
        train_reader = reader.data_generator(batch_size=args.batch_size, phase='train')

        loss_sum = total_infer = total_label = total_correct = cnt = 0
        # Run num_epoch passes over the training data.
        for epoch in range(args.num_epoch):
            # Read the training data and train on one batch at a time.
            for batch_id, data in enumerate(train_reader()):
                input_ids = np.array(data[0][0]).astype(np.int64)
                position_ids = np.array(data[0][1]).astype(np.int64)
                segment_ids = np.array(data[0][2]).astype(np.int64)
                input_mask = np.array(data[0][3]).astype(np.float32)
                labels = np.array(data[0][4]).astype(np.int64).reshape(-1, 1)
                seq_len = np.squeeze(np.array(data[0][5]).astype(np.int64), axis=1)

                pred, ret_infers = ts(input_ids, position_ids, segment_ids, input_mask)
                loss = fluid.layers.cross_entropy(pred, to_variable(labels))
                avg_loss = fluid.layers.mean(loss)
                avg_loss.backward()
                # Update the parameters.
                adam.minimize(avg_loss)

                loss_sum += avg_loss.numpy() * labels.shape[0]
                label_num, infer_num, correct_num = chunk_eval(
                    labels, ret_infers.numpy(), seq_len, dataset.num_labels, 1)
                cnt += labels.shape[0]
                total_infer += infer_num
                total_label += label_num
                total_correct += correct_num

                if batch_id % args.log_interval == 0:
                    precision, recall, f1 = calculate_f1(total_label, total_infer, total_correct)
                    print('epoch {}: loss {}, f1 {} recall {} precision {}'.format(
                        epoch, loss_sum / cnt, f1, recall, precision))
                    loss_sum = total_infer = total_label = total_correct = cnt = 0

                if batch_id % args.save_interval == 0:
                    state_dict = ts.state_dict()
                    fluid.save_dygraph(state_dict, state_dict_path)


if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
```
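For readers unfamiliar with the chunk metrics used above: `chunk_eval` counts gold chunks, predicted chunks, and correctly matched chunks, and `calculate_f1` combines those counts in the usual way. A sketch of that arithmetic, assuming the standard precision/recall definitions (the real helper lives in `paddlehub.finetune.evaluate` and may differ in edge-case handling):

```python
def f1_from_chunk_counts(num_label, num_infer, num_correct):
    # precision: of the chunks we predicted, how many were right
    precision = num_correct / num_infer if num_infer else 0.0
    # recall: of the gold chunks, how many we recovered
    recall = num_correct / num_label if num_label else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if num_correct else 0.0
    return precision, recall, f1
```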
demo/text_classification/finetuned_model_to_module/module.py  (+12 -10)

`predict` is now exposed for serving and post-processes the run states into a flat list of label indices instead of returning them raw; the `__main__` demo is simplified to match:

```diff
@@ -94,6 +94,7 @@ class ERNIETinyFinetuned(hub.Module):
             config=config,
             metrics_choices=metrics_choices)
 
+    @serving
     def predict(self, data, return_result=False, accelerate_mode=True):
         """
         Get prediction results
@@ -102,7 +103,14 @@ class ERNIETinyFinetuned(hub.Module):
             data=data,
             return_result=return_result,
             accelerate_mode=accelerate_mode)
-        return run_states
+        results = [run_state.run_results for run_state in run_states]
+        prediction = []
+        for batch_result in results:
+            # get predict index
+            batch_result = np.argmax(batch_result, axis=2)[0]
+            batch_result = batch_result.tolist()
+            prediction += batch_result
+        return prediction
@@ -113,12 +121,6 @@ if __name__ == "__main__":
     data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"],
             ["交通方便;环境很好;服务态度很好 房间较小"],
             ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
-    index = 0
-    run_states = ernie_tiny.predict(data=data)
-    results = [run_state.run_results for run_state in run_states]
-    for batch_result in results:
-        # get predict index
-        batch_result = np.argmax(batch_result, axis=2)[0]
-        for result in batch_result:
-            print("%s\tpredict=%s" % (data[index][0], result))
-            index += 1
+    predictions = ernie_tiny.predict(data=data)
+    for index, text in enumerate(data):
+        print("%s\tpredict=%s" % (data[index][0], predictions[index]))
```
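With this change a caller gets label indices back directly. A minimal sketch of the new calling convention, assuming the fine-tuned module is installed under the name `ernie_tiny_finetuned`:

```python
import paddlehub as hub

ernie_tiny = hub.Module(name="ernie_tiny_finetuned")
data = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"]]
# predict() now returns a flat list of label indices (e.g. [0])
# rather than raw RunState objects.
print(ernie_tiny.predict(data=data))
```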
demo/text_classification/text_classifier_dygraph.py  (new file, mode 100644, +98 -0)

```python
#coding:utf-8
import argparse
import os

import numpy as np
import paddlehub as hub
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear
from paddle.fluid.dygraph.base import to_variable
from paddle.fluid.optimizer import AdamOptimizer

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=1, help="Number of epoches for fine-tuning.")
parser.add_argument("--batch_size", type=int, default=16, help="Total examples' number in batch for training.")
parser.add_argument("--log_interval", type=int, default=10, help="log interval.")
parser.add_argument("--save_interval", type=int, default=10, help="save interval.")
parser.add_argument("--checkpoint_dir", type=str, default="paddlehub_finetune_ckpt_dygraph", help="Path to save log data.")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
# yapf: enable.


class TransformerClassifier(fluid.dygraph.Layer):
    def __init__(self, num_classes, transformer):
        super(TransformerClassifier, self).__init__()
        self.num_classes = num_classes
        self.transformer = transformer
        self.fc = Linear(input_dim=768, output_dim=num_classes)

    def forward(self, input_ids, position_ids, segment_ids, input_mask):
        result = self.transformer(input_ids, position_ids, segment_ids, input_mask)
        cls_feats = fluid.layers.dropout(
            result['pooled_output'],
            dropout_prob=0.1,
            dropout_implementation="upscale_in_train")
        cls_feats = fluid.layers.reshape(cls_feats, shape=[-1, 768])
        pred = self.fc(cls_feats)
        return fluid.layers.softmax(pred)


def finetune(args):
    ernie = hub.Module(name="ernie", max_seq_len=args.max_seq_len)
    with fluid.dygraph.guard():
        dataset = hub.dataset.ChnSentiCorp()
        tc = TransformerClassifier(num_classes=dataset.num_labels, transformer=ernie)
        adam = AdamOptimizer(learning_rate=1e-5, parameter_list=tc.parameters())
        state_dict_path = os.path.join(args.checkpoint_dir, 'dygraph_state_dict')
        if os.path.exists(state_dict_path + '.pdparams'):
            state_dict, _ = fluid.load_dygraph(state_dict_path)
            tc.load_dict(state_dict)

        reader = hub.reader.ClassifyReader(
            dataset=dataset,
            vocab_path=ernie.get_vocab_path(),
            max_seq_len=args.max_seq_len,
            sp_model_path=ernie.get_spm_path(),
            word_dict_path=ernie.get_word_dict_path())
        train_reader = reader.data_generator(batch_size=args.batch_size, phase='train')

        loss_sum = acc_sum = cnt = 0
        # Run num_epoch passes over the training data.
        for epoch in range(args.num_epoch):
            # Read the training data and train on one batch at a time.
            for batch_id, data in enumerate(train_reader()):
                input_ids = np.array(data[0][0]).astype(np.int64)
                position_ids = np.array(data[0][1]).astype(np.int64)
                segment_ids = np.array(data[0][2]).astype(np.int64)
                input_mask = np.array(data[0][3]).astype(np.float32)
                labels = np.array(data[0][4]).astype(np.int64)

                pred = tc(input_ids, position_ids, segment_ids, input_mask)
                acc = fluid.layers.accuracy(pred, to_variable(labels))
                loss = fluid.layers.cross_entropy(pred, to_variable(labels))
                avg_loss = fluid.layers.mean(loss)
                avg_loss.backward()
                # Update the parameters.
                adam.minimize(avg_loss)

                loss_sum += avg_loss.numpy() * labels.shape[0]
                acc_sum += acc.numpy() * labels.shape[0]
                cnt += labels.shape[0]
                if batch_id % args.log_interval == 0:
                    print('epoch {}: loss {}, acc {}'.format(epoch, loss_sum / cnt, acc_sum / cnt))
                    loss_sum = acc_sum = cnt = 0

                if batch_id % args.save_interval == 0:
                    state_dict = tc.state_dict()
                    fluid.save_dygraph(state_dict, state_dict_path)


if __name__ == "__main__":
    args = parser.parse_args()
    finetune(args)
```
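The classifier head uses `dropout_implementation="upscale_in_train"`, which rescales the surviving activations during training so that inference needs no rescaling. A numpy sketch of that semantics (illustrative only, not part of this commit):

```python
import numpy as np

def dropout_upscale_in_train(x, p, training):
    if not training:
        return x  # identity at inference time; no (1 - p) scaling needed
    # Zero out units with probability p, keep the rest.
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    # Upscale the kept units by 1/(1-p) so the expectation is unchanged.
    return x * mask / (1.0 - p)
```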
docs/pretrained_models.md  (new file, mode 100644, +178 -0)

The diff for this file is collapsed in this view. It adds the pretrained-model catalogue (预训练模型介绍) that the README now links.
docs/reference/config.md  (+4 -4)

The documented `RunConfig` defaults for `use_pyreader` and `use_data_parallel` change from `False` to `True`, in both the signature listing and the parameter descriptions:

```diff
 hub.RunConfig(
     log_interval=10,
     eval_interval=100,
-    use_pyreader=False,
-    use_data_parallel=False,
+    use_pyreader=True,
+    use_data_parallel=True,
     save_ckpt_interval=None,
     use_cuda=False,
     checkpoint_dir=None,
@@
-* `use_pyreader`: 是否使用pyreader,默认False。
-* `use_data_parallel`: 是否使用并行计算,默认False。打开该功能依赖nccl库。
+* `use_pyreader`: 是否使用pyreader,默认True。
+* `use_data_parallel`: 是否使用并行计算,默认True。打开该功能依赖nccl库。
 * `save_ckpt_interval`: 保存checkpoint的周期,默认为None。
 * `use_cuda`: 是否使用GPU训练和评估,默认为False。
 * `checkpoint_dir`: checkpoint的保存目录,默认为None,此时会在工作目录下根据时间戳生成一个临时目录。
```
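Concretely, the documented defaults after this change correspond to a construction like the following (a sketch; only the two flipped flags differ from the previous docs):

```python
import paddlehub as hub

config = hub.RunConfig(
    log_interval=10,
    eval_interval=100,
    use_pyreader=True,       # was documented as False before this change
    use_data_parallel=True,  # was False; enabling it depends on the nccl library
    save_ckpt_interval=None,
    use_cuda=False,
    checkpoint_dir=None)
```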
docs/reference/task/base_task.md  (+0 -9)

A duplicated reference entry is removed; the page had two identical "Func `predict`" sections back to back:

````diff
 task.predict()
 ```
 
-## Func `predict`
-根据config配置进行predict
-**示例**
-```python
-import paddlehub as hub
-...
-task.predict()
-```
 
 ## Property `is_train_phase`
 判断是否处于训练阶段
````
docs/tutorial/define_task_example.md  (new file, mode 100644, +84 -0)

# How to modify the model network in a Task

In practice, users may need to swap out the transfer network to tune a model's performance on their dataset. Building on [如何自定义Task](./how_to_define_task.md), this tutorial shows how to replace a Task's default network.

Taking sequence labeling as the example: SequenceLabelTask ships with two network choices, a plain FC head and an FC+CRF head. Suppose we want to add an LSTM on top of that, forming the BiLSTM+CRF structure commonly used for sequence labeling tasks. To do so, define a Task that inherits from SequenceLabelTask and overrides its `_build_net()` method. The code below implements the BiLSTM+CRF network:

```python
class SequenceLabelTask_BiLSTMCRF(SequenceLabelTask):
    def _build_net(self):
        """
        Custom BiLSTM+CRF transfer network for sequence labeling.
        """
        self.seq_len = fluid.layers.data(
            name="seq_len", shape=[1], dtype='int64', lod_level=0)

        if version_compare(paddle.__version__, "1.6"):
            self.seq_len_used = fluid.layers.squeeze(self.seq_len, axes=[1])
        else:
            self.seq_len_used = self.seq_len

        if self.add_crf:
            # Transfer network: BiLSTM+CRF
            # Strip the padding
            unpad_feature = fluid.layers.sequence_unpad(
                self.feature, length=self.seq_len_used)
            # BiLSTM layers
            hid_dim = 128
            fc0 = fluid.layers.fc(input=unpad_feature, size=hid_dim * 4)
            rfc0 = fluid.layers.fc(input=unpad_feature, size=hid_dim * 4)
            lstm_h, c = fluid.layers.dynamic_lstm(
                input=fc0, size=hid_dim * 4, is_reverse=False)
            rlstm_h, c = fluid.layers.dynamic_lstm(
                input=rfc0, size=hid_dim * 4, is_reverse=True)
            # Concatenate the forward and backward LSTM outputs
            lstm_concat = fluid.layers.concat(input=[lstm_h, rlstm_h], axis=1)
            self.emission = fluid.layers.fc(
                size=self.num_classes,
                input=lstm_concat,
                param_attr=fluid.ParamAttr(
                    initializer=fluid.initializer.Uniform(low=-0.1, high=0.1),
                    regularizer=fluid.regularizer.L2DecayRegularizer(
                        regularization_coeff=1e-4)))
            size = self.emission.shape[1]
            fluid.layers.create_parameter(
                shape=[size + 2, size], dtype=self.emission.dtype, name='crfw')
            # CRF layer
            self.ret_infers = fluid.layers.crf_decoding(
                input=self.emission, param_attr=fluid.ParamAttr(name='crfw'))
            ret_infers = fluid.layers.assign(self.ret_infers)
            # Return the predictions as a list
            return [ret_infers]
        else:
            # Transfer network: FC
            self.logits = fluid.layers.fc(
                input=self.feature,
                size=self.num_classes,
                num_flatten_dims=2,
                param_attr=fluid.ParamAttr(
                    name="cls_seq_label_out_w",
                    initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
                bias_attr=fluid.ParamAttr(
                    name="cls_seq_label_out_b",
                    initializer=fluid.initializer.Constant(0.)))

            self.ret_infers = fluid.layers.reshape(
                x=fluid.layers.argmax(self.logits, axis=2), shape=[-1, 1])

            logits = self.logits
            logits = fluid.layers.flatten(logits, axis=2)
            logits = fluid.layers.softmax(logits)
            self.num_labels = logits.shape[1]
            # Return the predictions as a list
            return [logits]
```

Subclassing a built-in PaddleHub Task and overriding its `_build_net` method is all it takes to customize the transfer network.
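The subclass is then instantiated like the built-in task. A minimal sketch, assuming `sequence_output`, `feed_list`, `reader`, `dataset`, and `config` were prepared as in the standard sequence-labeling demo (the exact constructor arguments are an assumption based on SequenceLabelTask's usual interface):

```python
task = SequenceLabelTask_BiLSTMCRF(
    data_reader=reader,
    feature=sequence_output,
    feed_list=feed_list,
    max_seq_len=128,
    num_classes=dataset.num_labels,
    config=config,
    add_crf=True)  # add_crf=True selects the new BiLSTM+CRF branch
task.finetune_and_eval()
```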
docs/tutorial/finetuned_model_to_module.md  (+63 -30)

The tutorial is brought in line with the new `predict` behavior in `demo/text_classification/finetuned_model_to_module/module.py` (see above):

* The NOTE under the `_initialize` step becomes two numbered notes: (1) class initialization must override `_initialize` rather than the default `__init__`; the object has a built-in `directory` attribute that gives the Module's path. (2) When predicting with a fine-tuned model there is no need to load a Dataset, i.e. the Reader's `dataset` argument may be None.
* In both the "refine the prediction logic" (step 3_4) and "support serving calls" (step 3_5) listings, `predict` no longer returns raw `run_states`; it collapses them into a flat list of label indices, exactly as in the demo module above.
* All three invocation examples (调用方法 1 through 3, the third calling `ERNIETinyFinetuned.predict` on the class itself) are simplified accordingly:

```python
predictions = ernie_tiny.predict(data=data)
for index, text in enumerate(data):
    print("%s\tpredict=%s" % (data[index][0], predictions[index]))
```

* A new "PaddleHub Serving" section is added. Step 1 starts the prediction service:

```shell
hub serving start -m ernie_tiny_finetuned
```

Step 2 sends a request and fetches the predictions:

```python
# coding: utf8
import requests
import json

# Texts to predict
texts = [["这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般"],
         ["交通方便;环境很好;服务态度很好 房间较小"],
         ["19天硬盘就罢工了~~~算上运来的一周都没用上15天~~~可就是不能换了~~~唉~~~~你说这算什么事呀~~~"]]
# The key 'data' corresponds to the `data` parameter of the predict interface
data = {'data': texts}
# Send a POST request for the ernie_tiny_finetuned model, with
# application/json headers
url = "http://127.0.0.1:8866/predict/ernie_tiny_finetuned"
headers = {"Content-Type": "application/json"}

r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Print the prediction results
print(json.dumps(r.json(), indent=4, ensure_ascii=False))
```

For more on PaddleHub Serving, see the [Hub Serving tutorial](../../docs/tutorial/serving.md) and the [Demo](../../demo/serving).
docs/tutorial/how_to_load_data.md  (+3 -0)

Three one-line additions: the sample corpus block is now explicitly fenced as `text`, and both the text-dataset and image-dataset note lists gain the same rule: in classification tasks, the dataset's labels must be counted starting from 0. The surrounding notes (utf8 encoding, `train_file_with_header=False` when there is no header row, optional `predict.tsv`/`predict_list.txt` files without the label column, and the relative-path convention for image lists under `base_path`) are unchanged context. A sketch of a dataset that follows these notes appears below.
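A minimal custom text dataset following those notes might look like this (a sketch; the paths and label set are illustrative, and `BaseNLPDataset` is the text-side counterpart of the `BaseCVDataset` shown in the tutorial):

```python
from paddlehub.dataset.base_nlp_dataset import BaseNLPDataset

class MyTextDataset(BaseNLPDataset):
    def __init__(self):
        super(MyTextDataset, self).__init__(
            base_path="/path/to/dataset",
            train_file="train.tsv",
            dev_file="dev.tsv",
            test_file="test.tsv",
            predict_file="predict.tsv",      # same format, label column removed
            train_file_with_header=True,
            dev_file_with_header=True,
            test_file_with_header=True,
            predict_file_with_header=True,
            label_list=["0", "1"])           # classification labels start at 0
```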
hub_module/scripts/configs/faster_rcnn_resnet50_fpn_venus.yml  (+5 -5)

The previously commented-out resources block is enabled, and the archive URL is corrected to the venus model:

```diff
 name: faster_rcnn_resnet50_fpn_venus
 dir: "modules/image/object_detection/faster_rcnn_resnet50_fpn_venus"
-# resources:
-#   -
-#     url: https://paddlehub.bj.bcebos.com/model/cv/faster_rcnn_resnet50_fpn_model.tar.gz
-#     dest: faster_rcnn_resnet50_fpn_model
-#     uncompress: True
+resources:
+  -
+    url: https://paddlehub.bj.bcebos.com/model/cv/faster_rcnn_resnet50_fpn_venus_model.tar.gz
+    dest: faster_rcnn_resnet50_fpn_model
+    uncompress: True
```
paddlehub/__init__.py  (+1 -0)

The new downloader API is re-exported at package level:

```diff
 from .common.paddle_helper import connect_program
 from .common.hub_server import HubServer
 from .common.hub_server import server_check
+from .common.downloader import download, ResourceNotFoundError, ServerConnectionError
 from .module.module import Module
 from .module.base_processor import BaseProcessor
```
paddlehub/common/downloader.py  (+74 -17)

Two kinds of change. First, housekeeping: string quoting is normalized to single quotes throughout (the license header, the `progress()` output, and the download/uncompress tips), and `uncompress()` now opens archives with `tarfile.open(file, 'r:*')` instead of `'r:gz'`, so non-gzip tar files can be unpacked as well. The imports gain `from paddlehub.common import tmp_dir` and `import paddlehub as hub`.

Second, a top-level `download()` helper and two exception types are added after the existing module-level `default_downloader`:

```python
class ResourceNotFoundError(Exception):
    def __init__(self, name, version=None):
        self.name = name
        self.version = version

    def __str__(self):
        if self.version:
            tips = 'No resource named {} was found'.format(self.name)
        else:
            tips = 'No resource named {}-{} was found'.format(
                self.name, self.version)
        return tips


class ServerConnectionError(Exception):
    def __str__(self):
        tips = 'Can\'t connect to Hub Server:{}'.format(
            hub.HubServer().server_url[0])
        return tips


def download(name,
             save_path,
             version=None,
             decompress=True,
             resource_type='Model',
             extra=None):
    file = os.path.join(save_path, name)
    file = os.path.realpath(file)
    if os.path.exists(file):
        return

    if not hub.HubServer()._server_check():
        raise ServerConnectionError

    search_result = hub.HubServer().get_resource_url(
        name, resource_type=resource_type, version=version, extra=extra)

    if not search_result:
        raise ResourceNotFoundError(name, version)

    url = search_result['url']

    with tmp_dir() as _dir:
        if not os.path.exists(save_path):
            os.makedirs(save_path)
        _, _, savefile = default_downloader.download_file(
            url=url, save_path=_dir, print_progress=True)
        if tarfile.is_tarfile(savefile) and decompress:
            _, _, savefile = default_downloader.uncompress(
                file=savefile, print_progress=True)
        shutil.move(savefile, file)
```
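Since `download` and both exceptions are re-exported from `paddlehub/__init__.py` (see above), callers can use them at the top level. A minimal sketch (the resource name is illustrative):

```python
import paddlehub as hub

try:
    # Resolve "my_resource" on the Hub server, download it into ./resources,
    # and decompress it if it is a tar archive.
    hub.download(name="my_resource", save_path="./resources", resource_type="Model")
except hub.ResourceNotFoundError as err:
    print(err)  # the server has no resource with that name/version
except hub.ServerConnectionError as err:
    print(err)  # the Hub server could not be reached
```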
paddlehub/common/hub_server.py  (+2 -1)

The default server config is now rewritten not only when `config.json` is missing but also when the file exists but is empty:

```diff
         config_file_path = os.path.join(CONF_HOME, 'config.json')
         if not os.path.exists(CONF_HOME):
             utils.mkdir(CONF_HOME)
-        if not os.path.exists(config_file_path):
+        if not os.path.exists(config_file_path) or 0 == os.path.getsize(
+                config_file_path):
             with open(config_file_path, 'w+') as fp:
                 lock.flock(fp, lock.LOCK_EX)
                 fp.write(json.dumps(default_server_config))
```
...
paddlehub/common/logger.py
浏览文件 @
22c4494f
...
...
@@ -62,7 +62,8 @@ class Logger(object):
self
.
logger
.
setLevel
(
logging
.
DEBUG
)
self
.
logger
.
propagate
=
False
if
os
.
path
.
exists
(
os
.
path
.
join
(
CONF_HOME
,
"config.json"
)):
config_path
=
os
.
path
.
join
(
CONF_HOME
,
"config.json"
)
if
os
.
path
.
exists
(
config_path
)
and
0
<
os
.
path
.
getsize
(
config_path
):
with
open
(
os
.
path
.
join
(
CONF_HOME
,
"config.json"
),
"r"
)
as
fp
:
level
=
json
.
load
(
fp
).
get
(
"log_level"
,
"DEBUG"
)
self
.
logLevel
=
level
...
...
paddlehub/common/paddle_helper.py  (+4 -2)

`get_variable_info` now records `gradient_clip_attr` only when the installed Paddle predates 1.8 (the attribute is not read on newer versions):

```diff
 import copy
 
+import paddle
 import paddle.fluid as fluid
 
 from paddlehub.module import module_desc_pb2
-from paddlehub.common.utils import from_pyobj_to_module_attr, from_module_attr_to_pyobj
+from paddlehub.common.utils import from_pyobj_to_module_attr, from_module_attr_to_pyobj, version_compare
 from paddlehub.common.logger import logger
@@ def get_variable_info(var):
         var_info['trainable'] = var.trainable
         var_info['optimize_attr'] = var.optimize_attr
         var_info['regularizer'] = var.regularizer
-        var_info['gradient_clip_attr'] = var.gradient_clip_attr
+        if not version_compare(paddle.__version__, '1.8'):
+            var_info['gradient_clip_attr'] = var.gradient_clip_attr
         var_info['do_model_average'] = var.do_model_average
     else:
         var_info['persistable'] = var.persistable
```
paddlehub/dataset/food101.py  (+3 -3)

The download path is fixed: the dataset is downloaded into `DATA_HOME/food-101`, and `base_path` is derived from wherever `_download_dataset` actually placed it, instead of assuming the `images` subdirectory up front:

```diff
 class Food101Dataset(BaseCVDataset):
     def __init__(self):
-        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101",
-                                    "images")
-        base_path = self._download_dataset(
+        dataset_path = os.path.join(hub.common.dir.DATA_HOME, "food-101")
+        dataset_path = self._download_dataset(
             dataset_path=dataset_path,
             url="https://bj.bcebos.com/paddlehub-dataset/Food101.tar.gz")
+        base_path = os.path.join(dataset_path, "images")
         super(Food101Dataset, self).__init__(
             base_path=base_path,
             train_list_file="train_list.txt",
```
paddlehub/module/manager.py  (+9 -5)

When normalizing module directory names that contain `-`, the replacement is now applied to the directory's basename only, not the full path (which could corrupt parent directories whose names contain hyphens). The fix lands in both the module-listing and the install paths:

```diff
         for sub_dir_name in os.listdir(self.local_modules_dir):
             sub_dir_path = os.path.join(self.local_modules_dir, sub_dir_name)
             if os.path.isdir(sub_dir_path):
-                if "-" in sub_dir_path:
-                    new_sub_dir_path = sub_dir_path.replace("-", "_")
+                if "-" in sub_dir_name:
+                    sub_dir_name = sub_dir_name.replace("-", "_")
+                    new_sub_dir_path = os.path.join(self.local_modules_dir,
+                                                    sub_dir_name)
                     shutil.move(sub_dir_path, new_sub_dir_path)
                     sub_dir_path = new_sub_dir_path
                 valid, info = self.check_module_valid(sub_dir_path)
@@
             with tarfile.open(module_package, "r:gz") as tar:
                 file_names = tar.getnames()
                 size = len(file_names) - 1
-                module_dir = os.path.join(_dir, file_names[0])
+                module_name = file_names[0]
+                module_dir = os.path.join(_dir, module_name)
                 for index, file_name in enumerate(file_names):
                     tar.extract(file_name, _dir)
-                if "-" in module_dir:
-                    new_module_dir = module_dir.replace("-", "_")
+                if "-" in module_name:
+                    module_name = module_name.replace("-", "_")
+                    new_module_dir = os.path.join(_dir, module_name)
                     shutil.move(module_dir, new_module_dir)
                     module_dir = new_module_dir
                 module_name = hub.Module(directory=module_dir).name
```
paddlehub/module/module.py  (+7 -2)

`Module` becomes a `fluid.dygraph.Layer` so that modules can participate in dynamic-graph models:

```diff
-class Module(object):
+class Module(fluid.dygraph.Layer):
     def __new__(cls,
                 name=None,
                 directory=None,
@@
             module = Module.init_with_directory(directory=directory, **kwargs)
         else:
-            module = object.__new__(cls)
+            module = fluid.dygraph.Layer.__new__(cls)
         return module
@@
         if "_is_initialize" in self.__dict__ and self._is_initialize:
             return
+        super(Module, self).__init__()
         _run_func_name = self._get_func_name(self.__class__,
                                              _module_runnable_func)
@@
     def _initialize(self):
         pass
 
+    def forward(self, *args, **kwargs):
+        raise RuntimeError(
+            '{} does not support dynamic graph mode yet.'.format(self.name))
```

The default `forward` raises, so only modules that explicitly implement it (such as `TransformerModule` below) work in dygraph mode.
paddlehub/module/nlp_module.py  (+61 -4)

The imports gain `paddle`, `paddlehub as hub`, `tmp_dir`, and `version_compare`. `TransformerModule` (the base class for BERT, ERNIE, RoBERTa, and so on) gains dygraph support: on Paddle >= 1.8.0 its new constructor exports the module's static inference program to a temporary directory and wraps it in a `fluid.dygraph.StaticModelRunner`, which the new `forward` then feeds. The `max_seq_len` default of `context()` also changes from `128` to `None`, falling back to the value given at construction time.

```python
    def __init__(self,
                 name=None,
                 directory=None,
                 module_dir=None,
                 version=None,
                 max_seq_len=128,
                 **kwargs):
        if not directory:
            return
        super(TransformerModule, self).__init__(
            name=name,
            directory=directory,
            module_dir=module_dir,
            version=version,
            **kwargs)

        self.max_seq_len = max_seq_len
        if version_compare(paddle.__version__, '1.8.0'):
            with tmp_dir() as _dir:
                input_dict, output_dict, program = self.context(
                    max_seq_len=max_seq_len)
                fluid.io.save_inference_model(
                    dirname=_dir,
                    main_program=program,
                    feeded_var_names=[
                        input_dict['input_ids'].name,
                        input_dict['position_ids'].name,
                        input_dict['segment_ids'].name,
                        input_dict['input_mask'].name
                    ],
                    target_vars=[
                        output_dict["pooled_output"],
                        output_dict["sequence_output"]
                    ],
                    executor=fluid.Executor(fluid.CPUPlace()))

                with fluid.dygraph.guard():
                    self.model_runner = fluid.dygraph.StaticModelRunner(_dir)
```

```python
    def forward(self, input_ids, position_ids, segment_ids, input_mask):
        if version_compare(paddle.__version__, '1.8.0'):
            pooled_output, sequence_output = self.model_runner(
                input_ids, position_ids, segment_ids, input_mask)
            return {
                'pooled_output': pooled_output,
                'sequence_output': sequence_output
            }
        else:
            raise RuntimeError(
                '{} only support dynamic graph mode in paddle >= 1.8.0'.format(
                    self.name))
```

```diff
     def context(self,
-                max_seq_len=128,
+                max_seq_len=None,
                 trainable=True):
@@
+        if not max_seq_len:
+            max_seq_len = self.max_seq_len
         assert max_seq_len <= self.MAX_SEQ_LEN and max_seq_len >= 1, \
             "max_seq_len({}) should be in the range of [1, {}]".format(
                 max_seq_len, self.MAX_SEQ_LEN)
```
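Together with the Module-as-Layer change above, this is what lets the three new dygraph demos call `ernie(...)` inside their forward passes. A minimal standalone sketch, assuming paddle >= 1.8 and toy inputs shaped `[batch, max_seq_len, 1]` (the shapes mirror what the readers in the demos produce, but are an assumption here):

```python
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub

with fluid.dygraph.guard():
    ernie = hub.Module(name="ernie", max_seq_len=128)
    # Dummy token/position/segment ids and an all-ones mask.
    ids = np.zeros([1, 128, 1], dtype=np.int64)
    mask = np.ones([1, 128, 1], dtype=np.float32)
    out = ernie(ids, ids, ids, mask)
    print(out['pooled_output'].shape)    # expected [1, 768]
    print(out['sequence_output'].shape)  # expected [1, 128, 768]
```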
paddlehub/reader/cv_reader.py  (+1 -1)

Image labels are cast with `np.int64` instead of Python's `int`, matching the dtype the dygraph demos expect:

```diff
         for image_path, label in data:
             image = preprocess(image_path)
             images.append(image.astype('float32'))
-            labels.append([int(label)])
+            labels.append([np.int64(label)])
             if len(images) == batch_size:
                 if return_list:
```
paddlehub/version.py  (+1 -1)

```diff
 """ PaddleHub version string """
-hub_version = "1.6.0"
+hub_version = "1.6.2"
 module_proto_version = "1.0.0"
```