PaddlePaddle
PaddleHub
Commit 582ecfc9
Authored April 14, 2019 by Zeyu Chen

reorg demo

Parent: e1d33b79
Showing 13 changed files with 38 additions and 246 deletions (+38 -246)
Changed files:
demo/ernie-classification/ernie_tiny_demo.py  +0 -34
demo/ernie-classification/question_answering.py  +0 -85
demo/ernie-classification/question_matching.py  +0 -85
demo/ernie-classification/run_question_answering.sh  +0 -10
demo/ernie-classification/run_question_matching.sh  +0 -10
demo/ernie-classification/run_sentiment_cls.sh  +0 -11
demo/sequence-labeling/run_sequence_labeling.sh  +2 -3
demo/sequence-labeling/sequence_label.py  +4 -5
demo/text-classification/README.md  +0 -0
demo/text-classification/cls_predict.py  +1 -1
demo/text-classification/run_classifier.sh  +19 -0
demo/text-classification/run_predict.sh  +0 -0
demo/text-classification/text_classifier.py  +12 -2
demo/ernie-classification/ernie_tiny_demo.py
Deleted, mode 100644 → 0, view file @ e1d33b79

import paddle.fluid as fluid
import paddlehub as hub

# Step1
module = hub.Module(name="ernie")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)

# Step2
dataset = hub.dataset.ChnSentiCorp()
reader = hub.reader.ClassifyReader(
    dataset=dataset, vocab_path=module.get_vocab_path(), max_seq_len=128)

# Step3
with fluid.program_guard(program):
    label = fluid.layers.data(name="label", shape=[1], dtype='int64')

    pooled_output = outputs["pooled_output"]

    cls_task = hub.create_text_classification_task(
        feature=pooled_output, label=label, num_classes=dataset.num_labels)

# Step4
strategy = hub.AdamWeightDecayStrategy(
    learning_rate=5e-5,
    weight_decay=0.01)

config = hub.RunConfig(
    use_cuda=True, num_epoch=3, batch_size=32, strategy=strategy)

feed_list = [
    inputs["input_ids"].name, inputs["position_ids"].name,
    inputs["segment_ids"].name, inputs["input_mask"].name, label.name
]

hub.finetune_and_eval(
    task=cls_task, data_reader=reader, feed_list=feed_list, config=config)
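The feed list in this demo names four ERNIE input tensors: `input_ids`, `position_ids`, `segment_ids` and `input_mask`. As a rough illustration of what such tensors conventionally hold for a BERT-style model, here is a minimal sketch that pads one tokenized sentence to `max_seq_len`. The helper `build_inputs`, the token ids, and the padding conventions (pad id 0, padded positions set to 0) are assumptions for illustration; this is not PaddleHub's `ClassifyReader` implementation.

```python
def build_inputs(token_ids, max_seq_len, pad_id=0):
    """Pad one tokenized sentence to max_seq_len (illustrative only)."""
    token_ids = list(token_ids)[:max_seq_len]  # truncate overly long inputs
    n = len(token_ids)
    padding = max_seq_len - n
    return {
        # Vocabulary ids of the tokens, followed by padding ids.
        "input_ids": token_ids + [pad_id] * padding,
        # Position of each real token; padded slots are set to 0 here.
        "position_ids": list(range(n)) + [0] * padding,
        # A single-sentence input uses segment 0 everywhere.
        "segment_ids": [0] * max_seq_len,
        # 1.0 marks real tokens, 0.0 marks padding to be ignored.
        "input_mask": [1.0] * n + [0.0] * padding,
    }


# Hypothetical token ids for a four-token sentence.
example = build_inputs([1, 647, 986, 2], max_seq_len=8)
for name, values in example.items():
    print(name, values)
```

All four lists have length `max_seq_len`, which is why the same value (128 in this demo) is passed to both `module.context` and the reader.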
demo/ernie-classification/question_answering.py
Deleted, mode 100644 → 0, view file @ e1d33b79

# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Finetuning on classification task """

import argparse

import paddle.fluid as fluid
import paddlehub as hub

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--data_dir", type=str, default=None, help="Path to training data.")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
parser.add_argument("--batch_size", type=int, default=32, help="Total examples' number in batch for training.")
args = parser.parse_args()
# yapf: enable.

if __name__ == '__main__':
    # Step1: load Paddlehub ERNIE pretrained model
    module = hub.Module(name="ernie")
    inputs, outputs, program = module.context(
        trainable=True, max_seq_len=args.max_seq_len)

    # Step2: Download dataset and use ClassifyReader to read dataset
    dataset = hub.dataset.NLPCC_DBQA()
    reader = hub.reader.ClassifyReader(
        dataset=dataset,
        vocab_path=module.get_vocab_path(),
        max_seq_len=args.max_seq_len)

    # Step3: construct transfer learning network
    with fluid.program_guard(program):
        label = fluid.layers.data(name="label", shape=[1], dtype='int64')

        # Use "pooled_output" for classification tasks on an entire sentence.
        # Use "sequence_output" for token-level output.
        pooled_output = outputs["pooled_output"]

        # Setup feed list for data feeder
        # Must feed all the tensor of ERNIE's module need
        feed_list = [
            inputs["input_ids"].name, inputs["position_ids"].name,
            inputs["segment_ids"].name, inputs["input_mask"].name, label.name
        ]
        # Define a classfication finetune task by PaddleHub's API
        cls_task = hub.create_text_classification_task(
            pooled_output, label, num_classes=dataset.num_labels)

        # Step4: Select finetune strategy, setup config and finetune
        strategy = hub.AdamWeightDecayStrategy(
            weight_decay=args.weight_decay,
            learning_rate=args.learning_rate,
            warmup_strategy="linear_warmup_decay",
        )

        # Setup runing config for PaddleHub Finetune API
        config = hub.RunConfig(
            use_cuda=True,
            num_epoch=args.num_epoch,
            batch_size=args.batch_size,
            checkpoint_dir=args.checkpoint_dir,
            strategy=strategy)

        # Finetune and evaluate by PaddleHub's API
        # will finish training, evaluation, testing, save model automatically
        hub.finetune_and_eval(
            task=cls_task,
            data_reader=reader,
            feed_list=feed_list,
            config=config)
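The strategy in this script is configured with `warmup_strategy="linear_warmup_decay"`. The diff does not show how PaddleHub computes that schedule, so the following is only a sketch of one common form of linear warmup followed by linear decay: the rate ramps from 0 to the base value over the warmup steps, then falls linearly to 0 at the last step. The function name, the `warmup_proportion` parameter and its default of 0.1 are assumptions for illustration, not values taken from this file.

```python
def linear_warmup_decay(base_lr, step, total_steps, warmup_proportion=0.1):
    """Learning rate at `step` under linear warmup then linear decay (sketch)."""
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


# Print the schedule at a few steps of a hypothetical 100-step run.
for s in (0, 5, 10, 55, 100):
    print(s, linear_warmup_decay(5e-5, s, total_steps=100))
```

With `warmup_proportion=0.0` the function reduces to plain linear decay from the base rate.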
demo/ernie-classification/question_matching.py
Deleted, mode 100644 → 0, view file @ e1d33b79

# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Finetuning on classification task """

import argparse

import paddle.fluid as fluid
import paddlehub as hub

# yapf: disable
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--data_dir", type=str, default=None, help="Path to training data.")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
parser.add_argument("--batch_size", type=int, default=32, help="Total examples' number in batch for training.")
args = parser.parse_args()
# yapf: enable.

if __name__ == '__main__':
    # Step1: load Paddlehub ERNIE pretrained model
    module = hub.Module(name="ernie")
    inputs, outputs, program = module.context(
        trainable=True, max_seq_len=args.max_seq_len)

    # Step2: Download dataset and use ClassifyReader to read dataset
    dataset = hub.dataset.LCQMC()
    reader = hub.reader.ClassifyReader(
        dataset=dataset,
        vocab_path=module.get_vocab_path(),
        max_seq_len=args.max_seq_len)

    # Step3: construct transfer learning network
    with fluid.program_guard(program):
        label = fluid.layers.data(name="label", shape=[1], dtype='int64')

        # Use "pooled_output" for classification tasks on an entire sentence.
        # Use "sequence_output" for token-level output.
        pooled_output = outputs["pooled_output"]

        # Setup feed list for data feeder
        # Must feed all the tensor of ERNIE's module need
        feed_list = [
            inputs["input_ids"].name, inputs["position_ids"].name,
            inputs["segment_ids"].name, inputs["input_mask"].name, label.name
        ]
        # Define a classfication finetune task by PaddleHub's API
        cls_task = hub.create_text_classification_task(
            pooled_output, label, num_classes=dataset.num_labels)

        # Step4: Select finetune strategy, setup config and finetune
        strategy = hub.AdamWeightDecayStrategy(
            weight_decay=args.weight_decay,
            learning_rate=args.learning_rate,
            warmup_strategy="linear_warmup_decay",
        )

        # Setup runing config for PaddleHub Finetune API
        config = hub.RunConfig(
            use_cuda=True,
            num_epoch=args.num_epoch,
            batch_size=args.batch_size,
            checkpoint_dir=args.checkpoint_dir,
            strategy=strategy)

        # Finetune and evaluate by PaddleHub's API
        # will finish training, evaluation, testing, save model automatically
        hub.finetune_and_eval(
            task=cls_task,
            data_reader=reader,
            feed_list=feed_list,
            config=config)
demo/ernie-classification/run_question_answering.sh
Deleted, mode 100644 → 0, view file @ e1d33b79

export CUDA_VISIBLE_DEVICES=3

CKPT_DIR="./ckpt_dbqa"
python -u question_answering.py \
                   --batch_size 8 \
                   --weight_decay 0.01 \
                   --checkpoint_dir $CKPT_DIR \
                   --num_epoch 3 \
                   --max_seq_len 512 \
                   --learning_rate 2e-5
demo/ernie-classification/run_question_matching.sh
Deleted, mode 100644 → 0, view file @ e1d33b79

export CUDA_VISIBLE_DEVICES=5

CKPT_DIR="./ckpt_question_matching"
python -u question_matching.py \
                   --batch_size 32 \
                   --weight_decay 0.0 \
                   --checkpoint_dir $CKPT_DIR \
                   --num_epoch 3 \
                   --max_seq_len 128 \
                   --learning_rate 2e-5
demo/ernie-classification/run_sentiment_cls.sh
Deleted, mode 100644 → 0, view file @ e1d33b79

export CUDA_VISIBLE_DEVICES=5

CKPT_DIR="./ckpt_sentiment_cls"
python -u sentiment_cls.py \
                   --batch_size 32 \
                   --use_gpu=False \
                   --weight_decay 0.01 \
                   --checkpoint_dir $CKPT_DIR \
                   --num_epoch 3 \
                   --max_seq_len 128 \
                   --learning_rate 5e-5
demo/ernie-seq-labeling/run_sequence_labeling.sh → demo/sequence-labeling/run_sequence_labeling.sh
View file @ 582ecfc9

-export CUDA_VISIBLE_DEVICES=6
+export CUDA_VISIBLE_DEVICES=0
 
 CKPT_DIR="./ckpt_sequence_labeling"
-python -u sequence_label.py \
+python -u sequence_labeling.py \
                    --batch_size 16 \
                    --weight_decay 0.01 \
                    --checkpoint_dir $CKPT_DIR \
 ...
demo/ernie-seq-labeling/sequence_labeling.py → demo/sequence-labeling/sequence_label.py
View file @ 582ecfc9

 ...
@@ -37,13 +37,12 @@ if __name__ == '__main__':
         trainable=True, max_seq_len=args.max_seq_len)
 
     # Step2: Download dataset and use SequenceLabelReader to read dataset
+    dataset = hub.dataset.MSRA_NER()
     reader = hub.reader.SequenceLabelReader(
-        dataset=hub.dataset.MSRA_NER(),
+        dataset=dataset,
         vocab_path=module.get_vocab_path(),
         max_seq_len=args.max_seq_len)
-
-    num_labels = len(reader.get_labels())
 
     # Step3: construct transfer learning network
     with fluid.program_guard(program):
         label = fluid.layers.data(
 ...
@@ -62,11 +61,11 @@ if __name__ == '__main__':
             seq_len
         ]
         # Define a sequence labeling finetune task by PaddleHub's API
-        seq_label_task = hub.create_seq_labeling_task(
+        seq_label_task = hub.create_seq_label_task(
             feature=sequence_output,
             labels=label,
             seq_len=seq_len,
-            num_classes=num_labels)
+            num_classes=dataset.num_labels)
 
         # Select a finetune strategy
         strategy = hub.AdamWeightDecayStrategy(
 ...
demo/ernie-classification/README.md → demo/text-classification/README.md
View file @ 582ecfc9

File moved
demo/ernie-classification/cls_predict.py → demo/text-classification/cls_predict.py
View file @ 582ecfc9

 ...
@@ -64,7 +64,7 @@ if __name__ == '__main__':
         ]
         # Define a classfication finetune task by PaddleHub's API
-        cls_task = hub.create_text_classification_task(
+        cls_task = hub.create_text_cls_task(
             feature=pooled_output, label=label, num_classes=dataset.num_labels)
 
         # classificatin probability tensor
 ...
demo/text-classification/run_classifier.sh
New file, mode 0 → 100644, view file @ 582ecfc9

export CUDA_VISIBLE_DEVICES=5

# User can select senticorp, nlpcc_dbqa, lcqmc for different task
DATASET="senticorp"
CKPT_DIR="./ckpt_${DATASET}"
# Recommending hyper parameters for difference task
# ChnSentiCorp: batch_size=24, weight_decay=0.01, num_epoch=3, max_seq_len=128, lr=5e-5
# NLPCC_DBQA: batch_size=8, weight_decay=0.01, num_epoch=3, max_seq_len=512, lr=2e-5
# LCQMC: batch_size=32, weight_decay=0, num_epoch=3, max_seq_len=128, lr=2e-5

python -u text_classifier.py \
                   --batch_size=24 \
                   --use_gpu=True \
                   --dataset=${DATASET} \
                   --checkpoint_dir=${CKPT_DIR} \
                   --learning_rate=5e-5 \
                   --weight_decay=0.01 \
                   --max_seq_len=128
                   --num_epoch=3 \
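The comments in run_classifier.sh list recommended hyperparameters for each dataset, but the script hard-codes one set of values. A minimal sketch of keeping those values in a table and deriving the `text_classifier.py` argument list from the dataset name follows. The `RECOMMENDED` table and the `build_command` helper are hypothetical names; only the flag names and the numbers come from the script. Building the command as a list also avoids depending on shell line continuations, where a missing trailing backslash ends the command early.

```python
# Values copied from the comments in run_classifier.sh.
RECOMMENDED = {
    "senticorp": {"batch_size": 24, "weight_decay": 0.01, "num_epoch": 3,
                  "max_seq_len": 128, "learning_rate": 5e-5},
    "nlpcc_dbqa": {"batch_size": 8, "weight_decay": 0.01, "num_epoch": 3,
                   "max_seq_len": 512, "learning_rate": 2e-5},
    "lcqmc": {"batch_size": 32, "weight_decay": 0.0, "num_epoch": 3,
              "max_seq_len": 128, "learning_rate": 2e-5},
}


def build_command(dataset, use_gpu=True):
    """Return the argv list for text_classifier.py (hypothetical helper)."""
    name = dataset.lower()
    params = RECOMMENDED[name]  # KeyError for datasets without a recommendation
    cmd = [
        "python", "-u", "text_classifier.py",
        "--dataset=%s" % name,
        "--use_gpu=%s" % use_gpu,
        "--checkpoint_dir=./ckpt_%s" % name,
    ]
    cmd += ["--%s=%s" % (flag, value) for flag, value in params.items()]
    return cmd


print(" ".join(build_command("nlpcc_dbqa")))
```

The resulting list can be handed to `subprocess.run` unchanged, since every flag is already a separate argument.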
demo/ernie-classification/run_predict.sh → demo/text-classification/run_predict.sh
View file @ 582ecfc9

File moved
demo/ernie-classification/sentiment_cls.py → demo/text-classification/text_classifier.py
View file @ 582ecfc9

 ...
@@ -23,8 +23,10 @@ import paddlehub as hub
 parser = argparse.ArgumentParser(__doc__)
 parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
 parser.add_argument("--use_gpu", type=ast.literal_eval, default=False, help="Whether use GPU for finetuning, input should be True or False")
+parser.add_argument("--dataset", type=str, default="senticorp", help="Directory to model checkpoint")
 parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
 parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
+parser.add_argument("--warmup_proportion", type=float, default=0.0, help="Warmup proportion params for warmup strategy")
 parser.add_argument("--data_dir", type=str, default=None, help="Path to training data.")
 parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
 parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
 ...
@@ -40,7 +42,16 @@ if __name__ == '__main__':
         trainable=True, max_seq_len=args.max_seq_len)
 
     # Step2: Download dataset and use ClassifyReader to read dataset
-    dataset = hub.dataset.ChnSentiCorp()
+    dataset = None
+    if args.dataset.lower() == "senticorp":
+        dataset = hub.dataset.ChnSentiCorp()
+    elif args.dataset.lower() == "nlpcc_dbqa":
+        dataset = hub.dataset.NLPCC_DBQA()
+    elif args.dataset.lower() == "lcqmc":
+        dataset = hub.dataset.LCQMC()
+    else:
+        raise ValueError("%s dataset is not defined" % args.dataset)
+
     reader = hub.reader.ClassifyReader(
         dataset=dataset,
         vocab_path=module.get_vocab_path(),
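The if/elif chain added in this hunk maps a dataset name to a dataset constructor. One alternative is a dictionary lookup, sketched below. The stand-in classes exist only so the example runs without PaddleHub installed; in the real script the table values would presumably be `hub.dataset.ChnSentiCorp`, `hub.dataset.NLPCC_DBQA` and `hub.dataset.LCQMC`. The names `DATASETS` and `load_dataset` are hypothetical.

```python
# Stand-ins for the PaddleHub dataset classes named in the diff.
class ChnSentiCorp:
    pass


class NLPCC_DBQA:
    pass


class LCQMC:
    pass


# Lower-case dataset name -> constructor, mirroring the if/elif branches.
DATASETS = {
    "senticorp": ChnSentiCorp,
    "nlpcc_dbqa": NLPCC_DBQA,
    "lcqmc": LCQMC,
}


def load_dataset(name, registry=DATASETS):
    """Instantiate the dataset registered under `name` (case-insensitive)."""
    try:
        factory = registry[name.lower()]
    except KeyError:
        # Same error message as the else branch in the diff.
        raise ValueError("%s dataset is not defined" % name) from None
    return factory()


print(type(load_dataset("LCQMC")).__name__)
```

Supporting another dataset then means adding one table entry instead of another branch, and the table's keys can double as `choices=` for the `--dataset` argument.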
 ...
@@ -72,7 +83,6 @@ if __name__ == '__main__':
         )
 
         # Setup runing config for PaddleHub Finetune API
-        print(args.use_gpu)
         config = hub.RunConfig(
             use_cuda=args.use_gpu,
             num_epoch=args.num_epoch,
 ...
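The `--use_gpu` flag is declared with `type=ast.literal_eval`, and its value is passed to `RunConfig(use_cuda=...)`. A short standard-library sketch of why that converter is used rather than `type=bool`: `bool("False")` is `True` because any non-empty string is truthy, whereas `ast.literal_eval` parses the text as a Python literal. The parser below is a cut-down stand-in for the one in the script.

```python
import argparse
import ast

parser = argparse.ArgumentParser()
# Same declaration style as in text_classifier.py.
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False)

on = parser.parse_args(["--use_gpu=True"]).use_gpu
off = parser.parse_args(["--use_gpu=False"]).use_gpu
print(on, type(on).__name__)    # True bool
print(off, type(off).__name__)  # False bool

# With type=bool the string "False" would be converted by bool("False").
print(bool("False"))            # True
```

This is why the help text asks for exactly `True` or `False`: the value has to be a valid Python literal.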