PaddlePaddle / PaddleHub
Commit abd29633, authored Apr 12, 2019 by Zeyu Chen
update README.md
Parent: a409d43c
Showing 8 changed files with 62 additions and 9 deletions (+62 -9):
- `demo/ernie-classification/README.md` (+57 -1)
- `demo/ernie-classification/question_answering.py` (+0 -1)
- `demo/ernie-classification/question_matching.py` (+0 -1)
- `demo/ernie-classification/run_question_matching.sh` (+1 -1)
- `demo/ernie-classification/run_sentiment_cls.sh` (+1 -1)
- `demo/ernie-classification/sentiment_cls.py` (+0 -1)
- `demo/ernie-seq-labeling/run_sequence_labeling.sh` (+1 -1)
- `demo/ernie-seq-labeling/sequence_labeling.py` (+2 -2)
demo/ernie-classification/README.md @ abd29633
# ERNIE Classification
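This example shows how to use the PaddleHub Finetune API to quickly finetune Transformer-based models such as ERNIE or BERT for text classification tasks.
This example demonstrates how to use the PaddleHub Finetune API to complete classification tasks with ERNIE.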
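The classification tasks fall into two categories:

* Single-sentence classification
    - Chinese sentiment analysis: ChnSentiCorp
* Sentence-pair classification
    - Semantic similarity: LCQMC
    - Retrieval-based question answering: nlpcc-dbqa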
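Single-sentence and sentence-pair tasks differ mainly in how the model input is packed. The sketch below assumes the standard BERT-style layout (`[CLS] text_a [SEP] text_b [SEP]`, with segment id 0 for the first text and 1 for the second), which ERNIE is assumed to share here. `build_inputs` is a hypothetical helper, and splitting Chinese text into single characters only stands in for the real tokenizer.

```python
from typing import List, Optional, Tuple


def build_inputs(text_a: str, text_b: Optional[str] = None) -> Tuple[List[str], List[int]]:
    """Pack one or two texts into BERT-style tokens and segment ids.

    Assumption: each Chinese character is one token (placeholder for the
    real ERNIE/BERT tokenizer).
    """
    # First text: [CLS] text_a [SEP], all with segment id 0
    tokens = ["[CLS]"] + list(text_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if text_b is not None:
        # Sentence-pair tasks (e.g. LCQMC, nlpcc-dbqa): text_b [SEP] gets segment id 1
        second = list(text_b) + ["[SEP]"]
        tokens += second
        segment_ids += [1] * len(second)
    return tokens, segment_ids


# Single-sentence task, e.g. ChnSentiCorp
print(build_inputs("很好"))
# Sentence-pair task, e.g. LCQMC
print(build_inputs("你好", "您好"))
```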
## How to Start Finetuning
After installing PaddlePaddle and PaddleHub, run `sh run_sentiment_cls.sh` to start finetuning ERNIE on the ChnSentiCorp dataset.

The script parameters are described below:
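```bash
--batch_size: batch size; adjust it to the available GPU memory, and lower it if you hit an out-of-memory error
--weight_decay: weight decay rate for the L2 regularizer
--checkpoint_dir: directory for saving the model; PaddleHub automatically keeps the model that performs best on the dev set
--num_epoch: number of finetuning epochs
--max_seq_len: maximum sequence length used by ERNIE; it cannot exceed 512, and you should lower it if you hit an out-of-memory error
```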
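The `--checkpoint_dir` description says PaddleHub keeps the model that performs best on the dev set. The sketch below only illustrates that selection idea in plain Python; `BestCheckpointTracker` is a hypothetical helper, not PaddleHub's internal code, it assumes a higher dev metric is better, and the scores in the usage loop are made-up illustrative values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestCheckpointTracker:
    """Hypothetical helper: track the best dev-set score seen so far."""
    best_score: Optional[float] = None
    best_step: Optional[int] = None

    def update(self, step: int, dev_score: float) -> bool:
        """Return True when dev_score beats every earlier score (higher is better)."""
        if self.best_score is None or dev_score > self.best_score:
            self.best_score = dev_score
            self.best_step = step
            return True  # the caller would save the model into checkpoint_dir here
        return False


# Illustrative (made-up) dev accuracies at a few evaluation steps
tracker = BestCheckpointTracker()
for step, acc in [(100, 0.85), (200, 0.89), (300, 0.87)]:
    if tracker.update(step, acc):
        print(f"step {step}: new best dev accuracy {acc}, save checkpoint")
print(tracker.best_step, tracker.best_score)  # 200 0.89
```

Only strictly better scores trigger a save, so a tie keeps the earlier checkpoint.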
## Code Steps

Finetuning with the PaddleHub Finetune API can be divided into the following 4 steps:

### Step1: Load the Pretrained Model
```python
import paddlehub as hub

# Load the pretrained ERNIE module, then get its input/output variables and program
module = hub.Module(name="ernie")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128)
```
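The maximum sequence length `max_seq_len` is adjustable. The recommended value is 128; you can change it depending on the text length of your task, but it must not exceed 512.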
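Because every sequence fed to the model must fit within `max_seq_len`, token ids are typically truncated or padded to exactly that length. The sketch below is a hypothetical helper, not a PaddleHub API: it assumes the ids already come from a tokenizer, uses 0 as the padding id, and truncates naively (a real preprocessor would normally also preserve the closing `[SEP]`).

```python
from typing import List

MAX_ALLOWED_SEQ_LEN = 512  # upper bound stated in this README


def fit_to_max_seq_len(token_ids: List[int], max_seq_len: int = 128, pad_id: int = 0) -> List[int]:
    """Truncate or pad token_ids so the result has exactly max_seq_len items."""
    if not 0 < max_seq_len <= MAX_ALLOWED_SEQ_LEN:
        raise ValueError(f"max_seq_len must be in (0, {MAX_ALLOWED_SEQ_LEN}], got {max_seq_len}")
    clipped = token_ids[:max_seq_len]                          # drop tokens past the limit
    return clipped + [pad_id] * (max_seq_len - len(clipped))  # pad short inputs


print(fit_to_max_seq_len([1, 2, 3, 4], max_seq_len=6))  # [1, 2, 3, 4, 0, 0]
```

Memory use grows with `max_seq_len`, which is why lowering it is the suggested fix for out-of-memory errors.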
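To try a BERT model, such as the Chinese BERT model, simply change the `name` parameter passed to `Module`.
Besides ERNIE, PaddleHub also provides the following BERT models: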
BERT Model | PaddleHub Module name
---------------------------------- | :------:
BERT-Base, Uncased | bert_uncased_L-12_H-768_A-12
BERT-Large, Uncased | bert_uncased_L-24_H-1024_A-16
BERT-Base, Cased | bert_cased_L-12_H-768_A-12
BERT-Large, Cased | bert_cased_L-24_H-1024_A-16
BERT-Base, Multilingual Cased | bert_multi_cased_L-12_H-768_A-12
BERT-Base, Chinese | bert_chinese_L-12_H-768_A-12
```python
# Switch seamlessly to the Chinese BERT model
module = hub.Module(name="bert_chinese_L-12_H-768_A-12")
```
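The module names in the BERT table encode the model size using the naming convention of Google's original BERT release: `L` is the number of Transformer layers, `H` the hidden size, and `A` the number of attention heads. The sketch below keeps the table as a Python dict and decodes those fields; `BERT_MODULES` and `parse_module_name` are illustrative names, not part of PaddleHub.

```python
from typing import Dict

# The table from this README as a lookup: model name -> PaddleHub Module name
BERT_MODULES: Dict[str, str] = {
    "BERT-Base, Uncased": "bert_uncased_L-12_H-768_A-12",
    "BERT-Large, Uncased": "bert_uncased_L-24_H-1024_A-16",
    "BERT-Base, Cased": "bert_cased_L-12_H-768_A-12",
    "BERT-Large, Cased": "bert_cased_L-24_H-1024_A-16",
    "BERT-Base, Multilingual Cased": "bert_multi_cased_L-12_H-768_A-12",
    "BERT-Base, Chinese": "bert_chinese_L-12_H-768_A-12",
}


def parse_module_name(name: str) -> Dict[str, int]:
    """Extract L (layers), H (hidden size) and A (attention heads) from a module name."""
    fields = {}
    for part in name.split("_"):
        # Size fields look like "L-12", "H-768", "A-12"
        if len(part) > 2 and part[1] == "-" and part[0] in "LHA":
            fields[part[0]] = int(part[2:])
    return fields


print(parse_module_name(BERT_MODULES["BERT-Base, Chinese"]))  # {'L': 12, 'H': 768, 'A': 12}
```

The chosen value can then be passed as `hub.Module(name=BERT_MODULES["BERT-Base, Chinese"])`, matching the BERT example above.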
demo/ernie-classification/question_answering.py @ abd29633
...
@@ -22,7 +22,6 @@ import paddlehub as hub
```python
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--hub_module_dir", type=str, default=None, help="PaddleHub module directory")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--data_dir", type=str, default=None, help="Path to training data.")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
```
...
demo/ernie-classification/question_matching.py @ abd29633
...
@@ -22,7 +22,6 @@ import paddlehub as hub
```python
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--hub_module_dir", type=str, default=None, help="PaddleHub module directory")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--data_dir", type=str, default=None, help="Path to training data.")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
```
...
demo/ernie-classification/run_question_matching.sh @ abd29633
```bash
export CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=5

CKPT_DIR="./ckpt_question_matching"
python -u question_matching.py \
```
...
demo/ernie-classification/run_sentiment_cls.sh @ abd29633
```bash
export CUDA_VISIBLE_DEVICES=3
export CUDA_VISIBLE_DEVICES=5

CKPT_DIR="./ckpt_sentiment_cls"
python -u sentiment_cls.py \
```
...
demo/ernie-classification/sentiment_cls.py @ abd29633
...
@@ -22,7 +22,6 @@ import paddlehub as hub
```python
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--hub_module_dir", type=str, default=None, help="PaddleHub module directory")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--data_dir", type=str, default=None, help="Path to training data.")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
```
...
demo/ernie-seq-labeling/run_sequence_labeling.sh @ abd29633
```bash
export CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=6

CKPT_DIR="./ckpt_sequence_labeling"
```
...
demo/ernie-seq-labeling/sequence_labeling.py @ abd29633
...
@@ -13,7 +13,8 @@
```python
# limitations under the License.
"""Finetuning on sequence labeling task."""
import paddle
import argparse
import paddle.fluid as fluid
import paddlehub as hub
```
...
@@ -21,7 +22,6 @@ import paddlehub as hub
```python
parser = argparse.ArgumentParser(__doc__)
parser.add_argument("--num_epoch", type=int, default=3, help="Number of epoches for fine-tuning.")
parser.add_argument("--learning_rate", type=float, default=5e-5, help="Learning rate used to train with warmup.")
parser.add_argument("--hub_module_dir", type=str, default=None, help="PaddleHub module directory")
parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay rate for L2 regularizer.")
parser.add_argument("--checkpoint_dir", type=str, default=None, help="Directory to model checkpoint")
parser.add_argument("--max_seq_len", type=int, default=512, help="Number of words of the longest seqence.")
```
...