Unverified commit fd360a7f, authored by Meiyim, committed by GitHub

Dygraph fix3 (#457)

* update readme

* update demo

* + 160G model

* qa model bugfix: models inherit docstrings

* Update README.zh.md

* Update README.en.md

* Update README.zh.md

* reorganize binaries

* Update README.zh.md

* Update README.en.md

* Update README.zh.md

* Update README.en.md
Parent ef8879f6
English|[简体中文](./README.zh.md)
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone.png)
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_en.png)
**Reminder: this repo has been refactored; for paper reproduction or backward compatibility, please check out the [repro branch](https://github.com/PaddlePaddle/ERNIE/tree/repro)**
......@@ -89,23 +89,23 @@ pip install paddle-ernie
or
```shell
git clone -b dygraph https://github.com/PaddlePaddle/ERNIE.git --single-branch
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
cd ERNIE
pip install -r requirements.txt
pip setup.py -e .
pip install -e .
```
##### 3. download pretrained models (optional)
| Model | Description |
| :------------------------------------------------- | :----------------------------------------------------------- |
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | L12H768A12 |
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | L24H1024A16 |
| [ERNIE Gen base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 |
| Model | Description |abbreviation|
| :------------------------------------------------- | :----------------------------------------------------------- |:-----------|
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |ernie-1.0|
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |ernie-tiny|
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | L12H768A12 |ernie-2.0-en|
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | L24H1024A16 |ernie-2.0-large-en|
| [ERNIE Gen base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |ernie-gen-base-en|
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 | ernie-gen-large-en |
| [ERNIE Gen Large 160G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 + 160G pretraining corpus | ernie-gen-large-160g-en |
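The abbreviation in the last column is the name accepted by `from_pretrained`. A minimal loading sketch (assumes network access; weights are downloaded and cached on first use):

```python
import numpy as np
import paddle.fluid.dygraph as D
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModel

D.guard().__enter__()  # activate paddle `dygraph` mode

model = ErnieModel.from_pretrained('ernie-1.0')  # fetches pretrained weights on first use
model.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

ids, _ = tokenizer.encode('hello world')
ids = D.to_variable(np.expand_dims(ids, 0))  # add the `batch` dimension
pooled, encoded = model(ids)                 # eager execution
print(pooled.numpy())
```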
##### 4. download datasets
......@@ -143,26 +143,31 @@ see [demo](https://ernie-github.cdn.bcebos.com/data-mnli-m.tar.gz) data for MNLI
- try eager execution with `dygraph model`:
```script
python3 ./demo/finetune_classifier_dygraph.py \
--from_pretrained ernie_1.0 \
--data_dir ./data/xnli
python3 ./ernie_d/demo/finetune_classifier_dygraph.py \
--from_pretrained ernie-1.0 \
--data_dir ./data/xnli
```
- Distributed finetune
`paddle.distributed.launch` is a process manager; we use it to launch a python process on each available GPU device:
when in distributed training, `max_steps` is used as the stopping criterion rather than `epoch`, to prevent deadlock.
also note that we shard the training data according to device id to prevent overfitting.
When in distributed training, `max_steps` is used as the stopping criterion rather than `epoch`, to prevent deadlock.
You can calculate `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`.
Also note that we shard the training data according to device id to prevent overfitting.
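For example, for the MNLI demo below (392,702 training examples), with illustrative batch settings of 32 per card on 4 GPUs, `max_steps` works out as:

```script
# max_steps = EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH
# 3 * 392702 / (32 * 4) ≈ 9204, rounded up to the 10000 used below
```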
demo:
(make sure you have more than 2 GPUs;
online model download does not work under `paddle.distributed.launch`,
so you need to run single-card finetuning first to fetch the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):
```script
python3 -m paddle.distributed.launch \
./demo/finetune_classifier_dygraph_distributed.py \
--data_dir data/mnli \
--max_steps 10000 \
--from_pretrained ernie2.0-en
--from_pretrained ernie-2.0-en
```
......
[English](./README.en.md)|简体中文
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone.png)
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_zh.png)
ERNIE is a continual-learning semantic understanding framework built on knowledge enhancement, pioneered by Baidu. The framework combines big-data pretraining with rich multi-source knowledge and, through continual learning, keeps absorbing lexical, structural, and semantic knowledge from massive text corpora, so the model keeps improving. ERNIE significantly outperforms the previous state of the art on 16 public datasets spanning sentiment analysis, text matching, natural language inference, lexical analysis, reading comprehension, intelligent question answering, and more. On GLUE, the authoritative general language understanding benchmark, it was the first to break the 90-point mark, ranking first worldwide. At SemEval 2020, the world's largest semantic evaluation, which concluded in March this year, ERNIE won 5 championships; the technology was also covered on the official website of MIT Technology Review, and the innovations were accepted at the top academic conferences AAAI and IJCAI. ERNIE is deployed at scale in industry, for example in search engines, news recommendation, advertising systems, voice interaction, and intelligent customer service.
......@@ -87,24 +87,25 @@ pip install paddle-ernie
or
```shell
git clone -b dygraph https://github.com/PaddlePaddle/ERNIE.git --single-branch
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
cd ERNIE
pip install -r requirements.txt
pip setup.py -e .
pip install -e .
```
##### 3. Download pretrained models (optional)
##### 3. Download pretrained models (optional) <a name="section-pretrained-models"></a>
| Model | Description |
| :------------------------------------------------- | :----------------------------------------------------------- |
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | base: L12H768A12 |
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | large: L24H1024A16|
| [ERNIE Gen Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 |
| Model | Details |Abbreviation|
| :------------------------------------------------- |:------------------------------------------------------------------------- |:-------|
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | Layer:12, Hidden:768, Heads:12 |ernie-1.0|
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | Layer:3, Hidden:1024, Heads:16 |ernie-tiny|
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 |ernie-2.0-en|
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | Layer:24, Hidden:1024, Heads:16 |ernie-2.0-large-en|
| [ERNIE Gen Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 |ernie-gen-base-en|
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 |ernie-gen-large-en|
| [ERNIE Gen Large 160G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 + an extra 160G pretraining corpus | ernie-gen-large-160g-en |
##### 4. Download datasets
......@@ -144,9 +145,9 @@ data/xnli
- finetune with the `dygraph` model:
```script
python3 ./demo/finetune_classifier_dygraph.py \
--from_pretrained ernie_1.0 \
--data_dir ./data/xnli
python3 ./ernie_d/demo/finetune_classifier_dygraph.py \
--from_pretrained ernie-1.0 \
--data_dir ./data/xnli
```
- Distributed finetune
......@@ -154,9 +155,11 @@ python3 ./demo/finetune_classifier_dygraph.py \
`paddle.distributed.launch` is a process manager; we use it to launch a python process on every GPU and set the environment variables needed for distributed training:
When training is distributed, we use `max_steps` as the stopping criterion rather than `epoch`, to avoid deadlock between processes.
You can calculate the required `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`.
Also note that the training set must be sharded across processes, to avoid the overfitting caused by every process training on the same data.
Example script (make sure you have more than 2 GPUs):
Example script (make sure you have more than 2 GPUs; online model download does not work under `paddle.distributed.launch`,
so you may need to run single-card finetuning first to download the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):
```script
python3 -m paddle.distributed.launch \
......@@ -227,7 +230,7 @@ sids = np.expand_dims(sids, 0)
result = client(ids, sids)
```
You can also download a pre-built `inference_model` of the ernie-1.0 base model from [here](https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).
This model has not been finetuned; it is typically used for feature-based finetuning with an extra task head, or as a text feature extractor.
Because this model was produced by an older version of the API, an extra dimension needs to be appended to the input tensors when making client requests:
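A minimal sketch of that reshape (an assumption based on the note above, not verified against the serving client; `ids`, `sids`, and `client` as in the snippet earlier):

```script
# append a trailing dimension: [batch, seq_len] -> [batch, seq_len, 1]
ids = np.expand_dims(ids, -1)
sids = np.expand_dims(sids, -1)
result = client(ids, sids)
```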
......
......@@ -51,11 +51,13 @@ if __name__ == '__main__':
parser.add_argument('--bsz', type=int, default=32, help='batchsize')
parser.add_argument('--epoch', type=int, default=3, help='epoch')
parser.add_argument('--data_dir', type=str, required=True, help='data directory includes train / develop data')
parser.add_argument('--max_steps', type=int, required=True, help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
parser.add_argument('--warmup_proportion', type=float, default=0.1)
parser.add_argument('--use_lr_decay', action='store_true', help='if set, learning rate will decay to zero at `max_steps`')
parser.add_argument('--warmup_proportion', type=float, default=0.1, help='if use_lr_decay is set, '
'learning rate will rise to `lr` at `warmup_proportion` * `max_steps` and decay to 0. at `max_steps`')
parser.add_argument('--lr', type=float, default=5e-5, help='learning rate')
parser.add_argument('--inference_model_dir', type=str, default=None, help='inference model output directory')
parser.add_argument('--save_dir', type=str, default=None, help='model output directory')
parser.add_argument('--max_steps', type=int, default=None, help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
parser.add_argument('--wd', type=float, default=0.01, help='weight decay, aka L2 regularizer')
......@@ -102,7 +104,11 @@ if __name__ == '__main__':
with FD.guard(place):
model = ErnieModelForSequenceClassification.from_pretrained(args.from_pretrained, num_labels=3, name='')
opt = AdamW(learning_rate=LinearDecay(args.lr, int(args.warmup_proportion * args.max_steps), args.max_steps), parameter_list=model.parameters(), weight_decay=args.wd)
if args.use_lr_decay:
    opt = AdamW(learning_rate=LinearDecay(args.lr, int(args.warmup_proportion * args.max_steps), args.max_steps), parameter_list=model.parameters(), weight_decay=args.wd)
else:
    opt = AdamW(args.lr, parameter_list=model.parameters(), weight_decay=args.wd)
g_clip = F.dygraph_grad_clip.GradClipByGlobalNorm(1.0) #experimental
for epoch in range(args.epoch):
for step, d in enumerate(tqdm(train_ds.start(place), desc='training')):
......@@ -117,7 +123,7 @@ if __name__ == '__main__':
acc = []
with FD.base._switch_tracer_mode_guard_(is_train=False):
model.eval()
for step, d in enumerate(tqdm(dev_ds.start(), desc='evaluating %d' % epoch)):
for step, d in enumerate(tqdm(dev_ds.start(place), desc='evaluating %d' % epoch)):
ids, sids, label = d
loss, logits = model(ids, sids, labels=label)
#print('\n'.join(map(str, logits.numpy().tolist())))
......
......@@ -44,8 +44,8 @@ from ernie.modeling_ernie import ErnieModel, ErnieModelForQuestionAnswering
from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
from ernie.optimization import AdamW, LinearDecay
from ernie.mrc import mrc_reader
from ernie.mrc import mrc_metrics
from demo.mrc import mrc_reader
from demo.mrc import mrc_metrics
log.setLevel(logging.DEBUG)
logging.getLogger().addHandler(log.handlers[0])
......
......@@ -41,3 +41,10 @@ def _fetch_from_remote(url, force_download=False):
log.debug('%s cached in %s' % (url, cached_dir))
return cached_dir
def add_docstring(doc):
    def func(f):
        # append the superclass docstring so overriding methods inherit it
        f.__doc__ = (f.__doc__ or '') + ('\n====== other docs from super class ======\n%s' % doc)
        return f
    return func
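# usage sketch (hypothetical Base/Child classes, for illustration only):
#
#     class Child(Base):
#         @add_docstring(Base.forward.__doc__)
#         def forward(self, *args, **kwargs):
#             """child docs"""
#
# Child.forward.__doc__ then contains "child docs" followed by Base's docs;
# this is how the subclasses below inherit `ErnieModel.forward`'s docstring.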
......@@ -29,7 +29,7 @@ import paddle.fluid.dygraph as D
import paddle.fluid as F
import paddle.fluid.layers as L
from ernie.file_utils import _fetch_from_remote
from ernie.file_utils import _fetch_from_remote, add_docstring
log = logging.getLogger(__name__)
......@@ -288,6 +288,11 @@ class ErnieModel(D.Layer, PretrainedModel):
Mask to avoid performing attention on the padding token indices of the encoder input.
attn_bias(optional, `Variable` of shape `[batch_size, seq_len, seq_len]` or False):
3D version of `input_mask`; if set, overrides `input_mask`; if set to `False`, no attention mask is applied
past_cache(optional, tuple of two lists: cached key and cached value,
each a list of `Variable`s of shape `[batch_size, seq_len, hidden_size]`):
cached key/value tensors that will be concatenated to the generated key/value when performing self attention.
if set, `attn_bias` should not be None.
Returns:
pooled (`Variable` of shape `[batch_size, hidden_size]`):
output logits of pooler classifier
......@@ -360,6 +365,7 @@ class ErnieModelForSequenceClassification(ErnieModel):
prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i
@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
......@@ -400,6 +406,7 @@ class ErnieModelForTokenClassification(ErnieModel):
prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i
@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
......@@ -441,6 +448,7 @@ class ErnieModelForQuestionAnswering(ErnieModel):
prob = cfg.get('classifier_dropout_prob', cfg['hidden_dropout_prob'])
self.dropout = lambda i: L.dropout(i, dropout_prob=prob, dropout_implementation="upscale_in_train",) if self.training else i
@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
......@@ -460,7 +468,7 @@ class ErnieModelForQuestionAnswering(ErnieModel):
start_pos = kwargs.pop('start_pos', None)
end_pos = kwargs.pop('end_pos', None)
pooled, encoded, _ = super(ErnieModelForQuestionAnswering, self).forward(*args, **kwargs)
pooled, encoded = super(ErnieModelForQuestionAnswering, self).forward(*args, **kwargs)
encoded = self.dropout(encoded)
encoded = self.classifier(encoded)
start_logit, end_logits = L.unstack(encoded, axis=-1)
......@@ -529,6 +537,7 @@ class ErnieModelForPretraining(ErnieModel):
is_bias=True,
)
@add_docstring(ErnieModel.forward.__doc__)
def forward(self, *args, **kwargs):
"""
Args:
......@@ -550,7 +559,7 @@ class ErnieModelForPretraining(ErnieModel):
mlm_labels = kwargs.pop('labels')
mlm_pos = kwargs.pop('mlm_pos')
nsp_labels = kwargs.pop('nsp_labels')
pooled, encoded, _ = super(ErnieModelForPretraining, self).forward(*args, **kwargs)
pooled, encoded = super(ErnieModelForPretraining, self).forward(*args, **kwargs)
if len(mlm_labels.shape) == 1:
mlm_labels = L.reshape(mlm_labels, [-1, 1])
if len(nsp_labels.shape) == 1:
......
......@@ -32,6 +32,7 @@ class ErnieModelForGeneration(ErnieModel):
resource_map = {
    'ernie-gen-base-en': ErnieModel.bce + 'model-ernie-gen-base-en.1.tar.gz',
    'ernie-gen-large-en': ErnieModel.bce + 'model-ernie-gen-large-en.1.tar.gz',
    'ernie-gen-large-160g-en': ErnieModel.bce + 'model-ernie-gen-large-160g-en.1.tar.gz',
    'ernie-1.0': ErnieModel.bce + 'model-ernie1.0.1.tar.gz',
}
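# usage sketch: the keys above are the names accepted by from_pretrained
# (assumes network access; the archive is downloaded and cached on first use):
#   model = ErnieModelForGeneration.from_pretrained('ernie-gen-large-160g-en')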
def __init__(self, cfg, name=None):
......