Commit f6c68c85 authored by xixiaoyao

test=develop

Parent ee949ea1
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
PALM
===
PALM (PAddLe for Multi-task, packaged as PaddlePALM) is a flexible and easy-to-use multi-task learning framework. It ships with a rich set of model backbones (BERT, ERNIE, etc.), common task paradigms (classification, matching, sequence labeling, machine reading comprehension, etc.), and dataset reading and processing tools. For typical task scenarios, users can add a new task with almost no code; for special scenarios, users can support a new task by implementing the framework's predefined interfaces.
## Installation

PaddlePALM can be installed as a pip package:

```shell
pip install paddlepalm
```

or used by cloning the source with git:

```shell
git clone https://github.com/PaddlePaddle/PALM.git
```

**Dependencies**

- Python >= 2.7
- CUDA >= 9.0
- cuDNN >= 7.0
- PaddlePaddle >= 1.6 (see the [installation guide](http://www.paddlepaddle.org/#quick-start))

## Directory structure

- backbone: the shared backbone networks for multi-task learning; BERT, ERNIE and others are built in, and users can add custom backbones
- config: configuration files of the task instances; to add a task, create its configuration file here
- data: datasets of the tasks
- pretrain_model: pretrained models, vocabularies and their related configurations
- optimizer: optimizers; custom optimizers can be defined here
- reader: dataset reading and processing modules of the tasks, plus the joint_reader file that fuses the individual readers
- paradigm: network structures of the task output layers
- utils: common utility functions
- mtl_run.py: the main workflow of multi-task learning
- run.sh: launch script for multi-task learning

## Usage

The framework ships with three fully added example tasks: *Machine Reading Comprehension*, *Mask Language Model* and *Question Answer Matching*. In `mtl_config.yaml`, *Machine Reading Comprehension* is configured as the main task and the others as auxiliary tasks. Multi-task learning can then be launched with:

```shell
bash run.sh
```

*Tip: on the first run, the script automatically downloads the pretrained BERT and ERNIE models; please be patient.*

### 1. Create task instances

Task instances are described in yaml format. The required fields of every task instance are:

- train_file: path of the training set file
- reader: name of the dataset loading and processing tool; the list of built-in readers is available [here](https://www.baidu.com/)
- backbone: name of the backbone model; the list of built-in backbones is available [here](https://www.baidu.com/)
- paradigm: name of the task paradigm (task type); the list of built-in paradigms is available [here](https://www.baidu.com/)
### 2. Complete the training configuration

The main configuration of multi-task training and inference is completed in `mtl_config.yaml`, also in yaml format. It specifies the task instances and their main/auxiliary relations, parameter reuse relations, sampling weights, and so on, through the following fields.

***Required fields***

- main_task: *(str)* name of the main task; currently only a single main task is supported. The name is taken from the configuration file names in the `config` folder (without the `.yaml` suffix and without any intermediate suffix introduced for task sharing)
- auxiliary_task: *(str)* names of the auxiliary tasks; multiple auxiliary tasks are supported, separated by spaces. The names are taken from the configuration file names in the `config` folder (without the `.yaml` suffix and without any intermediate suffix introduced for task sharing)
- do_train: *(bool)* training flag
- do_predict: *(bool)* prediction flag; currently only the main task supports prediction
- checkpoint_path: *(str)* path for saving models, resuming interrupted training, and loading the model for prediction; when loading from this path, the model of the last training step is read by default
- backbone_model: *(str)* backbone network to use; the name is taken from the modules in the `backbone` directory. Note that when switching backbones while using a pretrained model, the pretrained parameters, configuration, vocabulary and other related fields should be switched accordingly
- vocab_path: *(str)* vocabulary file in plain text, one word per line
- optimizer: *(str)* optimizer name, taken from the file names in `optimizer`
- learning_rate: *(str)* learning rate for the training phase
- skip_steps: *(int)* logging frequency during training (in steps)
- epoch: *(int)* number of training epochs of the main task
- use_cuda: *(bool)* flag for training on GPU
- warmup_proportion: *(float)* warmup proportion when finetuning a pretrained model
- use_ema: *(bool)* whether to enable EMA (exponential moving average) for training and inference
- ema_decay: *(float)* decay rate when EMA is enabled
- random_seed: *(int)* random seed
- use_fp16: *(bool)* flag for mixed-precision training
- loss_scaling: *(float)* loss scaling factor when mixed-precision training is enabled

***Optional fields***

- pretrain_model_path: *(str)* loading path of the pretrained model; this path should contain a params folder storing the model parameters
- pretrain_config_path: *(str)* configuration file of the pretrained model, in json format
- do_lower_case: *(bool)* whether the preprocessing stage is case-insensitive (lowercases the input)
- any other user-defined fields
### 3. Start training

If the built-in tasks satisfy the requirements, or a custom task has already been added, multi-task learning can be launched directly as follows.

```python
import paddlepalm as palm

if __name__ == '__main__':
    controller = palm.Controller('config.yaml', task_dir='task_instance')
    controller.load_pretrain('pretrain_model/ernie/params')
    controller.train()
```

For example, the framework ships with a small bundle of datasets, containing the MRQA reading comprehension evaluation data `mrqa`, the MaskLM training data `mlm4mrqa`, and `am4mrqa`, a dataset for matching questions with the contexts containing their answers. The framework also has built-in tasks for machine reading comprehension (`reading_comprehension`), masked language modeling (`mask_language_model`) and answer matching (`answer_matching`). Suppose we want to use the masked language model and the answer matching task to improve the machine reading comprehension task; multi-task learning can then be launched through the following steps.

First, add the configuration files of the training tasks to the config folder:

1. `reading_comprehension.yaml`

```yaml
train_file: "data/mrqa/mrqa-combined.train.raw.json"
predict_file: "data/mrqa/mrqa-combined.dev.raw.json"
batch_size: 4
mix_ratio: 1.0
in_tokens: false
doc_stride: 128
sample_rate: 0.02
...
```

2. `mask_language_model.yaml`

```yaml
train_file: "data/mlm4mrqa"
mix_ratio: 0.4
batch_size: 4
in_tokens: False
generate_neg_sample: False
```

3. `answer_matching.yaml`

```yaml
train_file: "data/am4mrqa/train.txt"
mix_ratio: 0.4
batch_size: 4
in_tokens: False
```

Then complete the multi-task configuration in the main configuration file `mtl_config.yaml`. The `main_task` field specifies the main task, and `auxiliary_task` specifies the auxiliary tasks, multiple ones separated by spaces "` `".

```yaml
main_task: "reading_comprehension"
auxiliary_task: "mask_language_model answer_matching"
do_train: True
epoch: 2
...
```

The epoch setting applies only to the main task, and the `mix_ratio` baseline of 1.0 is likewise defined relative to the main task's number of training steps. For example, with `epoch=2`, if the `mix_ratio` of `reading_comprehension` is set to 1.0 and that of `mask_language_model` to 0.5, then `reading_comprehension` is trained for two full epochs, while `mask_language_model` runs half as many training steps as `reading_comprehension`.

Finally, run `run.sh` to start the joint training of the three tasks. To remove some of the auxiliary tasks, edit the `auxiliary_task` field in `mtl_config.yaml`.

### 4. Prediction

After training finishes, the user can call the pred interface directly to run prediction on a target task. Example:

```python
import paddlepalm as palm

if __name__ == '__main__':
    controller = palm.Controller(config_path='config.yaml', task_dir='task_instance')
    controller.load_pretrain('pretrain_model/ernie/params')
    controller.train()
    controller.pred('mrqa')
```

A new controller can also be created just for prediction:

```python
import paddlepalm as palm

if __name__ == '__main__':
    controller = palm.Controller(config_path='config.yaml', task_dir='task_instance')
    controller.pred('mrqa', infermodel_path='output_model/firstrun2/infer_model')
```
### Adding a new task

To add a new task, after preparing its dataset, the user needs to complete three pieces of development work:

***config module***

Located in the `./config` directory. It stores the configuration files of the task instances, written in `yaml` format. The required fields of a configuration file are:

- batch_size: number of samples used per training or inference step. When `in_tokens` is True, `batch_size` instead denotes the number of tokens contained in each step.
- in_tokens: whether to build batches as lod tensors; when `in_tokens` is False, batches are built by padding.

Required fields for the training phase:

- train_file: path of the training set file
- mix_ratio: sampling weight of this task during training (1.0 means the same expected sampling count as the main task)

Required fields for the inference phase:

- predict_file: path of the test set file

In addition, users can define other hyperparameters as the task requires; these can be accessed when the task model is created.
***reader module***

Located in the `./reader` directory. It handles dataset reading and processing. A new reader should be placed in the `reader` directory and must contain a `get_input_shape` function and a `DataProcessor` class (a minimal sketch follows the specification below).

- **get_input_shape**: *(function)* defines the shapes and dtypes of the data the reader produces for the backbone and the task paradigm; definitions for both the training and the inference phase must be returned.
    - Arguments
        - args: *(dict)* the parsed task configuration
    - Returns
        - train_input_shape: *(dict)* contains the two keys backbone and task; the value of each key is a list of `(shape, dtype)` tuples
        - test_input_shape: *(dict)* contains the two keys backbone and task; the value of each key is a list of `(shape, dtype)` tuples
- **DataProcessor**: *(class)* defines the loading, preprocessing and iteration of the dataset
    - \_\_init\_\_: constructor; parses and stores the relevant arguments and performs the necessary initialization
        - Arguments
            - args: *(dict)* the parsed task configuration
        - Returns
            -
    - data_generator: *(function)* iterator over the dataset; yields one batch each time it is advanced
        - Arguments
            - phase: *(str)* the phase the task is in; the supported phases are `train` and `predict`
            - shuffle: *(bool)* whether to shuffle the dataset during training
            - dev_count: *(int)* number of available GPUs or CPUs
        - Yields
            - tensors: (list) data yielded as a list structured according to the input shapes and dtypes that `get_input_shape` defines for the backbone and the task; the head elements of each yielded list are the inputs required by the backbone, followed by the inputs required by the task
    - get_num_examples: *(function)* returns the number of samples. Note that because of mechanisms such as sliding windows, the number of samples produced at runtime may exceed the number of samples in the dataset; in that case the actual runtime number should be returned
        - Arguments
            -
        - Returns
            - num_examples: *(int)* number of samples
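For concreteness, below is a minimal sketch of what such a reader module could look like for a toy classification task. The input file format, shapes and dtypes are illustrative assumptions, not one of the framework's built-in readers (the task name `yelp_senti` anticipates the naming-convention example further down).

```python
# yelp_senti_reader.py -- illustrative reader sketch. Assumed toy input format:
# one "<space-separated token ids>\t<label>" sample per line of train_file.
import numpy as np

def get_input_shape(args):
    # (shape, dtype) tuples; -1 marks the variable batch dimension
    backbone_shape = [([-1, args.max_seq_len, 1], 'int64'),    # token ids
                      ([-1, args.max_seq_len, 1], 'float32')]  # input mask
    train_input_shape = {'backbone': backbone_shape,
                         'task': [([-1, 1], 'int64')]}         # labels
    test_input_shape = {'backbone': backbone_shape, 'task': []}
    return train_input_shape, test_input_shape

class DataProcessor(object):
    def __init__(self, args):
        self._args = args
        with open(args.train_file) as f:
            self._examples = [line.rstrip('\n').split('\t') for line in f]

    def get_num_examples(self):
        return len(self._examples)

    def data_generator(self, phase='train', shuffle=True, dev_count=1):
        examples = list(self._examples)
        if phase == 'train' and shuffle:
            np.random.shuffle(examples)
        bs, maxlen = self._args.batch_size, self._args.max_seq_len
        for i in range(0, len(examples) - bs + 1, bs):
            token_ids = np.zeros([bs, maxlen, 1], dtype='int64')
            mask = np.zeros([bs, maxlen, 1], dtype='float32')
            labels = np.zeros([bs, 1], dtype='int64')
            for j, (tokens, label) in enumerate(examples[i:i + bs]):
                ids = [int(t) for t in tokens.split()][:maxlen]
                token_ids[j, :len(ids), 0] = ids
                mask[j, :len(ids), 0] = 1.0
                labels[j, 0] = int(label)
            # head elements: backbone inputs; tail elements: task inputs
            yield [token_ids, mask, labels]
```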
***task_paradigm module***

Located in the `./paradigm` directory. It describes the task paradigms (classification, matching, reading comprehension, etc.). A new task paradigm should be placed in the `paradigm` directory and must contain the two required functions `compute_loss` and `create_model`, plus the two optional functions `postprocess` and `global_postprocess` (see the sketch after this specification).

- create_model: *(function)* creates the task model
    - Arguments
        - reader_input: *(nested Variables)* outputs of the data input layer, as defined in the `get_input_shape` function of the task's reader module. The first N elements are the backbone's inputs, followed by the task's inputs.
        - base_model: *(Model)* instance of the model backbone; the task connects to the backbone by calling its output interfaces. In general, a backbone's output interface includes at least the two properties `final_sentence_representation` and `final_word_representation`.
            - base_model.final_sentence_representation: *(Variable)* vector representation of the input text, with shape `[batch_size, hidden_size]`
            - base_model.final_word_representation: *(Variable)* vector representation of each word of the input text, with shape `[batch_size, max_seqlen, hidden_size]`
        - is_training: *(bool)* training flag
        - args: *(Argument)* task-specific configuration; the concrete parameters are defined in the config folder
    - Returns
        - output_tensors: *(dict)* dictionary of the task's output tensors. In the training phase it must contain at least a num_seqs element, which records the number of samples in the batch (when the input is a lod tensor, i.e. args.in_tokens is set to True, all samples are flattened together and there is no explicit sample-count dimension)
- compute_loss: *(function)* computes the batch-average loss of the task in the training phase
    - Arguments
        - output_tensors: *(dict)* the return value of `create_model`, mapping the names of the Variables needed for the loss computation to their instances
        - args: *(Argument)* task-specific configuration; the concrete parameters are defined in the config folder
    - Returns
        - total_loss: *(Variable)* average loss of the current batch
- postprocess: *(function)* post-processing applied to the fetch_results of each inference step; returns the post-processed results for every sample of that step
    - Arguments
        - fetch_results: (dict) computation results of the fetch_dict of the current inference step, where fetch_dict is defined and returned by create_model.
    - Returns
        - processed_results: (list) post-processed results of all samples of the current inference step.
- global_postprocess: *(function)* after inference finishes, performs the final processing of the post-processed results of all samples (e.g., saving results, secondary post-processing)
    - Arguments
        - pred_buf: post-processed prediction results of all test set samples
        - processor: instance of the task's dataset loading and processing class, DataProcessor
        - mtl_args: multi-task learning configuration, defined in `mtl_config.yaml`
        - task_args: task-specific configuration, defined in the `config` folder
    - Returns
        -
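A matching sketch of a paradigm module under this interface: essentially a stripped-down binary classification head over the backbone's sentence representation. The concrete head is an illustrative assumption (compare the built-in answer-matching paradigm included elsewhere in this commit).

```python
# yelp_senti.py -- illustrative task paradigm sketch for binary classification.
import paddle.fluid as fluid

def create_model(reader_input, base_model=None, is_training=True, args=None):
    # backbone inputs come first in reader_input; the label is the last element
    labels = reader_input[-1]
    cls_feats = base_model.final_sentence_representation
    if is_training:
        cls_feats = fluid.layers.dropout(
            x=cls_feats, dropout_prob=0.1,
            dropout_implementation="upscale_in_train")
    logits = fluid.layers.fc(input=cls_feats, size=2)
    num_seqs = fluid.layers.fill_constant(shape=[1], value=1, dtype='int64')
    return {'labels': labels, 'logits': logits, 'num_seqs': num_seqs}

def compute_loss(output_tensors, args=None):
    ce_loss = fluid.layers.softmax_with_cross_entropy(
        logits=output_tensors['logits'], label=output_tensors['labels'])
    return fluid.layers.mean(x=ce_loss)
```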
***Naming convention***

After creating the config, task_paradigm and reader files for a new task, the three file names should be unified, with a `_reader` suffix appended to the reader file name. For example, if the new task is named yelp_senti, then the config file is `yelp_senti.yaml`, placed in the config folder; the task_paradigm file is `yelp_senti.py`, placed in the paradigm folder; and the reader file is `yelp_senti_reader.py`, placed in the reader folder.
***One-to-One mode (task-layer sharing)***

By default the framework trains in one-to-many mode: the tasks share the encoder but not the output layers. This version also supports one-to-one mode, in which the tasks share both the encoder and the output layer (model parameters are fully shared, but the data sources differ). The mode is enabled through the naming of the config files, as follows.

1. In mtl_config.yaml, configure the task names, e.g. main_task: "reading_comprehension".
2. If one task draws its data from several sources, add several task configurations for the same task under config. For example, if the task "reading_comprehension" has two datasets to train on, and the data within each batch must come from a single dataset, add the two configuration files reading_comprehension.name1.yaml and reading_comprehension.name2.yaml, where name1 and name2 can be chosen freely; the framework does not restrict these names.
3. Launch multi-task learning: sh run.sh
# Operating mechanism

### Multi-task learning mechanism

pass

## Framework structure and operating principle

The framework structure is shown in the figure below.

![framework diagram](https://tva1.sinaimg.cn/large/006y8mN6ly1g7goo0bjzwj31c20om13h.jpg)

`mtl_config.yaml` configures the parameters of the multi-task master controller, while the configuration of each task instance is written by the user and placed in the `config` folder. When the user runs `run.sh`, the script starts the multi-task learning controller; the controller parses `mtl_config.yaml` and the configuration files of the task instances, then creates the backbone and, for each task, its reader and task layer, and finally launches the training job, realizing multi-task training.

### Training termination mechanism
- Default behavior:
    - **Multi-task learning stops once all target tasks have reached their target number of training steps**
    - Tasks that are not set as target tasks (i.e., auxiliary tasks) do not affect termination; they only play a "sparring partner" role
    - Note: by default, every task is a target task; users can mark target vs. auxiliary tasks with `target_tag`
- The target number of training steps of each target task is computed from num_epochs and mix_ratio, as in the sketch below
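A back-of-the-envelope sketch of that step accounting, mirroring the computation in `mtl_run.py` (the concrete numbers are made up for illustration):

```python
def target_steps(num_epochs, num_train_examples, batch_size, dev_count,
                 mix_ratios, main_mix_ratio=1.0):
    # steps the main task needs for num_epochs full passes over its data
    main_steps = num_epochs * num_train_examples // batch_size // dev_count
    # total steps across all tasks, scaled by their sampling weights
    return int(main_steps * sum(mix_ratios) / main_mix_ratio)

# e.g. epoch=2, 1000 examples, batch size 4, 1 GPU, mix_ratios [1.0, 0.5, 0.4]
print(target_steps(2, 1000, 4, 1, [1.0, 0.5, 0.4]))  # -> 950
```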
### Saving mechanism
- Default behavior:
    - During training, the saved models come in two kinds: checkpoints (ckpt) and inference models (infermodel):
        - a ckpt stores the full computation graph covering all tasks (i.e., the whole multi-task learning graph) and is used to resume interrupted training
        - an infermodel stores the inference graph of one target task together with the configuration the inference depends on
    - For each target task, an inference model is saved automatically once the task reaches its expected number of steps, and is not saved again afterwards. (Note: saving the inference model does not affect ckpt saving)
- User-configurable settings:
    - `save_ckpt_every_steps` controls how often ckpts are saved; by default they are not saved
    - every task instance can use `save_infermodel_every_steps` to control how often its infermodel is saved; the default is -1, i.e., it is saved only once the target number of training steps is reached
## License

This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](https://github.com/PaddlePaddle/models/blob/develop/LICENSE).
This directory stores the model backbones. Users can add custom backbones by implementing the backbone interface.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
from functools import reduce
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from paddle.fluid.layer_helper import LayerHelper
def layer_norm(x, begin_norm_axis=1, epsilon=1e-6, param_attr=None, bias_attr=None):
helper = LayerHelper('layer_norm', **locals())
mean = layers.reduce_mean(x, dim=begin_norm_axis, keep_dim=True)
shift_x = layers.elementwise_sub(x=x, y=mean, axis=0)
variance = layers.reduce_mean(layers.square(shift_x), dim=begin_norm_axis, keep_dim=True)
r_stdev = layers.rsqrt(variance + epsilon)
norm_x = layers.elementwise_mul(x=shift_x, y=r_stdev, axis=0)
param_shape = [reduce(lambda x, y: x * y, norm_x.shape[begin_norm_axis:])]
param_dtype = norm_x.dtype
scale = helper.create_parameter(
attr=param_attr,
shape=param_shape,
dtype=param_dtype,
default_initializer=fluid.initializer.Constant(1.))
bias = helper.create_parameter(
attr=bias_attr,
shape=param_shape,
dtype=param_dtype,
is_bias=True,
default_initializer=fluid.initializer.Constant(0.))
out = layers.elementwise_mul(x=norm_x, y=scale, axis=-1)
out = layers.elementwise_add(x=out, y=bias, axis=-1)
return out
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
        raise ValueError(
            "Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input = queries,
size = d_key * n_head,
num_flatten_dims = 2,
param_attr = fluid.ParamAttr(
name = name + '_query_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_query_fc.b_0')
k = layers.fc(input = keys,
size = d_key * n_head,
num_flatten_dims = 2,
param_attr = fluid.ParamAttr(
name = name + '_key_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_key_fc.b_0')
v = layers.fc(input = values,
size = d_value * n_head,
num_flatten_dims = 2,
param_attr = fluid.ParamAttr(
name = name + '_value_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(
x = x, shape = [0, 0, n_head, hidden_size // n_head], inplace=False)
        # permute the dimensions into:
        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(
x = trans_x,
shape = [0, 0, trans_x.shape[2] * trans_x.shape[3]],
inplace = False)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x = q, scale = d_key**-0.5)
product = layers.matmul(x = scaled_q, y = k, transpose_y = True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(
weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat(
[layers.reshape(
cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat(
[layers.reshape(
cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key,
dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input = out,
size = d_model,
num_flatten_dims = 2,
param_attr=fluid.ParamAttr(
name = name + '_output_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x,
d_inner_hid,
d_hid,
dropout_rate,
hidden_act,
param_initializer=None,
name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(
name=name + '_fc_0.w_0',
initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(
hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test = False)
out = layers.fc(input = hidden,
size = d_hid,
num_flatten_dims = 2,
param_attr=fluid.ParamAttr(
name = name + '_fc_1.w_0',
initializer = param_initializer),
bias_attr = name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0.,
name=''):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x = out, dtype = "float32")
out = layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(
name = name + '_layer_norm_scale',
initializer = fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(
name = name + '_layer_norm_bias',
initializer = fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x = out, dtype = "float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(
out,
dropout_prob = dropout_rate,
dropout_implementation = "upscale_in_train",
is_test = False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consits of a multi-head (self) attention followed by
position-wise feed-forward networks and both the two components companied
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(
pre_process_layer(
enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer = param_initializer,
name = name + '_multi_head_att')
attn_output = post_process_layer(
enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name = name + '_post_att')
ffd_output = positionwise_feed_forward(
pre_process_layer(
attn_output,
preprocess_cmd,
prepostprocess_dropout,
name = name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer = param_initializer,
name = name + '_ffn')
return post_process_layer(
attn_output,
ffd_output,
postprocess_cmd,
prepostprocess_dropout,
name = name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name='',
return_all = False):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
enc_outputs = []
for i in range(n_layer):
enc_output = encoder_layer(
enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer = param_initializer,
name = name + '_layer_' + str(i))
enc_input = enc_output
if i < n_layer - 1:
enc_outputs.append(enc_output)
enc_output = pre_process_layer(
enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
enc_outputs.append(enc_output)
if not return_all:
return enc_output
else:
return enc_output, enc_outputs
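As a usage sketch of the encoder above (assuming this snippet lives in the same module as `encoder`; all hyperparameters are assumptions chosen to be self-consistent with BERT-base sizes rather than values from any shipped config; note that `d_model` must equal `n_head * d_key` for the residual additions to type-check):

```python
import paddle.fluid.layers as layers

# [batch, seq_len, d_model] input and a [batch, n_head, seq_len, seq_len]
# attention bias (0 for visible positions, a large negative value for masked ones)
x = layers.data(name='x', shape=[16, 768], dtype='float32')
bias = layers.data(name='bias', shape=[12, 16, 16], dtype='float32')
out = encoder(enc_input=x, attn_bias=bias, n_layer=2, n_head=12,
              d_key=64, d_value=64, d_model=768, d_inner_hid=3072,
              prepostprocess_dropout=0.1, attention_dropout=0.1,
              relu_dropout=0.1, hidden_act='relu')  # [batch, seq_len, d_model]
```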
task_instance: "mrqa, match4mrqa"
target_tag: 1, 0
mix_ratio: 1.0, 0.5
save_path: "output_model/firstrun"
backbone: "ernie"
backbone_config_path: "pretrain_model/ernie/ernie_config.json"
vocab_path: "pretrain_model/ernie/vocab.txt"
do_lower_case: True
max_seq_len: 512
batch_size: 5
num_epochs: 2
optimizer: "adam"
learning_rate: 3e-5
warmup_proportion: 0.1
weight_decay: 0.1
train_file: "data/am4mrqa/train.txt"
mix_ratio: 0.4
batch_size: 4
in_tokens: False
generate_neg_sample: False
train_file: "data/mrqa/mrqa-combined.train.raw.json"
predict_file: "data/mrqa/mrqa-combined.dev.raw.json"
sample_rate: 0.02
mix_ratio: 1.0
batch_size: 4
in_tokens: false
doc_stride: 128
with_negative: false
max_query_length: 64
max_answer_length: 30
n_best_size: 20
null_score_diff_threshold: 0.0
verbose: False
The source diff is too large to display; you can view the blob instead.
The source diff is too large to display; you can view the blob instead.
This diff is collapsed.
# coding: utf-8
import json

f = 'mrqa-combined.train.raw.json'
a = json.load(open(f))
a = a['data']
writer = open('train.json', 'w')
for s in a:
    p = s['paragraphs']
    assert len(p) == 1
    p = p[0]
    q = {}
    q['context'] = p['context']
    q['qa_list'] = p['qas']
    writer.write(json.dumps(q) + '\n')
writer.close()
The source diff is too large to display; you can view the blob instead.
The source diff is too large to display; you can view the blob instead.
User-defined model and dataset
import paddlepalm as palm
if __name__ == '__main__':
controller = palm.Controller('config.yaml', task_dir='task_instance')
controller.load_pretrain('pretrain_model/ernie/params')
controller.train()
controller = palm.Controller(config='config.yaml', task_dir='task_instance', for_train=False)
controller.pred('mrqa', inference_model_dir='output_model/firstrun/infer_model')
......@@ -21,6 +21,10 @@ else
exit 1
fi
if [[ ! -d pretrain_model ]]; then
mkdir pretrain_model
fi
cd pretrain_model
mkdir $name
cd $name
......
main_task: "reading_comprehension"
auxiliary_task: "mask_language_model answer_matching"
do_train: True
do_predict: True
use_cuda: True
checkpoint_path: "output_model/firstrun"
backbone_model: "bert_model"
pretrain_model_path: "pretrain_model/bert"
pretrain_config_path: "pretrain_model/bert/bert_config.json"
vocab_path: "pretrain_model/bert/vocab.txt"
# backbone_model: "ernie_model"
# pretrain_model_path: "pretrain_model/ernie/params"
# pretrain_config_path: "pretrain_model/ernie/ernie_config.json"
# vocab_path: "pretrain_model/ernie/vocab.txt"
optimizer: "bert_optimizer"
learning_rate: 3e-5
lr_scheduler: "linear_warmup_decay"
skip_steps: 10
save_steps: 10000
epoch: 2
warmup_proportion: 0.1
weight_decay: 0.1
do_lower_case: True
max_seq_len: 512
use_ema: True
ema_decay: 0.9999
random_seed: 0
loss_scaling: 1.0
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import os
import sys
import time
import argparse
import importlib
import collections
import numpy as np
import multiprocessing
import paddle
import paddle.fluid as fluid
from utils.configure import PDConfig
from utils.placeholder import Placeholder
from utils.configure import JsonConfig, ArgumentGroup, print_arguments
from utils.init import init_pretraining_params, init_checkpoint
sys.path.append("reader")
import joint_reader
from joint_reader import create_reader
sys.path.append("optimizer")
sys.path.append("paradigm")
sys.path.append("backbone")
TASKSET_PATH="config"
def train(multitask_config):
# load task config
print("Loading multi_task configure...................")
args = PDConfig(yaml_file=[multitask_config])
args.build()
index = 0
reader_map_task = dict()
task_args_list = list()
reader_args_list = list()
id_map_task = {index: args.main_task}
print("Loading main task configure....................")
main_task_name = args.main_task
task_config_files = [i for i in os.listdir(TASKSET_PATH) if i.endswith('.yaml')]
main_config_list = [config for config in task_config_files if config.split('.')[0] == main_task_name]
main_args = None
for config in main_config_list:
main_yaml = os.path.join(TASKSET_PATH, config)
main_args = PDConfig(yaml_file=[multitask_config, main_yaml])
main_args.build()
main_args.Print()
if not task_args_list or main_task_name != task_args_list[-1][0]:
task_args_list.append((main_task_name, main_args))
            reader_args_list.append((config[:-len('.yaml')], main_args))
            reader_map_task[config[:-len('.yaml')]] = main_task_name
print("Loading auxiliary tasks configure...................")
aux_task_name_list = args.auxiliary_task.strip().split()
for aux_task_name in aux_task_name_list:
index += 1
id_map_task[index] = aux_task_name
print("Loading %s auxiliary tasks configure......." % aux_task_name)
aux_config_list = [config for config in task_config_files if config.split('.')[0] == aux_task_name]
for aux_yaml in aux_config_list:
aux_yaml = os.path.join(TASKSET_PATH, aux_yaml)
aux_args = PDConfig(yaml_file=[multitask_config, aux_yaml])
aux_args.build()
aux_args.Print()
if aux_task_name != task_args_list[-1][0]:
task_args_list.append((aux_task_name, aux_args))
                reader_args_list.append((aux_yaml[:-len('.yaml')], aux_args))
                reader_map_task[aux_yaml[:-len('.yaml')]] = aux_task_name
# import tasks reader module and build joint_input_shape
input_shape_list = []
reader_module_dict = {}
input_shape_dict = {}
for params in task_args_list:
task_reader_mdl = "%s_reader" % params[0]
reader_module = importlib.import_module(task_reader_mdl)
reader_servlet_cls = getattr(reader_module, "get_input_shape")
reader_input_shape = reader_servlet_cls(params[1])
reader_module_dict[params[0]] = reader_module
input_shape_list.append(reader_input_shape)
input_shape_dict[params[0]] = reader_input_shape
train_input_shape, test_input_shape, task_map_id = joint_reader.joint_input_shape(input_shape_list)
# import backbone model
backbone_mdl = args.backbone_model
backbone_cls = "Model"
backbone_module = importlib.import_module(backbone_mdl)
backbone_servlet = getattr(backbone_module, backbone_cls)
if not (args.do_train or args.do_predict):
raise ValueError("For args `do_train` and `do_predict`, at "
"least one of them must be True.")
if args.use_cuda:
place = fluid.CUDAPlace(0)
dev_count = fluid.core.get_cuda_device_count()
else:
place = fluid.CPUPlace()
dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
exe = fluid.Executor(place)
startup_prog = fluid.default_startup_program()
if args.random_seed is not None:
startup_prog.random_seed = args.random_seed
if args.do_train:
#create joint pyreader
print('creating readers...')
gens = []
main_generator = ""
for params in reader_args_list:
generator_cls = getattr(reader_module_dict[reader_map_task[params[0]]], "DataProcessor")
generator_inst = generator_cls(params[1])
reader_generator = generator_inst.data_generator(phase='train', shuffle=True, dev_count=dev_count)
if not main_generator:
main_generator = generator_inst
gens.append((reader_generator, params[1].mix_ratio, reader_map_task[params[0]]))
joint_generator, train_pyreader, model_inputs = create_reader("train_reader", train_input_shape, True, task_map_id, gens)
train_pyreader.decorate_tensor_provider(joint_generator)
# build task inputs
task_inputs_list = []
main_test_input = []
task_id = model_inputs[0]
backbone_inputs = model_inputs[task_map_id[0][0]: task_map_id[0][1]]
for i in range(1, len(task_map_id)):
task_inputs = backbone_inputs + model_inputs[task_map_id[i][0]: task_map_id[i][1]]
task_inputs_list.append(task_inputs)
# build backbone model
print('building model backbone...')
conf = vars(args)
if args.pretrain_config_path is not None:
model_conf = JsonConfig(args.pretrain_config_path).asdict()
for k, v in model_conf.items():
if k in conf:
                    assert v == conf[k], "ERROR: argument {} in pretrain_model_config is NOT consistent with which in main.yaml".format(k)
conf.update(model_conf)
backbone_inst = backbone_servlet(conf, is_training=True)
print('building task models...')
num_train_examples = main_generator.get_num_examples()
if main_args.in_tokens:
max_train_steps = int(main_args.epoch * num_train_examples) // (
main_args.batch_size // main_args.max_seq_len) // dev_count
else:
max_train_steps = int(main_args.epoch * num_train_examples) // (
main_args.batch_size) // dev_count
mix_ratio_list = [task_args[1].mix_ratio for task_args in task_args_list]
args.max_train_steps = int(max_train_steps * (sum(mix_ratio_list) / main_args.mix_ratio))
print("Max train steps: %d" % max_train_steps)
build_strategy = fluid.BuildStrategy()
train_program = fluid.default_main_program()
with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard():
backbone_inst.build_model(backbone_inputs)
all_loss_list = []
for i in range(len(task_args_list)):
task_name = task_args_list[i][0]
task_args = task_args_list[i][1]
if hasattr(task_args, 'paradigm'):
task_net = task_args.paradigm
else:
task_net = task_name
task_net_mdl = importlib.import_module(task_net)
task_net_cls = getattr(task_net_mdl, "create_model")
output_tensor = task_net_cls(task_inputs_list[i], base_model=backbone_inst, is_training=True, args=task_args)
loss_cls = getattr(task_net_mdl, "compute_loss")
task_loss = loss_cls(output_tensor, task_args)
all_loss_list.append(task_loss)
num_seqs = output_tensor['num_seqs']
task_one_hot = fluid.layers.one_hot(task_id, len(task_args_list))
all_loss = fluid.layers.concat(all_loss_list, axis=0)
loss = fluid.layers.reduce_sum(task_one_hot * all_loss)
programs = [train_program, startup_prog]
optimizer_mdl = importlib.import_module(args.optimizer)
optimizer_inst = getattr(optimizer_mdl, "optimization")
optimizer_inst(loss, programs, args=args)
loss.persistable = True
num_seqs.persistable = True
ema = fluid.optimizer.ExponentialMovingAverage(args.ema_decay)
ema.update()
train_compiled_program = fluid.CompiledProgram(train_program).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
if args.do_predict:
conf = vars(args)
if args.pretrain_config_path is not None:
model_conf = JsonConfig(args.pretrain_config_path).asdict()
for k, v in model_conf.items():
if k in conf:
assert v == conf[k], "ERROR: argument {} in pretrain_model_config is NOT consistent with which in main.yaml".format(k)
conf.update(model_conf)
mod = reader_module_dict[main_task_name]
DataProcessor = getattr(mod, 'DataProcessor')
predict_processor = DataProcessor(main_args)
test_generator = predict_processor.data_generator(
phase='predict',
shuffle=False,
dev_count=dev_count)
new_test_input_shape = input_shape_dict[main_task_name][1]['backbone'] + input_shape_dict[main_task_name][1]['task']
assert new_test_input_shape == test_input_shape
build_strategy = fluid.BuildStrategy()
test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
placeholder = Placeholder(test_input_shape)
test_pyreader, model_inputs = placeholder.build(
capacity=100, reader_name="test_reader")
test_pyreader.decorate_tensor_provider(test_generator)
# create model
backbone_inst = backbone_servlet(conf, is_training=False)
backbone_inst.build_model(model_inputs)
task_net_mdl = importlib.import_module(main_task_name)
task_net_cls = getattr(task_net_mdl, "create_model")
postprocess = getattr(task_net_mdl, "postprocess")
global_postprocess = getattr(task_net_mdl, "global_postprocess")
output_tensor = task_net_cls(model_inputs, base_model=backbone_inst, is_training=False, args=main_args)
if 'ema' not in dir():
ema = fluid.optimizer.ExponentialMovingAverage(args.ema_decay)
pred_fetch_names = []
fetch_vars = []
for i,j in output_tensor.items():
pred_fetch_names.append(i)
fetch_vars.append(j)
for var in fetch_vars:
var.persistable = True
pred_fetch_list = [i.name for i in fetch_vars]
test_prog = test_prog.clone(for_test=True)
test_compiled_program = fluid.CompiledProgram(test_prog).with_data_parallel(
build_strategy=build_strategy)
exe.run(startup_prog)
if args.do_train:
if args.pretrain_model_path:
init_pretraining_params(
exe,
args.pretrain_model_path,
main_program=startup_prog,
use_fp16=False)
if args.checkpoint_path:
if os.path.exists(args.checkpoint_path):
init_checkpoint(
exe,
args.checkpoint_path,
main_program=startup_prog,
use_fp16=False)
else:
os.makedirs(args.checkpoint_path)
elif args.do_predict:
if not args.checkpoint_path:
raise ValueError("args 'checkpoint_path' should be set if"
"only doing prediction!")
init_checkpoint(
exe,
args.checkpoint_path,
main_program=test_prog,
use_fp16=False)
if args.do_train:
print('start training...')
train_pyreader.start()
steps = 0
total_cost, total_num_seqs = [], []
time_begin = time.time()
while True:
try:
steps += 1
if steps % args.skip_steps == 0:
fetch_list = [loss.name, num_seqs.name, task_id.name]
else:
fetch_list = []
outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
if steps % args.skip_steps == 0:
np_loss, np_num_seqs, np_task_id = outputs
total_cost.extend(np_loss * np_num_seqs)
total_num_seqs.extend(np_num_seqs)
time_end = time.time()
used_time = time_end - time_begin
current_example, epoch = main_generator.get_train_progress()
cur_task_name = id_map_task[np_task_id[0][0]]
print("epoch: %d, task_name: %s, progress: %d/%d, step: %d, loss: %f, "
"speed: %f steps/s" %
(epoch, cur_task_name, current_example, num_train_examples, steps,
np.sum(total_cost) / np.sum(total_num_seqs),
args.skip_steps / used_time))
total_cost, total_num_seqs = [], []
time_begin = time.time()
if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoint_path,
"step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
if steps == max_train_steps:
save_path = os.path.join(args.checkpoint_path,
"step_" + str(steps) + "_final")
fluid.io.save_persistables(exe, save_path, train_program)
break
except paddle.fluid.core.EOFException as err:
save_path = os.path.join(args.checkpoint_path,
"step_" + str(steps) + "_final")
fluid.io.save_persistables(exe, save_path, train_program)
train_pyreader.reset()
break
if args.do_predict:
print('start predicting...')
cnt = 0
if args.use_ema:
with ema.apply(exe):
test_pyreader.start()
pred_buf = []
while True:
try:
fetch_res = exe.run(fetch_list=pred_fetch_list, program=test_compiled_program)
cnt += 1
if cnt % 200 == 0:
print('predicting {}th batch...'.format(cnt))
fetch_dict = {}
for key,val in zip(pred_fetch_names, fetch_res):
fetch_dict[key] = val
res = postprocess(fetch_dict)
if res is not None:
pred_buf.extend(res)
except fluid.core.EOFException:
test_pyreader.reset()
break
global_postprocess(pred_buf, predict_processor, args, main_args)
else:
test_pyreader.start()
pred_buf = []
while True:
try:
fetch_res = exe.run(fetch_list=pred_fetch_list, program=test_compiled_program)
cnt += 1
if cnt % 200 == 0:
print('predicting {}th batch...'.format(cnt))
fetch_dict = {}
for key,val in zip(pred_fetch_names, fetch_res):
fetch_dict[key] = val
res = postprocess(fetch_dict)
if res is not None:
pred_buf.extend(res)
except fluid.core.EOFException:
test_pyreader.reset()
break
global_postprocess(pred_buf, predict_processor, args, main_args)
if __name__ == '__main__':
multitask_config = "mtl_config.yaml"
train(multitask_config)
This directory stores the optimizers. Users can add custom optimizers by implementing the related interface; a sketch follows.
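For reference, `mtl_run.py` imports the module named by the `optimizer` field and calls its `optimization(loss, programs, args=...)` entry point, so a minimal custom optimizer module could look like the sketch below. The module name `my_sgd` and the plain SGD choice are illustrative assumptions.

```python
# my_sgd.py -- illustrative custom optimizer module; the entry-point name
# `optimization` matches what mtl_run.py looks up via getattr.
import paddle.fluid as fluid

def optimization(loss, programs, args=None):
    train_program, startup_program = programs
    with fluid.program_guard(train_program, startup_program):
        optimizer = fluid.optimizer.SGD(learning_rate=args.learning_rate)
        optimizer.minimize(loss)
```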
import sys
from paddlepalm.mtl_controller import Controller
sys.path.append('paddlepalm')
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -11,24 +12,27 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model"""
"""v1.1
BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle import fluid
from paddle.fluid import layers
import backbone.utils.transformer as transformer
from paddlepalm.backbone.utils.transformer import pre_process_layer, encoder
from paddlepalm.interface import backbone
class Model(object):
class Model(backbone):
def __init__(self,
config,
is_training=False,
model_name=''):
phase):
# self._is_training = phase == 'train'  # a backbone generally need not care about the phase, since its outputs barely change across phases
self._emb_size = config["hidden_size"]
self._n_layer = config["num_hidden_layers"]
self._n_head = config["num_attention_heads"]
......@@ -39,8 +43,6 @@ class Model(object):
self._prepostprocess_dropout = config["hidden_dropout_prob"]
self._attention_dropout = config["attention_probs_dropout_prob"]
self._is_training = is_training
self.model_name = model_name
self._word_emb_name = self.model_name + "word_embedding"
......@@ -52,35 +54,48 @@ class Model(object):
self._param_initializer = fluid.initializer.TruncatedNormal(
scale=config["initializer_range"])
def build_model(self, reader_input, use_fp16=False):
dtype = "float16" if use_fp16 else "float32"
@property
def inputs_attr(self):
return {"token_ids": [-1, self._max_position_seq_len, 1], 'int64'],
"position_ids": [-1, self._max_position_seq_len, 1], 'int64'],
"segment_ids": [-1, self._max_position_seq_len, 1], 'int64'],
"input_mask": [-1, self._max_position_seq_len, 1], 'float32']}
src_ids, pos_ids, sent_ids, input_mask = reader_input[:4]
@property
def outputs_attr(self):
return {"word_emb": [-1, self._max_position_seq_len, self._emb_size],
"sentence_emb": [-1, self._emb_size],
"sentence_pair_emb": [-1, self._emb_size]}
def build(self, inputs):
src_ids = inputs['token_ids']
pos_ids = inputs['position_ids']
sent_ids = inputs['segment_ids']
input_mask = inputs['input_mask']
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
emb_out = layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=dtype,
dtype="float32",
param_attr=fluid.ParamAttr(
name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
self.emb_out = emb_out
position_emb_out = fluid.layers.embedding(
position_emb_out = layers.embedding(
input=pos_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=dtype,
dtype="float32",
param_attr=fluid.ParamAttr(
name=self._pos_emb_name, initializer=self._param_initializer))
self.position_emb_out = position_emb_out
sent_emb_out = fluid.layers.embedding(
sent_emb_out = layers.embedding(
sent_ids,
size=[self._sent_types, self._emb_size],
dtype=dtype,
dtype="float32"
param_attr=fluid.ParamAttr(
name=self._sent_emb_name, initializer=self._param_initializer))
......@@ -88,24 +103,21 @@ class Model(object):
emb_out = emb_out + position_emb_out + sent_emb_out
emb_out = transformer.pre_process_layer(
emb_out = pre_process_layer(
emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=dtype)
self_attn_mask = fluid.layers.matmul(
self_attn_mask = layers.matmul(
x = input_mask, y = input_mask, transpose_y = True)
self_attn_mask = fluid.layers.scale(
self_attn_mask = layers.scale(
x = self_attn_mask, scale = 10000.0, bias = -1.0, bias_after_scale = False)
n_head_self_attn_mask = fluid.layers.stack(
n_head_self_attn_mask = layers.stack(
x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = transformer.encoder(
enc_out = encoder(
enc_input = emb_out,
attn_bias = n_head_self_attn_mask,
n_layer = self._n_layer,
......@@ -123,9 +135,9 @@ class Model(object):
param_initializer = self._param_initializer,
name = self.model_name + 'encoder')
next_sent_feat = fluid.layers.slice(
input = self._enc_out, axes = [1], starts = [0], ends = [1])
self.next_sent_feat = fluid.layers.fc(
next_sent_feat = layers.slice(
input = enc_out, axes = [1], starts = [0], ends = [1])
next_sent_feat = layers.fc(
input = next_sent_feat,
size = self._emb_size,
act = "tanh",
......@@ -133,19 +145,12 @@ class Model(object):
name = self.model_name + "pooled_fc.w_0",
initializer = self._param_initializer),
bias_attr = "pooled_fc.b_0")
@property
def final_word_representation(self):
"""final layer output of transformer encoder as the (contextual) word representation"""
return self._enc_out
@property
def final_sentence_representation(self):
"""final representation of the first token ([CLS]) as sentence representation """
return self.next_sent_feat
return {'word_emb': enc_out,
'sentence_emb': next_sent_feat,
'sentence_pair_emb': next_sent_feat}
if __name__ == "__main__":
print("hello world!")
def postprocess(self, rt_outputs):
pass
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid import layers

from paddlepalm.interface import backbone
class Model(backbone):
def __init__(self, config, phase):
# self._is_training = phase == 'train'  # a backbone generally need not care about the phase, since its outputs barely change across phases
self._emb_size = config["emb_size"]
self._voc_size = config["vocab_size"]
@property
def inputs_attr(self):
return {"token_ids": [-1, self._max_position_seq_len, 1], 'int64']}
@property
def outputs_attr(self):
return {"word_emb": [-1, self._max_position_seq_len, self._emb_size],
"sentence_emb": [-1, self._emb_size*2]}
def build(self, inputs):
tok_ids = inputs['token_ids']
emb_out = layers.embedding(
input=tok_ids,
size=[self._voc_size, self._emb_size],
dtype='float32',
param_attr=fluid.ParamAttr(
name='word_emb',
initializer=fluid.initializer.TruncatedNormal(scale=0.1)),
is_sparse=False)
sent_emb1 = layers.reduce_mean(emb_out, axis=1)
sent_emb2 = layers.reduce_max(emb_out, axis=1)
sent_emb = layers.concat([sent_emb1, sent_emb2], axis=1)
return {'word_emb': emb_out,
'sentence_emb': sent_emb}
def postprocess(self, rt_outputs):
pass
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -19,16 +20,20 @@ from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import
import paddle.fluid as fluid
from paddle import fluid
from paddle.fluid import layers
import backbone.utils.transformer4ernie as transformer
from paddlepalm.backbone.utils.transformer import pre_process_layer, encoder
from paddlepalm.interface import backbone
class Model(object):
class Model(backbone):
def __init__(self,
config,
is_training=False,
):
phase):
# self._is_training = phase == 'train'  # a backbone generally need not care about the phase, since its outputs barely change across phases
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
......@@ -40,6 +45,8 @@ class Model(object):
else:
self._sent_types = config['type_vocab_size']
self._task_types = config['task_type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
......@@ -53,12 +60,29 @@ class Model(object):
self._param_initializer = fluid.initializer.TruncatedNormal(
scale=config['initializer_range'])
@property
def inputs_attr(self):
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"task_ids": [[-1,-1, 1], 'int64']}
@property
def outputs_attr(self):
return {"word_embedding": [[-1, -1, self._emb_size], 'float32'],
"encoder_outputs": [[-1, -1, self._emb_size], 'float32'],
"sentence_embedding": [[-1, self._emb_size], 'float32'],
"sentence_pair_embedding": [[-1, self._emb_size], 'float32']}
def build_model(self, reader_input, use_fp16=False):
def build(self, inputs):
dtype = "float16" if use_fp16 else "float32"
src_ids = inputs['token_ids']
pos_ids = inputs['position_ids']
sent_ids = inputs['segment_ids']
input_mask = inputs['input_mask']
task_ids = inputs['task_ids']
src_ids, pos_ids, sent_ids, input_mask = reader_input[:4]
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
......@@ -85,12 +109,19 @@ class Model(object):
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = transformer.pre_process_layer(
task_emb_out = fluid.layers.embedding(
task_ids,
size=[self._task_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(
name=self._task_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + task_emb_out
emb_out = pre_process_layer(
emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if dtype == "float16":
emb_out = fluid.layers.cast(x=emb_out, dtype=dtype)
input_mask = fluid.layers.cast(x=input_mask, dtype=dtype)
self_attn_mask = fluid.layers.matmul(
x=input_mask, y=input_mask, transpose_y=True)
......@@ -100,7 +131,7 @@ class Model(object):
x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = transformer.encoder(
enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
......@@ -117,20 +148,11 @@ class Model(object):
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
if dtype == "float16":
self._enc_out = fluid.layers.cast(
x=self._enc_out, dtype=self._emb_dtype)
@property
def final_word_representation(self):
return self._enc_out
@property
def final_sentence_representation(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(
input=self._enc_out, axes=[1], starts=[0], ends=[1])
input=enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.reshape(next_sent_feat, [-1, next_sent_feat.shape[-1]])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
......@@ -138,5 +160,11 @@ class Model(object):
param_attr=fluid.ParamAttr(
name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
return {'word_embedding': emb_out,
'encoder_outputs': enc_out,
'sentence_embedding': next_sent_feat,
'sentence_pair_embedding': next_sent_feat}
def postprocess(self, rt_outputs):
pass
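# A hedged usage sketch (not part of the original file; the values are
# illustrative, and some keys, e.g. vocab_size / num_attention_heads, are
# assumed to be read in the elided parts of __init__):
#
#   config = {'hidden_size': 768, 'num_hidden_layers': 12,
#             'num_attention_heads': 12, 'vocab_size': 18000,
#             'type_vocab_size': 2, 'task_type_vocab_size': 3,
#             'hidden_act': 'relu', 'hidden_dropout_prob': 0.1,
#             'attention_probs_dropout_prob': 0.1, 'initializer_range': 0.02}
#   model = Model(config, phase='train')
#   outputs = model.build(inputs)  # inputs conforms to model.inputs_attr
#   enc_out = outputs['encoder_outputs']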
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -22,7 +23,6 @@ from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
......
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -11,49 +12,31 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import paddle.fluid as fluid
BACKBONE_DIR='paddlepalm.backbone'
TASK_INSTANCE_DIR='paddlepalm.task_instance'
READER_DIR='paddlepalm.reader'
PARADIGM_DIR='paddlepalm.task_paradigm'
OPTIMIZER_DIR='paddlepalm.optimizer'
OPTIMIZE_METHOD='optimize'
REQUIRED_ARGS={
'task_instance': str,
'backbone': str,
'optimizer': str,
'learning_rate': float,
'batch_size': int
}
OPTIONAL_ARGS={
'mix_ratio': str,
'target_tag': str,
'reuse_rag': str
}
TASK_REQUIRED_ARGS={
'paradigm': str,
'reader': str,
'train_file': str
}
def compute_loss(output_tensors, args=None):
"""Compute loss for mrc model"""
labels = output_tensors['labels']
logits = output_tensors['logits']
ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=labels, return_softmax=True)
loss = fluid.layers.mean(x=ce_loss)
return loss
def create_model(reader_input, base_model=None, is_training=True, args=None):
"""
given the base model and reader_input, return the output tensors
"""
labels = reader_input[-1]
cls_feats = base_model.final_sentence_representation
cls_feats = fluid.layers.dropout(
x=cls_feats,
dropout_prob=0.1,
dropout_implementation="upscale_in_train")
logits = fluid.layers.fc(
input=cls_feats,
size=2,
param_attr=fluid.ParamAttr(
name="cls_out_w",
initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
bias_attr=fluid.ParamAttr(
name="cls_out_b", initializer=fluid.initializer.Constant(0.)))
num_seqs = fluid.layers.fill_constant(shape=[1], value=512, dtype='int64')
output_tensors = {}
output_tensors['labels'] = labels
output_tensors['logits'] = logits
output_tensors['num_seqs'] = num_seqs
return output_tensors
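# A hedged usage sketch (not part of the original file): once a backbone has
# been built on top of reader_input, the two helpers chain together as
#
#   output_tensors = create_model(reader_input, base_model=backbone_model)
#   loss = compute_loss(output_tensors)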
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""v1.1"""
class reader(object):
"""interface of data manager."""
def __init__(self, config):
assert isinstance(config, dict)
# @property
# def inputs_attr(self):
# """Describes the attributes of the reader's input objects: each object's name, shape and dtype. For a scalar object (e.g. str, int, float), shape should be an empty list []; for a dimension of variable length, the corresponding entry in shape should be -1.
# Return:
# dict. Attribute descriptions of each input object. For example,
# a text classification task may need the input text and the label id:
# {"text": ([], 'str'),
# "label": ([], 'int')}
# a tagging task may need the token sequence and the corresponding tags:
# {"tokens", ([-1], 'str'),
# "tags", ([-1], 'str')}
# a machine reading comprehension task may need the paragraph, the question, and the start/end positions of the answer span:
# {"paragraph", ([], 'str'),
# "question", ([], 'str'),
# "start_position", ([], 'int')
# """
# raise NotImplementedError()
@property
def outputs_attr(self):
"""描述reader输出对象(被yield出的对象)的属性,包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
注意:当使用mini-batch梯度下降学习策略时,,应为常规的输入对象设置batch_size维度(一般为-1)
Return:
dict类型。对各个输入对象的属性描述。例如,
对于文本分类和匹配任务,yield的输出内容可能包含如下的对象(下游backbone和task可按需访问其中的对象)
{"token_ids": ([-1, max_len], 'int64'),
"input_ids": ([-1, max_len], 'int64'),
"segment_ids": ([-1, max_len], 'int64'),
"input_mask": ([-1, max_len], 'float32'),
"label": ([-1], 'int')}
"""
raise NotImplementedError()
# def parse_line(self):
# """Internally the framework describes each example with a dict: the keys come from inputs_attr and each value conforms to the corresponding attribute description. This method parses one text line into such a dict. The default parse_line reads JSON-formatted datasets, where each line of the dataset file is one JSON-described example.
# Users can override this method to adapt to other dataset formats, e.g. csv or even tfrecord files.
# """
# raise NotImplementedError()
#
# def tokenize(self, line):
# """The framework ships with built-in tokenizers such as the word piece tokenizer; users can pick one via the tokenizer hyper-parameter. If none of the built-in tokenizers fits, users can override this method to plug in a custom tokenizer.
# Args:
# - line: a unicode string.
# Return:
# a list of tokens
# """
# raise NotImplementedError()
def iterator(self):
"""数据集遍历接口,注意,当数据集遍历到尾部时该接口应自动完成指针重置,即重新从数据集头部开始新的遍历。
Yield:
(dict) elements that meet the requirements in output_templete
"""
raise NotImplementedError()
@property
def num_examples(self):
"""数据集中的样本数量,即每个epoch中iterator所生成的样本数。注意,使用滑动窗口等可能导致数据集样本数发生变化的策略时,该接口应返回runtime阶段的实际样本数。"""
raise NotImplementedError()
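# A minimal sketch of a custom reader implementing this interface (illustrative
# only, not part of the framework; the toy in-memory dataset is an assumption):
#
#   class ToyReader(reader):
#       def __init__(self, config):
#           super(ToyReader, self).__init__(config)
#           self._data = [([1, 5, 9], 0), ([2, 7], 1)]
#       @property
#       def outputs_attr(self):
#           return {"token_ids": ([-1, -1], 'int64'),
#                   "label": ([-1], 'int64')}
#       def iterator(self):
#           while True:  # auto-reset: restart from the head after each pass
#               for tokens, label in self._data:
#                   yield {"token_ids": tokens, "label": label}
#       @property
#       def num_examples(self):
#           return len(self._data)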
class backbone(object):
"""interface of backbone model."""
def __init__(self, config, phase):
"""
Args:
config: dict. The hyper-parameters defined in the multi-task config file and the pretrained model config file.
phase: str. The running phase; currently 'train' and 'predict' are supported.
"""
assert isinstance(config, dict)
@property
def inputs_attr(self):
"""描述backbone从reader处需要得到的输入对象的属性,包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个输入对象的属性描述。例如,
对于文本分类和匹配任务,bert backbone依赖的reader对象主要包含如下的对象
{"token_ids": ([-1, max_len], 'int64'),
"input_ids": ([-1, max_len], 'int64'),
"segment_ids": ([-1, max_len], 'int64'),
"input_mask": ([-1, max_len], 'float32')}"""
raise NotImplementedError()
@property
def outputs_attr(self):
"""描述backbone输出对象的属性,包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个输出对象的属性描述。例如,
对于文本分类和匹配任务,bert backbone的输出内容可能包含如下的对象
{"word_emb": ([-1, max_seqlen, word_emb_size], 'float32'),
"sentence_emb": ([-1, hidden_size], 'float32'),
"sim_vec": ([-1, hidden_size], 'float32')}"""
raise NotImplementedError()
def build(self, inputs):
"""建立backbone的计算图。将符合inputs_attr描述的静态图Variable输入映射成符合outputs_attr描述的静态图Variable输出。
Args:
inputs: dict类型。字典中包含inputs_attr中的对象名到计算图Variable的映射,inputs中至少会包含inputs_attr中定义的对象
Return:
需要输出的计算图变量,输出对象会被加入到fetch_list中,从而在每个训练/推理step时得到runtime的计算结果,该计算结果会被传入postprocess方法中供用户处理。
"""
raise NotImplementedError()
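# A minimal sketch of a custom backbone implementing this interface (a toy
# bag-of-words encoder; all names and sizes are illustrative assumptions):
#
#   class BOWBackbone(backbone):
#       def __init__(self, config, phase):
#           super(BOWBackbone, self).__init__(config, phase)
#           self._emb_size = config['hidden_size']
#           self._vocab_size = config['vocab_size']
#       @property
#       def inputs_attr(self):
#           return {"token_ids": [[-1, -1], 'int64']}
#       @property
#       def outputs_attr(self):
#           return {"sentence_embedding": [[-1, self._emb_size], 'float32']}
#       def build(self, inputs):
#           emb = fluid.layers.embedding(inputs['token_ids'],
#                                        size=[self._vocab_size, self._emb_size])
#           sent = fluid.layers.reduce_mean(emb, dim=1)  # mean over tokens
#           return {'sentence_embedding': sent}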
class task_paradigm(object):
def __init__(self, config, phase, backbone_config):
"""
config: dict. The hyper-parameters defined in the task instance config file and the multi-task config file.
phase: str. The running phase; currently 'train' and 'predict' are supported.
"""
@property
def inputs_attrs(self):
"""描述task_layer需要从reader, backbone等输入对象集合所读取到的输入对象的属性,第一级key为对象集和的名字,如backbone,reader等(后续会支持更灵活的输入),第二级key为对象集和中各对象的属性,包括对象的名字,shape和dtype。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个对象集及其输入对象的属性描述。"""
raise NotImplementedError()
@property
def outputs_attr(self):
"""描述task输出对象的属性,包括对象的名字,shape和dtype。输出对象会被加入到fetch_list中,从而在每个训练/推理step时得到runtime的计算结果,该计算结果会被传入postprocess方法中供用户处理。
当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个输入对象的属性描述。注意,训练阶段必须包含名为loss的输出对象。
"""
raise NotImplementedError()
def build(self, inputs):
"""建立task_layer的计算图。将符合inputs_attrs描述的来自各个对象集的静态图Variables映射成符合outputs_attr描述的静态图Variable输出。
Args:
inputs: dict类型。字典中包含inputs_attrs中的对象名到计算图Variable的映射,inputs中至少会包含inputs_attr中定义的对象
Return:
需要输出的计算图变量,输出对象会被加入到fetch_list中,从而在每个训练/推理step时得到runtime的计算结果,该计算结果会被传入postprocess方法中供用户处理。
"""
raise NotImplementedError()
def postprocess(self, rt_outputs):
"""每个训练或推理step后针对当前batch的task_layer的runtime计算结果进行相关后处理。注意,rt_outputs除了包含build方法,还自动包含了loss的计算结果。"""
pass
def post_postprocess(self, global_buffer):
pass
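# A minimal sketch of a custom task_paradigm implementing this interface (an
# illustrative regression head; all names below are assumptions):
#
#   class MSEParadigm(task_paradigm):
#       def __init__(self, config, phase, backbone_config):
#           self._is_training = phase == 'train'
#           self._hidden_size = backbone_config['hidden_size']
#       @property
#       def inputs_attrs(self):
#           return {'reader': {"label": [[-1, 1], 'float32']},
#                   'backbone': {"sentence_embedding": [[-1, self._hidden_size], 'float32']}}
#       @property
#       def outputs_attr(self):
#           return {"loss": [[1], 'float32']}
#       def build(self, inputs):
#           pred = fluid.layers.fc(inputs['backbone']['sentence_embedding'], size=1)
#           loss = fluid.layers.mean(
#               fluid.layers.square_error_cost(pred, inputs['reader']['label']))
#           return {'loss': loss}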
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -48,25 +49,23 @@ def linear_warmup_decay(learning_rate, warmup_steps, num_train_steps):
return lr
def optimization(loss, programs, args):
train_program = programs[0]
startup_prog = programs[1]
warmup_steps = args.max_train_steps * args.warmup_proportion
def optimize(loss, config, max_train_steps=None, warmup_steps=0, train_program=None):
if warmup_steps > 0:
if args.lr_scheduler == 'noam_decay':
decay_strategy = config.get('lr_scheduler', 'linear_warmup_decay')
if decay_strategy == 'noam_decay':
scheduled_lr = fluid.layers.learning_rate_scheduler\
.noam_decay(1/(warmup_steps *(float(args.learning_rate) ** 2)),
.noam_decay(1/(warmup_steps *(config['learning_rate'] ** 2)),
warmup_steps)
elif args.lr_scheduler == 'linear_warmup_decay':
scheduled_lr = linear_warmup_decay(float(args.learning_rate), warmup_steps,
args.max_train_steps)
elif decay_strategy == 'linear_warmup_decay':
scheduled_lr = linear_warmup_decay(config['learning_rate'], warmup_steps,
max_train_steps)
else:
raise ValueError("Unkown learning rate scheduler, should be "
raise ValueError("Unkown lr_scheduler, should be "
"'noam_decay' or 'linear_warmup_decay'")
optimizer = fluid.optimizer.Adam(learning_rate=scheduled_lr)
else:
optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
scheduled_lr = args.learning_rate
optimizer = fluid.optimizer.Adam(learning_rate=config['learning_rate'])
scheduled_lr = config['learning_rate']
clip_norm_thres = 1.0
# When using mixed precision training, scale the gradient clip threshold
......@@ -91,13 +90,19 @@ def optimization(loss, programs, args):
_, param_grads = optimizer.minimize(loss)
if args.weight_decay > 0:
for block in fluid.default_main_program().blocks:
for var_name in block.vars:
if var_name.startswith("embedding"):
print(block.vars[var_name])
if config.get('weight_decay', 0) > 0:
for param, grad in param_grads:
if exclude_from_weight_decay(param.name):
continue
with param.block.program._optimized_guard(
[param, grad]), fluid.framework.name_scope("weight_decay"):
updated_param = param - param_list[
param.name] * args.weight_decay * scheduled_lr
param.name] * config['weight_decay'] * scheduled_lr
fluid.layers.assign(output=param, input=updated_param)
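# A hedged usage sketch for optimize() (not part of the original file; the
# config values are illustrative assumptions):
#
#   config = {'learning_rate': 3e-5, 'lr_scheduler': 'linear_warmup_decay',
#             'weight_decay': 0.01}
#   optimize(loss, config, max_train_steps=10000, warmup_steps=1000,
#            train_program=fluid.default_main_program())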
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader
from paddlepalm.reader.utils.reader4ernie import ClassifyReader
class Reader(reader):
def __init__(self, config, phase='train', dev_count=1, print_prefix=''):
"""
Args:
phase: train, eval, pred
"""
self._is_training = phase == 'train'
reader = ClassifyReader(config['vocab_path'],
max_seq_len=config['max_seq_len'],
do_lower_case=config.get('do_lower_case', False),
for_cn=config.get('for_cn', False),
random_seed=config.get('seed', None))
self._reader = reader
self._dev_count = dev_count
self._batch_size = config['batch_size']
self._max_seq_len = config['max_seq_len']
if phase == 'train':
self._input_file = config['train_file']
self._num_epochs = None  # keep the iterator from terminating
self._shuffle = config.get('shuffle', False)
self._shuffle_buffer = config.get('shuffle_buffer', 5000)
elif phase == 'eval':
self._input_file = config['dev_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
elif phase == 'pred':
self._input_file = config['pred_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
self._phase = phase
# self._batch_size =
self._print_first_n = config.get('print_first_n', 1)
@property
def outputs_attr(self):
if self._is_training:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"label_ids": [[-1,1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64']
}
else:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32']
}
def load_data(self):
self._data_generator = self._reader.data_generator(self._input_file, self._batch_size, self._num_epochs, dev_count=self._dev_count, shuffle=self._shuffle, phase=self._phase)
def iterator(self):
def list_to_dict(x):
names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask',
'label_ids', 'unique_ids']
outputs = {n: i for n,i in zip(names, x)}
del outputs['unique_ids']
if not self._is_training:
del outputs['label_ids']
return outputs
for batch in self._data_generator():
yield list_to_dict(batch)
def get_epoch_outputs(self):
return {'examples': self._reader.get_examples(self._phase),
'features': self._reader.get_features(self._phase)}
@property
def num_examples(self):
return self._reader.get_num_examples(phase=self._phase)
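# A hedged usage sketch (not part of the original file; the paths and values
# are illustrative assumptions):
#
#   config = {'vocab_path': 'vocab.txt', 'max_seq_len': 128, 'batch_size': 32,
#             'train_file': 'data/cls/train.tsv', 'do_lower_case': True}
#   reader = Reader(config, phase='train')
#   reader.load_data()
#   for batch in reader.iterator():
#       ...  # a dict conforming to reader.outputs_attr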
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader
from paddlepalm.reader.utils.reader4ernie import BaseReader
class Reader(reader):
def __init__(self, config, phase='train', dev_count=1, print_prefix=''):
"""
Args:
phase: train, eval, pred
"""
self._is_training = phase == 'train'
reader = BaseReader(config['vocab_path'],
max_seq_len=config['max_seq_len'],
do_lower_case=config.get('do_lower_case', False),
for_cn=config.get('for_cn', False),
random_seed=config.get('seed', None))
self._reader = reader
self._dev_count = dev_count
self._batch_size = config['batch_size']
self._max_seq_len = config['max_seq_len']
if phase == 'train':
self._input_file = config['train_file']
self._num_epochs = None  # keep the iterator from terminating
self._shuffle = config.get('shuffle', False)
self._shuffle_buffer = config.get('shuffle_buffer', 5000)
elif phase == 'eval':
self._input_file = config['dev_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
elif phase == 'pred':
self._input_file = config['pred_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
self._phase = phase
# self._batch_size =
self._print_first_n = config.get('print_first_n', 1)
@property
def outputs_attr(self):
if self._is_training:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"label_ids": [[-1,1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64']
}
else:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32']
}
def load_data(self):
self._data_generator = self._reader.data_generator(self._input_file, self._batch_size, self._num_epochs, dev_count=self._dev_count, shuffle=self._shuffle, phase=self._phase)
def iterator(self):
def list_to_dict(x):
names = ['token_ids', 'position_ids', 'segment_ids', 'input_mask',
'task_ids', 'mask_label', 'mask_pos']
outputs = {n: i for n,i in zip(names, x)}
# note: unlike the classification reader, this reader yields
# mask_label / mask_pos, so there are no unique_ids or label_ids keys to drop
return outputs
for batch in self._data_generator():
yield list_to_dict(batch)
def get_epoch_outputs(self):
return {'examples': self._reader.get_examples(self._phase),
'features': self._reader.get_features(self._phase)}
@property
def num_examples(self):
return self._reader.get_num_examples(phase=self._phase)
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader
from paddlepalm.reader.utils.reader4ernie import MRCReader
class Reader(reader):
def __init__(self, config, phase='train', dev_count=1, print_prefix=''):
"""
Args:
phase: train, eval, pred
"""
self._is_training = phase == 'train'
reader = MRCReader(config['vocab_path'],
max_seq_len=config['max_seq_len'],
do_lower_case=config.get('do_lower_case', False),
tokenizer='FullTokenizer',
doc_stride=config['doc_stride'],
max_query_length=config['max_query_len'],
random_seed=config.get('seed', None))
self._reader = reader
self._dev_count = dev_count
self._batch_size = config['batch_size']
self._max_seq_len = config['max_seq_len']
if phase == 'train':
self._input_file = config['train_file']
# self._num_epochs = config['num_epochs']
self._num_epochs = None  # keep the iterator from terminating
self._shuffle = config.get('shuffle', False)
self._shuffle_buffer = config.get('shuffle_buffer', 5000)
elif phase == 'eval':
self._input_file = config['dev_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
elif phase == 'pred':
self._input_file = config['pred_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
self._phase = phase
# self._batch_size =
self._print_first_n = config.get('print_first_n', 1)
# TODO: without slide window version
self._with_slide_window = config.get('with_slide_window', False)
@property
def outputs_attr(self):
if self._is_training:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"start_positions": [[-1, 1], 'int64'],
"end_positions": [[-1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64']
}
else:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"unique_ids": [[-1, 1], 'int64']
}
@property
def epoch_outputs_attr(self):
if not self._is_training:
return {"examples": None,
"features": None}
def load_data(self):
self._data_generator = self._reader.data_generator(self._input_file, self._batch_size, self._num_epochs, dev_count=self._dev_count, shuffle=self._shuffle, phase=self._phase)
def iterator(self):
def list_to_dict(x):
names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask',
'start_positions', 'end_positions', 'unique_ids']
outputs = {n: i for n,i in zip(names, x)}
if self._is_training:
del outputs['unique_ids']
else:
del outputs['start_positions']
del outputs['end_positions']
return outputs
for batch in self._data_generator():
yield list_to_dict(batch)
def get_epoch_outputs(self):
return {'examples': self._reader.get_examples(self._phase),
'features': self._reader.get_features(self._phase)}
@property
def num_examples(self):
return self._reader.get_num_examples(phase=self._phase)
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -11,7 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
......
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from six.moves import xrange
def mask(batch_tokens,
seg_labels,
mask_word_tags,
total_token_num,
vocab_size,
CLS=1,
SEP=2,
MASK=3):
"""
Add masks to batch_tokens; return the masked tokens (out), mask_label and mask_pos.
Note: mask_pos is computed with respect to batch_tokens after padding.
"""
max_len = max([len(sent) for sent in batch_tokens])
mask_label = []
mask_pos = []
prob_mask = np.random.rand(total_token_num)
# Note: the first token is [CLS], so [low=1]
replace_ids = np.random.randint(1, high=vocab_size, size=total_token_num)
pre_sent_len = 0
prob_index = 0
for sent_index, sent in enumerate(batch_tokens):
mask_flag = False
mask_word = mask_word_tags[sent_index]
prob_index += pre_sent_len
if mask_word:
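# whole-word masking branch: seg_labels marks subword continuations (1) and
# special tokens (-1); a word spans [beg, token_index), the draw at beg
# decides whether the whole word is selected, and each subword in the span
# then gets [MASK], a random id, or its original value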
beg = 0
for token_index, token in enumerate(sent):
seg_label = seg_labels[sent_index][token_index]
if seg_label == 1:
continue
if beg == 0:
if seg_label != -1:
beg = token_index
continue
prob = prob_mask[prob_index + beg]
if prob > 0.15:
pass
else:
for index in xrange(beg, token_index):
prob = prob_mask[prob_index + index]
base_prob = 1.0
if index == beg:
base_prob = 0.15
if base_prob * 0.2 < prob <= base_prob:
mask_label.append(sent[index])
sent[index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + index)
elif base_prob * 0.1 < prob <= base_prob * 0.2:
mask_label.append(sent[index])
sent[index] = replace_ids[prob_index + index]
mask_flag = True
mask_pos.append(sent_index * max_len + index)
else:
mask_label.append(sent[index])
mask_pos.append(sent_index * max_len + index)
if seg_label == -1:
beg = 0
else:
beg = token_index
else:
for token_index, token in enumerate(sent):
prob = prob_mask[prob_index + token_index]
if prob > 0.15:
continue
elif 0.03 < prob <= 0.15:
# mask
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
elif 0.015 < prob <= 0.03:
# random replace
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = replace_ids[prob_index +
token_index]
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
else:
# keep the original token
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
mask_pos.append(sent_index * max_len + token_index)
pre_sent_len = len(sent)
mask_label = np.array(mask_label).astype("int64").reshape([-1, 1])
mask_pos = np.array(mask_pos).astype("int64").reshape([-1, 1])
return batch_tokens, mask_label, mask_pos
def pad_batch_data(insts,
pad_idx=0,
return_pos=False,
return_input_mask=False,
return_max_len=False,
return_num_token=False,
return_seq_lens=False):
"""
Pad the instances to the max sequence length in batch, and generate the
corresponding position data and attention bias.
"""
return_list = []
max_len = max(len(inst) for inst in insts)
# Any token included in dict can be used to pad, since the paddings' loss
# will be masked out by weights and have no effect on parameter gradients.
inst_data = np.array(
[inst + list([pad_idx] * (max_len - len(inst))) for inst in insts])
return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
# position data
if return_pos:
inst_pos = np.array([
list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
for inst in insts
])
return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
if return_input_mask:
# This is used to avoid attention on paddings.
input_mask_data = np.array([[1] * len(inst) + [0] *
(max_len - len(inst)) for inst in insts])
input_mask_data = np.expand_dims(input_mask_data, axis=-1)
return_list += [input_mask_data.astype("float32")]
if return_max_len:
return_list += [max_len]
if return_num_token:
num_token = 0
for inst in insts:
num_token += len(inst)
return_list += [num_token]
if return_seq_lens:
seq_lens = np.array([len(inst) for inst in insts])
return_list += [seq_lens.astype("int64").reshape([-1, 1])]
return return_list if len(return_list) > 1 else return_list[0]
if __name__ == "__main__":
pass
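    # A hedged smoke test (not part of the original file): exercise mask() and
    # pad_batch_data() on a tiny toy batch; ids 1/2/3 stand for [CLS]/[SEP]/[MASK],
    # and the seg_labels / mask_word_tags values are illustrative assumptions.
    batch = [[1, 15, 37, 2], [1, 8, 9, 10, 2]]
    seg_labels = [[-1, 0, 0, -1], [-1, 0, 1, 0, -1]]
    mask_word_tags = [False, True]
    total = sum(len(sent) for sent in batch)
    out, mask_label, mask_pos = mask(batch, seg_labels, mask_word_tags,
                                     total_token_num=total, vocab_size=30000)
    padded, input_mask = pad_batch_data(out, return_input_mask=True)
    print(padded.shape, input_mask.shape, mask_label.shape, mask_pos.shape)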
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def mask(batch_tokens, total_token_num, vocab_size, CLS=1, SEP=2, MASK=3):
"""
Add masks to batch_tokens; return the masked tokens (out), mask_label and mask_pos.
Note: mask_pos is computed with respect to batch_tokens after padding.
"""
max_len = max([len(sent) for sent in batch_tokens])
mask_label = []
mask_pos = []
prob_mask = np.random.rand(total_token_num)
# Note: the first token is [CLS], so [low=1]
replace_ids = np.random.randint(1, high=vocab_size, size=total_token_num)
pre_sent_len = 0
prob_index = 0
for sent_index, sent in enumerate(batch_tokens):
mask_flag = False
prob_index += pre_sent_len
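# ~15% of tokens are selected (prob <= 0.15); of these, (0.03, 0.15] become
# [MASK] (80% of selected), (0.015, 0.03] get a random id (10%), and the
# rest (<= 0.015) keep the original token (10%)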
for token_index, token in enumerate(sent):
prob = prob_mask[prob_index + token_index]
if prob > 0.15:
continue
elif 0.03 < prob <= 0.15:
# mask
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
elif 0.015 < prob <= 0.03:
# random replace
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = replace_ids[prob_index + token_index]
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
else:
# keep the original token
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
mask_pos.append(sent_index * max_len + token_index)
pre_sent_len = len(sent)
# ensure at least mask one word in a sentence
while not mask_flag:
token_index = int(np.random.randint(1, high=len(sent) - 1, size=1))
if sent[token_index] != SEP and sent[token_index] != CLS:
mask_label.append(sent[token_index])
sent[token_index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
mask_label = np.array(mask_label).astype("int64").reshape([-1, 1])
mask_pos = np.array(mask_pos).astype("int64").reshape([-1, 1])
return batch_tokens, mask_label, mask_pos
def prepare_batch_data(insts,
total_token_num,
max_len=None,
voc_size=0,
pad_id=None,
cls_id=None,
sep_id=None,
mask_id=None,
task_id=0,
return_input_mask=True,
return_max_len=True,
return_num_token=False):
"""
1. generate Tensor of data
2. generate Tensor of position
3. generate self attention mask, [shape: batch_size * max_len * max_len]
"""
batch_src_ids = [inst[0] for inst in insts]
batch_sent_ids = [inst[1] for inst in insts]
batch_pos_ids = [inst[2] for inst in insts]
# First step: do mask without padding
out, mask_label, mask_pos = mask(
batch_src_ids,
total_token_num,
vocab_size=voc_size,
CLS=cls_id,
SEP=sep_id,
MASK=mask_id)
# Second step: padding
src_id, self_input_mask = pad_batch_data(
out,
max_len=max_len,
pad_idx=pad_id, return_input_mask=True)
pos_id = pad_batch_data(
batch_pos_ids,
max_len=max_len,
pad_idx=pad_id,
return_pos=False,
return_input_mask=False)
sent_id = pad_batch_data(
batch_sent_ids,
max_len=max_len,
pad_idx=pad_id,
return_pos=False,
return_input_mask=False)
task_ids = np.ones_like(
src_id, dtype="int64") * task_id
return_list = [
src_id, pos_id, sent_id, self_input_mask, task_ids, mask_label, mask_pos
]
return return_list if len(return_list) > 1 else return_list[0]
def pad_batch_data(insts,
max_len=None,
pad_idx=0,
return_pos=False,
return_input_mask=False,
return_max_len=False,
return_num_token=False):
"""
Pad the instances to the max sequence length in batch, and generate the
corresponding position data and input mask.
"""
return_list = []
if max_len is None:
max_len = max(len(inst) for inst in insts)
# Any token included in dict can be used to pad, since the paddings' loss
# will be masked out by weights and have no effect on parameter gradients.
inst_data = np.array([
list(inst) + list([pad_idx] * (max_len - len(inst))) for inst in insts
])
return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
# position data
if return_pos:
inst_pos = np.array([
list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
for inst in insts
])
return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
if return_input_mask:
# This is used to avoid attention on paddings.
input_mask_data = np.array([[1] * len(inst) + [0] *
(max_len - len(inst)) for inst in insts])
input_mask_data = np.expand_dims(input_mask_data, axis=-1)
return_list += [input_mask_data.astype("float32")]
if return_max_len:
return_list += [max_len]
if return_num_token:
num_token = 0
for inst in insts:
num_token += len(inst)
return_list += [num_token]
return return_list if len(return_list) > 1 else return_list[0]
if __name__ == "__main__":
pass
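    # A hedged smoke test (not part of the original file): each instance is
    # (token_ids, sent_ids, pos_ids); ids 1/2/3 stand for [CLS]/[SEP]/[MASK],
    # and all values below are illustrative assumptions.
    insts = [([1, 15, 37, 2], [0, 0, 0, 0], [0, 1, 2, 3]),
             ([1, 8, 9, 24, 2], [0, 0, 0, 0, 0], [0, 1, 2, 3, 4])]
    total = sum(len(inst[0]) for inst in insts)
    src, pos, sent, input_mask, task_ids, mask_label, mask_pos = \
        prepare_batch_data(insts, total, voc_size=30000,
                           pad_id=0, cls_id=1, sep_id=2, mask_id=3)
    print(src.shape, task_ids.shape, mask_label.shape)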
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class MRQAExample(object):
"""A single training/test example for simple sequence classification.
For examples without an answer, the start and end position are -1.
"""
def __init__(self,
qas_id,
question_text,
doc_tokens,
orig_answer_text=None,
start_position=None,
end_position=None,
is_impossible=False):
self.qas_id = qas_id
self.question_text = question_text
self.doc_tokens = doc_tokens
self.orig_answer_text = orig_answer_text
self.start_position = start_position
self.end_position = end_position
self.is_impossible = is_impossible
def __str__(self):
return self.__repr__()
def __repr__(self):
s = ""
s += "qas_id: %s" % (tokenization.printable_text(self.qas_id))
s += ", question_text: %s" % (
tokenization.printable_text(self.question_text))
s += ", doc_tokens: [%s]" % (" ".join(self.doc_tokens))
if self.start_position:
s += ", start_position: %d" % (self.start_position)
if self.end_position:
s += ", end_position: %d" % (self.end_position)
if self.is_impossible:
s += ", is_impossible: %r" % (self.is_impossible)
return s
class MRQAFeature(object):
"""A single set of features of data."""
def __init__(self,
unique_id,
example_index,
doc_span_index,
tokens,
token_to_orig_map,
token_is_max_context,
input_ids,
input_mask,
segment_ids,
start_position=None,
end_position=None,
is_impossible=None):
self.unique_id = unique_id
self.example_index = example_index
self.doc_span_index = doc_span_index
self.tokens = tokens
self.token_to_orig_map = token_to_orig_map
self.token_is_max_context = token_is_max_context
self.input_ids = input_ids
self.input_mask = input_mask
self.segment_ids = segment_ids
self.start_position = start_position
self.end_position = end_position
self.is_impossible = is_impossible
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader as base_reader
from paddlepalm.interface import task_paradigm as base_paradigm
import os
import json
from paddle import fluid
class TaskInstance(object):
def __init__(self, name, id, config={}, verbose=True):
self._name = name
self._config = config
self._verbose = verbose
self._save_infermodel_path = os.path.join(self._config['save_path'], 'infer_model')
self._save_ckpt_path = os.path.join(self._config['save_path'], 'ckpt')
# following flags can be fetch from instance config file
self._is_target = config.get('is_target', True)
self._is_first_target = config.get('is_first_target', False)
self._task_reuse_scope = config.get('task_reuse_scope', name)
self._feeded_var_names = None
self._target_vars = None
# training process management
self._mix_ratio = None
self._expected_train_steps = None
self._expected_train_epochs = None
self._steps_pur_epoch = None
self._cur_train_epoch = 0
self._cur_train_step = 0
self._train_finish = False
# dataset readers for the different running phases (train, eval, pred); the key is the phase, the value is a Reader instance
self._reader = {'train': None, 'eval': None, 'pred': None}
self._input_layer = None
self._inputname_to_varname = {}
self._task_layer = {'train': None, 'eval': None, 'pred': None}
self._pred_input_name_list = []
self._pred_input_varname_list = []
self._pred_fetch_name_list = []
self._pred_fetch_var_list = []
self._Reader = None
self._Paradigm = None
self._exe = fluid.Executor(fluid.CPUPlace())
self._save_protocol = {
'input_names': 'self._pred_input_name_list',
'input_varnames': 'self._pred_input_varname_list',
'fetch_list': 'self._pred_fetch_name_list'}
def build_task_layer(self, net_inputs, phase):
output_vars = self._task_layer[phase].build(net_inputs)
if phase == 'pred':
self._pred_fetch_name_list, self._pred_fetch_var_list = zip(*output_vars.items())
return output_vars
def postprocess(self, rt_outputs, phase):
return self._task_layer[phase].postprocess(rt_outputs)
def epoch_postprocess(self, epoch_inputs, phase):
return self._task_layer[phase].epoch_postprocess(epoch_inputs)
def save(self, suffix=''):
dirpath = self._save_infermodel_path + suffix
self._pred_input_varname_list = [str(i) for i in self._pred_input_varname_list]
fluid.io.save_inference_model(dirpath, self._pred_input_varname_list, self._pred_fetch_var_list, self._exe)
# fluid.io.save_inference_model(dirpath, self._pred_input_varname_list, self._pred_fetch_var_list, self._exe, params_filename='__params__')
print(self._name + ': inference model saved at ' + dirpath)
conf = {}
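# _save_protocol maps config keys to attribute *names* (as strings), so exec
# is used to read those attributes here and to write them back in load()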
for k, strv in self._save_protocol.items():
exec('v={}'.format(strv))
conf[k] = v
with open(os.path.join(dirpath, '__conf__'), 'w') as writer:
writer.write(json.dumps(conf, indent=1))
def load(self, infer_model_path=None):
if infer_model_path is None:
infer_model_path = self._save_infermodel_path
for k,v in json.load(open(os.path.join(infer_model_path, '__conf__'))).items():
strv = self._save_protocol[k]
exec('{}=v'.format(strv))
pred_prog, self._pred_input_varname_list, self._pred_fetch_var_list = \
fluid.io.load_inference_model(infer_model_path, self._exe)
# pred_prog, self._pred_input_varname_list, self._pred_fetch_var_list = \
# fluid.io.load_inference_model(infer_model_path, self._exe, params_filename='__params__')
print(self._name+': inference model loaded from ' + infer_model_path)
return pred_prog
@property
def name(self):
return self._name
@property
def Reader(self):
return self._Reader
@Reader.setter
def Reader(self, cls):
assert base_reader.__name__ == cls.__bases__[-1].__name__, \
"expect: {}, receive: {}.".format(base_reader.__name__, \
cls.__bases__[-1].__name__)
self._Reader = cls
@property
def Paradigm(self):
return self._Paradigm
@Paradigm.setter
def Paradigm(self, cls):
assert base_paradigm.__name__ == cls.__bases__[-1].__name__, \
"expect: {}, receive: {}.".format(base_paradigm.__name__, \
cls.__bases__[-1].__name__)
self._Paradigm = cls
@property
def config(self):
return self._config
@property
def reader(self):
return self._reader
@property
def pred_input(self):
return zip(*[self._pred_input_name_list, self._pred_input_varname_list])
@pred_input.setter
def pred_input(self, val):
assert isinstance(val, dict)
self._pred_input_name_list, self._pred_input_varname_list = \
zip(*[[k, v.name] for k,v in val.items()])
# print(self._pred_input_name_list)
@property
def pred_fetch_list(self):
return [self._pred_fetch_name_list, self._pred_fetch_var_list]
@property
def task_layer(self):
return self._task_layer
@property
def is_first_target(self):
return self._is_first_target
@is_first_target.setter
def is_first_target(self, value):
self._is_first_target = bool(value)
if self._is_first_target:
assert self._is_target, "ERROR: only target task could be set as main task."
if self._verbose and self._is_first_target:
print("{}: set as main task".format(self._name))
@property
def is_target(self):
if self._is_target is not None:
return self._is_target
else:
raise ValueError("{}: is_target is None".format(self._name))
@is_target.setter
def is_target(self, value):
self._is_target = bool(value)
if self._verbose:
if self._is_target:
print('{}: set as target task.'.format(self._name))
else:
print('{}: set as aux task.'.format(self._name))
@property
def mix_ratio(self):
if self._mix_ratio is not None:
return self._mix_ratio
else:
raise ValueError("{}: mix_ratio is None".format(self._name))
@mix_ratio.setter
def mix_ratio(self, value):
self._mix_ratio = float(value)
if self._verbose:
print('{}: mix_ratio is set to {}'.format(self._name, self._mix_ratio))
@property
def expected_train_steps(self):
return self._expected_train_steps
@expected_train_steps.setter
def expected_train_steps(self, value):
self._expected_train_steps = value
self._expected_train_epochs = value / float(self._steps_pur_epoch)
@property
def expected_train_epochs(self):
return self._expected_train_epochs
@property
def cur_train_epoch(self):
return self._cur_train_epoch
@cur_train_epoch.setter
def cur_train_epoch(self, value):
self._cur_train_epoch = value
@property
def cur_train_step(self):
return self._cur_train_step
@cur_train_step.setter
def cur_train_step(self, value):
self._cur_train_step = value
if self._cur_train_step > self._steps_pur_epoch:
self._cur_train_epoch += 1
self._cur_train_step = 1
if self._is_target and self._cur_train_step + self._cur_train_epoch * self._steps_pur_epoch >= self._expected_train_steps:
self._train_finish = True
print(self._name+': train finished!')
self.save()
# fluid.io.save_inference_model(self._save_infermodel_path, )
@property
def steps_pur_epoch(self):
return self._steps_pur_epoch
@steps_pur_epoch.setter
def steps_pur_epoch(self, value):
self._steps_pur_epoch = value
@property
def train_finish(self):
return self._train_finish
@property
def task_reuse_scope(self):
if self._task_reuse_scope is not None:
return self._task_reuse_scope
else:
raise ValueError("{}: task_reuse_scope is None".format(self._name))
@task_reuse_scope.setter
def task_reuse_scope(self, scope_name):
self._task_reuse_scope = str(scope_name)
if self._verbose:
print('{}: task_reuse_scope is set to {}'.format(self._name, self._task_reuse_scope))
def check_instances(insts):
"""to check ids, first_target"""
pass
def _check_ids():
pass
def _check_targets():
pass
def _check_reuse_scopes():
pass
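# A hedged usage sketch (not part of the original file; the config values and
# class names are illustrative assumptions):
#
#   inst = TaskInstance('mrqa', 0, config={'save_path': 'output/mrqa'})
#   inst.is_target = True
#   inst.mix_ratio = 1.0
#   inst.Reader = SomeReaderClass    # must derive from paddlepalm.interface.reader
#   inst.Paradigm = SomeParadigm     # must derive from task_paradigm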
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from paddlepalm.interface import task_paradigm
from paddle.fluid import layers
class TaskParadigm(task_paradigm):
'''
classification
'''
def __init__(self, config, phase):
self._is_training = phase == 'train'
self.sent_emb_size = config['hidden_size']
self.num_classes = config['n_classes']
@property
def inputs_attrs(self):
return {'backbone': {"sentence_emb": [[-1, self.sent_emb_size], 'float32']},
'reader': {"label_ids": [[-1, 1], 'int64']}}
@property
def outputs_attrs(self):
if self._is_training:
return {'loss': [[1], 'float32']}
else:
return {'logits': [[-1, self.num_classes], 'float32']}
def build(self, inputs):
sent_emb = inputs['backbone']['sentence_emb']
label_ids = inputs['reader']['label_ids']
logits = fluid.layers.fc(
input=sent_emb,
size=self.num_classes,
param_attr=fluid.ParamAttr(
name="cls_out_w",
initializer=fluid.initializer.TruncatedNormal(scale=0.1)),
bias_attr=fluid.ParamAttr(
name="cls_out_b", initializer=fluid.initializer.Constant(0.)))
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=label_ids)
loss = layers.mean(loss)
if self._is_training:
return {"loss": loss}
else:
return {"logits":logits}
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from paddlepalm.interface import task_paradigm
from paddle.fluid import layers
class TaskParadigm(task_paradigm):
'''
matching
'''
def __init__(self, config, phase, backbone_config=None):
self._is_training = phase == 'train'
self._hidden_size = backbone_config['hidden_size']
@property
def inputs_attrs(self):
if self._is_training:
reader = {"label_ids": [[-1, 1], 'int64']}
else:
reader = {}
bb = {"sentence_pair_embedding": [[-1, self._hidden_size], 'float32']}
return {'reader': reader, 'backbone': bb}
@property
def outputs_attrs(self):
if self._is_training:
return {"loss": [[1], 'float32']}
else:
return {"logits": [[-1, 1], 'float32']}
def build(self, inputs):
labels = inputs["reader"]["label_ids"]
cls_feats = inputs["backbone"]["sentence_pair_embedding"]
cls_feats = fluid.layers.dropout(
x=cls_feats,
dropout_prob=0.1,
dropout_implementation="upscale_in_train")
logits = fluid.layers.fc(
input=cls_feats,
size=2,
param_attr=fluid.ParamAttr(
name="cls_out_w",
initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
bias_attr=fluid.ParamAttr(
name="cls_out_b",
initializer=fluid.initializer.Constant(0.)))
ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=labels, return_softmax=True)
loss = fluid.layers.mean(x=ce_loss)
if self._is_training:
return {'loss': loss}
else:
return {'logits': logits}
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from paddlepalm.interface import task_paradigm
# pre_process_layer is used in build() below; import it from the backbone utils
from paddlepalm.backbone.utils.transformer import pre_process_layer
from paddle.fluid import layers
class TaskParadigm(task_paradigm):
'''
matching
'''
def __init__(self, config, phase, backbone_config=None):
self._is_training = phase == 'train'
self._hidden_size = backbone_config['hidden_size']
self._vocab_size = backbone_config['vocab_size']
self._hidden_act = backbone_config['hidden_act']
self._initializer_range = backbone_config['initializer_range']
@property
def inputs_attrs(self):
if self._is_training:
reader = {"label_ids": [[-1, 1], 'int64']}
else:
reader = {}
bb = {"encoder_outputs": [[-1, self._hidden_size], 'float32']}
return {'reader': reader, 'backbone': bb}
@property
def outputs_attrs(self):
if self._is_training:
return {"loss": [[1], 'float32']}
else:
return {"logits": [[-1, 1], 'float32']}
def build(self, inputs):
mask_label = inputs["reader"]["mask_label"]
mask_pos = inputs["reader"]["mask_pos"]
word_emb = inputs["backbone"]["word_embedding"]
enc_out = inputs["backbone"]["encoder_outputs"]
emb_size = word_emb.shape[-1]
_param_initializer = fluid.initializer.TruncatedNormal(
scale=self._initializer_range)
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
reshaped_emb_out = fluid.layers.reshape(
x=enc_out, shape=[-1, emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
num_seqs = fluid.layers.fill_constant(shape=[1], value=512, dtype='int64')
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(
name='mask_lm_trans_fc.w_0',
initializer=_param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(
mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
# print fluid.default_main_program().global_block()
# fc_out = fluid.layers.matmul(
# x=mask_trans_feat,
# y=fluid.default_main_program().global_block().var(
# _word_emb_name),
# transpose_y=True)
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=word_emb,
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._vocab_size],
dtype='float32',
attr=mask_lm_out_bias_attr,
is_bias=True)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(
logits=fc_out, label=mask_label)
loss = fluid.layers.mean(mask_lm_loss)
if self._is_training:
return {'loss': loss}
else:
return None
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -20,7 +21,7 @@ from __future__ import print_function
import collections
import unicodedata
import six
import io
def convert_to_unicode(text):
"""Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
......@@ -68,15 +69,15 @@ def printable_text(text):
def load_vocab(vocab_file):
"""Loads a vocabulary file into a dictionary."""
vocab = collections.OrderedDict()
with io.open(vocab_file, encoding="utf8") as fin:
for num, line in enumerate(fin):
items = convert_to_unicode(line.strip()).split("\t")
if len(items) > 2:
break
token = items[0]
index = items[1] if len(items) == 2 else num
token = token.strip()
vocab[token] = int(index)
fin = open(vocab_file)
for num, line in enumerate(fin):
items = convert_to_unicode(line.strip()).split("\t")
if len(items) > 2:
break
token = items[0]
index = items[1] if len(items) == 2 else num
token = token.strip()
vocab[token] = int(index)
return vocab
......
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
MAXLEN = 70
def print_dict(dic, title=""):
if title:
title = ' ' + title + ' '
left_len = (MAXLEN - len(title)) // 2
title = '-' * left_len + title
right_len = MAXLEN - len(title)
title = title + '-' * right_len
else:
title = '-' * MAXLEN
print(title)
for name in dic:
print("{: <25}\t{}".format(str(name), str(dic[name])))
print("")
# print("-" * MAXLEN + '\n')
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
def is_whitespace(c):
if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
return True
return False
This directory stores task paradigms; users can implement the paradigm interfaces to build custom ones.
This diff is collapsed.
This diff is collapsed.
This directory stores pretrained models and their config files; users can download the built-in pretrained models by running `download_pretrain.sh`.
This directory stores the dataset loading and processing modules (readers); users can implement the relevant interfaces to build custom ones.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
#!/bin/bash
# for gpu memory optimization
export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1
export CUDA_VISIBLE_DEVICES=0
if [[ ! -d pretrain_model/bert ]]; then
bash download_pretrain.sh bert
fi
if [[ ! -d pretrain_model/ernie ]]; then
bash download_pretrain.sh ernie
fi
python -u mtl_run.py
export CUDA_VISIBLE_DEVICES=0
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_tensor_gb=0
python demo.py
#!/bin/sh
if [[ $# != 1 ]]; then
echo "usage: bash convert_params.sh <params_dir>"
exit 1
fi
echo "converting..."
cd $1
mkdir .palm.backup
for file in $(ls *)
do cp $file "backbone-"$file; mv $file .palm.backup
done
cd - >/dev/null
echo "done!"
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
train_file: "data/match4mrqa/train.txt"
reader: match4ernie
paradigm: match
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.