Commit f6c68c85 authored by xixiaoyao

test=develop

Parent ee949ea1
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
PALM
===
PALM (PAddLe for Multi-task, packaged as PaddlePALM) is a flexible and easy-to-use multi-task learning framework. It ships with a rich set of model backbones (BERT, ERNIE, etc.), common task paradigms (classification, matching, sequence labeling, machine reading comprehension, etc.), and dataset reading and processing tools. For typical task scenarios, users can add a new task with almost no code; for special scenarios, users can support a new task by implementing the framework's predefined interfaces.
## Installation

PaddlePALM can be installed as a pip package:

```shell
pip install paddlepalm
```

or used by cloning the source with git:

```shell
git clone https://github.com/PaddlePaddle/PALM.git
```

**Dependencies**

- Python >= 2.7
- CUDA >= 9.0
- cuDNN >= 7.0
- PaddlePaddle >= 1.6 (see the [installation guide](http://www.paddlepaddle.org/#quick-start))

## Directory structure

- backbone: the shared backbone networks for multi-task learning; BERT, ERNIE and others are built in, and users can add custom backbones
- config: configuration files of the task instances; to add a task, create its configuration file here
- data: datasets of the tasks
- pretrain_model: pretrained models, vocabularies and their related configurations
- optimizer: optimizers; custom optimizers can be defined here
- reader: dataset reading and processing modules of the tasks, plus the joint_reader file that fuses the individual readers
- paradigm: network structures of the task output layers
- utils: common utility functions
- mtl_run.py: the main workflow of multi-task learning
- run.sh: launch script for multi-task learning

## Usage

The framework ships with three fully added example tasks: *Machine Reading Comprehension*, *Mask Language Model* and *Question Answer Matching*. In `mtl_config.yaml`, *Machine Reading Comprehension* is configured as the main task and the others as auxiliary tasks. Multi-task learning can then be launched with:

```shell
bash run.sh
```

*Tip: on the first run, the script automatically downloads the pretrained BERT and ERNIE models; please be patient.*

### 1. Create task instances

Task instances are described in yaml format. The required fields of every task instance are:

- train_file: path of the training set file
- reader: name of the dataset loading and processing tool; the list of built-in readers is available [here](https://www.baidu.com/)
- backbone: name of the backbone model; the list of built-in backbones is available [here](https://www.baidu.com/)
- paradigm: name of the task paradigm (task type); the list of built-in paradigms is available [here](https://www.baidu.com/)
### 2. Complete the training configuration

The main configuration of multi-task training and inference is completed in `mtl_config.yaml`, also in yaml format. It specifies the task instances and their main/auxiliary relations, parameter reuse relations, sampling weights, and so on, through the following fields.

***Required fields***

- main_task: *(str)* name of the main task; currently only a single main task is supported. The name is taken from the configuration file names in the `config` folder (without the `.yaml` suffix and without any intermediate suffix introduced for task sharing)
- auxiliary_task: *(str)* names of the auxiliary tasks; multiple auxiliary tasks are supported, separated by spaces. The names are taken from the configuration file names in the `config` folder (without the `.yaml` suffix and without any intermediate suffix introduced for task sharing)
- do_train: *(bool)* training flag
- do_predict: *(bool)* prediction flag; currently only the main task supports prediction
- checkpoint_path: *(str)* path for saving models, resuming interrupted training, and loading the model for prediction; when loading from this path, the model of the last training step is read by default
- backbone_model: *(str)* backbone network to use; the name is taken from the modules in the `backbone` directory. Note that when switching backbones while using a pretrained model, the pretrained parameters, configuration, vocabulary and other related fields should be switched accordingly
- vocab_path: *(str)* vocabulary file in plain text, one word per line
- optimizer: *(str)* optimizer name, taken from the file names in `optimizer`
- learning_rate: *(str)* learning rate for the training phase
- skip_steps: *(int)* logging frequency during training (in steps)
- epoch: *(int)* number of training epochs of the main task
- use_cuda: *(bool)* flag for training on GPU
- warmup_proportion: *(float)* warmup proportion when finetuning a pretrained model
- use_ema: *(bool)* whether to enable EMA (exponential moving average) for training and inference
- ema_decay: *(float)* decay rate when EMA is enabled
- random_seed: *(int)* random seed
- use_fp16: *(bool)* flag for mixed-precision training
- loss_scaling: *(float)* loss scaling factor when mixed-precision training is enabled

***Optional fields***

- pretrain_model_path: *(str)* loading path of the pretrained model; this path should contain a params folder storing the model parameters
- pretrain_config_path: *(str)* configuration file of the pretrained model, in json format
- do_lower_case: *(bool)* whether the preprocessing stage is case-insensitive (lowercases the input)
- any other user-defined fields
### 3. Start training

If the built-in tasks satisfy the requirements, or a custom task has already been added, multi-task learning can be launched directly as follows.

```python
import paddlepalm as palm

if __name__ == '__main__':
    controller = palm.Controller('config.yaml', task_dir='task_instance')
    controller.load_pretrain('pretrain_model/ernie/params')
    controller.train()
```

For example, the framework ships with a small bundle of datasets, containing the MRQA reading comprehension evaluation data `mrqa`, the MaskLM training data `mlm4mrqa`, and `am4mrqa`, a dataset for matching questions with the contexts containing their answers. The framework also has built-in tasks for machine reading comprehension (`reading_comprehension`), masked language modeling (`mask_language_model`) and answer matching (`answer_matching`). Suppose we want to use the masked language model and the answer matching task to improve the machine reading comprehension task; multi-task learning can then be launched through the following steps.

First, add the configuration files of the training tasks to the config folder:

1. `reading_comprehension.yaml`

```yaml
train_file: "data/mrqa/mrqa-combined.train.raw.json"
predict_file: "data/mrqa/mrqa-combined.dev.raw.json"
batch_size: 4
mix_ratio: 1.0
in_tokens: false
doc_stride: 128
sample_rate: 0.02
...
```

2. `mask_language_model.yaml`

```yaml
train_file: "data/mlm4mrqa"
mix_ratio: 0.4
batch_size: 4
in_tokens: False
generate_neg_sample: False
```

3. `answer_matching.yaml`

```yaml
train_file: "data/am4mrqa/train.txt"
mix_ratio: 0.4
batch_size: 4
in_tokens: False
```

Then complete the multi-task configuration in the main configuration file `mtl_config.yaml`. The `main_task` field specifies the main task, and `auxiliary_task` specifies the auxiliary tasks, multiple ones separated by spaces "` `".

```yaml
main_task: "reading_comprehension"
auxiliary_task: "mask_language_model answer_matching"
do_train: True
epoch: 2
...
```

The epoch setting applies only to the main task, and the `mix_ratio` baseline of 1.0 is likewise defined relative to the main task's number of training steps. For example, with `epoch=2`, if the `mix_ratio` of `reading_comprehension` is set to 1.0 and that of `mask_language_model` to 0.5, then `reading_comprehension` is trained for two full epochs, while `mask_language_model` runs half as many training steps as `reading_comprehension`.

Finally, run `run.sh` to start the joint training of the three tasks. To remove some of the auxiliary tasks, edit the `auxiliary_task` field in `mtl_config.yaml`.

### 4. Prediction

After training finishes, the user can call the pred interface directly to run prediction on a target task. Example:

```python
import paddlepalm as palm

if __name__ == '__main__':
    controller = palm.Controller(config_path='config.yaml', task_dir='task_instance')
    controller.load_pretrain('pretrain_model/ernie/params')
    controller.train()
    controller.pred('mrqa')
```

A new controller can also be created just for prediction:

```python
import paddlepalm as palm

if __name__ == '__main__':
    controller = palm.Controller(config_path='config.yaml', task_dir='task_instance')
    controller.pred('mrqa', infermodel_path='output_model/firstrun2/infer_model')
```
### Adding a new task

To add a new task, after preparing its dataset, the user needs to complete three pieces of development work:

***config module***

Located in the `./config` directory. It stores the configuration files of the task instances, written in `yaml` format. The required fields of a configuration file are:

- batch_size: number of samples used per training or inference step. When `in_tokens` is True, `batch_size` instead denotes the number of tokens contained in each step.
- in_tokens: whether to build batches as lod tensors; when `in_tokens` is False, batches are built by padding.

Required fields for the training phase:

- train_file: path of the training set file
- mix_ratio: sampling weight of this task during training (1.0 means the same expected sampling count as the main task)

Required fields for the inference phase:

- predict_file: path of the test set file

In addition, users can define other hyperparameters as the task requires; these can be accessed when the task model is created.
***reader module***

Located in the `./reader` directory. It handles dataset reading and processing. A new reader should be placed in the `reader` directory and must contain a `get_input_shape` function and a `DataProcessor` class (a minimal sketch follows the specification below).

- **get_input_shape**: *(function)* defines the shapes and dtypes of the data the reader produces for the backbone and the task paradigm; definitions for both the training and the inference phase must be returned.
    - Arguments
        - args: *(dict)* the parsed task configuration
    - Returns
        - train_input_shape: *(dict)* contains the two keys backbone and task; the value of each key is a list of `(shape, dtype)` tuples
        - test_input_shape: *(dict)* contains the two keys backbone and task; the value of each key is a list of `(shape, dtype)` tuples
- **DataProcessor**: *(class)* defines the loading, preprocessing and iteration of the dataset
    - \_\_init\_\_: constructor; parses and stores the relevant arguments and performs the necessary initialization
        - Arguments
            - args: *(dict)* the parsed task configuration
        - Returns
            -
    - data_generator: *(function)* iterator over the dataset; yields one batch each time it is advanced
        - Arguments
            - phase: *(str)* the phase the task is in; the supported phases are `train` and `predict`
            - shuffle: *(bool)* whether to shuffle the dataset during training
            - dev_count: *(int)* number of available GPUs or CPUs
        - Yields
            - tensors: (list) data yielded as a list structured according to the input shapes and dtypes that `get_input_shape` defines for the backbone and the task; the head elements of each yielded list are the inputs required by the backbone, followed by the inputs required by the task
    - get_num_examples: *(function)* returns the number of samples. Note that because of mechanisms such as sliding windows, the number of samples produced at runtime may exceed the number of samples in the dataset; in that case the actual runtime number should be returned
        - Arguments
            -
        - Returns
            - num_examples: *(int)* number of samples
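For concreteness, below is a minimal sketch of what such a reader module could look like for a toy classification task. The input file format, shapes and dtypes are illustrative assumptions, not one of the framework's built-in readers (the task name `yelp_senti` anticipates the naming-convention example further down).

```python
# yelp_senti_reader.py -- illustrative reader sketch. Assumed toy input format:
# one "<space-separated token ids>\t<label>" sample per line of train_file.
import numpy as np

def get_input_shape(args):
    # (shape, dtype) tuples; -1 marks the variable batch dimension
    backbone_shape = [([-1, args.max_seq_len, 1], 'int64'),    # token ids
                      ([-1, args.max_seq_len, 1], 'float32')]  # input mask
    train_input_shape = {'backbone': backbone_shape,
                         'task': [([-1, 1], 'int64')]}         # labels
    test_input_shape = {'backbone': backbone_shape, 'task': []}
    return train_input_shape, test_input_shape

class DataProcessor(object):
    def __init__(self, args):
        self._args = args
        with open(args.train_file) as f:
            self._examples = [line.rstrip('\n').split('\t') for line in f]

    def get_num_examples(self):
        return len(self._examples)

    def data_generator(self, phase='train', shuffle=True, dev_count=1):
        examples = list(self._examples)
        if phase == 'train' and shuffle:
            np.random.shuffle(examples)
        bs, maxlen = self._args.batch_size, self._args.max_seq_len
        for i in range(0, len(examples) - bs + 1, bs):
            token_ids = np.zeros([bs, maxlen, 1], dtype='int64')
            mask = np.zeros([bs, maxlen, 1], dtype='float32')
            labels = np.zeros([bs, 1], dtype='int64')
            for j, (tokens, label) in enumerate(examples[i:i + bs]):
                ids = [int(t) for t in tokens.split()][:maxlen]
                token_ids[j, :len(ids), 0] = ids
                mask[j, :len(ids), 0] = 1.0
                labels[j, 0] = int(label)
            # head elements: backbone inputs; tail elements: task inputs
            yield [token_ids, mask, labels]
```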
***task_paradigm module***

Located in the `./paradigm` directory. It describes the task paradigms (classification, matching, reading comprehension, etc.). A new task paradigm should be placed in the `paradigm` directory and must contain the two required functions `compute_loss` and `create_model`, plus the two optional functions `postprocess` and `global_postprocess` (see the sketch after this specification).

- create_model: *(function)* creates the task model
    - Arguments
        - reader_input: *(nested Variables)* outputs of the data input layer, as defined in the `get_input_shape` function of the task's reader module. The first N elements are the backbone's inputs, followed by the task's inputs.
        - base_model: *(Model)* instance of the model backbone; the task connects to the backbone by calling its output interfaces. In general, a backbone's output interface includes at least the two properties `final_sentence_representation` and `final_word_representation`.
            - base_model.final_sentence_representation: *(Variable)* vector representation of the input text, with shape `[batch_size, hidden_size]`
            - base_model.final_word_representation: *(Variable)* vector representation of each word of the input text, with shape `[batch_size, max_seqlen, hidden_size]`
        - is_training: *(bool)* training flag
        - args: *(Argument)* task-specific configuration; the concrete parameters are defined in the config folder
    - Returns
        - output_tensors: *(dict)* dictionary of the task's output tensors. In the training phase it must contain at least a num_seqs element, which records the number of samples in the batch (when the input is a lod tensor, i.e. args.in_tokens is set to True, all samples are flattened together and there is no explicit sample-count dimension)
- compute_loss: *(function)* computes the batch-average loss of the task in the training phase
    - Arguments
        - output_tensors: *(dict)* the return value of `create_model`, mapping the names of the Variables needed for the loss computation to their instances
        - args: *(Argument)* task-specific configuration; the concrete parameters are defined in the config folder
    - Returns
        - total_loss: *(Variable)* average loss of the current batch
- postprocess: *(function)* post-processing applied to the fetch_results of each inference step; returns the post-processed results for every sample of that step
    - Arguments
        - fetch_results: (dict) computation results of the fetch_dict of the current inference step, where fetch_dict is defined and returned by create_model.
    - Returns
        - processed_results: (list) post-processed results of all samples of the current inference step.
- global_postprocess: *(function)* after inference finishes, performs the final processing of the post-processed results of all samples (e.g., saving results, secondary post-processing)
    - Arguments
        - pred_buf: post-processed prediction results of all test set samples
        - processor: instance of the task's dataset loading and processing class, DataProcessor
        - mtl_args: multi-task learning configuration, defined in `mtl_config.yaml`
        - task_args: task-specific configuration, defined in the `config` folder
    - Returns
        -
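A matching sketch of a paradigm module under this interface: essentially a stripped-down binary classification head over the backbone's sentence representation. The concrete head is an illustrative assumption (compare the built-in answer-matching paradigm included elsewhere in this commit).

```python
# yelp_senti.py -- illustrative task paradigm sketch for binary classification.
import paddle.fluid as fluid

def create_model(reader_input, base_model=None, is_training=True, args=None):
    # backbone inputs come first in reader_input; the label is the last element
    labels = reader_input[-1]
    cls_feats = base_model.final_sentence_representation
    if is_training:
        cls_feats = fluid.layers.dropout(
            x=cls_feats, dropout_prob=0.1,
            dropout_implementation="upscale_in_train")
    logits = fluid.layers.fc(input=cls_feats, size=2)
    num_seqs = fluid.layers.fill_constant(shape=[1], value=1, dtype='int64')
    return {'labels': labels, 'logits': logits, 'num_seqs': num_seqs}

def compute_loss(output_tensors, args=None):
    ce_loss = fluid.layers.softmax_with_cross_entropy(
        logits=output_tensors['logits'], label=output_tensors['labels'])
    return fluid.layers.mean(x=ce_loss)
```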
***Naming convention***

After creating the config, task_paradigm and reader files for a new task, the three file names should be unified, with a `_reader` suffix appended to the reader file name. For example, if the new task is named yelp_senti, then the config file is `yelp_senti.yaml`, placed in the config folder; the task_paradigm file is `yelp_senti.py`, placed in the paradigm folder; and the reader file is `yelp_senti_reader.py`, placed in the reader folder.
***One-to-One mode (task-layer sharing)***

By default the framework trains in one-to-many mode: the tasks share the encoder but not the output layers. This version also supports one-to-one mode, in which the tasks share both the encoder and the output layer (model parameters are fully shared, but the data sources differ). The mode is enabled through the naming of the config files, as follows.

1. In mtl_config.yaml, configure the task names, e.g. main_task: "reading_comprehension".
2. If one task draws its data from several sources, add several task configurations for the same task under config. For example, if the task "reading_comprehension" has two datasets to train on, and the data within each batch must come from a single dataset, add the two configuration files reading_comprehension.name1.yaml and reading_comprehension.name2.yaml, where name1 and name2 can be chosen freely; the framework does not restrict these names.
3. Launch multi-task learning: sh run.sh
# Operating mechanism

### Multi-task learning mechanism

pass

## Framework structure and operating principle

The framework structure is shown in the figure below.

![framework diagram](https://tva1.sinaimg.cn/large/006y8mN6ly1g7goo0bjzwj31c20om13h.jpg)

`mtl_config.yaml` configures the parameters of the multi-task master controller, while the configuration of each task instance is written by the user and placed in the `config` folder. When the user runs `run.sh`, the script starts the multi-task learning controller; the controller parses `mtl_config.yaml` and the configuration files of the task instances, then creates the backbone and, for each task, its reader and task layer, and finally launches the training job, realizing multi-task training.

### Training termination mechanism
- Default behavior:
    - **Multi-task learning stops once all target tasks have reached their target number of training steps**
    - Tasks that are not set as target tasks (i.e., auxiliary tasks) do not affect termination; they only play a "sparring partner" role
    - Note: by default, every task is a target task; users can mark target vs. auxiliary tasks with `target_tag`
- The target number of training steps of each target task is computed from num_epochs and mix_ratio, as in the sketch below
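A back-of-the-envelope sketch of that step accounting, mirroring the computation in `mtl_run.py` (the concrete numbers are made up for illustration):

```python
def target_steps(num_epochs, num_train_examples, batch_size, dev_count,
                 mix_ratios, main_mix_ratio=1.0):
    # steps the main task needs for num_epochs full passes over its data
    main_steps = num_epochs * num_train_examples // batch_size // dev_count
    # total steps across all tasks, scaled by their sampling weights
    return int(main_steps * sum(mix_ratios) / main_mix_ratio)

# e.g. epoch=2, 1000 examples, batch size 4, 1 GPU, mix_ratios [1.0, 0.5, 0.4]
print(target_steps(2, 1000, 4, 1, [1.0, 0.5, 0.4]))  # -> 950
```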
### Saving mechanism
- Default behavior:
    - During training, the saved models come in two kinds: checkpoints (ckpt) and inference models (infermodel):
        - a ckpt stores the full computation graph covering all tasks (i.e., the whole multi-task learning graph) and is used to resume interrupted training
        - an infermodel stores the inference graph of one target task together with the configuration the inference depends on
    - For each target task, an inference model is saved automatically once the task reaches its expected number of steps, and is not saved again afterwards. (Note: saving the inference model does not affect ckpt saving)
- User-configurable settings:
    - `save_ckpt_every_steps` controls how often ckpts are saved; by default they are not saved
    - every task instance can use `save_infermodel_every_steps` to control how often its infermodel is saved; the default is -1, i.e., it is saved only once the target number of training steps is reached
## License

This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](https://github.com/PaddlePaddle/models/blob/develop/LICENSE).
This directory stores the model backbones. Users can add custom backbones by implementing the backbone interface.
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
from functools import reduce
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from paddle.fluid.layer_helper import LayerHelper
def layer_norm(x, begin_norm_axis=1, epsilon=1e-6, param_attr=None, bias_attr=None):
helper = LayerHelper('layer_norm', **locals())
mean = layers.reduce_mean(x, dim=begin_norm_axis, keep_dim=True)
shift_x = layers.elementwise_sub(x=x, y=mean, axis=0)
variance = layers.reduce_mean(layers.square(shift_x), dim=begin_norm_axis, keep_dim=True)
r_stdev = layers.rsqrt(variance + epsilon)
norm_x = layers.elementwise_mul(x=shift_x, y=r_stdev, axis=0)
param_shape = [reduce(lambda x, y: x * y, norm_x.shape[begin_norm_axis:])]
param_dtype = norm_x.dtype
scale = helper.create_parameter(
attr=param_attr,
shape=param_shape,
dtype=param_dtype,
default_initializer=fluid.initializer.Constant(1.))
bias = helper.create_parameter(
attr=bias_attr,
shape=param_shape,
dtype=param_dtype,
is_bias=True,
default_initializer=fluid.initializer.Constant(0.))
out = layers.elementwise_mul(x=norm_x, y=scale, axis=-1)
out = layers.elementwise_add(x=out, y=bias, axis=-1)
return out
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.,
cache=None,
param_initializer=None,
name='multi_head_att'):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
keys = queries if keys is None else keys
values = keys if values is None else values
    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
        raise ValueError(
            "Inputs: queries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input = queries,
size = d_key * n_head,
num_flatten_dims = 2,
param_attr = fluid.ParamAttr(
name = name + '_query_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_query_fc.b_0')
k = layers.fc(input = keys,
size = d_key * n_head,
num_flatten_dims = 2,
param_attr = fluid.ParamAttr(
name = name + '_key_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_key_fc.b_0')
v = layers.fc(input = values,
size = d_value * n_head,
num_flatten_dims = 2,
param_attr = fluid.ParamAttr(
name = name + '_value_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_value_fc.b_0')
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
hidden_size = x.shape[-1]
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
reshaped = layers.reshape(
x = x, shape = [0, 0, n_head, hidden_size // n_head], inplace=False)
        # permute the dimensions into:
        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# The value 0 in shape attr means copying the corresponding dimension
# size of the input as the output dimension size.
return layers.reshape(
x = trans_x,
shape = [0, 0, trans_x.shape[2] * trans_x.shape[3]],
inplace = False)
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
scaled_q = layers.scale(x = q, scale = d_key**-0.5)
product = layers.matmul(x = scaled_q, y = k, transpose_y = True)
if attn_bias:
product += attn_bias
weights = layers.softmax(product)
if dropout_rate:
weights = layers.dropout(
weights,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
if cache is not None: # use cache and concat time steps
# Since the inplace reshape in __split_heads changes the shape of k and
# v, which is the cache input for next time step, reshape the cache
# input from the previous time step first.
k = cache["k"] = layers.concat(
[layers.reshape(
cache["k"], shape=[0, 0, d_model]), k], axis=1)
v = cache["v"] = layers.concat(
[layers.reshape(
cache["v"], shape=[0, 0, d_model]), v], axis=1)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key,
dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input = out,
size = d_model,
num_flatten_dims = 2,
param_attr=fluid.ParamAttr(
name = name + '_output_fc.w_0',
initializer = param_initializer),
bias_attr = name + '_output_fc.b_0')
return proj_out
def positionwise_feed_forward(x,
d_inner_hid,
d_hid,
dropout_rate,
hidden_act,
param_initializer=None,
name='ffn'):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act=hidden_act,
param_attr=fluid.ParamAttr(
name=name + '_fc_0.w_0',
initializer=param_initializer),
bias_attr=name + '_fc_0.b_0')
if dropout_rate:
hidden = layers.dropout(
hidden,
dropout_prob=dropout_rate,
dropout_implementation="upscale_in_train",
is_test = False)
out = layers.fc(input = hidden,
size = d_hid,
num_flatten_dims = 2,
param_attr=fluid.ParamAttr(
name = name + '_fc_1.w_0',
initializer = param_initializer),
bias_attr = name + '_fc_1.b_0')
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0.,
name=''):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x = out, dtype = "float32")
out = layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(
name = name + '_layer_norm_scale',
initializer = fluid.initializer.Constant(1.)),
bias_attr=fluid.ParamAttr(
name = name + '_layer_norm_bias',
initializer = fluid.initializer.Constant(0.)))
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x = out, dtype = "float16")
elif cmd == "d": # add dropout
if dropout_rate:
out = layers.dropout(
out,
dropout_prob = dropout_rate,
dropout_implementation = "upscale_in_train",
is_test = False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name=''):
"""The encoder layers that can be stacked to form a deep encoder.
This module consits of a multi-head (self) attention followed by
position-wise feed-forward networks and both the two components companied
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(
pre_process_layer(
enc_input,
preprocess_cmd,
prepostprocess_dropout,
name=name + '_pre_att'),
None,
None,
attn_bias,
d_key,
d_value,
d_model,
n_head,
attention_dropout,
param_initializer = param_initializer,
name = name + '_multi_head_att')
attn_output = post_process_layer(
enc_input,
attn_output,
postprocess_cmd,
prepostprocess_dropout,
name = name + '_post_att')
ffd_output = positionwise_feed_forward(
pre_process_layer(
attn_output,
preprocess_cmd,
prepostprocess_dropout,
name = name + '_pre_ffn'),
d_inner_hid,
d_model,
relu_dropout,
hidden_act,
param_initializer = param_initializer,
name = name + '_ffn')
return post_process_layer(
attn_output,
ffd_output,
postprocess_cmd,
prepostprocess_dropout,
name = name + '_post_ffn')
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd="n",
postprocess_cmd="da",
param_initializer=None,
name='',
return_all = False):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
enc_outputs = []
for i in range(n_layer):
enc_output = encoder_layer(
enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
prepostprocess_dropout,
attention_dropout,
relu_dropout,
hidden_act,
preprocess_cmd,
postprocess_cmd,
param_initializer = param_initializer,
name = name + '_layer_' + str(i))
enc_input = enc_output
if i < n_layer - 1:
enc_outputs.append(enc_output)
enc_output = pre_process_layer(
enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
enc_outputs.append(enc_output)
if not return_all:
return enc_output
else:
return enc_output, enc_outputs
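As a usage sketch of the encoder above (assuming this snippet lives in the same module as `encoder`; all hyperparameters are assumptions chosen to be self-consistent with BERT-base sizes rather than values from any shipped config; note that `d_model` must equal `n_head * d_key` for the residual additions to type-check):

```python
import paddle.fluid.layers as layers

# [batch, seq_len, d_model] input and a [batch, n_head, seq_len, seq_len]
# attention bias (0 for visible positions, a large negative value for masked ones)
x = layers.data(name='x', shape=[16, 768], dtype='float32')
bias = layers.data(name='bias', shape=[12, 16, 16], dtype='float32')
out = encoder(enc_input=x, attn_bias=bias, n_layer=2, n_head=12,
              d_key=64, d_value=64, d_model=768, d_inner_hid=3072,
              prepostprocess_dropout=0.1, attention_dropout=0.1,
              relu_dropout=0.1, hidden_act='relu')  # [batch, seq_len, d_model]
```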
task_instance: "mrqa, match4mrqa"
target_tag: 1, 0
mix_ratio: 1.0, 0.5
save_path: "output_model/firstrun"
backbone: "ernie"
backbone_config_path: "pretrain_model/ernie/ernie_config.json"
vocab_path: "pretrain_model/ernie/vocab.txt"
do_lower_case: True
max_seq_len: 512
batch_size: 5
num_epochs: 2
optimizer: "adam"
learning_rate: 3e-5
warmup_proportion: 0.1
weight_decay: 0.1
train_file: "data/am4mrqa/train.txt"
mix_ratio: 0.4
batch_size: 4
in_tokens: False
generate_neg_sample: False
train_file: "data/mrqa/mrqa-combined.train.raw.json"
predict_file: "data/mrqa/mrqa-combined.dev.raw.json"
sample_rate: 0.02
mix_ratio: 1.0
batch_size: 4
in_tokens: false
doc_stride: 128
with_negative: false
max_query_length: 64
max_answer_length: 30
n_best_size: 20
null_score_diff_threshold: 0.0
verbose: False
The source diff is too large to display; you can view the blob instead.
The source diff is too large to display; you can view the blob instead.
This diff is collapsed.
# coding: utf-8
import json

f = 'mrqa-combined.train.raw.json'
a = json.load(open(f))
a = a['data']
writer = open('train.json', 'w')
for s in a:
    p = s['paragraphs']
    assert len(p) == 1
    p = p[0]
    q = {}
    q['context'] = p['context']
    q['qa_list'] = p['qas']
    writer.write(json.dumps(q) + '\n')
writer.close()
The source diff is too large to display; you can view the blob instead.
The source diff is too large to display; you can view the blob instead.
User-defined model and dataset
import paddlepalm as palm
if __name__ == '__main__':
controller = palm.Controller('config.yaml', task_dir='task_instance')
controller.load_pretrain('pretrain_model/ernie/params')
controller.train()
controller = palm.Controller(config='config.yaml', task_dir='task_instance', for_train=False)
controller.pred('mrqa', inference_model_dir='output_model/firstrun/infer_model')
......@@ -21,6 +21,10 @@ else
exit 1
fi
if [[ ! -d pretrain_model ]]; then
mkdir pretrain_model
fi
cd pretrain_model
mkdir $name
cd $name
......
main_task: "reading_comprehension"
auxiliary_task: "mask_language_model answer_matching"
do_train: True
do_predict: True
use_cuda: True
checkpoint_path: "output_model/firstrun"
backbone_model: "bert_model"
pretrain_model_path: "pretrain_model/bert"
pretrain_config_path: "pretrain_model/bert/bert_config.json"
vocab_path: "pretrain_model/bert/vocab.txt"
# backbone_model: "ernie_model"
# pretrain_model_path: "pretrain_model/ernie/params"
# pretrain_config_path: "pretrain_model/ernie/ernie_config.json"
# vocab_path: "pretrain_model/ernie/vocab.txt"
optimizer: "bert_optimizer"
learning_rate: 3e-5
lr_scheduler: "linear_warmup_decay"
skip_steps: 10
save_steps: 10000
epoch: 2
warmup_proportion: 0.1
weight_decay: 0.1
do_lower_case: True
max_seq_len: 512
use_ema: True
ema_decay: 0.9999
random_seed: 0
loss_scaling: 1.0
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import os
import sys
import time
import argparse
import importlib
import collections
import numpy as np
import multiprocessing
import paddle
import paddle.fluid as fluid
from utils.configure import PDConfig
from utils.placeholder import Placeholder
from utils.configure import JsonConfig, ArgumentGroup, print_arguments
from utils.init import init_pretraining_params, init_checkpoint
sys.path.append("reader")
import joint_reader
from joint_reader import create_reader
sys.path.append("optimizer")
sys.path.append("paradigm")
sys.path.append("backbone")
TASKSET_PATH="config"
def train(multitask_config):
# load task config
print("Loading multi_task configure...................")
args = PDConfig(yaml_file=[multitask_config])
args.build()
index = 0
reader_map_task = dict()
task_args_list = list()
reader_args_list = list()
id_map_task = {index: args.main_task}
print("Loading main task configure....................")
main_task_name = args.main_task
task_config_files = [i for i in os.listdir(TASKSET_PATH) if i.endswith('.yaml')]
main_config_list = [config for config in task_config_files if config.split('.')[0] == main_task_name]
main_args = None
for config in main_config_list:
main_yaml = os.path.join(TASKSET_PATH, config)
main_args = PDConfig(yaml_file=[multitask_config, main_yaml])
main_args.build()
main_args.Print()
if not task_args_list or main_task_name != task_args_list[-1][0]:
task_args_list.append((main_task_name, main_args))
            reader_args_list.append((config[:-len('.yaml')], main_args))
            reader_map_task[config[:-len('.yaml')]] = main_task_name
print("Loading auxiliary tasks configure...................")
aux_task_name_list = args.auxiliary_task.strip().split()
for aux_task_name in aux_task_name_list:
index += 1
id_map_task[index] = aux_task_name
print("Loading %s auxiliary tasks configure......." % aux_task_name)
aux_config_list = [config for config in task_config_files if config.split('.')[0] == aux_task_name]
for aux_yaml in aux_config_list:
aux_yaml = os.path.join(TASKSET_PATH, aux_yaml)
aux_args = PDConfig(yaml_file=[multitask_config, aux_yaml])
aux_args.build()
aux_args.Print()
if aux_task_name != task_args_list[-1][0]:
task_args_list.append((aux_task_name, aux_args))
                reader_args_list.append((aux_yaml[:-len('.yaml')], aux_args))
                reader_map_task[aux_yaml[:-len('.yaml')]] = aux_task_name
# import tasks reader module and build joint_input_shape
input_shape_list = []
reader_module_dict = {}
input_shape_dict = {}
for params in task_args_list:
task_reader_mdl = "%s_reader" % params[0]
reader_module = importlib.import_module(task_reader_mdl)
reader_servlet_cls = getattr(reader_module, "get_input_shape")
reader_input_shape = reader_servlet_cls(params[1])
reader_module_dict[params[0]] = reader_module
input_shape_list.append(reader_input_shape)
input_shape_dict[params[0]] = reader_input_shape
train_input_shape, test_input_shape, task_map_id = joint_reader.joint_input_shape(input_shape_list)
# import backbone model
backbone_mdl = args.backbone_model
backbone_cls = "Model"
backbone_module = importlib.import_module(backbone_mdl)
backbone_servlet = getattr(backbone_module, backbone_cls)
if not (args.do_train or args.do_predict):
raise ValueError("For args `do_train` and `do_predict`, at "
"least one of them must be True.")
if args.use_cuda:
place = fluid.CUDAPlace(0)
dev_count = fluid.core.get_cuda_device_count()
else:
place = fluid.CPUPlace()
dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
exe = fluid.Executor(place)
startup_prog = fluid.default_startup_program()
if args.random_seed is not None:
startup_prog.random_seed = args.random_seed
if args.do_train:
#create joint pyreader
print('creating readers...')
gens = []
main_generator = ""
for params in reader_args_list:
generator_cls = getattr(reader_module_dict[reader_map_task[params[0]]], "DataProcessor")
generator_inst = generator_cls(params[1])
reader_generator = generator_inst.data_generator(phase='train', shuffle=True, dev_count=dev_count)
if not main_generator:
main_generator = generator_inst
gens.append((reader_generator, params[1].mix_ratio, reader_map_task[params[0]]))
joint_generator, train_pyreader, model_inputs = create_reader("train_reader", train_input_shape, True, task_map_id, gens)
train_pyreader.decorate_tensor_provider(joint_generator)
# build task inputs
task_inputs_list = []
main_test_input = []
task_id = model_inputs[0]
backbone_inputs = model_inputs[task_map_id[0][0]: task_map_id[0][1]]
for i in range(1, len(task_map_id)):
task_inputs = backbone_inputs + model_inputs[task_map_id[i][0]: task_map_id[i][1]]
task_inputs_list.append(task_inputs)
# build backbone model
print('building model backbone...')
conf = vars(args)
if args.pretrain_config_path is not None:
model_conf = JsonConfig(args.pretrain_config_path).asdict()
for k, v in model_conf.items():
if k in conf:
                    assert v == conf[k], "ERROR: argument {} in pretrain_model_config is NOT consistent with which in main.yaml".format(k)
conf.update(model_conf)
backbone_inst = backbone_servlet(conf, is_training=True)
print('building task models...')
num_train_examples = main_generator.get_num_examples()
if main_args.in_tokens:
max_train_steps = int(main_args.epoch * num_train_examples) // (
main_args.batch_size // main_args.max_seq_len) // dev_count
else:
max_train_steps = int(main_args.epoch * num_train_examples) // (
main_args.batch_size) // dev_count
mix_ratio_list = [task_args[1].mix_ratio for task_args in task_args_list]
args.max_train_steps = int(max_train_steps * (sum(mix_ratio_list) / main_args.mix_ratio))
print("Max train steps: %d" % max_train_steps)
build_strategy = fluid.BuildStrategy()
train_program = fluid.default_main_program()
with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard():
backbone_inst.build_model(backbone_inputs)
all_loss_list = []
for i in range(len(task_args_list)):
task_name = task_args_list[i][0]
task_args = task_args_list[i][1]
if hasattr(task_args, 'paradigm'):
task_net = task_args.paradigm
else:
task_net = task_name
task_net_mdl = importlib.import_module(task_net)
task_net_cls = getattr(task_net_mdl, "create_model")
output_tensor = task_net_cls(task_inputs_list[i], base_model=backbone_inst, is_training=True, args=task_args)
loss_cls = getattr(task_net_mdl, "compute_loss")
task_loss = loss_cls(output_tensor, task_args)
all_loss_list.append(task_loss)
num_seqs = output_tensor['num_seqs']
task_one_hot = fluid.layers.one_hot(task_id, len(task_args_list))
all_loss = fluid.layers.concat(all_loss_list, axis=0)
loss = fluid.layers.reduce_sum(task_one_hot * all_loss)
programs = [train_program, startup_prog]
optimizer_mdl = importlib.import_module(args.optimizer)
optimizer_inst = getattr(optimizer_mdl, "optimization")
optimizer_inst(loss, programs, args=args)
loss.persistable = True
num_seqs.persistable = True
ema = fluid.optimizer.ExponentialMovingAverage(args.ema_decay)
ema.update()
train_compiled_program = fluid.CompiledProgram(train_program).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
if args.do_predict:
conf = vars(args)
if args.pretrain_config_path is not None:
model_conf = JsonConfig(args.pretrain_config_path).asdict()
for k, v in model_conf.items():
if k in conf:
assert v == conf[k], "ERROR: argument {} in pretrain_model_config is NOT consistent with which in main.yaml".format(k)
conf.update(model_conf)
mod = reader_module_dict[main_task_name]
DataProcessor = getattr(mod, 'DataProcessor')
predict_processor = DataProcessor(main_args)
test_generator = predict_processor.data_generator(
phase='predict',
shuffle=False,
dev_count=dev_count)
new_test_input_shape = input_shape_dict[main_task_name][1]['backbone'] + input_shape_dict[main_task_name][1]['task']
assert new_test_input_shape == test_input_shape
build_strategy = fluid.BuildStrategy()
test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
placeholder = Placeholder(test_input_shape)
test_pyreader, model_inputs = placeholder.build(
capacity=100, reader_name="test_reader")
test_pyreader.decorate_tensor_provider(test_generator)
# create model
backbone_inst = backbone_servlet(conf, is_training=False)
backbone_inst.build_model(model_inputs)
task_net_mdl = importlib.import_module(main_task_name)
task_net_cls = getattr(task_net_mdl, "create_model")
postprocess = getattr(task_net_mdl, "postprocess")
global_postprocess = getattr(task_net_mdl, "global_postprocess")
output_tensor = task_net_cls(model_inputs, base_model=backbone_inst, is_training=False, args=main_args)
if 'ema' not in dir():
ema = fluid.optimizer.ExponentialMovingAverage(args.ema_decay)
pred_fetch_names = []
fetch_vars = []
for i,j in output_tensor.items():
pred_fetch_names.append(i)
fetch_vars.append(j)
for var in fetch_vars:
var.persistable = True
pred_fetch_list = [i.name for i in fetch_vars]
test_prog = test_prog.clone(for_test=True)
test_compiled_program = fluid.CompiledProgram(test_prog).with_data_parallel(
build_strategy=build_strategy)
exe.run(startup_prog)
if args.do_train:
if args.pretrain_model_path:
init_pretraining_params(
exe,
args.pretrain_model_path,
main_program=startup_prog,
use_fp16=False)
if args.checkpoint_path:
if os.path.exists(args.checkpoint_path):
init_checkpoint(
exe,
args.checkpoint_path,
main_program=startup_prog,
use_fp16=False)
else:
os.makedirs(args.checkpoint_path)
elif args.do_predict:
if not args.checkpoint_path:
raise ValueError("args 'checkpoint_path' should be set if"
"only doing prediction!")
init_checkpoint(
exe,
args.checkpoint_path,
main_program=test_prog,
use_fp16=False)
if args.do_train:
print('start training...')
train_pyreader.start()
steps = 0
total_cost, total_num_seqs = [], []
time_begin = time.time()
while True:
try:
steps += 1
if steps % args.skip_steps == 0:
fetch_list = [loss.name, num_seqs.name, task_id.name]
else:
fetch_list = []
outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
if steps % args.skip_steps == 0:
np_loss, np_num_seqs, np_task_id = outputs
total_cost.extend(np_loss * np_num_seqs)
total_num_seqs.extend(np_num_seqs)
time_end = time.time()
used_time = time_end - time_begin
current_example, epoch = main_generator.get_train_progress()
cur_task_name = id_map_task[np_task_id[0][0]]
print("epoch: %d, task_name: %s, progress: %d/%d, step: %d, loss: %f, "
"speed: %f steps/s" %
(epoch, cur_task_name, current_example, num_train_examples, steps,
np.sum(total_cost) / np.sum(total_num_seqs),
args.skip_steps / used_time))
total_cost, total_num_seqs = [], []
time_begin = time.time()
if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoint_path,
"step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
if steps == max_train_steps:
save_path = os.path.join(args.checkpoint_path,
"step_" + str(steps) + "_final")
fluid.io.save_persistables(exe, save_path, train_program)
break
except paddle.fluid.core.EOFException as err:
save_path = os.path.join(args.checkpoint_path,
"step_" + str(steps) + "_final")
fluid.io.save_persistables(exe, save_path, train_program)
train_pyreader.reset()
break
if args.do_predict:
print('start predicting...')
cnt = 0
if args.use_ema:
with ema.apply(exe):
test_pyreader.start()
pred_buf = []
while True:
try:
fetch_res = exe.run(fetch_list=pred_fetch_list, program=test_compiled_program)
cnt += 1
if cnt % 200 == 0:
print('predicting {}th batch...'.format(cnt))
fetch_dict = {}
for key,val in zip(pred_fetch_names, fetch_res):
fetch_dict[key] = val
res = postprocess(fetch_dict)
if res is not None:
pred_buf.extend(res)
except fluid.core.EOFException:
test_pyreader.reset()
break
global_postprocess(pred_buf, predict_processor, args, main_args)
else:
test_pyreader.start()
pred_buf = []
while True:
try:
fetch_res = exe.run(fetch_list=pred_fetch_list, program=test_compiled_program)
cnt += 1
if cnt % 200 == 0:
print('predicting {}th batch...'.format(cnt))
fetch_dict = {}
for key,val in zip(pred_fetch_names, fetch_res):
fetch_dict[key] = val
res = postprocess(fetch_dict)
if res is not None:
pred_buf.extend(res)
except fluid.core.EOFException:
test_pyreader.reset()
break
global_postprocess(pred_buf, predict_processor, args, main_args)
if __name__ == '__main__':
multitask_config = "mtl_config.yaml"
train(multitask_config)
This directory stores the optimizers. Users can add custom optimizers by implementing the related interface; a sketch follows.
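For reference, `mtl_run.py` imports the module named by the `optimizer` field and calls its `optimization(loss, programs, args=...)` entry point, so a minimal custom optimizer module could look like the sketch below. The module name `my_sgd` and the plain SGD choice are illustrative assumptions.

```python
# my_sgd.py -- illustrative custom optimizer module; the entry-point name
# `optimization` matches what mtl_run.py looks up via getattr.
import paddle.fluid as fluid

def optimization(loss, programs, args=None):
    train_program, startup_program = programs
    with fluid.program_guard(train_program, startup_program):
        optimizer = fluid.optimizer.SGD(learning_rate=args.learning_rate)
        optimizer.minimize(loss)
```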
import sys
from paddlepalm.mtl_controller import Controller
sys.path.append('paddlepalm')
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -11,24 +12,27 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model"""
"""v1.1
BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle import fluid
from paddle.fluid import layers
import backbone.utils.transformer as transformer
from paddlepalm.backbone.utils.transformer import pre_process_layer, encoder
from paddlepalm.interface import backbone
class Model(object):
class Model(backbone):
def __init__(self,
config,
is_training=False,
model_name=''):
phase):
# self._is_training = phase == 'train'  # a backbone generally need not care about the phase, since its outputs barely change across phases
self._emb_size = config["hidden_size"]
self._n_layer = config["num_hidden_layers"]
self._n_head = config["num_attention_heads"]
......@@ -39,8 +43,6 @@ class Model(object):
self._prepostprocess_dropout = config["hidden_dropout_prob"]
self._attention_dropout = config["attention_probs_dropout_prob"]
self._is_training = is_training
self.model_name = model_name
self._word_emb_name = self.model_name + "word_embedding"
......@@ -52,35 +54,48 @@ class Model(object):
self._param_initializer = fluid.initializer.TruncatedNormal(
scale=config["initializer_range"])
def build_model(self, reader_input, use_fp16=False):
dtype = "float16" if use_fp16 else "float32"
@property
def inputs_attr(self):
return {"token_ids": [-1, self._max_position_seq_len, 1], 'int64'],
"position_ids": [-1, self._max_position_seq_len, 1], 'int64'],
"segment_ids": [-1, self._max_position_seq_len, 1], 'int64'],
"input_mask": [-1, self._max_position_seq_len, 1], 'float32']}
src_ids, pos_ids, sent_ids, input_mask = reader_input[:4]
@property
def outputs_attr(self):
return {"word_emb": [-1, self._max_position_seq_len, self._emb_size],
"sentence_emb": [-1, self._emb_size],
"sentence_pair_emb": [-1, self._emb_size]}
def build(self, inputs):
src_ids = inputs['token_ids']
pos_ids = inputs['position_ids']
sent_ids = inputs['segment_ids']
input_mask = inputs['input_mask']
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
emb_out = layers.embedding(
input=src_ids,
size=[self._voc_size, self._emb_size],
dtype=dtype,
dtype="float32",
param_attr=fluid.ParamAttr(
name=self._word_emb_name, initializer=self._param_initializer),
is_sparse=False)
self.emb_out = emb_out
position_emb_out = fluid.layers.embedding(
position_emb_out = layers.embedding(
input=pos_ids,
size=[self._max_position_seq_len, self._emb_size],
dtype=dtype,
dtype="float32",
param_attr=fluid.ParamAttr(
name=self._pos_emb_name, initializer=self._param_initializer))
self.position_emb_out = position_emb_out
sent_emb_out = fluid.layers.embedding(
sent_emb_out = layers.embedding(
sent_ids,
size=[self._sent_types, self._emb_size],
dtype=dtype,
dtype="float32"
param_attr=fluid.ParamAttr(
name=self._sent_emb_name, initializer=self._param_initializer))
......@@ -88,24 +103,21 @@ class Model(object):
emb_out = emb_out + position_emb_out + sent_emb_out
emb_out = transformer.pre_process_layer(
emb_out = pre_process_layer(
emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if dtype == "float16":
input_mask = fluid.layers.cast(x=input_mask, dtype=dtype)
self_attn_mask = fluid.layers.matmul(
self_attn_mask = layers.matmul(
x = input_mask, y = input_mask, transpose_y = True)
self_attn_mask = fluid.layers.scale(
self_attn_mask = layers.scale(
x = self_attn_mask, scale = 10000.0, bias = -1.0, bias_after_scale = False)
n_head_self_attn_mask = fluid.layers.stack(
n_head_self_attn_mask = layers.stack(
x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = transformer.encoder(
enc_out = encoder(
enc_input = emb_out,
attn_bias = n_head_self_attn_mask,
n_layer = self._n_layer,
......@@ -123,9 +135,9 @@ class Model(object):
param_initializer = self._param_initializer,
name = self.model_name + 'encoder')
next_sent_feat = fluid.layers.slice(
input = self._enc_out, axes = [1], starts = [0], ends = [1])
self.next_sent_feat = fluid.layers.fc(
next_sent_feat = layers.slice(
input = enc_out, axes = [1], starts = [0], ends = [1])
next_sent_feat = layers.fc(
input = next_sent_feat,
size = self._emb_size,
act = "tanh",
......@@ -133,19 +145,12 @@ class Model(object):
name = self.model_name + "pooled_fc.w_0",
initializer = self._param_initializer),
bias_attr = "pooled_fc.b_0")
@property
def final_word_representation(self):
"""final layer output of transformer encoder as the (contextual) word representation"""
return self._enc_out
@property
def final_sentence_representation(self):
"""final representation of the first token ([CLS]) as sentence representation """
return self.next_sent_feat
return {'word_emb': enc_out,
'sentence_emb': next_sent_feat,
'sentence_pair_emb': next_sent_feat}
if __name__ == "__main__":
print("hello world!")
def postprocess(self, rt_outputs):
pass
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid import layers

from paddlepalm.interface import backbone
class Model(backbone):
def __init__(self, config, phase):
# self._is_training = phase == 'train'  # a backbone generally need not care about the phase, since its outputs barely change across phases
self._emb_size = config["emb_size"]
self._voc_size = config["vocab_size"]
@property
def inputs_attr(self):
return {"token_ids": [-1, self._max_position_seq_len, 1], 'int64']}
@property
def outputs_attr(self):
return {"word_emb": [-1, self._max_position_seq_len, self._emb_size],
"sentence_emb": [-1, self._emb_size*2]}
def build(self, inputs):
tok_ids = inputs['token_ids']
emb_out = layers.embedding(
input=tok_ids,
size=[self._voc_size, self._emb_size],
dtype='float32',
param_attr=fluid.ParamAttr(
name='word_emb',
initializer=fluid.initializer.TruncatedNormal(scale=0.1)),
is_sparse=False)
sent_emb1 = layers.reduce_mean(emb_out, axis=1)
sent_emb2 = layers.reduce_max(emb_out, axis=1)
sent_emb = layers.concat([sent_emb1, sent_emb2], axis=1)
return {'word_emb': emb_out,
'sentence_emb': sent_emb}
def postprocess(self, rt_outputs):
pass
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -19,16 +20,20 @@ from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import
import paddle.fluid as fluid
from paddle import fluid
from paddle.fluid import layers
import backbone.utils.transformer4ernie as transformer
from paddlepalm.backbone.utils.transformer import pre_process_layer, encoder
from paddlepalm.interface import backbone
class Model(object):
class Model(backbone):
def __init__(self,
config,
is_training=False,
):
phase):
# self._is_training = phase == 'train'  # a backbone generally need not care about the phase, since its outputs barely change across phases
self._emb_size = config['hidden_size']
self._n_layer = config['num_hidden_layers']
......@@ -40,6 +45,8 @@ class Model(object):
else:
self._sent_types = config['type_vocab_size']
self._task_types = config['task_type_vocab_size']
self._hidden_act = config['hidden_act']
self._prepostprocess_dropout = config['hidden_dropout_prob']
self._attention_dropout = config['attention_probs_dropout_prob']
......@@ -53,12 +60,29 @@ class Model(object):
self._param_initializer = fluid.initializer.TruncatedNormal(
scale=config['initializer_range'])
@property
def inputs_attr(self):
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"task_ids": [[-1,-1, 1], 'int64']}
@property
def outputs_attr(self):
return {"word_embedding": [[-1, -1, self._emb_size], 'float32'],
"encoder_outputs": [[-1, -1, self._emb_size], 'float32'],
"sentence_embedding": [[-1, self._emb_size], 'float32'],
"sentence_pair_embedding": [[-1, self._emb_size], 'float32']}
def build_model(self, reader_input, use_fp16=False):
def build(self, inputs):
dtype = "float16" if use_fp16 else "float32"
src_ids = inputs['token_ids']
pos_ids = inputs['position_ids']
sent_ids = inputs['segment_ids']
input_mask = inputs['input_mask']
task_ids = inputs['task_ids']
src_ids, pos_ids, sent_ids, input_mask = reader_input[:4]
# padding id in vocabulary must be set to 0
emb_out = fluid.layers.embedding(
input=src_ids,
......@@ -85,12 +109,19 @@ class Model(object):
emb_out = emb_out + position_emb_out
emb_out = emb_out + sent_emb_out
emb_out = transformer.pre_process_layer(
task_emb_out = fluid.layers.embedding(
task_ids,
size=[self._task_types, self._emb_size],
dtype=self._emb_dtype,
param_attr=fluid.ParamAttr(
name=self._task_emb_name,
initializer=self._param_initializer))
emb_out = emb_out + task_emb_out
emb_out = pre_process_layer(
emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
if dtype == "float16":
emb_out = fluid.layers.cast(x=emb_out, dtype=dtype)
input_mask = fluid.layers.cast(x=input_mask, dtype=dtype)
self_attn_mask = fluid.layers.matmul(
x=input_mask, y=input_mask, transpose_y=True)
......@@ -100,7 +131,7 @@ class Model(object):
x=[self_attn_mask] * self._n_head, axis=1)
n_head_self_attn_mask.stop_gradient = True
self._enc_out = transformer.encoder(
enc_out = encoder(
enc_input=emb_out,
attn_bias=n_head_self_attn_mask,
n_layer=self._n_layer,
......@@ -117,20 +148,11 @@ class Model(object):
postprocess_cmd="dan",
param_initializer=self._param_initializer,
name='encoder')
if dtype == "float16":
self._enc_out = fluid.layers.cast(
x=self._enc_out, dtype=self._emb_dtype)
@property
def final_word_representation(self):
return self._enc_out
@property
def final_sentence_representation(self):
"""Get the first feature of each sequence for classification"""
next_sent_feat = fluid.layers.slice(
input=self._enc_out, axes=[1], starts=[0], ends=[1])
input=enc_out, axes=[1], starts=[0], ends=[1])
next_sent_feat = fluid.layers.reshape(next_sent_feat, [-1, next_sent_feat.shape[-1]])
next_sent_feat = fluid.layers.fc(
input=next_sent_feat,
size=self._emb_size,
......@@ -138,5 +160,11 @@ class Model(object):
param_attr=fluid.ParamAttr(
name="pooled_fc.w_0", initializer=self._param_initializer),
bias_attr="pooled_fc.b_0")
return next_sent_feat
return {'word_embedding': emb_out,
'encoder_outputs': enc_out,
'sentence_embedding': next_sent_feat,
'sentence_pair_embedding': next_sent_feat}
def postprocess(self, rt_outputs):
pass
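# A hedged usage sketch (not part of the original file; the values are
# illustrative, and some keys, e.g. vocab_size / num_attention_heads, are
# assumed to be read in the elided parts of __init__):
#
#   config = {'hidden_size': 768, 'num_hidden_layers': 12,
#             'num_attention_heads': 12, 'vocab_size': 18000,
#             'type_vocab_size': 2, 'task_type_vocab_size': 3,
#             'hidden_act': 'relu', 'hidden_dropout_prob': 0.1,
#             'attention_probs_dropout_prob': 0.1, 'initializer_range': 0.02}
#   model = Model(config, phase='train')
#   outputs = model.build(inputs)  # inputs conforms to model.inputs_attr
#   enc_out = outputs['encoder_outputs']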
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -22,7 +23,6 @@ from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
keys,
values,
......
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -11,49 +12,31 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import paddle.fluid as fluid
BACKBONE_DIR='paddlepalm.backbone'
TASK_INSTANCE_DIR='paddlepalm.task_instance'
READER_DIR='paddlepalm.reader'
PARADIGM_DIR='paddlepalm.task_paradigm'
OPTIMIZER_DIR='paddlepalm.optimizer'
OPTIMIZE_METHOD='optimize'
REQUIRED_ARGS={
'task_instance': str,
'backbone': str,
'optimizer': str,
'learning_rate': float,
'batch_size': int
}
OPTIONAL_ARGS={
'mix_ratio': str,
'target_tag': str,
'reuse_rag': str
}
TASK_REQUIRED_ARGS={
'paradigm': str,
'reader': str,
'train_file': str
}
def compute_loss(output_tensors, args=None):
"""Compute loss for mrc model"""
labels = output_tensors['labels']
logits = output_tensors['logits']
ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=labels, return_softmax=True)
loss = fluid.layers.mean(x=ce_loss)
return loss
def create_model(reader_input, base_model=None, is_training=True, args=None):
"""
given the base model and reader_input, return the output tensors
"""
labels = reader_input[-1]
cls_feats = base_model.final_sentence_representation
cls_feats = fluid.layers.dropout(
x=cls_feats,
dropout_prob=0.1,
dropout_implementation="upscale_in_train")
logits = fluid.layers.fc(
input=cls_feats,
size=2,
param_attr=fluid.ParamAttr(
name="cls_out_w",
initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
bias_attr=fluid.ParamAttr(
name="cls_out_b", initializer=fluid.initializer.Constant(0.)))
num_seqs = fluid.layers.fill_constant(shape=[1], value=512, dtype='int64')
output_tensors = {}
output_tensors['labels'] = labels
output_tensors['logits'] = logits
output_tensors['num_seqs'] = num_seqs
return output_tensors
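# A hedged usage sketch (not part of the original file): once a backbone has
# been built on top of reader_input, the two helpers chain together as
#
#   output_tensors = create_model(reader_input, base_model=backbone_model)
#   loss = compute_loss(output_tensors)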
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""v1.1"""
class reader(object):
"""interface of data manager."""
def __init__(self, config):
assert isinstance(config, dict)
# @property
# def inputs_attr(self):
# """Describes the attributes of the reader's input objects: each object's name, shape and dtype. For a scalar object (e.g. str, int, float), shape should be an empty list []; for a dimension of variable length, the corresponding entry in shape should be -1.
# Return:
# dict. Attribute descriptions of each input object. For example,
# a text classification task may need the input text and the label id:
# {"text": ([], 'str'),
# "label": ([], 'int')}
# a tagging task may need the token sequence and the corresponding tags:
# {"tokens", ([-1], 'str'),
# "tags", ([-1], 'str')}
# a machine reading comprehension task may need the paragraph, the question, and the start/end positions of the answer span:
# {"paragraph", ([], 'str'),
# "question", ([], 'str'),
# "start_position", ([], 'int')
# """
# raise NotImplementedError()
@property
def outputs_attr(self):
"""描述reader输出对象(被yield出的对象)的属性,包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
注意:当使用mini-batch梯度下降学习策略时,,应为常规的输入对象设置batch_size维度(一般为-1)
Return:
dict类型。对各个输入对象的属性描述。例如,
对于文本分类和匹配任务,yield的输出内容可能包含如下的对象(下游backbone和task可按需访问其中的对象)
{"token_ids": ([-1, max_len], 'int64'),
"input_ids": ([-1, max_len], 'int64'),
"segment_ids": ([-1, max_len], 'int64'),
"input_mask": ([-1, max_len], 'float32'),
"label": ([-1], 'int')}
"""
raise NotImplementedError()
# def parse_line(self):
# """Internally the framework describes each example with a dict: the keys come from inputs_attr and each value conforms to the corresponding attribute description. This method parses one text line into such a dict. The default parse_line reads JSON-formatted datasets, where each line of the dataset file is one JSON-described example.
# Users can override this method to adapt to other dataset formats, e.g. csv or even tfrecord files.
# """
# raise NotImplementedError()
#
# def tokenize(self, line):
# """The framework ships with built-in tokenizers such as the word piece tokenizer; users can pick one via the tokenizer hyper-parameter. If none of the built-in tokenizers fits, users can override this method to plug in a custom tokenizer.
# Args:
# - line: a unicode string.
# Return:
# a list of tokens
# """
# raise NotImplementedError()
def iterator(self):
"""数据集遍历接口,注意,当数据集遍历到尾部时该接口应自动完成指针重置,即重新从数据集头部开始新的遍历。
Yield:
(dict) elements that meet the requirements in output_templete
"""
raise NotImplementedError()
@property
def num_examples(self):
"""数据集中的样本数量,即每个epoch中iterator所生成的样本数。注意,使用滑动窗口等可能导致数据集样本数发生变化的策略时,该接口应返回runtime阶段的实际样本数。"""
raise NotImplementedError()
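# A minimal sketch of a custom reader implementing this interface (illustrative
# only, not part of the framework; the toy in-memory dataset is an assumption):
#
#   class ToyReader(reader):
#       def __init__(self, config):
#           super(ToyReader, self).__init__(config)
#           self._data = [([1, 5, 9], 0), ([2, 7], 1)]
#       @property
#       def outputs_attr(self):
#           return {"token_ids": ([-1, -1], 'int64'),
#                   "label": ([-1], 'int64')}
#       def iterator(self):
#           while True:  # auto-reset: restart from the head after each pass
#               for tokens, label in self._data:
#                   yield {"token_ids": tokens, "label": label}
#       @property
#       def num_examples(self):
#           return len(self._data)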
class backbone(object):
"""interface of backbone model."""
def __init__(self, config, phase):
"""
Args:
config: dict. The hyper-parameters defined in the multi-task config file and the pretrained model config file.
phase: str. The running phase; currently 'train' and 'predict' are supported.
"""
assert isinstance(config, dict)
@property
def inputs_attr(self):
"""描述backbone从reader处需要得到的输入对象的属性,包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个输入对象的属性描述。例如,
对于文本分类和匹配任务,bert backbone依赖的reader对象主要包含如下的对象
{"token_ids": ([-1, max_len], 'int64'),
"input_ids": ([-1, max_len], 'int64'),
"segment_ids": ([-1, max_len], 'int64'),
"input_mask": ([-1, max_len], 'float32')}"""
raise NotImplementedError()
@property
def outputs_attr(self):
"""描述backbone输出对象的属性,包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个输出对象的属性描述。例如,
对于文本分类和匹配任务,bert backbone的输出内容可能包含如下的对象
{"word_emb": ([-1, max_seqlen, word_emb_size], 'float32'),
"sentence_emb": ([-1, hidden_size], 'float32'),
"sim_vec": ([-1, hidden_size], 'float32')}"""
raise NotImplementedError()
def build(self, inputs):
"""建立backbone的计算图。将符合inputs_attr描述的静态图Variable输入映射成符合outputs_attr描述的静态图Variable输出。
Args:
inputs: dict类型。字典中包含inputs_attr中的对象名到计算图Variable的映射,inputs中至少会包含inputs_attr中定义的对象
Return:
需要输出的计算图变量,输出对象会被加入到fetch_list中,从而在每个训练/推理step时得到runtime的计算结果,该计算结果会被传入postprocess方法中供用户处理。
"""
raise NotImplementedError()
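# A minimal sketch of a custom backbone implementing this interface (a toy
# bag-of-words encoder; all names and sizes are illustrative assumptions):
#
#   class BOWBackbone(backbone):
#       def __init__(self, config, phase):
#           super(BOWBackbone, self).__init__(config, phase)
#           self._emb_size = config['hidden_size']
#           self._vocab_size = config['vocab_size']
#       @property
#       def inputs_attr(self):
#           return {"token_ids": [[-1, -1], 'int64']}
#       @property
#       def outputs_attr(self):
#           return {"sentence_embedding": [[-1, self._emb_size], 'float32']}
#       def build(self, inputs):
#           emb = fluid.layers.embedding(inputs['token_ids'],
#                                        size=[self._vocab_size, self._emb_size])
#           sent = fluid.layers.reduce_mean(emb, dim=1)  # mean over tokens
#           return {'sentence_embedding': sent}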
class task_paradigm(object):
def __init__(self, config, phase, backbone_config):
"""
config: dict. The hyper-parameters defined in the task instance config file and the multi-task config file.
phase: str. The running phase; currently 'train' and 'predict' are supported.
"""
@property
def inputs_attrs(self):
"""描述task_layer需要从reader, backbone等输入对象集合所读取到的输入对象的属性,第一级key为对象集和的名字,如backbone,reader等(后续会支持更灵活的输入),第二级key为对象集和中各对象的属性,包括对象的名字,shape和dtype。当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个对象集及其输入对象的属性描述。"""
raise NotImplementedError()
@property
def outputs_attr(self):
"""描述task输出对象的属性,包括对象的名字,shape和dtype。输出对象会被加入到fetch_list中,从而在每个训练/推理step时得到runtime的计算结果,该计算结果会被传入postprocess方法中供用户处理。
当某个对象为标量数据类型(如str, int, float等)时,shape设置为空列表[],当某个对象的某个维度长度可变时,shape中的相应维度设置为-1。
Return:
dict类型。对各个输入对象的属性描述。注意,训练阶段必须包含名为loss的输出对象。
"""
raise NotImplementedError()
def build(self, inputs):
"""建立task_layer的计算图。将符合inputs_attrs描述的来自各个对象集的静态图Variables映射成符合outputs_attr描述的静态图Variable输出。
Args:
inputs: dict类型。字典中包含inputs_attrs中的对象名到计算图Variable的映射,inputs中至少会包含inputs_attr中定义的对象
Return:
需要输出的计算图变量,输出对象会被加入到fetch_list中,从而在每个训练/推理step时得到runtime的计算结果,该计算结果会被传入postprocess方法中供用户处理。
"""
raise NotImplementedError()
def postprocess(self, rt_outputs):
"""每个训练或推理step后针对当前batch的task_layer的runtime计算结果进行相关后处理。注意,rt_outputs除了包含build方法,还自动包含了loss的计算结果。"""
pass
def post_postprocess(self, global_buffer):
pass
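# A minimal sketch of a custom task_paradigm implementing this interface (an
# illustrative regression head; all names below are assumptions):
#
#   class MSEParadigm(task_paradigm):
#       def __init__(self, config, phase, backbone_config):
#           self._is_training = phase == 'train'
#           self._hidden_size = backbone_config['hidden_size']
#       @property
#       def inputs_attrs(self):
#           return {'reader': {"label": [[-1, 1], 'float32']},
#                   'backbone': {"sentence_embedding": [[-1, self._hidden_size], 'float32']}}
#       @property
#       def outputs_attr(self):
#           return {"loss": [[1], 'float32']}
#       def build(self, inputs):
#           pred = fluid.layers.fc(inputs['backbone']['sentence_embedding'], size=1)
#           loss = fluid.layers.mean(
#               fluid.layers.square_error_cost(pred, inputs['reader']['label']))
#           return {'loss': loss}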
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -48,25 +49,23 @@ def linear_warmup_decay(learning_rate, warmup_steps, num_train_steps):
return lr
def optimization(loss, programs, args):
train_program = programs[0]
startup_prog = programs[1]
warmup_steps = args.max_train_steps * args.warmup_proportion
def optimize(loss, config, max_train_steps=None, warmup_steps=0, train_program=None):
if warmup_steps > 0:
if args.lr_scheduler == 'noam_decay':
decay_strategy = config.get('lr_scheduler', 'linear_warmup_decay')
if decay_strategy == 'noam_decay':
scheduled_lr = fluid.layers.learning_rate_scheduler\
.noam_decay(1/(warmup_steps *(float(args.learning_rate) ** 2)),
.noam_decay(1/(warmup_steps *(config['learning_rate'] ** 2)),
warmup_steps)
elif args.lr_scheduler == 'linear_warmup_decay':
scheduled_lr = linear_warmup_decay(float(args.learning_rate), warmup_steps,
args.max_train_steps)
elif decay_strategy == 'linear_warmup_decay':
scheduled_lr = linear_warmup_decay(config['learning_rate'], warmup_steps,
max_train_steps)
else:
raise ValueError("Unkown learning rate scheduler, should be "
raise ValueError("Unkown lr_scheduler, should be "
"'noam_decay' or 'linear_warmup_decay'")
optimizer = fluid.optimizer.Adam(learning_rate=scheduled_lr)
else:
optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
scheduled_lr = args.learning_rate
optimizer = fluid.optimizer.Adam(learning_rate=config['learning_rate'])
scheduled_lr = config['learning_rate']
clip_norm_thres = 1.0
# When using mixed precision training, scale the gradient clip threshold
......@@ -91,13 +90,19 @@ def optimization(loss, programs, args):
_, param_grads = optimizer.minimize(loss)
if args.weight_decay > 0:
for block in fluid.default_main_program().blocks:
for var_name in block.vars:
if var_name.startswith("embedding"):
print(block.vars[var_name])
if config.get('weight_decay', 0) > 0:
for param, grad in param_grads:
if exclude_from_weight_decay(param.name):
continue
with param.block.program._optimized_guard(
[param, grad]), fluid.framework.name_scope("weight_decay"):
updated_param = param - param_list[
param.name] * args.weight_decay * scheduled_lr
param.name] * config['weight_decay'] * scheduled_lr
fluid.layers.assign(output=param, input=updated_param)
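# A hedged usage sketch for optimize() (not part of the original file; the
# config values are illustrative assumptions):
#
#   config = {'learning_rate': 3e-5, 'lr_scheduler': 'linear_warmup_decay',
#             'weight_decay': 0.01}
#   optimize(loss, config, max_train_steps=10000, warmup_steps=1000,
#            train_program=fluid.default_main_program())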
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader
from paddlepalm.reader.utils.reader4ernie import ClassifyReader
class Reader(reader):
def __init__(self, config, phase='train', dev_count=1, print_prefix=''):
"""
Args:
phase: train, eval, pred
"""
self._is_training = phase == 'train'
reader = ClassifyReader(config['vocab_path'],
max_seq_len=config['max_seq_len'],
do_lower_case=config.get('do_lower_case', False),
for_cn=config.get('for_cn', False),
random_seed=config.get('seed', None))
self._reader = reader
self._dev_count = dev_count
self._batch_size = config['batch_size']
self._max_seq_len = config['max_seq_len']
if phase == 'train':
self._input_file = config['train_file']
self._num_epochs = None  # keep the iterator from terminating
self._shuffle = config.get('shuffle', False)
self._shuffle_buffer = config.get('shuffle_buffer', 5000)
elif phase == 'eval':
self._input_file = config['dev_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
elif phase == 'pred':
self._input_file = config['pred_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
self._phase = phase
# self._batch_size =
self._print_first_n = config.get('print_first_n', 1)
@property
def outputs_attr(self):
if self._is_training:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"label_ids": [[-1,1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64']
}
else:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32']
}
def load_data(self):
self._data_generator = self._reader.data_generator(self._input_file, self._batch_size, self._num_epochs, dev_count=self._dev_count, shuffle=self._shuffle, phase=self._phase)
def iterator(self):
def list_to_dict(x):
names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask',
'label_ids', 'unique_ids']
outputs = {n: i for n,i in zip(names, x)}
del outputs['unique_ids']
if not self._is_training:
del outputs['label_ids']
return outputs
for batch in self._data_generator():
yield list_to_dict(batch)
def get_epoch_outputs(self):
return {'examples': self._reader.get_examples(self._phase),
'features': self._reader.get_features(self._phase)}
@property
def num_examples(self):
return self._reader.get_num_examples(phase=self._phase)
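# A hedged usage sketch (not part of the original file; the paths and values
# are illustrative assumptions):
#
#   config = {'vocab_path': 'vocab.txt', 'max_seq_len': 128, 'batch_size': 32,
#             'train_file': 'data/cls/train.tsv', 'do_lower_case': True}
#   reader = Reader(config, phase='train')
#   reader.load_data()
#   for batch in reader.iterator():
#       ...  # a dict conforming to reader.outputs_attr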
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader
from paddlepalm.reader.utils.reader4ernie import BaseReader
class Reader(reader):
def __init__(self, config, phase='train', dev_count=1, print_prefix=''):
"""
Args:
phase: train, eval, pred
"""
self._is_training = phase == 'train'
reader = BaseReader(config['vocab_path'],
max_seq_len=config['max_seq_len'],
do_lower_case=config.get('do_lower_case', False),
for_cn=config.get('for_cn', False),
random_seed=config.get('seed', None))
self._reader = reader
self._dev_count = dev_count
self._batch_size = config['batch_size']
self._max_seq_len = config['max_seq_len']
if phase == 'train':
self._input_file = config['train_file']
self._num_epochs = None  # keep the iterator from terminating
self._shuffle = config.get('shuffle', False)
self._shuffle_buffer = config.get('shuffle_buffer', 5000)
elif phase == 'eval':
self._input_file = config['dev_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
elif phase == 'pred':
self._input_file = config['pred_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
self._phase = phase
# self._batch_size =
self._print_first_n = config.get('print_first_n', 1)
@property
def outputs_attr(self):
if self._is_training:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"label_ids": [[-1,1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64']
}
else:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32']
}
def load_data(self):
self._data_generator = self._reader.data_generator(self._input_file, self._batch_size, self._num_epochs, dev_count=self._dev_count, shuffle=self._shuffle, phase=self._phase)
def iterator(self):
def list_to_dict(x):
names = ['token_ids', 'position_ids', 'segment_ids', 'input_mask',
'task_ids', 'mask_label', 'mask_pos']
outputs = {n: i for n,i in zip(names, x)}
# note: unlike the classification reader, this reader yields
# mask_label / mask_pos, so there are no unique_ids or label_ids keys to drop
return outputs
for batch in self._data_generator():
yield list_to_dict(batch)
def get_epoch_outputs(self):
return {'examples': self._reader.get_examples(self._phase),
'features': self._reader.get_features(self._phase)}
@property
def num_examples(self):
return self._reader.get_num_examples(phase=self._phase)
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader
from paddlepalm.reader.utils.reader4ernie import MRCReader
class Reader(reader):
def __init__(self, config, phase='train', dev_count=1, print_prefix=''):
"""
Args:
phase: train, eval, pred
"""
self._is_training = phase == 'train'
reader = MRCReader(config['vocab_path'],
max_seq_len=config['max_seq_len'],
do_lower_case=config.get('do_lower_case', False),
tokenizer='FullTokenizer',
doc_stride=config['doc_stride'],
max_query_length=config['max_query_len'],
random_seed=config.get('seed', None))
self._reader = reader
self._dev_count = dev_count
self._batch_size = config['batch_size']
self._max_seq_len = config['max_seq_len']
if phase == 'train':
self._input_file = config['train_file']
# self._num_epochs = config['num_epochs']
self._num_epochs = None  # keep the iterator from terminating
self._shuffle = config.get('shuffle', False)
self._shuffle_buffer = config.get('shuffle_buffer', 5000)
elif phase == 'eval':
self._input_file = config['dev_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
elif phase == 'pred':
self._input_file = config['pred_file']
self._num_epochs = 1
self._shuffle = False
self._batch_size = config.get('pred_batch_size', self._batch_size)
self._phase = phase
# self._batch_size =
self._print_first_n = config.get('print_first_n', 1)
# TODO: without slide window version
self._with_slide_window = config.get('with_slide_window', False)
@property
def outputs_attr(self):
if self._is_training:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"start_positions": [[-1, 1], 'int64'],
"end_positions": [[-1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64']
}
else:
return {"token_ids": [[-1, -1, 1], 'int64'],
"position_ids": [[-1, -1, 1], 'int64'],
"segment_ids": [[-1, -1, 1], 'int64'],
"task_ids": [[-1, -1, 1], 'int64'],
"input_mask": [[-1, -1, 1], 'float32'],
"unique_ids": [[-1, 1], 'int64']
}
@property
def epoch_outputs_attr(self):
if not self._is_training:
return {"examples": None,
"features": None}
def load_data(self):
self._data_generator = self._reader.data_generator(self._input_file, self._batch_size, self._num_epochs, dev_count=self._dev_count, shuffle=self._shuffle, phase=self._phase)
def iterator(self):
def list_to_dict(x):
names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask',
'start_positions', 'end_positions', 'unique_ids']
outputs = {n: i for n,i in zip(names, x)}
if self._is_training:
del outputs['unique_ids']
else:
del outputs['start_positions']
del outputs['end_positions']
return outputs
for batch in self._data_generator():
yield list_to_dict(batch)
def get_epoch_outputs(self):
return {'examples': self._reader.get_examples(self._phase),
'features': self._reader.get_features(self._phase)}
@property
def num_examples(self):
return self._reader.get_num_examples(phase=self._phase)
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -11,7 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
......
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from six.moves import xrange
def mask(batch_tokens,
seg_labels,
mask_word_tags,
total_token_num,
vocab_size,
CLS=1,
SEP=2,
MASK=3):
"""
Add masks to batch_tokens; return the masked tokens (out), mask_label and mask_pos.
Note: mask_pos is computed with respect to batch_tokens after padding.
"""
max_len = max([len(sent) for sent in batch_tokens])
mask_label = []
mask_pos = []
prob_mask = np.random.rand(total_token_num)
# Note: the first token is [CLS], so [low=1]
replace_ids = np.random.randint(1, high=vocab_size, size=total_token_num)
pre_sent_len = 0
prob_index = 0
for sent_index, sent in enumerate(batch_tokens):
mask_flag = False
mask_word = mask_word_tags[sent_index]
prob_index += pre_sent_len
if mask_word:
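# whole-word masking branch: seg_labels marks subword continuations (1) and
# special tokens (-1); a word spans [beg, token_index), the draw at beg
# decides whether the whole word is selected, and each subword in the span
# then gets [MASK], a random id, or its original value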
beg = 0
for token_index, token in enumerate(sent):
seg_label = seg_labels[sent_index][token_index]
if seg_label == 1:
continue
if beg == 0:
if seg_label != -1:
beg = token_index
continue
prob = prob_mask[prob_index + beg]
if prob > 0.15:
pass
else:
for index in xrange(beg, token_index):
prob = prob_mask[prob_index + index]
base_prob = 1.0
if index == beg:
base_prob = 0.15
if base_prob * 0.2 < prob <= base_prob:
mask_label.append(sent[index])
sent[index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + index)
elif base_prob * 0.1 < prob <= base_prob * 0.2:
mask_label.append(sent[index])
sent[index] = replace_ids[prob_index + index]
mask_flag = True
mask_pos.append(sent_index * max_len + index)
else:
mask_label.append(sent[index])
mask_pos.append(sent_index * max_len + index)
if seg_label == -1:
beg = 0
else:
beg = token_index
else:
for token_index, token in enumerate(sent):
prob = prob_mask[prob_index + token_index]
if prob > 0.15:
continue
elif 0.03 < prob <= 0.15:
# mask
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
elif 0.015 < prob <= 0.03:
# random replace
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = replace_ids[prob_index +
token_index]
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
else:
# keep the original token
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
mask_pos.append(sent_index * max_len + token_index)
pre_sent_len = len(sent)
mask_label = np.array(mask_label).astype("int64").reshape([-1, 1])
mask_pos = np.array(mask_pos).astype("int64").reshape([-1, 1])
return batch_tokens, mask_label, mask_pos
def pad_batch_data(insts,
pad_idx=0,
return_pos=False,
return_input_mask=False,
return_max_len=False,
return_num_token=False,
return_seq_lens=False):
"""
Pad the instances to the max sequence length in batch, and generate the
corresponding position data and attention bias.
"""
return_list = []
max_len = max(len(inst) for inst in insts)
# Any token included in dict can be used to pad, since the paddings' loss
# will be masked out by weights and have no effect on parameter gradients.
inst_data = np.array(
[inst + list([pad_idx] * (max_len - len(inst))) for inst in insts])
return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
# position data
if return_pos:
inst_pos = np.array([
list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
for inst in insts
])
return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
if return_input_mask:
# This is used to avoid attention on paddings.
input_mask_data = np.array([[1] * len(inst) + [0] *
(max_len - len(inst)) for inst in insts])
input_mask_data = np.expand_dims(input_mask_data, axis=-1)
return_list += [input_mask_data.astype("float32")]
if return_max_len:
return_list += [max_len]
if return_num_token:
num_token = 0
for inst in insts:
num_token += len(inst)
return_list += [num_token]
if return_seq_lens:
seq_lens = np.array([len(inst) for inst in insts])
return_list += [seq_lens.astype("int64").reshape([-1, 1])]
return return_list if len(return_list) > 1 else return_list[0]
if __name__ == "__main__":
pass
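    # A hedged smoke test (not part of the original file): exercise mask() and
    # pad_batch_data() on a tiny toy batch; ids 1/2/3 stand for [CLS]/[SEP]/[MASK],
    # and the seg_labels / mask_word_tags values are illustrative assumptions.
    batch = [[1, 15, 37, 2], [1, 8, 9, 10, 2]]
    seg_labels = [[-1, 0, 0, -1], [-1, 0, 1, 0, -1]]
    mask_word_tags = [False, True]
    total = sum(len(sent) for sent in batch)
    out, mask_label, mask_pos = mask(batch, seg_labels, mask_word_tags,
                                     total_token_num=total, vocab_size=30000)
    padded, input_mask = pad_batch_data(out, return_input_mask=True)
    print(padded.shape, input_mask.shape, mask_label.shape, mask_pos.shape)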
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def mask(batch_tokens, total_token_num, vocab_size, CLS=1, SEP=2, MASK=3):
"""
Add masks to batch_tokens; return the masked tokens (out), mask_label and mask_pos.
Note: mask_pos is computed with respect to batch_tokens after padding.
"""
max_len = max([len(sent) for sent in batch_tokens])
mask_label = []
mask_pos = []
prob_mask = np.random.rand(total_token_num)
# Note: the first token is [CLS], so [low=1]
replace_ids = np.random.randint(1, high=vocab_size, size=total_token_num)
pre_sent_len = 0
prob_index = 0
for sent_index, sent in enumerate(batch_tokens):
mask_flag = False
prob_index += pre_sent_len
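# ~15% of tokens are selected (prob <= 0.15); of these, (0.03, 0.15] become
# [MASK] (80% of selected), (0.015, 0.03] get a random id (10%), and the
# rest (<= 0.015) keep the original token (10%)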
for token_index, token in enumerate(sent):
prob = prob_mask[prob_index + token_index]
if prob > 0.15:
continue
elif 0.03 < prob <= 0.15:
# mask
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
elif 0.015 < prob <= 0.03:
# random replace
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
sent[token_index] = replace_ids[prob_index + token_index]
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
else:
# keep the original token
if token != SEP and token != CLS:
mask_label.append(sent[token_index])
mask_pos.append(sent_index * max_len + token_index)
pre_sent_len = len(sent)
# ensure at least mask one word in a sentence
while not mask_flag:
token_index = int(np.random.randint(1, high=len(sent) - 1, size=1))
if sent[token_index] != SEP and sent[token_index] != CLS:
mask_label.append(sent[token_index])
sent[token_index] = MASK
mask_flag = True
mask_pos.append(sent_index * max_len + token_index)
mask_label = np.array(mask_label).astype("int64").reshape([-1, 1])
mask_pos = np.array(mask_pos).astype("int64").reshape([-1, 1])
return batch_tokens, mask_label, mask_pos
def prepare_batch_data(insts,
total_token_num,
max_len=None,
voc_size=0,
pad_id=None,
cls_id=None,
sep_id=None,
mask_id=None,
task_id=0,
return_input_mask=True,
return_max_len=True,
return_num_token=False):
"""
1. generate Tensor of data
2. generate Tensor of position
3. generate self attention mask, [shape: batch_size * max_len * max_len]
"""
batch_src_ids = [inst[0] for inst in insts]
batch_sent_ids = [inst[1] for inst in insts]
batch_pos_ids = [inst[2] for inst in insts]
# First step: do mask without padding
out, mask_label, mask_pos = mask(
batch_src_ids,
total_token_num,
vocab_size=voc_size,
CLS=cls_id,
SEP=sep_id,
MASK=mask_id)
# Second step: padding
src_id, self_input_mask = pad_batch_data(
out,
max_len=max_len,
pad_idx=pad_id, return_input_mask=True)
pos_id = pad_batch_data(
batch_pos_ids,
max_len=max_len,
pad_idx=pad_id,
return_pos=False,
return_input_mask=False)
sent_id = pad_batch_data(
batch_sent_ids,
max_len=max_len,
pad_idx=pad_id,
return_pos=False,
return_input_mask=False)
task_ids = np.ones_like(
src_id, dtype="int64") * task_id
return_list = [
src_id, pos_id, sent_id, self_input_mask, task_ids, mask_label, mask_pos
]
return return_list if len(return_list) > 1 else return_list[0]
def pad_batch_data(insts,
max_len=None,
pad_idx=0,
return_pos=False,
return_input_mask=False,
return_max_len=False,
return_num_token=False):
"""
Pad the instances to the max sequence length in batch, and generate the
corresponding position data and input mask.
"""
return_list = []
if max_len is None:
max_len = max(len(inst) for inst in insts)
# Any token included in dict can be used to pad, since the paddings' loss
# will be masked out by weights and have no effect on parameter gradients.
inst_data = np.array([
list(inst) + list([pad_idx] * (max_len - len(inst))) for inst in insts
])
return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
# position data
if return_pos:
inst_pos = np.array([
list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
for inst in insts
])
return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
if return_input_mask:
# This is used to avoid attention on paddings.
input_mask_data = np.array([[1] * len(inst) + [0] *
(max_len - len(inst)) for inst in insts])
input_mask_data = np.expand_dims(input_mask_data, axis=-1)
return_list += [input_mask_data.astype("float32")]
if return_max_len:
return_list += [max_len]
if return_num_token:
num_token = 0
for inst in insts:
num_token += len(inst)
return_list += [num_token]
return return_list if len(return_list) > 1 else return_list[0]
if __name__ == "__main__":
pass
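    # A hedged smoke test (not part of the original file): each instance is
    # (token_ids, sent_ids, pos_ids); ids 1/2/3 stand for [CLS]/[SEP]/[MASK],
    # and all values below are illustrative assumptions.
    insts = [([1, 15, 37, 2], [0, 0, 0, 0], [0, 1, 2, 3]),
             ([1, 8, 9, 24, 2], [0, 0, 0, 0, 0], [0, 1, 2, 3, 4])]
    total = sum(len(inst[0]) for inst in insts)
    src, pos, sent, input_mask, task_ids, mask_label, mask_pos = \
        prepare_batch_data(insts, total, voc_size=30000,
                           pad_id=0, cls_id=1, sep_id=2, mask_id=3)
    print(src.shape, task_ids.shape, mask_label.shape)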
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class MRQAExample(object):
"""A single training/test example for simple sequence classification.
For examples without an answer, the start and end position are -1.
"""
def __init__(self,
qas_id,
question_text,
doc_tokens,
orig_answer_text=None,
start_position=None,
end_position=None,
is_impossible=False):
self.qas_id = qas_id
self.question_text = question_text
self.doc_tokens = doc_tokens
self.orig_answer_text = orig_answer_text
self.start_position = start_position
self.end_position = end_position
self.is_impossible = is_impossible
def __str__(self):
return self.__repr__()
def __repr__(self):
s = ""
s += "qas_id: %s" % (tokenization.printable_text(self.qas_id))
s += ", question_text: %s" % (
tokenization.printable_text(self.question_text))
s += ", doc_tokens: [%s]" % (" ".join(self.doc_tokens))
if self.start_position:
s += ", start_position: %d" % (self.start_position)
if self.end_position:
s += ", end_position: %d" % (self.end_position)
if self.is_impossible:
s += ", is_impossible: %r" % (self.is_impossible)
return s
class MRQAFeature(object):
"""A single set of features of data."""
def __init__(self,
unique_id,
example_index,
doc_span_index,
tokens,
token_to_orig_map,
token_is_max_context,
input_ids,
input_mask,
segment_ids,
start_position=None,
end_position=None,
is_impossible=None):
self.unique_id = unique_id
self.example_index = example_index
self.doc_span_index = doc_span_index
self.tokens = tokens
self.token_to_orig_map = token_to_orig_map
self.token_is_max_context = token_is_max_context
self.input_ids = input_ids
self.input_mask = input_mask
self.segment_ids = segment_ids
self.start_position = start_position
self.end_position = end_position
self.is_impossible = is_impossible
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddlepalm.interface import reader as base_reader
from paddlepalm.interface import task_paradigm as base_paradigm
import os
import json
from paddle import fluid
class TaskInstance(object):
def __init__(self, name, id, config={}, verbose=True):
self._name = name
self._config = config
self._verbose = verbose
self._save_infermodel_path = os.path.join(self._config['save_path'], 'infer_model')
self._save_ckpt_path = os.path.join(self._config['save_path'], 'ckpt')
# following flags can be fetch from instance config file
self._is_target = config.get('is_target', True)
self._is_first_target = config.get('is_first_target', False)
self._task_reuse_scope = config.get('task_reuse_scope', name)
self._feeded_var_names = None
self._target_vars = None
# training process management
self._mix_ratio = None
self._expected_train_steps = None
self._expected_train_epochs = None
self._steps_pur_epoch = None
self._cur_train_epoch = 0
self._cur_train_step = 0
self._train_finish = False
# dataset readers for the different running phases (train, eval, pred); the key is the phase, the value is a Reader instance
self._reader = {'train': None, 'eval': None, 'pred': None}
self._input_layer = None
self._inputname_to_varname = {}
self._task_layer = {'train': None, 'eval': None, 'pred': None}
self._pred_input_name_list = []
self._pred_input_varname_list = []
self._pred_fetch_name_list = []
self._pred_fetch_var_list = []
self._Reader = None
self._Paradigm = None
self._exe = fluid.Executor(fluid.CPUPlace())
self._save_protocol = {
'input_names': 'self._pred_input_name_list',
'input_varnames': 'self._pred_input_varname_list',
'fetch_list': 'self._pred_fetch_name_list'}
def build_task_layer(self, net_inputs, phase):
output_vars = self._task_layer[phase].build(net_inputs)
if phase == 'pred':
self._pred_fetch_name_list, self._pred_fetch_var_list = zip(*output_vars.items())
return output_vars
def postprocess(self, rt_outputs, phase):
return self._task_layer[phase].postprocess(rt_outputs)
def epoch_postprocess(self, epoch_inputs, phase):
return self._task_layer[phase].epoch_postprocess(epoch_inputs)
def save(self, suffix=''):
dirpath = self._save_infermodel_path + suffix
self._pred_input_varname_list = [str(i) for i in self._pred_input_varname_list]
fluid.io.save_inference_model(dirpath, self._pred_input_varname_list, self._pred_fetch_var_list, self._exe)
# fluid.io.save_inference_model(dirpath, self._pred_input_varname_list, self._pred_fetch_var_list, self._exe, params_filename='__params__')
print(self._name + ': inference model saved at ' + dirpath)
conf = {}
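# _save_protocol maps config keys to attribute *names* (as strings), so exec
# is used to read those attributes here and to write them back in load()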
for k, strv in self._save_protocol.items():
exec('v={}'.format(strv))
conf[k] = v
with open(os.path.join(dirpath, '__conf__'), 'w') as writer:
writer.write(json.dumps(conf, indent=1))
def load(self, infer_model_path=None):
if infer_model_path is None:
infer_model_path = self._save_infermodel_path
for k,v in json.load(open(os.path.join(infer_model_path, '__conf__'))).items():
strv = self._save_protocol[k]
exec('{}=v'.format(strv))
pred_prog, self._pred_input_varname_list, self._pred_fetch_var_list = \
fluid.io.load_inference_model(infer_model_path, self._exe)
# pred_prog, self._pred_input_varname_list, self._pred_fetch_var_list = \
# fluid.io.load_inference_model(infer_model_path, self._exe, params_filename='__params__')
print(self._name+': inference model loaded from ' + infer_model_path)
return pred_prog
@property
def name(self):
return self._name
@property
def Reader(self):
return self._Reader
@Reader.setter
def Reader(self, cls):
assert base_reader.__name__ == cls.__bases__[-1].__name__, \
"expect: {}, receive: {}.".format(base_reader.__name__, \
cls.__bases__[-1].__name__)
self._Reader = cls
@property
def Paradigm(self):
return self._Paradigm
@Paradigm.setter
def Paradigm(self, cls):
assert base_paradigm.__name__ == cls.__bases__[-1].__name__, \
"expect: {}, receive: {}.".format(base_paradigm.__name__, \
cls.__bases__[-1].__name__)
self._Paradigm = cls
@property
def config(self):
return self._config
@property
def reader(self):
return self._reader
@property
def pred_input(self):
return zip(*[self._pred_input_name_list, self._pred_input_varname_list])
@pred_input.setter
def pred_input(self, val):
assert isinstance(val, dict)
self._pred_input_name_list, self._pred_input_varname_list = \
zip(*[[k, v.name] for k,v in val.items()])
# print(self._pred_input_name_list)
@property
def pred_fetch_list(self):
return [self._pred_fetch_name_list, self._pred_fetch_var_list]
@property
def task_layer(self):
return self._task_layer
@property
def is_first_target(self):
return self._is_first_target
@is_first_target.setter
def is_first_target(self, value):
self._is_first_target = bool(value)
if self._is_first_target:
assert self._is_target, "ERROR: only target task could be set as main task."
if self._verbose and self._is_first_target:
print("{}: set as main task".format(self._name))
@property
def is_target(self):
if self._is_target is not None:
return self._is_target
else:
raise ValueError("{}: is_target is None".format(self._name))
@is_target.setter
def is_target(self, value):
self._is_target = bool(value)
if self._verbose:
if self._is_target:
print('{}: set as target task.'.format(self._name))
else:
print('{}: set as aux task.'.format(self._name))
@property
def mix_ratio(self):
if self._mix_ratio is not None:
return self._mix_ratio
else:
raise ValueError("{}: mix_ratio is None".format(self._name))
@mix_ratio.setter
def mix_ratio(self, value):
self._mix_ratio = float(value)
if self._verbose:
print('{}: mix_ratio is set to {}'.format(self._name, self._mix_ratio))
@property
def expected_train_steps(self):
return self._expected_train_steps
@expected_train_steps.setter
def expected_train_steps(self, value):
self._expected_train_steps = value
self._expected_train_epochs = value / float(self._steps_pur_epoch)
@property
def expected_train_epochs(self):
return self._expected_train_epochs
@property
def cur_train_epoch(self):
return self._cur_train_epoch
@cur_train_epoch.setter
def cur_train_epoch(self, value):
self._cur_train_epoch = value
@property
def cur_train_step(self):
return self._cur_train_step
@cur_train_step.setter
def cur_train_step(self, value):
self._cur_train_step = value
if self._cur_train_step > self._steps_pur_epoch:
self._cur_train_epoch += 1
self._cur_train_step = 1
if self._is_target and self._cur_train_step + self._cur_train_epoch * self._steps_pur_epoch >= self._expected_train_steps:
self._train_finish = True
print(self._name+': train finished!')
self.save()
# fluid.io.save_inference_model(self._save_infermodel_path, )
@property
def steps_pur_epoch(self):
return self._steps_pur_epoch
@steps_pur_epoch.setter
def steps_pur_epoch(self, value):
self._steps_pur_epoch = value
@property
def train_finish(self):
return self._train_finish
@property
def task_reuse_scope(self):
if self._task_reuse_scope is not None:
return self._task_reuse_scope
else:
raise ValueError("{}: task_reuse_scope is None".format(self._name))
@task_reuse_scope.setter
def task_reuse_scope(self, scope_name):
self._task_reuse_scope = str(scope_name)
if self._verbose:
print('{}: task_reuse_scope is set to {}'.format(self._name, self._task_reuse_scope))
def check_instances(insts):
"""to check ids, first_target"""
pass
def _check_ids():
pass
def _check_targets():
pass
def _check_reuse_scopes():
pass
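# A hedged usage sketch (not part of the original file; the config values and
# class names are illustrative assumptions):
#
#   inst = TaskInstance('mrqa', 0, config={'save_path': 'output/mrqa'})
#   inst.is_target = True
#   inst.mix_ratio = 1.0
#   inst.Reader = SomeReaderClass    # must derive from paddlepalm.interface.reader
#   inst.Paradigm = SomeParadigm     # must derive from task_paradigm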
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from paddlepalm.interface import task_paradigm
from paddle.fluid import layers
class TaskParadigm(task_paradigm):
'''
classification
'''
def __init__(self, config, phase):
self._is_training = phase == 'train'
self.sent_emb_size = config['hidden_size']
self.num_classes = config['n_classes']
@property
def inputs_attrs(self):
return {'backbone': {"sentence_emb": [[-1, self.sent_emb_size], 'float32']},
'reader': {"label_ids": [[-1, 1], 'int64']}}
@property
def outputs_attrs(self):
if self._is_training:
return {'loss': [[1], 'float32']}
else:
return {'logits': [[-1, self.num_classes], 'float32']}
def build(self, inputs):
sent_emb = inputs['backbone']['sentence_emb']
label_ids = inputs['reader']['label_ids']
logits = fluid.layers.fc(
input=sent_emb,
size=self.num_classes,
param_attr=fluid.ParamAttr(
name="cls_out_w",
initializer=fluid.initializer.TruncatedNormal(scale=0.1)),
bias_attr=fluid.ParamAttr(
name="cls_out_b", initializer=fluid.initializer.Constant(0.)))
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=label_ids)
loss = layers.mean(loss)
if self._is_training:
return {"loss": loss}
else:
return {"logits":logits}
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from paddlepalm.interface import task_paradigm
from paddle.fluid import layers
class TaskParadigm(task_paradigm):
'''
matching
'''
def __init__(self, config, phase, backbone_config=None):
self._is_training = phase == 'train'
self._hidden_size = backbone_config['hidden_size']
@property
def inputs_attrs(self):
if self._is_training:
reader = {"label_ids": [[-1, 1], 'int64']}
else:
reader = {}
bb = {"sentence_pair_embedding": [[-1, self._hidden_size], 'float32']}
return {'reader': reader, 'backbone': bb}
@property
def outputs_attrs(self):
if self._is_training:
return {"loss": [[1], 'float32']}
else:
return {"logits": [[-1, 1], 'float32']}
def build(self, inputs):
labels = inputs["reader"]["label_ids"]
cls_feats = inputs["backbone"]["sentence_pair_embedding"]
cls_feats = fluid.layers.dropout(
x=cls_feats,
dropout_prob=0.1,
dropout_implementation="upscale_in_train")
logits = fluid.layers.fc(
input=cls_feats,
size=2,
param_attr=fluid.ParamAttr(
name="cls_out_w",
initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
bias_attr=fluid.ParamAttr(
name="cls_out_b",
initializer=fluid.initializer.Constant(0.)))
ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=labels, return_softmax=True)
loss = fluid.layers.mean(x=ce_loss)
if self._is_training:
return {'loss': loss}
else:
return {'logits': logits}
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from paddlepalm.interface import task_paradigm
# pre_process_layer is used in build() below; import it from the backbone utils
from paddlepalm.backbone.utils.transformer import pre_process_layer
from paddle.fluid import layers
class TaskParadigm(task_paradigm):
'''
matching
'''
def __init__(self, config, phase, backbone_config=None):
self._is_training = phase == 'train'
self._hidden_size = backbone_config['hidden_size']
self._vocab_size = backbone_config['vocab_size']
self._hidden_act = backbone_config['hidden_act']
self._initializer_range = backbone_config['initializer_range']
@property
def inputs_attrs(self):
if self._is_training:
reader = {"label_ids": [[-1, 1], 'int64']}
else:
reader = {}
bb = {"encoder_outputs": [[-1, self._hidden_size], 'float32']}
return {'reader': reader, 'backbone': bb}
@property
def outputs_attrs(self):
if self._is_training:
return {"loss": [[1], 'float32']}
else:
return {"logits": [[-1, 1], 'float32']}
def build(self, inputs):
mask_label = inputs["reader"]["mask_label"]
mask_pos = inputs["reader"]["mask_pos"]
word_emb = inputs["backbone"]["word_embedding"]
enc_out = inputs["backbone"]["encoder_outputs"]
emb_size = word_emb.shape[-1]
_param_initializer = fluid.initializer.TruncatedNormal(
scale=self._initializer_range)
mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
reshaped_emb_out = fluid.layers.reshape(
x=enc_out, shape=[-1, emb_size])
# extract masked tokens' feature
mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
num_seqs = fluid.layers.fill_constant(shape=[1], value=512, dtype='int64')
# transform: fc
mask_trans_feat = fluid.layers.fc(
input=mask_feat,
size=emb_size,
act=self._hidden_act,
param_attr=fluid.ParamAttr(
name='mask_lm_trans_fc.w_0',
initializer=_param_initializer),
bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
# transform: layer norm
mask_trans_feat = pre_process_layer(
mask_trans_feat, 'n', name='mask_lm_trans')
mask_lm_out_bias_attr = fluid.ParamAttr(
name="mask_lm_out_fc.b_0",
initializer=fluid.initializer.Constant(value=0.0))
# print fluid.default_main_program().global_block()
# fc_out = fluid.layers.matmul(
# x=mask_trans_feat,
# y=fluid.default_main_program().global_block().var(
# _word_emb_name),
# transpose_y=True)
fc_out = fluid.layers.matmul(
x=mask_trans_feat,
y=word_emb,
transpose_y=True)
fc_out += fluid.layers.create_parameter(
shape=[self._vocab_size],
dtype='float32',
attr=mask_lm_out_bias_attr,
is_bias=True)
mask_lm_loss = fluid.layers.softmax_with_cross_entropy(
logits=fc_out, label=mask_label)
loss = fluid.layers.mean(mask_lm_loss)
if self._is_training:
return {'loss': loss}
else:
return None
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -20,7 +21,7 @@ from __future__ import print_function
import collections
import unicodedata
import six
import io
def convert_to_unicode(text):
"""Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
......@@ -68,15 +69,15 @@ def printable_text(text):
def load_vocab(vocab_file):
"""Loads a vocabulary file into a dictionary."""
vocab = collections.OrderedDict()
with io.open(vocab_file, encoding="utf8") as fin:
for num, line in enumerate(fin):
items = convert_to_unicode(line.strip()).split("\t")
if len(items) > 2:
break
token = items[0]
index = items[1] if len(items) == 2 else num
token = token.strip()
vocab[token] = int(index)
fin = open(vocab_file)
for num, line in enumerate(fin):
items = convert_to_unicode(line.strip()).split("\t")
if len(items) > 2:
break
token = items[0]
index = items[1] if len(items) == 2 else num
token = token.strip()
vocab[token] = int(index)
return vocab
......
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
MAXLEN = 70
def print_dict(dic, title=""):
if title:
title = ' ' + title + ' '
left_len = (MAXLEN - len(title)) // 2
title = '-' * left_len + title
right_len = MAXLEN - len(title)
title = title + '-' * right_len
else:
title = '-' * MAXLEN
print(title)
for name in dic:
print("{: <25}\t{}".format(str(name), str(dic[name])))
print("")
# print("-" * MAXLEN + '\n')
This diff is collapsed.
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
def is_whitespace(c):
if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
return True
return False
This directory stores task paradigms; users can implement the paradigm interfaces to build custom ones.
This diff is collapsed.
This diff is collapsed.
This directory stores pretrained models and their config files; users can download the built-in pretrained models by running `download_pretrain.sh`.
This directory stores the dataset loading and processing modules (readers); users can implement the relevant interfaces to build custom ones.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
#!/bin/bash
# for gpu memory optimization
export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1
export CUDA_VISIBLE_DEVICES=0
if [[ ! -d pretrain_model/bert ]]; then
bash download_pretrain.sh bert
fi
if [[ ! -d pretrain_model/ernie ]]; then
bash download_pretrain.sh ernie
fi
python -u mtl_run.py
export CUDA_VISIBLE_DEVICES=0
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_tensor_gb=0
python demo.py
#!/bin/sh
if [[ $# != 1 ]]; then
echo "usage: bash convert_params.sh <params_dir>"
exit 1
fi
echo "converting..."
cd $1
mkdir .palm.backup
for file in $(ls *)
do cp $file "backbone-"$file; mv $file .palm.backup
done
cd - >/dev/null
echo "done!"
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
train_file: "data/match4mrqa/train.txt"
reader: match4ernie
paradigm: match
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.