
Merge pull request #803 from webYFDT/ernie-kit-open-v1.0

Ernie kit open v1.0
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
English|[简体中文](./README.zh.md)
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_20210519_en.png)
**Reminder: This repo has been refactored; for paper reproduction or backward compatibility, please check out the [repro branch](https://github.com/PaddlePaddle/ERNIE/tree/repro)**
ERNIE 2.0 is a continual pre-training framework for language understanding in which pre-training tasks can be incrementally built and learned through multi-task learning.
ERNIE 2.0 provides a strong foundation for nearly every NLP task: text classification, ranking, NER, machine reading comprehension, text generation, and so on.
[\[more information\]](https://wenxin.baidu.com/)
# News
- Dec.03.2021:
    - [`ERNIE-M`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-m) models are **available** now!
- May.20.2021:
    - [`ERNIE-Doc`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-doc), [`ERNIE-Gram`](./ernie-gram/), and [`ERNIE-ViL`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil) models are **available** now!
    - `ERNIE-UNIMO` has been released [here](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-unimo).
- Dec.29.2020:
    - Pre-train and fine-tune ERNIE with [PaddlePaddle v2.0](https://github.com/PaddlePaddle/Paddle/tree/release/2.0-rc).
    - New AMP (automatic mixed precision) feature for every demo in this repo.
    - Introduced `gradient accumulation`: run `ERNIE-large` with only 8 GB of GPU memory.
- Sept.24.2020:
    - We have announced [`ERNIE-ViL`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil)!
        - **Knowledge-enhanced** joint representations for vision-language tasks.
        - Constructs three **Scene Graph Prediction** tasks using structured knowledge.
        - State-of-the-art performance on 5 downstream tasks; 1st place on the [VCR leaderboard](https://visualcommonsense.com/leaderboard/).
- May.20.2020:
    - Try ERNIE in `dygraph` mode, with:
        - Eager execution with `paddle.fluid.dygraph`.
        - Distributed training.
        - Easy deployment.
        - NLP tutorials on AIStudio.
        - Backward compatibility for old-style checkpoints.
    - [`ERNIE-GEN`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen) is **available** now!
        - The **state-of-the-art** pre-trained model for generation tasks, accepted by `IJCAI-2020`.
        - A novel **span-by-span generation pre-training task**.
        - An **infilling generation** mechanism and a **noise-aware generation** method.
        - Implemented with a carefully designed **Multi-Flow Attention** architecture.
        - You can `download` all models, including `base/large/large-430G`.
- Apr.30.2020: Released [ERNIESage](https://github.com/PaddlePaddle/PGL/tree/master/examples/erniesage), a novel graph neural network model using ERNIE as its aggregator, implemented with [PGL](https://github.com/PaddlePaddle/PGL).
- Mar.27.2020: [Champion on 5 SemEval2020 subtasks](https://www.jiqizhixin.com/articles/2020-03-27-8)
- Dec.26.2019: [1st place on the GLUE leaderboard](https://www.technologyreview.com/2019/12/26/131372/ai-baidu-ernie-google-bert-natural-language-glue/)
- Nov.6.2019: [Introducing ERNIE-tiny](https://www.jiqizhixin.com/articles/2019-11-06-9)
- Jul.7.2019: [Introducing ERNIE 2.0](https://www.jiqizhixin.com/articles/2019-07-31-10)
- Mar.16.2019: [Introducing ERNIE 1.0](https://www.jiqizhixin.com/articles/2019-03-16-3)
# Table of contents
* [Tutorials](#tutorials)
* [Setup](#setup)
* [Fine-tuning](#fine-tuning)
* [Pre-training with ERNIE 1.0](#pre-training-with-ernie-10)
* [Online inference](#online-inference)
* [Distillation](#distillation)
# Quick Tour
```python
import numpy as np
import paddle as P
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModel
model = ErnieModel.from_pretrained('ernie-1.0') # Try to get pretrained model from server, make sure you have network connection
model.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
ids, _ = tokenizer.encode('hello world')
ids = P.to_tensor(np.expand_dims(ids, 0)) # insert extra `batch` dimension
pooled, encoded = model(ids) # eager execution
print(pooled.numpy()) # convert results to numpy
```
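As a quick sanity check on the two outputs (a hedged note: the shapes below assume the `ernie-1.0` base model, whose hidden size is 768; `pooled` is the sentence-level vector and `encoded` holds the token-level vectors):

```python
print(pooled.shape)   # [1, 768]: sentence-level representation
print(encoded.shape)  # [1, seq_len, 768]: one vector per input token
```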
# Tutorials
Don't have a GPU? Try ERNIE on [AIStudio](https://aistudio.baidu.com/aistudio/index)!
(please choose the latest version of each tutorial and apply for a GPU environment)
1. [ERNIE for beginners](https://aistudio.baidu.com/studio/edu/group/quick/join/314947)
1. [Sentiment analysis](https://aistudio.baidu.com/aistudio/projectdetail/427482)
2. [Cloze test](https://aistudio.baidu.com/aistudio/projectdetail/433491)
3. [Knowledge distillation](https://aistudio.baidu.com/aistudio/projectdetail/439460)
4. [Ask ERNIE](https://aistudio.baidu.com/aistudio/projectdetail/456443)
5. [Loading old-style checkpoints](https://aistudio.baidu.com/aistudio/projectdetail/493415)
# Setup
##### 1. install PaddlePaddle
This repo requires PaddlePaddle 1.7.0+; see [here](https://www.paddlepaddle.org.cn/install/quick) for installation instructions.
##### 2. install ernie
```script
pip install paddle-ernie
```
or
```shell
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
cd ERNIE
pip install -r requirements.txt
pip install -e .
```
##### 3. download pretrained models (optional)
| Model | Description |abbreviation|
| :------------------------------------------------- | :----------------------------------------------------------- |:-----------|
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | L12H768A12 |ernie-1.0|
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | L3H1024A16 |ernie-tiny|
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | L12H768A12 |ernie-2.0-en|
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | L24H1024A16 |ernie-2.0-large-en|
| [ERNIE Gen base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | L12H768A12 |ernie-gen-base-en|
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| L24H1024A16 | ernie-gen-large-en |
| [ERNIE Gen Large 430G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-430g-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 + 430G pretrain corpus | ernie-gen-large-430g-en |
| [ERNIE Doc Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie-doc-base-zh.tar.gz)| L12H768A12 | ernie-doc-base-zh |
| [ERNIE Doc Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-doc-base-en.tar.gz)| L12H768A12 | ernie-doc-base-en |
| [ERNIE Doc Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-doc-large-en.tar.gz)| L24H1024A16 | ernie-doc-large-en |
| [ERNIE Gram Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie-gram-zh.1.tar.gz) | L12H768A12 | ernie-gram-zh |
| [ERNIE Gram Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gram-en.1.tar.gz) | L12H768A12 | ernie-gram-en |
##### 4. download datasets
**English Datasets**
Download the [GLUE datasets](https://gluebenchmark.com/tasks) by running [this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
The `--data_dir` option in the following sections assumes a directory tree like this:
```shell
data/xnli
├── dev
│   └── 1
├── test
│   └── 1
└── train
└── 1
```
See the [demo](https://ernie-github.cdn.bcebos.com/data-mnli-m.tar.gz) data for the MNLI task.
**Chinese Datasets**
| Datasets|Description|
|:--------|:----------|
| [XNLI](https://ernie-github.cdn.bcebos.com/data-xnli.tar.gz) |XNLI is a natural language inference dataset in 15 languages. It was jointly built by Facebook and New York University. We use Chinese data of XNLI to evaluate language understanding ability of our model. [url](https://github.com/facebookresearch/XNLI)|
| [ChnSentiCorp](https://ernie-github.cdn.bcebos.com/data-chnsenticorp.tar.gz) |ChnSentiCorp is a sentiment analysis dataset consisting of reviews on online shopping of hotels, notebooks and books.|
| [MSRA-NER](https://ernie-github.cdn.bcebos.com/data-msra_ner.tar.gz) |MSRA-NER (SIGHAN2006) dataset is released by MSRA for recognizing the names of people, locations and organizations in text.|
| [NLPCC2016-DBQA](https://ernie-github.cdn.bcebos.com/data-dbqa.tar.gz) |NLPCC2016-DBQA is a sub-task of the NLPCC-ICCPOL 2016 Shared Task, hosted by NLPCC (Natural Language Processing and Chinese Computing). The task is to select documents from a candidate set that answer the given questions. [url](http://tcci.ccf.org.cn/conference/2016/dldoc/evagline2.pdf)|
|[CMRC2018](https://ernie-github.cdn.bcebos.com/data-cmrc2018.tar.gz)|CMRC2018 is an evaluation of Chinese extractive reading comprehension hosted by the Chinese Information Processing Society of China (CIPS-CL). [url](https://github.com/ymcui/cmrc2018)|
# Fine-tuning
- Try eager execution with the `dygraph` model:
```script
python3 ./demo/finetune_classifier.py \
--from_pretrained ernie-1.0 \
--data_dir ./data/xnli
```
- Specify `--use_amp` to activate AMP training.
- `--bsz` denotes the global batch size for one optimization step; `--micro_bsz` denotes the maximum batch size for each GPU device.
If `--micro_bsz < --bsz`, gradient accumulation is activated automatically; e.g. on a single GPU, `--bsz 64 --micro_bsz 16` accumulates gradients over 4 micro-batches per optimization step.
- Distributed fine-tuning
`paddle.distributed.launch` is a process manager; we use it to launch a Python process on every available GPU device.
In distributed training, `max_steps` is used as the stopping criterion rather than `epoch`, to prevent deadlock.
You can calculate `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`; a rough worked example follows.
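Assuming 3 epochs over the roughly 393k-example MNLI training set with a total batch size of 256 (illustrative numbers only):

```python
EPOCH, NUM_TRAIN_EXAMPLES, TOTAL_BATCH = 3, 393_000, 256
max_steps = EPOCH * NUM_TRAIN_EXAMPLES // TOTAL_BATCH  # = 4605
```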
Also note that we shard the training data by device ID to prevent overfitting.
Demo:
(make sure you have more than 2 GPUs;
online model download does not work under `paddle.distributed.launch`,
so run single-card fine-tuning first to fetch the pretrained model, or download and extract one manually from [here](#section-pretrained-models)):
```script
python3 -m paddle.distributed.launch \
./demo/finetune_classifier_distributed.py \
--data_dir data/mnli \
--max_steps 10000 \
--from_pretrained ernie-2.0-en
```
Many other demo scripts:
1. [Sentiment Analysis](./demo/finetune_sentiment_analysis.py)
1. [Semantic Similarity](./demo/finetune_classifier.py)
1. [Named Entity Recognition (NER)](./demo/finetune_ner.py)
1. [Machine Reading Comprehension](./demo/finetune_mrc.py)
1. [Text generation](./demo/seq2seq/README.md)
1. [Text classification with `paddle.static` API](./demo/finetune_classifier_static.py)
**Recommended hyperparameters:**
|tasks|batch size|learning rate|
|--|--|--|
| CoLA | 32 / 64 (base) | 3e-5 |
| SST-2 | 64 / 256 (base) | 2e-5 |
| STS-B | 128 | 5e-5 |
| QQP | 256 | 3e-5(base)/5e-5(large) |
| MNLI | 256 / 512 (base)| 3e-5 |
| QNLI | 256 | 2e-5 |
| RTE | 16 / 4 (base) | 2e-5(base)/3e-5(large) |
| MRPC | 16 / 32 (base) | 3e-5 |
| WNLI | 8 | 2e-5 |
| XNLI | 512 | 1e-4(base)/4e-5(large) |
| CMRC2018 | 64 | 3e-5 |
| DRCD | 64 | 5e-5(base)/3e-5(large) |
| MSRA-NER(SIGHAN2006) | 16 | 5e-5(base)/1e-5(large) |
| ChnSentiCorp | 24 | 5e-5(base)/1e-5(large) |
| LCQMC | 32 | 2e-5(base)/5e-6(large) |
| NLPCC2016-DBQA| 64 | 2e-5(base)/1e-5(large) |
| VCR | 64 | 2e-5(base)/2e-5(large) |
# Pre-training with ERNIE 1.0
See [here](./demo/pretrain/README.md).
# Online inference
If `--inference_model_dir` is passed to `finetune_classifier_dygraph.py`,
a deployable model will be generated at the end of fine-tuning, ready to serve.
For details about online inference, see the [C++ inference API](./inference/README.md),
or start a multi-GPU inference server with a few lines of code:
```shell
python -m propeller.tools.start_server -m /path/to/saved/inference_model -p 8881
```
and call the server just like calling local function (python3 only):
```python
import numpy as np
from propeller.service.client import InferenceClient
from ernie.tokenizing_ernie import ErnieTokenizer
client = InferenceClient('tcp://localhost:8881')
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
ids, sids = tokenizer.encode('hello world')
ids = np.expand_dims(ids, 0)
sids = np.expand_dims(sids, 0)
result = client(ids, sids)
```
A pre-made `inference model` for ernie-1.0 can be downloaded [here](https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).
It can be used for feature-based fine-tuning or feature extraction.
# Distillation
Knowledge distillation is a good way to compress and accelerate ERNIE.
For details about distillation, see [here](./demo/distill/README.md)
# Citation
### ERNIE 1.0
```
@article{sun2019ernie,
title={Ernie: Enhanced representation through knowledge integration},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Chen, Xuyi and Zhang, Han and Tian, Xin and Zhu, Danxiang and Tian, Hao and Wu, Hua},
journal={arXiv preprint arXiv:1904.09223},
year={2019}
}
```
### ERNIE 2.0
```
@article{sun2019ernie20,
title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:1907.12412},
year={2019}
}
```
### ERNIE-GEN
```
@article{xiao2020ernie-gen,
title={ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation},
author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2001.11314},
year={2020}
}
```
### ERNIE-ViL
```
@article{yu2020ernie,
title={ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph},
author={Yu, Fei and Tang, Jiji and Yin, Weichong and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2006.16934},
year={2020}
}
```
### ERNIE-Gram
```
@article{xiao2020ernie,
title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2010.12148},
year={2020}
}
```
### ERNIE-Doc
```
@article{ding2020ernie,
title={ERNIE-DOC: The Retrospective Long-Document Modeling Transformer},
author={Ding, Siyu and Shang, Junyuan and Wang, Shuohuan and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15688},
year={2020}
}
```
### ERNIE-UNIMO
```
@article{li2020unimo,
title={UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning},
author={Li, Wei and Gao, Can and Niu, Guocheng and Xiao, Xinyan and Liu, Hao and Liu, Jiachen and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15409},
year={2020}
}
```
### ERNIE-M
```
@article{ouyang2020ernie,
title={Ernie-m: Enhanced multilingual representation by aligning cross-lingual semantics with monolingual corpora},
author={Ouyang, Xuan and Wang, Shuohuan and Pang, Chao and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15674},
year={2020}
}
```
For full reproduction of paper results, please check out the `repro` branch of this repo.
### Communication
- [ERNIE homepage](https://wenxin.baidu.com/)
- [Github Issues](https://github.com/PaddlePaddle/ERNIE/issues): bug reports, feature requests, install issues, usage issues, etc.
- QQ discussion group: 760439550 (ERNIE discussion group).
- QQ discussion group: 958422639 (ERNIE discussion group-v2).
- [Forums](http://ai.baidu.com/forum/topic/list/168?pageNo=1): discuss implementations, research, etc.
README.zh.md
# ![ERNIE_milestone_20210519_zh](./ERNIE_milestone_20210519_zh.png)
ERNIE is Baidu's knowledge-enhanced continual-learning framework for semantic understanding. It combines large-scale pre-training data with rich multi-source knowledge and, through continual learning, keeps absorbing lexical, structural, and semantic knowledge from massive text corpora, so the model keeps improving. ERNIE has achieved SOTA results on more than 40 typical NLP tasks and won over ten first places in authoritative international benchmarks such as GLUE, VCR, XTREME, and SemEval. In 2020, ERNIE received the Outstanding Science and Technology Achievement Award of the Chinese Association for Artificial Intelligence and the SAIL Award, the top honor of the World Artificial Intelligence Conference; the technology was covered on the official website of MIT Technology Review, and the related innovations have been published at top academic venues including AAAI, ACL, NAACL, and IJCAI. ERNIE is widely deployed in industry, e.g. search engines, news recommendation, advertising systems, voice interaction, and intelligent customer service.
Reminder: the legacy ERNIE code has moved to the repro branch. We recommend developing with the newly upgraded ERNIE kit, which unifies dynamic and static graphs. You are also welcome to try [EasyDL](https://ai.baidu.com/easydl/pro) and [BML](https://ai.baidu.com/bml/app/overview) for richer features.
[Learn more](https://wenxin.baidu.com/)
# Open-Source Roadmap
- 2022.5.20:
    - Newly open-sourced ERNIE 3.0 series of pre-trained models:
        - ERNIE 3.0 Base, a 110M-parameter general-purpose model
        - ERNIE 3.0 XBase, a 250M-parameter heavyweight general-purpose model
        - ERNIE 3.0 Medium, a 24M-parameter lightweight general-purpose model
    - New speech-semantics model ERNIE-SAT (link to be added)
    - New ERNIE-Gen (Chinese) pre-trained model, supporting the mainstream generation tasks: summarization, question generation, dialogue, and question answering
    - The Wenxin ERNIE development kit with unified dynamic/static graphs: built on PaddlePaddle's dynamic-graph capability, it supports dynamic-graph training of ERNIE models; change a single configuration parameter before training starts to switch between dynamic and static training.
        - Wraps the NLP development workflow, from text preprocessing and pre-trained models to network construction, model evaluation, and deployment, behind a uniform interface.
        - Supports common NLP tasks: text classification, text matching, sequence labeling, information extraction, text generation, data distillation, etc.
        - Provides data preprocessing tools for data cleaning, data augmentation, tokenization, format conversion, case conversion, and more.
- 2021.12.3:
    - The multilingual pre-trained model `ERNIE-M` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-m)
- 2021.5.20:
    - Four newly open-sourced ERNIE pre-trained models:
        - The multi-granularity linguistic knowledge model `ERNIE-Gram` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/blob/develop/ernie-gram)
        - The long-text bidirectional modeling model `ERNIE-Doc` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-doc)
        - The scene-graph-enhanced cross-modal model `ERNIE-ViL` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil)
        - The unified language-vision model `ERNIE-UNIMO` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-unimo)
- 2020.9.24:
    - `ERNIE-ViL` released! ([link](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil))
        - A knowledge-enhanced pre-training framework for vision-language, the first to introduce structured knowledge into vision-language pre-training.
        - Uses scene-graph knowledge to construct object, attribute, and relation prediction tasks that capture fine-grained cross-modal semantic alignment.
        - Best results on five vision-language downstream tasks; 1st place on the [VCR leaderboard](https://visualcommonsense.com/).
- 2020.5.20:
    - `ERNIE-GEN` officially open-sourced! ([link](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen))
        - The strongest pre-trained model for text generation, with the work accepted by `IJCAI-2020`.
        - Extends ERNIE pre-training to text generation for the first time, with best results on several typical tasks.
        - All models reported in the paper are available for download (including [base/large/large-430G](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen/README.zh.md#预训练模型)).
        - First to add a span-by-span generation task during pre-training, so the model generates a semantically complete span at each step.
        - Proposes an infilling generation mechanism and a noise-aware mechanism to mitigate exposure bias.
        - Implemented with a carefully designed Multi-Flow Attention architecture.
- 2020.4.30: Released [ERNIESage](https://github.com/PaddlePaddle/PGL/tree/master/examples/erniesage), a novel graph neural network model using ERNIE as its aggregator, implemented with [PGL](https://github.com/PaddlePaddle/PGL).
- 2020.3.27: [Champion on 5 SemEval2020 subtasks](https://www.jiqizhixin.com/articles/2020-03-27-8)
- 2019.12.26: [1st place on the GLUE leaderboard](https://www.technologyreview.com/2019/12/26/131372/ai-baidu-ernie-google-bert-natural-language-glue/)
- 2019.11.6: Released [ERNIE Tiny](https://www.jiqizhixin.com/articles/2019-11-06-9)
- 2019.7.7: Released [ERNIE 2.0](https://www.jiqizhixin.com/articles/2019-07-31-10)
- 2019.3.16: Released [ERNIE 1.0](https://www.jiqizhixin.com/articles/2019-03-16-3)
# Installation
1. Install the environment dependencies: [environment setup](./readme_env.md)
2. Install the ERNIE kit
```plain
git clone https://github.com/PaddlePaddle/ERNIE.git
```
# Quick start: training with Wenxin ERNIE large models
- Using ERNIE 2.0 as the pre-trained model, the preparation involves:
    - downloading the model
    - preparing the data
    - configuring the training JSON file
    - launching training
    - configuring the prediction JSON file
    - launching prediction
- We walk through a text classification task as a quick introduction to the ERNIE large models.
## Download the model
- We use the ERNIE 2.0 pre-trained model for the text classification task.
- Download and set up the ERNIE 2.0 pre-trained model:
```plain
# download the ernie_2.0 model
# enter the models_hub directory
cd ./wenxin_appzoo/models_hub
# run the download script
sh download_ernie_2.0_base_ch.sh
```
## Prepare the data
- The data directory of every Wenxin task ships with sample data that works out of the box, so you can quickly get familiar with the toolkit.
- Data for the text classification task:
```shell
# enter the text classification task folder
cd ./wenxin_appzoo/tasks/text_classification/
# list the bundled datasets
ls ./data
```
- Note: the sample data only demonstrates the format; replace it with real data when actually training a model.
## Configure the training JSON file
- The preset JSON files live under the ./examples/ directory. The configuration for training with the ERNIE 2.0 pre-trained model is ./examples/cls_ernie_fc_ch.json, which wires up the data, the model, and the training procedure (an illustrative fragment follows below).
```shell
# inspect the config for training a text classifier with ERNIE 2.0
cat ./examples/cls_ernie_fc_ch.json
```
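For orientation only, the reader section of this config follows the same shape as the prediction config shown later in this README; the data path below is a placeholder, and the real file contains further sections (model, trainer, and so on):

```json
{
  "dataset_reader": {"train_reader": {"config": {"data_path": "./data/train_data"}}}
}
```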
## Launch training
- With the dataset in place and cls_ernie_fc_ch.json configured, you can run the training command.
- The single-GPU command is `python run_trainer.py`; as shown below, it trains the ERNIE-based Chinese text classification model locally on the training set.
```shell
# ERNIE Chinese text classification model
# trains a preset network driven by the JSON config ./examples/cls_ernie_fc_ch.json
python run_trainer.py --param_path ./examples/cls_ernie_fc_ch.json
```
- The multi-GPU command is:
```plain
fleetrun --gpus=x,y run_trainer.py --param_path ./examples/cls_ernie_fc_ch.json
```
- Training logs are saved to **./log/test.log**.
- Model files produced during and after training are saved under **./output/** by default: **save_inference_model/** holds the model files used for prediction, and **save_checkpoint/** holds the model files used for warm starts.
## Configure the prediction JSON file
- The preset JSON files live under the ./examples/ directory; the configuration for running prediction with a model trained from ERNIE 2.0 is ./examples/cls_ernie_fc_ch_infer.json.
- Mainly adjust the input path of the prediction model, the input path of the file to predict, and the output path for the prediction results in ./examples/cls_ernie_fc_ch_infer.json, as follows:
```json
{
"dataset_reader":{"train_reader":{"config":{"data_path":"./data/predict_data"}}},
"inference":{"inference_model_path":"./output/cls_ernie_fc_ch/save_inference_model/inference_step_251",
"output_path": "./output/predict_result.txt"}
}
```
## Launch prediction
- Run run_infer.py with the corresponding configuration file:
```plain
python run_infer.py --param_path ./examples/cls_ernie_fc_ch_infer.json
```
- The prediction output is saved to ./output/predict_result.txt.
# Pre-trained models
- For an introduction to the pre-trained models, see the [model overview](readme_model.md).
- To download a pre-trained model, enter the ./wenxin_appzoo/models_hub directory, e.g.:
```plain
# enter the pre-trained model download directory
cd ./wenxin_appzoo/models_hub
# download the ERNIE 3.0 base model
sh download_ernie3.0_base_ch.sh
```
- For more open-source models, see [Research](./Research/README.md).
# Model evaluation
[Model evaluation](readme_score.md)
# Datasets
[CLUE datasets](https://www.cluebenchmarks.com/)
[DuIE 2.0 dataset](https://www.luge.ai/#/luge/dataDetail?id=5)
[MSRA_NER dataset](https://ernie-github.cdn.bcebos.com/data-msra_ner.tar.gz)
# 应用场景
文本分类([文本分类](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_classification/README.md)
文本匹配([文本匹配](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_matching/README.md)
系列标注([序列标注](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/sequence_labeling/README.md)
信息抽取([信息抽取](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/information_extraction_many_to_many/README.md)
文本生成([文本生成](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_generation/README.md)
数据蒸馏([数据蒸馏](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/data_distillation/README.md)
工具使用([工具使用](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/tools/README.md)
# Citation
### ERNIE 1.0
```
@article{sun2019ernie,
title={Ernie: Enhanced representation through knowledge integration},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Chen, Xuyi and Zhang, Han and Tian, Xin and Zhu, Danxiang and Tian, Hao and Wu, Hua},
journal={arXiv preprint arXiv:1904.09223},
year={2019}
}
```
### ERNIE 2.0
```
@inproceedings{sun2020ernie,
title={Ernie 2.0: A continual pre-training framework for language understanding},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={05},
pages={8968--8975},
year={2020}
}
```
### ERNIE-GEN
```
@article{xiao2020ernie,
title={Ernie-gen: An enhanced multi-flow pre-training and fine-tuning framework for natural language generation},
author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2001.11314},
year={2020}
}
```
### ERNIE-ViL
```
@article{yu2020ernie,
title={Ernie-vil: Knowledge enhanced vision-language representations through scene graph},
author={Yu, Fei and Tang, Jiji and Yin, Weichong and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2006.16934},
year={2020}
}
```
### ERNIE-Gram
```
@article{xiao2020ernie,
title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2010.12148},
year={2020}
}
```
### ERNIE-Doc
```
@article{ding2020ernie,
title={ERNIE-Doc: A retrospective long-document modeling transformer},
author={Ding, Siyu and Shang, Junyuan and Wang, Shuohuan and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15688},
year={2020}
}
```
### ERNIE-UNIMO
```
@article{li2020unimo,
title={Unimo: Towards unified-modal understanding and generation via cross-modal contrastive learning},
author={Li, Wei and Gao, Can and Niu, Guocheng and Xiao, Xinyan and Liu, Hao and Liu, Jiachen and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15409},
year={2020}
}
```
### ERNIE-M
```
@article{ouyang2020ernie,
title={Ernie-m: Enhanced multilingual representation by aligning cross-lingual semantics with monolingual corpora},
author={Ouyang, Xuan and Wang, Shuohuan and Pang, Chao and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15674},
year={2020}
}
```
[English](./README.en.md)|简体中文
![./.metas/ERNIE_milestone.png](./.metas/ERNIE_milestone_20210519_zh.png)
ERNIE is Baidu's knowledge-enhanced continual-learning framework for semantic understanding. It combines large-scale pre-training data with rich multi-source knowledge and, through continual learning, keeps absorbing lexical, structural, and semantic knowledge from massive text corpora, so the model keeps improving. ERNIE has achieved SOTA results on more than 40 typical NLP tasks and won over ten first places in authoritative international benchmarks such as GLUE, VCR, XTREME, and SemEval. In 2020, ERNIE received the Outstanding Science and Technology Achievement Award of the Chinese Association for Artificial Intelligence and the SAIL Award, the top honor of the World Artificial Intelligence Conference; the technology was covered on the official website of MIT Technology Review, and the related innovations have been published at top academic venues including AAAI, ACL, NAACL, and IJCAI. ERNIE is widely deployed in industry, e.g. search engines, news recommendation, advertising systems, voice interaction, and intelligent customer service.
**Reminder: the legacy ERNIE code has moved to the [repro branch](https://github.com/PaddlePaddle/ERNIE/tree/repro). We recommend developing with the newly upgraded ERNIE kit, which unifies dynamic and static graphs. You are also welcome to try [EasyDL](https://ai.baidu.com/easydl/pro) for richer features (e.g. ERNIE 2.0, ERNIE 2.1, and domain-specific ERNIE models).**
[Learn more](https://wenxin.baidu.com/)
# News
- 2021.12.3:
    - The multilingual pre-trained model `ERNIE-M` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-m)
- 2021.5.20:
    - Four newly open-sourced ERNIE pre-trained models:
        - The multi-granularity linguistic knowledge model `ERNIE-Gram` is [open-sourced](./ernie-gram/)
        - The long-text bidirectional modeling model `ERNIE-Doc` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-doc)
        - The scene-graph-enhanced cross-modal model `ERNIE-ViL` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil)
        - The unified language-vision model `ERNIE-UNIMO` is [open-sourced](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-unimo)
- 2020.12.29:
    - The `ERNIE` open-source toolkit is fully upgraded to [PaddlePaddle v2.0](https://github.com/PaddlePaddle/Paddle/tree/release/2.0-rc)
    - All demo tutorials now use AMP (mixed-precision training), with an average speed-up of 2.3x.
    - `Gradient accumulation` introduced: `ERNIE-large` runs with as little as 8 GB of GPU memory.
- 2020.9.24:
    - `ERNIE-ViL` released! ([link](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-vil))
        - A knowledge-enhanced pre-training framework for vision-language, the first to introduce structured knowledge into vision-language pre-training.
        - Uses scene-graph knowledge to construct object, attribute, and relation prediction tasks that capture fine-grained cross-modal semantic alignment.
        - Best results on five vision-language downstream tasks; 1st place on the [VCR leaderboard](https://visualcommonsense.com/).
- 2020.5.20:
    - Try ERNIE implemented with the `dynamic graph`:
        - Eager execution: what you see is what you get.
        - Large-scale distributed training.
        - Easy deployment.
        - Learn NLP quickly through the AIStudio tutorials.
        - Backward compatible with old-style checkpoints.
    - `ERNIE-GEN` officially open-sourced! ([link](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen))
        - The strongest pre-trained model for text generation, with the work accepted by `IJCAI-2020`.
        - Extends ERNIE pre-training to text generation for the first time, with best results on several typical tasks.
        - All models reported in the paper are available for download (including [`base/large/large-430G`](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen/README.zh.md#预训练模型)).
        - First to add a span-by-span generation task during pre-training, so the model generates a semantically complete span at each step.
        - Proposes an infilling generation mechanism and a noise-aware mechanism to mitigate exposure bias.
        - Implemented with a carefully designed Multi-Flow Attention architecture.
- 2020.4.30: Released [ERNIESage](https://github.com/PaddlePaddle/PGL/tree/master/examples/erniesage), a novel graph neural network model using ERNIE as its aggregator, implemented with [PGL](https://github.com/PaddlePaddle/PGL).
- 2020.3.27: [Champion on 5 SemEval2020 subtasks](https://www.jiqizhixin.com/articles/2020-03-27-8)
- 2019.12.26: [1st place on the GLUE leaderboard](https://www.technologyreview.com/2019/12/26/131372/ai-baidu-ernie-google-bert-natural-language-glue/)
- 2019.11.6: Released [ERNIE Tiny](https://www.jiqizhixin.com/articles/2019-11-06-9)
- 2019.7.7: Released [ERNIE 2.0](https://www.jiqizhixin.com/articles/2019-07-31-10)
- 2019.3.16: Released [ERNIE 1.0](https://www.jiqizhixin.com/articles/2019-03-16-3)
# Table of contents
* [Tutorials](#tutorials)
* [Setup](#setup)
* [Supported NLP tasks](#supported-nlp-tasks)
* [Pre-training (ERNIE 1.0)](#pre-training-ernie-10)
* [Online inference](#online-inference)
* [Distillation](#distillation)
# Quick Tour
```python
import numpy as np
import paddle as P
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModel
model = ErnieModel.from_pretrained('ernie-1.0') # Try to get pretrained model from server, make sure you have network connection
model.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
ids, _ = tokenizer.encode('hello world')
ids = P.to_tensor(np.expand_dims(ids, 0)) # insert extra `batch` dimension
pooled, encoded = model(ids) # eager execution
print(pooled.numpy()) # convert results to numpy
```
# Tutorials
Don't have a GPU at hand? Try ERNIE on [AIStudio](https://aistudio.baidu.com/aistudio/index)!
(please choose the latest version of each tutorial and apply for a GPU environment)
1. [ERNIE for beginners](https://aistudio.baidu.com/studio/edu/group/quick/join/314947)
1. [Sentiment analysis](https://aistudio.baidu.com/aistudio/projectdetail/427482)
2. [Cloze test](https://aistudio.baidu.com/aistudio/projectdetail/433491)
3. [Knowledge distillation](https://aistudio.baidu.com/aistudio/projectdetail/439460)
4. [Ask ERNIE](https://aistudio.baidu.com/aistudio/projectdetail/456443)
5. [Loading old-style checkpoints](https://aistudio.baidu.com/aistudio/projectdetail/493415)
6. [Writing poetry with ERNIE](https://aistudio.baidu.com/aistudio/projectdetail/502844)
# Setup
##### 1. Install PaddlePaddle
This project requires PaddlePaddle 1.7.0+; see [here](https://www.paddlepaddle.org.cn/install/quick) for installation instructions.
##### 2. Install the ERNIE kit
```script
pip install paddle-ernie
```
or
```shell
git clone https://github.com/PaddlePaddle/ERNIE.git --depth 1
cd ERNIE
pip install -r requirements.txt
pip install -e .
```
`propeller` is a high-level framework for model training that bundles common NLP pre- and post-processing steps. You can import `propeller` by adding the root of this repo to `PYTHONPATH`:
```shell
export PYTHONPATH=$PWD:$PYTHONPATH
```
##### 3. Download pre-trained models (optional)<a name="section-pretrained-models"></a>
| Model | Details |Download tag|
| :------------------------------------------------- |:------------------------------------------------------------------------- |:-------|
| [ERNIE 1.0 Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie1.0.1.tar.gz) | Layer:12, Hidden:768, Heads:12 |ernie-1.0|
| [ERNIE Tiny](https://ernie-github.cdn.bcebos.com/model-ernie_tiny.1.tar.gz) | Layer:3, Hidden:1024, Heads:16 |ernie-tiny|
| [ERNIE 2.0 Base for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 |ernie-2.0-en|
| [ERNIE 2.0 Large for English](https://ernie-github.cdn.bcebos.com/model-ernie2.0-large-en.1.tar.gz) | Layer:24, Hidden:1024, Heads:16 |ernie-2.0-large-en|
| [ERNIE Gen Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-base-en.1.tar.gz) | Layer:12, Hidden:768, Heads:12 |ernie-gen-base-en|
| [ERNIE Gen Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 |ernie-gen-large-en|
| [ERNIE Gen Large 430G for English](https://ernie-github.cdn.bcebos.com/model-ernie-gen-large-430g-en.1.tar.gz)| Layer:24, Hidden:1024, Heads:16 + an extra 430 GB pre-training corpus | ernie-gen-large-430g-en |
| [ERNIE Doc Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie-doc-base-zh.tar.gz)| Layer:12, Hidden:768, Heads:12 |ernie-doc-base-zh|
| [ERNIE Doc Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-doc-base-en.tar.gz)| Layer:12, Hidden:768, Heads:12 |ernie-doc-base-en|
| [ERNIE Doc Large for English](https://ernie-github.cdn.bcebos.com/model-ernie-doc-large-en.tar.gz)| Layer:24, Hidden:1024, Heads:16 |ernie-doc-large-en|
| [ERNIE Gram Base for Chinese](https://ernie-github.cdn.bcebos.com/model-ernie-gram-zh.1.tar.gz)| Layer:12, Hidden:768, Heads:12 |ernie-gram-zh|
| [ERNIE Gram Base for English](https://ernie-github.cdn.bcebos.com/model-ernie-gram-en.1.tar.gz)| Layer:12, Hidden:768, Heads:12 |ernie-gram-en|
##### 4. Download datasets
**English datasets**
Run [this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e) to download the [GLUE datasets](https://gluebenchmark.com/tasks).
Please organize the data directory as follows so the later demo tutorials can find it (pass the path to the training scripts via `--data_dir`):
```shell
data/xnli
├── dev
│   └── 1
├── test
│   └── 1
└── train
└── 1
```
See the [demo](https://ernie-github.cdn.bcebos.com/data-mnli-m.tar.gz) data (test and training sets for the MNLI task).
**Chinese datasets**
| Dataset|Description|
|:--------|:----------|
| [XNLI](https://ernie-github.cdn.bcebos.com/data-xnli.tar.gz) |XNLI is a natural language inference dataset in 15 languages, jointly built by researchers from Facebook and New York University. We use its Chinese data to evaluate the language understanding ability of our models. [link](https://github.com/facebookresearch/XNLI)|
| [ChnSentiCorp](https://ernie-github.cdn.bcebos.com/data-chnsenticorp.tar.gz) |ChnSentiCorp is a Chinese sentiment analysis dataset consisting of online shopping reviews of hotels, notebooks, and books.|
| [MSRA-NER](https://ernie-github.cdn.bcebos.com/data-msra_ner.tar.gz) |MSRA-NER (SIGHAN2006) is released by Microsoft Research Asia for recognizing entities with specific meanings in text, including the names of people, locations, and organizations.|
| [NLPCC2016-DBQA](https://ernie-github.cdn.bcebos.com/data-dbqa.tar.gz) |NLPCC2016-DBQA is an evaluation task held in 2016 by NLPCC, the international conference on Natural Language Processing and Chinese Computing; the goal is to select documents from the candidates that answer the questions. [link](http://tcci.ccf.org.cn/conference/2016/dldoc/evagline2.pdf)|
|[CMRC2018](https://ernie-github.cdn.bcebos.com/data-cmrc2018.tar.gz)|CMRC2018 is an extractive reading comprehension evaluation held by the Chinese Information Processing Society of China. [link](https://github.com/ymcui/cmrc2018)
# Supported NLP tasks
- Fine-tune with the `dynamic graph` model:
```script
python3 ./demo/finetune_classifier.py \
--from_pretrained ernie-1.0 \
--data_dir ./data/xnli
```
- Add `--use_amp` to enable AMP (please enable AMP only on devices with `TensorCore` support).
- `--bsz` sets the global batch size (the number of samples the model sees in one optimization step); `--micro_bsz` sets the number of samples fed to each GPU card.
If `--bsz > --micro_bsz`, the script enables gradient accumulation automatically.
- Distributed fine-tuning
`paddle.distributed.launch` is a process manager; we use it to start one Python process per GPU card, together with the environment variables required for distributed training.
In distributed training we use `max_steps` as the stopping criterion rather than `epoch`, to avoid deadlocks between processes.
You can compute the required `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`.
Also note that the training set must be sharded across processes, to avoid the overfitting caused by every process training on the same data.
Demo script (make sure you have more than two GPU cards; online model download does not work under `paddle.distributed.launch`,
so first fetch the pretrained model via a single-card fine-tune run, or download and extract it manually from [here](#section-pretrained-models)):
```script
python3 -m paddle.distributed.launch \
./demo/finetune_classifier_distributed.py \
--data_dir data/mnli \
--max_steps 10000 \
--from_pretrained ernie-2.0-en
```
More demo scripts:
1. [Sentiment analysis](./demo/finetune_sentiment_analysis.py)
1. [Semantic matching](./demo/finetune_classifier.py)
1. [Named Entity Recognition (NER)](./demo/finetune_ner.py)
1. [Machine reading comprehension](./demo/finetune_mrc.py) (requires a multi-GPU environment; see the "Distributed fine-tuning" section above)
1. [Text summarization](./demo/seq2seq/README.md)
1. [Text classification with the static graph](./demo/finetune_classifier_static.py)
**Recommended hyperparameters:**
|Task|batch size|learning rate|
|--|--|--|
| CoLA | 32 / 64 (base) | 3e-5 |
| SST-2 | 64 / 256 (base) | 2e-5 |
| STS-B | 128 | 5e-5 |
| QQP | 256 | 3e-5(base)/5e-5(large) |
| MNLI | 256 / 512 (base)| 3e-5 |
| QNLI | 256 | 2e-5 |
| RTE | 16 / 4 (base) | 2e-5(base)/3e-5(large) |
| MRPC | 16 / 32 (base) | 3e-5 |
| WNLI | 8 | 2e-5 |
| XNLI | 512 | 1e-4(base)/4e-5(large) |
| CMRC2018 | 64 | 3e-5 |
| DRCD | 64 | 5e-5(base)/3e-5(large) |
| MSRA-NER(SIGHAN2006) | 16 | 5e-5(base)/1e-5(large) |
| ChnSentiCorp | 24 | 5e-5(base)/1e-5(large) |
| LCQMC | 32 | 2e-5(base)/5e-6(large) |
| NLPCC2016-DBQA| 64 | 2e-5(base)/1e-5(large) |
| VCR | 64 | 2e-5(base)/2e-5(large) |
# Pre-training (ERNIE 1.0)
See [here](./demo/pretrain/README.md).
# Online inference
If `--inference_model_dir` is passed to `finetune_classifier.py`, the fine-tuning script will serialize your model and produce an `inference_model` that can be deployed for online prediction directly.
For implementation details of online prediction in production, see the [C++ inference API](./inference/README.md).
Alternatively, you can use `propeller` to start a multi-GPU prediction service (a GPU environment is required) by running:
```shell
python -m propeller.tools.start_server -m /path/to/saved/inference_model -p 8881
```
This starts the prediction service; then call it from Python just like a local function (python3 only):
```python
import numpy as np
from propeller.service.client import InferenceClient
from ernie.tokenizing_ernie import ErnieTokenizer
client = InferenceClient('tcp://localhost:8881')
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
ids, sids = tokenizer.encode('hello world')
ids = np.expand_dims(ids, 0)
sids = np.expand_dims(sids, 0)
result = client(ids, sids)
```
You can also download a pre-built `inference_model` of the ernie-1.0 base model from [here](https://ernie.bj.bcebos.com/ernie1.0_zh_inference_model.tar.gz).
The model has not been fine-tuned; it is typically used for feature-based fine-tuning with a task network on top, or as a text feature extractor.
Because this model was produced by the old API, append an extra dimension to the input tensors when issuing client requests:
```python
ids = np.expand_dims(ids, -1)  # ids.shape == [BATCH, SEQLEN, 1]
```
# Distillation
Knowledge distillation is an effective way to compress and accelerate ERNIE; for implementation details, see [here](./demo/distill/README.md).
# 文献引用
### ERNIE 1.0
```
@article{sun2019ernie,
title={Ernie: Enhanced representation through knowledge integration},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Chen, Xuyi and Zhang, Han and Tian, Xin and Zhu, Danxiang and Tian, Hao and Wu, Hua},
journal={arXiv preprint arXiv:1904.09223},
year={2019}
}
```
### ERNIE 2.0
```
@article{sun2019ernie20,
title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:1907.12412},
year={2019}
}
```
### ERNIE-GEN
```
@article{xiao2020ernie-gen,
title={ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation},
author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2001.11314},
year={2020}
}
```
### ERNIE-ViL
```
@article{yu2020ernie,
title={ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph},
author={Yu, Fei and Tang, Jiji and Yin, Weichong and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2006.16934},
year={2020}
}
```
### ERNIE-Gram
```
@article{xiao2020ernie,
title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2010.12148},
year={2020}
}
```
### ERNIE-Doc
```
@article{ding2020ernie,
title={ERNIE-DOC: The Retrospective Long-Document Modeling Transformer},
author={Ding, Siyu and Shang, Junyuan and Wang, Shuohuan and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15688},
year={2020}
}
```
### ERNIE-UNIMO
```
@article{li2020unimo,
title={UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning},
author={Li, Wei and Gao, Can and Niu, Guocheng and Xiao, Xinyan and Liu, Hao and Liu, Jiachen and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15409},
year={2020}
}
```
### ERNIE-M
```
@article{ouyang2020ernie,
title={Ernie-m: Enhanced multilingual representation by aligning cross-lingual semantics with monolingual corpora},
author={Ouyang, Xuan and Wang, Shuohuan and Pang, Chao and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2012.15674},
year={2020}
}
```
To reproduce all experiments in the papers, please check out the `repro` branch of this repo.
### Communication
- [ERNIE homepage](https://wenxin.baidu.com/)
- [Github Issues](https://github.com/PaddlePaddle/ERNIE/issues): bug reports, feature requests, install issues, usage issues, etc.
- QQ discussion group: 760439550 (ERNIE discussion group).
- QQ discussion group: 958422639 (ERNIE discussion group-v2).
- [Forums](http://ai.baidu.com/forum/topic/list/168?pageNo=1): discuss implementations, research, etc.
* [ERNIE Slim data distillation](#ernie-slim-data-distillation)
* [Three steps of ERNIE data distillation](#three-steps-of-ernie-data-distillation)
* [Data augmentation](#data-augmentation)
* [Tutorial](#tutorial)
* [Benchmarks](#benchmarks)
* [Case #1: the user provides unlabeled data](#case1)
* [Case #2: no unlabeled data provided](#case2)
# ERNIE Slim data distillation
Behind ERNIE's strong semantic understanding lies an equally strong demand for compute to train and serve a model of this scale. Many industrial scenarios have strict performance requirements, and without effective compression the model cannot be used in practice.
![ernie_distill](../../.metas/ernie_distill.png)
Therefore, as illustrated above, we built the **ERNIE Slim data distillation system** on top of [data distillation](https://arxiv.org/pdf/1712.04440.pdf). Using data as the bridge, it transfers the knowledge of the ERNIE model into a small model, achieving a prediction speed-up of up to a thousand times at a small cost in accuracy.
### Three steps of ERNIE data distillation
- **Step 1.** Fine-tune ERNIE on the labeled input data to obtain the Teacher Model.
- **Step 2.** Use the ERNIE Service to predict on the following unlabeled data (a minimal sketch follows this list):
    1. large-scale unlabeled data provided by the user, drawn from the same source as the labeled data;
    2. augmented versions of the labeled data (the augmentation strategies are described in the next section);
    3. a mixture of the unlabeled and the augmented data at a chosen ratio.
- **Step 3.** Train the Student Model on the data from Step 2.
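A minimal sketch of Step 2, using only APIs that appear elsewhere in this repo's demos; the teacher checkpoint path is an assumption, and in the real pipeline this step is handled by `./distill/distill.py`:

```python
import numpy as np
import paddle as P
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModelForSequenceClassification

# load the fine-tuned teacher from Step 1 (assumed checkpoint path)
teacher = ErnieModelForSequenceClassification.from_pretrained('ernie-1.0', num_labels=2)
teacher.set_state_dict(P.load('./teacher_model.bin'))
teacher.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

def pseudo_label(text):
    # let the teacher assign a hard label to one unlabeled sentence
    ids, _ = tokenizer.encode(text)
    ids = P.to_tensor(np.expand_dims(ids, 0))
    _, logits = teacher(ids)
    return int(logits.argmax(-1).numpy())
```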
### Data augmentation
We currently use three [data augmentation strategies](https://arxiv.org/pdf/1903.12136.pdf), which can be mixed at a task-specific ratio (a sketch of two of them follows this list):
1. Noising: each word of the original sample is replaced with the "UNK" label with some probability (e.g. 0.1).
2. Same-POS replacement: each word of the original sample is replaced, with some probability (e.g. 0.1), by a random word of the same part of speech from the dataset.
3. N-sampling: a span of length m is cut from a random position of the original sample to form a new sample, where m is a random value between 0 and the original sample length.
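A minimal sketch of strategies 1 and 3, assuming whitespace-tokenized input (the function names are illustrative, not part of the toolkit; strategy 2 additionally needs a POS-tagged vocabulary):

```python
import random

def add_noise(tokens, p=0.1):
    # strategy 1: replace each token with the "UNK" label with probability p
    return [t if random.random() >= p else 'UNK' for t in tokens]

def n_sampling(tokens):
    # strategy 3: cut a span of random length m at a random position
    m = random.randint(1, len(tokens))
    start = random.randint(0, len(tokens) - m)
    return tokens[start:start + m]
```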
# Tutorial
We built an augmented ChnSentiCorp dataset with the three strategies above: the augmented data is 10x the original training data (96,000 lines) and can be downloaded [here](https://ernie-github.cdn.bcebos.com/data-chnsenticorp-distill.tar.gz). Then run the following script to start distillation:
```shell
python ./distill/distill.py
```
# Benchmarks
We distinguish two practical settings:
### Case #1: the user provides unlabeled data<a name="case1"></a>
|Model | Low-quality comment detection [classification \| ACC] | Chinese sentiment [classification \| ACC] |Question detection [classification \| ACC]|Search QA matching [matching \| positive/negative pair order]|
|---|---|---|---|---|
|ERNIE-Finetune | 90.6% | 96.2% | 97.5% | 4.25 |
|Non-ERNIE baseline (BOW)| 80.8% | 94.7% | 93.0% | 1.83 |
|**+ data distillation** | 87.2% | 95.8% | 96.3% | 3.30 |
### Case #2: no unlabeled data provided (data generated via augmentation)<a name="case2"></a>
|Model |ChnSentiCorp |
|---|---|
|ERNIE-Finetune |95.4% |
|Non-ERNIE baseline (BOW)|90.1%|
|**+ data distillation** |91.4%|
|Non-ERNIE baseline (LSTM)|91.2%|
|**+ data distillation**|93.9%|
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
import numpy as np
from sklearn.metrics import f1_score
import paddle as P
from paddle.nn import functional as F
import propeller.paddle as propeller
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModelForSequenceClassification
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
# This example uses the chnsenticorp Chinese sentiment task as a demonstration;
# the unlabeled data needed for distillation was generated beforehand via data augmentation.
#
# Download the data and place it under ./chnsenticorp-data/
# Each line has 3 columns: raw text; space-tokenized text; sentiment label
# The first column is the input to ERNIE; the second is the input to the BoW model
# The precomputed BoW vocabulary is at ./chnsenticorp-data/vocab.bow.txt
# Hyperparameters for fine-tuning the teacher model
DATA_DIR = './chnsenticorp-data/'
SEQLEN = 256
BATCH = 32
EPOCH = 10
LR = 5e-5
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
student_vocab = {
i.strip(): l
for l, i in enumerate(
open(
os.path.join(DATA_DIR, 'vocab.bow.txt'), encoding='utf8')
.readlines())
}
def space_tokenizer(i):
return i.decode('utf8').split()
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'seg_a',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.TextColumn(
'seg_a_student',
unk_id=student_vocab['[UNK]'],
vocab_dict=student_vocab,
tokenizer=space_tokenizer),
propeller.data.LabelColumn(
'label', vocab_dict={
b"0": 0,
b"1": 1,
}),
])
def map_fn(seg_a, seg_a_student, label):
seg_a, _ = tokenizer.truncate(seg_a, [], seqlen=SEQLEN)
sentence, segments = tokenizer.build_for_ernie(seg_a)
return seg_a_student, sentence, segments, label
train_ds = feature_column.build_dataset('train', data_dir=os.path.join(DATA_DIR, 'train/'), shuffle=True, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(BATCH)
train_ds_unlabel = feature_column.build_dataset('train-da', data_dir=os.path.join(DATA_DIR, 'train-data-augmented/'), shuffle=True, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(BATCH)
dev_ds = feature_column.build_dataset('dev', data_dir=os.path.join(DATA_DIR, 'dev/'), shuffle=False, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(BATCH,)
shapes = ([-1, SEQLEN], [-1, SEQLEN], [-1, SEQLEN], [-1])
types = ('int64', 'int64', 'int64', 'int64')
train_ds.data_shapes = shapes
train_ds.data_types = types
train_ds_unlabel.data_shapes = shapes
train_ds_unlabel.data_types = types
dev_ds.data_shapes = shapes
dev_ds.data_types = types
place = P.CUDAPlace(0)
def evaluate_teacher(model, dataset):
all_pred, all_label = [], []
with P.no_grad():
model.eval()
for step, (ids_student, ids, _, labels) in enumerate(
P.io.DataLoader(
dataset, places=place, batch_size=None)):
_, logits = model(ids)
pred = logits.argmax(-1)
all_pred.extend(pred.numpy())
all_label.extend(labels.numpy())
f1 = f1_score(all_label, all_pred, average='macro')
model.train()
return f1
teacher_model = ErnieModelForSequenceClassification.from_pretrained(
'ernie-1.0', num_labels=2)
teacher_model.train()
if not os.path.exists('./teacher_model.bin'):
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
lr_scheduler = P.optimizer.lr.LambdaDecay(
LR,
get_warmup_and_linear_decay(9600 * EPOCH / BATCH,
9600 * EPOCH * 0.1 / BATCH))
opt = P.optimizer.AdamW(
lr_scheduler,
parameters=teacher_model.parameters(),
weight_decay=0.01,
grad_clip=g_clip)
for epoch in range(EPOCH):
for step, (ids_student, ids, sids, labels) in enumerate(
P.io.DataLoader(
train_ds, places=place, batch_size=None)):
loss, logits = teacher_model(ids, labels=labels)
loss.backward()
opt.step()
lr_scheduler.step()
teacher_model.clear_gradients()
if step % 10 == 0:
_lr = lr_scheduler.get_lr()
_l = loss.numpy()
msg = '[step-%d] train loss %.5f lr %.3e' % (step, _l, _lr)
print(msg)
if step % 100 == 0:
f1 = evaluate_teacher(teacher_model, dev_ds)
print('teacher f1: %.5f' % f1)
    P.save(teacher_model.state_dict(), './teacher_model.bin')
else:
state_dict = P.load('./teacher_model.bin')
teacher_model.set_state_dict(state_dict)
f1 = evaluate_teacher(teacher_model, dev_ds)
print('teacher f1: %.5f' % f1)
# Hyperparameters for fine-tuning the student model
SEQLEN = 256
BATCH = 32
EPOCH = 10
LR = 1e-4
def evaluate_student(model, dataset):
all_pred, all_label = [], []
with P.no_grad():
model.eval()
for step, (ids_student, ids, _, labels) in enumerate(
P.io.DataLoader(
dataset, places=place, batch_size=None)):
_, logits = model(ids_student)
pred = logits.argmax(-1)
all_pred.extend(pred.numpy())
all_label.extend(labels.numpy())
f1 = f1_score(all_label, all_pred, average='macro')
model.train()
return f1
class BOW(P.nn.Layer):
def __init__(self):
super().__init__()
self.emb = P.nn.Embedding(len(student_vocab), 128, padding_idx=0)
self.fc = P.nn.Linear(128, 2)
def forward(self, ids, labels=None):
embbed = self.emb(ids)
pad_mask = (ids != 0).cast('float32').unsqueeze(-1)
embbed = (embbed * pad_mask).sum(1)
embbed = F.softsign(embbed)
logits = self.fc(embbed)
if labels is not None:
if len(labels.shape) == 1:
labels = labels.reshape([-1, 1])
loss = F.cross_entropy(logits, labels).mean()
else:
loss = None
return loss, logits
class CNN(P.nn.Layer):
    def __init__(self):
        super().__init__()
        self.emb = P.nn.Embedding(30002, 128, padding_idx=0)
        # paddle.nn.Conv2D has no `act` argument; ReLU is applied explicitly in forward
        self.cnn = P.nn.Conv2D(128, 128, (1, 3), padding=(0, 1))
        self.pool = P.nn.MaxPool2D((1, 3), stride=1, padding=(0, 1))
        self.fc = P.nn.Linear(128, 2)

    def forward(self, ids, labels=None):
        embbed = self.emb(ids)
        #d_batch, d_seqlen = ids.shape
        hidden = embbed
        hidden = hidden.transpose([0, 2, 1]).unsqueeze(2)  # change to NCHW: [B, 128, 1, seqlen]
        hidden = F.relu(self.cnn(hidden))
        hidden = self.pool(hidden).squeeze(2).transpose([0, 2, 1])
        pad_mask = (ids != 0).cast('float32').unsqueeze(-1)
        # mask out padding, sum-pool over the sequence, then squash with softsign
        hidden = F.softsign((hidden * pad_mask).sum(1))
        logits = self.fc(hidden)
        if labels is not None:
            if len(labels.shape) == 1:
                labels = labels.reshape([-1, 1])
            loss = F.cross_entropy(logits, labels).mean()
        else:
            loss = None
        return loss, logits
def KL(pred, target):
    # distillation objective: KL divergence between teacher and student output
    # distributions; paddle's F.kl_div expects the first argument in log space
    # and the second as probabilities
    pred = F.log_softmax(pred)
    target = F.softmax(target)
    loss = F.kl_div(pred, target)
    return loss
teacher_model.eval()
model = BOW()
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
lr_scheduler = P.optimizer.lr.LambdaDecay(
LR,
get_warmup_and_linear_decay(9600 * EPOCH / BATCH,
9600 * EPOCH * 0.1 / BATCH))
opt = P.optimizer.AdamW(
lr_scheduler,
parameters=model.parameters(),
weight_decay=0.01,
grad_clip=g_clip)
model.train()
for epoch in range(EPOCH - 1):
for step, (
ids_student, ids, sids, label
) in enumerate(P.io.DataLoader(
train_ds, places=place, batch_size=None)):
with P.no_grad():
_, logits_t = teacher_model(ids, sids)  # logits from the teacher model
_, logits_s = model(ids_student)  # logits from the student model
loss_ce, _ = model(ids_student, labels=label)
loss_kd = KL(logits_s, logits_t.detach())  # KL divergence measures the distance between the two distributions
loss = loss_ce + loss_kd
loss.backward()
opt.step()
lr_scheduler.step()
model.clear_gradients()
if step % 10 == 0:
_lr = lr_scheduler.get_lr()
_l = loss.numpy()
msg = '[step-%d] train loss %.5f lr %.3e' % (step, _l, _lr)
print(msg)
f1 = evaluate_student(model, dev_ds)
print('student f1 %.5f' % f1)
# finally, run one more epoch with hard labels to consolidate the result
for step, (
ids_student, ids, sids, label
) in enumerate(P.io.DataLoader(
train_ds, places=place, batch_size=None)):
loss, _ = model(ids_student, labels=label)
loss.backward()
opt.step()
model.clear_gradients()
if step % 10 == 0:
_lr = lr_scheduler.get_lr()
_l = loss.numpy()
msg = '[step-%d] train loss %.5f lr %.3e' % (step, _l, _lr)
print(msg)
f1 = evaluate_student(model, dev_ds)
print('final f1 %.5f' % f1)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import re
import time
import logging
import json
from random import random
from functools import reduce, partial
from visualdl import LogWriter
import numpy as np
import argparse
from pathlib import Path
import paddle as P
from propeller import log
import propeller.paddle as propeller
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
#from model.bert import BertConfig, BertModelLayer
from ernie.modeling_ernie import ErnieModel, ErnieModelForSequenceClassification
from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
#from ernie.optimization import AdamW, LinearDecay
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
parser = argparse.ArgumentParser('classify model with ERNIE')
parser.add_argument(
'--from_pretrained',
type=Path,
required=True,
help='pretrained model directory or tag')
parser.add_argument(
'--max_seqlen',
type=int,
default=128,
help='max sentence length, should not be greater than 512')
parser.add_argument(
'--bsz',
type=int,
default=128,
help='global batch size for each optimizer step')
parser.add_argument(
'--micro_bsz',
type=int,
default=32,
help='batch size for each device; if `--bsz` > `--micro_bsz` * num_device, gradient accumulation will be performed'
)
parser.add_argument('--epoch', type=int, default=3, help='epoch')
parser.add_argument(
'--data_dir',
type=str,
required=True,
help='data directory includes train / develop data')
parser.add_argument(
'--use_lr_decay',
action='store_true',
help='if set, learning rate will decay to zero at `max_steps`')
parser.add_argument(
'--warmup_proportion',
type=float,
default=0.1,
help='if use_lr_decay is set, '
'learning rate will rise to `lr` at `warmup_proportion` * `max_steps` and decay to 0 at `max_steps`'
)
parser.add_argument('--lr', type=float, default=5e-5, help='learning rate')
parser.add_argument(
'--inference_model_dir',
type=Path,
default=None,
help='inference model output directory')
parser.add_argument(
'--save_dir', type=Path, required=True, help='model output directory')
parser.add_argument(
'--max_steps',
type=int,
default=None,
help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
parser.add_argument(
'--wd', type=float, default=0.01, help='weight decay, aka L2 regularizer')
parser.add_argument(
'--init_checkpoint',
type=str,
default=None,
help='checkpoint to warm start from')
parser.add_argument(
'--use_amp',
action='store_true',
help='only activate AMP (auto mixed precision acceleration) on TensorCore compatible devices'
)
args = parser.parse_args()
if args.bsz > args.micro_bsz:
assert args.bsz % args.micro_bsz == 0, 'cannot perform gradient accumulate with bsz:%d micro_bsz:%d' % (
args.bsz, args.micro_bsz)
acc_step = args.bsz // args.micro_bsz
log.info(
'performing gradient accumulate: global_bsz:%d, micro_bsz:%d, accumulate_steps:%d'
% (args.bsz, args.micro_bsz, acc_step))
args.bsz = args.micro_bsz
else:
acc_step = 1
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
#tokenizer = ErnieTinyTokenizer.from_pretrained(args.from_pretrained)
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'seg_a',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.TextColumn(
'seg_b',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.LabelColumn(
'label',
vocab_dict={
b"contradictory": 0,
b"contradiction": 0,
b"entailment": 1,
b"neutral": 2,
}),
])
def map_fn(seg_a, seg_b, label):
seg_a, seg_b = tokenizer.truncate(seg_a, seg_b, seqlen=args.max_seqlen)
sentence, segments = tokenizer.build_for_ernie(seg_a, seg_b)
return sentence, segments, label
train_ds = feature_column.build_dataset('train', data_dir=os.path.join(args.data_dir, 'train'), shuffle=True, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(args.bsz, (0, 0, 0))
dev_ds = feature_column.build_dataset('dev', data_dir=os.path.join(args.data_dir, 'dev'), shuffle=False, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(args.bsz, (0, 0, 0))
place = P.CUDAPlace(0)
model = ErnieModelForSequenceClassification.from_pretrained(
args.from_pretrained, num_labels=3, name='')
if args.init_checkpoint is not None:
log.info('loading checkpoint from %s' % args.init_checkpoint)
sd = P.load(str(args.init_checkpoint))
model.set_state_dict(sd)
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
param_name_to_exclude_from_weight_decay = re.compile(
r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
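# Parameters matched by this pattern (LayerNorm scales/biases and bias
# variables, whose Paddle names end in `b_0`) are excluded from weight decay
# via `apply_decay_param_fun` below -- a common practice when finetuning
# transformer models.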
if args.use_lr_decay:
lr_scheduler = P.optimizer.lr.LambdaDecay(
args.lr,
get_warmup_and_linear_decay(
args.max_steps, int(args.warmup_proportion * args.max_steps)))
opt = P.optimizer.AdamW(
lr_scheduler,
parameters=model.parameters(),
weight_decay=args.wd,
apply_decay_param_fun=lambda n: not param_name_to_exclude_from_weight_decay.match(n),
grad_clip=g_clip)
else:
lr_scheduler = None
opt = P.optimizer.AdamW(
args.lr,
parameters=model.parameters(),
weight_decay=args.wd,
apply_decay_param_fun=lambda n: not param_name_to_exclude_from_weight_decay.match(n),
grad_clip=g_clip)
scaler = P.amp.GradScaler(enable=args.use_amp)
step, inter_step = 0, 0
with LogWriter(
logdir=str(create_if_not_exists(args.save_dir / 'vdl'))) as log_writer:
with P.amp.auto_cast(enable=args.use_amp):
for epoch in range(args.epoch):
for ids, sids, label in P.io.DataLoader(
train_ds, places=P.CUDAPlace(0), batch_size=None):
inter_step += 1
loss, _ = model(ids, sids, labels=label)
loss /= acc_step
loss = scaler.scale(loss)
loss.backward()
if inter_step % acc_step != 0:
continue
step += 1
scaler.minimize(opt, loss)
model.clear_gradients()
lr_scheduler and lr_scheduler.step()
if step % 10 == 0:
_lr = lr_scheduler.get_lr() if args.use_lr_decay else args.lr
if args.use_amp:
_l = (loss / scaler._scale).numpy()
msg = '[step-%d] train loss %.5f lr %.3e scaling %.3e' % (
step, _l, _lr, scaler._scale.numpy())
else:
_l = loss.numpy()
msg = '[step-%d] train loss %.5f lr %.3e' % (step, _l,
_lr)
log.debug(msg)
log_writer.add_scalar('loss', _l, step=step)
log_writer.add_scalar('lr', _lr, step=step)
if step % 100 == 0:
acc = []
with P.no_grad():
model.eval()
for ids, sids, label in P.io.DataLoader(
dev_ds, places=P.CUDAPlace(0),
batch_size=None):
loss, logits = model(ids, sids, labels=label)
#print('\n'.join(map(str, logits.numpy().tolist())))
a = (logits.argmax(-1) == label)
acc.append(a.numpy())
model.train()
acc = np.concatenate(acc).mean()
log_writer.add_scalar('eval/acc', acc, step=step)
log.debug('acc %.5f' % acc)
if args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
if args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
if args.inference_model_dir is not None:
class InferenceModel(ErnieModelForSequenceClassification):
def forward(self, ids, sids):
_, logits = super(InferenceModel, self).forward(ids, sids)
return logits
model.__class__ = InferenceModel
log.debug('saving inference model')
src_placeholder = P.zeros([2, 2], dtype='int64')
sent_placeholder = P.zeros([2, 2], dtype='int64')
_, static = P.jit.TracedLayer.trace(
model, inputs=[src_placeholder, sent_placeholder])
static.save_inference_model(str(args.inference_model_dir))
#class InferenceModel(ErnieModelForSequenceClassification):
# @P.jit.to_static
# def forward(self, ids, sids):
# _, logits = super(InferenceModel, self).forward(ids, sids, labels=None)
# return logits
#model.__class__ = InferenceModel
#src_placeholder = P.zeros([2, 2], dtype='int64')
#sent_placeholder = P.zeros([2, 2], dtype='int64')
#P.jit.save(model, args.inference_model_dir, input_var=[src_placeholder, sent_placeholder])
log.debug('done')
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import logging
import json
import re
from random import random
from functools import reduce, partial
import numpy as np
#from visualdl import LogWriter
from pathlib import Path
import paddle as P
from propeller import log
import propeller.paddle as propeller
#from model.bert import BertConfig, BertModelLayer
from ernie.modeling_ernie import ErnieModel, ErnieModelForSequenceClassification
from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
#from ernie.optimization import AdamW, LinearDecay
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
parser = propeller.ArgumentParser('classify model with ERNIE')
parser.add_argument(
'--from_pretrained',
type=Path,
required=True,
help='pretrained model directory or tag')
parser.add_argument(
'--max_seqlen',
type=int,
default=128,
help='max sentence length, should not be greater than 512')
parser.add_argument('--bsz', type=int, default=32, help='batchsize')
parser.add_argument(
'--data_dir',
type=str,
required=True,
help='data directory includes train / develop data')
parser.add_argument(
'--max_steps',
type=int,
required=True,
help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
parser.add_argument('--warmup_proportion', type=float, default=0.1)
parser.add_argument('--lr', type=float, default=5e-5, help='learning rate')
parser.add_argument(
'--save_dir', type=Path, required=True, help='model output directory')
parser.add_argument(
'--wd', type=float, default=0.01, help='weight decay, aka L2 regularizer')
parser.add_argument(
'--init_checkpoint',
type=str,
default=None,
help='checkpoint to warm start from')
parser.add_argument(
'--use_amp',
action='store_true',
help='only activate AMP (auto mixed precision acceleration) on TensorCore compatible devices'
)
args = parser.parse_args()
env = P.distributed.ParallelEnv()
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
#tokenizer = ErnieTinyTokenizer.from_pretrained(args.from_pretrained)
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'seg_a',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.TextColumn(
'seg_b',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.LabelColumn(
'label', vocab_dict={
b"0": 0,
b"1": 1,
b"2": 2,
}),
])
def map_fn(seg_a, seg_b, label):
seg_a, seg_b = tokenizer.truncate(seg_a, seg_b, seqlen=args.max_seqlen)
sentence, segments = tokenizer.build_for_ernie(seg_a, seg_b)
return sentence, segments, label
train_ds = feature_column.build_dataset('train', data_dir=os.path.join(args.data_dir, 'train'),
shuffle=True, repeat=True, use_gz=False, shard=True) \
.map(map_fn) \
.padded_batch(args.bsz, (0, 0, 0))
dev_ds = feature_column.build_dataset('dev', data_dir=os.path.join(args.data_dir, 'dev'),
shuffle=False, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(args.bsz, (0, 0, 0))
shapes = ([-1, args.max_seqlen], [-1, args.max_seqlen], [-1])
types = ('int64', 'int64', 'int64')
P.distributed.init_parallel_env()
model = ErnieModelForSequenceClassification.from_pretrained(
args.from_pretrained, num_labels=3, name='')
if args.init_checkpoint is not None:
log.info('loading checkpoint from %s' % args.init_checkpoint)
sd = P.load(str(args.init_checkpoint))
model.set_state_dict(sd)
model = P.DataParallel(model)
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
param_name_to_exclude_from_weight_decay = re.compile(
r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
lr_scheduler = P.optimizer.lr.LambdaDecay(
args.lr,
get_warmup_and_linear_decay(args.max_steps,
int(args.warmup_proportion * args.max_steps)))
opt = P.optimizer.AdamW(
learning_rate=lr_scheduler,
parameters=model.parameters(),
apply_decay_param_fun=lambda n: not param_name_to_exclude_from_weight_decay.match(n),
weight_decay=args.wd,
grad_clip=g_clip)
scaler = P.amp.GradScaler(enable=args.use_amp)
step = 0
create_if_not_exists(args.save_dir)
#with LogWriter(logdir=str(create_if_not_exists(args.save_dir / 'vdl-%d' % env.dev_id))) as log_writer:
with P.amp.auto_cast(enable=args.use_amp):
for ids, sids, label in P.io.DataLoader(
train_ds, places=P.CUDAPlace(env.dev_id), batch_size=None):
step += 1
loss, _ = model(ids, sids, labels=label)
loss = scaler.scale(loss)
loss.backward()
scaler.minimize(opt, loss)
model.clear_gradients()
lr_scheduler.step()
# do logging
if step % 10 == 0:
_lr = lr_scheduler.get_lr()
if args.use_amp:
_l = (loss / scaler._scale).numpy()
msg = '[rank-%d][step-%d] train loss %.5f lr %.3e scaling %.3e' % (
env.dev_id, step, _l, _lr, scaler._scale.numpy())
else:
_l = loss.numpy()
msg = '[rank-%d][step-%d] train loss %.5f lr %.3e' % (
env.dev_id, step, _l, _lr)
log.debug(msg)
#log_writer.add_scalar('loss', _l, step=step)
#log_writer.add_scalar('lr', _lr, step=step)
# do saving
if step % 100 == 0 and env.dev_id == 0:
acc = []
with P.no_grad():
model.eval()
for d in P.io.DataLoader(
dev_ds, places=P.CUDAPlace(env.dev_id),
batch_size=None):
ids, sids, label = d
loss, logits = model(ids, sids, labels=label)
a = (logits.argmax(-1) == label)
acc.append(a.numpy())
model.train()
acc = np.concatenate(acc).mean()
#log_writer.add_scalar('eval/acc', acc, step=step)
log.debug('acc %.5f' % acc)
if args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
# exit
if step > args.max_steps:
break
if args.save_dir is not None and env.dev_id == 0:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
log.debug('done')
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import os
import re
import time
import logging
from random import random
import json
from functools import reduce, partial
import numpy as np
import multiprocessing
import tempfile
import paddle as P
from ernie.modeling_ernie import ErnieModel, ErnieModelForSequenceClassification
from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
from demo.optimization import optimization
#import utils.data
from propeller import log
import propeller.paddle as propeller
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
def model_fn(features, mode, params, run_config):
ernie = ErnieModelForSequenceClassification(params, name='')
if mode is not propeller.RunMode.TRAIN:
ernie.eval()
else:
ernie.train()
metrics, loss, train_hooks = None, None, None
if mode is propeller.RunMode.PREDICT:
src_ids, sent_ids = features
_, logits = ernie(src_ids, sent_ids)
predictions = [logits, ]
else:
src_ids, sent_ids, labels = features
if mode is propeller.RunMode.EVAL:
loss, logits = ernie(src_ids, sent_ids, labels=labels)
pred = logits.argmax(axis=1)
acc = propeller.metrics.Acc(labels, pred)
metrics = {'acc': acc}
predictions = [pred]
train_hooks = None
else:
loss, logits = ernie(src_ids, sent_ids, labels=labels)
lr_step_hook, loss_scale_coef = optimization(
loss=loss,
warmup_steps=int(run_config.max_steps *
params['warmup_proportion']),
num_train_steps=run_config.max_steps,
learning_rate=params['learning_rate'],
train_program=P.static.default_main_program(),
startup_prog=P.static.default_startup_program(),
use_fp16=args.use_amp,
weight_decay=params['weight_decay'],
scheduler="linear_warmup_decay", )
scheduled_lr = P.static.default_main_program().global_block().var(
'learning_rate_0')
propeller.summary.scalar('lr', scheduled_lr)
predictions = [logits, ]
train_hooks = [lr_step_hook]
return propeller.ModelSpec(
loss=loss,
mode=mode,
metrics=metrics,
predictions=predictions,
train_hooks=train_hooks)
if __name__ == '__main__':
parser = propeller.ArgumentParser('DAN model with Paddle')
parser.add_argument('--do_predict', action='store_true')
parser.add_argument('--max_seqlen', type=int, default=128)
parser.add_argument('--data_dir', type=str, required=True)
parser.add_argument('--from_pretrained', type=str, required=True)
parser.add_argument('--warm_start_from', type=str)
parser.add_argument('--epoch', type=int, default=3)
parser.add_argument('--use_amp', action='store_true')
args = parser.parse_args()
P.enable_static()
if not os.path.exists(args.from_pretrained):
raise ValueError('--from_pretrained not found: %s' %
args.from_pretrained)
cfg_file_path = os.path.join(args.from_pretrained, 'ernie_config.json')
param_path = os.path.join(args.from_pretrained, 'params')
vocab_path = os.path.join(args.from_pretrained, 'vocab.txt')
assert os.path.exists(cfg_file_path) and os.path.exists(
param_path) and os.path.exists(vocab_path)
hparams_cli = propeller.parse_hparam(args)
hparams_config_file = json.loads(open(cfg_file_path).read())
default_hparams = propeller.HParams(
batch_size=32,
num_labels=3,
warmup_proportion=0.1,
learning_rate=5e-5,
weight_decay=0.01,
use_task_id=False,
use_fp16=args.use_amp)
hparams = default_hparams.join(propeller.HParams(
**hparams_config_file)).join(hparams_cli)
default_run_config = dict(
max_steps=args.epoch * 390000 // hparams.batch_size,
save_steps=1000,
log_steps=10,
max_ckpt=1,
skip_steps=0,
model_dir=tempfile.mkdtemp(),
eval_steps=100)
run_config = dict(default_run_config, **json.loads(args.run_config))
run_config = propeller.RunConfig(**run_config)
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
#tokenizer = ErnieTinyTokenizer.from_pretrained(args.from_pretrained)
unk_id = tokenizer.vocab['[UNK]']
shapes = ([-1, args.max_seqlen], [-1, args.max_seqlen], [-1])
types = ('int64', 'int64', 'int64')
if not args.do_predict:
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'title',
unk_id=unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.TextColumn(
'comment',
unk_id=unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.LabelColumn(
'label',
vocab_dict={
b"contradictory": 0,
b"contradiction": 0,
b"entailment": 1,
b"neutral": 2,
}),
])
def map_fn(seg_a, seg_b, label):
seg_a, seg_b = tokenizer.truncate(
seg_a, seg_b, seqlen=args.max_seqlen)
sentence, segments = tokenizer.build_for_ernie(seg_a, seg_b)
#label = np.expand_dims(label, -1) #
return sentence, segments, label
train_ds = feature_column.build_dataset('train', data_dir=os.path.join(args.data_dir, 'train'), shuffle=True, repeat=True, use_gz=False) \
.map(map_fn) \
.padded_batch(hparams.batch_size)
dev_ds = feature_column.build_dataset('dev', data_dir=os.path.join(args.data_dir, 'dev'), shuffle=False, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(hparams.batch_size)
test_ds = feature_column.build_dataset('test', data_dir=os.path.join(args.data_dir, 'test'), shuffle=False, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(hparams.batch_size)
train_ds.data_shapes = shapes
train_ds.data_types = types
dev_ds.data_shapes = shapes
dev_ds.data_types = types
test_ds.data_shapes = shapes
test_ds.data_types = types
varname_to_warmstart = re.compile(
r'^encoder.*[wb]_0$|^.*embedding$|^.*bias$|^.*scale$|^pooled_fc.[wb]_0$'
)
ws = propeller.WarmStartSetting(
predicate_fn=lambda v: varname_to_warmstart.match(v.name) and os.path.exists(os.path.join(param_path, v.name)),
from_dir=param_path,
)
best_exporter = propeller.train.exporter.BestExporter(
os.path.join(run_config.model_dir, 'best'),
cmp_fn=lambda old, new: new['dev']['acc'] > old['dev']['acc'])
propeller.train.train_and_eval(
model_class_or_model_fn=model_fn,
params=hparams,
run_config=run_config,
train_dataset=train_ds,
eval_dataset={'dev': dev_ds,
'test': test_ds},
warm_start_setting=ws,
exporters=[best_exporter])
print('dev_acc3\t%.5f\ntest_acc3\t%.5f' %
(best_exporter._best['dev']['acc'],
best_exporter._best['test']['acc']))
else:
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'title',
unk_id=unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.TextColumn(
'comment',
unk_id=unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
])
def map_fn(seg_a, seg_b):
seg_a, seg_b = tokenizer.truncate(
seg_a, seg_b, seqlen=args.max_seqlen)
sentence, segments = tokenizer.build_for_ernie(seg_a, seg_b)
return sentence, segments
predict_ds = feature_column.build_dataset_from_stdin('predict') \
.map(map_fn) \
.padded_batch(hparams.batch_size)
predict_ds.data_shapes = shapes[:-1]
predict_ds.data_types = types[:-1]
est = propeller.Learner(model_fn, run_config, hparams)
for res, in est.predict(predict_ds, ckpt=-1):
print('%d\t%.5f\t%.5f\t%.5f' %
(np.argmax(res), res[0], res[1], res[2]))
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import os
import re
import time
import logging
import json
from pathlib import Path
from random import random
from tqdm import tqdm
from functools import reduce, partial
import pickle
import argparse
from functools import partial
from io import open
import numpy as np
import logging
import paddle as P
from propeller import log
import propeller.paddle as propeller
from ernie.modeling_ernie import ErnieModel, ErnieModelForQuestionAnswering
from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
#from ernie.optimization import AdamW, LinearDecay
from demo.mrc import mrc_reader
from demo.mrc import mrc_metrics
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
def evaluate(model, ds, all_examples, all_features, tokenizer, args):
dev_file = json.loads(open(args.dev_file, encoding='utf8').read())
with P.no_grad():
log.debug('start eval')
model.eval()
all_res = []
for step, (uids, token_ids, token_type_ids, _, __) in enumerate(
P.io.DataLoader(
ds, places=P.CUDAPlace(env.dev_id), batch_size=None)):
_, start_logits, end_logits = model(token_ids, token_type_ids)
res = [
mrc_metrics.RawResult(
unique_id=u, start_logits=s, end_logits=e)
for u, s, e in zip(uids.numpy(),
start_logits.numpy(), end_logits.numpy())
]
all_res += res
open('all_res', 'wb').write(pickle.dumps(all_res))
all_pred, all_nbests = mrc_metrics.make_results(
tokenizer,
all_examples,
all_features,
all_res,
n_best_size=args.n_best_size,
max_answer_length=args.max_answer_length,
do_lower_case=tokenizer.lower)
f1, em, _, __ = mrc_metrics.evaluate(dev_file, all_pred)
model.train()
log.debug('done eval')
return f1, em
def train(model, train_dataset, dev_dataset, dev_examples, dev_features,
tokenizer, args):
model = P.DataParallel(model)
max_steps = len(train_features) * args.epoch // args.bsz
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
lr_scheduler = P.optimizer.lr.LambdaDecay(
args.lr,
get_warmup_and_linear_decay(max_steps,
int(args.warmup_proportion * max_steps)))
opt = P.optimizer.AdamW(
lr_scheduler,
parameters=model.parameters(),
weight_decay=args.wd,
grad_clip=g_clip)
train_dataset = train_dataset \
.cache_shuffle_shard(env.nranks, env.dev_id, drop_last=True) \
.padded_batch(args.bsz)
log.debug('init training with args: %s' % repr(args))
scaler = P.amp.GradScaler(enable=args.use_amp)
create_if_not_exists(args.save_dir)
with P.amp.auto_cast(enable=args.use_amp):
for step, (_, token_ids, token_type_ids, start_pos,
end_pos) in enumerate(
P.io.DataLoader(
train_dataset,
places=P.CUDAPlace(env.dev_id),
batch_size=None)):
loss, _, __ = model(
token_ids,
token_type_ids,
start_pos=start_pos,
end_pos=end_pos)
loss = scaler.scale(loss)
loss.backward()
scaler.minimize(opt, loss)
model.clear_gradients()
lr_scheduler.step()
if env.dev_id == 0 and step % 10 == 0:
_lr = lr_scheduler.get_lr()
if args.use_amp:
_l = (loss / scaler._scale).numpy()
msg = '[rank-%d][step-%d] train loss %.5f lr %.3e scaling %.3e' % (
env.dev_id, step, _l, _lr, scaler._scale.numpy())
else:
_l = loss.numpy()
msg = '[rank-%d][step-%d] train loss %.5f lr %.3e' % (
env.dev_id, step, _l, _lr)
log.debug(msg)
if env.dev_id == 0 and step % 100 == 0:
f1, em = evaluate(model, dev_dataset, dev_examples,
dev_features, tokenizer, args)
log.debug('[step %d] eval result: f1 %.5f em %.5f' %
(step, f1, em))
if env.dev_id == 0 and args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
if step > max_steps:
break
if __name__ == "__main__":
parser = argparse.ArgumentParser('MRC model with ERNIE')
parser.add_argument(
'--from_pretrained',
type=Path,
required=True,
help='pretrained model directory or tag')
parser.add_argument(
'--max_seqlen',
type=int,
default=512,
help='max sentence length, should not be greater than 512')
parser.add_argument('--bsz', type=int, default=8, help='batchsize')
parser.add_argument('--epoch', type=int, default=2, help='epoch')
parser.add_argument(
'--train_file',
type=str,
required=True,
help='path to the training data file')
parser.add_argument(
'--dev_file',
type=str,
required=True,
help='path to the development data file')
parser.add_argument('--warmup_proportion', type=float, default=0.1)
parser.add_argument('--lr', type=float, default=3e-5, help='learning rate')
parser.add_argument(
'--save_dir', type=Path, required=True, help='model output directory')
parser.add_argument(
'--n_best_size', type=int, default=20, help='nbest prediction to keep')
parser.add_argument(
'--max_answer_length', type=int, default=100, help='max answer span')
parser.add_argument(
'--wd',
type=float,
default=0.01,
help='weight decay, aka L2 regularizer')
parser.add_argument(
'--use_amp',
action='store_true',
help='only activate AMP (auto mixed precision acceleration) on TensorCore compatible devices'
)
args = parser.parse_args()
env = P.distributed.ParallelEnv()
P.distributed.init_parallel_env()
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
if not os.path.exists(args.train_file):
raise RuntimeError('input data not found at %s' % args.train_file)
if not os.path.exists(args.dev_file):
raise RuntimeError('input data not found at %s' % args.dev_file)
log.info('making train/dev data...')
train_examples = mrc_reader.read_files(args.train_file, is_training=True)
train_features = mrc_reader.convert_example_to_features(
train_examples, args.max_seqlen, tokenizer, is_training=True)
dev_examples = mrc_reader.read_files(args.dev_file, is_training=False)
dev_features = mrc_reader.convert_example_to_features(
dev_examples, args.max_seqlen, tokenizer, is_training=False)
log.info('train examples: %d, features: %d' %
(len(train_examples), len(train_features)))
def map_fn(unique_id, example_index, doc_span_index, tokens,
token_to_orig_map, token_is_max_context, token_ids,
position_ids, text_type_ids, start_position, end_position):
if start_position is None:
start_position = 0
if end_position is None:
end_position = 0
return np.array(unique_id), np.array(token_ids), np.array(
text_type_ids), np.array(start_position), np.array(end_position)
train_dataset = propeller.data.Dataset.from_list(train_features).map(
map_fn)
dev_dataset = propeller.data.Dataset.from_list(dev_features).map(
map_fn).padded_batch(args.bsz)
model = ErnieModelForQuestionAnswering.from_pretrained(
args.from_pretrained, name='')
train(model, train_dataset, dev_dataset, dev_examples, dev_features,
tokenizer, args)
if env.dev_id == 0:
f1, em = evaluate(model, dev_dataset, dev_examples, dev_features,
tokenizer, args)
log.debug('final eval result: f1 %.5f em %.5f' % (f1, em))
if env.dev_id == 0 and args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import re
import time
import logging
import six
import json
from random import random
from tqdm import tqdm
from collections import OrderedDict
from functools import reduce, partial
from pathlib import Path
from visualdl import LogWriter
import numpy as np
import multiprocessing
import pickle
from sklearn.metrics import f1_score
import paddle as P
from propeller import log
import propeller.paddle as propeller
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
from ernie.modeling_ernie import ErnieModel, ErnieModelForSequenceClassification, ErnieModelForTokenClassification
from ernie.tokenizing_ernie import ErnieTokenizer
#from ernie.optimization import AdamW, LinearDecay
parser = propeller.ArgumentParser('NER model with ERNIE')
parser.add_argument('--max_seqlen', type=int, default=256)
parser.add_argument('--bsz', type=int, default=32)
parser.add_argument('--data_dir', type=str, required=True)
parser.add_argument('--epoch', type=int, default=6)
parser.add_argument(
'--warmup_proportion',
type=float,
default=0.1,
help='if use_lr_decay is set, '
'learning rate will rise to `lr` at `warmup_proportion` * `max_steps` and decay to 0 at `max_steps`'
)
parser.add_argument(
'--max_steps',
type=int,
required=True,
help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE, used in learning rate scheduler'
)
parser.add_argument(
'--use_amp',
action='store_true',
help='only activate AMP (auto mixed precision acceleration) on TensorCore compatible devices'
)
parser.add_argument('--from_pretrained', type=Path, required=True)
parser.add_argument('--lr', type=float, default=5e-5, help='learning rate')
parser.add_argument(
'--save_dir', type=Path, required=True, help='model output directory')
parser.add_argument(
'--wd', type=float, default=0.01, help='weight decay, aka L2 regularizer')
args = parser.parse_args()
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
def tokenizer_func(inputs):
ret = inputs.split(b'\2')
tokens, orig_pos = [], []
for i, r in enumerate(ret):
t = tokenizer.tokenize(r)
for tt in t:
tokens.append(tt)
orig_pos.append(i)
assert len(tokens) == len(orig_pos)
return tokens + orig_pos
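# Tokens and their original (pre-tokenization) positions are packed into one
# flat list because a TextColumn yields a single sequence per field; `before()`
# below recovers the two halves with `np.split(seg, 2)` so labels can be
# aligned to sub-word tokens.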
def tokenizer_func_for_label(inputs):
return inputs.split(b'\2')
feature_map = {
b"B-PER": 0,
b"I-PER": 1,
b"B-ORG": 2,
b"I-ORG": 3,
b"B-LOC": 4,
b"I-LOC": 5,
b"O": 6,
}
other_tag_id = feature_map[b'O']
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'text_a',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer_func), propeller.data.TextColumn(
'label',
unk_id=other_tag_id,
vocab_dict=feature_map,
tokenizer=tokenizer_func_for_label, )
])
def before(seg, label):
seg, orig_pos = np.split(seg, 2)
aligned_label = label[orig_pos]
seg, _ = tokenizer.truncate(seg, [], args.max_seqlen)
aligned_label, _ = tokenizer.truncate(aligned_label, [], args.max_seqlen)
orig_pos, _ = tokenizer.truncate(orig_pos, [], args.max_seqlen)
sentence, segments = tokenizer.build_for_ernie(
seg
) #utils.data.build_1_pair(seg, max_seqlen=args.max_seqlen, cls_id=cls_id, sep_id=sep_id)
aligned_label = np.concatenate([[0], aligned_label, [0]], 0)
orig_pos = np.concatenate([[0], orig_pos, [0]])
assert len(aligned_label) == len(sentence) == len(orig_pos), (
len(aligned_label), len(sentence), len(orig_pos))  # aligned
return sentence, segments, aligned_label, label, orig_pos
train_ds = feature_column.build_dataset('train', data_dir=os.path.join(args.data_dir, 'train'), shuffle=True, repeat=False, use_gz=False) \
.map(before) \
.padded_batch(args.bsz, (0, 0, -100, other_tag_id + 1, 0))
dev_ds = feature_column.build_dataset('dev', data_dir=os.path.join(args.data_dir, 'dev'), shuffle=False, repeat=False, use_gz=False) \
.map(before) \
.padded_batch(args.bsz, (0, 0, -100, other_tag_id + 1, 0))
test_ds = feature_column.build_dataset('test', data_dir=os.path.join(args.data_dir, 'test'), shuffle=False, repeat=False, use_gz=False) \
.map(before) \
.padded_batch(args.bsz, (0, 0, -100, other_tag_id + 1, 0))
def evaluate(model, dataset):
model.eval()
with P.no_grad():
chunkf1 = propeller.metrics.ChunkF1(None, None, None, len(feature_map))
for step, (ids, sids, aligned_label, label, orig_pos
) in enumerate(P.io.DataLoader(
dataset, batch_size=None)):
loss, logits = model(ids, sids)
#print('\n'.join(map(str, logits.numpy().tolist())))
assert orig_pos.shape[0] == logits.shape[0] == ids.shape[
0] == label.shape[0]
for pos, lo, la, id in zip(orig_pos.numpy(),
logits.numpy(),
label.numpy(), ids.numpy()):
_dic = OrderedDict()
assert len(pos) == len(lo) == len(id)
for _pos, _lo, _id in zip(pos, lo, id):
if _id > tokenizer.mask_id: # [MASK] is the largest special token
_dic.setdefault(_pos, []).append(_lo)
merged_lo = np.array(
[np.array(l).mean(0) for _, l in six.iteritems(_dic)])
merged_preds = np.argmax(merged_lo, -1)
la = la[np.where(la != (other_tag_id + 1))] #remove pad
if len(la) > len(merged_preds):
log.warning(
'accuracy loss due to truncation: label len:%d, truncate to %d'
% (len(la), len(merged_preds)))
merged_preds = np.pad(merged_preds,
[0, len(la) - len(merged_preds)],
mode='constant',
constant_values=7)
else:
assert len(la) == len(
merged_preds
), 'expect label == prediction, got %d vs %d' % (
la.shape, merged_preds.shape)
chunkf1.update((merged_preds, la, np.array(len(la))))
#f1 = f1_score(np.concatenate(all_label), np.concatenate(all_pred), average='macro')
f1 = chunkf1.eval()
model.train()
return f1
model = ErnieModelForTokenClassification.from_pretrained(
args.from_pretrained,
num_labels=len(feature_map),
name='',
has_pooler=False)
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
param_name_to_exclude_from_weight_decay = re.compile(
r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
lr_scheduler = P.optimizer.lr.LambdaDecay(
args.lr,
get_warmup_and_linear_decay(args.max_steps,
int(args.warmup_proportion * args.max_steps)))
opt = P.optimizer.AdamW(
lr_scheduler,
parameters=model.parameters(),
weight_decay=args.wd,
apply_decay_param_fun=lambda n: not param_name_to_exclude_from_weight_decay.match(n),
grad_clip=g_clip)
scaler = P.amp.GradScaler(enable=args.use_amp)
with LogWriter(
logdir=str(create_if_not_exists(args.save_dir / 'vdl'))) as log_writer:
with P.amp.auto_cast(enable=args.use_amp):
for epoch in range(args.epoch):
for step, (
ids, sids, aligned_label, label, orig_pos
) in enumerate(P.io.DataLoader(
train_ds, batch_size=None)):
loss, logits = model(ids, sids, labels=aligned_label)
#loss, logits = model(ids, sids, labels=aligned_label, loss_weights=P.cast(ids != 0, 'float32'))
loss = scaler.scale(loss)
loss.backward()
scaler.minimize(opt, loss)
model.clear_gradients()
lr_scheduler.step()
if step % 10 == 0:
_lr = lr_scheduler.get_lr()
if args.use_amp:
_l = (loss / scaler._scale).numpy()
msg = '[step-%d] train loss %.5f lr %.3e scaling %.3e' % (
step, _l, _lr, scaler._scale.numpy())
else:
_l = loss.numpy()
msg = '[step-%d] train loss %.5f lr %.3e' % (step, _l,
_lr)
log.debug(msg)
log_writer.add_scalar('loss', _l, step=step)
log_writer.add_scalar('lr', _lr, step=step)
if step % 100 == 0:
f1 = evaluate(model, dev_ds)
log.debug('eval f1: %.5f' % f1)
log_writer.add_scalar('eval/f1', f1, step=step)
if args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
f1 = evaluate(model, dev_ds)
log.debug('final eval f1: %.5f' % f1)
log_writer.add_scalar('eval/f1', f1, step=step)
if args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import re
import time
import logging
import json
from random import random
from tqdm import tqdm
from functools import reduce, partial
from pathlib import Path
from visualdl import LogWriter
import numpy as np
import argparse
import paddle as P
from propeller import log
import propeller.paddle as propeller
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
#from model.bert import BertConfig, BertModelLayer
from ernie.modeling_ernie import ErnieModel, ErnieModelForSequenceClassification
from ernie.tokenizing_ernie import ErnieTokenizer, ErnieTinyTokenizer
#from ernie.optimization import AdamW, LinearDecay
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
parser = argparse.ArgumentParser('classify model with ERNIE')
parser.add_argument(
'--from_pretrained',
type=Path,
required=True,
help='pretrained model directory or tag')
parser.add_argument(
'--max_seqlen',
type=int,
default=128,
help='max sentence length, should not be greater than 512')
parser.add_argument('--bsz', type=int, default=32, help='batchsize')
parser.add_argument('--epoch', type=int, default=3, help='epoch')
parser.add_argument(
'--data_dir',
type=str,
required=True,
help='data directory includes train / develop data')
parser.add_argument(
'--max_steps',
type=int,
required=True,
help='max_train_steps, set this to EPOCH * NUM_SAMPLES / BATCH_SIZE')
parser.add_argument('--warmup_proportion', type=float, default=0.1)
parser.add_argument('--lr', type=float, default=5e-5, help='learning rate')
parser.add_argument('--eval', action='store_true')
parser.add_argument(
'--save_dir', type=Path, required=True, help='model output directory')
parser.add_argument(
'--init_checkpoint',
type=str,
default=None,
help='checkpoint to warm start from')
parser.add_argument(
'--wd', type=float, default=0.01, help='weight decay, aka L2 regularizer')
parser.add_argument(
'--use_amp',
action='store_true',
help='only activate AMP (auto mixed precision acceleration) on TensorCore compatible devices'
)
args = parser.parse_args()
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
#tokenizer = ErnieTinyTokenizer.from_pretrained(args.from_pretrained)
model = ErnieModelForSequenceClassification.from_pretrained(
args.from_pretrained, num_labels=3, name='')
if not args.eval:
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'seg_a',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
propeller.data.LabelColumn('label'),
])
def map_fn(seg_a, label):
seg_a, _ = tokenizer.truncate(seg_a, [], seqlen=args.max_seqlen)
sentence, segments = tokenizer.build_for_ernie(seg_a, [])
return sentence, segments, label
train_ds = feature_column.build_dataset('train', data_dir=os.path.join(args.data_dir, 'train'), shuffle=True, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(args.bsz)
dev_ds = feature_column.build_dataset('dev', data_dir=os.path.join(args.data_dir, 'dev'), shuffle=False, repeat=False, use_gz=False) \
.map(map_fn) \
.padded_batch(args.bsz)
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
lr_scheduler = P.optimizer.lr.LambdaDecay(
args.lr,
get_warmup_and_linear_decay(
args.max_steps, int(args.warmup_proportion * args.max_steps)))
param_name_to_exclude_from_weight_decay = re.compile(
r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
opt = P.optimizer.AdamW(
lr_scheduler,
parameters=model.parameters(),
weight_decay=args.wd,
apply_decay_param_fun=lambda n: not param_name_to_exclude_from_weight_decay.match(n),
grad_clip=g_clip)
scaler = P.amp.GradScaler(enable=args.use_amp)
with LogWriter(logdir=str(create_if_not_exists(args.save_dir /
'vdl'))) as log_writer:
with P.amp.auto_cast(enable=args.use_amp):
for epoch in range(args.epoch):
for step, d in enumerate(
P.io.DataLoader(
train_ds, places=P.CUDAPlace(0), batch_size=None)):
ids, sids, label = d
loss, _ = model(ids, sids, labels=label)
loss = scaler.scale(loss)
loss.backward()
scaler.minimize(opt, loss)
model.clear_gradients()
lr_scheduler.step()
if step % 10 == 0:
_lr = lr_scheduler.get_lr()
if args.use_amp:
_l = (loss / scaler._scale).numpy()
msg = '[step-%d] train loss %.5f lr %.3e scaling %.3e' % (
step, _l, _lr, scaler._scale.numpy())
else:
_l = loss.numpy()
msg = '[step-%d] train loss %.5f lr %.3e' % (
step, _l, _lr)
log.debug(msg)
log_writer.add_scalar('loss', _l, step=step)
log_writer.add_scalar('lr', _lr, step=step)
if step % 100 == 0:
acc = []
with P.no_grad():
model.eval()
for d in P.io.DataLoader(
dev_ds,
places=P.CUDAPlace(0),
batch_size=None):
ids, sids, label = d
loss, logits = model(ids, sids, labels=label)
a = (logits.argmax(-1) == label)
acc.append(a.numpy())
model.train()
acc = np.concatenate(acc).mean()
log_writer.add_scalar('eval/acc', acc, step=step)
log.debug('acc %.5f' % acc)
if args.save_dir is not None:
P.save(model.state_dict(),
str(args.save_dir / 'ckpt.bin'))
if args.save_dir is not None:
P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
else:
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'seg_a',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
])
sd = P.load(str(args.init_checkpoint))
model.set_dict(sd)
model.eval()
def map_fn(seg_a):
seg_a, _ = tokenizer.truncate(seg_a, [], seqlen=args.max_seqlen)
sentence, segments = tokenizer.build_for_ernie(seg_a, [])
return sentence, segments
predict_ds = feature_column.build_dataset_from_stdin('predict') \
.map(map_fn) \
.padded_batch(args.bsz)
for step, (ids, sids) in enumerate(
P.io.DataLoader(
predict_ds, places=P.CUDAPlace(0), batch_size=None)):
_, logits = model(ids, sids)
pred = logits.numpy().argmax(-1)
print('\n'.join(map(str, pred.tolist())))
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import sys
import argparse
import logging
from functools import partial
from io import open
open = partial(open, encoding='utf-8')
import json
from collections import namedtuple
log = logging.getLogger(__name__)
Example = namedtuple('Example', [
'qas_id', 'question_text', 'doc_tokens', 'orig_answer_text',
'start_position', 'end_position'
])
Feature = namedtuple("Feature", [
"unique_id", "example_index", "doc_span_index", "tokens",
"token_to_orig_map", "token_is_max_context", "token_ids", "position_ids",
"text_type_ids", "start_position", "end_position"
])
def _tokenize_chinese_chars(text):
"""Adds whitespace around any CJK character."""
def _is_chinese_char(cp):
"""Checks whether CP is the codepoint of a CJK character."""
# This defines a "chinese character" as anything in the CJK Unicode block:
# https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
#
# Note that the CJK Unicode block is NOT all Japanese and Korean characters,
# despite its name. The modern Korean Hangul alphabet is a different block,
# as is Japanese Hiragana and Katakana. Those alphabets are used to write
# space-separated words, so they are not treated specially and handled
# like all of the other languages.
if ((cp >= 0x4E00 and cp <= 0x9FFF) or #
(cp >= 0x3400 and cp <= 0x4DBF) or #
(cp >= 0x20000 and cp <= 0x2A6DF) or #
(cp >= 0x2A700 and cp <= 0x2B73F) or #
(cp >= 0x2B740 and cp <= 0x2B81F) or #
(cp >= 0x2B820 and cp <= 0x2CEAF) or
(cp >= 0xF900 and cp <= 0xFAFF) or #
(cp >= 0x2F800 and cp <= 0x2FA1F)): #
return True
return False
output = []
buff = ""
for char in text:
cp = ord(char)
if _is_chinese_char(cp):
if buff != "":
output.append(buff)
buff = ""
output.append(char)
else:
buff += char
if buff != "":
output.append(buff)
return output
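# Example: _tokenize_chinese_chars('ERNIE是模型') -> ['ERNIE', '是', '模', '型'];
# runs of non-CJK characters stay grouped while each CJK character becomes its
# own piece.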
def _check_is_max_context(doc_spans, cur_span_index, position):
"""chech is max context"""
best_score = None
best_span_index = None
for (span_index, doc_span) in enumerate(doc_spans):
end = doc_span.start + doc_span.length - 1
if position < doc_span.start:
continue
if position > end:
continue
num_left_context = position - doc_span.start
num_right_context = end - position
score = min(num_left_context,
num_right_context) + 0.01 * doc_span.length
if best_score is None or score > best_score:
best_score = score
best_span_index = span_index
return cur_span_index == best_span_index
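# A token that falls into several overlapping doc spans is scored per span by
# min(left context, right context) + 0.01 * span length, and only the
# best-scoring span "owns" it. E.g. with spans [0, 128) and [64, 192), token
# 100 has left/right context 100/27 in the first span but 36/91 in the second,
# so the second span wins.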
def _improve_answer_span(doc_tokens, input_start, input_end, tokenizer,
orig_answer_text):
"""improve answer span"""
tok_answer_text = " ".join(tokenizer.tokenize(orig_answer_text))
for new_start in range(input_start, input_end + 1):
for new_end in range(input_end, new_start - 1, -1):
text_span = " ".join(doc_tokens[new_start:(new_end + 1)])
if text_span == tok_answer_text:
return (new_start, new_end)
return (input_start, input_end)
def read_files(input_file, is_training):
"""read file"""
examples = []
with open(input_file, "r") as f:
input_data = json.load(f)["data"]
for entry in input_data:
for paragraph in entry["paragraphs"]:
paragraph_text = paragraph["context"]
for qa in paragraph["qas"]:
qas_id = qa["id"]
question_text = qa["question"]
start_pos = None
end_pos = None
orig_answer_text = None
if is_training:
if len(qa["answers"]) != 1:
raise ValueError(
"For training, each question should have exactly 1 answer."
)
answer = qa["answers"][0]
orig_answer_text = answer["text"]
answer_offset = answer["answer_start"]
answer_length = len(orig_answer_text)
doc_tokens = [
paragraph_text[:answer_offset], paragraph_text[
answer_offset:answer_offset + answer_length],
paragraph_text[answer_offset + answer_length:]
]
start_pos = 1
end_pos = 1
actual_text = " ".join(doc_tokens[start_pos:(end_pos +
1)])
if actual_text.find(orig_answer_text) == -1:
log.info("Could not find answer: '%s' vs. '%s'",
actual_text, orig_answer_text)
continue
else:
doc_tokens = _tokenize_chinese_chars(paragraph_text)
example = Example(
qas_id=qas_id,
question_text=question_text,
doc_tokens=doc_tokens,
orig_answer_text=orig_answer_text,
start_position=start_pos,
end_position=end_pos)
examples.append(example)
return examples
def convert_example_to_features(examples,
max_seq_length,
tokenizer,
is_training,
doc_stride=128,
max_query_length=64):
"""convert example to feature"""
features = []
unique_id = 1000000000
for (example_index, example) in enumerate(examples):
query_tokens = tokenizer.tokenize(example.question_text)
if len(query_tokens) > max_query_length:
query_tokens = query_tokens[0:max_query_length]
tok_to_orig_index = []
orig_to_tok_index = []
all_doc_tokens = []
for (i, token) in enumerate(example.doc_tokens):
orig_to_tok_index.append(len(all_doc_tokens))
sub_tokens = tokenizer.tokenize(token)
for sub_token in sub_tokens:
tok_to_orig_index.append(i)
all_doc_tokens.append(sub_token)
#log.info(orig_to_tok_index, example.start_position)
tok_start_position = None
tok_end_position = None
if is_training:
tok_start_position = orig_to_tok_index[example.start_position]
if example.end_position < len(example.doc_tokens) - 1:
tok_end_position = orig_to_tok_index[example.end_position +
1] - 1
else:
tok_end_position = len(all_doc_tokens) - 1
(tok_start_position, tok_end_position) = _improve_answer_span(
all_doc_tokens, tok_start_position, tok_end_position,
tokenizer, example.orig_answer_text)
max_tokens_for_doc = max_seq_length - len(query_tokens) - 3
_DocSpan = namedtuple("DocSpan", ["start", "length"])
doc_spans = []
start_offset = 0
while start_offset < len(all_doc_tokens):
length = len(all_doc_tokens) - start_offset
if length > max_tokens_for_doc:
length = max_tokens_for_doc
doc_spans.append(_DocSpan(start=start_offset, length=length))
if start_offset + length == len(all_doc_tokens):
break
start_offset += min(length, doc_stride)
for (doc_span_index, doc_span) in enumerate(doc_spans):
tokens = []
token_to_orig_map = {}
token_is_max_context = {}
text_type_ids = []
tokens.append("[CLS]")
text_type_ids.append(0)
for token in query_tokens:
tokens.append(token)
text_type_ids.append(0)
tokens.append("[SEP]")
text_type_ids.append(0)
for i in range(doc_span.length):
split_token_index = doc_span.start + i
token_to_orig_map[len(tokens)] = tok_to_orig_index[
split_token_index]
is_max_context = _check_is_max_context(
doc_spans, doc_span_index, split_token_index)
token_is_max_context[len(tokens)] = is_max_context
tokens.append(all_doc_tokens[split_token_index])
text_type_ids.append(1)
tokens.append("[SEP]")
text_type_ids.append(1)
token_ids = tokenizer.convert_tokens_to_ids(tokens)
position_ids = list(range(len(token_ids)))
start_position = None
end_position = None
if is_training:
doc_start = doc_span.start
doc_end = doc_span.start + doc_span.length - 1
out_of_span = False
if not (tok_start_position >= doc_start and
tok_end_position <= doc_end):
out_of_span = True
if out_of_span:
start_position = 0
end_position = 0
else:
doc_offset = len(query_tokens) + 2
start_position = tok_start_position - doc_start + doc_offset
end_position = tok_end_position - doc_start + doc_offset
feature = Feature(
unique_id=unique_id,
example_index=example_index,
doc_span_index=doc_span_index,
tokens=tokens,
token_to_orig_map=token_to_orig_map,
token_is_max_context=token_is_max_context,
token_ids=token_ids,
position_ids=position_ids,
text_type_ids=text_type_ids,
start_position=start_position,
end_position=end_position)
features.append(feature)
unique_id += 1
return features
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='main')
parser.add_argument("--input", type=str, default=None)
args = parser.parse_args()
from ernie.tokenizing_ernie import ErnieTokenizer
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
examples = read_files(args.input, True)
features = convert_example_to_features(examples, 512, tokenizer, True)
log.debug(len(examples))
log.debug(len(features))
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import logging
import re
import numpy as np
import paddle as P
import paddle.distributed.fleet as fleet
from propeller.paddle.train.hooks import RunHook
log = logging.getLogger(__name__)
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
def optimization(
loss,
warmup_steps,
num_train_steps,
learning_rate,
train_program,
startup_prog,
weight_decay,
scheduler='linear_warmup_decay',
use_fp16=False, ):
"""do backword for static"""
def exclude_from_weight_decay(param):
name = re.sub(r'\.master$', '', param)  # strip the '.master' suffix (str.rstrip strips characters, not a suffix)
if name.find("layer_norm") > -1:
return True
bias_suffix = ["_bias", "_b", ".b_0"]
for suffix in bias_suffix:
if name.endswith(suffix):
return True
return False
g_clip = P.nn.ClipGradByGlobalNorm(1.0)
lr_scheduler = P.optimizer.lr.LambdaDecay(
learning_rate,
get_warmup_and_linear_decay(num_train_steps, warmup_steps))
optimizer = P.optimizer.AdamW(
learning_rate=lr_scheduler,
weight_decay=weight_decay,
grad_clip=g_clip,
apply_decay_param_fun=lambda n: not exclude_from_weight_decay(n))
if use_fp16:
log.info('AMP activated')
if weight_decay > 0.:
raise ValueError(
'paddle amp will ignore `weight_decay`, see https://github.com/PaddlePaddle/Paddle/issues/29794'
)
#amp_list = P.fluid.contrib.mixed_precision.AutoMixedPrecisionLists(
# custom_white_list=['softmax', 'layer_norm', 'gelu'])
optimizer = P.fluid.contrib.mixed_precision.decorate(
optimizer, init_loss_scaling=2**15, use_dynamic_loss_scaling=True)
_, param_grads = optimizer.minimize(loss)
loss_scaling = P.static.default_main_program().global_block().var(
'loss_scaling_0')
else:
_, param_grads = optimizer.minimize(loss)
loss_scaling = None
class LRStepHook(RunHook):
def after_run(self, _, __):
lr_scheduler.step()
log.debug('lr step: %.5f' % lr_scheduler.get_lr())
return LRStepHook(), loss_scaling
# Distributed Pretrain
Only the **mask word** strategy from [Ernie1.0](https://arxiv.org/pdf/1904.09223.pdf) is illustrated in this section.
1. Make pretrain data
We use documents from multiple data sources (e.g. Wikipedia) for pretraining.
Input text should be segmented with spaces (even in Chinese; this segmentation is used for *mask word*).
Each line corresponds to a *sentence*.
An empty line indicates the end of a document.
Example:
> 数学 是 利用 符号语言 研究 数量 、 结构 、 变化 以及 空间 等 概念 的 一门 学科 , 从 某种 角度看 属于 形式 科学 的 一种 。
> 数学 透过 抽象化 和 逻辑推理 的 使用 , 由 计数 、 计算 、 量度 和 对 物体 形状 及 运动 的 观察 而 产生 。
> 数学家 们 拓展 这些 概念 , 为了 公式化 新 的 猜想 以及 从 选定 的 公理 及 定义 中 建立 起 严谨 推导 出 的 定理 。
> 基础 数学 的 知识 与 运用 总是 个人 与 团体 生活 中 不可或缺 的 一环 。
> 对 数学 基本概念 的 完善 , 早 在 古埃及 、 美索不达米亚 及 古印度 内 的 古代 数学 文本 便 可观 见 , 而 在 古希腊 那里 有 更为 严谨 的 处理 。
> 从 那时 开始 , 数学 的 发展 便 持续 不断 地 小幅 进展 , 至 16 世纪 的 文艺复兴 时期 , 因为 新 的 科学 发现 和 数学 革新 两者 的 交互 , 致使 数学 的 加速 发展 , 直至 今日 。
>
> 云外镜 ( ) 是 一种 能 反映 遥远 地方 影像 的 镜子 , 就 好比 现在 的 电视 , 或是 吉卜赛人 占卜 用 的 水晶球 一样 。
> 它 属于 付丧神 的 一种 , 是 镜子 历经 百年 后 幻化 而成 的 妖怪 , 又名 镜 妖 。
> 也 有人 说云 外镜 是 狸 妖 幻化 而成 的 , 当狸 妖 的 肚子 胀大 , 像 电视 的 映像管 一样 发光 时 , 就 可以 自由 地 显现出 远方 的 情景 。
> 著名 的 妖怪 绘师 鸟 山石 燕曾 记载 云外镜 经常 容易 跟 照妖镜 搞混 , 因为 照妖镜 可以 映照 出 肉眼 看不见 的 妖怪 , 这点 与 云外 镜会 映照 出 怪异 的 脸孔 是 有些 相似 。
> 据说 在 阴历 八月 十五日 的 夜晚 , 在 水晶 盆内 注满 水 , 将 镜子 平 放在 水面 , 若 是 映照 出 妖怪 的 模样 , 就 表示 这 面 镜子 里 住 著 妖怪 。
Make the pretrain data with:
```script
python3 ./demo/pretrain/make_pretrain_data.py input_file output_file.gz --vocab /path/to/ernie1.0/vocab.txt
```
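Each record in the resulting `output_file.gz` is a length-prefixed, serialized protobuf `SequenceExample` (see `write_gz` in `make_pretrain_data.py`). As a minimal sketch, assuming the `struct.pack('i%ds')` layout used there, the records can be read back like this (`read_records` is our name, not part of the repo):
```python
import gzip
import struct

def read_records(path):
    """Yield serialized SequenceExample byte strings from a length-prefixed .gz file."""
    with gzip.open(path, 'rb') as f:
        while True:
            head = f.read(4)  # 4-byte record length, struct format 'i'
            if len(head) < 4:
                break
            length, = struct.unpack('i', head)
            yield f.read(length)  # raw protobuf payload
```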
2. run distributed pretrain
```script
python3 -m paddle.distributed.launch \
./demo/pretrain/pretrain_dygraph.py \
--data_dir "data/*.gz" \
--from_pretrained /path/to/ernie1.0_pretrain_dir/
```
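Note: in NCCL distributed mode the pretrain script shards the input `.gz` files across workers, so supply at least as many `.gz` files as there are GPUs; with fewer files it aborts with a "not enough train file to shard" error.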
import sys
import argparse
import struct
import random as r
import re
import gzip
import logging
from itertools import accumulate
from functools import reduce, partial, wraps
from propeller import log
from propeller.paddle.data import feature_pb2, example_pb2
#from data_util import RawtextColumn
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
def gen_segs(segment_piece):
if len(segment_piece) == 0:
return []
else:
return [min(segment_piece)] * len(segment_piece)
white_space_pat = re.compile(r'\S+')  # matches runs of non-whitespace, i.e. space-separated tokens
def segment(inputs, inputs_segment):
    ret = [r.span() for r in white_space_pat.finditer(inputs)]
    ret = [(inputs[s:e], gen_segs(inputs_segment[s:e])) for (s, e) in ret]
    return ret
def tokenize(sen, seg_info):
"""
char tokenizer (wordpiece english)
normed txt(space seperated or not) => list of word-piece
"""
sen = sen.lower()
res_word, res_segments = [], []
for match in pat.finditer(sen):
words, pos = _wordpiece(
match.group(0), vocab=vocab_set, unk_token='[UNK]')
start_of_word = match.span()[0]
for w, p in zip(words, pos):
res_word.append(w)
res_segments.append(
gen_segs(seg_info[p[0] + start_of_word:p[1] + start_of_word]))
return res_word, res_segments
def parse_txt(line):
if len(line) == 0:
return []
line = line.decode('utf8')
ret_line, ret_seginfo = [], []
for l, i in segment(line, list(range(len(line)))):
for ll, ii in zip(*tokenize(l, i)):
ret_line.append(ll)
ret_seginfo.append(ii)
if args.check and r.random() < 0.005:
print('****', file=sys.stderr)
print(line, file=sys.stderr)
print('|'.join(ret_line), file=sys.stderr)
print(ret_seginfo, file=sys.stderr)
print('****', file=sys.stderr)
ret_line = [vocab.get(r, vocab['[UNK]']) for r in ret_line]
ret_seginfo = [[-1] if i == [] else i
for i in ret_seginfo] #for sentence piece only
ret_seginfo = [min(i) for i in ret_seginfo]
return ret_line, ret_seginfo
def build_example(slots):
txt, seginfo = slots
txt_fe_list = feature_pb2.FeatureList(feature=[
feature_pb2.Feature(int64_list=feature_pb2.Int64List(value=t))
for t in txt
])
segsinfo_fe_list = feature_pb2.FeatureList(feature=[
feature_pb2.Feature(int64_list=feature_pb2.Int64List(value=s))
for s in seginfo
])
assert len(txt_fe_list.feature) == len(
segsinfo_fe_list.feature), 'txt[%d] and seginfo[%d] size not match' % (
len(txt_fe_list.feature), len(segsinfo_fe_list.feature))
features = {
'txt': txt_fe_list,
'segs': segsinfo_fe_list,
}
ex = example_pb2.SequenceExample(feature_lists=feature_pb2.FeatureLists(
feature_list=features))
return ex
def write_gz(serialized, to_file):
l = len(serialized)
packed_data = struct.pack('i%ds' % l, l, serialized)
to_file.write(packed_data)
def build_bb(from_file, to_file):
slots = []
for i, line in enumerate(from_file):
line = line.strip()
if args.verbose and i % 10000 == 0:
log.debug(i)
if len(line) == 0:
if len(slots) != 0:
transposed_slots = list(zip(*slots))
ex = build_example(transposed_slots)
write_gz(ex.SerializeToString(), to_file)
slots = []
continue
parsed_line = parse_txt(line)
slots.append(parsed_line)
if len(slots) != 0:
transposed_slots = list(zip(*slots))
ex = build_example(transposed_slots)
write_gz(ex.SerializeToString(), to_file)
slots = []
if __name__ == '__main__':
parser = argparse.ArgumentParser('Pretrain Data Maker')
parser.add_argument('src', type=str)
parser.add_argument('tgt', type=str)
parser.add_argument('--vocab', type=str, required=True)
parser.add_argument('-v', '--verbose', action='store_true')
parser.add_argument('-c', '--check', action='store_true')
args = parser.parse_args()
log.setLevel(logging.DEBUG)
from ernie.tokenizing_ernie import _wordpiece
pat = re.compile(r'([a-zA-Z0-9]+|\S)')
vocab = {
j.strip().split(b'\t')[0].decode('utf8'): i
for i, j in enumerate(open(args.vocab, 'rb'))
}
vocab_set = set(vocab.keys())
    with open(args.src, 'rb') as from_file, gzip.open(args.tgt, 'wb') as to_file:
        log.info('making gz from bb %s ==> %s' % (args.src, args.tgt))
        build_bb(from_file, to_file)
    log.info('done: %s' % args.tgt)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from __future__ import absolute_import
from __future__ import unicode_literals
import sys
import io
import os
import time
import numpy as np
import re
import logging
import six
from glob import glob
from pathlib import Path
from functools import reduce, partial
import itertools
import paddle as P
import sentencepiece as spm
import json
from tqdm import tqdm
import random as r
from ernie.modeling_ernie import ErnieModelForPretraining
from ernie.tokenizing_ernie import ErnieTokenizer
#from ernie.optimization import AdamW, LinearDecay
import propeller as propeller_base
import propeller.paddle as propeller
from propeller.paddle.data import Dataset
from propeller import log
from demo.utils import create_if_not_exists, get_warmup_and_linear_decay
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
if six.PY3:
from itertools import accumulate
else:
import operator
def accumulate(iterable, func=operator.add, initial=None):
'Return running totals'
# accumulate([1,2,3,4,5]) --> 1 3 6 10 15
# accumulate([1,2,3,4,5], initial=100) --> 100 101 103 106 110 115
# accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
it = iter(iterable)
total = initial
if initial is None:
try:
total = next(it)
except StopIteration:
return
yield total
for element in it:
total = func(total, element)
yield total
def truncate_sentence(seq, from_length, to_length):
random_begin = np.random.randint(
0, np.maximum(0, from_length - to_length) + 1)
return seq[random_begin:random_begin + to_length]
def build_pair(seg_a, seg_b, max_seqlen, vocab):
#log.debug('pair %s \n %s' % (seg_a, seg_b))
cls_id = vocab['[CLS]']
sep_id = vocab['[SEP]']
a_len = len(seg_a)
b_len = len(seg_b)
ml = max_seqlen - 3
half_ml = ml // 2
if a_len > b_len:
a_len_truncated, b_len_truncated = np.maximum(
half_ml, ml - b_len), np.minimum(half_ml, b_len)
else:
a_len_truncated, b_len_truncated = np.minimum(
half_ml, a_len), np.maximum(half_ml, ml - a_len)
seg_a = truncate_sentence(seg_a, a_len, a_len_truncated)
seg_b = truncate_sentence(seg_b, b_len, b_len_truncated)
seg_a_txt, seg_a_info = seg_a[:, 0], seg_a[:, 1]
seg_b_txt, seg_b_info = seg_b[:, 0], seg_b[:, 1]
    token_type_a = np.zeros_like(seg_a_txt, dtype=np.int64)
    token_type_b = np.ones_like(seg_b_txt, dtype=np.int64)
sen_emb = np.concatenate(
[[cls_id], seg_a_txt, [sep_id], seg_b_txt, [sep_id]], 0)
info_emb = np.concatenate([[-1], seg_a_info, [-1], seg_b_info, [-1]], 0)
token_type_emb = np.concatenate(
[[0], token_type_a, [0], token_type_b, [1]], 0)
return sen_emb, info_emb, token_type_emb
def apply_mask(sentence, seg_info, mask_rate, vocab_size, vocab):
pad_id = vocab['[PAD]']
mask_id = vocab['[MASK]']
shape = sentence.shape
batch_size, seqlen = shape
invalid_pos = np.where(seg_info == -1)
    seg_info += 1  # shift by one so there are no more -1s
seg_info_flatten = seg_info.reshape([-1])
seg_info_incr = seg_info_flatten - np.roll(seg_info_flatten, shift=1)
seg_info = np.add.accumulate(
np.array([0 if s == 0 else 1 for s in seg_info_incr])).reshape(shape)
seg_info[invalid_pos] = -1
u_seginfo = np.array([i for i in np.unique(seg_info) if i != -1])
np.random.shuffle(u_seginfo)
sample_num = max(1, int(len(u_seginfo) * mask_rate))
u_seginfo = u_seginfo[:sample_num]
mask = reduce(np.logical_or, [seg_info == i for i in u_seginfo])
mask[:, 0] = False # ignore CLS head
rand = np.random.rand(*shape)
    choose_original = rand < 0.1  # 10%: keep the original token
    choose_random_id = (0.1 < rand) & (rand < 0.2)  # 10%: replace with a random token
    choose_mask_id = 0.2 < rand  # 80%: replace with [MASK]
random_id = np.random.randint(1, vocab_size, size=shape)
replace_id = mask_id * choose_mask_id + \
random_id * choose_random_id + \
sentence * choose_original
mask_pos = np.where(mask)
#mask_pos_flatten = list(map(lambda idx: idx[0] * seqlen + idx[1], zip(*mask_pos))) #transpose
mask_label = sentence[mask_pos]
sentence[mask_pos] = replace_id[mask_pos] #overwrite
#log.debug(mask_pos_flatten)
return sentence, np.stack(mask_pos, -1), mask_label
def make_pretrain_dataset(name, dir, vocab, args):
gz_files = glob(dir)
if not gz_files:
        raise ValueError('train data not found in %s' % dir)
log.info('read from %s' % '\n'.join(gz_files))
max_input_seqlen = args.max_seqlen
max_pretrain_seqlen = lambda: max_input_seqlen if r.random() > 0.15 else r.randint(1, max_input_seqlen) # short sentence rate
def _parse_gz(record_str): # function that takes python_str as input
ex = propeller_base.data.example_pb2.SequenceExample()
ex.ParseFromString(record_str)
doc = [
np.array(
f.int64_list.value, dtype=np.int64)
for f in ex.feature_lists.feature_list['txt'].feature
]
doc_seg = [
np.array(
f.int64_list.value, dtype=np.int64)
for f in ex.feature_lists.feature_list['segs'].feature
]
return doc, doc_seg
def bb_to_segments(filename):
ds = Dataset.from_record_file(filename).map(_parse_gz)
def gen():
buf, size = [], 0
iterator = iter(ds)
while 1:
doc, doc_seg = next(iterator)
for line, line_seg in zip(doc, doc_seg):
#line = np.array(sp_model.SampleEncodeAsIds(line, -1, 0.1), dtype=np.int64) # 0.1 means large variance on sentence piece result
if len(line) == 0:
continue
                    line = np.array(line)
line_seg = np.array(line_seg)
size += len(line)
buf.append(np.stack([line, line_seg]).transpose())
if size > max_input_seqlen:
yield buf,
buf, size = [], 0
if len(buf) != 0:
yield buf,
buf, size = [], 0
return Dataset.from_generator_func(gen)
def sample_negative(dataset):
def gen():
iterator = iter(dataset)
while True:
chunk_a, = next(iterator)
#chunk_b, = next(iterator)
seqlen = max_pretrain_seqlen()
seqlen_a = r.randint(1, seqlen)
seqlen_b = seqlen - seqlen_a
len_a = list(accumulate([len(c) for c in chunk_a]))
buf_a = [c for c, l in zip(chunk_a, len_a)
if l < seqlen_a] #always take the first one
buf_b = [
c for c, l in zip(chunk_a, len_a) if seqlen_a <= l < seqlen
]
if r.random() < 0.5: #pos or neg
label = np.int64(1)
else:
label = np.int64(0)
buf_a, buf_b = buf_b, buf_a
if not (len(buf_a) and len(buf_b)):
continue
a = np.concatenate(buf_a)
b = np.concatenate(buf_b)
#log.debug(a)
#log.debug(b)
sample, seg_info, token_type = build_pair(
a, b, args.max_seqlen,
vocab) #negative sample might exceed max seqlen
yield sample, seg_info, token_type, label
ds = propeller.data.Dataset.from_generator_func(gen)
return ds
def after(sentence, seg_info, segments, label):
batch_size, seqlen = sentence.shape
sentence, mask_pos, mlm_label = apply_mask(sentence, seg_info,
args.mask_rate,
len(vocab), vocab)
ra = r.random()
if ra < args.check:
print('***')
print('\n'.join([
str(j) + '\t' + '|'.join(map(str, i))
for i, j in zip(sentence.tolist(), label)
]))
print('***')
print('\n'.join(
['|'.join(map(str, i)) for i in seg_info.tolist()]))
print('***')
print('|'.join(map(str, mlm_label.tolist())))
print('***')
return sentence, segments, mlm_label, mask_pos, label
# pretrain pipeline
dataset = Dataset.from_list(gz_files)
if propeller.train.distribution.status.mode == propeller.train.distribution.DistributionMode.NCCL:
log.info('Apply sharding in distribution env')
if len(gz_files) < propeller.train.distribution.status.num_replica:
raise ValueError(
'not enough train file to shard: # of train files: %d, # of workers %d'
% (len(gz_files),
propeller.train.distribution.status.num_replica))
dataset = dataset.shard(env.nranks, env.dev_id)
dataset = dataset.repeat().shuffle(buffer_size=len(gz_files))
dataset = dataset.interleave(
map_fn=bb_to_segments, cycle_length=len(gz_files), block_length=1)
dataset = dataset.shuffle(
buffer_size=1000) #must shuffle to ensure negative sample randomness
dataset = sample_negative(dataset)
dataset = dataset.padded_batch(args.bsz, (0, 0, 0, 0)).map(after)
dataset.name = name
return dataset
if __name__ == '__main__':
if six.PY3:
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
parser = propeller.ArgumentParser('DAN model with Paddle')
parser.add_argument(
'--max_seqlen',
type=int,
default=256,
        help='max sequence length; sentences from pretrain data are packed up to this length'
)
parser.add_argument(
'--data_dir',
type=str,
required=True,
help='protobuf pretrain data directory')
parser.add_argument(
'--mask_rate',
type=float,
default=0.15,
        help='probability of an input token to be masked')
parser.add_argument(
'--check', type=float, default=0., help='probability of debug info')
parser.add_argument(
'--warmup_steps', type=int, default=10000, help='warmups steps')
parser.add_argument(
        '--max_steps', type=int, default=1000000, help='max pretrain steps')
parser.add_argument('--lr', type=float, default=1e-4, help='learning_rate')
parser.add_argument(
'--from_pretrained',
type=Path,
required=True,
        help='pretrained model dir')
parser.add_argument(
'--save_dir', type=Path, required=True, help='model output_dir')
parser.add_argument(
'--wd',
type=float,
default=0.01,
help='weight decay, aka L2 regularizer')
parser.add_argument('--bsz', type=int, default=50)
parser.add_argument(
'--use_amp',
action='store_true',
        help='only activate AMP (auto mixed precision acceleration) on TensorCore-compatible devices'
)
args = parser.parse_args()
P.distributed.init_parallel_env()
env = P.distributed.ParallelEnv()
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
train_ds = make_pretrain_dataset(
'train', args.data_dir, vocab=tokenizer.vocab, args=args)
model = ErnieModelForPretraining.from_pretrained(args.from_pretrained)
    param_name_to_exclude_from_weight_decay = re.compile(
        r'.*layer_norm_scale|.*layer_norm_bias|.*b_0')
lr_scheduler = P.optimizer.lr.LambdaDecay(
args.lr,
get_warmup_and_linear_decay(args.max_steps, args.warmup_steps))
g_clip = P.nn.ClipGradByGlobalNorm(1.0) #experimental
opt = P.optimizer.AdamW(
learning_rate=lr_scheduler,
parameters=model.parameters(),
        apply_decay_param_fun=lambda n: param_name_to_exclude_from_weight_decay.match(n),
weight_decay=args.wd,
grad_clip=g_clip)
model = P.DataParallel(model)
scaler = P.amp.GradScaler(enable=args.use_amp)
create_if_not_exists(args.save_dir)
with P.amp.auto_cast(args.use_amp):
for step, samples in enumerate(
P.io.DataLoader(
train_ds, places=P.CUDAPlace(env.dev_id), batch_size=0)):
(src_ids, sent_ids, mlm_label, mask_pos, nsp_label) = samples
loss, mlmloss, nsploss = model(
src_ids,
sent_ids,
labels=mlm_label,
mlm_pos=mask_pos,
nsp_labels=nsp_label)
loss = scaler.scale(loss)
loss.backward()
scaler.minimize(opt, loss)
model.clear_gradients()
lr_scheduler.step()
if step % 10 == 0:
_lr = lr_scheduler.get_lr()
if args.use_amp:
_l = (loss / scaler._scale).numpy()
msg = '[rank-%d][step-%d] train loss %.5f lr %.3e scaling %.3e' % (
env.dev_id, step, _l, _lr, scaler._scale.numpy())
else:
_l = loss.numpy()
msg = '[rank-%d][step-%d] train loss %.5f lr %.3e' % (
env.dev_id, step, _l, _lr)
log.debug(msg)
if step % 1000 == 0 and env.dev_id == 0:
                log.debug('saving...')
                P.save(model.state_dict(), str(args.save_dir / 'ckpt.bin'))
if step > args.max_steps:
break
log.info('done')
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from __future__ import absolute_import
from __future__ import unicode_literals
import sys
import io
import os
import time
import numpy as np
import re
import logging
import six
from glob import glob
from pathlib import Path
from functools import reduce, partial
import itertools
import paddle as P
import json
from tqdm import tqdm
import random as r
from ernie.modeling_ernie import ErnieModelForPretraining
from ernie.tokenizing_ernie import ErnieTokenizer
from demo.optimization import optimization
import propeller.paddle as propeller
import propeller as propeller_base
from propeller.paddle.data import Dataset
from propeller import log
log.setLevel(logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
if six.PY3:
from itertools import accumulate
else:
import operator
def accumulate(iterable, func=operator.add, initial=None):
'Return running totals'
# accumulate([1,2,3,4,5]) --> 1 3 6 10 15
# accumulate([1,2,3,4,5], initial=100) --> 100 101 103 106 110 115
# accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
it = iter(iterable)
total = initial
if initial is None:
try:
total = next(it)
except StopIteration:
return
yield total
for element in it:
total = func(total, element)
yield total
def ernie_pretrain_model_fn(features, mode, params, run_config):
"""propeller Model wraper for paddle-ERNIE """
src_ids, sent_ids, mlm_label, mask_pos, nsp_label = features
ernie = ErnieModelForPretraining(params, name='')
total_loss, mlm_loss, nsp_loss = ernie(
src_ids,
sent_ids,
labels=mlm_label,
mlm_pos=mask_pos,
nsp_labels=nsp_label)
metrics = None
inf_spec = None
propeller.summary.scalar('loss', total_loss)
propeller.summary.scalar('nsp-loss', nsp_loss)
propeller.summary.scalar('mlm-loss', mlm_loss)
lr_step_hook, loss_scale_coef = optimization(
loss=total_loss,
warmup_steps=params['warmup_steps'],
num_train_steps=run_config.max_steps,
learning_rate=params['learning_rate'],
train_program=P.static.default_main_program(),
startup_prog=P.static.default_startup_program(),
weight_decay=params['weight_decay'],
scheduler="linear_warmup_decay",
use_fp16=args.use_amp, )
scheduled_lr = P.static.default_main_program().global_block().var(
'learning_rate_0')
propeller.summary.scalar('lr', scheduled_lr)
if args.use_amp:
propeller.summary.scalar('loss_scaling', loss_scale_coef)
pred = [total_loss]
return propeller.ModelSpec(
loss=total_loss,
mode=mode,
metrics=metrics,
predictions=pred,
train_hooks=[lr_step_hook])
def truncate_sentence(seq, from_length, to_length):
random_begin = np.random.randint(
0, np.maximum(0, from_length - to_length) + 1)
return seq[random_begin:random_begin + to_length]
def build_pair(seg_a, seg_b, max_seqlen, vocab):
#log.debug('pair %s \n %s' % (seg_a, seg_b))
cls_id = vocab['[CLS]']
sep_id = vocab['[SEP]']
a_len = len(seg_a)
b_len = len(seg_b)
ml = max_seqlen - 3
half_ml = ml // 2
if a_len > b_len:
a_len_truncated, b_len_truncated = np.maximum(
half_ml, ml - b_len), np.minimum(half_ml, b_len)
else:
a_len_truncated, b_len_truncated = np.minimum(
half_ml, a_len), np.maximum(half_ml, ml - a_len)
seg_a = truncate_sentence(seg_a, a_len, a_len_truncated)
seg_b = truncate_sentence(seg_b, b_len, b_len_truncated)
seg_a_txt, seg_a_info = seg_a[:, 0], seg_a[:, 1]
seg_b_txt, seg_b_info = seg_b[:, 0], seg_b[:, 1]
    token_type_a = np.zeros_like(seg_a_txt, dtype=np.int64)
    token_type_b = np.ones_like(seg_b_txt, dtype=np.int64)
sen_emb = np.concatenate(
[[cls_id], seg_a_txt, [sep_id], seg_b_txt, [sep_id]], 0)
info_emb = np.concatenate([[-1], seg_a_info, [-1], seg_b_info, [-1]], 0)
token_type_emb = np.concatenate(
[[0], token_type_a, [0], token_type_b, [1]], 0)
return sen_emb, info_emb, token_type_emb
def apply_mask(sentence, seg_info, mask_rate, vocab_size, vocab):
pad_id = vocab['[PAD]']
mask_id = vocab['[MASK]']
shape = sentence.shape
batch_size, seqlen = shape
invalid_pos = np.where(seg_info == -1)
    seg_info += 1  # shift by one so there are no more -1s
seg_info_flatten = seg_info.reshape([-1])
seg_info_incr = seg_info_flatten - np.roll(seg_info_flatten, shift=1)
seg_info = np.add.accumulate(
np.array([0 if s == 0 else 1 for s in seg_info_incr])).reshape(shape)
seg_info[invalid_pos] = -1
u_seginfo = np.array([i for i in np.unique(seg_info) if i != -1])
np.random.shuffle(u_seginfo)
sample_num = max(1, int(len(u_seginfo) * mask_rate))
u_seginfo = u_seginfo[:sample_num]
mask = reduce(np.logical_or, [seg_info == i for i in u_seginfo])
mask[:, 0] = False # ignore CLS head
rand = np.random.rand(*shape)
    choose_original = rand < 0.1  # 10%: keep the original token
    choose_random_id = (0.1 < rand) & (rand < 0.2)  # 10%: replace with a random token
    choose_mask_id = 0.2 < rand  # 80%: replace with [MASK]
random_id = np.random.randint(1, vocab_size, size=shape)
replace_id = mask_id * choose_mask_id + \
random_id * choose_random_id + \
sentence * choose_original
mask_pos = np.where(mask)
#mask_pos_flatten = list(map(lambda idx: idx[0] * seqlen + idx[1], zip(*mask_pos))) #transpose
mask_label = sentence[mask_pos]
sentence[mask_pos] = replace_id[mask_pos] #overwrite
#log.debug(mask_pos_flatten)
return sentence, np.stack(mask_pos, -1), mask_label
def make_pretrain_dataset(name, dir, vocab, hparams, args):
gz_files = glob(dir)
if not gz_files:
raise ValueError('train data not found in %s' % dir)
log.info('read from %s' % '\n'.join(gz_files))
max_input_seqlen = args.max_seqlen
max_pretrain_seqlen = lambda: max_input_seqlen if r.random() > 0.15 else r.randint(1, max_input_seqlen) # short sentence rate
def _parse_gz(record_str): # function that takes python_str as input
ex = propeller_base.data.example_pb2.SequenceExample()
ex.ParseFromString(record_str)
doc = [
np.array(
f.int64_list.value, dtype=np.int64)
for f in ex.feature_lists.feature_list['txt'].feature
]
doc_seg = [
np.array(
f.int64_list.value, dtype=np.int64)
for f in ex.feature_lists.feature_list['segs'].feature
]
return doc, doc_seg
def bb_to_segments(filename):
ds = Dataset.from_record_file(filename).map(_parse_gz)
def gen():
buf, size = [], 0
iterator = iter(ds)
while 1:
doc, doc_seg = next(iterator)
for line, line_seg in zip(doc, doc_seg):
#line = np.array(sp_model.SampleEncodeAsIds(line, -1, 0.1), dtype=np.int64) # 0.1 means large variance on sentence piece result
if len(line) == 0:
continue
                    line = np.array(line)
line_seg = np.array(line_seg)
size += len(line)
buf.append(np.stack([line, line_seg]).transpose())
if size > max_input_seqlen:
yield buf,
buf, size = [], 0
if len(buf) != 0:
yield buf,
buf, size = [], 0
return Dataset.from_generator_func(gen)
def sample_negative(dataset):
def gen():
iterator = iter(dataset)
while True:
chunk_a, = next(iterator)
#chunk_b, = next(iterator)
seqlen = max_pretrain_seqlen()
seqlen_a = r.randint(1, seqlen)
seqlen_b = seqlen - seqlen_a
len_a = list(accumulate([len(c) for c in chunk_a]))
buf_a = [c for c, l in zip(chunk_a, len_a)
if l < seqlen_a] #always take the first one
buf_b = [
c for c, l in zip(chunk_a, len_a) if seqlen_a <= l < seqlen
]
if r.random() < 0.5: #pos or neg
label = np.int64(1)
else:
label = np.int64(0)
buf_a, buf_b = buf_b, buf_a
if not (len(buf_a) and len(buf_b)):
continue
a = np.concatenate(buf_a)
b = np.concatenate(buf_b)
#log.debug(a)
#log.debug(b)
sample, seg_info, token_type = build_pair(
a, b, args.max_seqlen,
vocab) #negative sample might exceed max seqlen
yield sample, seg_info, token_type, label
ds = propeller.data.Dataset.from_generator_func(gen)
return ds
def after(sentence, seg_info, segments, label):
batch_size, seqlen = sentence.shape
sentence, mask_pos, mlm_label = apply_mask(
sentence, seg_info, args.mask_rate, hparams.vocab_size, vocab)
ra = r.random()
if ra < args.check:
print('***')
print('\n'.join([
str(j) + '\t' + '|'.join(map(str, i))
for i, j in zip(sentence.tolist(), label)
]))
print('***')
print('\n'.join(
['|'.join(map(str, i)) for i in seg_info.tolist()]))
print('***')
print('|'.join(map(str, mlm_label.tolist())))
print('***')
return sentence, segments, mlm_label, mask_pos, label
# pretrain pipeline
dataset = Dataset.from_list(gz_files)
if propeller.train.distribution.status.mode == propeller.train.distribution.DistributionMode.NCCL:
log.info('Apply sharding in distribution env')
dataset = dataset.shard(
propeller.train.distribution.status.num_replica,
propeller.train.distribution.status.replica_id)
dataset = dataset.repeat().shuffle(buffer_size=len(gz_files))
dataset = dataset.interleave(
map_fn=bb_to_segments, cycle_length=len(gz_files), block_length=1)
dataset = dataset.shuffle(
buffer_size=1000) #must shuffle to ensure negative sample randomness
dataset = sample_negative(dataset)
dataset = dataset.padded_batch(hparams.batch_size, (0, 0, 0, 0)).map(after)
dataset.name = name
return dataset
if __name__ == '__main__':
if six.PY3:
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
parser = propeller.ArgumentParser('DAN model with Paddle')
parser.add_argument('--max_seqlen', type=int, default=256)
parser.add_argument('--data_dir', type=str, required=True)
parser.add_argument('--from_pretrained', type=Path, default=None)
parser.add_argument('--use_amp', action='store_true')
parser.add_argument('--mask_rate', type=float, default=0.15)
parser.add_argument('--check', type=float, default=0.)
args = parser.parse_args()
P.enable_static()
if not os.path.exists(args.from_pretrained):
raise ValueError('--from_pretrained not found: %s' %
args.from_pretrained)
cfg_file_path = os.path.join(args.from_pretrained, 'ernie_config.json')
param_path = os.path.join(args.from_pretrained, 'params')
vocab_path = os.path.join(args.from_pretrained, 'vocab.txt')
assert os.path.exists(cfg_file_path) and os.path.exists(
param_path) and os.path.exists(vocab_path)
hparams_cli = propeller.parse_hparam(args)
hparams_config_file = json.loads(open(cfg_file_path).read())
default_hparams = propeller.HParams(
batch_size=50,
warmup_steps=10000,
learning_rate=1e-4,
weight_decay=0.01, )
hparams = default_hparams.join(propeller.HParams(
**hparams_config_file)).join(hparams_cli)
default_run_config = dict(
max_steps=1000000,
save_steps=10000,
log_steps=10,
max_ckpt=3,
skip_steps=0,
eval_steps=-1)
run_config = dict(default_run_config, **json.loads(args.run_config))
run_config = propeller.RunConfig(**run_config)
tokenizer = ErnieTokenizer.from_pretrained(args.from_pretrained)
train_ds = make_pretrain_dataset(
'train',
args.data_dir,
vocab=tokenizer.vocab,
hparams=hparams,
args=args)
seq_shape = [-1, args.max_seqlen]
ints_shape = [-1, ]
shapes = (seq_shape, seq_shape, ints_shape, [-1, 2], ints_shape)
types = ('int64', 'int64', 'int64', 'int64', 'int64')
train_ds.data_shapes = shapes
train_ds.data_types = types
ws = None
#varname_to_warmstart = re.compile(r'^encoder.*[wb]_0$|^.*embedding$|^.*bias$|^.*scale$|^pooled_fc.[wb]_0$')
varname_to_warmstart = re.compile(r'.*')
if args.from_pretrained is not None:
warm_start_dir = os.path.join(args.from_pretrained, 'params')
ws = propeller.WarmStartSetting(
predicate_fn=lambda v: varname_to_warmstart.match(v.name) and os.path.exists(os.path.join(warm_start_dir, v.name)),
from_dir=warm_start_dir
)
ernie_learner = propeller.Learner(
ernie_pretrain_model_fn,
run_config,
params=hparams,
warm_start_setting=ws)
ernie_learner.train(train_ds)
# ERNIE-GEN
[ERNIE-GEN](https://arxiv.org/pdf/2001.11314.pdf) is a multi-flow language generation framework for both pre-training and fine-tuning.
Only the finetuning strategy is illustrated in this section.
## Finetune
We use the abstractive summarization task CNN/DailyMail to illustrate the usage of ERNIE-GEN; you can download the preprocessed finetune data from [here](https://ernie-github.cdn.bcebos.com/data-cnndm.tar.gz).
To start finetuning ERNIE-GEN, run:
```script
python3 -m paddle.distributed.launch \
--log_dir ./log \
./demo/seq2seq/finetune_seq2seq_dygraph.py \
--from_pretrained ernie-gen-base-en \
--data_dir ./data/cnndm \
--save_dir ./model_cnndm \
--label_smooth 0.1 \
--use_random_noice \
--noise_prob 0.7 \
--predict_output_dir ./pred \
--max_steps $((287113*30/64))
```
Note that you need more than 2 GPUs to run the finetuning.
During multi-GPU finetuning, `max_steps` is used as the stopping criterion rather than `epoch`, to prevent deadlock.
We simply calculate `max_steps` as `EPOCH * NUM_TRAIN_EXAMPLES / TOTAL_BATCH`.
This demo script saves a finetuned model at `--save_dir`, runs multi-GPU prediction every `--eval_steps`, and saves the prediction results to `--predict_output_dir`.
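For the command above this works out to `287113 * 30 / 64 ≈ 134,584` steps (287,113 CNN/DailyMail training examples, 30 epochs, total batch size 64), which is exactly the `$((287113*30/64))` expression passed to `--max_steps`.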
### Evaluation
While finetuning, a series of prediction files is generated.
First you need to sort and join all the files with:
```shell
sort -t$'\t' -k1n ./pred/pred.step60000.* |awk -F"\t" '{print $2}'> final_prediction
```
Then use `./eval_cnndm/cnndm_eval.sh` to calculate all metrics
(`pyrouge` is required to evaluate CNN/Daily Mail.)
```shell
sh cnndm_eval.sh final_prediction ./data/cnndm/dev.summary
```
### Inference
To run beam search decoding after you have a finetuned model, try:
```shell
cat one_column_source_text | python3 demo/seq2seq/decode.py \
--from_pretrained ./ernie_gen_large \
--save_dir ./model_cnndm \
--bsz 8
```
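`decode.py` reads one source document per line from stdin (via `build_dataset_from_stdin`) and prints one decoded hypothesis per line on stdout, so the outputs line up with the input text line by line.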
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function
from __future__ import unicode_literals
import sys
import io
import re
import argparse
import logging
import json
import numpy as np
from pathlib import Path
from collections import namedtuple
import paddle as P
from paddle.nn import functional as F
from ernie.modeling_ernie import ErnieModel, ErnieModelForPretraining, ErnieModelForGeneration
from ernie.modeling_ernie import _build_linear, _build_ln, append_name
from ernie.tokenizing_ernie import ErnieTokenizer
from propeller import log
import propeller.paddle as propeller
@np.vectorize
def rev_lookup(i):
return rev_dict[i]
def gen_bias(encoder_inputs, decoder_inputs, step):
decoder_bsz, decoder_seqlen = decoder_inputs.shape[:2]
attn_bias = P.reshape(
P.arange(
0, decoder_seqlen, 1, dtype='float32') + 1, [1, -1, 1])
decoder_bias = P.cast(
(P.matmul(
attn_bias, 1. / attn_bias, transpose_y=True) >= 1.),
'float32') #[1, 1, decoderlen, decoderlen]
encoder_bias = P.unsqueeze(
P.cast(P.ones_like(encoder_inputs), 'float32'),
[1]) #[bsz, 1, encoderlen]
encoder_bias = P.tile(
encoder_bias, [1, decoder_seqlen, 1]) #[bsz,decoderlen, encoderlen]
decoder_bias = P.tile(decoder_bias,
[decoder_bsz, 1, 1]) #[bsz, decoderlen, decoderlen]
if step > 0:
bias = P.concat([
encoder_bias, P.ones([decoder_bsz, decoder_seqlen, step],
'float32'), decoder_bias
], -1)
else:
bias = P.concat([encoder_bias, decoder_bias], -1)
return bias
#def make_data(tokenizer, inputs, max_encode_len):
# all_ids, all_sids = [], []
# for i in inputs:
# q_ids, q_sids = tokenizer.build_for_ernie(
# np.array(
# tokenizer.convert_tokens_to_ids(i.split(' '))[: max_encode_len-2],
# dtype=np.int64
# )
# )
# all_ids.append(q_ids)
# all_sids.append(q_sids)
# ml = max(map(len, all_ids))
# all_ids = [np.pad(i, [0, ml-len(i)], mode='constant')for i in all_ids]
# all_sids = [np.pad(i, [0, ml-len(i)], mode='constant')for i in all_sids]
# all_ids = np.stack(all_ids, 0)
# all_sids = np.stack(all_sids, 0)
# return all_ids, all_sids
def greedy_search_infilling(model,
q_ids,
q_sids,
sos_id,
eos_id,
attn_id,
max_encode_len=640,
max_decode_len=100,
tgt_type_id=3):
model.eval()
with P.no_grad():
#log.debug(q_ids.numpy().tolist())
_, logits, info = model(q_ids, q_sids)
gen_ids = P.argmax(logits, -1)
d_batch, d_seqlen = q_ids.shape
seqlen = P.cast(q_ids != 0, 'int64').sum(1, keepdim=True)
log.debug(seqlen.numpy())
log.debug(d_seqlen)
        has_stopped = np.zeros([d_batch], dtype=bool)
gen_seq_len = np.zeros([d_batch], dtype=np.int64)
output_ids = []
past_cache = info['caches']
cls_ids = P.ones([d_batch], dtype='int64') * sos_id
attn_ids = P.ones([d_batch], dtype='int64') * attn_id
ids = P.stack([cls_ids, attn_ids], -1)
for step in range(max_decode_len):
log.debug('decode step %d' % step)
bias = gen_bias(q_ids, ids, step)
pos_ids = P.to_tensor(
np.tile(
np.array(
[[step, step + 1]], dtype=np.int64), [d_batch, 1]))
pos_ids += seqlen
_, logits, info = model(
ids,
P.ones_like(ids) * tgt_type_id,
pos_ids=pos_ids,
attn_bias=bias,
past_cache=past_cache)
gen_ids = P.argmax(logits, -1)
past_cached_k, past_cached_v = past_cache
cached_k, cached_v = info['caches']
cached_k = [
P.concat([pk, k[:, :1, :]], 1)
for pk, k in zip(past_cached_k, cached_k)
] # concat cached
cached_v = [
P.concat([pv, v[:, :1, :]], 1)
for pv, v in zip(past_cached_v, cached_v)
]
past_cache = (cached_k, cached_v)
gen_ids = gen_ids[:, 1]
ids = P.stack([gen_ids, attn_ids], 1)
gen_ids = gen_ids.numpy()
            has_stopped |= (gen_ids == eos_id).astype(bool)
gen_seq_len += (1 - has_stopped.astype(np.int64))
output_ids.append(gen_ids.tolist())
if has_stopped.all():
#log.debug('exit because all done')
break
#if step == 1: break
output_ids = np.array(output_ids).transpose([1, 0])
return output_ids
BeamSearchState = namedtuple('BeamSearchState',
['log_probs', 'lengths', 'finished'])
BeamSearchOutput = namedtuple('BeamSearchOutput',
['scores', 'predicted_ids', 'beam_parent_ids'])
def log_softmax(x):
e_x = np.exp(x - np.max(x))
return np.log(e_x / e_x.sum())
def mask_prob(p, onehot_eos, finished):
is_finished = P.cast(P.reshape(finished, [-1, 1]) != 0, 'float32')
p = is_finished * (1. - P.cast(onehot_eos, 'float32')) * -9999. + (
1. - is_finished) * p
return p
def hyp_score(log_probs, length, length_penalty):
    # GNMT-style length normalization: lp = ((5 + len) / 6) ** alpha
    lp = P.pow((5. + P.cast(length, 'float32')) / 6., length_penalty)
    return log_probs / lp
def beam_search_step(state, logits, eos_id, beam_width, is_first_step,
length_penalty):
"""logits.shape == [B*W, V]"""
_, vocab_size = logits.shape
bsz, beam_width = state.log_probs.shape
onehot_eos = P.cast(
F.one_hot(P.ones([1], 'int64') * eos_id, vocab_size), 'int64') #[1, V]
probs = P.log(F.softmax(logits)) #[B*W, V]
probs = mask_prob(probs, onehot_eos, state.finished) #[B*W, V]
allprobs = P.reshape(state.log_probs, [-1, 1]) + probs #[B*W, V]
not_finished = 1 - P.reshape(state.finished, [-1, 1]) #[B*W,1]
not_eos = 1 - onehot_eos
length_to_add = not_finished * not_eos #[B*W,V]
alllen = P.reshape(state.lengths, [-1, 1]) + length_to_add
allprobs = P.reshape(allprobs, [-1, beam_width * vocab_size])
alllen = P.reshape(alllen, [-1, beam_width * vocab_size])
allscore = hyp_score(allprobs, alllen, length_penalty)
if is_first_step:
allscore = P.reshape(
allscore,
            [bsz, beam_width, -1])[:, 0, :]  # first step only considers beam 0
scores, idx = P.topk(allscore, k=beam_width) #[B, W]
next_beam_id = idx // vocab_size #[B, W]
next_word_id = idx % vocab_size
gather_idx = P.concat(
[P.nonzero(idx != -1)[:, :1], P.reshape(idx, [-1, 1])], 1)
next_probs = P.reshape(P.gather_nd(allprobs, gather_idx), idx.shape)
next_len = P.reshape(P.gather_nd(alllen, gather_idx), idx.shape)
gather_idx = P.concat([
P.nonzero(next_beam_id != -1)[:, :1], P.reshape(next_beam_id, [-1, 1])
], 1)
next_finished = P.reshape(
P.gather_nd(state.finished, gather_idx), state.finished.
shape) #[gather new beam state according to new beam id]
#log.debug(gather_idx.numpy())
#log.debug(state.finished.numpy())
#log.debug(next_finished.numpy())
next_finished += P.cast(next_word_id == eos_id, 'int64')
next_finished = P.cast(next_finished > 0, 'int64')
#log.debug(next_word_id.numpy())
#log.debug(next_beam_id.numpy())
next_state = BeamSearchState(
log_probs=next_probs, lengths=next_len, finished=next_finished)
output = BeamSearchOutput(
scores=scores,
predicted_ids=next_word_id,
beam_parent_ids=next_beam_id)
return output, next_state
def beam_search_infilling(model,
q_ids,
q_sids,
sos_id,
eos_id,
attn_id,
max_encode_len=640,
max_decode_len=100,
beam_width=5,
tgt_type_id=3,
length_penalty=1.0):
model.eval()
with P.no_grad():
#log.debug(q_ids.numpy().tolist())
_, __, info = model(q_ids, q_sids)
d_batch, d_seqlen = q_ids.shape
state = BeamSearchState(
log_probs=P.zeros([d_batch, beam_width], 'float32'),
lengths=P.zeros([d_batch, beam_width], 'int64'),
finished=P.zeros([d_batch, beam_width], 'int64'))
outputs = []
def reorder_(t, parent_id):
"""reorder cache according to parent beam id"""
gather_idx = P.nonzero(
parent_id != -1)[:, 0] * beam_width + P.reshape(parent_id,
[-1])
t = P.gather(t, gather_idx)
return t
def tile_(t, times):
_shapes = list(t.shape[1:])
ret = P.reshape(
P.tile(
P.unsqueeze(t, [1]), [
1,
times,
] + [1, ] * len(_shapes)), [-1, ] + _shapes)
return ret
cached_k, cached_v = info['caches']
cached_k = [tile_(k, beam_width) for k in cached_k]
cached_v = [tile_(v, beam_width) for v in cached_v]
past_cache = (cached_k, cached_v)
q_ids = tile_(q_ids, beam_width)
seqlen = P.cast(q_ids != 0, 'int64').sum(1, keepdim=True)
#log.debug(q_ids.shape)
cls_ids = P.ones([d_batch * beam_width], dtype='int64') * sos_id
attn_ids = P.ones(
[d_batch * beam_width], dtype='int64') * attn_id # SOS
ids = P.stack([cls_ids, attn_ids], -1)
for step in range(max_decode_len):
#log.debug('decode step %d' % step)
bias = gen_bias(q_ids, ids, step)
pos_ids = P.to_tensor(
np.tile(
np.array(
[[step, step + 1]], dtype=np.int64),
[d_batch * beam_width, 1]))
pos_ids += seqlen
_, logits, info = model(
ids,
P.ones_like(ids) * tgt_type_id,
pos_ids=pos_ids,
attn_bias=bias,
past_cache=past_cache)
output, state = beam_search_step(
state,
logits[:, 1],
eos_id=eos_id,
beam_width=beam_width,
is_first_step=(step == 0),
length_penalty=length_penalty)
outputs.append(output)
past_cached_k, past_cached_v = past_cache
cached_k, cached_v = info['caches']
cached_k = [
reorder_(
P.concat([pk, k[:, :1, :]], 1), output.beam_parent_ids)
for pk, k in zip(past_cached_k, cached_k)
] # concat cached
cached_v = [
reorder_(
P.concat([pv, v[:, :1, :]], 1), output.beam_parent_ids)
for pv, v in zip(past_cached_v, cached_v)
]
past_cache = (cached_k, cached_v)
pred_ids_flatten = P.reshape(output.predicted_ids,
[d_batch * beam_width])
ids = P.stack([pred_ids_flatten, attn_ids], 1)
if state.finished.numpy().all():
#log.debug('exit because all done')
break
#if step == 1: break
final_ids = P.stack([o.predicted_ids for o in outputs], 0)
final_parent_ids = P.stack([o.beam_parent_ids for o in outputs], 0)
final_ids = P.fluid.layers.gather_tree(
final_ids, final_parent_ids)[:, :, 0] #pick best beam
final_ids = P.transpose(
P.reshape(final_ids, [-1, d_batch * 1]), [1, 0])
return final_ids
en_pattern = re.compile(r'^[a-zA-Z0-9]*$')
def post_process(token):
    """Undo wordpiece: strip the '##' continuation prefix and put a leading
    space before standalone english/number tokens."""
    if token.startswith('##'):
        ret = token[2:]
    else:
        if en_pattern.match(token):
            ret = ' ' + token
        else:
            ret = token
    return ret
if __name__ == '__main__':
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')
parser = argparse.ArgumentParser('seq2seq model with ERNIE')
parser.add_argument(
'--from_pretrained',
type=Path,
required=True,
help='pretrained model directory or tag')
parser.add_argument('--bsz', type=int, default=8, help='batchsize')
parser.add_argument('--max_encode_len', type=int, default=640)
parser.add_argument('--max_decode_len', type=int, default=120)
parser.add_argument('--tgt_type_id', type=int, default=3)
parser.add_argument('--beam_width', type=int, default=5)
parser.add_argument(
'--attn_token',
type=str,
default='[ATTN]',
        help='if [ATTN] is not in the vocab, you can specify [MASK] as the attn-token')
parser.add_argument('--length_penalty', type=float, default=1.0)
parser.add_argument(
'--save_dir', type=str, required=True, help='model dir to be loaded')
args = parser.parse_args()
env = P.distributed.ParallelEnv()
ernie = ErnieModelForGeneration.from_pretrained(
args.from_pretrained, name='')
tokenizer = ErnieTokenizer.from_pretrained(
args.from_pretrained, mask_token=None)
rev_dict = {v: k for k, v in tokenizer.vocab.items()}
    rev_dict[tokenizer.pad_id] = ''  # replace [PAD]
    rev_dict[tokenizer.unk_id] = ''  # replace [UNK]
sd = P.load(str(args.save_dir))
ernie.set_state_dict(sd)
def map_fn(src_ids):
src_ids = src_ids[:args.max_encode_len]
src_ids, src_sids = tokenizer.build_for_ernie(src_ids)
return (src_ids, src_sids)
feature_column = propeller.data.FeatureColumns([
propeller.data.TextColumn(
'seg_a',
unk_id=tokenizer.unk_id,
vocab_dict=tokenizer.vocab,
tokenizer=tokenizer.tokenize),
])
dataset = feature_column.build_dataset_from_stdin('predict').map(
map_fn).padded_batch(args.bsz)
for step, (encoder_ids, encoder_sids) in enumerate(dataset):
#result_ids = greedy_search_infilling(ernie, P.to_tensor(encoder_ids), P.to_tensor(encoder_sids),
# eos_id=tokenizer.sep_id,
# sos_id=tokenizer.cls_id,
# attn_id=tokenizer.vocab[args.attn_id],
# max_decode_len=args.max_decode_len,
# max_encode_len=args.max_encode_len,
# beam_width=args.beam_width,
# tgt_type_id=args.tgt_type_id)
result_ids = beam_search_infilling(
ernie,
P.to_tensor(encoder_ids),
P.to_tensor(encoder_sids),
eos_id=tokenizer.sep_id,
sos_id=tokenizer.cls_id,
attn_id=tokenizer.vocab[args.attn_token],
max_decode_len=args.max_decode_len,
max_encode_len=args.max_encode_len,
beam_width=args.beam_width,
length_penalty=args.length_penalty,
tgt_type_id=args.tgt_type_id)
output_str = rev_lookup(result_ids.numpy())
for ostr in output_str.tolist():
if '[SEP]' in ostr:
ostr = ostr[:ostr.index('[SEP]')]
ostr = ''.join(map(post_process, ostr))
ostr = ostr.strip()
print(ostr)
set -x
(( $# != 2 )) && echo "Usage: predict_file label_file" && exit -1
PRED=$1
PREFIX=$2
python pyrouge_set_rouge_path.py `pwd`/file2rouge/
python cnndm/eval.py --pred ${PRED} \
--gold ${PREFIX} --trunc_len 100 --perl
=head1 NAME
XML::DOM::AttDef - A single XML attribute definition in an ATTLIST in XML::DOM
=head1 DESCRIPTION
XML::DOM::AttDef extends L<XML::DOM::Node>, but is not part of the DOM Level 1
specification.
Each object of this class represents one attribute definition in an AttlistDecl.
=head2 METHODS
=over 4
=item getName
Returns the attribute name.
=item getDefault
Returns the default value, or undef.
=item isFixed
Whether the attribute value is fixed (see #FIXED keyword.)
=item isRequired
Whether the attribute value is required (see #REQUIRED keyword.)
=item isImplied
Whether the attribute value is implied (see #IMPLIED keyword.)
=back