Commit e9c7c30e authored by SYSU_BOND, committed by bbking

Update PaddleNLP LAC model for new codestyle (#3463)

* reconstruct run_sequence_labeling.py into train.py, predict.py and eval.py & add yaml configuration

* reconstruct the ERNIE based LAC model

* Update train_ernie.py

fix recurring multi-GPU NaN

* update the ernie base model

* configure update

* update configure

* add inference model

* standardize code style

* delete unused run_sequence_labeling.py

* rename evaluate.py to compare.py

* add postfix '.pdckpt' to model

* update README.md

* add LAC class (for convenient prediction)

* add LAC class (for convenient prediction)

* update README.md

* fix bug in run_ernie

* update default setting

* fix inference bug on Windows

* fix inference bug on Windows

* update new model and dataset

* delete the postfix .pdckpt of model checkpoint directory

* update new model's performance

* fix the bug for empty input

* remove use of tqdm

* fix the bug of train_data
Parent 0cc14636
@@ -2,13 +2,13 @@
## 1. Introduction
Lexical Analysis of Chinese (LAC) is a joint lexical analysis model that performs Chinese word segmentation, part-of-speech tagging, and named entity recognition as one holistic task. We evaluate segmentation, POS tagging, and NER jointly on our in-house dataset; detailed numbers are in the table below. In addition, we finetune Baidu's open-sourced [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) model and compare the baseline against the BERT finetuned and ERNIE finetuned models, which show a significant improvement. Baidu's lexical analysis service can be tried online at [AI开放平台-词法分析](http://ai.baidu.com/tech/nlp/lexical).
Lexical Analysis of Chinese (LAC) is a joint lexical analysis model that completes Chinese word segmentation, part-of-speech tagging, and named entity recognition within a single model. We evaluate segmentation, POS tagging, and NER jointly on our in-house dataset; detailed numbers are in the table below. In addition, we finetune Baidu's open-sourced [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) model and compare the baseline against the BERT finetuned and ERNIE finetuned models, which show a significant improvement. Baidu's lexical analysis service can be tried online at [AI开放平台-词法分析](http://ai.baidu.com/tech/nlp/lexical).
|模型|Precision|Recall|F1-score|
|:-:|:-:|:-:|:-:|
|Lexical Analysis|88.0%|88.7%|88.4%|
|Lexical Analysis|89.2%|89.4%|89.3%|
|BERT finetuned|90.2%|90.4%|90.3%|
|ERNIE finetuned|92.0%|92.0%|92.0%|
|ERNIE finetuned|91.7%|91.7%|91.7%|
## 2. Quick Start
@@ -16,7 +16,7 @@ Lexical Analysis of Chinese (LAC) is a joint lexical analysis model
#### 1. Install PaddlePaddle
This project requires PaddlePaddle 1.3.2 or later; for installation see the official [Quick Install](http://www.paddlepaddle.org/paddle#quick-start) guide
This project requires PaddlePaddle 1.4.0 or later and PaddleHub 1.0.0 or later. For PaddlePaddle installation see the official [Quick Install](http://www.paddlepaddle.org/paddle#quick-start) guide; for PaddleHub installation see [PaddleHub](https://github.com/PaddlePaddle/PaddleHub)
> Warning: the GPU and CPU builds of PaddlePaddle are separate packages, paddlepaddle-gpu and paddlepaddle; take care to install the right one.
@@ -27,20 +27,32 @@ Lexical Analysis of Chinese (LAC) is a joint lexical analysis model
cd models/PaddleNLP/lexical_analysis
```
### Data Preparation
#### 1. Quick Download
The **datasets** and **pretrained models** involved in this project can be downloaded quickly by running the script below; if only part of the data is needed, download it selectively following the instructions in the sections that follow
```bash
sh download.sh
```
#### 2. Training Dataset
Download the dataset archive; unpacking it creates the `./data/` directory
```bash
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-dataset-1.0.0.tar.gz
tar xvf lexical_analysis-dataset-1.0.0.tar.gz
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-dataset-2.0.0.tar.gz
tar xvf lexical_analysis-dataset-2.0.0.tar.gz
```
### Model Download
#### 3. Pretrained Models
We open-source lexical analysis models trained on our in-house dataset for direct use; two download options are provided:
Option 1: via the PaddleHub command-line tool; see [PaddleHub](https://github.com/PaddlePaddle/PaddleHub) for installation
```bash
# download baseline model
hub download lexical_analysis
tar xvf lexical_analysis-1.0.0.tar.gz
tar xvf lexical_analysis-2.0.0.tar.gz
# download ERNIE finetuned model
hub download lexical_analysis_finetuned
@@ -50,17 +62,18 @@ tar xvf lexical_analysis_finetuned-1.0.0.tar.gz
Option 2: direct download
```bash
# download baseline model
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-1.0.0.tar.gz
tar xvf lexical_analysis-1.0.0.tar.gz
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-2.0.0.tar.gz
tar xvf lexical_analysis-2.0.0.tar.gz
# download ERNIE finetuned model
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis_finetuned-1.0.0.tar.gz
tar xvf lexical_analysis_finetuned-1.0.0.tar.gz
```
Note: to download the open-sourced ERNIE model, see [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE); after downloading, it can be placed under the `./pretrained/` directory.
Note: to run ERNIE finetuning, download the open-sourced [ERNIE](https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz) model yourself from [https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz](https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz) and extract it into the `./pretrained/` directory.
### Model Evaluation
We trained a lexical analysis model on our in-house dataset; it can be used directly to validate against the test set `./data/test.tsv`:
```bash
# baseline model
@@ -71,16 +84,33 @@ sh run_ernie.sh eval
```
### Model Training
Using the sample dataset, you can run the command below to train on the training set `./data/train.tsv`
Using the sample dataset, the commands below train on the training set `./data/train.tsv`; the examples cover single-GPU and multi-GPU runs on one machine as well as multi-threaded CPU runs
> Warning: to run ERNIE finetuning, download the open-sourced [ERNIE](https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz) model yourself from [https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz](https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz) and extract it into the `./pretrained/` directory.
```bash
# baseline model
sh run.sh train
# baseline model, using single GPU
sh run.sh train_single_gpu
# baseline model, using multi GPU
sh run.sh train_multi_gpu
# baseline model, using multi CPU
sh run.sh train_multi_cpu
# ERNIE finetuned model
sh run_ernie.sh train
# ERNIE finetuned model, using single GPU
sh run_ernie.sh train_single_gpu
# ERNIE finetuned model, using multi CPU
sh run_ernie.sh train_multi_cpu
```
Note: the ERNIE-based sequence labeling model does not support multi-GPU training yet
### Model Prediction
Load a trained model and run prediction on unseen data
```bash
# baseline model
@@ -90,6 +120,20 @@ sh run.sh infer
sh run_ernie.sh infer
```
### Saving an Inference Model
Convert a trained model into a model usable for deployment and prediction
```bash
# baseline model
export PYTHONIOENCODING=UTF-8 # the model's output is Unicode; without this setting Python 2 tends to hit encoding errors
python inference_model.py \
--init_checkpoint ./model_baseline \
--inference_save_dir ./inference_model
```
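Once exported, the model can be loaded back with `fluid.io.load_inference_model`; the sketch below mirrors the `test_inference_model` helper that this commit adds in `inference_model.py` (the path and input handling are illustrative):

```python
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)
# file names match those passed to save_inference_model in inference_model.py
[program, feed_names, fetch_targets] = fluid.io.load_inference_model(
    './inference_model', exe,
    model_filename='model.pdmodel',
    params_filename='params.pdparams')
# feed_names[0] is "words": a LoDTensor of word ids; fetch_targets holds the CRF decode
```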
## 3. Advanced Usage
### Task Definition and Modeling
@@ -99,7 +143,7 @@ sh run_ernie.sh infer
3. The sequence of character embeddings is fed to a bidirectional GRU to learn feature representations of the input, yielding a new feature sequence; we stack two BiGRU layers to increase learning capacity;
4. A CRF takes the GRU features as input and the tag sequence as the supervision signal to perform sequence labeling.
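The four steps above map directly onto the `paddle.fluid` 1.x API used throughout this repo. Below is a condensed, illustrative sketch of such a GRU-CRF network; `sketch_lex_net` and `bigru_layer` are hypothetical names, not the repo's actual `lex_net` implementation, and the hyperparameters mirror the defaults in `conf/args.yaml`:

```python
import paddle.fluid as fluid

def bigru_layer(input_feature, hidden_dim):
    # one bidirectional GRU layer: forward and reverse GRUs, outputs concatenated
    pre_fwd = fluid.layers.fc(input=input_feature, size=hidden_dim * 3)
    fwd = fluid.layers.dynamic_gru(input=pre_fwd, size=hidden_dim)
    pre_bwd = fluid.layers.fc(input=input_feature, size=hidden_dim * 3)
    bwd = fluid.layers.dynamic_gru(input=pre_bwd, size=hidden_dim, is_reverse=True)
    return fluid.layers.concat(input=[fwd, bwd], axis=1)

def sketch_lex_net(words, targets, vocab_size, num_labels,
                   word_emb_dim=128, grnn_hidden_dim=128, bigru_num=2):
    # step 2: look up character embeddings
    feature = fluid.layers.embedding(input=words, size=[vocab_size, word_emb_dim])
    # step 3: stacked BiGRU layers (two by default)
    for _ in range(bigru_num):
        feature = bigru_layer(feature, grnn_hidden_dim)
    # per-token emission scores for the CRF
    emission = fluid.layers.fc(input=feature, size=num_labels)
    # step 4: CRF loss for training, Viterbi decoding for prediction
    crf_cost = fluid.layers.linear_chain_crf(
        input=emission, label=targets, param_attr=fluid.ParamAttr(name='crfw'))
    avg_cost = fluid.layers.mean(x=crf_cost)
    crf_decode = fluid.layers.crf_decoding(
        input=emission, param_attr=fluid.ParamAttr(name='crfw'))
    return avg_cost, crf_decode
```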
The POS and named-entity tag sets are listed in the table below: 24 POS tags (lowercase) and 4 named-entity tags (uppercase). Note that person, location, organization, and time mentions carry two tag sets (PER / LOC / ORG / TIME and nr / ns / nt / t); words labeled with the second set are mentions the model considers low-confidence. Developers can use these two tag sets to strike their own balance between precision and recall for the four categories.
| Tag | Meaning | Tag | Meaning | Tag | Meaning | Tag | Meaning |
| ---- | -------- | ---- | -------- | ---- | -------- | ---- | -------- |
@@ -141,14 +185,19 @@ sh run_ernie.sh infer
```text
.
├── README.md # this document
├── conf/ # dictionary directory
├── conf/ # directory for dictionaries and default program configs
├── compare.py # script comparing LAC with other open-source segmenters
├── creator.py # script that builds the network and data readers
├── data/ # dataset directory
├── downloads.sh # script for downloading data and models
├── eval.py # evaluation script for lexical analysis
├── inference_model.py # script that saves an inference_model for online deployment
├── gru-crf-model.png # model figure used in the README
├── predict.py # prediction script
├── reader.py # file-reading helpers
├── run_ernie_sequence_labeling.py # code for finetuning ERNIE
├── run_ernie.sh # launcher for the script above
├── run_sequence_labeling.py # lexical analysis task code
├── train.py # lexical analysis training script
├── run.sh # launcher for the script above
└── utils.py # common utility functions
```
@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#coding=utf-8
# -*- coding: UTF-8 -*-
"""
evaluate wordseg for LAC and other open-source wordseg tools
"""
@@ -275,6 +275,4 @@ def evaluate_all():
if __name__ == "__main__":
import ipdb
#ipdb.set_trace()
evaluate_all()
model:
word_emb_dim:
val: 128
meaning: "The dimension in which a word is embedded."
grnn_hidden_dim:
val: 128
meaning: "The number of hidden nodes in the GRNN layer."
bigru_num:
val: 2
meaning: "The number of bi_gru layers in the network."
init_checkpoint:
val: ""
meaning: "Path to init model"
inference_save_dir:
val: ""
meaning: "Path to save inference model"
train:
random_seed:
val: 0
meaning: "Random seed for training"
print_steps:
val: 1
meaning: "Print training metrics every print_steps batches."
save_steps:
val: 10
meaning: "Save a checkpoint every save_steps batches."
validation_steps:
val: 10
meaning: "Run validation every validation_steps batches."
batch_size:
val: 300
meaning: "The number of sequences contained in a mini-batch"
epoch:
val: 10
meaning: "Corpus iteration num"
use_cuda:
val: False
meaning: "If set, use GPU for training."
traindata_shuffle_buffer:
val: 20000
meaning: "The buffer size used in shuffle the training data."
base_learning_rate:
val: 0.001
meaning: "The basic learning rate that affects the entire network."
emb_learning_rate:
val: 2
meaning: "The real learning rate of the embedding layer will be (emb_learning_rate * base_learning_rate)."
crf_learning_rate:
val: 0.2
meaning: "The real learning rate of the embedding layer will be (crf_learning_rate * base_learning_rate)."
enable_ce:
val: false
meaning: 'If set, run the task with continuous evaluation logs.'
cpu_num:
val: 10
meaning: "The number of cpu used to train model, this argument wouldn't be valid if use_cuda=true"
data:
word_dict_path:
val: "./conf/word.dic"
meaning: "The path of the word dictionary."
label_dict_path:
val: "./conf/tag.dic"
meaning: "The path of the label dictionary."
word_rep_dict_path:
val: "./conf/q2b.dic"
meaning: "The path of the word replacement Dictionary."
train_data:
val: "./data/train.tsv"
meaning: "The folder where the training data is located."
test_data:
val: "./data/test.tsv"
meaning: "The folder where the test data is located."
infer_data:
val: "./data/infer.tsv"
meaning: "The folder where the infer data is located."
model_save_dir:
val: "./models"
meaning: "The model will be saved in this path."
model:
ernie_config_path:
val: "../LARK/ERNIE/config/ernie_config.json"
meaning: "Path to the json file for ernie model config."
init_checkpoint:
val: ""
meaning: "Path to init model"
mode:
val: "train"
meaning: "Setting to train or eval or infer"
init_pretraining_params:
val: "pretrained/params/"
meaning: "Init pre-training params which preforms fine-tuning from. If the arg 'init_checkpoint' has been set, this argument wouldn't be valid."
train:
random_seed:
val: 0
meaning: "Random seed for training"
batch_size:
val: 10
meaning: "The number of sequences contained in a mini-batch"
epoch:
val: 10
meaning: "Corpus iteration num"
use_cuda:
val: True
meaning: "If set, use GPU for training."
base_learning_rate:
val: 0.0002
meaning: "The basic learning rate that affects the entire network."
init_bound:
val: 0.1
meaning: "init bound for initialization."
crf_learning_rate:
val: 0.2
meaning: "The real learning rate of the embedding layer will be (crf_learning_rate * base_learning_rate)."
cpu_num:
val: 10
meaning: "The number of cpu used to train model, it works when use_cuda=False"
print_steps:
val: 1
meaning: "Print training metrics every print_steps batches."
save_steps:
val: 10
meaning: "Save a checkpoint every save_steps batches."
validation_steps:
val: 5
meaning: "Run validation every validation_steps batches."
data:
vocab_path:
val: "../LARK/ERNIE/config/vocab.txt"
meaning: "The path of the vocabulary."
label_map_config:
val: "./conf/label_map.json"
meaning: "The path of the label dictionary."
num_labels:
val: 57
meaning: "Number of labels."
max_seq_len:
val: 128
meaning: "Number of words of the longest sequence."
do_lower_case:
val: True
meaning: "Whether to lower case the input text. Should be True for uncased models and False for cased models."
train_data:
val: "./data/train.tsv"
meaning: "The folder where the training data is located."
test_data:
val: "./data/test.tsv"
meaning: "The folder where the test data is located."
infer_data:
val: "./data/test.tsv"
meaning: "The folder where the infer data is located."
model_save_dir:
val: "./ernie_models"
meaning: "The model will be saved in this path."
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: UTF-8 -*-
"""
The function lex_net(args) define the lexical analysis network structure
"""
import sys
import os
import math
import paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import NormalInitializer
from reader import Dataset
sys.path.append("..")
from models.sequence_labeling import nets
from models.representation.ernie import ernie_encoder
from preprocess.ernie import task_reader
def create_model(args, vocab_size, num_labels, mode='train'):
"""create lac model"""
# model's input data
words = fluid.layers.data(name='words', shape=[-1, 1], dtype='int64', lod_level=1)
targets = fluid.layers.data(name='targets', shape=[-1, 1], dtype='int64', lod_level=1)
# for inference process
if mode == 'infer':
crf_decode = nets.lex_net(words, args, vocab_size, num_labels, for_infer=True, target=None)
return {"feed_list": [words], "words": words, "crf_decode": crf_decode}
# for test or train process
avg_cost, crf_decode = nets.lex_net(words, args, vocab_size, num_labels, for_infer=False, target=targets)
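# chunk_eval assumes the IOB scheme: each chunk type contributes a B- and an
# I- tag, plus the single "O" tag, hence (num_labels - 1) / 2 chunk types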
(precision, recall, f1_score, num_infer_chunks, num_label_chunks,
num_correct_chunks) = fluid.layers.chunk_eval(
input=crf_decode,
label=targets,
chunk_scheme="IOB",
num_chunk_types=int(math.ceil((num_labels - 1) / 2.0)))
chunk_evaluator = fluid.metrics.ChunkEvaluator()
chunk_evaluator.reset()
ret = {
"feed_list": [words, targets],
"words": words,
"targets": targets,
"avg_cost": avg_cost,
"crf_decode": crf_decode,
"precision": precision,
"recall": recall,
"f1_score": f1_score,
"chunk_evaluator": chunk_evaluator,
"num_infer_chunks": num_infer_chunks,
"num_label_chunks": num_label_chunks,
"num_correct_chunks": num_correct_chunks
}
return ret
def create_pyreader(args, file_name, feed_list, place, model='lac', reader=None, return_reader=False, mode='train'):
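"""Create a fluid.io.PyReader for either the LAC or the ERNIE model.

Shuffles the data when mode == 'train'; when return_reader is True, also
returns the underlying dataset/reader object.
"""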
# init reader
pyreader = fluid.io.PyReader(
feed_list=feed_list,
capacity=300,
use_double_buffer=True,
iterable=True
)
if model == 'lac':
if reader is None:
reader = Dataset(args)
# create lac pyreader
if mode == 'train':
pyreader.decorate_sample_list_generator(
paddle.batch(
paddle.reader.shuffle(
reader.file_reader(file_name),
buf_size=args.traindata_shuffle_buffer
),
batch_size=args.batch_size
),
places=place
)
else:
pyreader.decorate_sample_list_generator(
paddle.batch(
reader.file_reader(file_name, mode=mode),
batch_size=args.batch_size
),
places=place
)
elif model == 'ernie':
# create ernie pyreader
if reader is None:
reader = task_reader.SequenceLabelReader(
vocab_path=args.vocab_path,
label_map_config=args.label_map_config,
max_seq_len=args.max_seq_len,
do_lower_case=args.do_lower_case,
in_tokens=False,
random_seed=args.random_seed)
if mode == 'train':
pyreader.decorate_batch_generator(
reader.data_generator(
file_name, args.batch_size, args.epoch, shuffle=True, phase="train"
),
places=place
)
else:
pyreader.decorate_batch_generator(
reader.data_generator(
file_name, args.batch_size, epoch=1, shuffle=False, phase=mode
),
places=place
)
if return_reader:
return pyreader, reader
else:
return pyreader
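For reference, a sketch of the typical call pattern for `create_model` and `create_pyreader`, following how `eval.py` and `predict.py` below use them (`args` is the parsed config namespace):

```python
import paddle.fluid as fluid
import reader
import creator

dataset = reader.Dataset(args)
test_program = fluid.Program()
with fluid.program_guard(test_program, fluid.default_startup_program()):
    with fluid.unique_name.guard():
        test_ret = creator.create_model(args, dataset.vocab_size,
                                        dataset.num_labels, mode='test')
test_program = test_program.clone(for_test=True)

place = fluid.CPUPlace()
pyreader = creator.create_pyreader(args, file_name=args.test_data,
                                   feed_list=test_ret['feed_list'],
                                   place=place, model='lac',
                                   reader=dataset, mode='test')
```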
def create_ernie_model(args, ernie_config):
"""
Create Model for LAC based on ERNIE encoder
"""
# ERNIE's input data
src_ids = fluid.layers.data(name='src_ids', shape=[args.max_seq_len, 1], dtype='int64', lod_level=0)
sent_ids = fluid.layers.data(name='sent_ids', shape=[args.max_seq_len, 1], dtype='int64', lod_level=0)
pos_ids = fluid.layers.data(name='pos_ids', shape=[args.max_seq_len, 1], dtype='int64', lod_level=0)
input_mask = fluid.layers.data(name='input_mask', shape=[args.max_seq_len, 1], dtype='int64', lod_level=0)
padded_labels = fluid.layers.data(name='padded_labels', shape=[args.max_seq_len, 1], dtype='int64', lod_level=0)
seq_lens = fluid.layers.data(name='seq_lens', shape=[1], dtype='int64', lod_level=0)
ernie_inputs = {
"src_ids": src_ids,
"sent_ids": sent_ids,
"pos_ids": pos_ids,
"input_mask": input_mask,
"seq_lens": seq_lens
}
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config)
words = fluid.layers.sequence_unpad(src_ids, seq_lens)
labels = fluid.layers.sequence_unpad(padded_labels, seq_lens)
token_embeddings = embeddings["token_embeddings"]
emission = fluid.layers.fc(
size=args.num_labels,
input=token_embeddings,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=-args.init_bound, high=args.init_bound),
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=1e-4)))
crf_cost = fluid.layers.linear_chain_crf(
input=emission,
label=labels,
param_attr=fluid.ParamAttr(
name='crfw',
learning_rate=args.crf_learning_rate))
avg_cost = fluid.layers.mean(x=crf_cost)
crf_decode = fluid.layers.crf_decoding(
input=emission, param_attr=fluid.ParamAttr(name='crfw'))
(precision, recall, f1_score, num_infer_chunks, num_label_chunks,
num_correct_chunks) = fluid.layers.chunk_eval(
input=crf_decode,
label=labels,
chunk_scheme="IOB",
num_chunk_types=int(math.ceil((args.num_labels - 1) / 2.0)))
chunk_evaluator = fluid.metrics.ChunkEvaluator()
chunk_evaluator.reset()
ret = {
"feed_list": [src_ids, sent_ids, pos_ids, input_mask, padded_labels, seq_lens],
"words": words,
"labels": labels,
"avg_cost": avg_cost,
"crf_decode": crf_decode,
"precision": precision,
"recall": recall,
"f1_score": f1_score,
"chunk_evaluator": chunk_evaluator,
"num_infer_chunks": num_infer_chunks,
"num_label_chunks": num_label_chunks,
"num_correct_chunks": num_correct_chunks
}
return ret
@@ -5,9 +5,9 @@ if [ -d ./model_baseline/ ]
then
echo "./model_baseline/ directory already existed, ignore download"
else
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-1.0.0.tar.gz
tar xvf lexical_analysis-1.0.0.tar.gz
/bin/rm lexical_analysis-1.0.0.tar.gz
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-2.0.0.tar.gz
tar xvf lexical_analysis-2.0.0.tar.gz
/bin/rm lexical_analysis-2.0.0.tar.gz
fi
# download dataset file to ./data/
@@ -15,9 +15,9 @@ if [ -d ./data/ ]
then
echo "./data/ directory already existed, ignore download"
else
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-dataset-1.0.0.tar.gz
tar xvf lexical_analysis-dataset-1.0.0.tar.gz
/bin/rm lexical_analysis-dataset-1.0.0.tar.gz
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/lexical_analysis-dataset-2.0.0.tar.gz
tar xvf lexical_analysis-dataset-2.0.0.tar.gz
/bin/rm lexical_analysis-dataset-2.0.0.tar.gz
fi
# download ERNIE pretrained model to ./pretrained/
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: UTF-8 -*-
import argparse
import os
import time
import sys
import paddle.fluid as fluid
import paddle
import utils
import reader
import creator
sys.path.append('../models/')
from model_check import check_cuda
parser = argparse.ArgumentParser(__doc__)
# 1. model parameters
model_g = utils.ArgumentGroup(parser, "model", "model configuration")
model_g.add_arg("word_emb_dim", int, 128, "The dimension in which a word is embedded.")
model_g.add_arg("grnn_hidden_dim", int, 128, "The number of hidden nodes in the GRNN layer.")
model_g.add_arg("bigru_num", int, 2, "The number of bi_gru layers in the network.")
model_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.")
# 2. data parameters
data_g = utils.ArgumentGroup(parser, "data", "data paths")
data_g.add_arg("word_dict_path", str, "./conf/word.dic", "The path of the word dictionary.")
data_g.add_arg("label_dict_path", str, "./conf/tag.dic", "The path of the label dictionary.")
data_g.add_arg("word_rep_dict_path", str, "./conf/q2b.dic", "The path of the word replacement Dictionary.")
data_g.add_arg("test_data", str, "./data/test.tsv", "The folder where the training data is located.")
data_g.add_arg("init_checkpoint", str, "./model_baseline", "Path to init model")
data_g.add_arg("batch_size", int, 200, "The number of sequences contained in a mini-batch, "
"or the maximum number of tokens (include paddings) contained in a mini-batch.")
def do_eval(args):
dataset = reader.Dataset(args)
test_program = fluid.Program()
with fluid.program_guard(test_program, fluid.default_startup_program()):
with fluid.unique_name.guard():
test_ret = creator.create_model(
args, dataset.vocab_size, dataset.num_labels, mode='test')
test_program = test_program.clone(for_test=True)
# init executor
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
place = fluid.CPUPlace()
pyreader = creator.create_pyreader(args, file_name=args.test_data,
feed_list=test_ret['feed_list'],
place=place,
model='lac',
reader=dataset,
mode='test')
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# load model
utils.init_checkpoint(exe, args.init_checkpoint, test_program)
test_process(exe=exe,
program=test_program,
reader=pyreader,
test_ret=test_ret
)
def test_process(exe, program, reader, test_ret):
"""
the function to execute the evaluation process
:param exe: the fluid Executor
:param program: the infer_program
:param reader: data reader
:return: the list of prediction result
"""
test_ret["chunk_evaluator"].reset()
start_time = time.time()
for data in reader():
nums_infer, nums_label, nums_correct = exe.run(program,
fetch_list=[
test_ret["num_infer_chunks"],
test_ret["num_label_chunks"],
test_ret["num_correct_chunks"],
],
feed=data,
)
test_ret["chunk_evaluator"].update(nums_infer, nums_label, nums_correct)
precision, recall, f1 = test_ret["chunk_evaluator"].eval()
end_time = time.time()
print("[test] P: %.5f, R: %.5f, F1: %.5f, elapsed time: %.3f s"
% (precision, recall, f1, end_time - start_time))
if __name__ == '__main__':
args = parser.parse_args()
check_cuda(args.use_cuda)
do_eval(args)
# -*- coding: UTF-8 -*-
import argparse
import sys
import os
import numpy as np
import paddle.fluid as fluid
import creator
import reader
import utils
sys.path.append('../models/')
from model_check import check_cuda
def save_inference_model(args):
# model definition
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
place = fluid.CPUPlace()
dataset = reader.Dataset(args)
infer_program = fluid.Program()
with fluid.program_guard(infer_program, fluid.default_startup_program()):
with fluid.unique_name.guard():
infer_ret = creator.create_model(
args, dataset.vocab_size, dataset.num_labels, mode='infer')
infer_program = infer_program.clone(for_test=True)
# load pretrain check point
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
utils.init_checkpoint(exe, args.init_checkpoint, infer_program)
fluid.io.save_inference_model(args.inference_save_dir,
['words'],
infer_ret['crf_decode'],
exe,
main_program=infer_program,
model_filename='model.pdmodel',
params_filename='params.pdparams',
)
def test_inference_model(model_dir, text_list, dataset):
"""
:param model_dir: model's dir
:param text_list: a list of input texts, decoded as unicode
:param dataset:
:return:
"""
# init executor
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
place = fluid.CPUPlace()
exe = fluid.Executor(place)
# transfer text data to input tensor
lod = []
for text in text_list:
lod.append(np.array(dataset.word_to_ids(text.strip())).astype(np.int64))
base_shape = [[len(c) for c in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
# for empty input, output the same empty
if sum(base_shape[0]) == 0:
crf_decode = [tensor_words]
else:
# load inference model
inference_scope = fluid.core.Scope()
with fluid.scope_guard(inference_scope):
[inferencer, feed_target_names,
fetch_targets] = fluid.io.load_inference_model(model_dir, exe,
model_filename='model.pdmodel',
params_filename='params.pdparams',
)
assert feed_target_names[0] == "words"
print("Load inference model from %s"%(model_dir))
# get lac result
crf_decode = exe.run(inferencer,
feed={feed_target_names[0]:tensor_words},
fetch_list=fetch_targets,
return_numpy=False,
use_program_cache=True,
)
# parse the crf_decode result
result = utils.parse_result(tensor_words,crf_decode[0], dataset)
for i,(sent, tags) in enumerate(result):
result_list = ['(%s, %s)'%(ch, tag) for ch, tag in zip(sent,tags)]
print(''.join(result_list))
if __name__=="__main__":
parser = argparse.ArgumentParser(__doc__)
utils.load_yaml(parser,'conf/args.yaml')
args = parser.parse_args()
check_cuda(args.use_cuda)
print("save inference model")
save_inference_model(args)
print("inference model save in %s"%args.inference_save_dir)
print("test inference model")
dataset = reader.Dataset(args)
test_data = [u'百度是一家高科技公司', u'中山大学是岭南第一学府']
test_inference_model(args.inference_save_dir, test_data, dataset)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: UTF-8 -*-
import argparse
import os
import time
import sys
import paddle.fluid as fluid
import paddle
import utils
import reader
import creator
sys.path.append('../models/')
from model_check import check_cuda
parser = argparse.ArgumentParser(__doc__)
# 1. model parameters
model_g = utils.ArgumentGroup(parser, "model", "model configuration")
model_g.add_arg("word_emb_dim", int, 128, "The dimension in which a word is embedded.")
model_g.add_arg("grnn_hidden_dim", int, 256, "The number of hidden nodes in the GRNN layer.")
model_g.add_arg("bigru_num", int, 2, "The number of bi_gru layers in the network.")
model_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.")
# 2. data parameters
data_g = utils.ArgumentGroup(parser, "data", "data paths")
data_g.add_arg("word_dict_path", str, "./conf/word.dic", "The path of the word dictionary.")
data_g.add_arg("label_dict_path", str, "./conf/tag.dic", "The path of the label dictionary.")
data_g.add_arg("word_rep_dict_path", str, "./conf/q2b.dic", "The path of the word replacement Dictionary.")
data_g.add_arg("infer_data", str, "./data/infer.tsv", "The folder where the training data is located.")
data_g.add_arg("init_checkpoint", str, "./model_baseline", "Path to init model")
data_g.add_arg("batch_size", int, 200, "The number of sequences contained in a mini-batch, "
"or the maximum number of tokens (include paddings) contained in a mini-batch.")
def do_infer(args):
dataset = reader.Dataset(args)
infer_program = fluid.Program()
with fluid.program_guard(infer_program, fluid.default_startup_program()):
with fluid.unique_name.guard():
infer_ret = creator.create_model(
args, dataset.vocab_size, dataset.num_labels, mode='infer')
infer_program = infer_program.clone(for_test=True)
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
place = fluid.CPUPlace()
pyreader = creator.create_pyreader(args, file_name=args.infer_data,
feed_list=infer_ret['feed_list'],
place=place,
model='lac',
reader=dataset,
mode='infer')
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# load model
utils.init_checkpoint(exe, args.init_checkpoint, infer_program)
result = infer_process(
exe=exe,
program=infer_program,
reader=pyreader,
fetch_vars=[infer_ret['words'], infer_ret['crf_decode']],
dataset=dataset
)
for sent, tags in result:
result_list = ['(%s, %s)' % (ch, tag) for ch, tag in zip(sent, tags)]
print(''.join(result_list))
def infer_process(exe, program, reader, fetch_vars, dataset):
"""
the function to execute the infer process
:param exe: the fluid Executor
:param program: the infer_program
:param reader: data reader
:return: the list of prediction result
"""
def input_check(data):
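# when the batch contains no tokens, return the (empty) words tensor so the
# caller can emit an empty result without running the program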
if data[0]['words'].lod()[0][-1] == 0:
return data[0]['words']
return None
results = []
for data in reader():
crf_decode = input_check(data)
if crf_decode:
results += utils.parse_result(crf_decode, crf_decode, dataset)
continue
words, crf_decode = exe.run(program,
fetch_list=fetch_vars,
feed=data,
return_numpy=False,
use_program_cache=True,
)
results += utils.parse_result(words, crf_decode, dataset)
return results
if __name__=="__main__":
args = parser.parse_args()
check_cuda(args.use_cuda)
do_infer(args)
@@ -73,18 +73,18 @@ class Dataset(object):
def get_num_examples(self, filename):
"""num of line of file"""
return sum(1 for line in io.open(filename, "r", encoding='utf-8'))
return sum(1 for line in open(filename, "r"))
def word_to_ids(self, words):
"""convert word to word index"""
word_ids = []
for word in words:
if word in self.word_replace_dict:
word = self.word_replace_dict[word]
word = self.word_replace_dict.get(word, word)
if word not in self.word2id_dict:
word = "OOV"
word_id = self.word2id_dict[word]
word_ids.append(word_id)
return word_ids
def label_to_ids(self, labels):
@@ -105,20 +105,19 @@ class Dataset(object):
def wrapper():
fread = io.open(filename, "r", encoding="utf-8")
headline = next(fread)
headline = headline.strip().split("\t")
if mode == "infer":
assert len(headline) == 1 and headline[0] == "text_a"
for line in fread:
words = line.strip("\n").split("\002")
words = line.strip()
word_ids = self.word_to_ids(words)
yield word_ids[0:max_seq_len], [0 for _ in word_ids][
0:max_seq_len]
yield (word_ids[0:max_seq_len],)
else:
assert len(headline) == 2 and headline[
0] == "text_a" and headline[1] == "label"
headline = next(fread)
headline = headline.strip().split('\t')
assert len(headline) == 2 and headline[0] == "text_a" and headline[1] == "label"
for line in fread:
words, labels = line.strip("\n").split("\t")
if len(words)<1:
continue
word_ids = self.word_to_ids(words.split("\002"))
label_ids = self.label_to_ids(labels.split("\002"))
assert len(word_ids) == len(label_ids)
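As the reader code above implies, `train.tsv` and `test.tsv` begin with a tab-separated `text_a`/`label` header, and each following line holds one sample whose characters and tags are joined by the `\002` control character (written literally as `\002` below); the tags in this sample are purely illustrative:

```text
text_a	label
他\002的\002话	r-B\002u-B\002n-B
```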
#!/bin/bash
export FLAGS_fraction_of_gpu_memory_to_use=0.5
export FLAGS_fraction_of_gpu_memory_to_use=0.02
export FLAGS_eager_delete_tensor_gb=0.0
export FLAGS_fast_eager_deletion_mode=1
export CUDA_VISIBLE_DEVICES=2 # which GPU to use
#alias python='./anaconda2/bin/python'
export CUDA_VISIBLE_DEVICES=0,1,2,3 # which GPU to use
function run_train() {
echo "training"
python run_sequence_labeling.py \
--do_train True \
--do_test True \
--do_infer False \
python train.py \
--train_data ./data/train.tsv \
--test_data ./data/test.tsv \
--model_save_dir ./models \
--valid_model_per_batches 1000 \
--save_model_per_batches 10000 \
--batch_size 100 \
--validation_steps 2 \
--save_steps 10 \
--print_steps 1 \
--batch_size 300 \
--epoch 10 \
--use_cuda true \
--traindata_shuffle_buffer 200000 \
--word_emb_dim 768 \
--grnn_hidden_dim 768 \
--traindata_shuffle_buffer 20000 \
--word_emb_dim 128 \
--grnn_hidden_dim 128 \
--bigru_num 2 \
--base_learning_rate 1e-3 \
--emb_learning_rate 5 \
--emb_learning_rate 2 \
--crf_learning_rate 0.2 \
--word_dict_path ./conf/word.dic \
--label_dict_path ./conf/tag.dic \
--word_rep_dict_path ./conf/q2b.dic
--word_rep_dict_path ./conf/q2b.dic \
--enable_ce false \
--use_cuda false \
--cpu_num 1
}
function run_train_single_gpu() {
echo "single gpu training" # which GPU to use
export CUDA_VISIBLE_DEVICES=0
python train.py \
--use_cuda true
}
function run_train_multi_gpu() {
echo "multi gpu training"
export CUDA_VISIBLE_DEVICES=0,1,2,3 # which GPU to use
python train.py \
--use_cuda true
}
function run_train_multi_cpu() {
echo "multi cpu training"
python train.py \
--use_cuda false \
--cpu_num 10 #cpu_num works only when use_cuda=false
}
function run_eval() {
echo "evaluating"
echo "this may cost about 5 minutes if run on you CPU machine"
python run_sequence_labeling.py \
--do_train False \
--do_test True \
--do_infer False \
--batch_size 80 \
--word_emb_dim 768 \
--grnn_hidden_dim 768 \
python eval.py \
--batch_size 200 \
--word_emb_dim 128 \
--grnn_hidden_dim 128 \
--bigru_num 2 \
--use_cuda True \
--use_cuda False \
--init_checkpoint ./model_baseline \
--test_data ./data/test.tsv \
--word_dict_path ./conf/word.dic \
@@ -54,42 +70,66 @@ function run_eval() {
function run_infer() {
echo "infering"
python run_sequence_labeling.py \
--do_train False \
--do_test False \
--do_infer True \
--batch_size 80 \
--word_emb_dim 768 \
--grnn_hidden_dim 768 \
python predict.py \
--batch_size 200 \
--word_emb_dim 128 \
--grnn_hidden_dim 128 \
--bigru_num 2 \
--use_cuda True \
--init_checkpoint ./model_baseline/ \
--infer_data ./data/test.tsv \
--use_cuda False \
--init_checkpoint ./model_baseline \
--infer_data ./data/infer.tsv \
--word_dict_path ./conf/word.dic \
--label_dict_path ./conf/tag.dic \
--word_rep_dict_path ./conf/q2b.dic
}
function run_inference() {
echo "inference model"
python inference_model.py \
--word_emb_dim 128 \
--grnn_hidden_dim 128 \
--bigru_num 2 \
--use_cuda False \
--init_checkpoint ./model_baseline \
--word_dict_path ./conf/word.dic \
--label_dict_path ./conf/tag.dic \
--word_rep_dict_path ./conf/q2b.dic \
--inference_save_dir ./infer_model
}
function main() {
local cmd=${1:-help}
case "${cmd}" in
train)
run_train "$@";
;;
train_single_gpu)
run_train_single_gpu "$@";
;;
train_multi_gpu)
run_train_multi_gpu "$@";
;;
train_multi_cpu)
run_train_multi_cpu "$@";
;;
eval)
run_eval "$@";
;;
infer)
run_infer "$@";
;;
inference)
run_inference "$@";
;;
help)
echo "Usage: ${BASH_SOURCE} {train|test|infer}";
echo "Usage: ${BASH_SOURCE} {train|train_single_gpu|train_multi_gpu|train_multi_cpu|eval|infer}";
return 0;
;;
*)
echo "unsupport command [${cmd}]";
echo "Usage: ${BASH_SOURCE} {train|eval|infer}";
echo "Usage: ${BASH_SOURCE} {train|train_single_gpu|train_multi_gpu|train_multi_cpu|eval|infer}";
return 1;
;;
esac
set -eux
#set -eux
export FLAGS_fraction_of_gpu_memory_to_use=0.02
export FLAGS_eager_delete_tensor_gb=0.0
export FLAGS_fast_eager_deletion_mode=1
export FLAGS_sync_nccl_allreduce=1
export FLAGS_selected_gpus=0 # which GPU to use
export CUDA_VISIBLE_DEVICES=0
# export FLAGS_sync_nccl_allreduce=1
# export NCCL_DEBUG=INFO
# export NCCL_IB_GID_INDEX=3
# export GLOG_v=1
# export GLOG_logtostderr=1
export CUDA_VISIBLE_DEVICES=0 # which GPU to use
ERNIE_PRETRAINED_MODEL_PATH=./pretrained/
ERNIE_FINETUNED_MODEL_PATH=./model_finetuned/
ERNIE_FINETUNED_MODEL_PATH=./model_finetuned
DATA_PATH=./data/
# train
function run_train() {
echo "training"
python run_ernie_sequence_labeling.py \
--mode train \
--ernie_config_path "${ERNIE_PRETRAINED_MODEL_PATH}/ernie_config.json" \
--checkpoints "./checkpoints" \
--model_save_dir "./ernie_models" \
--init_pretraining_params "${ERNIE_PRETRAINED_MODEL_PATH}/params/" \
--epoch 10 \
--save_steps 1000 \
--validation_steps 1000 \
--lr 2e-4 \
--save_steps 5 \
--validation_steps 5 \
--base_learning_rate 2e-4 \
--crf_learning_rate 0.2 \
--init_bound 0.1 \
--skip_steps 1 \
--print_steps 1 \
--vocab_path "${ERNIE_PRETRAINED_MODEL_PATH}/vocab.txt" \
--batch_size 64 \
--batch_size 3 \
--random_seed 0 \
--num_labels 57 \
--max_seq_len 128 \
--train_set "${DATA_PATH}/train.tsv" \
--test_set "${DATA_PATH}/test.tsv" \
--train_data "${DATA_PATH}/train.tsv" \
--test_data "${DATA_PATH}/test.tsv" \
--label_map_config "./conf/label_map.json" \
--do_lower_case true \
--use_cuda false \
--do_train true \
--do_test true \
--do_infer false
--cpu_num 1
}
function run_train_single_gpu() {
echo "single gpu training" # which GPU to use
export CUDA_VISIBLE_DEVICES=0
python run_ernie_sequence_labeling.py \
--mode train \
--ernie_config_path "${ERNIE_PRETRAINED_MODEL_PATH}/ernie_config.json" \
--init_pretraining_params "${ERNIE_PRETRAINED_MODEL_PATH}/params/" \
--vocab_path "${ERNIE_PRETRAINED_MODEL_PATH}/vocab.txt" \
--use_cuda true
}
function run_train_multi_cpu() {
echo "multi cpu training"
python run_ernie_sequence_labeling.py \
--mode train \
--ernie_config_path "${ERNIE_PRETRAINED_MODEL_PATH}/ernie_config.json" \
--init_pretraining_params "${ERNIE_PRETRAINED_MODEL_PATH}/params/" \
--vocab_path "${ERNIE_PRETRAINED_MODEL_PATH}/vocab.txt" \
--use_cuda false \
--batch_size 64 \
--cpu_num 8 #cpu_num works only when use_cuda=false
}
function run_eval() {
echo "evaluating"
python run_ernie_sequence_labeling.py \
--mode eval \
--ernie_config_path "${ERNIE_PRETRAINED_MODEL_PATH}/ernie_config.json" \
--init_pretraining_params "${ERNIE_PRETRAINED_MODEL_PATH}/params/" \
--init_checkpoint "${ERNIE_FINETUNED_MODEL_PATH}" \
--init_bound 0.1 \
--vocab_path "${ERNIE_PRETRAINED_MODEL_PATH}/vocab.txt" \
@@ -52,21 +77,18 @@ function run_eval() {
--random_seed 0 \
--num_labels 57 \
--max_seq_len 128 \
--test_set "${DATA_PATH}/test.tsv" \
--test_data "${DATA_PATH}/test.tsv" \
--label_map_config "./conf/label_map.json" \
--do_lower_case true \
--use_cuda true \
--do_train false \
--do_test true \
--do_infer false
}
--use_cuda false
}
function run_infer() {
echo "infering"
python run_ernie_sequence_labeling.py \
--mode infer \
--ernie_config_path "${ERNIE_PRETRAINED_MODEL_PATH}/ernie_config.json" \
--init_pretraining_params "${ERNIE_PRETRAINED_MODEL_PATH}/params/" \
--init_checkpoint "${ERNIE_FINETUNED_MODEL_PATH}" \
--init_bound 0.1 \
--vocab_path "${ERNIE_PRETRAINED_MODEL_PATH}/vocab.txt" \
@@ -74,13 +96,11 @@ function run_infer() {
--random_seed 0 \
--num_labels 57 \
--max_seq_len 128 \
--infer_set "${DATA_PATH}/test.tsv" \
--test_data "${DATA_PATH}/test.tsv" \
--label_map_config "./conf/label_map.json" \
--do_lower_case true \
--use_cuda true \
--do_train false \
--do_test false \
--do_infer true
--use_cuda false
}
@@ -90,6 +110,12 @@ function main() {
train)
run_train "$@";
;;
train_single_gpu)
run_train_single_gpu "$@";
;;
train_multi_cpu)
run_train_multi_cpu "$@";
;;
eval)
run_eval "$@";
;;
@@ -97,12 +123,12 @@ function main() {
run_infer "$@";
;;
help)
echo "Usage: ${BASH_SOURCE} {train|test|infer}";
echo "Usage: ${BASH_SOURCE} {train|train_single_gpu|train_multi_cpu|eval|infer}";
return 0;
;;
*)
echo "unsupport command [${cmd}]";
echo "Usage: ${BASH_SOURCE} {train|eval|infer}";
echo "Usage: ${BASH_SOURCE} {train|train_single_gpu|train_multi_cpu|eval|infer}";
return 1;
;;
esac
@@ -12,7 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Sentiment Classification Task
Baidu's open-source Lexical Analysis tool for Chinese, including:
1. Word Segmentation,
2. Part-of-Speech Tagging
3. Named Entity Recognition
"""
from __future__ import absolute_import
@@ -21,378 +24,261 @@ from __future__ import print_function
import os
import time
import math
import argparse
import numpy as np
import multiprocessing
import sys
from collections import namedtuple
import paddle
import paddle.fluid as fluid
from collections import namedtuple
import creator
import utils
sys.path.append("..")
print(sys.path)
from preprocess.ernie import task_reader
from models.representation.ernie import ErnieConfig
from models.representation.ernie import ernie_encoder
#from models.representation.ernie import ernie_pyreader
from models.sequence_labeling import nets
import utils
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
model_g = utils.ArgumentGroup(parser, "model", "model configuration and paths.")
model_g.add_arg("ernie_config_path", str, "../LARK/ERNIE/config/ernie_config.json",
"Path to the json file for ernie model config.")
model_g.add_arg("lac_config_path", str, None, "Path to the json file for LAC model config.")
model_g.add_arg("init_checkpoint", str, None, "Init checkpoint to resume training from.")
model_g.add_arg("checkpoints", str, None, "Path to save checkpoints")
model_g.add_arg("init_pretraining_params", str, "pretrained/params/",
"Init pre-training params which preforms fine-tuning from. If the "
"arg 'init_checkpoint' has been set, this argument wouldn't be valid.")
train_g = utils.ArgumentGroup(parser, "training", "training options.")
train_g.add_arg("epoch", int, 10, "Number of epoches for training.")
train_g.add_arg("save_steps", int, 10000, "The steps interval to save checkpoints.")
train_g.add_arg("validation_steps", int, 1000, "The steps interval to evaluate model performance.")
train_g.add_arg("lr", float, 0.001, "The Learning rate value for training.")
train_g.add_arg("crf_learning_rate", float, 0.2,
"The real learning rate of the embedding layer will be (crf_learning_rate * base_learning_rate).")
train_g.add_arg("init_bound", float, 0.1, "init bound for initialization.")
log_g = utils.ArgumentGroup(parser, "logging", "logging related")
log_g.add_arg("skip_steps", int, 1, "The steps interval to print loss.")
data_g = utils.ArgumentGroup(parser, "data", "Data paths, vocab paths and data processing options")
data_g.add_arg("vocab_path", str, "../LARK/ERNIE/config/vocab.txt", "Vocabulary path.")
data_g.add_arg("batch_size", int, 3, "Total examples' number in batch for training.")
data_g.add_arg("random_seed", int, 0, "Random seed.")
data_g.add_arg("num_labels", int, 57, "label number")
data_g.add_arg("max_seq_len", int, 512, "Number of words of the longest seqence.")
data_g.add_arg("train_set", str, "./data/train.tsv", "Path to train data.")
data_g.add_arg("test_set", str, "./data/test.tsv", "Path to test data.")
data_g.add_arg("infer_set", str, "./data/test.tsv", "Path to infer data.")
data_g.add_arg("label_map_config", str, "./conf/label_map.json", "label_map_path.")
data_g.add_arg("do_lower_case", bool, True,
"Whether to lower case the input text. Should be True for uncased models and False for cased models.")
run_type_g = utils.ArgumentGroup(parser, "run_type", "running type options.")
run_type_g.add_arg("use_cuda", bool, True, "If set, use GPU for training.")
run_type_g.add_arg("do_train", bool, True, "Whether to perform training.")
run_type_g.add_arg("do_test", bool, True, "Whether to perform testing.")
run_type_g.add_arg("do_infer", bool, True, "Whether to perform inference.")
args = parser.parse_args()
# yapf: enable.
sys.path.append('../models/')
from model_check import check_cuda
check_cuda(args.use_cuda)
def ernie_pyreader(args, pyreader_name):
"""define standard ernie pyreader"""
pyreader = fluid.layers.py_reader(
capacity=50,
shapes=[[-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
[-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1], [-1, args.max_seq_len, 1],
[-1, 1]],
dtypes=['int64', 'int64', 'int64', 'float32', 'int64', 'int64'],
lod_levels=[0, 0, 0, 0, 0, 0],
name=pyreader_name,
use_double_buffer=True)
(src_ids, sent_ids, pos_ids, input_mask, padded_labels, seq_lens) = fluid.layers.read_file(pyreader)
words = fluid.layers.sequence_unpad(src_ids, seq_lens)
labels = fluid.layers.sequence_unpad(padded_labels, seq_lens)
ernie_inputs = {
"src_ids": src_ids,
"sent_ids": sent_ids,
"pos_ids": pos_ids,
"input_mask": input_mask,
"seq_lens": seq_lens
}
return pyreader, ernie_inputs, words, labels
def create_model(args,
embeddings,
labels,
is_prediction=False):
"""
Create Model for LAC based on ERNIE encoder
"""
# sentence_embeddings = embeddings["sentence_embeddings"]
token_embeddings = embeddings["token_embeddings"]
emission = fluid.layers.fc(
size=args.num_labels,
input=token_embeddings,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=-args.init_bound, high=args.init_bound),
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=1e-4)))
crf_cost = fluid.layers.linear_chain_crf(
input=emission,
label=labels,
param_attr=fluid.ParamAttr(
name='crfw',
learning_rate=args.crf_learning_rate))
loss = fluid.layers.mean(x=crf_cost)
crf_decode = fluid.layers.crf_decoding(
input=emission, param_attr=fluid.ParamAttr(name='crfw'))
(precision, recall, f1_score, num_infer_chunks, num_label_chunks,
num_correct_chunks) = fluid.layers.chunk_eval(
input=crf_decode,
label=labels,
chunk_scheme="IOB",
num_chunk_types=int(math.ceil((args.num_labels - 1) / 2.0)))
chunk_evaluator = fluid.metrics.ChunkEvaluator()
chunk_evaluator.reset()
ret = {
"loss":loss,
"crf_decode":crf_decode,
"chunk_evaluator":chunk_evaluator,
"num_infer_chunks":num_infer_chunks,
"num_label_chunks":num_label_chunks,
"num_correct_chunks":num_correct_chunks
}
return ret
from models.model_check import check_cuda
def evaluate(exe, test_program, test_pyreader, test_ret):
"""
Evaluation Function
"""
test_pyreader.start()
test_ret["chunk_evaluator"].reset()
total_loss, precision, recall, f1 = [], [], [], []
total_loss = []
start_time = time.time()
while True:
try:
loss, nums_infer, nums_label, nums_correct = exe.run(
test_program,
fetch_list=[
test_ret["loss"],
test_ret["num_infer_chunks"],
test_ret["num_label_chunks"],
test_ret["num_correct_chunks"],
],
)
total_loss.append(loss)
test_ret["chunk_evaluator"].update(nums_infer, nums_label, nums_correct)
p, r, f = test_ret["chunk_evaluator"].eval()
precision.append(p)
recall.append(r)
f1.append(f)
except fluid.core.EOFException:
test_pyreader.reset()
break
for data in test_pyreader():
loss, nums_infer, nums_label, nums_correct = exe.run(
test_program,
fetch_list=[
test_ret["avg_cost"],
test_ret["num_infer_chunks"],
test_ret["num_label_chunks"],
test_ret["num_correct_chunks"],
],
feed=data[0]
)
total_loss.append(loss)
test_ret["chunk_evaluator"].update(nums_infer, nums_label, nums_correct)
precision, recall, f1 = test_ret["chunk_evaluator"].eval()
end_time = time.time()
print("\t[test] loss: %.5f, P: %.5f, R: %.5f, F1: %.5f, elapsed time: %.3f s"
% (np.mean(total_loss), np.mean(precision), np.mean(recall), np.mean(f1), end_time - start_time))
print("\t[test] loss: %.5f, P: %.5f, R: %.5f, F1: %.5f, elapsed time: %.3f s"
% (np.mean(total_loss), precision, recall, f1, end_time - start_time))
def main(args):
def do_train(args):
"""
Main Function
"""
args = parser.parse_args()
ernie_config = ErnieConfig(args.ernie_config_path)
ernie_config.print_config()
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
dev_count = fluid.core.get_cuda_device_count()
dev_count = 1
else:
dev_count = min(multiprocessing.cpu_count(), args.cpu_num)
if dev_count < args.cpu_num:
print("WARNING: The total CPU NUM in this machine is %d, which is less than the cpu_num parameter you set. "
"Changing cpu_num from %d to %d" % (dev_count, args.cpu_num, dev_count))
os.environ['CPU_NUM'] = str(dev_count)
place = fluid.CPUPlace()
dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
exe = fluid.Executor(place)
reader = task_reader.SequenceLabelReader(
vocab_path=args.vocab_path,
label_map_config=args.label_map_config,
max_seq_len=args.max_seq_len,
do_lower_case=args.do_lower_case,
in_tokens=False,
random_seed=args.random_seed)
exe = fluid.Executor(place)
if not (args.do_train or args.do_test or args.do_infer):
raise ValueError("For args `do_train`, `do_val` and `do_test`, at "
"least one of them must be True.")
startup_prog = fluid.Program()
if args.random_seed is not None:
startup_prog.random_seed = args.random_seed
if args.do_train:
num_train_examples = reader.get_num_examples(args.train_set)
max_train_steps = args.epoch * num_train_examples // args.batch_size // dev_count
print("Device count: %d" % dev_count)
print("Num train examples: %d" % num_train_examples)
print("Max train steps: %d" % max_train_steps)
train_program = fluid.Program()
with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard():
# create ernie_pyreader
train_pyreader, ernie_inputs, words, labels = ernie_pyreader(args, pyreader_name='train_reader')
train_pyreader.decorate_tensor_provider(
reader.data_generator(
args.train_set, args.batch_size, args.epoch, shuffle=True, phase="train"
)
)
# get ernie_embeddings
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config)
# user defined model based on ernie embeddings
train_ret = create_model(args, embeddings, labels=labels, is_prediction=False)
optimizer = fluid.optimizer.Adam(learning_rate=args.lr)
fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0))
optimizer.minimize(train_ret["loss"])
lower_mem, upper_mem, unit = fluid.contrib.memory_usage(
program=train_program, batch_size=args.batch_size)
print("Theoretical memory usage in training: %.3f - %.3f %s" %
(lower_mem, upper_mem, unit))
if args.do_test:
test_program = fluid.Program()
with fluid.program_guard(test_program, startup_prog):
with fluid.unique_name.guard():
# create ernie_pyreader
test_pyreader, ernie_inputs, words, labels = ernie_pyreader(args, pyreader_name='test_reader')
test_pyreader.decorate_tensor_provider(
reader.data_generator(
args.test_set, args.batch_size, phase='test', epoch=1, shuffle=False
)
)
# get ernie_embeddings
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config)
# user defined model based on ernie embeddings
test_ret = create_model(args, embeddings, labels=labels, is_prediction=False)
test_program = test_program.clone(for_test=True)
if args.do_infer:
infer_program = fluid.Program()
with fluid.program_guard(infer_program, startup_prog):
with fluid.unique_name.guard():
# create ernie_pyreader
infer_pyreader, ernie_inputs, words, labels = ernie_pyreader(args, pyreader_name='infer_reader')
infer_pyreader.decorate_tensor_provider(
reader.data_generator(
args.infer_set, args.batch_size, phase='infer', epoch=1, shuffle=False
)
)
# get ernie_embeddings
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config)
# user defined model based on ernie embeddings
infer_ret = create_model(args, embeddings, labels=labels, is_prediction=True)
infer_ret["words"] = words
infer_program = infer_program.clone(for_test=True)
train_program = fluid.Program()
with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard():
# user defined model based on ernie embeddings
train_ret = creator.create_ernie_model(args, ernie_config)
# ernie pyreader
train_pyreader = creator.create_pyreader(args, file_name=args.train_data,
feed_list=train_ret['feed_list'],
model="ernie",
place=place)
test_program = train_program.clone(for_test=True)
test_pyreader = creator.create_pyreader(args, file_name=args.test_data,
feed_list=train_ret['feed_list'],
model="ernie",
place=place)
optimizer = fluid.optimizer.Adam(learning_rate=args.base_learning_rate)
fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0))
optimizer.minimize(train_ret["avg_cost"])
exe.run(startup_prog)
lower_mem, upper_mem, unit = fluid.contrib.memory_usage(
program=train_program, batch_size=args.batch_size)
print("Theoretical memory usage in training: %.3f - %.3f %s" %
(lower_mem, upper_mem, unit))
print("Device count: %d" % dev_count)
exe.run(startup_prog)
# load checkpoints
if args.do_train:
if args.init_checkpoint and args.init_pretraining_params:
print("WARNING: args 'init_checkpoint' and 'init_pretraining_params' "
"both are set! Only arg 'init_checkpoint' is made valid.")
if args.init_checkpoint:
utils.init_checkpoint(exe, args.init_checkpoint, startup_prog)
elif args.init_pretraining_params:
utils.init_pretraining_params(exe, args.init_pretraining_params, startup_prog)
elif args.do_test or args.do_infer:
if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if only doing test or infer!")
if args.init_checkpoint and args.init_pretraining_params:
print("WARNING: args 'init_checkpoint' and 'init_pretraining_params' "
"both are set! Only arg 'init_checkpoint' is made valid.")
if args.init_checkpoint:
utils.init_checkpoint(exe, args.init_checkpoint, startup_prog)
if args.do_train:
train_pyreader.start()
steps = 0
total_cost, total_acc, total_num_seqs = [], [], []
while True:
try:
steps += 1
if steps % args.skip_steps == 0:
fetch_list = [
train_ret["loss"],
train_ret["num_infer_chunks"],
train_ret["num_label_chunks"],
train_ret["num_correct_chunks"],
]
else:
fetch_list = []
start_time = time.time()
outputs = exe.run(program=train_program, fetch_list=fetch_list)
end_time = time.time()
if steps % args.skip_steps == 0:
loss, nums_infer, nums_label, nums_correct = outputs
train_ret["chunk_evaluator"].reset()
train_ret["chunk_evaluator"].update(nums_infer, nums_label, nums_correct)
precision, recall, f1_score = train_ret["chunk_evaluator"].eval()
print("[train] batch_id = %d, loss = %.5f, P: %.5f, R: %.5f, F1: %.5f, elapsed time %.5f, "
"pyreader queue_size: %d " % (steps, loss, precision, recall, f1_score,
end_time - start_time, train_pyreader.queue.size()))
if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoints, "step_" + str(steps))
print("\tsaving model as %s" % (save_path))
fluid.io.save_persistables(exe, save_path, train_program)
if steps % args.validation_steps == 0:
# evaluate test set
if args.do_test:
evaluate(exe, test_program, test_pyreader, test_ret)
except fluid.core.EOFException:
save_path = os.path.join(args.checkpoints, "step_" + str(steps))
elif args.init_pretraining_params:
utils.init_pretraining_params(exe, args.init_pretraining_params, startup_prog)
if dev_count > 1 and not args.use_cuda:
device = "GPU" if args.use_cuda else "CPU"
print("%d %s devices are used to train the model" % (dev_count, device))
# multi cpu/gpu config
exec_strategy = fluid.ExecutionStrategy()
build_strategy = fluid.BuildStrategy()
compiled_prog = fluid.compiler.CompiledProgram(train_program).with_data_parallel(
loss_name=train_ret['avg_cost'].name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
else:
compiled_prog = fluid.compiler.CompiledProgram(train_program)
# start training
steps = 0
for epoch_id in range(args.epoch):
for data in train_pyreader():
steps += 1
if steps % args.print_steps == 0:
fetch_list = [
train_ret["avg_cost"],
train_ret["precision"],
train_ret["recall"],
train_ret["f1_score"],
]
else:
fetch_list = []
start_time = time.time()
outputs = exe.run(program=compiled_prog, feed=data[0], fetch_list=fetch_list)
end_time = time.time()
if steps % args.print_steps == 0:
loss, precision, recall, f1_score = [np.mean(x) for x in outputs]
print("[train] batch_id = %d, loss = %.5f, P: %.5f, R: %.5f, F1: %.5f, elapsed time %.5f, "
"pyreader queue_size: %d " % (steps, loss, precision, recall, f1_score,
end_time - start_time, train_pyreader.queue.size()))
if steps % args.save_steps == 0:
save_path = os.path.join(args.model_save_dir, "step_" + str(steps))
print("\tsaving model as %s" % (save_path))
fluid.io.save_persistables(exe, save_path, train_program)
train_pyreader.reset()
break
# final eval on test set
if args.do_test:
evaluate(exe, test_program, test_pyreader, test_ret)
if args.do_infer:
# create dict
id2word_dict = dict([(str(word_id), word) for word, word_id in reader.vocab.items()])
id2label_dict = dict([(str(label_id), label) for label, label_id in reader.label_map.items()])
Dataset = namedtuple("Dataset", ["id2word_dict", "id2label_dict"])
dataset = Dataset(id2word_dict, id2label_dict)
infer_pyreader.start()
while True:
try:
(words, crf_decode) = exe.run(infer_program,
fetch_list=[infer_ret["words"], infer_ret["crf_decode"]],
return_numpy=False)
# Note that words are clipped if longer than args.max_seq_len
results = utils.parse_result(words, crf_decode, dataset)
for result in results:
print(result)
except fluid.core.EOFException:
infer_pyreader.reset()
break
if steps % args.validation_steps == 0:
evaluate(exe, test_program, test_pyreader, train_ret)
save_path = os.path.join(args.model_save_dir, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
def do_eval(args):
# init executor
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
place = fluid.CPUPlace()
ernie_config = ErnieConfig(args.ernie_config_path)
ernie_config.print_config()
test_program = fluid.Program()
with fluid.program_guard(test_program, fluid.default_startup_program()):
with fluid.unique_name.guard():
test_ret = creator.create_ernie_model(args, ernie_config)
test_program = test_program.clone(for_test=True)
pyreader = creator.create_pyreader(args, file_name=args.test_data,
feed_list=test_ret['feed_list'],
model="ernie",
place=place,
mode='test',)
print('program startup')
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
print('program loading')
# load model
if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if only doing test or infer!")
utils.init_checkpoint(exe, args.init_checkpoint, test_program)
evaluate(exe, test_program, pyreader, test_ret)
def do_infer(args):
# init executor
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
place = fluid.CPUPlace()
# define network and reader
ernie_config = ErnieConfig(args.ernie_config_path)
ernie_config.print_config()
infer_program = fluid.Program()
with fluid.program_guard(infer_program, fluid.default_startup_program()):
with fluid.unique_name.guard():
infer_ret = creator.create_ernie_model(args, ernie_config)
infer_program = infer_program.clone(for_test=True)
print(args.test_data)
pyreader, reader = creator.create_pyreader(args, file_name=args.test_data,
feed_list=infer_ret['feed_list'],
model="ernie",
place=place,
return_reader=True,
mode='test')
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# load model
if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if only doing test or infer!")
utils.init_checkpoint(exe, args.init_checkpoint, infer_program)
# create dict
id2word_dict = dict([(str(word_id), word) for word, word_id in reader.vocab.items()])
id2label_dict = dict([(str(label_id), label) for label, label_id in reader.label_map.items()])
Dataset = namedtuple("Dataset", ["id2word_dict", "id2label_dict"])
dataset = Dataset(id2word_dict, id2label_dict)
# make prediction
for data in pyreader():
(words, crf_decode) = exe.run(infer_program,
fetch_list=[infer_ret["words"], infer_ret["crf_decode"]],
feed=data[0],
return_numpy=False)
# note that words have been clipped by the reader if they were longer than args.max_seq_len
results = utils.parse_result(words, crf_decode, dataset)
for sent, tags in results:
result_list = ['(%s, %s)' % (ch, tag) for ch, tag in zip(sent, tags)]
print(''.join(result_list))
if __name__ == "__main__":
parser = argparse.ArgumentParser(__doc__)
utils.load_yaml(parser, './conf/ernie_args.yaml')
args = parser.parse_args()
check_cuda(args.use_cuda)
utils.print_arguments(args)
if args.mode == 'train':
do_train(args)
elif args.mode == 'eval':
do_eval(args)
elif args.mode == 'infer':
do_infer(args)
else:
print("Usage: %s --mode train|eval|infer " % sys.argv[0])
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Baidu's open-source Lexical Analysis tool for Chinese, including:
1. Word Segmentation,
2. Part-of-Speech Tagging,
3. Named Entity Recognition
"""
from __future__ import print_function
import os
import sys
import math
import time
import random
import argparse
import multiprocessing
import numpy as np
import paddle
import paddle.fluid as fluid
import reader
import utils
sys.path.append("../")
from models.sequence_labeling import nets
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
# 1. model parameters
model_g = utils.ArgumentGroup(parser, "model", "model configuration")
model_g.add_arg("word_emb_dim", int, 128, "The dimension in which a word is embedded.")
model_g.add_arg("grnn_hidden_dim", int, 256, "The number of hidden nodes in the GRNN layer.")
model_g.add_arg("bigru_num", int, 2, "The number of bi_gru layers in the network.")
# 2. data parameters
data_g = utils.ArgumentGroup(parser, "data", "data paths")
data_g.add_arg("word_dict_path", str, "./conf/word.dic", "The path of the word dictionary.")
data_g.add_arg("label_dict_path", str, "./conf/tag.dic", "The path of the label dictionary.")
data_g.add_arg("word_rep_dict_path", str, "./conf/q2b.dic", "The path of the word replacement Dictionary.")
data_g.add_arg("train_data", str, "./data/train.tsv", "The folder where the training data is located.")
data_g.add_arg("test_data", str, "./data/test.tsv", "The folder where the training data is located.")
data_g.add_arg("infer_data", str, "./data/test.tsv", "The folder where the training data is located.")
data_g.add_arg("model_save_dir", str, "./models", "The model will be saved in this path.")
data_g.add_arg("init_checkpoint", str, "", "Path to init model")
# 3. train parameters
train_g = utils.ArgumentGroup(parser, "training", "training options")
train_g.add_arg("do_train", bool, True, "whether to perform training")
train_g.add_arg("do_test", bool, True, "whether to perform testing")
train_g.add_arg("do_infer", bool, False, "whether to perform inference")
train_g.add_arg("random_seed", int, 0, "random seed for training")
train_g.add_arg("save_model_per_batches", int, 10000, "Save the model once per xxxx batch of training")
train_g.add_arg("valid_model_per_batches", int, 1000, "Do the validation once per xxxx batch of training")
train_g.add_arg("batch_size", int, 80, "The number of sequences contained in a mini-batch, "
"or the maximum number of tokens (include paddings) contained in a mini-batch.")
train_g.add_arg("epoch", int, 10, "corpus iteration num")
train_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.")
train_g.add_arg("traindata_shuffle_buffer", int, 200, "The buffer size used in shuffle the training data.")
train_g.add_arg("base_learning_rate", float, 1e-3, "The basic learning rate that affects the entire network.")
train_g.add_arg("emb_learning_rate", float, 5,
"The real learning rate of the embedding layer will be (emb_learning_rate * base_learning_rate).")
train_g.add_arg("crf_learning_rate", float, 0.2,
"The real learning rate of the embedding layer will be (crf_learning_rate * base_learning_rate).")
parser.add_argument('--enable_ce', action='store_true', help='If set, run the task with continuous evaluation logs.')
args = parser.parse_args()
# yapf: enable.
sys.path.append('../models/')
from model_check import check_cuda
check_cuda(args.use_cuda)
print(args)
def create_model(args, pyreader_name, vocab_size, num_labels):
"""create lac model"""
pyreader = fluid.layers.py_reader(
capacity=50,
shapes=([-1, 1], [-1, 1]),
dtypes=('int64', 'int64'),
lod_levels=(1, 1),
name=pyreader_name,
use_double_buffer=False)
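# lod_levels=(1, 1): words and targets are variable-length sequences, so the
# reader yields them as level-1 LoD tensors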
words, targets = fluid.layers.read_file(pyreader)
avg_cost, crf_decode = nets.lex_net(words, targets, args, vocab_size, num_labels)
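# chunk_eval assumes the IOB scheme: every chunk type contributes a -B and a
# -I label, and the single "O" label accounts for the remaining one, hence
# (num_labels - 1) / 2 chunk types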
(precision, recall, f1_score, num_infer_chunks, num_label_chunks,
num_correct_chunks) = fluid.layers.chunk_eval(
input=crf_decode,
label=targets,
chunk_scheme="IOB",
num_chunk_types=int(math.ceil((num_labels - 1) / 2.0)))
chunk_evaluator = fluid.metrics.ChunkEvaluator()
chunk_evaluator.reset()
ret = {
"pyreader": pyreader,
"words": words,
"targets": targets,
"avg_cost": avg_cost,
"crf_decode": crf_decode,
"chunk_evaluator": chunk_evaluator,
"num_infer_chunks": num_infer_chunks,
"num_label_chunks": num_label_chunks,
"num_correct_chunks": num_correct_chunks
}
return ret
def evaluate(exe, test_program, test_ret):
"""evaluate for test data"""
test_ret["pyreader"].start()
test_ret["chunk_evaluator"].reset()
loss = []
precision = []
recall = []
f1 = []
start_time = time.time()
while True:
try:
avg_loss, nums_infer, nums_label, nums_correct = exe.run(
test_program,
fetch_list=[
test_ret["avg_cost"],
test_ret["num_infer_chunks"],
test_ret["num_label_chunks"],
test_ret["num_correct_chunks"],
],
)
loss.append(avg_loss)
test_ret["chunk_evaluator"].update(nums_infer, nums_label, nums_correct)
p, r, f = test_ret["chunk_evaluator"].eval()
precision.append(p)
recall.append(r)
f1.append(f)
except fluid.core.EOFException:
test_ret["pyreader"].reset()
break
end_time = time.time()
print("[test] avg loss: %.5f, P: %.5f, R: %.5f, F1: %.5f, elapsed time: %.3f s"
% (np.mean(loss), np.mean(precision),
np.mean(recall), np.mean(f1), end_time - start_time))
def main(args):
startup_program = fluid.Program()
if args.random_seed is not None:
startup_program.random_seed = args.random_seed
# prepare dataset
dataset = reader.Dataset(args)
if args.do_train:
train_program = fluid.Program()
if args.random_seed is not None:
train_program.random_seed = args.random_seed
with fluid.program_guard(train_program, startup_program):
with fluid.unique_name.guard():
train_ret = create_model(
args, "train_reader", dataset.vocab_size, dataset.num_labels)
train_ret["pyreader"].decorate_paddle_reader(
paddle.batch(
paddle.reader.shuffle(
dataset.file_reader(args.train_data),
buf_size=args.traindata_shuffle_buffer
),
batch_size=args.batch_size
)
)
optimizer = fluid.optimizer.Adam(learning_rate=args.base_learning_rate)
optimizer.minimize(train_ret["avg_cost"])
if args.do_test:
test_program = fluid.Program()
with fluid.program_guard(test_program, startup_program):
with fluid.unique_name.guard():
test_ret = create_model(
args, "test_reader", dataset.vocab_size, dataset.num_labels)
test_ret["pyreader"].decorate_paddle_reader(
paddle.batch(
dataset.file_reader(args.test_data),
batch_size=args.batch_size
)
)
test_program = test_program.clone(for_test=True) # to share parameters with train model
if args.do_infer:
infer_program = fluid.Program()
with fluid.program_guard(infer_program, startup_program):
with fluid.unique_name.guard():
infer_ret = create_model(
args, "infer_reader", dataset.vocab_size, dataset.num_labels)
infer_ret["pyreader"].decorate_paddle_reader(
paddle.batch(
dataset.file_reader(args.infer_data),
batch_size=args.batch_size
)
)
infer_program = infer_program.clone(for_test=True)
# init executor
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
dev_count = fluid.core.get_cuda_device_count()
else:
place = fluid.CPUPlace()
dev_count = multiprocessing.cpu_count()
exe = fluid.Executor(place)
exe.run(startup_program)
# load checkpoints
if args.do_train:
if args.init_checkpoint:
utils.init_checkpoint(exe, args.init_checkpoint, train_program)
elif args.do_test:
if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if only doing validation or testing!")
utils.init_checkpoint(exe, args.init_checkpoint, test_program)
if args.do_infer:
utils.init_checkpoint(exe, args.init_checkpoint, infer_program)
# start training
if args.do_train:
num_train_examples = dataset.get_num_examples(args.train_data)
max_train_steps = args.epoch * num_train_examples // args.batch_size
print("Num train examples: %d" % num_train_examples)
print("Max train steps: %d" % max_train_steps)
ce_info = []
batch_id = 0
for epoch_id in range(args.epoch):
train_ret["pyreader"].start()
ce_time = 0
try:
while True:
start_time = time.time()
avg_cost, nums_infer, nums_label, nums_correct = exe.run(
train_program,
fetch_list=[
train_ret["avg_cost"],
train_ret["num_infer_chunks"],
train_ret["num_label_chunks"],
train_ret["num_correct_chunks"],
],
)
end_time = time.time()
train_ret["chunk_evaluator"].reset()
train_ret["chunk_evaluator"].update(nums_infer, nums_label, nums_correct)
precision, recall, f1_score = train_ret["chunk_evaluator"].eval()
batch_id += 1
print("[train] batch_id = %d, loss = %.5f, P: %.5f, R: %.5f, F1: %.5f, elapsed time %.5f " % (
batch_id, avg_cost, precision, recall, f1_score, end_time - start_time))
ce_time += end_time - start_time
ce_info.append([ce_time, avg_cost, precision, recall, f1_score])
# save checkpoints
if (batch_id % args.save_model_per_batches == 0):
save_path = os.path.join(args.model_save_dir, "step_" + str(batch_id))
fluid.io.save_persistables(exe, save_path, train_program)
# evaluate
if (batch_id % args.valid_model_per_batches == 0) and args.do_test:
evaluate(exe, test_program, test_ret)
except fluid.core.EOFException:
save_path = os.path.join(args.model_save_dir, "step_" + str(batch_id))
fluid.io.save_persistables(exe, save_path, train_program)
train_ret["pyreader"].reset()
if args.do_train and args.enable_ce:
card_num = get_cards()
ce_cost = 0
ce_f1 = 0
ce_p = 0
ce_r = 0
ce_time = 0
try:
ce_time = ce_info[-2][0]
ce_cost = ce_info[-2][1]
ce_p = ce_info[-2][2]
ce_r = ce_info[-2][3]
ce_f1 = ce_info[-2][4]
except Exception:
print("ce info error")
print("kpis\teach_step_duration_card%s\t%s" %
(card_num, ce_time))
print("kpis\ttrain_cost_card%s\t%f" %
(card_num, ce_cost))
print("kpis\ttrain_precision_card%s\t%f" %
(card_num, ce_p))
print("kpis\ttrain_recall_card%s\t%f" %
(card_num, ce_r))
print("kpis\ttrain_f1_card%s\t%f" %
(card_num, ce_f1))
# final evaluation on the test set
if args.do_test:
evaluate(exe, test_program, test_ret)
if args.do_infer:
infer_ret["pyreader"].start()
while True:
try:
(words, crf_decode) = exe.run(infer_program,
fetch_list=[
infer_ret["words"],
infer_ret["crf_decode"],
],
return_numpy=False)
results = utils.parse_result(words, crf_decode, dataset)
for result in results:
print(result)
except fluid.core.EOFException:
infer_ret["pyreader"].reset()
break
def get_cards():
num = 0
cards = os.environ.get('CUDA_VISIBLE_DEVICES', '')
if cards != '':
num = len(cards.split(","))
return num
if __name__ == "__main__":
main(args)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: UTF-8 -*-
import os
import sys
import math
import time
import random
import argparse
import multiprocessing
import numpy as np
import paddle
import paddle.fluid as fluid
import reader
import utils
import creator
from eval import test_process
sys.path.append('../models/')
from model_check import check_cuda
# the function to train model
def do_train(args):
train_program = fluid.default_main_program()
startup_program = fluid.default_startup_program()
dataset = reader.Dataset(args)
with fluid.program_guard(train_program, startup_program):
train_program.random_seed = args.random_seed
startup_program.random_seed = args.random_seed
with fluid.unique_name.guard():
train_ret = creator.create_model(
args, dataset.vocab_size, dataset.num_labels, mode='train')
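# clone the program for evaluation before optimizer.minimize() is called,
# so the test program shares parameters with training but contains no
# backward or optimizer ops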
test_program = train_program.clone(for_test=True)
optimizer = fluid.optimizer.Adam(learning_rate=args.base_learning_rate)
optimizer.minimize(train_ret["avg_cost"])
# init executor
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
dev_count = fluid.core.get_cuda_device_count()
else:
dev_count = min(multiprocessing.cpu_count(), args.cpu_num)
if dev_count < args.cpu_num:
print("WARNING: The total number of CPUs on this machine is %d, which is less than the cpu_num parameter you set. "
"Changing cpu_num from %d to %d." % (dev_count, args.cpu_num, dev_count))
os.environ['CPU_NUM'] = str(dev_count)
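# the CPU_NUM environment variable controls how many CPU places the
# parallel executor creates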
place = fluid.CPUPlace()
train_reader = creator.create_pyreader(args, file_name=args.train_data,
feed_list=train_ret['feed_list'],
place=place,
model='lac',
reader=dataset)
test_reader = creator.create_pyreader(args, file_name=args.test_data,
feed_list=train_ret['feed_list'],
place=place,
model='lac',
reader=dataset,
mode='test')
exe = fluid.Executor(place)
exe.run(startup_program)
if args.init_checkpoint:
utils.init_checkpoint(exe, args.init_checkpoint, train_program)
if dev_count > 1:
device = "GPU" if args.use_cuda else "CPU"
print("%d %s are used to train model" % (dev_count, device))
# multi cpu/gpu config
exec_strategy = fluid.ExecutionStrategy()
# exec_strategy.num_threads = dev_count * 6
build_strategy = fluid.compiler.BuildStrategy()
# build_strategy.enable_inplace = True
compiled_prog = fluid.compiler.CompiledProgram(train_program).with_data_parallel(
loss_name=train_ret['avg_cost'].name,
build_strategy=build_strategy,
exec_strategy=exec_strategy
)
else:
compiled_prog = fluid.compiler.CompiledProgram(train_program)
# start training
num_train_examples = dataset.get_num_examples(args.train_data)
max_train_steps = args.epoch * num_train_examples // args.batch_size
print("Num train examples: %d" % num_train_examples)
print("Max train steps: %d" % max_train_steps)
ce_info = []
step = 0
for epoch_id in range(args.epoch):
ce_time = 0
for data in train_reader():
# fetch metrics only on print steps, to minimize fetch ops and keep training fast
if step % args.print_steps == 0:
fetch_list = [
train_ret["avg_cost"],
train_ret["precision"],
train_ret["recall"],
train_ret["f1_score"]
]
else:
fetch_list = []
start_time = time.time()
outputs = exe.run(
compiled_prog,
fetch_list=fetch_list,
feed=data[0],
)
end_time = time.time()
if step % args.print_steps == 0:
avg_cost, precision, recall, f1_score = [np.mean(x) for x in outputs]
print("[train] step = %d, loss = %.5f, P: %.5f, R: %.5f, F1: %.5f, elapsed time %.5f" % (
step, avg_cost, precision, recall, f1_score, end_time - start_time))
if step % args.validation_steps == 0:
test_process(exe, test_program, test_reader, train_ret)
ce_time += end_time - start_time
ce_info.append([ce_time, avg_cost, precision, recall, f1_score])
# save checkpoints
if step % args.save_steps == 0 and step != 0:
save_path = os.path.join(args.model_save_dir, "step_" + str(step))
fluid.io.save_persistables(exe, save_path, train_program)
step += 1
if args.enable_ce:
card_num = get_cards()
ce_cost = 0
ce_f1 = 0
ce_p = 0
ce_r = 0
ce_time = 0
try:
ce_time = ce_info[-2][0]
ce_cost = ce_info[-2][1]
ce_p = ce_info[-2][2]
ce_r = ce_info[-2][3]
ce_f1 = ce_info[-2][4]
except Exception:
print("ce info error")
print("kpis\teach_step_duration_card%s\t%s" %
(card_num, ce_time))
print("kpis\ttrain_cost_card%s\t%f" %
(card_num, ce_cost))
print("kpis\ttrain_precision_card%s\t%f" %
(card_num, ce_p))
print("kpis\ttrain_recall_card%s\t%f" %
(card_num, ce_r))
print("kpis\ttrain_f1_card%s\t%f" %
(card_num, ce_f1))
def get_cards():
num = 0
cards = os.environ.get('CUDA_VISIBLE_DEVICES', '')
if cards != '':
num = len(cards.split(","))
return num
if __name__ == "__main__":
# Argument handling can use argparse, yaml, or json as needed.
# For NLP tasks we recommend the configure module defined under PALM, which unifies argparse, yaml, and json config files.
parser = argparse.ArgumentParser(__doc__)
utils.load_yaml(parser, 'conf/args.yaml')
args = parser.parse_args()
check_cuda(args.use_cuda)
print(args)
do_train(args)
......@@ -19,6 +19,7 @@ import os
import sys
import numpy as np
import paddle.fluid as fluid
import yaml
def str2bool(v):
......@@ -47,6 +48,21 @@ class ArgumentGroup(object):
help=help + ' Default: %(default)s.',
**kwargs)
def load_yaml(parser, file_name, **kwargs):
with open(file_name) as f:
args = yaml.safe_load(f)  # safe_load avoids executing arbitrary YAML tags
for title in args:
group = parser.add_argument_group(title=title, description='')
for name in args[title]:
_type = type(args[title][name]['val'])
_type = str2bool if _type == bool else _type
group.add_argument(
"--"+name,
default=args[title][name]['val'],
type=_type,
help=args[title][name]['meaning'] + ' Default: %(default)s.',
**kwargs)
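# For reference, load_yaml expects the config grouped into titled sections,
# each entry carrying a default value and a description. A minimal sketch of
# such a file (section and option names here are illustrative only):
#
#   train:
#     batch_size:
#       val: 300
#       meaning: "The number of sequences contained in a mini-batch."
#     use_cuda:
#       val: False
#       meaning: "If set, use GPU for training."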
def print_arguments(args):
"""none"""
......@@ -82,7 +98,7 @@ def to_lodtensor(data, place):
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.Tensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
......@@ -94,35 +110,38 @@ def parse_result(words, crf_decode, dataset):
words = np.array(words)
crf_decode = np.array(crf_decode)
batch_size = len(offset_list) - 1
batch_out = []
for sent_index in range(batch_size):
begin, end = offset_list[sent_index], offset_list[sent_index + 1]
sent = [dataset.id2word_dict[str(id[0])] for id in words[begin:end]]
tags = [dataset.id2label_dict[str(id[0])] for id in crf_decode[begin:end]]
sent_out = []
tags_out = []
partial_word = ""
for ind, tag in enumerate(tags):
# for the first word
if partial_word == "":
partial_word = sent[ind]
tags_out.append(tag.split('-')[0])
continue
# for the beginning of a word
if tag.endswith("-B") or (tag == "O" and tags[ind - 1] != "O"):
sent_out.append(partial_word)
tags_out.append(tag.split('-')[0])
partial_word = sent[ind]
continue
partial_word += sent[ind]
# append the last word, unless len(tags) == 0
if len(sent_out) < len(tags_out):
sent_out.append(partial_word)
batch_out.append([sent_out, tags_out])
return batch_out
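# each element of batch_out is a [words, tags] pair for one sentence; callers
# such as do_infer can zip the two lists into "(word, tag)" tuples for printing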
def init_checkpoint(exe, init_checkpoint_path, main_program):
"""
......@@ -146,7 +165,6 @@ def init_checkpoint(exe, init_checkpoint_path, main_program):
predicate=existed_persitables)
print("Load model from {}".format(init_checkpoint_path))
def init_pretraining_params(exe,
pretraining_params_path,
main_program,
......
......@@ -21,15 +21,20 @@ import math
import paddle.fluid as fluid
from paddle.fluid.initializer import NormalInitializer
def lex_net(word, target, args, vocab_size, num_labels):
def lex_net(word, args, vocab_size, num_labels, for_infer=True, target=None):
"""
define the lexical analysis network structure
word: stores the input of the model
for_infer: a boolean value, indicating whether the model is created for training or for prediction
return:
for infer: return the prediction (crf_decode)
otherwise: return the average CRF cost together with the prediction
"""
word_emb_dim = args.word_emb_dim
grnn_hidden_dim = args.grnn_hidden_dim
emb_lr = args.emb_learning_rate
crf_lr = args.crf_learning_rate
# fall back to 1.0 when the config does not define separate layer learning rates
emb_lr = args.emb_learning_rate if 'emb_learning_rate' in dir(args) else 1.0
crf_lr = args.crf_learning_rate if 'crf_learning_rate' in dir(args) else 1.0
bigru_num = args.bigru_num
init_bound = 0.1
IS_SPARSE = True
......@@ -76,7 +81,7 @@ def lex_net(word, target, args, vocab_size, num_labels):
bi_merge = fluid.layers.concat(input=[gru, gru_r], axis=1)
return bi_merge
def _net_conf(word, target):
def _net_conf(word, target=None):
"""
Configure the network
"""
......@@ -105,16 +110,31 @@ def lex_net(word, target, args, vocab_size, num_labels):
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=1e-4)))
if target is not None:
crf_cost = fluid.layers.linear_chain_crf(
input=emission,
label=target,
param_attr=fluid.ParamAttr(
name='crfw',
learning_rate=crf_lr))
avg_cost = fluid.layers.mean(x=crf_cost)
crf_decode = fluid.layers.crf_decoding(
input=emission, param_attr=fluid.ParamAttr(name='crfw'))
return avg_cost, crf_decode
else:
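# at inference time there is no linear_chain_crf op to create the CRF
# transition parameter, so "crfw" is created explicitly; its two extra rows
# (shape [size + 2, size]) hold the start and end transition weights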
size = emission.shape[1]
fluid.layers.create_parameter(shape=[size + 2, size],
dtype=emission.dtype,
name='crfw')
crf_decode = fluid.layers.crf_decoding(
input=emission, param_attr=fluid.ParamAttr(name='crfw'))
return crf_decode
if for_infer:
return _net_conf(word)
else:
# target is necessary for training
return _net_conf(word, target)