Unverified commit ac5b7971, authored by bbking, committed by GitHub

Update PaddleNLP emotion-detection for new codestyle (#3444)

* update code for argparser

* add inference_model

* fix inference_model

* update create model for inference

* fix model nets conflict with Senta

* update readme
Parent a2beb8ae
# Emotion Detection (EmoTect)
* [Model Introduction](#model-introduction)
* [Quick Start](#quick-start)
* [Advanced Usage](#advanced-usage)
* [Release Notes](#release-notes)
* [Authors](#authors)
* [How to Contribute](#how-to-contribute)
## Model Introduction
Emotion Detection (EmoTect) focuses on identifying user emotion in intelligent dialogue scenarios: given user text from a conversation, it automatically classifies the emotion category and outputs a confidence score. The emotion categories are positive, negative, and neutral.
Emotion detection applies to chat, customer service, and many other scenarios. It helps businesses monitor dialogue quality and improve the user experience of their products, and it can also be used to analyze customer-service quality and reduce the cost of manual quality inspection. You can try it online at [AI开放平台-对话情绪识别](http://ai.baidu.com/tech/nlp_apply/emotion_detection).
For evaluation, we tested on Baidu's in-house test sets (covering chitchat and customer service) and the NLPCC2014 Weibo emotion dataset; the results are shown in the table below. We have also open-sourced a model trained by Baidu on massive data; after finetuning on dialogue corpora, it achieves even better results.
| Model | Chitchat | Customer Service | Weibo |
| :------| :------ | :------ | :------ |
@@ -19,32 +28,148 @@
## Quick Start
### Installation
1. Install PaddlePaddle
This project requires PaddlePaddle Fluid 1.3.2 or later; see the [installation guide](http://www.paddlepaddle.org/#quick-start).
2. Install the code
Clone the repository to your local machine:
```shell
git clone https://github.com/PaddlePaddle/models.git
cd models/PaddleNLP/emotion_detection
```
3. Environment dependencies
See the PaddlePaddle [installation notes](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/index_cn.html) for environment requirements.
### Code Structure
The main files of this project are organized as follows:
```text
.
├── config.json # configuration file
├── config.py # configuration loading interface
├── inference_model.py # script that saves the inference model
├── reader.py # data reading interface
├── run_classifier.py # main entry point: training, inference, evaluation
├── run.sh # script to run training, inference, evaluation
├── run_ernie_classifier.py # main entry point for the ERNIE-based model
├── run_ernie.sh # script for ERNIE-based training, inference, evaluation
├── utils.py # utility functions
```
### Data Preparation
#### Custom Data
Each line of the data has two columns separated by a tab ('\t'): the first is the emotion label (0 = negative, 1 = neutral, 2 = positive) and the second is the space-tokenized Chinese text. Files are UTF-8 encoded. Example:
```text
label text_a
0 谁 骂人 了 ? 我 从来 不 骂人 , 我 骂 的 都 不是 人 , 你 是 人 吗 ?
1 我 有事 等会儿 就 回来 和 你 聊
2 我 见到 你 很高兴 谢谢 你 帮 我
```
Note: the PaddleNLP project provides a tokenization preprocessing script (under the preprocess directory). Usage:
```shell
python tokenizer.py --test_data_dir ./test.txt.utf8 --batch_size 1 > test.txt.utf8.seg
```
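For reference, here is a minimal sketch of reading this format in Python. It is illustrative only; ```reader.py``` in this project is the authoritative implementation, and the function name ```load_examples``` is ours:
```python
import io

def load_examples(path):
    """Parse the two-column TSV format described above."""
    examples = []
    with io.open(path, "r", encoding="utf8") as fin:
        next(fin)  # skip the "label\ttext_a" header line
        for line in fin:
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 2:
                continue  # skip malformed lines
            label, tokens = int(cols[0]), cols[1].split(" ")
            examples.append((label, tokens))
    return examples
```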
#### Public Dataset
We provide a labeled, pre-tokenized chatbot dataset. Run the download script ```sh download_data.sh```; when it succeeds it creates a ```data``` directory with the following layout:
```text
.
├── train.tsv # training set
├── dev.tsv # validation set
├── test.tsv # test set
├── infer.tsv # data to predict
├── vocab.txt # vocabulary
```
### Single-Machine Training
With the sample dataset, run the command below to train on the training set (train.tsv) and validate on the dev set (dev.tsv).
```shell
# TextCNN model
sh run.sh train
```
The ```run.sh``` script trains the TextCNN model by default. To select a different model, pass a ```model_type``` argument to ```run.sh``` or change ```model_type``` in ```config.json```. Run the following to see all options and their descriptions:
```shell
python run_classifier.py -h
"""
# sample output
Running type options:
--do_train DO_TRAIN Whether to perform training. Default: False.
...
Model config options:
--model_type {bow_net,cnn_net,lstm_net,bilstm_net,gru_net,textcnn_net}
Model type to run the task. Default: textcnn_net.
--init_checkpoint INIT_CHECKPOINT
Init checkpoint to resume training from. Default: .
--save_checkpoint_dir SAVE_CHECKPOINT_DIR
Directory path to save checkpoints Default: .
...
"""
```
Parameter precedence in this project: command-line arguments > ```config.json``` > built-in defaults. After training finishes, model directories named ```step_xxx``` are created under ```./save_models/textcnn```.
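Concretely, this precedence is implemented by the ```PDConfig``` class in ```config.py```: JSON values override the argparse defaults, and command-line flags override both. A sketch of the flow:
```python
from config import PDConfig

config = PDConfig(json_file="./config.json")  # config.json overrides built-in defaults
config.build()                                # command-line flags override config.json
config.print_arguments()
print(config.model_type)                      # e.g. "textcnn_net" unless overridden
```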
### Model Evaluation
Based on a trained model, run the command below to evaluate it on the test set (test.tsv):
```shell
# TextCNN model
sh run.sh eval
"""
# sample output
Load model from ./save_models/textcnn/step_756
Final test result:
[test evaluation] avg loss: 0.339021, avg acc: 0.869691, elapsed time: 0.123983 s
"""
```
The default model is ```./save_models/textcnn/step_756```; change the init_checkpoint parameter in ```run.sh``` to evaluate the model from a different step.
### Model Inference
With an existing model, you can predict on unlabeled data (infer.tsv) to obtain the predicted label and the per-label probabilities.
```shell
# TextCNN model
sh run.sh infer
"""
# sample output
Load model from ./save_models/textcnn/step_756
1 0.000776 0.998341 0.000883
0 0.987223 0.003371 0.009406
1 0.000365 0.998635 0.001001
1 0.000455 0.998125 0.001420
"""
```
### Pretrained Models
We have open-sourced emotion detection models trained on massive data (based on TextCNN, ERNIE, etc.) that can be used directly. Two download options are provided.
**Option 1**: via the PaddleHub command-line tool (see the PaddleHub [installation guide](https://github.com/PaddlePaddle/PaddleHub))
```shell
mkdir pretrain_models && cd pretrain_models
hub download emotion_detection_textcnn --output_path ./
hub download emotion_detection_ernie_finetune --output_path ./
tar xvf emotion_detection_textcnn-1.0.0.tar.gz
@@ -52,48 +177,39 @@ tar xvf emotion_detection_ernie_finetune-1.0.0.tar.gz
```
**Option 2**: run the download script directly
```shell
sh download_model.sh
```
Both options save the pretrained TextCNN and ERNIE models under the ```pretrain_models``` directory; point the ```init_checkpoint``` parameter in ```run.sh``` at them for evaluation or inference.
### Service Deployment
To deploy the model online, you can use the ```inference_model.py``` script to prune the model, saving only the network parameters and the pruned inference program. Run:
```shell
sh run.sh save_inference_model
```
Usage of the pruned model is detailed in ```inference_model.py```; test it with the following command:
```shell
python inference_model.py
```
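Once saved, the pruned model can be loaded back for prediction. A sketch, mirroring ```test_inference_model``` in ```inference_model.py``` and using the project's default paths and filenames:
```python
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
# Load the pruned inference program and its parameters.
program, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname="./inference_model",
    executor=exe,
    model_filename="model.pdmodel",
    params_filename="params.pdparams")
# feed_names holds the input variable names (word ids, sequence lengths);
# fetch_targets holds the per-label probability tensor.
```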
#### Server Deployment
See PaddlePaddle's official [server-side deployment](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/advanced_usage/deploy/inference/index_cn.html) documentation for bringing the model online.
## Advanced Usage
### Background
The input to the emotion detection task is a piece of user text; the output is the detected emotion category (negative, neutral, or positive). This is a classic three-way short-text classification task.
### Model Overview
For this task, the project open-sources a series of classification models that can be selected via configuration:
@@ -104,102 +220,127 @@ sh run_ernie.sh infer
+ BI-LSTM: a single-layer bidirectional LSTM that better captures sentence-level semantic features;
+ ERNIE: Baidu's general-purpose text semantic representation model, trained on massive data and prior knowledge, finetuned on the emotion classification dataset.
### Custom Models
You can build a custom model to suit your needs, as follows (a sketch of a minimal custom network follows the list):
1. Define your own network
   You can define your own model in ```models/classification/nets.py``` by simply adding a new function; suppose the new function is named ```user_net```.
2. Update the model configuration
   Change ```model_type``` in ```config.json``` to your ```user_net```.
3. Train the model
   To run training, evaluation, and inference, update the model, data, and vocabulary paths in ```run.sh``` and ```run_ernie.sh```, then launch them through the ```run.sh``` script.
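For illustration, a minimal ```user_net``` might look like the sketch below. The BOW-style body is ours, but the signature mirrors ```textcnn_net``` in ```models/classification/nets.py```:
```python
import paddle.fluid as fluid

def user_net(data, seq_len, label, dict_dim, emb_dim=128, hid_dim=128,
             class_dim=3, is_prediction=False):
    """Illustrative custom net: embedding -> sum pooling -> fc -> softmax."""
    emb = fluid.layers.embedding(input=data, size=[dict_dim, emb_dim])
    emb = fluid.layers.sequence_unpad(emb, length=seq_len)  # drop padding
    pooled = fluid.layers.sequence_pool(input=emb, pool_type='sum')
    fc = fluid.layers.fc(input=pooled, size=hid_dim, act="tanh")
    prediction = fluid.layers.fc(input=fc, size=class_dim, act="softmax")
    if is_prediction:
        return prediction
    cost = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_cost = fluid.layers.mean(x=cost)
    return avg_cost, prediction
```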
### Finetuning with ERNIE
ERNIE is Baidu's general-purpose text semantic representation model, trained on massive data and prior knowledge; finetuning on top of ERNIE improves emotion detection accuracy.
#### Model Training
First download the ERNIE model:
```shell
mkdir -p pretrain_models/ernie
cd pretrain_models/ernie
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz -O ERNIE_stable-1.0.1.tar.gz
tar -zxvf ERNIE_stable-1.0.1.tar.gz
```
Then set the ```init_checkpoint``` parameter of the train function in ```run_ernie.sh```, and run:
```shell
#--init_checkpoint ./pretrain_models/ernie
sh run_ernie.sh train
```
Training uses GPU by default; trained models are saved under ```./save_models/ernie/```, named ```step_xxx```.
#### Model Evaluation
Based on the training results, choose the best step, set the ```init_checkpoint``` parameter of the eval function in ```run_ernie.sh```, and run:
```shell
#--init_checkpoint ./save_models/ernie/step_907
sh run_ernie.sh eval
'''
# sample output
W0820 14:59:47.811139 334 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0820 14:59:47.815557 334 device_context.cc:267] device: 0, cuDNN Version: 7.3.
Load model from ./save_models/ernie/step_907
Final validation result:
[test evaluation] avg loss: 0.260597, ave acc: 0.907336, elapsed time: 2.383077 s
'''
```
#### Model Inference
Set the ```init_checkpoint``` parameter of the infer function in ```run_ernie.sh```, and run:
```shell
#--init_checkpoint ./save_models/ernie/step_907
sh run_ernie.sh infer
'''
# sample output
Load model from ./save_models/ernie/step_907
Final test result:
1 0.000803 0.998870 0.000326
0 0.976585 0.021535 0.001880
1 0.000572 0.999153 0.000275
1 0.001113 0.998502 0.000385
'''
```
### Finetuning with ERNIE via PaddleHub
We also provide the option of loading ERNIE through PaddleHub, PaddlePaddle's pretrained-model management tool, which can load a pretrained model in a single line of code and simplifies model use and transfer learning. See [PaddleHub](https://github.com/PaddlePaddle/PaddleHub) for more details.
Note: this option requires PaddleHub, installed with
```shell
pip install paddlehub
```
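The one-line loading referred to above is roughly the following (a sketch against the PaddleHub 1.x API; the module name is what this project uses for ERNIE):
```python
import paddlehub as hub

# Downloads (on first use) and loads the pretrained ERNIE module.
module = hub.Module(name="ernie")
```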
To use this option, update the configuration in ```run_ernie.sh``` as follows:
```shell
# In the train() function, set the --use_paddle_hub option
--use_paddle_hub true
```
Run the following to finetune:
```shell
sh run_ernie.sh train
```
After finetuning, to run eval or infer you need to update the configuration in ```run_ernie.sh``` as follows:
```shell
# In the eval() and infer() functions, set the --use_paddle_hub option
--use_paddle_hub true
```
Run the following for eval and infer:
```shell
sh run_ernie.sh eval
sh run_ernie.sh infer
```
## Release Notes
2019/08/26: standardized configuration handling, refactored the module's data-processing code, and restructured the README for usability.
2019/06/13: added the option of loading ERNIE via PaddleHub.
## Authors
- [chenbingjin](https://github.com/chenbjin)
- [wuzewu](https://github.com/nepeplwu)
## How to Contribute
If you can fix an issue or add a new feature, feel free to submit a PR. If the PR is accepted, we will score the contribution by quality and difficulty (0-5, higher is better); once you accumulate 10 points, you can contact us for an interview opportunity or a recommendation letter.
{
"task_name": "emotion_detection",
"model_type": "textcnn_net",
"vocab_size": 240465
"num_labels": 3,
"vocab_size": 240465,
"vocab_path": "./data/vocab.txt",
"data_dir": "./data",
"inference_model_dir": "./inference_model",
"save_checkpoint_dir": "",
"init_checkpoint": "",
"lr": 0.02,
"epoch": 10,
"batch_size": 64
}
@@ -19,35 +19,172 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import six
import json
import argparse
def str2bool(value):
"""
String to Boolean
"""
# argparse does not parse strings like "true"/"False" into Python
# booleans directly, so map them explicitly
return value.lower() in ("true", "t", "1")
class ArgumentGroup(object):
"""
Argument Class
"""
def __init__(self, parser, title, des):
self._group = parser.add_argument_group(title=title, description=des)
def add_arg(self, name, dtype, default, help, **kwargs):
"""
Add argument
"""
dtype = str2bool if dtype == bool else dtype
self._group.add_argument(
"--" + name,
default=default,
type=dtype,
help=help + ' Default: %(default)s.',
**kwargs)
class PDConfig(object):
"""
A high-level api for handling argument configs.
"""
def __init__(self, json_file=""):
"""
Init function for PDConfig.
json_file: the path to the json configure file.
"""
assert isinstance(json_file, str)
self.args = None
self.arg_config = {}
parser = argparse.ArgumentParser()
run_type_g = ArgumentGroup(parser, "Running type options", "")
run_type_g.add_arg("do_train", bool, False, "Whether to perform training.")
run_type_g.add_arg("do_val", bool, False, "Whether to perform evaluation.")
run_type_g.add_arg("do_infer", bool, False, "Whether to perform inference.")
run_type_g.add_arg("do_save_inference_model", bool, False, "Whether to perform save inference model.")
model_g = ArgumentGroup(parser, "Model config options", "")
model_g.add_arg("model_type", str, "cnn_net", "Model type to run the task.",
choices=["bow_net","cnn_net", "lstm_net", "bilstm_net", "gru_net", "textcnn_net"])
model_g.add_arg("num_labels", int, 3 , "Number of labels for classification")
model_g.add_arg("init_checkpoint", str, None, "Init checkpoint to resume training from.")
model_g.add_arg("save_checkpoint_dir", str, None, "Directory path to save checkpoints")
model_g.add_arg("inference_model_dir", str, None, "Directory path to save inference model")
data_g = ArgumentGroup(parser, "Data config options", "")
data_g.add_arg("data_dir", str, None, "Directory path to training data.")
data_g.add_arg("vocab_path", str, None, "Vocabulary path.")
data_g.add_arg("vocab_size", str, None, "Vocabulary size.")
data_g.add_arg("max_seq_len", int, 128, "Number of words of the longest sequence.")
train_g = ArgumentGroup(parser, "Training config options", "")
train_g.add_arg("lr", float, 0.002, "The Learning rate value for training.")
train_g.add_arg("epoch", int, 10, "Number of epoches for training.")
train_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.")
train_g.add_arg("batch_size", int, 256, "Total examples' number in batch for training.")
train_g.add_arg("skip_steps", int, 10, "The steps interval to print loss.")
train_g.add_arg("save_steps", int, 1000, "The steps interval to save checkpoints.")
train_g.add_arg("validation_steps", int, 1000, "The steps interval to evaluate model performance.")
train_g.add_arg("random_seed", int, 0, "Random seed.")
log_g = ArgumentGroup(parser, "Logging options", "")
log_g.add_arg("verbose", bool, False, "Whether to output verbose log")
log_g.add_arg("task_name", str, None, "The name of task to perform emotion detection")
log_g.add_arg('enable_ce', bool, False, 'If set, run the task with continuous evaluation logs.')
custom_g = ArgumentGroup(parser, "Customize options", "")
self.custom_g = custom_g
self.parser = parser
self.arglist = [a.dest for a in self.parser._actions]
self.json_config = None
if json_file != "":
self.load_json(json_file)
def load_json(self, file_path):
"""load json config """
if not os.path.exists(file_path):
raise IOError("the json file %s does not exist." % file_path)
try:
with open(file_path, "r") as fin:
self.json_config = json.load(fin)
except Exception as e:
raise IOError("Error in parsing json config file '%s'" % file_path)
for name in self.json_config:
# use `six.string_types` but not `str` for compatible with python2 and python3
if not isinstance(self.json_config[name], (int, float, bool, six.string_types)):
continue
if name in self.arglist:
self.set_default(name, self.json_config[name])
else:
self.custom_g.add_arg(name,
type(self.json_config[name]),
self.json_config[name],
"customized options")
def set_default(self, name, value):
for arg in self.parser._actions:
if arg.dest == name:
arg.default = value
def build(self):
self.args = self.parser.parse_args()
self.arg_config = vars(self.args)
def print_arguments(self):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(six.iteritems(self.arg_config)):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def add_arg(self, name, dtype, default, descrip):
self.custom_g.add_arg(name, dtype, default, descrip)
def __add__(self, new_arg):
assert isinstance(new_arg, list) or isinstance(new_arg, tuple)
assert len(new_arg) >= 3
assert self.args is None
name = new_arg[0]
dtype = new_arg[1]
dvalue = new_arg[2]
desc = new_arg[3] if len(new_arg) == 4 else "Description is not provided."
self.add_arg(name, dtype, dvalue, desc)
return self
def __getattr__(self, name):
if name in self.arg_config:
return self.arg_config[name]
if self.json_config and name in self.json_config:
return self.json_config[name]
raise AttributeError("The argument %s is not defined." % name)
if __name__ == '__main__':
pd_config = PDConfig('config.json')
pd_config += ("my_age", int, 18, "I am forever 18.")
pd_config.build()
pd_config.print_arguments()
print(pd_config.use_cuda)
print(pd_config.model_type)
# -*- encoding: utf8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
sys.path.append("../")
import paddle
import paddle.fluid as fluid
import numpy as np
from models.model_check import check_cuda
from config import PDConfig
from run_classifier import create_model
import utils
def do_save_inference_model(args):
if args.use_cuda:
dev_count = fluid.core.get_cuda_device_count()
place = fluid.CUDAPlace(0)
else:
dev_count = int(os.environ.get('CPU_NUM', 1))
place = fluid.CPUPlace()
test_prog = fluid.default_main_program()
startup_prog = fluid.default_startup_program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
infer_pyreader, probs, feed_target_names = create_model(
args,
pyreader_name='infer_reader',
num_labels=args.num_labels,
is_prediction=True)
test_prog = test_prog.clone(for_test=True)
exe = fluid.Executor(place)
exe.run(startup_prog)
assert args.init_checkpoint, "init_checkpoint is required to save an inference model"
utils.init_checkpoint(exe, args.init_checkpoint, test_prog)
fluid.io.save_inference_model(
args.inference_model_dir,
feeded_var_names=feed_target_names,
target_vars=[probs],
executor=exe,
main_program=test_prog,
model_filename="model.pdmodel",
params_filename="params.pdparams")
print("save inference model at %s" % (args.inference_model_dir))
def test_inference_model(args, texts):
if args.use_cuda:
dev_count = fluid.core.get_cuda_device_count()
place = fluid.CUDAPlace(0)
else:
dev_count = int(os.environ.get('CPU_NUM', 1))
place = fluid.CPUPlace()
test_prog = fluid.default_main_program()
startup_prog = fluid.default_startup_program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
infer_pyreader, probs, feed_target_names = create_model(
args,
pyreader_name='infer_reader',
num_labels=args.num_labels,
is_prediction=True)
test_prog = test_prog.clone(for_test=True)
exe = fluid.Executor(place)
exe.run(startup_prog)
assert (args.inference_model_dir)
infer_program, feed_names, fetch_targets = fluid.io.load_inference_model(
dirname=args.inference_model_dir,
executor=exe,
model_filename="model.pdmodel",
params_filename="params.pdparams")
data = []
seq_lens = []
for query in texts:
wids = utils.query2ids(args.vocab_path, query)
wids, seq_len = utils.pad_wid(wids, args.max_seq_len)
data.append(wids)
seq_lens.append(seq_len)
batch_size = len(data)
data = np.array(data).reshape((batch_size, args.max_seq_len, 1))
seq_lens = np.array(seq_lens).reshape((batch_size, 1))
pred = exe.run(infer_program,
feed={
feed_names[0]:data,
feed_names[1]:seq_lens},
fetch_list=fetch_targets,
return_numpy=True)
for probs in pred[0]:
print("%d\t%f\t%f\t%f" % (np.argmax(probs), probs[0], probs[1], probs[2]))
if __name__ == "__main__":
args = PDConfig(json_file="./config.json")
args.build()
args.print_arguments()
check_cuda(args.use_cuda)
if args.do_save_inference_model:
do_save_inference_model(args)
else:
texts = [u"我 讨厌 你 , 哼哼 哼 。 。", u"我 喜欢 你 , 爱 你 哟"]
test_inference_model(args, texts)
@@ -29,43 +29,45 @@ class EmoTectProcessor(object):
Processor class for data convertors for EmoTect.
"""
def __init__(self, data_dir, vocab_path, random_seed=None, max_seq_len=128):
self.data_dir = data_dir
self.vocab = load_vocab(vocab_path)
self.num_examples = {"train": -1, "dev": -1, "test": -1, "infer": -1}
np.random.seed(random_seed)
self.max_seq_len = max_seq_len
def get_train_examples(self, data_dir, epoch, max_seq_len):
"""
Load training examples
"""
return data_reader(
os.path.join(self.data_dir, "train.tsv"), self.vocab,
self.num_examples, "train", epoch)
self.num_examples, "train", epoch, max_seq_len)
def get_dev_examples(self, data_dir, epoch, max_seq_len):
"""
Load dev examples
"""
return data_reader(
os.path.join(self.data_dir, "dev.tsv"), self.vocab,
self.num_examples, "dev")
self.num_examples, "dev", epoch, max_seq_len)
def get_test_examples(self, data_dir, epoch, max_seq_len):
"""
Load test examples
"""
return data_reader(
os.path.join(self.data_dir, "test.tsv"), self.vocab,
self.num_examples, "test")
self.num_examples, "test", epoch, max_seq_len)
def get_infer_examples(self, data_dir, epoch, max_seq_len):
"""
Load infer querys
"""
return data_reader(
os.path.join(self.data_dir, "infer.tsv"), self.vocab,
self.num_examples, "infer")
self.num_examples, "infer", epoch, max_seq_len)
def get_labels(self):
"""
@@ -95,16 +97,16 @@
"""
if phase == "train":
return paddle.batch(
self.get_train_examples(self.data_dir, epoch, self.max_seq_len), batch_size)
elif phase == "dev":
return paddle.batch(
self.get_dev_examples(self.data_dir, epoch, self.max_seq_len), batch_size)
elif phase == "test":
return paddle.batch(
self.get_test_examples(self.data_dir, epoch, self.max_seq_len), batch_size)
elif phase == "infer":
return paddle.batch(
self.get_infer_examples(self.data_dir, epoch, self.max_seq_len), batch_size)
else:
raise ValueError(
"Unknown phase, which should be in ['train', 'dev', 'test', 'infer']."
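For reference, a sketch of how this processor is typically wired up (it mirrors what run_classifier.py does; the paths are the project defaults):
```python
import reader

# Build a batched reader over the downloaded dataset.
processor = reader.EmoTectProcessor(data_dir="./data",
                                    vocab_path="./data/vocab.txt",
                                    random_seed=0,
                                    max_seq_len=128)
train_generator = processor.data_generator(batch_size=64, phase='train', epoch=5)
for batch in train_generator():
    pass  # each element is a (word_ids, label, seq_len) tuple
```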
#!/bin/bash
export FLAGS_enable_parallel_graph=1
export FLAGS_sync_nccl_allreduce=1
export CUDA_VISIBLE_DEVICES=0
export FLAGS_fraction_of_gpu_memory_to_use=0.95
TASK_NAME='emotion_detection'
DATA_PATH=./data/
VOCAB_PATH=./data/vocab.txt
CKPT_PATH=./save_models/textcnn
MODEL_PATH=./save_models/textcnn/step_756
# run_train on train.tsv and do_val on dev.tsv
train() {
python run_classifier.py \
--task_name ${TASK_NAME} \
--use_cuda false \
--do_train true \
--do_val true \
--batch_size 64 \
--data_dir ${DATA_PATH} \
--vocab_path ${VOCAB_PATH} \
--save_checkpoint_dir ${CKPT_PATH} \
--save_steps 200 \
--validation_steps 200 \
--epoch 5 \
--lr 0.002 \
--skip_steps 200
}
# run_eval on test.tsv
evaluate() {
python run_classifier.py \
--task_name ${TASK_NAME} \
--use_cuda false \
--do_val true \
--batch_size 128 \
--data_dir ${DATA_PATH} \
--vocab_path ${VOCAB_PATH} \
--init_checkpoint ${MODEL_PATH}
}
# run_infer on infer.tsv
infer() {
python run_classifier.py \
--task_name ${TASK_NAME} \
--use_cuda false \
--do_infer true \
--batch_size 32 \
--data_dir ${DATA_PATH} \
--vocab_path ${VOCAB_PATH} \
--init_checkpoint ${MODEL_PATH}
}
# run_save_inference_model
save_inference_model() {
python inference_model.py \
--use_cuda false \
--do_save_inference_model true \
--init_checkpoint ${MODEL_PATH} \
--inference_model_dir ./inference_model
}
main() {
@@ -64,13 +58,16 @@ main() {
infer)
infer "$@";
;;
save_inference_model)
save_inference_model "$@";
;;
help)
echo "Usage: ${BASH_SOURCE} {train|eval|infer}";
echo "Usage: ${BASH_SOURCE} {train|eval|infer|save_inference_model}";
return 0;
;;
*)
echo "unsupport command [${cmd}]";
echo "Usage: ${BASH_SOURCE} {train|eval|infer}";
echo "Usage: ${BASH_SOURCE} {train|eval|infer|save_inference_model}";
return 1;
;;
esac
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Emotion Detection Task
"""
@@ -21,7 +22,6 @@ from __future__ import print_function
import os
import time
import multiprocessing
import sys
sys.path.append("../")
@@ -32,107 +32,55 @@ import numpy as np
from models.classification import nets
from models.model_check import check_cuda
from config import PDConfig
import reader
import utils
def create_model(args,
pyreader_name,
num_labels,
is_prediction=False):
"""
Create Model for Emotion Detection
"""
data = fluid.layers.data(name="words", shape=[-1, args.max_seq_len, 1], dtype="int64")
label = fluid.layers.data(name="label", shape=[-1, 1], dtype="int64")
seq_len = fluid.layers.data(name="seq_len", shape=[-1, 1], dtype="int64")
if is_prediction:
pyreader = fluid.io.PyReader(
feed_list=[data, seq_len],
capacity=16,
iterable=False,
return_list=False)
else:
pyreader = fluid.io.PyReader(
feed_list=[data, label, seq_len],
capacity=16,
iterable=False,
return_list=False)
if emotect_config['model_type'] == "cnn_net":
if args.model_type == "cnn_net":
network = nets.cnn_net
elif emotect_config['model_type'] == "bow_net":
elif args.model_type == "bow_net":
network = nets.bow_net
elif emotect_config['model_type'] == "lstm_net":
elif args.model_type == "lstm_net":
network = nets.lstm_net
elif emotect_config['model_type'] == "bilstm_net":
elif args.model_type == "bilstm_net":
network = nets.bilstm_net
elif emotect_config['model_type'] == "gru_net":
elif args.model_type == "gru_net":
network = nets.gru_net
elif emotect_config['model_type'] == "textcnn_net":
elif args.model_type == "textcnn_net":
network = nets.textcnn_net
else:
raise ValueError("Unknown network type!")
if is_prediction:
probs = network(data, seq_len, None, args.vocab_size, class_dim=num_labels, is_prediction=True)
return pyreader, probs, [data.name, seq_len.name]
avg_loss, probs = network(data, seq_len, label, args.vocab_size, class_dim=num_labels)
num_seqs = fluid.layers.create_tensor(dtype='int64')
accuracy = fluid.layers.accuracy(input=probs, label=label, total=num_seqs)
return pyreader, avg_loss, accuracy, num_seqs
@@ -187,8 +135,6 @@ def main(args):
"""
Main Function
"""
if args.use_cuda:
place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
else:
@@ -196,11 +142,11 @@
exe = fluid.Executor(place)
task_name = args.task_name.lower()
processor = reader.EmoTectProcessor(data_dir=args.data_dir,
vocab_path=args.vocab_path,
random_seed=args.random_seed)
#num_labels = len(processor.get_labels())
num_labels = args.num_labels
if not (args.do_train or args.do_val or args.do_infer):
raise ValueError("For args `do_train`, `do_val` and `do_infer`, at "
@@ -229,9 +175,8 @@
train_pyreader, loss, accuracy, num_seqs = create_model(
args,
pyreader_name='train_reader',
num_labels=num_labels,
is_prediction=False)
sgd_optimizer = fluid.optimizer.Adagrad(learning_rate=args.lr)
sgd_optimizer.minimize(loss)
@@ -243,27 +188,41 @@
(lower_mem, upper_mem, unit))
if args.do_val:
if args.do_train:
test_data_generator = processor.data_generator(
batch_size=args.batch_size,
phase='dev',
epoch=1)
else:
test_data_generator = processor.data_generator(
batch_size=args.batch_size,
phase='test',
epoch=1)
test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
test_pyreader, loss, accuracy, num_seqs = create_model(
args,
pyreader_name='test_reader',
num_labels=num_labels,
is_prediction=False)
test_prog = test_prog.clone(for_test=True)
if args.do_infer:
infer_data_generator = processor.data_generator(
batch_size=args.batch_size,
phase='infer',
epoch=1)
test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
infer_pyreader, probs, _ = create_model(
args,
pyreader_name='infer_reader',
num_labels=num_labels,
is_prediction=True)
test_prog = test_prog.clone(for_test=True)
exe.run(startup_prog)
@@ -280,11 +239,15 @@
if args.do_train:
train_exe = exe
train_pyreader.decorate_sample_list_generator(train_data_generator)
else:
train_exe = None
if args.do_val:
test_exe = exe
test_pyreader.decorate_sample_list_generator(test_data_generator)
if args.do_infer:
test_exe = exe
infer_pyreader.decorate_sample_list_generator(infer_data_generator)
if args.do_train:
train_pyreader.start()
@@ -332,24 +295,24 @@
time_begin = time.time()
if steps % args.save_steps == 0:
save_path = os.path.join(args.save_checkpoint_dir, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
if steps % args.validation_steps == 0:
# evaluate on dev set
if args.do_val:
evaluate(test_exe, test_prog, test_pyreader,
[loss.name, accuracy.name, num_seqs.name],
"dev")
except fluid.core.EOFException:
save_path = os.path.join(args.output_dir, "step_" + str(steps))
print("final step: %d " % steps)
if args.do_val:
evaluate(test_exe, test_prog, test_pyreader,
[loss.name, accuracy.name, num_seqs.name],
"dev")
save_path = os.path.join(args.save_checkpoint_dir, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
train_pyreader.reset()
break
@@ -372,19 +335,17 @@
# evaluate on test set
if not args.do_train and args.do_val:
print("Final test result:")
evaluate(test_exe, test_prog, test_pyreader,
[loss.name, accuracy.name, num_seqs.name], "test")
[loss.name, accuracy.name, num_seqs.name],
"test")
# infer
if args.do_infer:
print("Final infer result:")
infer(test_exe, test_prog, infer_pyreader,
[probs.name],
"infer")
def get_cards():
@@ -396,6 +357,8 @@ def get_cards():
if __name__ == "__main__":
args = PDConfig('config.json')
args.build()
args.print_arguments()
check_cuda(args.use_cuda)
main(args)
#!/bin/bash
export FLAGS_sync_nccl_allreduce=1
export CUDA_VISIBLE_DEVICES=0
MODEL_PATH=./pretrain_models/ernie
TASK_DATA_PATH=./data
CKPT_PATH=./save_models/ernie
@@ -18,7 +18,7 @@ train() {
--train_set ${TASK_DATA_PATH}/train.tsv \
--dev_set ${TASK_DATA_PATH}/dev.tsv \
--vocab_path ${MODEL_PATH}/vocab.txt \
--save_checkpoint_dir ${CKPT_PATH} \
--save_steps 500 \
--validation_steps 50 \
--epoch 3 \
@@ -38,7 +38,7 @@ evaluate() {
--do_val true \
--use_paddle_hub false \
--batch_size 32 \
--init_checkpoint ${CKPT_PATH}/step_907 \
--test_set ${TASK_DATA_PATH}/test.tsv \
--vocab_path ${MODEL_PATH}/vocab.txt \
--max_seq_len 64 \
@@ -54,7 +54,7 @@ infer() {
--do_infer true \
--use_paddle_hub false \
--batch_size 32 \
--init_checkpoint ${CKPT_PATH}/step_907 \
--infer_set ${TASK_DATA_PATH}/infer.tsv \
--vocab_path ${MODEL_PATH}/vocab.txt \
--max_seq_len 64 \
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Emotion Detection Task, based on ERNIE
"""
@@ -34,27 +35,27 @@ from preprocess.ernie import task_reader
from models.representation import ernie
from models.model_check import check_cuda
import utils
import config
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
model_g = config.ArgumentGroup(parser, "model", "model configuration and paths.")
model_g.add_arg("ernie_config_path", str, None, "Path to the json file for ernie model config.")
model_g.add_arg("senta_config_path", str, None, "Path to the json file for senta model config.")
model_g.add_arg("init_checkpoint", str, None, "Init checkpoint to resume training from.")
model_g.add_arg("output_dir", str, "checkpoints", "Path to save checkpoints")
model_g.add_arg("save_checkpoint_dir", str, "checkpoints", "Path to save checkpoints")
model_g.add_arg("use_paddle_hub", bool, False, "Whether to load ERNIE using PaddleHub")
train_g = config.ArgumentGroup(parser, "training", "training options.")
train_g.add_arg("epoch", int, 10, "Number of epoches for training.")
train_g.add_arg("save_steps", int, 10000, "The steps interval to save checkpoints.")
train_g.add_arg("validation_steps", int, 1000, "The steps interval to evaluate model performance.")
train_g.add_arg("lr", float, 0.002, "The Learning rate value for training.")
log_g = config.ArgumentGroup(parser, "logging", "logging related")
log_g.add_arg("skip_steps", int, 10, "The steps interval to print loss.")
log_g.add_arg("verbose", bool, False, "Whether to output verbose log")
data_g = config.ArgumentGroup(parser, "data", "Data paths, vocab paths and data processing options")
data_g.add_arg("data_dir", str, None, "Directory path to training data.")
data_g.add_arg("vocab_path", str, None, "Vocabulary path.")
data_g.add_arg("batch_size", int, 256, "Total examples' number in batch for training.")
@@ -69,7 +70,7 @@ data_g.add_arg("label_map_config", str, None, "label_map_path.")
data_g.add_arg("do_lower_case", bool, True,
"Whether to lower case the input text. Should be True for uncased models and False for cased models.")
run_type_g = config.ArgumentGroup(parser, "run_type", "running type options.")
run_type_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.")
run_type_g.add_arg("task_name", str, None, "The name of task to perform sentiment classification.")
run_type_g.add_arg("do_train", bool, False, "Whether to perform training.")
@@ -348,7 +349,7 @@ def main(args):
time_begin = time.time()
if steps % args.save_steps == 0:
save_path = os.path.join(args.output_dir, "step_" + str(steps))
save_path = os.path.join(args.save_checkpoint_dir, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
if steps % args.validation_steps == 0:
@@ -367,7 +368,7 @@
"dev")
except fluid.core.EOFException:
save_path = os.path.join(args.output_dir, "step_" + str(steps))
save_path = os.path.join(args.save_checkpoint_dir, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
train_pyreader.reset()
break
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
EmoTect utilities.
"""
@@ -20,56 +21,14 @@ from __future__ import print_function
import io
import os
import sys
import six
import random
import paddle
import paddle.fluid as fluid
import numpy as np
def init_checkpoint(exe, init_checkpoint_path, main_program):
"""
Init CheckPoint
@@ -93,11 +52,34 @@ def init_checkpoint(exe, init_checkpoint_path, main_program):
print("Load model from {}".format(init_checkpoint_path))
def word2id(word_dict, query):
"""
Convert word sequence into id list
"""
unk_id = len(word_dict)
wids = [word_dict[w] if w in word_dict else unk_id
for w in query.strip().split(" ")]
return wids
def pad_wid(wids, max_seq_len=128, pad_id=0):
"""
Padding data to max_seq_len
"""
seq_len = len(wids)
if seq_len < max_seq_len:
for i in range(max_seq_len - seq_len):
wids.append(pad_id)
else:
wids = wids[:max_seq_len]
seq_len = max_seq_len
return wids, seq_len
def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len):
"""
Data reader, which convert word sequence into id list
"""
all_data = []
with io.open(file_path, "r", encoding='utf8') as fin:
for line in fin:
@@ -105,24 +87,20 @@ def data_reader(file_path, word_dict, num_examples, phrase, epoch=1):
continue
if phrase == "infer":
cols = line.strip().split("\t")
query = cols[-1] if len(cols) != 1 else cols[0]
wids = word2id(word_dict, query)
wids, seq_len = pad_wid(wids, max_seq_len)
all_data.append((wids, seq_len))
else:
cols = line.strip().split("\t")
if len(cols) != 2:
sys.stderr.write("[NOTICE] Error Format Line!")
continue
label = int(cols[0])
wids = [
word_dict[x] if x in word_dict else unk_id
for x in cols[1].split(" ")
]
all_data.append((wids, label))
query = cols[1].strip()
wids = word2id(word_dict, query)
wids, seq_len = pad_wid(wids, max_seq_len)
all_data.append((wids, label, seq_len))
num_examples[phrase] = len(all_data)
if phrase == "infer":
@@ -131,8 +109,8 @@
"""
Infer reader function
"""
for wids, seq_len in all_data:
yield wids, seq_len
return reader
@@ -143,8 +121,8 @@
for idx in range(epoch):
if phrase == "train":
random.shuffle(all_data)
for wids, label, seq_len in all_data:
yield wids, label, seq_len
return reader
@@ -162,3 +140,22 @@
wid += 1
vocab["<unk>"] = len(vocab)
return vocab
def print_arguments(args):
"""
print arguments
"""
print('----------- Configuration Arguments -----------')
for arg, value in sorted(six.iteritems(vars(args))):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def query2ids(vocab_path, query):
"""
Convert query to id list according to the given vocab
"""
vocab = load_vocab(vocab_path)
wids = word2id(vocab, query)
return wids
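A small usage sketch of these helpers, as ```test_inference_model``` in ```inference_model.py``` uses them (the query string is just an example):
```python
# Convert a space-tokenized query into a fixed-length id sequence.
wids = query2ids("./data/vocab.txt", u"我 见到 你 很高兴")
wids, seq_len = pad_wid(wids, max_seq_len=128, pad_id=0)
# wids now holds exactly 128 ids; seq_len is the original token count.
```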
@@ -133,7 +133,7 @@ def bilstm_net(data,
input=data,
size=[dict_dim, emb_dim],
param_attr=fluid.ParamAttr(learning_rate=emb_lr))
emb = fluid.layers.sequence_unpad(emb, length=seq_len)
fc0 = fluid.layers.fc(input=emb, size=hid_dim * 4)
@@ -200,15 +200,15 @@ def gru_net(data,
def textcnn_net(data,
seq_len,
label,
dict_dim,
emb_dim=128,
hid_dim=128,
hid_dim2=96,
class_dim=2,
win_sizes=None,
is_prediction=False):
"""
Textcnn_net
"""