Unverified commit 96166665, authored by guru4elephant, committed by GitHub

Merge pull request #1367 from mapingshuo/world_conference

Add Text Matching models on Quora
# Text matching on Quora question-answer pair dataset
## Contents
* [Introduction](#introduction)
* [a brief review of the Quora Question Pair (QQP) Task](#a-brief-review-of-the-quora-question-pair-qqp-task)
* [Our Work](#our-work)
* [Environment Preparation](#environment-preparation)
* [Install Fluid release 1.0](#install-fluid-release-10)
* [Have I installed Fluid successfully?](#have-i-installed-fluid-successfully)
* [Prepare Data](#prepare-data)
* [Train and evaluate](#train-and-evaluate)
* [Models](#models)
* [Results](#results)
## Introduction
### a brief review of the Quora Question Pair (QQP) Task
The [Quora Question Pair](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) dataset contains over 400,000 question pairs from [Quora](https://www.quora.com/), where people ask and answer questions related to specific areas. Each sample in the dataset consists of two questions (both in English) and a label indicating whether the two questions are duplicates. The dataset is well annotated by humans.
Below are two samples from the dataset. The last column indicates whether the two questions are duplicates (1) or not (0).
|id | qid1 | qid2| question1| question2| is_duplicate
|:---:|:---:|:---:|:---:|:---:|:---:|
|0 |1 |2 |What is the step by step guide to invest in share market in india? |What is the step by step guide to invest in share market? |0|
|1 |3 |4 |What is the story of Kohinoor (Koh-i-Noor) Diamond? | What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back? |0|
A [Kaggle competition](https://www.kaggle.com/c/quora-question-pairs#description) based on this dataset was held in 2017. Participants were given a training set (with labels) and asked to make predictions on a test set (without labels). The predictions were evaluated with log loss on the test data.
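For reference, the log loss used for that evaluation can be computed as in the following sketch (a minimal numpy illustration, not part of this repository):
```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy; y_prob is the predicted P(is_duplicate = 1)."""
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1 - eps)
    y_true = np.asarray(y_true, dtype=np.float64)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

print(log_loss([0, 1], [0.1, 0.8]))  # ~0.164: lower is better
```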
The Kaggle competition inspired much effective work. However, most of these models are rule-based and difficult to transfer to new tasks. Researchers are therefore seeking more general models that work well on this task and on other natural language processing (NLP) tasks.
[Wang _et al._](https://arxiv.org/abs/1702.03814) proposed a bilateral multi-perspective matching (BiMPM) model based on the Quora Question Pair dataset. They split the original dataset into [3 parts](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing): _train.tsv_ (384,348 samples), _dev.tsv_ (10,000 samples) and _test.tsv_ (10,000 samples). The class distribution of _train.tsv_ is unbalanced (37% positive and 63% negative), while those of _dev.tsv_ and _test.tsv_ are balanced (50% positive and 50% negative). We use the same split in our experiments.
### Our Work
Based on the Quora Question Pair dataset, we implemented several classic models from the area of natural language understanding (NLU). Prediction accuracy is evaluated on the _test.tsv_ split from [Wang _et al._](https://arxiv.org/abs/1702.03814).
## Environment Preparation
### Install Fluid release 1.0
Please follow the [official document in English](http://www.paddlepaddle.org/documentation/docs/en/1.0/build_and_install/pip_install_en.html) or [official document in Chinese](http://www.paddlepaddle.org/documentation/docs/zh/1.0/beginners_guide/install/Start.html) to install the Fluid deep learning framework.
#### Have I installed Fluid successfully?
Run the following script from your command line:
```shell
python -c "import paddle"
```
If Fluid is installed successfully, you should see no error message. Feel free to open an issue in the [PaddlePaddle repository](https://github.com/PaddlePaddle/Paddle/issues) if you need support.
## Prepare Data
Please download the Quora dataset from [Google drive](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing) and unzip it to $HOME/.cache/paddle/dataset.
Then run _data/prepare_quora_data.sh_ to download the pre-trained word embedding file _glove.840B.300d.zip_ (GloVe 840B 300d vectors):
```shell
sh data/prepare_quora_data.sh
```
At this point the dataset directory ($HOME/.cache/paddle/dataset) structure should be:
```shell
$HOME/.cache/paddle/dataset
|- Quora_question_pair_partition
|- train.tsv
|- test.tsv
|- dev.tsv
|- readme.txt
|- wordvec.txt
|- glove.840B.300d.txt
```
## Train and evaluate
We provide multiple models and configurations. Details can be found in the `models` and `configs` directories. For a quick start, run the _cdssmNet_ model with the corresponding configuration:
```shell
python train_and_evaluate.py \
--model_name=cdssmNet \
--config=cdssm_base
```
Logs will be output to the console. If everything works well, the logging information will have the same format as the content in _cdssm_base.log_.
All configurations used in our experiments are as follows:
|Model|Config|command
|:----:|:----:|:----:|
|cdssmNet|cdssm_base|python train_and_evaluate.py --model_name=cdssmNet --config=cdssm_base
|DecAttNet|decatt_glove|python train_and_evaluate.py --model_name=DecAttNet --config=decatt_glove
|InferSentNet|infer_sent_v1|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v1
|InferSentNet|infer_sent_v2|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v2
|SSENet|sse_base|python train_and_evaluate.py --model_name=SSENet --config=sse_base
## Models
We have implemented 4 models so far: the convolutional deep-structured semantic model (CDSSM, CNN-based), the InferSent model (RNN-based), the shortcut-stacked encoder (SSE, RNN-based), and the decomposed attention model (DecAtt, attention-based).
|Model|features|Context Encoder|Match Layer|Classification Layer
|:----:|:----:|:----:|:----:|:----:|
|CDSSM|word|1 layer conv1d|concatenation|MLP
|DecAtt|word|Attention|concatenation|MLP
|InferSent|word|1 layer Bi-LSTM|concatenation/element-wise product/<br>absolute element-wise difference|MLP
|SSE|word|3 layer Bi-LSTM|concatenation/element-wise product/<br>absolute element-wise difference|MLP
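The match layer combines the two question representations into a single matching feature vector. As a rough illustration, the concatenation/element-wise product/absolute element-wise difference combination used by InferSent and SSE can be sketched as follows (a minimal numpy sketch assuming two fixed-size sentence vectors; the actual Fluid implementation is `ElementwiseMatching` in _models/match_layers.py_):
```python
import numpy as np

def elementwise_match(v1, v2):
    """Concatenate the two vectors, their element-wise product and |v1 - v2|."""
    return np.concatenate([v1, v2, v1 * v2, np.abs(v1 - v2)])

# Two 4-d sentence vectors produce a 16-d matching feature.
v1, v2 = np.random.rand(4), np.random.rand(4)
assert elementwise_match(v1, v2).shape == (16,)
```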
### CDSSM
```
@inproceedings{shen2014learning,
title={Learning semantic representations using convolutional neural networks for web search},
author={Shen, Yelong and He, Xiaodong and Gao, Jianfeng and Deng, Li and Mesnil, Gr{\'e}goire},
booktitle={Proceedings of the 23rd International Conference on World Wide Web},
pages={373--374},
year={2014},
organization={ACM}
}
```
### InferSent
```
@article{conneau2017supervised,
title={Supervised learning of universal sentence representations from natural language inference data},
author={Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Loic and Bordes, Antoine},
journal={arXiv preprint arXiv:1705.02364},
year={2017}
}
```
### SSE
```
@article{nie2017shortcut,
title={Shortcut-stacked sentence encoders for multi-domain inference},
author={Nie, Yixin and Bansal, Mohit},
journal={arXiv preprint arXiv:1708.02312},
year={2017}
}
```
### DecAtt
```
@article{tomar2017neural,
title={Neural paraphrase identification of questions with noisy pretraining},
author={Tomar, Gaurav Singh and Duque, Thyago and T{\"a}ckstr{\"o}m, Oscar and Uszkoreit, Jakob and Das, Dipanjan},
journal={arXiv preprint arXiv:1704.04565},
year={2017}
}
```
## Results
|Model|Config|dev accuracy| test accuracy
|:----:|:----:|:----:|:----:|
|cdssmNet|cdssm_base|83.56%|82.83%|
|DecAttNet|decatt_glove|86.31%|86.22%|
|InferSentNet|infer_sent_v1|87.15%|86.62%|
|InferSentNet|infer_sent_v2|88.55%|88.43%|
|SSENet|sse_base|88.35%|88.25%|
In our experiments, we found that LSTM-based models outperformed convolution-based models. The DecAtt model has fewer parameters than the LSTM-based models, but is sensitive to hyper-parameters.
<p align="center">
<img src="imgs/models_test_acc.png" width = "500" alt="test_acc"/>
</p>
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .cdssm import cdssm_base
from .dec_att import decatt_glove
from .sse import sse_base
from .infer_sent import infer_sent_v1
from .infer_sent import infer_sent_v2
from __future__ import print_function
class config(object):
def __init__(self):
self.batch_size = 128
self.epoch_num = 50
self.optimizer_type = 'adam' # sgd, adagrad
# pretrained word embedding
self.use_pretrained_word_embedding = True
# when employing pretrained word embedding,
# out of vocabulary words' embedding is initialized with uniform or normal numbers
self.OOV_fill = 'uniform'
self.embedding_norm = False
        # use LoDTensor to represent variable-length sequences; otherwise, use padding and masks for sequence data
self.use_lod_tensor = True
# lr = lr * lr_decay after each epoch
self.lr_decay = 1
self.learning_rate = 0.001
self.save_dirname = 'model_dir'
self.train_samples_num = 384348
self.duplicate_data = False
self.metric_type = ['accuracy']
def list_config(self):
print("config", self.__dict__)
def has_member(self, var_name):
return var_name in self.__dict__
if __name__ == "__main__":
basic = config()
basic.list_config()
basic.ahh = 2
basic.list_config()
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def cdssm_base():
"""
set configs
"""
config = basic_config.config()
config.learning_rate = 0.001
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
# net config
config.emb_dim = 300
config.kernel_size = 5
config.kernel_count = 300
config.fc_dim = 128
config.mlp_hid_dim = [128, 128]
config.droprate_conv = 0.1
config.droprate_fc = 0.1
config.class_dim = 2
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def decatt_glove():
"""
use config 'decAtt_glove' in the paper 'Neural Paraphrase Identification of Questions with Noisy Pretraining'
"""
config = basic_config.config()
config.learning_rate = 0.05
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.metric_type = ['accuracy', 'accuracy_with_threshold']
config.optimizer_type = 'sgd'
config.lr_decay = 1
config.use_lod_tensor = False
config.embedding_norm = False
config.OOV_fill = 'uniform'
config.duplicate_data = False
# net config
config.emb_dim = 300
config.proj_emb_dim = 200 #TODO: has project?
config.num_units = [400, 200]
config.word_embedding_trainable = True
config.droprate = 0.1
config.share_wight_btw_seq = True
config.class_dim = 2
return config
def decatt_word():
"""
use config 'decAtt_glove' in the paper 'Neural Paraphrase Identification of Questions with Noisy Pretraining'
"""
config = basic_config.config()
config.learning_rate = 0.05
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = False
config.dict_dim = 40000 # approx_vocab_size
config.metric_type = ['accuracy', 'accuracy_with_threshold']
config.optimizer_type = 'sgd'
config.lr_decay = 1
config.use_lod_tensor = False
config.embedding_norm = False
config.OOV_fill = 'uniform'
config.duplicate_data = False
# net config
config.emb_dim = 300
config.proj_emb_dim = 200 #TODO: has project?
config.num_units = [400, 200]
config.word_embedding_trainable = True
config.droprate = 0.1
config.share_wight_btw_seq = True
config.class_dim = 2
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def infer_sent_v1():
"""
set configs
"""
config = basic_config.config()
config.learning_rate = 0.1
config.lr_decay = 0.99
config.optimizer_type = 'sgd'
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.class_dim = 2
# net config
config.emb_dim = 300
config.droprate_lstm = 0.0
config.droprate_fc = 0.0
config.word_embedding_trainable = False
config.rnn_hid_dim = 2048
config.mlp_non_linear = False
return config
def infer_sent_v2():
"""
use our own config
"""
config = basic_config.config()
config.learning_rate = 0.0002
config.lr_decay = 0.99
config.optimizer_type = 'adam'
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.class_dim = 2
# net config
config.emb_dim = 300
config.droprate_lstm = 0.0
config.droprate_fc = 0.2
config.word_embedding_trainable = False
config.rnn_hid_dim = 2048
config.mlp_non_linear = True
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def sse_base():
"""
use config in the paper 'Shortcut-Stacked Sentence Encoders for Multi-Domain Inference'
"""
config = basic_config.config()
config.learning_rate = 0.0002
config.lr_decay = 0.7
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.metric_type = ['accuracy']
config.optimizer_type = 'adam'
config.use_lod_tensor = True
config.embedding_norm = False
config.OOV_fill = 'uniform'
config.duplicate_data = False
# net config
config.emb_dim = 300
config.rnn_hid_dim = [512, 1024, 2048]
config.fc_dim = [1600, 1600]
config.droprate_lstm = 0.0
config.droprate_fc = 0.1
config.class_dim = 2
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Please download the Quora dataset first from https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing
# to the ROOT_DIR: $HOME/.cache/paddle/dataset
DATA_DIR=$HOME/.cache/paddle/dataset
wget --directory-prefix=$DATA_DIR http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip $DATA_DIR/glove.840B.300d.zip
# The final dataset directory should look like
# $HOME/.cache/paddle/dataset
# |- Quora_question_pair_partition
# |- train.tsv
# |- test.tsv
# |- dev.tsv
# |- readme.txt
# |- wordvec.txt
# |- glove.840B.300d.txt
Image files for this model: text_matching_on_quora
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
"""
This module defines evaluation metrics for classification tasks
"""
def accuracy(y_pred, label):
"""
    a prediction is correct when the top-1 class in y_pred equals the label
"""
y_pred = np.squeeze(y_pred)
y_pred_idx = np.argmax(y_pred, axis=1)
return 1.0 * np.sum(y_pred_idx == label) / label.shape[0]
def accuracy_with_threshold(y_pred, label, threshold=0.5):
"""
    predict class 1 when its probability in y_pred exceeds threshold;
    when threshold is 0.5, this function is equivalent to accuracy
"""
y_pred = np.squeeze(y_pred)
y_pred_idx = (y_pred[:, 1] > threshold).astype(int)
return 1.0 * np.sum(y_pred_idx == label) / label.shape[0]
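# Usage sketch (hypothetical values, for illustration only): y_pred is a softmax
# output of shape [batch_size, 2] and label is an integer array of shape [batch_size].
#
#   y_pred = np.array([[0.9, 0.1], [0.3, 0.7]])
#   label = np.array([0, 1])
#   accuracy(y_pred, label)                      # -> 1.0
#   accuracy_with_threshold(y_pred, label, 0.5)  # -> 1.0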
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .cdssm import cdssmNet
from .dec_att import DecAttNet
from .sse import SSENet
from .infer_sent import InferSentNet
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
class cdssmNet():
"""cdssm net"""
def __init__(self, config):
self._config = config
def __call__(self, seq1, seq2, label):
return self.body(seq1, seq2, label, self._config)
def body(self, seq1, seq2, label, config):
"""Body function"""
def conv_model(seq):
embed = fluid.layers.embedding(input=seq, size=[config.dict_dim, config.emb_dim], param_attr='emb.w')
conv = fluid.layers.sequence_conv(embed,
num_filters=config.kernel_count,
filter_size=config.kernel_size,
filter_stride=1,
padding=True, # TODO: what is padding
bias_attr=False,
param_attr='conv1d.w',
act='relu')
#print paddle.parameters.get('conv1d.w').shape
conv = fluid.layers.dropout(conv, dropout_prob = config.droprate_conv)
pool = fluid.layers.sequence_pool(conv, pool_type="max")
fc = fluid.layers.fc(pool,
size=config.fc_dim,
param_attr='fc1.w',
bias_attr='fc1.b',
act='relu')
return fc
def MLP(vec):
for dim in config.mlp_hid_dim:
vec = fluid.layers.fc(vec, size=dim, act='relu')
vec = fluid.layers.dropout(vec, dropout_prob=config.droprate_fc)
return vec
seq1_fc = conv_model(seq1)
seq2_fc = conv_model(seq2)
concated_seq = fluid.layers.concat(input=[seq1_fc, seq2_fc], axis=1)
mlp_res = MLP(concated_seq)
prediction = fluid.layers.fc(mlp_res, size=config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
class DecAttNet():
"""decompose attention net"""
def __init__(self, config):
self._config = config
self.initializer = fluid.initializer.Xavier(uniform=False)
def __call__(self, seq1, seq2, mask1, mask2, label):
return self.body(seq1, seq2, mask1, mask2, label)
def body(self, seq1, seq2, mask1, mask2, label):
"""Body function"""
transformed_q1 = self.transformation(seq1)
transformed_q2 = self.transformation(seq2)
masked_q1 = self.apply_mask(transformed_q1, mask1)
masked_q2 = self.apply_mask(transformed_q2, mask2)
alpha, beta = self.attend(masked_q1, masked_q2)
if self._config.share_wight_btw_seq:
seq1_compare = self.compare(masked_q1, beta, param_prefix='compare')
seq2_compare = self.compare(masked_q2, alpha, param_prefix='compare')
else:
seq1_compare = self.compare(masked_q1, beta, param_prefix='compare_1')
seq2_compare = self.compare(masked_q2, alpha, param_prefix='compare_2')
aggregate_res = self.aggregate(seq1_compare, seq2_compare)
prediction = fluid.layers.fc(aggregate_res, size=self._config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
def apply_mask(self, seq, mask):
"""
apply mask on seq
Input: seq in shape [batch_size, seq_len, embedding_size]
Input: mask in shape [batch_size, seq_len]
Output: masked seq in shape [batch_size, seq_len, embedding_size]
"""
return fluid.layers.elementwise_mul(x=seq, y=mask, axis=0)
def feed_forward_2d(self, vec, param_prefix):
"""
Input: vec in shape [batch_size, seq_len, vec_dim]
Output: fc2 in shape [batch_size, seq_len, num_units[1]]
"""
fc1 = fluid.layers.fc(vec, size=self._config.num_units[0], num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc1.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc1.b', act='relu')
fc1 = fluid.layers.dropout(fc1, dropout_prob = self._config.droprate)
fc2 = fluid.layers.fc(fc1, size=self._config.num_units[1], num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc2.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc2.b', act='relu')
fc2 = fluid.layers.dropout(fc2, dropout_prob = self._config.droprate)
return fc2
def feed_forward(self, vec, param_prefix):
"""
Input: vec in shape [batch_size, vec_dim]
Output: fc2 in shape [batch_size, num_units[1]]
"""
fc1 = fluid.layers.fc(vec, size=self._config.num_units[0], num_flatten_dims=1,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc1.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc1.b', act='relu')
fc1 = fluid.layers.dropout(fc1, dropout_prob = self._config.droprate)
fc2 = fluid.layers.fc(fc1, size=self._config.num_units[1], num_flatten_dims=1,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc2.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc2.b', act='relu')
fc2 = fluid.layers.dropout(fc2, dropout_prob = self._config.droprate)
return fc2
def transformation(self, seq):
embed = fluid.layers.embedding(input=seq, size=[self._config.dict_dim, self._config.emb_dim],
param_attr=fluid.ParamAttr(name='emb.w', trainable=self._config.word_embedding_trainable))
if self._config.proj_emb_dim is not None:
return fluid.layers.fc(embed, size=self._config.proj_emb_dim, num_flatten_dims=2,
param_attr=fluid.ParamAttr(name='project' + '_fc1.w',
initializer=self.initializer),
bias_attr=False,
act=None)
return embed
def attend(self, seq1, seq2):
"""
Input: seq1, shape [batch_size, seq_len1, embed_size]
Input: seq2, shape [batch_size, seq_len2, embed_size]
        Output: alpha, shape [batch_size, seq_len2, embed_size]
        Output: beta, shape [batch_size, seq_len1, embed_size]
"""
if self._config.share_wight_btw_seq:
seq1 = self.feed_forward_2d(seq1, param_prefix="attend")
seq2 = self.feed_forward_2d(seq2, param_prefix="attend")
else:
seq1 = self.feed_forward_2d(seq1, param_prefix="attend_1")
seq2 = self.feed_forward_2d(seq2, param_prefix="attend_2")
attention_weight = fluid.layers.matmul(seq1, seq2, transpose_y=True)
normalized_attention_weight = fluid.layers.softmax(attention_weight)
beta = fluid.layers.matmul(normalized_attention_weight, seq2)
attention_weight_t = fluid.layers.transpose(attention_weight, perm=[0, 2, 1])
normalized_attention_weight_t = fluid.layers.softmax(attention_weight_t)
alpha = fluid.layers.matmul(normalized_attention_weight_t, seq1)
return alpha, beta
def compare(self, seq, soft_alignment, param_prefix):
concat_seq = fluid.layers.concat(input=[seq, soft_alignment], axis=2)
        return self.feed_forward_2d(concat_seq, param_prefix=param_prefix)
def aggregate(self, vec1, vec2):
vec1 = fluid.layers.reduce_sum(vec1, dim=1)
vec2 = fluid.layers.reduce_sum(vec2, dim=1)
concat_vec = fluid.layers.concat(input=[vec1, vec2], axis=1)
return self.feed_forward(concat_vec, param_prefix='aggregate')
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from .my_layers import bi_lstm_layer
from .match_layers import ElementwiseMatching
class InferSentNet():
"""
    Based on the paper: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data:
https://arxiv.org/abs/1705.02364
"""
def __init__(self, config):
self._config = config
def __call__(self, seq1, seq2, label):
return self.body(seq1, seq2, label, self._config)
def body(self, seq1, seq2, label, config):
"""Body function"""
seq1_rnn = self.encoder(seq1)
seq2_rnn = self.encoder(seq2)
seq_match = ElementwiseMatching(seq1_rnn, seq2_rnn)
mlp_res = self.MLP(seq_match)
prediction = fluid.layers.fc(mlp_res, size=self._config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
def encoder(self, seq):
"""encoder"""
embed = fluid.layers.embedding(
input=seq,
size=[self._config.dict_dim, self._config.emb_dim],
param_attr=fluid.ParamAttr(name='emb.w', trainable=self._config.word_embedding_trainable))
bi_lstm_h = bi_lstm_layer(
embed,
rnn_hid_dim = self._config.rnn_hid_dim,
name='encoder')
bi_lstm_h = fluid.layers.dropout(bi_lstm_h, dropout_prob=self._config.droprate_lstm)
pool = fluid.layers.sequence_pool(input=bi_lstm_h, pool_type='max')
return pool
def MLP(self, vec):
if self._config.mlp_non_linear:
drop1 = fluid.layers.dropout(vec, dropout_prob=self._config.droprate_fc)
fc1 = fluid.layers.fc(drop1, size=512, act='tanh')
drop2 = fluid.layers.dropout(fc1, dropout_prob=self._config.droprate_fc)
fc2 = fluid.layers.fc(drop2, size=512, act='tanh')
res = fluid.layers.dropout(fc2, dropout_prob=self._config.droprate_fc)
else:
fc1 = fluid.layers.fc(vec, size=512, act=None)
res = fluid.layers.fc(fc1, size=512, act=None)
return res
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides different kinds of match layers
"""
import paddle.fluid as fluid
def MultiPerspectiveMatching(vec1, vec2, perspective_num):
"""
MultiPerspectiveMatching
"""
sim_res = None
for i in range(perspective_num):
vec1_res = fluid.layers.elementwise_add_with_weight(
vec1,
param_attr="elementwise_add_with_weight." + str(i))
vec2_res = fluid.layers.elementwise_add_with_weight(
vec2,
param_attr="elementwise_add_with_weight." + str(i))
m = fluid.layers.cos_sim(vec1_res, vec2_res)
if sim_res is None:
sim_res = m
else:
sim_res = fluid.layers.concat(input=[sim_res, m], axis=1)
return sim_res
def ConcateMatching(vec1, vec2):
"""
ConcateMatching
"""
#TODO: assert shape
return fluid.layers.concat(input=[vec1, vec2], axis=1)
def ElementwiseMatching(vec1, vec2):
"""
reference: [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364)
"""
elementwise_mul = fluid.layers.elementwise_mul(x=vec1, y=vec2)
elementwise_sub = fluid.layers.elementwise_sub(x=vec1, y=vec2)
elementwise_abs_sub = fluid.layers.abs(elementwise_sub)
return fluid.layers.concat(input=[vec1, vec2, elementwise_mul, elementwise_abs_sub], axis=1)
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module defines some Frequently-used DNN layers
"""
import paddle.fluid as fluid
def bi_lstm_layer(input, rnn_hid_dim, name):
"""
    This is a bi-directional LSTM (long short-term memory) module
"""
fc0 = fluid.layers.fc(input=input, # fc for lstm
size=rnn_hid_dim * 4,
param_attr=name + '.fc0.w',
bias_attr=False,
act=None)
lstm_h, c = fluid.layers.dynamic_lstm(
input=fc0,
size=rnn_hid_dim * 4,
is_reverse=False,
param_attr=name + '.lstm_w',
bias_attr=name + '.lstm_b')
reversed_lstm_h, reversed_c = fluid.layers.dynamic_lstm(
input=fc0,
size=rnn_hid_dim * 4,
is_reverse=True,
param_attr=name + '.reversed_lstm_w',
bias_attr=name + '.reversed_lstm_b')
return fluid.layers.concat(input=[lstm_h, reversed_lstm_h], axis=1)
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from .my_layers import bi_lstm_layer
from .match_layers import ElementwiseMatching
class SSENet():
"""
SSE net: Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
https://arxiv.org/abs/1708.02312
"""
def __init__(self, config):
self._config = config
def __call__(self, seq1, seq2, label):
return self.body(seq1, seq2, label, self._config)
def body(self, seq1, seq2, label, config):
"""Body function"""
def stacked_bi_rnn_model(seq):
embed = fluid.layers.embedding(input=seq, size=[self._config.dict_dim, self._config.emb_dim], param_attr='emb.w')
stacked_lstm_out = [embed]
for i in range(len(self._config.rnn_hid_dim)):
if i == 0:
feature = embed
else:
feature = fluid.layers.concat(input = stacked_lstm_out, axis=1)
bi_lstm_h = bi_lstm_layer(feature,
rnn_hid_dim=self._config.rnn_hid_dim[i],
name="lstm_" + str(i))
# add dropout except for the last stacked lstm layer
if i != len(self._config.rnn_hid_dim) - 1:
bi_lstm_h = fluid.layers.dropout(bi_lstm_h, dropout_prob=self._config.droprate_lstm)
stacked_lstm_out.append(bi_lstm_h)
pool = fluid.layers.sequence_pool(input=bi_lstm_h, pool_type='max')
return pool
def MLP(vec):
for i in range(len(self._config.fc_dim)):
vec = fluid.layers.fc(vec, size=self._config.fc_dim[i], act='relu')
# add dropout after every layer of MLP
vec = fluid.layers.dropout(vec, dropout_prob=self._config.droprate_fc)
return vec
seq1_rnn = stacked_bi_rnn_model(seq1)
seq2_rnn = stacked_bi_rnn_model(seq2)
seq_match = ElementwiseMatching(seq1_rnn, seq2_rnn)
mlp_res = MLP(seq_match)
prediction = fluid.layers.fc(mlp_res, size=self._config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides pretrained word embeddings
"""
from __future__ import print_function, unicode_literals
import numpy as np
import time, datetime
import os, sys
def Glove840B_300D(filepath, keys=None):
"""
input: the "glove.840B.300d.txt" file path
return: a dict, key: word (unicode), value: a numpy array with shape [300]
"""
if keys is not None:
assert(isinstance(keys, set))
print("loading word2vec from ", filepath)
print("please wait for a minute.")
start = time.time()
word2vec = {}
with open(filepath, "r") as f:
for line in f:
if sys.version_info <= (3, 0): # for python2
line = line.decode('utf-8')
info = line.strip("\n").split(" ")
word = info[0]
if (keys is not None) and (word not in keys):
continue
vector = info[1:]
assert(len(vector) == 300)
word2vec[word] = np.asarray(vector, dtype='float32')
end = time.time()
print("Spent ", str(datetime.timedelta(seconds=end-start)), " on loading word2vec.")
return word2vec
if __name__ == '__main__':
from os.path import expanduser
home = expanduser("~")
embed_dict = Glove840B_300D(os.path.join(home, "./.cache/paddle/dataset/glove.840B.300d.txt"))
exit(0)
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
"""
import paddle.dataset.common
import collections
import tarfile
import re
import string
import random
import os, sys
import nltk
from os.path import expanduser
__all__ = ['word_dict', 'train', 'dev', 'test']
URL = "https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view"
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset')
DATA_DIR = "Quora_question_pair_partition"
QUORA_TRAIN_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'train.tsv')
QUORA_DEV_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'dev.tsv')
QUORA_TEST_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'test.tsv')
# punctuation or nltk or space
TOKENIZE_METHOD='space'
COLUMN_COUNT = 4
def tokenize(s):
if sys.version_info <= (3, 0): # for python2
s = s.decode('utf-8')
if TOKENIZE_METHOD == "nltk":
return nltk.tokenize.word_tokenize(s)
elif TOKENIZE_METHOD == "punctuation":
return s.translate({ord(char): None for char in string.punctuation}).lower().split()
elif TOKENIZE_METHOD == "space":
return s.split()
else:
raise RuntimeError("Invalid tokenize method")
def maybe_open(file_name):
if not os.path.isfile(file_name):
msg = "file not exist: %s\nPlease download the dataset firstly from: %s\n\n" % (file_name, URL) + \
("# The finally dataset dir should be like\n\n"
"$HOME/.cache/paddle/dataset\n"
" |- Quora_question_pair_partition\n"
" |- train.tsv\n"
" |- test.tsv\n"
" |- dev.tsv\n"
" |- readme.txt\n"
" |- wordvec.txt\n")
raise RuntimeError(msg)
return open(file_name, 'r')
def tokenized_question_pairs(file_name):
"""
"""
with maybe_open(file_name) as f:
questions = {}
lines = f.readlines()
for line in lines:
info = line.strip().split('\t')
if len(info) != COLUMN_COUNT:
# formatting error
continue
(label, question1, question2, id) = info
question1 = tokenize(question1)
question2 = tokenize(question2)
yield question1, question2, int(label)
def tokenized_questions(file_name):
"""
"""
with maybe_open(file_name) as f:
lines = f.readlines()
for line in lines:
info = line.strip().split('\t')
if len(info) != COLUMN_COUNT:
# formatting error
continue
(label, question1, question2, id) = info
yield tokenize(question1)
yield tokenize(question2)
def build_dict(file_name, cutoff):
"""
Build a word dictionary from the corpus. Keys of the dictionary are words,
and values are zero-based IDs of these words.
"""
word_freq = collections.defaultdict(int)
for doc in tokenized_questions(file_name):
for word in doc:
word_freq[word] += 1
word_freq = filter(lambda x: x[1] > cutoff, word_freq.items())
dictionary = sorted(word_freq, key=lambda x: (-x[1], x[0]))
words, _ = list(zip(*dictionary))
word_idx = dict(zip(words, range(len(words))))
word_idx['<unk>'] = len(words)
word_idx['<pad>'] = len(words) + 1
return word_idx
def reader_creator(file_name, word_idx):
UNK_ID = word_idx['<unk>']
def reader():
for (q1, q2, label) in tokenized_question_pairs(file_name):
q1_ids = [word_idx.get(w, UNK_ID) for w in q1]
q2_ids = [word_idx.get(w, UNK_ID) for w in q2]
if q1_ids != [] and q2_ids != []: # [] is not allowed in fluid
assert(label in [0, 1])
yield q1_ids, q2_ids, label
return reader
def train(word_idx):
"""
Quora training set creator.
It returns a reader creator, each sample in the reader is two zero-based ID
list and label in [0, 1].
:param word_idx: word dictionary
:type word_idx: dict
:return: Training reader creator
:rtype: callable
"""
return reader_creator(QUORA_TRAIN_FILE_NAME, word_idx)
def dev(word_idx):
"""
Quora develop set creator.
It returns a reader creator, each sample in the reader is two zero-based ID
list and label in [0, 1].
:param word_idx: word dictionary
:type word_idx: dict
:return: develop reader creator
:rtype: callable
"""
return reader_creator(QUORA_DEV_FILE_NAME, word_idx)
def test(word_idx):
"""
Quora test set creator.
It returns a reader creator, each sample in the reader is two zero-based ID
list and label in [0, 1].
:param word_idx: word dictionary
:type word_idx: dict
:return: Test reader creator
:rtype: callable
"""
return reader_creator(QUORA_TEST_FILE_NAME, word_idx)
def word_dict():
"""
Build a word dictionary from the corpus.
:return: Word dictionary
:rtype: dict
"""
return build_dict(file_name=QUORA_TRAIN_FILE_NAME, cutoff=4)
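# Usage sketch (assumed workflow, for illustration only): build the vocabulary from
# the training file once, then create per-sample readers with it.
#
#   word_idx = word_dict()
#   for q1_ids, q2_ids, label in train(word_idx)():
#       pass  # q1_ids / q2_ids are lists of word IDs, label is 0 or 1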
#Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import os
import sys
import time
import argparse
import unittest
import contextlib
import numpy as np
import paddle.fluid as fluid
import utils, metric, configs
import models
from pretrained_word2vec import Glove840B_300D
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('--model_name', type=str, default='cdssmNet', help="Which model to train")
parser.add_argument('--config', type=str, default='cdssm_base', help="The global config setting")
DATA_DIR = os.path.join(os.path.expanduser('~'), '.cache/paddle/dataset')
def evaluate(epoch_id, exe, inference_program, dev_reader, test_reader, fetch_list, feeder, metric_type):
"""
evaluate on test/dev dataset
"""
def infer(test_reader):
"""
do inference function
"""
total_cost = 0.0
total_count = 0
preds, labels = [], []
for data in test_reader():
avg_cost, avg_acc, batch_prediction = exe.run(inference_program,
feed=feeder.feed(data),
fetch_list=fetch_list,
return_numpy=True)
total_cost += avg_cost * len(data)
total_count += len(data)
preds.append(batch_prediction)
labels.append(np.asarray([x[-1] for x in data], dtype=np.int64))
y_pred = np.concatenate(preds)
y_label = np.concatenate(labels)
metric_res = []
for metric_name in metric_type:
if metric_name == 'accuracy_with_threshold':
metric_res.append((metric_name, metric.accuracy_with_threshold(y_pred, y_label, threshold=0.3)))
elif metric_name == 'accuracy':
metric_res.append((metric_name, metric.accuracy(y_pred, y_label)))
else:
print("Unknown metric type: ", metric_name)
exit()
return total_cost / (total_count * 1.0), metric_res
dev_cost, dev_metric_res = infer(dev_reader)
print("[%s] epoch_id: %d, dev_cost: %f, " % (
time.asctime( time.localtime(time.time()) ),
epoch_id,
dev_cost)
+ ', '.join([str(x[0]) + ": " + str(x[1]) for x in dev_metric_res]))
test_cost, test_metric_res = infer(test_reader)
print("[%s] epoch_id: %d, test_cost: %f, " % (
time.asctime( time.localtime(time.time()) ),
epoch_id,
test_cost)
+ ', '.join([str(x[0]) + ": " + str(x[1]) for x in test_metric_res]))
print("")
def train_and_evaluate(train_reader,
test_reader,
dev_reader,
network,
optimizer,
global_config,
pretrained_word_embedding,
use_cuda,
parallel):
"""
train network
"""
# define the net
if global_config.use_lod_tensor:
# automatic add batch dim
q1 = fluid.layers.data(
name="question1", shape=[1], dtype="int64", lod_level=1)
q2 = fluid.layers.data(
name="question2", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
cost, acc, prediction = network(q1, q2, label)
else:
# shape: [batch_size, max_seq_len_in_batch, 1]
q1 = fluid.layers.data(
name="question1", shape=[-1, -1, 1], dtype="int64")
q2 = fluid.layers.data(
name="question2", shape=[-1, -1, 1], dtype="int64")
# shape: [batch_size, max_seq_len_in_batch]
mask1 = fluid.layers.data(name="mask1", shape=[-1, -1], dtype="float32")
mask2 = fluid.layers.data(name="mask2", shape=[-1, -1], dtype="float32")
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
cost, acc, prediction = network(q1, q2, mask1, mask2, label)
if parallel:
        # TODO: Parallel Training
print("Parallel Training is not supported for now.")
sys.exit(1)
#optimizer.minimize(cost)
if use_cuda:
print("Using GPU")
place = fluid.CUDAPlace(0)
else:
print("Using CPU")
place = fluid.CPUPlace()
exe = fluid.Executor(place)
if global_config.use_lod_tensor:
feeder = fluid.DataFeeder(feed_list=[q1, q2, label], place=place)
else:
feeder = fluid.DataFeeder(feed_list=[q1, q2, mask1, mask2, label], place=place)
# logging param info
for param in fluid.default_main_program().global_block().all_parameters():
print("param name: %s; param shape: %s" % (param.name, param.shape))
# define inference_program
inference_program = fluid.default_main_program().clone(for_test=True)
optimizer.minimize(cost)
exe.run(fluid.default_startup_program())
    # load the pretrained embedding from a numpy array
if pretrained_word_embedding is not None:
print("loading pretrained word embedding to param")
embedding_name = "emb.w"
embedding_param = fluid.global_scope().find_var(embedding_name).get_tensor()
embedding_param.set(pretrained_word_embedding, place)
evaluate(-1,
exe,
inference_program,
dev_reader,
test_reader,
fetch_list=[cost, acc, prediction],
feeder=feeder,
metric_type=global_config.metric_type)
# start training
print("[%s] Start Training" % time.asctime(time.localtime(time.time())))
for epoch_id in range(global_config.epoch_num):
data_size, data_count, total_acc, total_cost = 0, 0, 0.0, 0.0
batch_id = 0
for data in train_reader():
avg_cost_np, avg_acc_np = exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[cost, acc])
data_size = len(data)
total_acc += data_size * avg_acc_np
total_cost += data_size * avg_cost_np
data_count += data_size
if batch_id % 100 == 0:
print("[%s] epoch_id: %d, batch_id: %d, cost: %f, acc: %f" % (
time.asctime(time.localtime(time.time())),
epoch_id,
batch_id,
avg_cost_np,
avg_acc_np))
batch_id += 1
avg_cost = total_cost / data_count
avg_acc = total_acc / data_count
print("")
print("[%s] epoch_id: %d, train_avg_cost: %f, train_avg_acc: %f" % (
time.asctime( time.localtime(time.time()) ), epoch_id, avg_cost, avg_acc))
epoch_model = global_config.save_dirname + "/" + "epoch" + str(epoch_id)
fluid.io.save_inference_model(epoch_model, ["question1", "question2", "label"], acc, exe)
evaluate(epoch_id,
exe,
inference_program,
dev_reader,
test_reader,
fetch_list=[cost, acc, prediction],
feeder=feeder,
metric_type=global_config.metric_type)
def main():
"""
    This function parses arguments, prepares data and loads the pretrained embedding
"""
args = parser.parse_args()
global_config = configs.__dict__[args.config]()
print("net_name: ", args.model_name)
net = models.__dict__[args.model_name](global_config)
# get word_dict
word_dict = utils.getDict(data_type="quora_question_pairs")
# get reader
train_reader, dev_reader, test_reader = utils.prepare_data(
"quora_question_pairs",
word_dict=word_dict,
batch_size = global_config.batch_size,
buf_size=800000,
duplicate_data=global_config.duplicate_data,
use_pad=(not global_config.use_lod_tensor))
# load pretrained_word_embedding
if global_config.use_pretrained_word_embedding:
word2vec = Glove840B_300D(filepath=os.path.join(DATA_DIR, "glove.840B.300d.txt"),
keys=set(word_dict.keys()))
pretrained_word_embedding = utils.get_pretrained_word_embedding(
word2vec=word2vec,
word2id=word_dict,
config=global_config)
print("pretrained_word_embedding to be load:", pretrained_word_embedding)
else:
pretrained_word_embedding = None
# define optimizer
optimizer = utils.getOptimizer(global_config)
# use cuda or not
if not global_config.has_member('use_cuda'):
global_config.use_cuda = 'CUDA_VISIBLE_DEVICES' in os.environ
global_config.list_config()
train_and_evaluate(
train_reader,
        test_reader,
        dev_reader,
net,
optimizer,
global_config,
pretrained_word_embedding,
use_cuda=global_config.use_cuda,
parallel=False)
if __name__ == "__main__":
main()
source ~/mapingshuo/.bash_mapingshuo_fluid
export CUDA_VISIBLE_DEVICES=1
fluid train_and_evaluate.py \
--model_name=cdssmNet \
--config=cdssm_base
#fluid train_and_evaluate.py \
# --model_name=DecAttNet \
# --config=decatt_glove
#fluid train_and_evaluate.py \
# --model_name=DecAttNet \
# --config=decatt_word
#fluid train_and_evaluate.py \
# --model_name=ESIMNet \
# --config=esim_seq
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides utilities for data generator and optimizer definition
"""
import sys
import time
import numpy as np
import paddle.fluid as fluid
import paddle
import quora_question_pairs
def to_lodtensor(data, place):
"""
    convert a batch of variable-length sequences to a LoDTensor
"""
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
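# Example (illustrative): for data = [[1, 2, 3], [4, 5]] the flattened tensor holds
# [[1], [2], [3], [4], [5]] and lod = [0, 3, 5], i.e. the two sequences occupy
# offsets [0, 3) and [3, 5).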
def getOptimizer(global_config):
"""
get Optimizer by config
"""
if global_config.optimizer_type == "adam":
optimizer = fluid.optimizer.Adam(learning_rate=fluid.layers.exponential_decay(
learning_rate=global_config.learning_rate,
decay_steps=global_config.train_samples_num // global_config.batch_size,
decay_rate=global_config.lr_decay))
elif global_config.optimizer_type == "sgd":
optimizer = fluid.optimizer.SGD(learning_rate=fluid.layers.exponential_decay(
learning_rate=global_config.learning_rate,
decay_steps=global_config.train_samples_num // global_config.batch_size,
decay_rate=global_config.lr_decay))
elif global_config.optimizer_type == "adagrad":
optimizer = fluid.optimizer.Adagrad(learning_rate=fluid.layers.exponential_decay(
learning_rate=global_config.learning_rate,
decay_steps=global_config.train_samples_num // global_config.batch_size,
decay_rate=global_config.lr_decay))
return optimizer
def get_pretrained_word_embedding(word2vec, word2id, config):
"""get pretrained embedding in shape [config.dict_dim, config.emb_dim]"""
print("preparing pretrained word embedding ...")
assert(config.dict_dim >= len(word2id))
word2id = sorted(word2id.items(), key = lambda x : x[1])
words = [x[0] for x in word2id]
words = words + ['<not-a-real-words>'] * (config.dict_dim - len(words))
pretrained_emb = []
for _, word in enumerate(words):
if word in word2vec:
            assert(len(word2vec[word]) == config.emb_dim)
if config.embedding_norm:
pretrained_emb.append(word2vec[word] / np.linalg.norm(word2vec[word]))
else:
pretrained_emb.append(word2vec[word])
elif config.OOV_fill == 'uniform':
pretrained_emb.append(np.random.uniform(-0.05, 0.05, size=[config.emb_dim]).astype(np.float32))
elif config.OOV_fill == 'normal':
pretrained_emb.append(np.random.normal(loc=0.0, scale=0.1, size=[config.emb_dim]).astype(np.float32))
else:
print("Unkown OOV fill method: ", OOV_fill)
exit()
word_embedding = np.stack(pretrained_emb)
return word_embedding
def getDict(data_type="quora_question_pairs"):
"""
get word2id dict from quora dataset
"""
print("Generating word dict...")
if data_type == "quora_question_pairs":
word_dict = quora_question_pairs.word_dict()
else:
raise RuntimeError("No such dataset")
print("Vocab size: ", len(word_dict))
return word_dict
def duplicate(reader):
"""
    duplicate the Quora question pairs by swapping the two questions in each sample
Input: reader, which yield (question1, question2, label)
Output: reader, which yield (question1, question2, label) and yield (question2, question1, label)
"""
def duplicated_reader():
for data in reader():
(q1, q2, label) = data
yield (q1, q2, label)
yield (q2, q1, label)
return duplicated_reader
def pad(reader, PAD_ID):
"""
Input: reader, yield batches of [(question1, question2, label), ... ]
Output: padded_reader, yield batches of [(padded_question1, padded_question2, mask1, mask2, label), ... ]
"""
assert(isinstance(PAD_ID, int))
def padded_reader():
for batch in reader():
max_len1 = max([len(data[0]) for data in batch])
max_len2 = max([len(data[1]) for data in batch])
padded_batch = []
for data in batch:
question1, question2, label = data
seq_len1 = len(question1)
seq_len2 = len(question2)
mask1 = [1] * seq_len1 + [0] * (max_len1 - seq_len1)
mask2 = [1] * seq_len2 + [0] * (max_len2 - seq_len2)
padded_question1 = question1 + [PAD_ID] * (max_len1 - seq_len1)
padded_question2 = question2 + [PAD_ID] * (max_len2 - seq_len2)
                padded_question1 = [[x] for x in padded_question1] # last dim of questions must be 1, as required by fluid
padded_question2 = [[x] for x in padded_question2]
assert(len(mask1) == max_len1)
assert(len(mask2) == max_len2)
assert(len(padded_question1) == max_len1)
assert(len(padded_question2) == max_len2)
padded_batch.append((padded_question1, padded_question2, mask1, mask2, label))
yield padded_batch
return padded_reader
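# Example (illustrative): with PAD_ID = 0 and a batch sample ([1, 2, 3], [7, 8], 1),
# where max_len1 = 3 and max_len2 = 4 in that batch, the padded sample becomes
# ([[1], [2], [3]], [[7], [8], [0], [0]], [1, 1, 1], [1, 1, 0, 0], 1).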
def prepare_data(data_type,
word_dict,
batch_size,
buf_size=50000,
duplicate_data=False,
use_pad=False):
"""
prepare data
"""
PAD_ID=word_dict['<pad>']
if data_type == "quora_question_pairs":
        # train/dev/test readers are batched iterators which yield a batch of (question1, question2, label) each time
        # question1 and question2 are lists of word IDs
        # label is 0 or 1
        # for example: ([1, 3, 2], [7, 5, 4, 99], 1)
def prepare_reader(reader):
if duplicate_data:
reader = duplicate(reader)
reader = paddle.batch(
paddle.reader.shuffle(reader, buf_size=buf_size),
batch_size=batch_size,
drop_last=False)
if use_pad:
reader = pad(reader, PAD_ID=PAD_ID)
return reader
train_reader = prepare_reader(quora_question_pairs.train(word_dict))
dev_reader = prepare_reader(quora_question_pairs.dev(word_dict))
test_reader = prepare_reader(quora_question_pairs.test(word_dict))
else:
raise RuntimeError("no such dataset")
return train_reader, dev_reader, test_reader
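# Usage sketch (assumed workflow, for illustration only): the training script builds
# the word dict first and then requests the three batched readers.
#
#   word_dict = getDict("quora_question_pairs")
#   train_reader, dev_reader, test_reader = prepare_data(
#       "quora_question_pairs", word_dict, batch_size=128, use_pad=False)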