diff --git a/.gitignore b/.gitignore index e3fa5ac69f5b49bef3f079a9fdb4c1a4df4165d8..9376aa940a6060e88d9b2415909292a95a15ca7a 100644 --- a/.gitignore +++ b/.gitignore @@ -1,33 +1,5 @@ -paddle/operators/check_t.save -paddle/operators/check_tensor.ls -paddle/operators/tensor.save -python/paddle/v2/fluid/tests/book/image_classification_resnet.inference.model/ -python/paddle/v2/fluid/tests/book/image_classification_vgg.inference.model/ -python/paddle/v2/fluid/tests/book/label_semantic_roles.inference.model/ *.DS_Store *.vs -build/ -build_doc/ *.user - -.vscode -.idea -.project -.cproject -.pydevproject -.settings/ - *.pyc -CMakeSettings.json -Makefile -.test_env/ -third_party/ - *~ -bazel-* -third_party/ - -build_* -# clion workspace. -cmake-build-* -model_test \ No newline at end of file diff --git a/README.md b/README.md index a8ab968086f8986fc792baa134c7b15079615316..76b5a6f7981e27923e993e5d484d2b07cdaed770 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,8 @@ PaddlePaddle provides a rich set of computation units that let users build models in a modular way [SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|Image classification model|Adds SE blocks to ResNeXt, improving model accuracy|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507) [SSD](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md)|Single-stage object detector|Detects objects of the matching scale on feature maps of different scales, and can be conveniently plugged into any standard convolutional network|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md)|Single-stage face detector based on SSD|Uses contextual information to detect hard faces; highly expressive and robust|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf) -[Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/faster_rcnn/README_cn.md)|Classic two-stage object detector|Creatively uses a convolutional network to generate region proposals itself and shares the convolutional layers with the detection network, reducing the number of proposals while improving their quality|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) +[Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)|Classic two-stage object detector|Creatively uses a convolutional network to generate region proposals itself and shares the convolutional layers with the detection network, reducing the number of proposals while improving their quality|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) +[Mask RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)|Classic instance segmentation model based on Faster RCNN|Adds a segmentation branch to the original Faster RCNN to produce mask outputs, decoupling mask prediction from class prediction|[Mask R-CNN](https://arxiv.org/abs/1703.06870) [ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/icnet)|Real-time semantic segmentation model|Accounts for both speed and accuracy, balancing accuracy on high-resolution images against the efficiency of a low-complexity network|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545) [DCGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|Image generation model|Deep convolutional generative adversarial network; combines GANs with convolutional networks to address unstable GAN training|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf) [ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|Image generation model|Conditional generative adversarial network; a GAN with conditional constraints that uses extra information to condition the model and guide the data generation process|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784) diff --git a/fluid/AutoDL/LRC/README.md b/fluid/AutoDL/LRC/README.md new file mode 100644 index 0000000000000000000000000000000000000000..df9af47d4a3876371673cbbfef0ad2553768b9a5 --- /dev/null +++ b/fluid/AutoDL/LRC/README.md @@ -0,0 +1,74 @@ +# LRC Local Rademacher Complexity Regularization +Regularization of Deep Neural
Networks (DNNs) for the sake of improving their generalization capability is important and challenging. This directory contains an image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution of [DARTS](https://arxiv.org/abs/1806.09055) to our research. This model combines LRC regularization with DARTS on the CIFAR-10 dataset. Code accompanying the paper +> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\ +> Yingzhen Yang, Xingjian Li, Jun Huan.\ +> _arXiv:1902.00873_. + +--- +# Table of Contents + +- [Installation](#installation) +- [Data preparation](#data-preparation) +- [Training](#training) + +## Installation + +Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle on your device is older than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle) to update it. + +## Data preparation + +The first time you use the CIFAR-10 dataset, you can download it with: + + sh ./dataset/download.sh + +Please make sure your environment has an internet connection. + +The dataset will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and decompress it to the location mentioned above. + + +## Training + +After data preparation, training can be started with: + + python -u train_mixup.py \ + --batch_size=80 \ + --auxiliary \ + --weight_decay=0.0003 \ + --learning_rate=0.025 \ + --lrc_loss_lambda=0.7 \ + --cutout +- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify a single GPU for training. +- For more help on arguments: + + python train_mixup.py --help + +**data reader introduction:** + +* The data reader is defined in `reader.py`. +* Images are reshaped to 32 * 32. +* During training, images are padded to 40 * 40 and randomly cropped back to the original size. +* During training, images are randomly flipped horizontally. +* Images are scaled to [0, 1] and standardized. +* During training, cutout is applied to images at random locations. +* The order of the input images is shuffled during training. + +**model configuration:** + +* Use the auxiliary loss with auxiliary\_weight=0.4. +* Use dropout with drop\_path\_prob=0.2. +* Set lrc\_loss\_lambda to 0.7. + +**training strategy:** + +* Use the momentum optimizer with momentum=0.9. +* Weight decay is 0.0003. +* Use cosine decay with init\_lr=0.025. +* Train for 600 epochs in total. +* Use the Xavier initializer for conv2d weights, a constant initializer for batch norm weights, and a normal initializer for fc weights. +* Initialize batch norm and fc biases to zero, and add no bias to conv2d.
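The cosine schedule listed above is easy to sanity-check in isolation. Below is a minimal sketch of the schedule, not code from this repo: `lr_at` is a hypothetical helper, and the 625 steps per epoch follow from the 50000 CIFAR-10 training images and the batch size of 80 used here (it mirrors the formula in `learning_rate.py` further down).

```python
import math

STEPS_PER_EPOCH = 50000 // 80       # 625 steps with the batch size above
TOTAL_STEPS = 600 * STEPS_PER_EPOCH  # 600 training epochs in total

def lr_at(step, init_lr=0.025):
    # init_lr * (cos(pi * progress) + 1) / 2, as in learning_rate.py
    return init_lr * (math.cos(math.pi * step / TOTAL_STEPS) + 1.0) / 2.0

print(lr_at(0))                  # 0.025 at the start of training
print(lr_at(TOTAL_STEPS // 2))   # ~0.0125 halfway through
print(lr_at(TOTAL_STEPS - 1))    # approaches 0 by epoch 600
```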
+ + +## Reference + + - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055) + - Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts) diff --git a/fluid/AutoDL/LRC/README_cn.md b/fluid/AutoDL/LRC/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..06dc937074de199af31db97ee200e7690443b1b0 --- /dev/null +++ b/fluid/AutoDL/LRC/README_cn.md @@ -0,0 +1,71 @@ +# LRC Local Rademacher Complexity Regularization +Choosing a regularizer that improves the generalization of deep neural networks is important and challenging. This directory contains an image classification model with a novel regularizer based on Local Rademacher Complexity (LRC). We are grateful to [DARTS](https://arxiv.org/abs/1806.09055) for its help with this research. The model combines LRC regularization with the DARTS network and achieves excellent results on the CIFAR-10 dataset. The code is released together with the paper +> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\ +> Yingzhen Yang, Xingjian Li, Jun Huan.\ +> _arXiv:1902.00873_. + +--- +# Table of Contents + +- [Installation](#installation) +- [Data preparation](#data-preparation) +- [Training](#training) + +## Installation + +Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle in your environment is older than this version, please update it following the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle). + +## Data preparation + +The first time you use the CIFAR-10 dataset, you can download it with: + + sh ./dataset/download.sh + +Please make sure your environment has an internet connection. The data will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and decompress it to the location above. + +## Training + +Once the data is ready, training can be started with: + + python -u train_mixup.py \ + --batch_size=80 \ + --auxiliary \ + --weight_decay=0.0003 \ + --learning_rate=0.025 \ + --lrc_loss_lambda=0.7 \ + --cutout +- Set ```export CUDA_VISIBLE_DEVICES=0``` to train on a single GPU. +- For the optional arguments, see: + + python train_mixup.py --help + +**data reader introduction:** + +* The data reader is defined in `reader.py`. +* Input images are uniformly resized to 32 * 32. +* During training, images are padded to 40 * 40 and then randomly cropped back to the original input size. +* During training, images are randomly flipped horizontally. +* Every image pixel is normalized. +* During training, random cutout is applied to the images. +* The order of the input images is shuffled during training. + +**model configuration:** + +* Use the auxiliary loss, with an auxiliary-loss weight of 0.4. +* Use dropout, with a drop rate of 0.2. +* Set lrc\_loss\_lambda to 0.7. + +**training strategy:** + +* Train with the momentum optimizer, momentum=0.9. +* The weight decay coefficient is 0.0003. +* Use cosine learning rate decay with an initial learning rate of 0.025. +* Train for 600 epochs in total. +* Use Xavier initialization for convolution weights, constant initialization for batch norm weights, and Gaussian initialization for fully connected weights. +* Initialize batch norm and fully connected biases to a constant, and set no bias on convolutions. + + +## Reference + + - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055) + - Differentiable Architecture Search in PyTorch [`code`](https://github.com/quark0/darts) diff --git a/fluid/AutoDL/LRC/dataset/download.sh b/fluid/AutoDL/LRC/dataset/download.sh new file mode 100644 index 0000000000000000000000000000000000000000..0981c3b6878421f80d392f314fd0ae836644a63c --- /dev/null +++ b/fluid/AutoDL/LRC/dataset/download.sh @@ -0,0 +1,10 @@ +DIR="$( cd "$(dirname "$0")" ; pwd -P )" +cd "$DIR" +mkdir -p cifar +cd cifar +# Download the data. +echo "Downloading..." +wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz +# Extract the data. +echo "Extracting..." +tar zvxf cifar-10-python.tar.gz diff --git a/fluid/AutoDL/LRC/genotypes.py b/fluid/AutoDL/LRC/genotypes.py new file mode 100644 index 0000000000000000000000000000000000000000..349fbd2478a7c2d1bb4cc3dd901b470de3c8b906 --- /dev/null +++ b/fluid/AutoDL/LRC/genotypes.py @@ -0,0 +1,116 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Based on: +# -------------------------------------------------------- +# DARTS +# Copyright (c) 2018, Hanxiao Liu. +# Licensed under the Apache License, Version 2.0; +# -------------------------------------------------------- + +from collections import namedtuple + +Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat') + +PRIMITIVES = [ + 'none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', + 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5' +] + +NASNet = Genotype( + normal=[ + ('sep_conv_5x5', 1), + ('sep_conv_3x3', 0), + ('sep_conv_5x5', 0), + ('sep_conv_3x3', 0), + ('avg_pool_3x3', 1), + ('skip_connect', 0), + ('avg_pool_3x3', 0), + ('avg_pool_3x3', 0), + ('sep_conv_3x3', 1), + ('skip_connect', 1), + ], + normal_concat=[2, 3, 4, 5, 6], + reduce=[ + ('sep_conv_5x5', 1), + ('sep_conv_7x7', 0), + ('max_pool_3x3', 1), + ('sep_conv_7x7', 0), + ('avg_pool_3x3', 1), + ('sep_conv_5x5', 0), + ('skip_connect', 3), + ('avg_pool_3x3', 2), + ('sep_conv_3x3', 2), + ('max_pool_3x3', 1), + ], + reduce_concat=[4, 5, 6], ) + +AmoebaNet = Genotype( + normal=[ + ('avg_pool_3x3', 0), + ('max_pool_3x3', 1), + ('sep_conv_3x3', 0), + ('sep_conv_5x5', 2), + ('sep_conv_3x3', 0), + ('avg_pool_3x3', 3), + ('sep_conv_3x3', 1), + ('skip_connect', 1), + ('skip_connect', 0), + ('avg_pool_3x3', 1), + ], + normal_concat=[4, 5, 6], + reduce=[ + ('avg_pool_3x3', 0), + ('sep_conv_3x3', 1), + ('max_pool_3x3', 0), + ('sep_conv_7x7', 2), + ('sep_conv_7x7', 0), + ('avg_pool_3x3', 1), + ('max_pool_3x3', 0), + ('max_pool_3x3', 1), + ('conv_7x1_1x7', 0), + ('sep_conv_3x3', 5), + ], + reduce_concat=[3, 4, 6]) + +DARTS_V1 = Genotype( + normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('skip_connect', 0), + ('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), + ('sep_conv_3x3', 0), ('skip_connect', 2)], + normal_concat=[2, 3, 4, 5], + reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), + ('max_pool_3x3', 0), ('max_pool_3x3', 0), ('skip_connect', 2), + ('skip_connect', 2), ('avg_pool_3x3', 0)], + reduce_concat=[2, 3, 4, 5]) +DARTS_V2 = Genotype( + normal=[('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), + ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('skip_connect', 0), + ('skip_connect', 0), ('dil_conv_3x3', 2)], + normal_concat=[2, 3, 4, 5], + reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), + ('max_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 2), + ('skip_connect', 2), ('max_pool_3x3', 1)], + reduce_concat=[2, 3, 4, 5]) + +MY_DARTS = Genotype( + normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 0), + ('dil_conv_5x5', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), + ('skip_connect', 0), ('sep_conv_3x3', 1)], + normal_concat=range(2, 6), + reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('max_pool_3x3', 0), + ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2), + ('skip_connect', 2), ('skip_connect', 3)], + reduce_concat=range(2, 6)) + +DARTS = MY_DARTS diff --git a/fluid/AutoDL/LRC/learning_rate.py b/fluid/AutoDL/LRC/learning_rate.py new file mode 100644 index 
0000000000000000000000000000000000000000..3965171b487884d36e4a7447f10f312204803bf8 --- /dev/null +++ b/fluid/AutoDL/LRC/learning_rate.py @@ -0,0 +1,43 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Based on: +# -------------------------------------------------------- +# DARTS +# Copyright (c) 2018, Hanxiao Liu. +# Licensed under the Apache License, Version 2.0; +# -------------------------------------------------------- + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import paddle +import paddle.fluid as fluid +import paddle.fluid.layers.ops as ops +from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter +import math +from paddle.fluid.initializer import init_on_cpu + + +def cosine_decay(learning_rate, num_epoch, steps_one_epoch): + """Applies cosine decay to the learning rate: + decayed_lr = learning_rate * 0.5 * + (cos(pi * global_step / (num_epoch * steps_one_epoch)) + 1) + """ + global_step = _decay_step_counter() + + with init_on_cpu(): + decayed_lr = learning_rate * \ + (ops.cos((global_step / steps_one_epoch) \ + * math.pi / num_epoch) + 1)/2 + return decayed_lr diff --git a/fluid/AutoDL/LRC/model.py b/fluid/AutoDL/LRC/model.py new file mode 100644 index 0000000000000000000000000000000000000000..45a403495ecc0b7cc0ac3b541d75702adbef31b2 --- /dev/null +++ b/fluid/AutoDL/LRC/model.py @@ -0,0 +1,313 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# +# Based on: +# -------------------------------------------------------- +# DARTS +# Copyright (c) 2018, Hanxiao Liu.
+# Licensed under the Apache License, Version 2.0; +# -------------------------------------------------------- + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import os +import sys +import numpy as np +import time +import functools +import paddle +import paddle.fluid as fluid +from operations import * + + +class Cell(): + def __init__(self, genotype, C_prev_prev, C_prev, C, reduction, + reduction_prev): + print(C_prev_prev, C_prev, C) + + if reduction_prev: + self.preprocess0 = functools.partial(FactorizedReduce, C_out=C) + else: + self.preprocess0 = functools.partial( + ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0) + self.preprocess1 = functools.partial( + ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0) + if reduction: + op_names, indices = zip(*genotype.reduce) + concat = genotype.reduce_concat + else: + op_names, indices = zip(*genotype.normal) + concat = genotype.normal_concat + print(op_names, indices, concat, reduction) + self._compile(C, op_names, indices, concat, reduction) + + def _compile(self, C, op_names, indices, concat, reduction): + assert len(op_names) == len(indices) + self._steps = len(op_names) // 2 + self._concat = concat + self.multiplier = len(concat) + + self._ops = [] + for name, index in zip(op_names, indices): + stride = 2 if reduction and index < 2 else 1 + op = functools.partial(OPS[name], C=C, stride=stride, affine=True) + self._ops += [op] + self._indices = indices + + def forward(self, s0, s1, drop_prob, is_train, name): + self.training = is_train + preprocess0_name = name + 'preprocess0.' + preprocess1_name = name + 'preprocess1.' + s0 = self.preprocess0(s0, name=preprocess0_name) + s1 = self.preprocess1(s1, name=preprocess1_name) + out = [s0, s1] + for i in range(self._steps): + h1 = out[self._indices[2 * i]] + h2 = out[self._indices[2 * i + 1]] + op1 = self._ops[2 * i] + op2 = self._ops[2 * i + 1] + h3 = op1(h1, name=name + '_ops.' + str(2 * i) + '.') + h4 = op2(h2, name=name + '_ops.' 
+ str(2 * i + 1) + '.') + if self.training and drop_prob > 0.: + if h3 != h1: + h3 = fluid.layers.dropout( + h3, + drop_prob, + dropout_implementation='upscale_in_train') + if h4 != h2: + h4 = fluid.layers.dropout( + h4, + drop_prob, + dropout_implementation='upscale_in_train') + s = h3 + h4 + out += [s] + return fluid.layers.concat([out[i] for i in self._concat], axis=1) + + +def AuxiliaryHeadCIFAR(input, num_classes, aux_name='auxiliary_head'): + relu_a = fluid.layers.relu(input) + pool_a = fluid.layers.pool2d(relu_a, 5, 'avg', 3) + conv2d_a = fluid.layers.conv2d( + pool_a, + 128, + 1, + name=aux_name + '.features.2', + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=aux_name + '.features.2.weight'), + bias_attr=False) + bn_a_name = aux_name + '.features.3' + bn_a = fluid.layers.batch_norm( + conv2d_a, + act='relu', + name=bn_a_name, + param_attr=ParamAttr( + initializer=Constant(1.), name=bn_a_name + '.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=bn_a_name + '.bias'), + moving_mean_name=bn_a_name + '.running_mean', + moving_variance_name=bn_a_name + '.running_var') + conv2d_b = fluid.layers.conv2d( + bn_a, + 768, + 2, + name=aux_name + '.features.5', + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=aux_name + '.features.5.weight'), + bias_attr=False) + bn_b_name = aux_name + '.features.6' + bn_b = fluid.layers.batch_norm( + conv2d_b, + act='relu', + name=bn_b_name, + param_attr=ParamAttr( + initializer=Constant(1.), name=bn_b_name + '.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=bn_b_name + '.bias'), + moving_mean_name=bn_b_name + '.running_mean', + moving_variance_name=bn_b_name + '.running_var') + fc_name = aux_name + '.classifier' + fc = fluid.layers.fc(bn_b, + num_classes, + name=fc_name, + param_attr=ParamAttr( + initializer=Normal(scale=1e-3), + name=fc_name + '.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=fc_name + '.bias')) + return fc + + +def StemConv(input, C_out, kernel_size, padding): + conv_a = fluid.layers.conv2d( + input, + C_out, + kernel_size, + padding=padding, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), name='stem.0.weight'), + bias_attr=False) + bn_a = fluid.layers.batch_norm( + conv_a, + param_attr=ParamAttr( + initializer=Constant(1.), name='stem.1.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name='stem.1.bias'), + moving_mean_name='stem.1.running_mean', + moving_variance_name='stem.1.running_var') + return bn_a + + +class NetworkCIFAR(object): + def __init__(self, C, class_num, layers, auxiliary, genotype): + self.class_num = class_num + self._layers = layers + self._auxiliary = auxiliary + + stem_multiplier = 3 + self.drop_path_prob = 0 + C_curr = stem_multiplier * C + + C_prev_prev, C_prev, C_curr = C_curr, C_curr, C + self.cells = [] + reduction_prev = False + for i in range(layers): + if i in [layers // 3, 2 * layers // 3]: + C_curr *= 2 + reduction = True + else: + reduction = False + cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, + reduction_prev) + reduction_prev = reduction + self.cells += [cell] + C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr + if i == 2 * layers // 3: + C_to_auxiliary = C_prev + + def forward(self, init_channel, is_train): + self.training = is_train + self.logits_aux = None + num_channel = init_channel * 3 + s0 = StemConv(self.image, num_channel, kernel_size=3, padding=1) + s1 = s0 + for i, cell in enumerate(self.cells): + name = 'cells.' 
+ str(i) + '.' + s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train, + name) + if i == int(2 * self._layers // 3): + if self._auxiliary and self.training: + self.logits_aux = AuxiliaryHeadCIFAR(s1, self.class_num) + out = fluid.layers.adaptive_pool2d(s1, (1, 1), "avg") + self.logits = fluid.layers.fc(out, + size=self.class_num, + param_attr=ParamAttr( + initializer=Normal(scale=1e-3), + name='classifier.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), + name='classifier.bias')) + return self.logits, self.logits_aux + + def build_input(self, image_shape, batch_size, is_train): + if is_train: + py_reader = fluid.layers.py_reader( + capacity=64, + shapes=[[-1] + image_shape, [-1, 1], [-1, 1], [-1, 1], [-1, 1], + [-1, 1], [-1, batch_size, self.class_num - 1]], + lod_levels=[0, 0, 0, 0, 0, 0, 0], + dtypes=[ + "float32", "int64", "int64", "float32", "int32", "int32", + "float32" + ], + use_double_buffer=True, + name='train_reader') + else: + py_reader = fluid.layers.py_reader( + capacity=64, + shapes=[[-1] + image_shape, [-1, 1]], + lod_levels=[0, 0], + dtypes=["float32", "int64"], + use_double_buffer=True, + name='test_reader') + return py_reader + + def train_model(self, py_reader, init_channels, aux, aux_w, batch_size, + loss_lambda): + self.image, self.ya, self.yb, self.lam, self.label_reshape,\ + self.non_label_reshape, self.rad_var = fluid.layers.read_file(py_reader) + self.logits, self.logits_aux = self.forward(init_channels, True) + self.mixup_loss = self.mixup_loss(aux, aux_w) + self.lrc_loss = self.lrc_loss(batch_size) + return self.mixup_loss + loss_lambda * self.lrc_loss + + def test_model(self, py_reader, init_channels): + self.image, self.ya = fluid.layers.read_file(py_reader) + self.logits, _ = self.forward(init_channels, False) + prob = fluid.layers.softmax(self.logits, use_cudnn=False) + loss = fluid.layers.cross_entropy(prob, self.ya) + acc_1 = fluid.layers.accuracy(self.logits, self.ya, k=1) + acc_5 = fluid.layers.accuracy(self.logits, self.ya, k=5) + return loss, acc_1, acc_5 + + def mixup_loss(self, auxiliary, auxiliary_weight): + prob = fluid.layers.softmax(self.logits, use_cudnn=False) + loss_a = fluid.layers.cross_entropy(prob, self.ya) + loss_b = fluid.layers.cross_entropy(prob, self.yb) + loss_a_mean = fluid.layers.reduce_mean(loss_a) + loss_b_mean = fluid.layers.reduce_mean(loss_b) + loss = self.lam * loss_a_mean + (1 - self.lam) * loss_b_mean + if auxiliary: + prob_aux = fluid.layers.softmax(self.logits_aux, use_cudnn=False) + loss_a_aux = fluid.layers.cross_entropy(prob_aux, self.ya) + loss_b_aux = fluid.layers.cross_entropy(prob_aux, self.yb) + loss_a_aux_mean = fluid.layers.reduce_mean(loss_a_aux) + loss_b_aux_mean = fluid.layers.reduce_mean(loss_b_aux) + loss_aux = self.lam * loss_a_aux_mean + (1 - self.lam + ) * loss_b_aux_mean + return loss + auxiliary_weight * loss_aux + return loss + + def lrc_loss(self, batch_size): + y_diff_reshape = fluid.layers.reshape(self.logits, shape=(-1, 1)) + label_reshape = fluid.layers.squeeze(self.label_reshape, axes=[1]) + non_label_reshape = fluid.layers.squeeze( + self.non_label_reshape, axes=[1]) + label_reshape.stop_gradient = True + non_label_reshape.stop_gradient = True + + y_diff_label_reshape = fluid.layers.gather(y_diff_reshape, + label_reshape) + y_diff_non_label_reshape = fluid.layers.gather(y_diff_reshape, + non_label_reshape) + y_diff_label = fluid.layers.reshape( + y_diff_label_reshape, shape=(-1, batch_size, 1)) + y_diff_non_label = fluid.layers.reshape( + y_diff_non_label_reshape, + shape=(-1,
batch_size, self.class_num - 1)) + y_diff_ = y_diff_non_label - y_diff_label + + y_diff_ = fluid.layers.transpose(y_diff_, perm=[1, 2, 0]) + rad_var_trans = fluid.layers.transpose(self.rad_var, perm=[1, 2, 0]) + rad_y_diff_trans = rad_var_trans * y_diff_ + lrc_loss_sum = fluid.layers.reduce_sum(rad_y_diff_trans, dim=[0, 1]) + lrc_loss_ = fluid.layers.abs(lrc_loss_sum) / (batch_size * + (self.class_num - 1)) + lrc_loss_mean = fluid.layers.reduce_mean(lrc_loss_) + + return lrc_loss_mean diff --git a/fluid/AutoDL/LRC/operations.py b/fluid/AutoDL/LRC/operations.py new file mode 100644 index 0000000000000000000000000000000000000000..b015722a1bc5dbf682c90812a971f3dbb2cd8c9a --- /dev/null +++ b/fluid/AutoDL/LRC/operations.py @@ -0,0 +1,349 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# +# Based on: +# -------------------------------------------------------- +# DARTS +# Copyright (c) 2018, Hanxiao Liu. +# Licensed under the Apache License, Version 2.0; +# -------------------------------------------------------- +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import os +import sys +import numpy as np +import time +import paddle +import paddle.fluid as fluid +from paddle.fluid.param_attr import ParamAttr +from paddle.fluid.initializer import Xavier +from paddle.fluid.initializer import Normal +from paddle.fluid.initializer import Constant + +OPS = { + 'none' : lambda input, C, stride, name, affine: Zero(input, stride, name), + 'avg_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'avg', pool_stride=stride, pool_padding=1, name=name), + 'max_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'max', pool_stride=stride, pool_padding=1, name=name), + 'skip_connect' : lambda input,C, stride, name, affine: Identity(input, name) if stride == 1 else FactorizedReduce(input, C, name=name, affine=affine), + 'sep_conv_3x3' : lambda input,C, stride, name, affine: SepConv(input, C, C, 3, stride, 1, name=name, affine=affine), + 'sep_conv_5x5' : lambda input,C, stride, name, affine: SepConv(input, C, C, 5, stride, 2, name=name, affine=affine), + 'sep_conv_7x7' : lambda input,C, stride, name, affine: SepConv(input, C, C, 7, stride, 3, name=name, affine=affine), + 'dil_conv_3x3' : lambda input,C, stride, name, affine: DilConv(input, C, C, 3, stride, 2, 2, name=name, affine=affine), + 'dil_conv_5x5' : lambda input,C, stride, name, affine: DilConv(input, C, C, 5, stride, 4, 2, name=name, affine=affine), + 'conv_7x1_1x7' : lambda input,C, stride, name, affine: SevenConv(input, C, name=name, affine=affine) +} + + +def ReLUConvBN(input, C_out, kernel_size, stride, padding, name='', + affine=True): + relu_a = fluid.layers.relu(input) + conv2d_a = fluid.layers.conv2d( + relu_a, + C_out, + kernel_size, + stride, + padding, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.1.weight'), + 
bias_attr=False) + if affine: + reluconvbn_out = fluid.layers.batch_norm( + conv2d_a, + param_attr=ParamAttr( + initializer=Constant(1.), name=name + 'op.2.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=name + 'op.2.bias'), + moving_mean_name=name + 'op.2.running_mean', + moving_variance_name=name + 'op.2.running_var') + else: + reluconvbn_out = fluid.layers.batch_norm( + conv2d_a, + param_attr=ParamAttr( + initializer=Constant(1.), + learning_rate=0., + name=name + 'op.2.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), + learning_rate=0., + name=name + 'op.2.bias'), + moving_mean_name=name + 'op.2.running_mean', + moving_variance_name=name + 'op.2.running_var') + return reluconvbn_out + + +def DilConv(input, + C_in, + C_out, + kernel_size, + stride, + padding, + dilation, + name='', + affine=True): + relu_a = fluid.layers.relu(input) + conv2d_a = fluid.layers.conv2d( + relu_a, + C_in, + kernel_size, + stride, + padding, + dilation, + groups=C_in, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.1.weight'), + bias_attr=False, + use_cudnn=False) + conv2d_b = fluid.layers.conv2d( + conv2d_a, + C_out, + 1, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.2.weight'), + bias_attr=False) + if affine: + dilconv_out = fluid.layers.batch_norm( + conv2d_b, + param_attr=ParamAttr( + initializer=Constant(1.), name=name + 'op.3.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=name + 'op.3.bias'), + moving_mean_name=name + 'op.3.running_mean', + moving_variance_name=name + 'op.3.running_var') + else: + dilconv_out = fluid.layers.batch_norm( + conv2d_b, + param_attr=ParamAttr( + initializer=Constant(1.), + learning_rate=0., + name=name + 'op.3.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), + learning_rate=0., + name=name + 'op.3.bias'), + moving_mean_name=name + 'op.3.running_mean', + moving_variance_name=name + 'op.3.running_var') + return dilconv_out + + +def SepConv(input, + C_in, + C_out, + kernel_size, + stride, + padding, + name='', + affine=True): + relu_a = fluid.layers.relu(input) + conv2d_a = fluid.layers.conv2d( + relu_a, + C_in, + kernel_size, + stride, + padding, + groups=C_in, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.1.weight'), + bias_attr=False, + use_cudnn=False) + conv2d_b = fluid.layers.conv2d( + conv2d_a, + C_in, + 1, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.2.weight'), + bias_attr=False) + if affine: + bn_a = fluid.layers.batch_norm( + conv2d_b, + param_attr=ParamAttr( + initializer=Constant(1.), name=name + 'op.3.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=name + 'op.3.bias'), + moving_mean_name=name + 'op.3.running_mean', + moving_variance_name=name + 'op.3.running_var') + else: + bn_a = fluid.layers.batch_norm( + conv2d_b, + param_attr=ParamAttr( + initializer=Constant(1.), + learning_rate=0., + name=name + 'op.3.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), + learning_rate=0., + name=name + 'op.3.bias'), + moving_mean_name=name + 'op.3.running_mean', + moving_variance_name=name + 'op.3.running_var') + + relu_b = fluid.layers.relu(bn_a) + conv2d_d = fluid.layers.conv2d( + relu_b, + C_in, + kernel_size, + 1, + padding, + groups=C_in, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.5.weight'), + bias_attr=False, + use_cudnn=False) + conv2d_e = 
fluid.layers.conv2d( + conv2d_d, + C_out, + 1, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.6.weight'), + bias_attr=False) + if affine: + sepconv_out = fluid.layers.batch_norm( + conv2d_e, + param_attr=ParamAttr( + initializer=Constant(1.), name=name + 'op.7.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=name + 'op.7.bias'), + moving_mean_name=name + 'op.7.running_mean', + moving_variance_name=name + 'op.7.running_var') + else: + sepconv_out = fluid.layers.batch_norm( + conv2d_e, + param_attr=ParamAttr( + initializer=Constant(1.), + learning_rate=0., + name=name + 'op.7.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), + learning_rate=0., + name=name + 'op.7.bias'), + moving_mean_name=name + 'op.7.running_mean', + moving_variance_name=name + 'op.7.running_var') + return sepconv_out + + +def SevenConv(input, C_out, stride, name='', affine=True): + relu_a = fluid.layers.relu(input) + conv2d_a = fluid.layers.conv2d( + relu_a, + C_out, (1, 7), (1, stride), (0, 3), + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.1.weight'), + bias_attr=False) + conv2d_b = fluid.layers.conv2d( + conv2d_a, + C_out, (7, 1), (stride, 1), (3, 0), + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'op.2.weight'), + bias_attr=False) + if affine: + out = fluid.layers.batch_norm( + conv2d_b, + param_attr=ParamAttr( + initializer=Constant(1.), name=name + 'op.3.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=name + 'op.3.bias'), + moving_mean_name=name + 'op.3.running_mean', + moving_variance_name=name + 'op.3.running_var') + else: + out = fluid.layers.batch_norm( + conv2d_b, + param_attr=ParamAttr( + initializer=Constant(1.), + learning_rate=0., + name=name + 'op.3.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), + learning_rate=0., + name=name + 'op.3.bias'), + moving_mean_name=name + 'op.3.running_mean', + moving_variance_name=name + 'op.3.running_var') + return out + + +def Identity(input, name=''): + return input + + +def Zero(input, stride, name=''): + # mask the activations; with stride == 1 this zeroes the whole feature map + ones = np.ones(input.shape[-2:]) + ones[::stride, ::stride] = 0 + ones = fluid.layers.assign(ones) + return input * ones + + +def FactorizedReduce(input, C_out, name='', affine=True): + relu_a = fluid.layers.relu(input) + conv2d_a = fluid.layers.conv2d( + relu_a, + C_out // 2, + 1, + 2, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'conv_1.weight'), + bias_attr=False) + h_end = relu_a.shape[2] + w_end = relu_a.shape[3] + slice_a = fluid.layers.slice(relu_a, [2, 3], [1, 1], [h_end, w_end]) + conv2d_b = fluid.layers.conv2d( + slice_a, + C_out // 2, + 1, + 2, + param_attr=ParamAttr( + initializer=Xavier( + uniform=False, fan_in=0), + name=name + 'conv_2.weight'), + bias_attr=False) + out = fluid.layers.concat([conv2d_a, conv2d_b], axis=1) + if affine: + out = fluid.layers.batch_norm( + out, + param_attr=ParamAttr( + initializer=Constant(1.), name=name + 'bn.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), name=name + 'bn.bias'), + moving_mean_name=name + 'bn.running_mean', + moving_variance_name=name + 'bn.running_var') + else: + out = fluid.layers.batch_norm( + out, + param_attr=ParamAttr( + initializer=Constant(1.), + learning_rate=0., + name=name + 'bn.weight'), + bias_attr=ParamAttr( + initializer=Constant(0.), + learning_rate=0., + name=name + 'bn.bias'), + moving_mean_name=name + 'bn.running_mean', + moving_variance_name=name +
'bn.running_var') + return out diff --git a/fluid/AutoDL/LRC/reader.py b/fluid/AutoDL/LRC/reader.py new file mode 100644 index 0000000000000000000000000000000000000000..20b32b504e9245c4ff3892f08736d800080daab4 --- /dev/null +++ b/fluid/AutoDL/LRC/reader.py @@ -0,0 +1,187 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Based on: +# -------------------------------------------------------- +# DARTS +# Copyright (c) 2018, Hanxiao Liu. +# Licensed under the Apache License, Version 2.0; +# -------------------------------------------------------- +""" +CIFAR-10 dataset. +This module parses the CIFAR-10 train/test set (fetched by dataset/download.sh from +https://www.cs.toronto.edu/~kriz/cifar.html) into paddle reader creators. +The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, +with 6000 images per class. There are 50000 training images and 10000 test images. +""" + +from PIL import Image +from PIL import ImageOps +import numpy as np + +import cPickle +import random +import utils +import paddle.fluid as fluid +import time +import os +import functools +import paddle.reader + +__all__ = ['train10', 'test10'] + +image_size = 32 +image_depth = 3 +half_length = 8 + +CIFAR_MEAN = [0.4914, 0.4822, 0.4465] +CIFAR_STD = [0.24703233, 0.24348505, 0.26158768] + + +def generate_reshape_label(label, batch_size, CIFAR_CLASSES=10): + reshape_label = np.zeros((batch_size, 1), dtype='int32') + reshape_non_label = np.zeros( + (batch_size * (CIFAR_CLASSES - 1), 1), dtype='int32') + num = 0 + for i in range(batch_size): + label_i = label[i] + reshape_label[i] = label_i + i * CIFAR_CLASSES + for j in range(CIFAR_CLASSES): + if label_i != j: + reshape_non_label[num] = \ + j + i * CIFAR_CLASSES + num += 1 + return reshape_label, reshape_non_label + + +def generate_bernoulli_number(batch_size, CIFAR_CLASSES=10): + rcc_iters = 50 + rad_var = np.zeros((rcc_iters, batch_size, CIFAR_CLASSES - 1)) + for i in range(rcc_iters): + bernoulli_num = np.random.binomial(size=batch_size, n=1, p=0.5) + bernoulli_map = np.array([]) + ones = np.ones((CIFAR_CLASSES - 1, 1)) + for batch_id in range(batch_size): + num = bernoulli_num[batch_id] + # map Bernoulli {0, 1} draws to Rademacher {-1, +1} variables + var_id = 2 * ones * num - 1 + bernoulli_map = np.append(bernoulli_map, var_id) + rad_var[i] = bernoulli_map.reshape((batch_size, CIFAR_CLASSES - 1)) + return rad_var.astype('float32') + + +def preprocess(sample, is_training, args): + image_array = sample.reshape(3, image_size, image_size) + rgb_array = np.transpose(image_array, (1, 2, 0)) + img = Image.fromarray(rgb_array, 'RGB') + + if is_training: + # pad and random crop + img = ImageOps.expand(img, (4, 4, 4, 4), fill=0) # pad to 40 * 40 * 3 + left_top = np.random.randint(9, size=2) # rand 0 - 8 + img = img.crop((left_top[0], left_top[1], left_top[0] + image_size, + left_top[1] + image_size)) + if np.random.randint(2): + img = img.transpose(Image.FLIP_LEFT_RIGHT) + + img = np.array(img).astype(np.float32) + + # per_image_standardization + img_float = img / 255.0 + img =
(img_float - CIFAR_MEAN) / CIFAR_STD + + if is_training and args.cutout: + center = np.random.randint(image_size, size=2) + offset_width = max(0, center[0] - half_length) + offset_height = max(0, center[1] - half_length) + target_width = min(center[0] + half_length, image_size) + target_height = min(center[1] + half_length, image_size) + + for i in range(offset_height, target_height): + for j in range(offset_width, target_width): + img[i][j][:] = 0.0 + + img = np.transpose(img, (2, 0, 1)) + return img + + +def reader_creator_filepath(filename, sub_name, is_training, args): + files = os.listdir(filename) + names = [each_item for each_item in files if sub_name in each_item] + names.sort() + datasets = [] + for name in names: + print("Reading file " + name) + batch = cPickle.load(open(filename + name, 'rb')) + data = batch['data'] + labels = batch.get('labels', batch.get('fine_labels', None)) + assert labels is not None + dataset = zip(data, labels) + datasets.extend(dataset) + random.shuffle(datasets) + + def read_batch(datasets, args): + for sample, label in datasets: + im = preprocess(sample, is_training, args) + yield im, [int(label)] + + def reader(): + batch_data = [] + batch_label = [] + for data, label in read_batch(datasets, args): + batch_data.append(data) + batch_label.append(label) + if len(batch_data) == args.batch_size: + batch_data = np.array(batch_data, dtype='float32') + batch_label = np.array(batch_label, dtype='int64') + if is_training: + flatten_label, flatten_non_label = \ + generate_reshape_label(batch_label, args.batch_size) + rad_var = generate_bernoulli_number(args.batch_size) + mixed_x, y_a, y_b, lam = utils.mixup_data( + batch_data, batch_label, args.batch_size, + args.mix_alpha) + batch_out = [[mixed_x, y_a, y_b, lam, flatten_label, \ + flatten_non_label, rad_var]] + yield batch_out + else: + batch_out = [[batch_data, batch_label]] + yield batch_out + batch_data = [] + batch_label = [] + + return reader + + +def train10(args): + """ + CIFAR-10 training set creator. + It returns a reader creator, each sample in the reader is image pixels in + [0, 1] and label in [0, 9]. + :return: Training reader creator + :rtype: callable + """ + + return reader_creator_filepath(args.data, 'data_batch', True, args) + + +def test10(args): + """ + CIFAR-10 test set creator. + It returns a reader creator, each sample in the reader is image pixels in + [0, 1] and label in [0, 9]. + :return: Test reader creator. + :rtype: callable + """ + return reader_creator_filepath(args.data, 'test_batch', False, args) diff --git a/fluid/AutoDL/LRC/run.sh b/fluid/AutoDL/LRC/run.sh new file mode 100644 index 0000000000000000000000000000000000000000..9f1a045d49789c3e9aebbc2a73b84b11da471b5a --- /dev/null +++ b/fluid/AutoDL/LRC/run.sh @@ -0,0 +1,8 @@ +CUDA_VISIBLE_DEVICES=0 python -u train_mixup.py \ +--batch_size=80 \ +--auxiliary \ +--weight_decay=0.0003 \ +--learning_rate=0.025 \ +--lrc_loss_lambda=0.7 \ +--cutout + diff --git a/fluid/AutoDL/LRC/train_mixup.py b/fluid/AutoDL/LRC/train_mixup.py new file mode 100644 index 0000000000000000000000000000000000000000..de752c84bcf9276aa83540d60370517e66c0704f --- /dev/null +++ b/fluid/AutoDL/LRC/train_mixup.py @@ -0,0 +1,247 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. 
+#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# +# Based on: +# -------------------------------------------------------- +# DARTS +# Copyright (c) 2018, Hanxiao Liu. +# Licensed under the Apache License, Version 2.0; +# -------------------------------------------------------- + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from learning_rate import cosine_decay +import numpy as np +import argparse +from model import NetworkCIFAR as Network +import reader +import sys +import os +import time +import logging +import genotypes +import paddle.fluid as fluid +import shutil +import utils +import cPickle as cp + +parser = argparse.ArgumentParser("cifar") +parser.add_argument( + '--data', + type=str, + default='./dataset/cifar/cifar-10-batches-py/', + help='location of the data corpus') +parser.add_argument('--batch_size', type=int, default=96, help='batch size') +parser.add_argument( + '--learning_rate', type=float, default=0.025, help='init learning rate') +parser.add_argument('--momentum', type=float, default=0.9, help='momentum') +parser.add_argument( + '--weight_decay', type=float, default=3e-4, help='weight decay') +parser.add_argument( + '--report_freq', type=float, default=50, help='report frequency') +parser.add_argument( + '--epochs', type=int, default=600, help='num of training epochs') +parser.add_argument( + '--init_channels', type=int, default=36, help='num of init channels') +parser.add_argument( + '--layers', type=int, default=20, help='total number of layers') +parser.add_argument( + '--model_path', + type=str, + default='saved_models', + help='path to save the model') +parser.add_argument( + '--auxiliary', + action='store_true', + default=False, + help='use auxiliary tower') +parser.add_argument( + '--auxiliary_weight', + type=float, + default=0.4, + help='weight for auxiliary loss') +parser.add_argument( + '--cutout', action='store_true', default=False, help='use cutout') +parser.add_argument( + '--cutout_length', type=int, default=16, help='cutout length') +parser.add_argument( + '--drop_path_prob', type=float, default=0.2, help='drop path probability') +parser.add_argument('--save', type=str, default='EXP', help='experiment name') +parser.add_argument( + '--arch', type=str, default='DARTS', help='which architecture to use') +parser.add_argument( + '--grad_clip', type=float, default=5, help='gradient clipping') +parser.add_argument( + '--lr_exp_decay', + action='store_true', + default=False, + help='use exponential_decay learning_rate') +parser.add_argument('--mix_alpha', type=float, default=0.5, help='mixup alpha') +parser.add_argument( + '--lrc_loss_lambda', default=0, type=float, help='lrc_loss_lambda') +parser.add_argument( + '--loss_type', + default=1, + type=float, + help='loss_type 0: cross entropy 1: multi margin loss 2: max margin loss') + +args = parser.parse_args() + +CIFAR_CLASSES = 10 +dataset_train_size = 50000 +image_size = 32 + + +def main(): + image_shape = [3, image_size, image_size] + devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" + devices_num = len(devices.split(",")) + logging.info("args = %s", args) + genotype = 
eval("genotypes.%s" % args.arch) + model = Network(args.init_channels, CIFAR_CLASSES, args.layers, + args.auxiliary, genotype) + steps_one_epoch = dataset_train_size / (devices_num * args.batch_size) + train(model, args, image_shape, steps_one_epoch) + + +def build_program(main_prog, startup_prog, args, is_train, model, im_shape, + steps_one_epoch): + out = [] + with fluid.program_guard(main_prog, startup_prog): + py_reader = model.build_input(im_shape, args.batch_size, is_train) + if is_train: + with fluid.unique_name.guard(): + loss = model.train_model(py_reader, args.init_channels, + args.auxiliary, args.auxiliary_weight, + args.batch_size, args.lrc_loss_lambda) + optimizer = fluid.optimizer.Momentum( + learning_rate=cosine_decay(args.learning_rate, \ + args.epochs, steps_one_epoch), + regularization=fluid.regularizer.L2Decay(\ + args.weight_decay), + momentum=args.momentum) + optimizer.minimize(loss) + out = [py_reader, loss] + else: + with fluid.unique_name.guard(): + loss, acc_1, acc_5 = model.test_model(py_reader, + args.init_channels) + out = [py_reader, loss, acc_1, acc_5] + return out + + +def train(model, args, im_shape, steps_one_epoch): + train_startup_prog = fluid.Program() + test_startup_prog = fluid.Program() + train_prog = fluid.Program() + test_prog = fluid.Program() + + train_py_reader, loss_train = build_program(train_prog, train_startup_prog, + args, True, model, im_shape, + steps_one_epoch) + + test_py_reader, loss_test, acc_1, acc_5 = build_program( + test_prog, test_startup_prog, args, False, model, im_shape, + steps_one_epoch) + + test_prog = test_prog.clone(for_test=True) + + place = fluid.CUDAPlace(0) + exe = fluid.Executor(place) + exe.run(train_startup_prog) + exe.run(test_startup_prog) + + exec_strategy = fluid.ExecutionStrategy() + exec_strategy.num_threads = 1 + train_exe = fluid.ParallelExecutor( + main_program=train_prog, + use_cuda=True, + loss_name=loss_train.name, + exec_strategy=exec_strategy) + train_reader = reader.train10(args) + test_reader = reader.test10(args) + train_py_reader.decorate_paddle_reader(train_reader) + test_py_reader.decorate_paddle_reader(test_reader) + + fluid.clip.set_gradient_clip(fluid.clip.GradientClipByNorm(args.grad_clip)) + fluid.memory_optimize(fluid.default_main_program()) + + def save_model(postfix, main_prog): + model_path = os.path.join(args.model_path, postfix) + if os.path.isdir(model_path): + shutil.rmtree(model_path) + fluid.io.save_persistables(exe, model_path, main_program=main_prog) + + def test(epoch_id): + test_fetch_list = [loss_test, acc_1, acc_5] + objs = utils.AvgrageMeter() + top1 = utils.AvgrageMeter() + top5 = utils.AvgrageMeter() + test_py_reader.start() + test_start_time = time.time() + step_id = 0 + try: + while True: + prev_test_start_time = test_start_time + test_start_time = time.time() + loss_test_v, acc_1_v, acc_5_v = exe.run( + test_prog, fetch_list=test_fetch_list) + objs.update(np.array(loss_test_v), args.batch_size) + top1.update(np.array(acc_1_v), args.batch_size) + top5.update(np.array(acc_5_v), args.batch_size) + if step_id % args.report_freq == 0: + print("Epoch {}, Step {}, acc_1 {}, acc_5 {}, time {}". 
+ format(epoch_id, step_id, + np.array(acc_1_v), + np.array(acc_5_v), test_start_time - + prev_test_start_time)) + step_id += 1 + except fluid.core.EOFException: + test_py_reader.reset() + print("Epoch {0}, top1 {1}, top5 {2}".format(epoch_id, top1.avg, + top5.avg)) + + train_fetch_list = [loss_train] + epoch_start_time = time.time() + for epoch_id in range(args.epochs): + model.drop_path_prob = args.drop_path_prob * epoch_id / args.epochs + train_py_reader.start() + epoch_end_time = time.time() + if epoch_id > 0: + print("Epoch {}, total time {}".format(epoch_id - 1, epoch_end_time + - epoch_start_time)) + epoch_start_time = epoch_end_time + epoch_end_time + start_time = time.time() + step_id = 0 + try: + while True: + prev_start_time = start_time + start_time = time.time() + loss_v, = train_exe.run( + fetch_list=[v.name for v in train_fetch_list]) + print("Epoch {}, Step {}, loss {}, time {}".format(epoch_id, step_id, \ + np.array(loss_v).mean(), start_time-prev_start_time)) + step_id += 1 + sys.stdout.flush() + except fluid.core.EOFException: + train_py_reader.reset() + if epoch_id % 50 == 0 or epoch_id == args.epochs - 1: + save_model(str(epoch_id), train_prog) + test(epoch_id) + + +if __name__ == '__main__': + main() diff --git a/fluid/AutoDL/LRC/utils.py b/fluid/AutoDL/LRC/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..4002b57c6e91f9a4f7992156c4fa07f9e55d628c --- /dev/null +++ b/fluid/AutoDL/LRC/utils.py @@ -0,0 +1,55 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Based on: +# -------------------------------------------------------- +# DARTS +# Copyright (c) 2018, Hanxiao Liu. +# Licensed under the Apache License, Version 2.0; +# -------------------------------------------------------- + +import os +import sys +import time +import math +import numpy as np + + +def mixup_data(x, y, batch_size, alpha=1.0): + '''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda''' + if alpha > 0.: + lam = np.random.beta(alpha, alpha) + else: + lam = 1. 
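+ # mixup (Zhang et al., 2018, "mixup: Beyond Empirical Risk Minimization"):
+ # blend each sample with a randomly permuted partner,
+ #   x_mix = lam * x_i + (1 - lam) * x_j,
+ # and return both labels so the loss can be interpolated with the same lam.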
+ index = np.random.permutation(batch_size) + + mixed_x = lam * x + (1 - lam) * x[index, :] + y_a, y_b = y, y[index] + return mixed_x.astype('float32'), y_a.astype('int64'),\ + y_b.astype('int64'), np.array(lam, dtype='float32') + + +class AvgrageMeter(object): + def __init__(self): + self.reset() + + def reset(self): + self.avg = 0 + self.sum = 0 + self.cnt = 0 + + def update(self, val, n=1): + self.sum += val * n + self.cnt += n + self.avg = self.sum / self.cnt diff --git a/fluid/DeepQNetwork/DQN_agent.py b/fluid/DeepQNetwork/DQN_agent.py index 67eb3ce6a29bb723b481d6b1c2f517f037d52942..5b474325f656533b91965fd59d70c2d421e16fc3 100644 --- a/fluid/DeepQNetwork/DQN_agent.py +++ b/fluid/DeepQNetwork/DQN_agent.py @@ -1,11 +1,10 @@ #-*- coding: utf-8 -*- +import math +import numpy as np import paddle.fluid as fluid from paddle.fluid.param_attr import ParamAttr -import numpy as np -import math from tqdm import tqdm -from utils import fluid_flatten class DQNModel(object): @@ -39,34 +38,51 @@ class DQNModel(object): name='isOver', shape=[], dtype='bool') def _build_net(self): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - self.predict_program = fluid.default_main_program().clone() + self.predict_program = fluid.Program() + self.train_program = fluid.Program() + self._sync_program = fluid.Program() - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) + with fluid.program_guard(self.predict_program): + state, action, reward, next_s, isOver = self._get_inputs() + self.pred_value = self.get_DQN_prediction(state) - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') + with fluid.program_guard(self.train_program): + state, action, reward, next_s, isOver = self._get_inputs() + pred_value = self.get_DQN_prediction(state) - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1) + reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) - best_v.stop_gradient = True + action_onehot = fluid.layers.one_hot(action, self.action_dim) + action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) + pred_action_value = fluid.layers.reduce_sum( + fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - self._sync_program = self._build_sync_target_network() + targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) + best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) + best_v.stop_gradient = True - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) + target = reward + (1.0 - fluid.layers.cast( + isOver, dtype='float32')) * self.gamma * best_v + cost = fluid.layers.square_error_cost(pred_action_value, target) + cost = fluid.layers.reduce_mean(cost) - # define program - self.train_program = fluid.default_main_program() + optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) + optimizer.minimize(cost) + + vars = list(self.train_program.list_vars()) + policy_vars = list(filter( + lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) + target_vars = list(filter( + lambda x: 'GRAD' not in x.name and 
'target' in x.name, vars)) + policy_vars.sort(key=lambda x: x.name) + target_vars.sort(key=lambda x: x.name) + + with fluid.program_guard(self._sync_program): + sync_ops = [] + for i, var in enumerate(policy_vars): + sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) + sync_ops.append(sync_op) # fluid exe place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() @@ -81,50 +97,50 @@ class DQNModel(object): conv1 = fluid.layers.conv2d( input=image, num_filters=32, - filter_size=[5, 5], - stride=[1, 1], - padding=[2, 2], + filter_size=5, + stride=1, + padding=2, act='relu', param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv1, pool_size=2, pool_stride=2, pool_type='max') conv2 = fluid.layers.conv2d( input=max_pool1, num_filters=32, - filter_size=[5, 5], - stride=[1, 1], - padding=[2, 2], + filter_size=5, + stride=1, + padding=2, act='relu', param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv2, pool_size=2, pool_stride=2, pool_type='max') conv3 = fluid.layers.conv2d( input=max_pool2, num_filters=64, - filter_size=[4, 4], - stride=[1, 1], - padding=[1, 1], + filter_size=4, + stride=1, + padding=1, act='relu', param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv3, pool_size=2, pool_stride=2, pool_type='max') conv4 = fluid.layers.conv2d( input=max_pool3, num_filters=64, - filter_size=[3, 3], - stride=[1, 1], - padding=[1, 1], + filter_size=3, + stride=1, + padding=1, act='relu', param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - flatten = fluid_flatten(conv4) + flatten = fluid.layers.flatten(conv4, axis=1) out = fluid.layers.fc( input=flatten, @@ -133,23 +149,6 @@ class DQNModel(object): bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) return out - def _build_sync_target_network(self): - vars = list(fluid.default_main_program().list_vars()) - policy_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - sync_program = fluid.default_main_program().clone() - with fluid.program_guard(sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - sync_program = sync_program.prune(sync_ops) - return sync_program def act(self, state, train_or_test): sample = np.random.random() diff --git a/fluid/DeepQNetwork/DoubleDQN_agent.py b/fluid/DeepQNetwork/DoubleDQN_agent.py index 09b4b2119bab3fbdfa9bb9cfb8fae40fa34f87e1..c95ae5632fd2e904a625f680f4a9147d5615b765 100644 --- a/fluid/DeepQNetwork/DoubleDQN_agent.py +++ b/fluid/DeepQNetwork/DoubleDQN_agent.py @@ -1,11 +1,10 @@ #-*- coding: utf-8 -*- +import math +import numpy as np import paddle.fluid as fluid from paddle.fluid.param_attr import ParamAttr -import numpy as np from tqdm import tqdm 
-import math -from utils import fluid_argmax, fluid_flatten class DoubleDQNModel(object): @@ -39,41 +38,59 @@ class DoubleDQNModel(object): name='isOver', shape=[], dtype='bool') def _build_net(self): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - self.predict_program = fluid.default_main_program().clone() + self.predict_program = fluid.Program() + self.train_program = fluid.Program() + self._sync_program = fluid.Program() - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) + with fluid.program_guard(self.predict_program): + state, action, reward, next_s, isOver = self._get_inputs() + self.pred_value = self.get_DQN_prediction(state) - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') + with fluid.program_guard(self.train_program): + state, action, reward, next_s, isOver = self._get_inputs() + pred_value = self.get_DQN_prediction(state) - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1) + reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) + action_onehot = fluid.layers.one_hot(action, self.action_dim) + action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - next_s_predcit_value = self.get_DQN_prediction(next_s) - greedy_action = fluid_argmax(next_s_predcit_value) + pred_action_value = fluid.layers.reduce_sum( + fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim) - best_v = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value), - dim=1) - best_v.stop_gradient = True + targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) + next_s_predcit_value = self.get_DQN_prediction(next_s) + greedy_action = fluid.layers.argmax(next_s_predcit_value, axis=1) + greedy_action = fluid.layers.unsqueeze(greedy_action, axes=[1]) - self._sync_program = self._build_sync_target_network() + predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim) + best_v = fluid.layers.reduce_sum( + fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value), + dim=1) + best_v.stop_gradient = True - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) + target = reward + (1.0 - fluid.layers.cast( + isOver, dtype='float32')) * self.gamma * best_v + cost = fluid.layers.square_error_cost(pred_action_value, target) + cost = fluid.layers.reduce_mean(cost) - # define program - self.train_program = fluid.default_main_program() + optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) + optimizer.minimize(cost) + + vars = list(self.train_program.list_vars()) + policy_vars = list(filter( + lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) + target_vars = list(filter( + lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) + policy_vars.sort(key=lambda x: x.name) + target_vars.sort(key=lambda x: x.name) + + with fluid.program_guard(self._sync_program): + sync_ops = [] + for i, var in enumerate(policy_vars): + sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) + sync_ops.append(sync_op) # fluid exe place = 
fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() @@ -88,50 +105,50 @@ class DoubleDQNModel(object): conv1 = fluid.layers.conv2d( input=image, num_filters=32, - filter_size=[5, 5], - stride=[1, 1], - padding=[2, 2], + filter_size=5, + stride=1, + padding=2, act='relu', param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv1, pool_size=2, pool_stride=2, pool_type='max') conv2 = fluid.layers.conv2d( input=max_pool1, num_filters=32, - filter_size=[5, 5], - stride=[1, 1], - padding=[2, 2], + filter_size=5, + stride=1, + padding=2, act='relu', param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv2, pool_size=2, pool_stride=2, pool_type='max') conv3 = fluid.layers.conv2d( input=max_pool2, num_filters=64, - filter_size=[4, 4], - stride=[1, 1], - padding=[1, 1], + filter_size=4, + stride=1, + padding=1, act='relu', param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv3, pool_size=2, pool_stride=2, pool_type='max') conv4 = fluid.layers.conv2d( input=max_pool3, num_filters=64, - filter_size=[3, 3], - stride=[1, 1], - padding=[1, 1], + filter_size=3, + stride=1, + padding=1, act='relu', param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - flatten = fluid_flatten(conv4) + flatten = fluid.layers.flatten(conv4, axis=1) out = fluid.layers.fc( input=flatten, @@ -140,23 +157,6 @@ class DoubleDQNModel(object): bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) return out - def _build_sync_target_network(self): - vars = list(fluid.default_main_program().list_vars()) - policy_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - sync_program = fluid.default_main_program().clone() - with fluid.program_guard(sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - sync_program = sync_program.prune(sync_ops) - return sync_program def act(self, state, train_or_test): sample = np.random.random() diff --git a/fluid/DeepQNetwork/DuelingDQN_agent.py b/fluid/DeepQNetwork/DuelingDQN_agent.py index 271a767b7b5841cf1abe213fc477859e3cf5dd05..cf2ff71bb811e5dce62be78beab1f0afb05d31f9 100644 --- a/fluid/DeepQNetwork/DuelingDQN_agent.py +++ b/fluid/DeepQNetwork/DuelingDQN_agent.py @@ -1,11 +1,10 @@ #-*- coding: utf-8 -*- +import math +import numpy as np import paddle.fluid as fluid from paddle.fluid.param_attr import ParamAttr -import numpy as np from tqdm import tqdm -import math -from utils import fluid_flatten class DuelingDQNModel(object): @@ -39,34 +38,51 @@ class DuelingDQNModel(object): name='isOver', shape=[], dtype='bool') def _build_net(self): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - 
self.predict_program = fluid.default_main_program().clone() + self.predict_program = fluid.Program() + self.train_program = fluid.Program() + self._sync_program = fluid.Program() - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) + with fluid.program_guard(self.predict_program): + state, action, reward, next_s, isOver = self._get_inputs() + self.pred_value = self.get_DQN_prediction(state) - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') + with fluid.program_guard(self.train_program): + state, action, reward, next_s, isOver = self._get_inputs() + pred_value = self.get_DQN_prediction(state) - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1) + reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) - best_v.stop_gradient = True + action_onehot = fluid.layers.one_hot(action, self.action_dim) + action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) + pred_action_value = fluid.layers.reduce_sum( + fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - self._sync_program = self._build_sync_target_network() + targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) + best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) + best_v.stop_gradient = True - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) + target = reward + (1.0 - fluid.layers.cast( + isOver, dtype='float32')) * self.gamma * best_v + cost = fluid.layers.square_error_cost(pred_action_value, target) + cost = fluid.layers.reduce_mean(cost) - # define program - self.train_program = fluid.default_main_program() + optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) + optimizer.minimize(cost) + + vars = list(self.train_program.list_vars()) + policy_vars = list(filter( + lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) + target_vars = list(filter( + lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) + policy_vars.sort(key=lambda x: x.name) + target_vars.sort(key=lambda x: x.name) + + with fluid.program_guard(self._sync_program): + sync_ops = [] + for i, var in enumerate(policy_vars): + sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) + sync_ops.append(sync_op) # fluid exe place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() @@ -81,50 +97,50 @@ class DuelingDQNModel(object): conv1 = fluid.layers.conv2d( input=image, num_filters=32, - filter_size=[5, 5], - stride=[1, 1], - padding=[2, 2], + filter_size=5, + stride=1, + padding=2, act='relu', param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv1, pool_size=2, pool_stride=2, pool_type='max') conv2 = fluid.layers.conv2d( input=max_pool1, num_filters=32, - filter_size=[5, 5], - stride=[1, 1], - padding=[2, 2], + filter_size=5, + stride=1, + padding=2, act='relu', param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) max_pool2 = 
fluid.layers.pool2d( - input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv2, pool_size=2, pool_stride=2, pool_type='max') conv3 = fluid.layers.conv2d( input=max_pool2, num_filters=64, - filter_size=[4, 4], - stride=[1, 1], - padding=[1, 1], + filter_size=4, + stride=1, + padding=1, act='relu', param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max') + input=conv3, pool_size=2, pool_stride=2, pool_type='max') conv4 = fluid.layers.conv2d( input=max_pool3, num_filters=64, - filter_size=[3, 3], - stride=[1, 1], - padding=[1, 1], + filter_size=3, + stride=1, + padding=1, act='relu', param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - flatten = fluid_flatten(conv4) + flatten = fluid.layers.flatten(conv4, axis=1) value = fluid.layers.fc( input=flatten, @@ -143,24 +159,6 @@ class DuelingDQNModel(object): advantage, dim=1, keep_dim=True)) return Q - def _build_sync_target_network(self): - vars = list(fluid.default_main_program().list_vars()) - policy_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - sync_program = fluid.default_main_program().clone() - with fluid.program_guard(sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - # The prune API is deprecated, please don't use it any more. 
- sync_program = sync_program._prune(sync_ops) - return sync_program def act(self, state, train_or_test): sample = np.random.random() @@ -186,12 +184,14 @@ class DuelingDQNModel(object): self.global_step += 1 action = np.expand_dims(action, -1) - self.exe.run(self.train_program, \ - feed={'state': state.astype('float32'), \ - 'action': action.astype('int32'), \ - 'reward': reward, \ - 'next_s': next_state.astype('float32'), \ - 'isOver': isOver}) + self.exe.run(self.train_program, + feed={ + 'state': state.astype('float32'), + 'action': action.astype('int32'), + 'reward': reward, + 'next_s': next_state.astype('float32'), + 'isOver': isOver + }) def sync_target_network(self): self.exe.run(self._sync_program) diff --git a/fluid/DeepQNetwork/README.md b/fluid/DeepQNetwork/README.md index e72920bcad29ce7ffd78bfb90a1406654298248d..1edeaaa884318ec3a530ec4fdb7d031d07411b56 100644 --- a/fluid/DeepQNetwork/README.md +++ b/fluid/DeepQNetwork/README.md @@ -29,7 +29,7 @@ The average game rewards that can be obtained for the three models as the number + gym + tqdm + opencv-python -+ paddlepaddle-gpu>=0.12.0 ++ paddlepaddle-gpu>=1.0.0 + ale_python_interface ### Install Dependencies: diff --git a/fluid/DeepQNetwork/README_cn.md b/fluid/DeepQNetwork/README_cn.md index 68a65bffe8fab79ce563fefc894dd035c1572065..640d775ad8fed2be360d308b6c5df41c86d77c04 100644 --- a/fluid/DeepQNetwork/README_cn.md +++ b/fluid/DeepQNetwork/README_cn.md @@ -28,7 +28,7 @@ + gym + tqdm + opencv-python -+ paddlepaddle-gpu>=0.12.0 ++ paddlepaddle-gpu>=1.0.0 + ale_python_interface ### 下载依赖: diff --git a/fluid/DeepQNetwork/utils.py b/fluid/DeepQNetwork/utils.py deleted file mode 100644 index 26ed7fbdb54494c3cf9a983f8ecafdfbcd4d2719..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/utils.py +++ /dev/null @@ -1,20 +0,0 @@ -#-*- coding: utf-8 -*- -#File: utils.py - -import paddle.fluid as fluid -import numpy as np - - -def fluid_argmax(x): - """ - Get index of max value for the last dimension - """ - _, max_index = fluid.layers.topk(x, k=1) - return max_index - - -def fluid_flatten(x): - """ - Flatten fluid variable along the first dimension - """ - return fluid.layers.reshape(x, shape=[-1, np.prod(x.shape[1:])]) diff --git a/fluid/PaddleCV/HiNAS_models/nn_paddle.py b/fluid/PaddleCV/HiNAS_models/nn_paddle.py index d56bca5f156f47dccad07d32e7ad9d383d3dd459..d3a3ddd60cf3e5e114de322f3eea763e5a2e6018 100755 --- a/fluid/PaddleCV/HiNAS_models/nn_paddle.py +++ b/fluid/PaddleCV/HiNAS_models/nn_paddle.py @@ -21,6 +21,7 @@ import math import numpy as np import paddle import paddle.fluid as fluid +from paddle.fluid.contrib.trainer import * from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter import reader @@ -104,7 +105,7 @@ class Model(object): accs = [] def event_handler(event): - if isinstance(event, fluid.EndStepEvent): + if isinstance(event, EndStepEvent): costs.append(event.metrics[0]) accs.append(event.metrics[1]) if event.step % 20 == 0: @@ -113,7 +114,7 @@ class Model(object): del costs[:] del accs[:] - if isinstance(event, fluid.EndEpochEvent): + if isinstance(event, EndEpochEvent): if event.epoch % 3 == 0 or event.epoch == FLAGS.num_epochs - 1: avg_cost, accuracy = trainer.test( reader=test_reader, feed_order=['pixel', 'label']) @@ -126,7 +127,7 @@ class Model(object): event_handler.best_acc = 0.0 place = fluid.CUDAPlace(0) - trainer = fluid.Trainer( + trainer = Trainer( train_func=self.train_network, optimizer_func=self.optimizer_program, place=place) diff --git 
a/fluid/PaddleCV/face_detection/data_util.py b/fluid/PaddleCV/face_detection/data_util.py deleted file mode 100644 index a8f6aac6ba8a418f5d4645d167122a3bc4cb125b..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/face_detection/data_util.py +++ /dev/null @@ -1,157 +0,0 @@ -""" -This code is based on https://github.com/fchollet/keras/blob/master/keras/utils/data_utils.py -""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import time -import numpy as np -import threading -import multiprocessing -import traceback -try: - import queue -except ImportError: - import Queue as queue - - -class GeneratorEnqueuer(object): - """ - Builds a queue out of a data generator. - - Args: - generator: a generator function which endlessly yields data - use_multiprocessing (bool): use multiprocessing if True, - otherwise use threading. - wait_time (float): time to sleep in-between calls to `put()`. - random_seed (int): Initial seed for workers, - will be incremented by one for each workers. - """ - - def __init__(self, - generator, - use_multiprocessing=False, - wait_time=0.05, - random_seed=None): - self.wait_time = wait_time - self._generator = generator - self._use_multiprocessing = use_multiprocessing - self._threads = [] - self._stop_event = None - self.queue = None - self._manager = None - self.seed = random_seed - - def start(self, workers=1, max_queue_size=10): - """ - Start worker threads which add data from the generator into the queue. - - Args: - workers (int): number of worker threads - max_queue_size (int): queue size - (when full, threads could block on `put()`) - """ - - def data_generator_task(): - """ - Data generator task. - """ - - def task(): - if (self.queue is not None and - self.queue.qsize() < max_queue_size): - generator_output = next(self._generator) - self.queue.put((generator_output)) - else: - time.sleep(self.wait_time) - - if not self._use_multiprocessing: - while not self._stop_event.is_set(): - with self.genlock: - try: - task() - except Exception: - traceback.print_exc() - self._stop_event.set() - break - else: - while not self._stop_event.is_set(): - try: - task() - except Exception: - traceback.print_exc() - self._stop_event.set() - break - - try: - if self._use_multiprocessing: - self._manager = multiprocessing.Manager() - self.queue = self._manager.Queue(maxsize=max_queue_size) - self._stop_event = multiprocessing.Event() - else: - self.genlock = threading.Lock() - self.queue = queue.Queue() - self._stop_event = threading.Event() - for _ in range(workers): - if self._use_multiprocessing: - # Reset random seed else all children processes - # share the same seed - np.random.seed(self.seed) - thread = multiprocessing.Process(target=data_generator_task) - thread.daemon = True - if self.seed is not None: - self.seed += 1 - else: - thread = threading.Thread(target=data_generator_task) - self._threads.append(thread) - thread.start() - except: - self.stop() - raise - - def is_running(self): - """ - Returns: - bool: Whether the worker theads are running. - """ - return self._stop_event is not None and not self._stop_event.is_set() - - def stop(self, timeout=None): - """ - Stops running threads and wait for them to exit, if necessary. - Should be called by the same thread which called `start()`. - - Args: - timeout(int|None): maximum time to wait on `thread.join()`. 
- """ - if self.is_running(): - self._stop_event.set() - for thread in self._threads: - if self._use_multiprocessing: - if thread.is_alive(): - thread.terminate() - else: - thread.join(timeout) - if self._manager: - self._manager.shutdown() - - self._threads = [] - self._stop_event = None - self.queue = None - - def get(self): - """ - Creates a generator to extract data from the queue. - Skip the data if it is `None`. - - # Yields - tuple of data in the queue. - """ - while self.is_running(): - if not self.queue.empty(): - inputs = self.queue.get() - if inputs is not None: - yield inputs - else: - time.sleep(self.wait_time) diff --git a/fluid/PaddleCV/face_detection/reader.py b/fluid/PaddleCV/face_detection/reader.py index 2b38952d2d419ec5b658c762d2668f724dc92a09..4839ba5c5389a696fe0cb5f4fcd24daff42f217f 100644 --- a/fluid/PaddleCV/face_detection/reader.py +++ b/fluid/PaddleCV/face_detection/reader.py @@ -16,8 +16,6 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function -import image_util -from paddle.utils.image_util import * from PIL import Image from PIL import ImageDraw import numpy as np @@ -28,7 +26,10 @@ import copy import random import cv2 import six -from data_util import GeneratorEnqueuer +import math +from itertools import islice +import paddle +import image_util class Settings(object): @@ -199,7 +200,7 @@ def load_file_list(input_txt): else: file_dict[num_class].append(line_txt) - return file_dict + return list(file_dict.values()) def expand_bboxes(bboxes, @@ -227,13 +228,12 @@ def expand_bboxes(bboxes, def train_generator(settings, file_list, batch_size, shuffle=True): - file_dict = load_file_list(file_list) - while True: + def reader(): if shuffle: - np.random.shuffle(file_dict) + np.random.shuffle(file_list) batch_out = [] - for index_image in file_dict.keys(): - image_name = file_dict[index_image][0] + for item in file_list: + image_name = item[0] image_path = os.path.join(settings.data_dir, image_name) im = Image.open(image_path) if im.mode == 'L': @@ -242,10 +242,10 @@ def train_generator(settings, file_list, batch_size, shuffle=True): # layout: label | xmin | ymin | xmax | ymax bbox_labels = [] - for index_box in range(len(file_dict[index_image])): + for index_box in range(len(item)): if index_box >= 2: bbox_sample = [] - temp_info_box = file_dict[index_image][index_box].split(' ') + temp_info_box = item[index_box].split(' ') xmin = float(temp_info_box[0]) ymin = float(temp_info_box[1]) w = float(temp_info_box[2]) @@ -277,43 +277,25 @@ def train_generator(settings, file_list, batch_size, shuffle=True): yield batch_out batch_out = [] + return reader -def train(settings, - file_list, - batch_size, - shuffle=True, - use_multiprocessing=True, - num_workers=8, - max_queue=24): - def reader(): - try: - enqueuer = GeneratorEnqueuer( - train_generator(settings, file_list, batch_size, shuffle), - use_multiprocessing=use_multiprocessing) - enqueuer.start(max_queue_size=max_queue, workers=num_workers) - generator_output = None - while True: - while enqueuer.is_running(): - if not enqueuer.queue.empty(): - generator_output = enqueuer.queue.get() - break - else: - time.sleep(0.01) - yield generator_output - generator_output = None - finally: - if enqueuer is not None: - enqueuer.stop() - return reader +def train(settings, file_list, batch_size, shuffle=True, num_workers=8): + file_lists = load_file_list(file_list) + n = int(math.ceil(len(file_lists) // num_workers)) + split_lists = [file_lists[i:i + n] for i in range(0, 
len(file_lists), n)] + readers = [] + for iterm in split_lists: + readers.append(train_generator(settings, iterm, batch_size, shuffle)) + return paddle.reader.multiprocess_reader(readers, False) def test(settings, file_list): - file_dict = load_file_list(file_list) + file_lists = load_file_list(file_list) def reader(): - for index_image in file_dict.keys(): - image_name = file_dict[index_image][0] + for image in file_lists: + image_name = image[0] image_path = os.path.join(settings.data_dir, image_name) im = Image.open(image_path) if im.mode == 'L': diff --git a/fluid/PaddleCV/face_detection/train.py b/fluid/PaddleCV/face_detection/train.py index 71caab9702762cc7f823e6be3f22c9ed278ca364..2108bcc32a378bbb0803032108ddafea4161e202 100644 --- a/fluid/PaddleCV/face_detection/train.py +++ b/fluid/PaddleCV/face_detection/train.py @@ -163,9 +163,7 @@ def train(args, config, train_params, train_file_list): train_file_list, batch_size_per_device, shuffle = is_shuffle, - use_multiprocessing=True, - num_workers = num_workers, - max_queue=24) + num_workers = num_workers) train_py_reader.decorate_paddle_reader(train_reader) if args.parallel: @@ -182,61 +180,59 @@ def train(args, config, train_params, train_file_list): print('save models to %s' % (model_path)) fluid.io.save_persistables(exe, model_path, main_program=program) - train_py_reader.start() - try: - total_time = 0.0 - epoch_idx = 0 - face_loss = 0 - head_loss = 0 - for pass_id in range(start_epoc, epoc_num): - epoch_idx += 1 - start_time = time.time() - prev_start_time = start_time - end_time = 0 - batch_id = 0 - for batch_id in range(iters_per_epoc): + total_time = 0.0 + epoch_idx = 0 + face_loss = 0 + head_loss = 0 + for pass_id in range(start_epoc, epoc_num): + epoch_idx += 1 + start_time = time.time() + prev_start_time = start_time + end_time = 0 + batch_id = 0 + train_py_reader.start() + while True: + try: prev_start_time = start_time start_time = time.time() if args.parallel: fetch_vars = train_exe.run(fetch_list= [v.name for v in fetches]) else: - fetch_vars = exe.run(train_prog, - fetch_list=fetches) + fetch_vars = exe.run(train_prog, fetch_list=fetches) end_time = time.time() fetch_vars = [np.mean(np.array(v)) for v in fetch_vars] + face_loss = fetch_vars[0] + head_loss = fetch_vars[1] if batch_id % 10 == 0: if not args.use_pyramidbox: print("Pass {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format( - pass_id, batch_id, fetch_vars[0], + pass_id, batch_id, face_loss, start_time - prev_start_time)) else: print("Pass {:d}, batch {:d}, face loss {:.6f}, " \ "head loss {:.6f}, " \ "time {:.5f}".format(pass_id, - batch_id, fetch_vars[0], fetch_vars[1], + batch_id, face_loss, head_loss, start_time - prev_start_time)) - face_loss = fetch_vars[0] - head_loss = fetch_vars[1] - epoch_end_time = time.time() - total_time += epoch_end_time - start_time - if pass_id % 1 == 0 or pass_id == epoc_num - 1: - save_model(str(pass_id), train_prog) - # only for ce - if args.enable_ce: - gpu_num = get_cards(args) - print("kpis\teach_pass_duration_card%s\t%s" % - (gpu_num, total_time / epoch_idx)) - print("kpis\ttrain_face_loss_card%s\t%s" % - (gpu_num, face_loss)) - print("kpis\ttrain_head_loss_card%s\t%s" % - (gpu_num, head_loss)) - - except fluid.core.EOFException: - train_py_reader.reset() - except StopIteration: - train_py_reader.reset() - train_py_reader.reset() + batch_id += 1 + except (fluid.core.EOFException, StopIteration): + train_py_reader.reset() + break + epoch_end_time = time.time() + total_time += epoch_end_time - start_time + 
save_model(str(pass_id), train_prog) + + # only for ce + if args.enable_ce: + gpu_num = get_cards(args) + print("kpis\teach_pass_duration_card%s\t%s" % + (gpu_num, total_time / epoch_idx)) + print("kpis\ttrain_face_loss_card%s\t%s" % + (gpu_num, face_loss)) + print("kpis\ttrain_head_loss_card%s\t%s" % + (gpu_num, head_loss)) + def get_cards(args): diff --git a/fluid/PaddleCV/face_detection/widerface_eval.py b/fluid/PaddleCV/face_detection/widerface_eval.py index 1544442c78c38bcbcb537cd81374f5c72c7bfc5a..46eed9be5d064d50c824dd0769b07b9b425dfeb4 100644 --- a/fluid/PaddleCV/face_detection/widerface_eval.py +++ b/fluid/PaddleCV/face_detection/widerface_eval.py @@ -121,7 +121,7 @@ def detect_face(image, shrink): return_numpy=False) detection = np.array(detection) # layout: xmin, ymin, xmax. ymax, score - if detection.shape == (1, ): + if np.prod(detection.shape) == 1: print("No face detected") return np.array([[0, 0, 0, 0, 0]]) det_conf = detection[:, 1] diff --git a/fluid/PaddleCV/faster_rcnn/image/Faster_RCNN.jpg b/fluid/PaddleCV/faster_rcnn/image/Faster_RCNN.jpg deleted file mode 100644 index c2ab8085c914979eb23a59734d54797b6580e956..0000000000000000000000000000000000000000 Binary files a/fluid/PaddleCV/faster_rcnn/image/Faster_RCNN.jpg and /dev/null differ diff --git a/fluid/PaddleCV/icnet/README.md b/fluid/PaddleCV/icnet/README.md index dc350ff5e66993b33b976018df36369b773a90c3..84e067ab081f648a4107ece906bad9a52ae13bbc 100644 --- a/fluid/PaddleCV/icnet/README.md +++ b/fluid/PaddleCV/icnet/README.md @@ -103,7 +103,7 @@ python infer.py \ ## 其他信息 |数据集 | pretrained model | |---|---| -|CityScape | [Model]()[md: ] | +|CityScape | [pretrained_model](https://paddle-icnet-models.bj.bcebos.com/model_1000.tar.gz) | ## 参考 diff --git a/fluid/PaddleCV/image_classification/README.md b/fluid/PaddleCV/image_classification/README.md index 57dc26005334eff06528dcb22a99c17659a61d2c..02500e90e9ff519029952680cb6942ebe38b0a6a 100644 --- a/fluid/PaddleCV/image_classification/README.md +++ b/fluid/PaddleCV/image_classification/README.md @@ -209,6 +209,7 @@ Models are trained by starting with learning rate ```0.1``` and decaying it by ` |[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% | |[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% | |[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% | +|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% | |[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% | |[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% | |[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% | diff --git a/fluid/PaddleCV/image_classification/README_cn.md b/fluid/PaddleCV/image_classification/README_cn.md index 7fc35a643e95dae8c2197a96e1fab44b60e458a4..17e911596f2cb9830a51efaa839d4fa5fff38a74 100644 --- a/fluid/PaddleCV/image_classification/README_cn.md +++ b/fluid/PaddleCV/image_classification/README_cn.md @@ -204,6 +204,7 @@ Models包括两种模型:带有参数名字的模型,和不带有参数名 |[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% | 
|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% | |[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% | +|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% | |[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% | |[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% | |[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% | diff --git a/fluid/PaddleCV/image_classification/dist_train/dist_train.py b/fluid/PaddleCV/image_classification/dist_train/dist_train.py index cd1a84be8bc41630646289edcfcf9745a051d7e5..8b7d5c569100a2b3769f584311e35569f61cd13c 100644 --- a/fluid/PaddleCV/image_classification/dist_train/dist_train.py +++ b/fluid/PaddleCV/image_classification/dist_train/dist_train.py @@ -80,9 +80,9 @@ def get_device_num(): device_num = subprocess.check_output(['nvidia-smi', '-L']).decode().count('\n') return device_num -def prepare_reader(is_train, pyreader, args): +def prepare_reader(is_train, pyreader, args, pass_id=0): if is_train: - reader = train(data_dir=args.data_dir) + reader = train(data_dir=args.data_dir, pass_id_as_seed=pass_id) else: reader = val(data_dir=args.data_dir) if is_train: @@ -262,6 +262,8 @@ def train_parallel(args): num_samples = 0 start_time = time.time() batch_id = 1 + # use pass_id+1 as per pass global shuffle for distributed training + prepare_reader(True, train_pyreader, args, pass_id + 1) train_pyreader.start() while True: try: diff --git a/fluid/PaddleCV/image_classification/reader.py b/fluid/PaddleCV/image_classification/reader.py index 3d52acc3813d309153a75a2188b5587ecbe13e97..d9559df09ba34f3a6512f1c4628d454cd33c9ee2 100644 --- a/fluid/PaddleCV/image_classification/reader.py +++ b/fluid/PaddleCV/image_classification/reader.py @@ -130,11 +130,14 @@ def _reader_creator(file_list, shuffle=False, color_jitter=False, rotate=False, - data_dir=DATA_DIR): + data_dir=DATA_DIR, + pass_id_as_seed=0): def reader(): with open(file_list) as flist: full_lines = [line.strip() for line in flist] if shuffle: + if pass_id_as_seed: + np.random.seed(pass_id_as_seed) np.random.shuffle(full_lines) if mode == 'train' and os.getenv('PADDLE_TRAINING_ROLE'): # distributed mode if the env var `PADDLE_TRAINING_ROLE` exits @@ -166,7 +169,7 @@ def _reader_creator(file_list, return paddle.reader.xmap_readers(mapper, reader, THREAD, BUF_SIZE) -def train(data_dir=DATA_DIR): +def train(data_dir=DATA_DIR, pass_id_as_seed=0): file_list = os.path.join(data_dir, 'train_list.txt') return _reader_creator( file_list, @@ -174,7 +177,8 @@ def train(data_dir=DATA_DIR): shuffle=True, color_jitter=False, rotate=False, - data_dir=data_dir) + data_dir=data_dir, + pass_id_as_seed=pass_id_as_seed) def val(data_dir=DATA_DIR): diff --git a/fluid/PaddleCV/image_classification/run.sh b/fluid/PaddleCV/image_classification/run.sh index fbdacb87633a70b60ecdedf9a6f74e7287d2b2d0..41e5e493468a8d8bffbf6c6aabc0d7e7947e989b 100644 --- a/fluid/PaddleCV/image_classification/run.sh +++ b/fluid/PaddleCV/image_classification/run.sh @@ -6,7 +6,7 @@ python train.py \ --class_dim=1000 \ --image_shape=3,224,224 \ --model_save_dir=output/ \ - --with_mem_opt=False \ + --with_mem_opt=True \ --lr_strategy=piecewise_decay \ --lr=0.1 
# >log_SE_ResNeXt50_32x4d.txt 2>&1 & @@ -19,7 +19,7 @@ python train.py \ # --class_dim=1000 \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ -# --with_mem_opt=False \ +# --with_mem_opt=True \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ # --lr=0.01 @@ -32,7 +32,7 @@ python train.py \ # --class_dim=1000 \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ -# --with_mem_opt=False \ +# --with_mem_opt=True \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ # --lr=0.1 @@ -46,12 +46,22 @@ python train.py \ # --class_dim=1000 \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ -# --with_mem_opt=False \ +# --with_mem_opt=True \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ # --lr=0.1 - +#python train.py \ +# --model=MobileNetV2 \ +# --batch_size=500 \ +# --total_images=1281167 \ +# --class_dim=1000 \ +# --image_shape=3,224,224 \ +# --model_save_dir=output/ \ +# --with_mem_opt=True \ +# --lr_strategy=cosine_decay \ +# --num_epochs=200 \ +# --lr=0.1 #ResNet50: #python train.py \ # --model=ResNet50 \ @@ -60,7 +70,7 @@ python train.py \ # --class_dim=1000 \ # --image_shape=3,224,224 \ # --model_save_dir=output/ \ -# --with_mem_opt=False \ +# --with_mem_opt=True \ # --lr_strategy=piecewise_decay \ # --num_epochs=120 \ # --lr=0.1 @@ -87,7 +97,7 @@ python train.py \ # --lr_strategy=piecewise_decay \ # --lr=0.1 \ # --num_epochs=120 \ -# --l2_decay=1e-4 \(TODO) +# --l2_decay=1e-4 #SE_ResNeXt50: @@ -99,7 +109,7 @@ python train.py \ # --lr_strategy=cosine_decay \ # --lr=0.1 \ # --num_epochs=200 \ -# --l2_decay=12e-5 \(TODO) +# --l2_decay=12e-5 #SE_ResNeXt101: #python train.py \ @@ -110,7 +120,7 @@ python train.py \ # --lr_strategy=cosine_decay \ # --lr=0.1 \ # --num_epochs=200 \ -# --l2_decay=15e-5 \(TODO) +# --l2_decay=15e-5 #VGG11: #python train.py \ @@ -121,7 +131,7 @@ python train.py \ # --lr_strategy=cosine_decay \ # --lr=0.1 \ # --num_epochs=90 \ -# --l2_decay=2e-4 \(TODO) +# --l2_decay=2e-4 #VGG13: #python train.py @@ -132,4 +142,4 @@ python train.py \ # --lr_strategy=cosine_decay \ # --lr=0.01 \ # --num_epochs=90 \ -# --l2_decay=3e-4 \(TODO) +# --l2_decay=3e-4 diff --git a/fluid/PaddleCV/image_classification/train.py b/fluid/PaddleCV/image_classification/train.py index 6830773b91f2fa07c2b6f530a6370cedde82ffd7..87cef29e0eed4c9ed0944a7f5b3a14b76766d579 100644 --- a/fluid/PaddleCV/image_classification/train.py +++ b/fluid/PaddleCV/image_classification/train.py @@ -10,7 +10,6 @@ import math import paddle import paddle.fluid as fluid import paddle.dataset.flowers as flowers -import models import reader import argparse import functools @@ -19,8 +18,8 @@ import utils from utils.learning_rate import cosine_decay from utils.fp16_utils import create_master_params_grads, master_param_to_train_param from utility import add_arguments, print_arguments -import models -import models_name + +IMAGENET1000 = 1281167 parser = argparse.ArgumentParser(description=__doc__) add_arg = functools.partial(add_arguments, argparser=parser) @@ -40,25 +39,32 @@ add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.") add_arg('enable_ce', bool, False, "If set True, enable continuous evaluation job.") add_arg('data_dir', str, "./data/ILSVRC2012", "The ImageNet dataset root dir.") -add_arg('model_category', str, "models", "Whether to use models_name or not, valid value:'models','models_name'" ) +add_arg('model_category', str, "models", "Whether to use models_name or not, valid 
value:'models','models_name'." ) add_arg('fp16', bool, False, "Enable half precision training with fp16." ) add_arg('scale_loss', float, 1.0, "Scale loss for fp16." ) +add_arg('l2_decay', float, 1e-4, "L2_decay parameter.") +add_arg('momentum_rate', float, 0.9, "momentum_rate.") # yapf: enable -def set_models(model): +def set_models(model_category): global models - if model == "models": - models = models + assert model_category in ["models", "models_name" + ], "{} is not in lists: {}".format( + model_category, ["models", "models_name"]) + if model_category == "models_name": + import models_name as models else: - models = models_name + import models as models def optimizer_setting(params): ls = params["learning_strategy"] + l2_decay = params["l2_decay"] + momentum_rate = params["momentum_rate"] if ls["name"] == "piecewise_decay": if "total_images" not in params: - total_images = 1281167 + total_images = IMAGENET1000 else: total_images = params["total_images"] batch_size = ls["batch_size"] @@ -71,16 +77,17 @@ def optimizer_setting(params): optimizer = fluid.optimizer.Momentum( learning_rate=fluid.layers.piecewise_decay( boundaries=bd, values=lr), - momentum=0.9, - regularization=fluid.regularizer.L2Decay(1e-4)) + momentum=momentum_rate, + regularization=fluid.regularizer.L2Decay(l2_decay)) elif ls["name"] == "cosine_decay": if "total_images" not in params: - total_images = 1281167 + total_images = IMAGENET1000 else: total_images = params["total_images"] - batch_size = ls["batch_size"] + l2_decay = params["l2_decay"] + momentum_rate = params["momentum_rate"] step = int(total_images / batch_size + 1) lr = params["lr"] @@ -89,43 +96,42 @@ def optimizer_setting(params): optimizer = fluid.optimizer.Momentum( learning_rate=cosine_decay( learning_rate=lr, step_each_epoch=step, epochs=num_epochs), - momentum=0.9, - regularization=fluid.regularizer.L2Decay(4e-5)) - elif ls["name"] == "exponential_decay": + momentum=momentum_rate, + regularization=fluid.regularizer.L2Decay(l2_decay)) + elif ls["name"] == "linear_decay": if "total_images" not in params: - total_images = 1281167 + total_images = IMAGENET1000 else: total_images = params["total_images"] batch_size = ls["batch_size"] - step = int(total_images / batch_size +1) - lr = params["lr"] num_epochs = params["num_epochs"] - learning_decay_rate_factor=ls["learning_decay_rate_factor"] - num_epochs_per_decay = ls["num_epochs_per_decay"] - NUM_GPUS = 1 - + start_lr = params["lr"] + l2_decay = params["l2_decay"] + momentum_rate = params["momentum_rate"] + end_lr = 0 + total_step = int((total_images / batch_size) * num_epochs) + lr = fluid.layers.polynomial_decay( + start_lr, total_step, end_lr, power=1) optimizer = fluid.optimizer.Momentum( - learning_rate=fluid.layers.exponential_decay( - learning_rate = lr * NUM_GPUS, - decay_steps = step * num_epochs_per_decay / NUM_GPUS, - decay_rate = learning_decay_rate_factor), - momentum=0.9, - - regularization = fluid.regularizer.L2Decay(4e-5)) - + learning_rate=lr, + momentum=momentum_rate, + regularization=fluid.regularizer.L2Decay(l2_decay)) else: lr = params["lr"] + l2_decay = params["l2_decay"] + momentum_rate = params["momentum_rate"] optimizer = fluid.optimizer.Momentum( learning_rate=lr, - momentum=0.9, - regularization=fluid.regularizer.L2Decay(1e-4)) + momentum=momentum_rate, + regularization=fluid.regularizer.L2Decay(l2_decay)) return optimizer + def net_config(image, label, model, args): model_list = [m for m in dir(models) if "__" not in m] - assert args.model in model_list,"{} is not lists: 
{}".format( - args.model, model_list) + assert args.model in model_list, "{} is not lists: {}".format(args.model, + model_list) class_dim = args.class_dim model_name = args.model @@ -148,8 +154,9 @@ def net_config(image, label, model, args): acc_top1 = fluid.layers.accuracy(input=out0, label=label, k=1) acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5) else: - out = model.net(input=image, class_dim=class_dim) - cost, pred = fluid.layers.softmax_with_cross_entropy(out, label, return_softmax=True) + out = model.net(input=image, class_dim=class_dim) + cost, pred = fluid.layers.softmax_with_cross_entropy( + out, label, return_softmax=True) if args.scale_loss > 1: avg_cost = fluid.layers.mean(x=cost) * float(args.scale_loss) else: @@ -190,19 +197,25 @@ def build_program(is_train, main_prog, startup_prog, args): params["num_epochs"] = args.num_epochs params["learning_strategy"]["batch_size"] = args.batch_size params["learning_strategy"]["name"] = args.lr_strategy + params["l2_decay"] = args.l2_decay + params["momentum_rate"] = args.momentum_rate optimizer = optimizer_setting(params) - if args.fp16: params_grads = optimizer.backward(avg_cost) master_params_grads = create_master_params_grads( params_grads, main_prog, startup_prog, args.scale_loss) optimizer.apply_gradients(master_params_grads) - master_param_to_train_param(master_params_grads, params_grads, main_prog) + master_param_to_train_param(master_params_grads, + params_grads, main_prog) else: optimizer.minimize(avg_cost) + global_lr = optimizer._global_learning_rate() - return py_reader, avg_cost, acc_top1, acc_top5 + if is_train: + return py_reader, avg_cost, acc_top1, acc_top5, global_lr + else: + return py_reader, avg_cost, acc_top1, acc_top5 def train(args): @@ -220,7 +233,7 @@ def train(args): startup_prog.random_seed = 1000 train_prog.random_seed = 1000 - train_py_reader, train_cost, train_acc1, train_acc5 = build_program( + train_py_reader, train_cost, train_acc1, train_acc5, global_lr = build_program( is_train=True, main_prog=train_prog, startup_prog=startup_prog, @@ -255,7 +268,8 @@ def train(args): if visible_device: device_num = len(visible_device.split(',')) else: - device_num = subprocess.check_output(['nvidia-smi', '-L']).decode().count('\n') + device_num = subprocess.check_output( + ['nvidia-smi', '-L']).decode().count('\n') train_batch_size = args.batch_size / device_num test_batch_size = 16 @@ -283,11 +297,12 @@ def train(args): use_cuda=bool(args.use_gpu), loss_name=train_cost.name) - train_fetch_list = [train_cost.name, train_acc1.name, train_acc5.name] + train_fetch_list = [ + train_cost.name, train_acc1.name, train_acc5.name, global_lr.name + ] test_fetch_list = [test_cost.name, test_acc1.name, test_acc5.name] params = models.__dict__[args.model]().params - for pass_id in range(params["num_epochs"]): train_py_reader.start() @@ -299,7 +314,9 @@ def train(args): try: while True: t1 = time.time() - loss, acc1, acc5 = train_exe.run(fetch_list=train_fetch_list) + loss, acc1, acc5, lr = train_exe.run( + fetch_list=train_fetch_list) + t2 = time.time() period = t2 - t1 loss = np.mean(np.array(loss)) @@ -308,12 +325,14 @@ def train(args): train_info[0].append(loss) train_info[1].append(acc1) train_info[2].append(acc5) + lr = np.mean(np.array(lr)) train_time.append(period) + if batch_id % 10 == 0: print("Pass {0}, trainbatch {1}, loss {2}, \ - acc1 {3}, acc5 {4} time {5}" - .format(pass_id, batch_id, loss, acc1, acc5, - "%2.2f sec" % period)) + acc1 {3}, acc5 {4}, lr{5}, time {6}" + .format(pass_id, batch_id, loss, 
acc1, acc5, "%.5f" % + lr, "%2.2f sec" % period)) sys.stdout.flush() batch_id += 1 except fluid.core.EOFException: @@ -322,7 +341,8 @@ def train(args): train_loss = np.array(train_info[0]).mean() train_acc1 = np.array(train_info[1]).mean() train_acc5 = np.array(train_info[2]).mean() - train_speed = np.array(train_time).mean() / (train_batch_size * device_num) + train_speed = np.array(train_time).mean() / (train_batch_size * + device_num) test_py_reader.start() @@ -394,10 +414,7 @@ def train(args): def main(): args = parser.parse_args() - models_now = args.model_category - assert models_now in ["models", "models_name"], "{} is not in lists: {}".format( - models_now, ["models", "models_name"]) - set_models(models_now) + set_models(args.model_category) print_arguments(args) train(args) diff --git a/fluid/PaddleCV/metric_learning/README.md b/fluid/PaddleCV/metric_learning/README.md index c961bf4842727dab8e808ef016d10efa219bf2f6..71ecb5cf1cc10506abf2ce8c225a57b630b56d1a 100644 --- a/fluid/PaddleCV/metric_learning/README.md +++ b/fluid/PaddleCV/metric_learning/README.md @@ -1,15 +1,15 @@ # Deep Metric Learning -Metric learning is a kind of methods to learn discriminative features for each sample, with the purpose that intra-class samples have smaller distances while inter-class samples have larger distances in the learned space. With the develop of deep learning technique, metric learning methods are combined with deep neural networks to boost the performance of traditional tasks, such as face recognition/verification, human re-identification, image retrieval and so on. In this page, we introduce the way to implement deep metric learning using PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-a-model), [finetuning](#finetuning), [evaluation](#evaluation) and [inference](#inference). +Metric learning is a kind of methods to learn discriminative features for each sample, with the purpose that intra-class samples have smaller distances while inter-class samples have larger distances in the learned space. With the develop of deep learning technique, metric learning methods are combined with deep neural networks to boost the performance of traditional tasks, such as face recognition/verification, human re-identification, image retrieval and so on. In this page, we introduce the way to implement deep metric learning using PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-metric-learning-models), [finetuning](#finetuning), [evaluation](#evaluation), [inference](#inference) and [Performances](#performances). --- ## Table of Contents - [Installation](#installation) - [Data preparation](#data-preparation) -- [Training metric learning models](#training-a-model) +- [Training metric learning models](#training-metric-learning-models) - [Finetuning](#finetuning) - [Evaluation](#evaluation) - [Inference](#inference) -- [Performances](#supported-models) +- [Performances](#performances) ## Installation @@ -17,7 +17,7 @@ Running sample code in this directory requires PaddelPaddle Fluid v0.14.0 and la ## Data preparation -Stanford Online Product(SOP) dataset contains 120,053 images of 22,634 products downloaded from eBay.com. We use it to conduct the metric learning experiments. For training, 59,5511 out of 11,318 classes are used, and 11,316 classes(60,502 images) are held out for testing. 
First of all, preparation of SOP data can be done as: +Stanford Online Product(SOP) dataset contains 120,053 images of 22,634 products downloaded from eBay.com. We use it to conduct the metric learning experiments. For training, 59,551 out of 11,318 classes are used, and 11,316 classes(60,502 images) are held out for testing. First of all, preparation of SOP data can be done as: ``` cd data/ sh download_sop.sh @@ -25,7 +25,7 @@ sh download_sop.sh ## Training metric learning models -To train a metric learning model, one need to set the neural network as backbone and the metric loss function to optimize. We train meiric learning model using softmax or [arcmargin](https://arxiv.org/abs/1801.07698) loss firstly, and then fine-turned the model using other metric learning loss, such as triplet, [quadruplet](https://arxiv.org/abs/1710.00478) and [eml](https://arxiv.org/abs/1212.6094) loss. One example of training using arcmargin loss is shown below: +To train a metric learning model, one need to set the neural network as backbone and the metric loss function to optimize. We train meiric learning model using softmax or arcmargin loss firstly, and then fine-turned the model using other metric learning loss, such as triplet, quadruplet and eml loss. One example of training using arcmargin loss is shown below: ``` @@ -52,7 +52,7 @@ python train_elem.py \ * **use_gpu**: whether to use GPU or not. Default: True. * **pretrained_model**: model path for pretraining. Default: None. * **model_save_dir**: the directory to save trained model. Default: "output". -* **loss_name**: loss fortraining model. Default: "softmax". +* **loss_name**: loss for training model. Default: "softmax". * **arc_scale**: parameter of arcmargin loss. Default: 80.0. * **arc_margin**: parameter of arcmargin loss. Default: 0.15. * **arc_easy_margin**: parameter of arcmargin loss. Default: False. 
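For readers who want a concrete picture of what the `arc_scale` and `arc_margin` parameters above control, here is a minimal NumPy sketch of the additive-angular-margin (arcmargin) idea. It is an illustration under simplified assumptions (plain NumPy instead of Fluid ops, no `arc_easy_margin` handling), not the implementation invoked by `train_elem.py`:

```python
import numpy as np

def arcmargin_logits(features, weights, labels, scale=80.0, margin=0.15):
    """Toy arcmargin: add an angular margin to each sample's true-class logit."""
    # Cosine similarity between L2-normalized embeddings and class weights.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)  # (N, D)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)    # (D, C)
    cos = f.dot(w)                                                  # (N, C)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    # Penalize only the ground-truth class: cos(theta + margin) <= cos(theta).
    out = cos.copy()
    rows = np.arange(labels.shape[0])
    out[rows, labels] = np.cos(theta[rows, labels] + margin)
    # The scale sharpens the softmax; feed the result to cross-entropy.
    return scale * out
```

The margin shrinks the true-class cosine, so the network must pull embeddings closer to their class weight vector to recover the same loss, which is what produces more discriminative features.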
@@ -103,3 +103,9 @@ For comparation, many metric learning models with different neural networks and |fine-tuned with triplet | 78.37% | 79.21% |fine-tuned with quadruplet | 78.10% | 79.59% |fine-tuned with eml | 79.32% | 80.11% + +## Reference + +- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [link](https://arxiv.org/abs/1801.07698) +- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [link](https://arxiv.org/abs/1710.00478) +- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [link](https://arxiv.org/abs/1212.6094) diff --git a/fluid/PaddleCV/metric_learning/README_cn.md b/fluid/PaddleCV/metric_learning/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..c155200c64e8e21549a6d642f89d95fdcb0acd11 --- /dev/null +++ b/fluid/PaddleCV/metric_learning/README_cn.md @@ -0,0 +1,111 @@ +# 深度度量学习 +度量学习是一种为样本对学习具有区分性特征的方法,目的是在特征空间中,让同一个类别的样本具有较小的特征距离,不同类的样本具有较大的特征距离。随着深度学习技术的发展,基于深度神经网络的度量学习方法已经在许多视觉任务上大幅提升了性能,例如:人脸识别、人脸校验、行人重识别和图像检索等等。在本章节,介绍在PaddlePaddle Fluid里实现的几种度量学习方法和使用方法,具体包括[数据准备](#数据准备),[模型训练](#模型训练),[模型微调](#模型微调),[模型评估](#模型评估),[模型预测](#模型预测)。 + +--- +## 简介 +- [安装](#安装) +- [数据准备](#数据准备) +- [模型训练](#模型训练) +- [模型微调](#模型微调) +- [模型评估](#模型评估) +- [模型预测](#模型预测) +- [模型性能](#模型性能) + +## 安装 + +运行本章节代码需要 PaddlePaddle Fluid v0.14.0 或更高版本的环境。如果你的设备上的PaddlePaddle版本低于v0.14.0,请按照此[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)进行安装和更新。 + +## 数据准备 + +Stanford Online Product(SOP) 数据集下载自eBay,包含120053张商品图片,有22634个类别。我们使用该数据集进行实验。训练时,使用59551张图片,11318个类别的数据;测试时,使用60502张图片,11316个类别。首先,SOP数据集可以使用以下脚本下载: +``` +cd data/ +sh download_sop.sh +``` + +## 模型训练 + +为了训练度量学习模型,我们需要一个神经网络模型作为骨架模型(如ResNet50)和度量学习代价函数来进行优化。我们首先使用 softmax 或者 arcmargin 来进行训练,然后使用其它的代价函数来进行微调,例如:triplet,quadruplet和eml。下面是一个使用arcmargin训练的例子: + + +``` +python train_elem.py \ + --model=ResNet50 \ + --train_batch_size=256 \ + --test_batch_size=50 \ + --lr=0.01 \ + --total_iter_num=30000 \ + --use_gpu=True \ + --pretrained_model=${path_to_pretrain_imagenet_model} \ + --model_save_dir=${output_model_path} \ + --loss_name=arcmargin \ + --arc_scale=80.0 \ + --arc_margin=0.15 \ + --arc_easy_margin=False +``` +**参数介绍:** +* **model**: 使用的模型名字. 默认: "ResNet50". +* **train_batch_size**: 训练的 mini-batch大小. 默认: 256. +* **test_batch_size**: 测试的 mini-batch大小. 默认: 50. +* **lr**: 初始学习率. 默认: 0.01. +* **total_iter_num**: 总的训练迭代轮数. 默认: 30000. +* **use_gpu**: 是否使用GPU. 默认: True. +* **pretrained_model**: 预训练模型的路径. 默认: None. +* **model_save_dir**: 保存模型的路径. 默认: "output". +* **loss_name**: 优化的代价函数. 默认: "softmax". +* **arc_scale**: arcmargin的参数. 默认: 80.0. +* **arc_margin**: arcmargin的参数. 默认: 0.15. +* **arc_easy_margin**: arcmargin的参数. 默认: False. 
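The fine-tuning losses named above (triplet, quadruplet, eml) all compare distances between embeddings rather than class logits. As a minimal illustration of the simplest of them, here is a toy NumPy triplet loss; the `margin` default is a hypothetical value chosen for illustration, not one taken from `train_pair.py`:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Toy triplet loss over (N, D) batches of embeddings."""
    # Squared Euclidean distance for each anchor-positive / anchor-negative pair.
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Hinge: a triplet costs nothing once the negative is at least
    # `margin` farther from the anchor than the positive.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```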
+ +## 模型微调 + +网络微调是在指定的任务上加载已有的模型来微调网络。在用softmax和arcmargin训完网络后,可以继续使用triplet,quadruplet或eml来微调网络。下面是一个使用eml来微调网络的例子: + +``` +python train_pair.py \ + --model=ResNet50 \ + --train_batch_size=160 \ + --test_batch_size=50 \ + --lr=0.0001 \ + --total_iter_num=100000 \ + --use_gpu=True \ + --pretrained_model=${path_to_pretrain_arcmargin_model} \ + --model_save_dir=${output_model_path} \ + --loss_name=eml \ + --samples_each_class=2 +``` + +## 模型评估 +模型评估主要是评估模型的检索性能。这里需要设置```path_to_pretrain_model```。可以使用下面命令来计算Recall@Rank-1。 +``` +python eval.py \ + --model=ResNet50 \ + --batch_size=50 \ + --pretrained_model=${path_to_pretrain_model} \ +``` + +## 模型预测 +模型预测主要是基于训练好的网络来获取图像数据的特征,下面是模型预测的例子: +``` +python infer.py \ + --model=ResNet50 \ + --batch_size=1 \ + --pretrained_model=${path_to_pretrain_model} +``` + +## 模型性能 + +下面列举了几种度量学习的代价函数在SOP数据集上的检索效果,这里使用Recall@Rank-1来进行评估。 + +|预训练模型 | softmax | arcmargin +|- | - | -: +|未微调 | 77.42% | 78.11% +|使用triplet微调 | 78.37% | 79.21% +|使用quadruplet微调 | 78.10% | 79.59% +|使用eml微调 | 79.32% | 80.11% + +## 引用 + +- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [链接](https://arxiv.org/abs/1801.07698) +- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [链接](https://arxiv.org/abs/1710.00478) +- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [链接](https://arxiv.org/abs/1212.6094) diff --git a/fluid/PaddleCV/metric_learning/reader.py b/fluid/PaddleCV/metric_learning/reader.py index 9c5aaf396d7b535bc4729a31834e2ef0f8151a28..ac8f257ecbadbb08454ffc616b1c587455cc92b6 100644 --- a/fluid/PaddleCV/metric_learning/reader.py +++ b/fluid/PaddleCV/metric_learning/reader.py @@ -63,6 +63,7 @@ def common_iterator(data, settings): assert (batch_size % samples_each_class == 0) class_num = batch_size // samples_each_class def train_iterator(): + count = 0 labs = list(data.keys()) lab_num = len(labs) ind = list(range(0, lab_num)) @@ -79,6 +80,9 @@ def common_iterator(data, settings): for anchor_ind_i in anchor_ind: anchor_path = DATA_DIR + data_list[anchor_ind_i] yield anchor_path, lab + count += 1 + if count >= settings.total_iter_num + 1: + return return train_iterator @@ -86,6 +90,8 @@ def triplet_iterator(data, settings): batch_size = settings.train_batch_size assert (batch_size % 3 == 0) def train_iterator(): + total_count = settings.train_batch_size * (settings.total_iter_num + 1) + count = 0 labs = list(data.keys()) lab_num = len(labs) ind = list(range(0, lab_num)) @@ -108,16 +114,24 @@ def triplet_iterator(data, settings): yield pos_path, lab_pos neg_path = DATA_DIR + neg_data_list[neg_ind] yield neg_path, lab_neg + count += 3 + if count >= total_count: + return return train_iterator def arcmargin_iterator(data, settings): def train_iterator(): + total_count = settings.train_batch_size * (settings.total_iter_num + 1) + count = 0 while True: for items in data: path, label = items path = DATA_DIR + path yield path, label + count += 1 + if count >= total_count: + return return train_iterator def image_iterator(data, mode): diff --git a/fluid/PaddleCV/object_detection/README.md b/fluid/PaddleCV/object_detection/README.md index 651016cdffa7fe6c4fa1dc5e886b9b18e8e40b04..2466ba96577c7cb1e2bb335a0b8b5c74edbb92fd 100644 --- a/fluid/PaddleCV/object_detection/README.md +++ b/fluid/PaddleCV/object_detection/README.md @@ -21,9 +21,7 @@ SSD is readily pluggable into a wide variant standard convolutional network, suc ### Data Preparation -You can use [PASCAL VOC 
dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or [MS-COCO dataset](http://cocodataset.org/#download). - -If you want to train a model on PASCAL VOC dataset, please download dataset at first, skip this step if you already have one. +Please download the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) first; skip this step if you already have it. ```bash cd data/pascalvoc @@ -32,30 +30,18 @@ cd data/pascalvoc The command `download.sh` will also create training and testing file lists. -If you want to train a model on MS-COCO dataset, please download dataset at first, skip this step if you already have one. - -``` -cd data/coco -./download.sh -``` - ### Train #### Download the Pre-trained Model. -We provide two pre-trained models. The one is MobileNet-v1 SSD trained on COCO dataset, but removed the convolutional predictors for COCO dataset. This model can be used to initialize the models when training other datasets, like PASCAL VOC. The other pre-trained model is MobileNet-v1 trained on ImageNet 2012 dataset but removed the last weights and bias in the Fully-Connected layer. - -Declaration: the MobileNet-v1 SSD model is converted by [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet-v1 model is converted from [Caffe](https://github.com/shicai/MobileNet-Caffe). -We will release the pre-trained models by ourself in the upcoming soon. +We provide two pre-trained models. The first is MobileNet-v1 SSD trained on the COCO dataset, with the COCO-specific convolutional predictors removed; it can be used to initialize models when training on other datasets, like PASCAL VOC. The other is MobileNet-v1 trained on the ImageNet 2012 dataset, with the weights and bias of the last fully-connected layer removed. Download MobileNet-v1 SSD: - - Download MobileNet-v1 SSD: ```bash ./pretrained/download_coco.sh ``` - - Download MobileNet-v1: - ```bash - ./pretrained/download_imagenet.sh - ``` + +Declaration: the MobileNet-v1 SSD model is converted from the [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). + #### Train on PASCAL VOC ``` python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/' ``` - Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify the number of GPUs you want to use. - - Set ```--dataset='coco2014'``` or ```--dataset='coco2017'``` to train model on MS COCO dataset. - For more help on arguments: ```bash @@ -88,19 +73,6 @@ You can evaluate your trained model in different metrics like 11point, integral python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45 ``` -You can set ```--dataset``` to ```coco2014``` or ```coco2017``` to evaluate COCO dataset. Moreover, we provide `eval_coco_map.py` which uses a COCO-specific mAP metric defined by [COCO committee](http://cocodataset.org/#detections-eval). To use this eval_coco_map.py, [cocoapi](https://github.com/cocodataset/cocoapi) is needed. 
-Install the cocoapi: -``` -# COCOAPI=/path/to/clone/cocoapi -git clone https://github.com/cocodataset/cocoapi.git $COCOAPI -cd $COCOAPI/PythonAPI -# Install into global site-packages -make install -# Alternatively, if you do not have permissions or prefer -# not to install the COCO API into global site-packages -python2 setup.py install --user -``` - ### Infer and Visualize `infer.py` is the main caller of the inferring module. Examples of usage are shown below. ```bash diff --git a/fluid/PaddleCV/object_detection/README_cn.md b/fluid/PaddleCV/object_detection/README_cn.md index 99603953a9dad956bcd13e7af68c59a9ae45c9cd..8c4cecab28e49c10820e092d3a521facf4be68ea 100644 --- a/fluid/PaddleCV/object_detection/README_cn.md +++ b/fluid/PaddleCV/object_detection/README_cn.md @@ -21,9 +21,8 @@ SSD 可以方便地插入到任何一种标准卷积网络中,比如 VGG、Res ### 数据准备 -你可以使用 [PASCAL VOC 数据集](http://host.robots.ox.ac.uk/pascal/VOC/) 或者 [MS-COCO 数据集](http://cocodataset.org/#download)。 -如果你想在 PASCAL VOC 数据集上进行训练,请先使用下面的命令下载数据集。 +请先使用下面的命令下载 [PASCAL VOC 数据集](http://host.robots.ox.ac.uk/pascal/VOC/): ```bash cd data/pascalvoc @@ -32,29 +31,19 @@ cd data/pascalvoc `download.sh` 命令会自动创建训练和测试用的列表文件。 -如果你想在 MS-COCO 数据集上进行训练,请先使用下面的命令下载数据集。 - -``` -cd data/coco -./download.sh -``` ### 模型训练 #### 下载预训练模型 -我们提供了两个预训练模型。第一个模型是在 COCO 数据集上预训练的 MobileNet-v1 SSD,我们将它的预测头移除了以便在 COCO 以外的数据集上进行训练。第二个模型是在 ImageNet 2012 数据集上预训练的 MobileNet-v1,我们也将最后的全连接层移除以便进行目标检测训练。 - -声明:MobileNet-v1 SSD 模型转换自[TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md)。MobileNet-v1 模型转换自[Caffe](https://github.com/shicai/MobileNet-Caffe)。我们不久也会发布我们自己预训练的模型。 +我们提供了两个预训练模型。第一个模型是在 COCO 数据集上预训练的 MobileNet-v1 SSD,我们将它的预测头移除了以便在 COCO 以外的数据集上进行训练。第二个模型是在 ImageNet 2012 数据集上预训练的 MobileNet-v1,我们也将最后的全连接层移除以便进行目标检测训练。下载 MobileNet-v1 SSD: - - 下载 MobileNet-v1 SSD: ```bash ./pretrained/download_coco.sh ``` - - 下载 MobileNet-v1: - ```bash - ./pretrained/download_imagenet.sh - ``` + +声明:MobileNet-v1 SSD 模型转换自[TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md)。MobileNet-v1 模型转换自[Caffe](https://github.com/shicai/MobileNet-Caffe)。 + #### 训练 @@ -63,7 +52,6 @@ cd data/coco python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/' ``` - 可以通过设置 ```export CUDA_VISIBLE_DEVICES=0,1``` 指定想要使用的GPU数量。 - - 可以通过设置 ```--dataset='coco2014'``` 或 ```--dataset='coco2017'``` 指定训练 MS-COCO数据集。 - 更多的可选参数见: ```bash @@ -80,25 +68,13 @@ cd data/coco ### 模型评估 -你可以使用11point、integral等指标在PASCAL VOC 和 COCO 数据集上评估训练好的模型。不失一般性,我们采用相应数据集的测试列表作为样例代码的默认列表,你也可以通过设置```--test_list```来指定自己的测试样本列表。 +你可以使用11point、integral等指标在PASCAL VOC 数据集上评估训练好的模型。不失一般性,我们采用相应数据集的测试列表作为样例代码的默认列表,你也可以通过设置```--test_list```来指定自己的测试样本列表。 `eval.py`是评估模块的主要执行程序,调用示例如下: ```bash python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45 ``` -你可以设置```--dataset``` 为 ```coco2014``` 或 ```coco2017```来评估 COCO 数据集。我们也提供了`eval_coco_map.py`以进行[COCO官方评估](http://cocodataset.org/#detections-eval)。若要使用 eval_coco_map.py, 需要首先下载[cocoapi](https://github.com/cocodataset/cocoapi): -``` -# COCOAPI=/path/to/clone/cocoapi -git clone https://github.com/cocodataset/cocoapi.git $COCOAPI -cd $COCOAPI/PythonAPI -# Install into global site-packages -make install -# Alternatively, if you do not have 
permissions or prefer -# not to install the COCO API into global site-packages -python2 setup.py install --user -``` - ### 模型预测以及可视化 `infer.py`是预测及可视化模块的主要执行程序,调用示例如下: diff --git a/fluid/PaddleCV/object_detection/README_quant.md b/fluid/PaddleCV/object_detection/README_quant.md index 6723a48832d1b5210436eb2001234c6fe9149736..7ea7f7bd79d21ba34c84d1a1b48a5298837939ac 100644 --- a/fluid/PaddleCV/object_detection/README_quant.md +++ b/fluid/PaddleCV/object_detection/README_quant.md @@ -2,7 +2,7 @@ ### Introduction -The quantization-aware training used in this experiments is introduced in [fixed-point quantization desigin](https://gthub.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/quantization/fixed_point_quantization.md). Since quantization-aware training is still an active area of research and experimentation, +The quantization-aware training used in this experiments is introduced in [fixed-point quantization desigin](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/quantization/fixed_point_quantization.md). Since quantization-aware training is still an active area of research and experimentation, here, we just give an simple quantization training usage in Fluid based on MobileNet-SSD model, and more other exeperiments are still needed, like how to quantization traning by considering fusing batch normalization and convolution/fully-connected layers, channel-wise quantization of weights and so on. @@ -130,6 +130,9 @@ A Python transpiler is used to rewrite Fluid training program or evaluation prog ``` See 002271.jpg for the visualized image with bbouding boxes. + + **Note**, if you want to convert model to 8-bit, you should call `fluid.contrib.QuantizeTranspiler.convert_to_int8` to do this. But, now Paddle can't load 8-bit model to do inference. + ### Results Results of MobileNet-v1-SSD 300x300 model on PascalVOC dataset. diff --git a/fluid/PaddleCV/object_detection/_ce.py b/fluid/PaddleCV/object_detection/_ce.py index 5f5d3e013a1bfca1ca0a0d1b6fb93a76a242496e..6f300e162b1c1940a2c8f1463953f0bcbeaa0a78 100644 --- a/fluid/PaddleCV/object_detection/_ce.py +++ b/fluid/PaddleCV/object_detection/_ce.py @@ -9,10 +9,10 @@ from kpi import CostKpi, DurationKpi, AccKpi train_cost_kpi = CostKpi('train_cost', 0.02, 0, actived=True) test_acc_kpi = AccKpi('test_acc', 0.01, 0, actived=False) -train_speed_kpi = AccKpi('train_speed', 0.1, 0, actived=True) +train_speed_kpi = DurationKpi('train_speed', 0.1, 0, actived=True, unit_repr="s/epoch") train_cost_card4_kpi = CostKpi('train_cost_card4', 0.02, 0, actived=True) test_acc_card4_kpi = AccKpi('test_acc_card4', 0.01, 0, actived=False) -train_speed_card4_kpi = AccKpi('train_speed_card4', 0.1, 0, actived=True) +train_speed_card4_kpi = DurationKpi('train_speed_card4', 0.1, 0, actived=True, unit_repr="s/epoch") tracking_kpis = [ train_cost_kpi, diff --git a/fluid/PaddleCV/object_detection/data_util.py b/fluid/PaddleCV/object_detection/data_util.py deleted file mode 100644 index e7d6b2b43eee5048fb5d3d8397a3e88aa0f14b49..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/object_detection/data_util.py +++ /dev/null @@ -1,153 +0,0 @@ -""" -This code is based on https://github.com/fchollet/keras/blob/master/keras/utils/data_utils.py -""" - -import time -import numpy as np -import threading -import multiprocessing -try: - import queue -except ImportError: - import Queue as queue - - -class GeneratorEnqueuer(object): - """ - Builds a queue out of a data generator. 
- - Args: - generator: a generator function which endlessly yields data - use_multiprocessing (bool): use multiprocessing if True, - otherwise use threading. - wait_time (float): time to sleep in-between calls to `put()`. - random_seed (int): Initial seed for workers, - will be incremented by one for each workers. - """ - - def __init__(self, - generator, - use_multiprocessing=False, - wait_time=0.05, - random_seed=None): - self.wait_time = wait_time - self._generator = generator - self._use_multiprocessing = use_multiprocessing - self._threads = [] - self._stop_event = None - self.queue = None - self._manager = None - self.seed = random_seed - - def start(self, workers=1, max_queue_size=10): - """ - Start worker threads which add data from the generator into the queue. - - Args: - workers (int): number of worker threads - max_queue_size (int): queue size - (when full, threads could block on `put()`) - """ - - def data_generator_task(): - """ - Data generator task. - """ - - def task(): - if (self.queue is not None and - self.queue.qsize() < max_queue_size): - generator_output = next(self._generator) - self.queue.put((generator_output)) - else: - time.sleep(self.wait_time) - - if not self._use_multiprocessing: - while not self._stop_event.is_set(): - with self.genlock: - try: - task() - except Exception: - traceback.print_exc() - self._stop_event.set() - break - else: - while not self._stop_event.is_set(): - try: - task() - except Exception: - traceback.print_exc() - self._stop_event.set() - break - - try: - if self._use_multiprocessing: - self._manager = multiprocessing.Manager() - self.queue = self._manager.Queue(maxsize=max_queue_size) - self._stop_event = multiprocessing.Event() - else: - self.genlock = threading.Lock() - self.queue = queue.Queue() - self._stop_event = threading.Event() - for _ in range(workers): - if self._use_multiprocessing: - # Reset random seed else all children processes - # share the same seed - np.random.seed(self.seed) - thread = multiprocessing.Process(target=data_generator_task) - thread.daemon = True - if self.seed is not None: - self.seed += 1 - else: - thread = threading.Thread(target=data_generator_task) - self._threads.append(thread) - thread.start() - except: - self.stop() - raise - - def is_running(self): - """ - Returns: - bool: Whether the worker theads are running. - """ - return self._stop_event is not None and not self._stop_event.is_set() - - def stop(self, timeout=None): - """ - Stops running threads and wait for them to exit, if necessary. - Should be called by the same thread which called `start()`. - - Args: - timeout(int|None): maximum time to wait on `thread.join()`. - """ - if self.is_running(): - self._stop_event.set() - for thread in self._threads: - if self._use_multiprocessing: - if thread.is_alive(): - thread.terminate() - else: - thread.join(timeout) - if self._manager: - self._manager.shutdown() - - self._threads = [] - self._stop_event = None - self.queue = None - - def get(self): - """ - Creates a generator to extract data from the queue. - Skip the data if it is `None`. - - # Yields - tuple of data in the queue. 
- """ - while self.is_running(): - if not self.queue.empty(): - inputs = self.queue.get() - if inputs is not None: - yield inputs - else: - time.sleep(self.wait_time) diff --git a/fluid/PaddleCV/object_detection/eval.py b/fluid/PaddleCV/object_detection/eval.py index 106fb67e073648f94934e7b17f02b964d276e5ec..157384b04f40ab2e3023fa57269267219b16d62d 100644 --- a/fluid/PaddleCV/object_detection/eval.py +++ b/fluid/PaddleCV/object_detection/eval.py @@ -52,7 +52,7 @@ def build_program(main_prog, startup_prog, args, data_args): nmsed_out = fluid.layers.detection_output( locs, confs, box, box_var, nms_threshold=args.nms_threshold) with fluid.program_guard(main_prog): - map = fluid.evaluator.DetectionMAP( + map = fluid.metrics.DetectionMAP( nmsed_out, gt_label, gt_box, diff --git a/fluid/PaddleCV/object_detection/eval_coco_map.py b/fluid/PaddleCV/object_detection/eval_coco_map.py index 0837f42ad89cda1e6a81825bc0545a11b48c4b3c..3e4d4ab8b3460263221b90d0dce787439f014f5b 100644 --- a/fluid/PaddleCV/object_detection/eval_coco_map.py +++ b/fluid/PaddleCV/object_detection/eval_coco_map.py @@ -47,7 +47,7 @@ def eval(args, data_args, test_list, batch_size, model_dir=None): gt_iscrowd = fluid.layers.data( name='gt_iscrowd', shape=[1], dtype='int32', lod_level=1) gt_image_info = fluid.layers.data( - name='gt_image_id', shape=[3], dtype='int32', lod_level=1) + name='gt_image_id', shape=[3], dtype='int32') locs, confs, box, box_var = mobile_net(num_classes, image, image_shape) nmsed_out = fluid.layers.detection_output( @@ -57,14 +57,14 @@ def eval(args, data_args, test_list, batch_size, model_dir=None): place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace() exe = fluid.Executor(place) + exe.run(fluid.default_startup_program()) # yapf: disable if model_dir: def if_exist(var): return os.path.exists(os.path.join(model_dir, var.name)) fluid.io.load_vars(exe, model_dir, predicate=if_exist) # yapf: enable - test_reader = paddle.batch( - reader.test(data_args, test_list), batch_size=batch_size) + test_reader = reader.test(data_args, test_list, batch_size) feeder = fluid.DataFeeder( place=place, feed_list=[image, gt_box, gt_label, gt_iscrowd, gt_image_info]) @@ -146,8 +146,7 @@ if __name__ == '__main__': mean_value=[args.mean_value_B, args.mean_value_G, args.mean_value_R], apply_distort=False, apply_expand=False, - ap_version=args.ap_version, - toy=0) + ap_version=args.ap_version) eval( args, data_args=data_args, diff --git a/fluid/PaddleCV/object_detection/main_quant.py b/fluid/PaddleCV/object_detection/main_quant.py index bd7d377e69e95dcaf066a40941cf48091583e7ab..2927858a1ea34fbc3adc6eb996ac596c50846fa4 100644 --- a/fluid/PaddleCV/object_detection/main_quant.py +++ b/fluid/PaddleCV/object_detection/main_quant.py @@ -85,7 +85,6 @@ def train(args, batch_size = train_params['batch_size'] batch_size_per_device = batch_size // devices_num - iters_per_epoc = train_params["train_images"] // batch_size num_workers = 4 startup_prog = fluid.Program() @@ -134,22 +133,22 @@ def train(args, train_file_list, batch_size_per_device, shuffle=is_shuffle, - use_multiprocessing=True, - num_workers=num_workers, - max_queue=24) + num_workers=num_workers) test_reader = reader.test(data_args, val_file_list, batch_size) train_py_reader.decorate_paddle_reader(train_reader) test_py_reader.decorate_paddle_reader(test_reader) train_py_reader.start() best_map = 0. - try: - for epoc in range(epoc_num): - if epoc == 0: - # test quantized model without quantization-aware training. 
- test_map = test(exe, test_prog, map_eval, test_py_reader) - # train - for batch in range(iters_per_epoc): + for epoc in range(epoc_num): + if epoc == 0: + # test quantized model without quantization-aware training. + test_map = test(exe, test_prog, map_eval, test_py_reader) + batch = 0 + train_py_reader.start() + while True: + try: + # train start_time = time.time() if parallel: outs = train_exe.run(fetch_list=[loss.name]) @@ -157,18 +156,19 @@ def train(args, outs = exe.run(train_prog, fetch_list=[loss]) end_time = time.time() avg_loss = np.mean(np.array(outs[0])) - if batch % 20 == 0: + if batch % 10 == 0: print("Epoc {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format( epoc , batch, avg_loss, end_time - start_time)) - end_time = time.time() - test_map = test(exe, test_prog, map_eval, test_py_reader) - save_model(exe, train_prog, model_save_dir, str(epoc)) - if test_map > best_map: - best_map = test_map - save_model(exe, train_prog, model_save_dir, 'best_map') - print("Best test map {0}".format(best_map)) - except (fluid.core.EOFException, StopIteration): - train_py_reader.reset() + except (fluid.core.EOFException, StopIteration): + train_reader().close() + train_py_reader.reset() + break + test_map = test(exe, test_prog, map_eval, test_py_reader) + save_model(exe, train_prog, model_save_dir, str(epoc)) + if test_map > best_map: + best_map = test_map + save_model(exe, train_prog, model_save_dir, 'best_map') + print("Best test map {0}".format(best_map)) def eval(args, data_args, configs, val_file_list): @@ -212,6 +212,9 @@ def eval(args, data_args, configs, val_file_list): test_map = test(exe, test_prog, map_eval, test_py_reader) print("Test model {0}, map {1}".format(init_model, test_map)) + # convert model to 8-bit before saving, but now Paddle can't load + # the 8-bit model to do inference. + # transpiler.convert_to_int8(test_prog, place) fluid.io.save_inference_model(model_save_dir, [image.name], [nmsed_out], exe, test_prog) diff --git a/fluid/PaddleCV/object_detection/reader.py b/fluid/PaddleCV/object_detection/reader.py index 6acc18594e5979308a7ba641002569b0867516a8..3559591c4ed5741d52f44bd92f4398d133b2e104 100644 --- a/fluid/PaddleCV/object_detection/reader.py +++ b/fluid/PaddleCV/object_detection/reader.py @@ -12,17 +12,17 @@ # See the License for the specific language governing permissions and # limitations under the License. 
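The reader.py diff below replaces the GeneratorEnqueuer-based pipeline (the data_util.py deleted above) with `paddle.reader.multiprocess_reader`: the file list is split into per-worker shards and one reader is built per shard. A minimal sketch of that pattern follows; the file names are hypothetical and the trivial `yield` stands in for the real readers, which yield preprocessed image samples:

```python
import math
import paddle

def make_shard_reader(shard):
    # One worker's slice of the file list; the patched reader.py builds one
    # pascalvoc()/coco() reader per shard in the same way.
    def reader():
        for line in shard:
            yield line  # real readers yield (image, boxes, labels, ...) samples
    return reader

images = ['img_%04d.jpg' % i for i in range(100)]  # hypothetical file list
num_workers = 8
n = int(math.ceil(len(images) / float(num_workers)))  # shard size
readers = [make_shard_reader(images[i:i + n])
           for i in range(0, len(images), n)]
# Second argument is use_pipe; the patch passes False to use a queue instead.
train_reader = paddle.reader.multiprocess_reader(readers, False)
```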
-import image_util -from paddle.utils.image_util import * -from PIL import Image -from PIL import ImageDraw -import numpy as np import xml.etree.ElementTree import os import time import copy import six -from data_util import GeneratorEnqueuer +import math +import numpy as np +from PIL import Image +from PIL import ImageDraw +import image_util +import paddle class Settings(object): @@ -162,26 +162,14 @@ def preprocess(img, bbox_labels, mode, settings): return img, sampled_labels -def coco(settings, file_list, mode, batch_size, shuffle): - # cocoapi +def coco(settings, coco_api, file_list, mode, batch_size, shuffle, data_dir): from pycocotools.coco import COCO - from pycocotools.cocoeval import COCOeval - - coco = COCO(file_list) - image_ids = coco.getImgIds() - images = coco.loadImgs(image_ids) - print("{} on {} with {} images".format(mode, settings.dataset, len(images))) def reader(): if mode == 'train' and shuffle: - np.random.shuffle(images) + np.random.shuffle(file_list) batch_out = [] - if '2014' in file_list: - sub_dir = "train2014" if model == "train" else "val2014" - elif '2017' in file_list: - sub_dir = "train2017" if mode == "train" else "val2017" - data_dir = os.path.join(settings.data_dir, sub_dir) - for image in images: + for image in file_list: image_name = image['file_name'] image_path = os.path.join(data_dir, image_name) if not os.path.exists(image_path): @@ -195,8 +183,8 @@ def coco(settings, file_list, mode, batch_size, shuffle): # layout: category_id | xmin | ymin | xmax | ymax | iscrowd bbox_labels = [] - annIds = coco.getAnnIds(imgIds=image['id']) - anns = coco.loadAnns(annIds) + annIds = coco_api.getAnnIds(imgIds=image['id']) + anns = coco_api.loadAnns(annIds) for ann in anns: bbox_sample = [] # start from 1, leave 0 to background @@ -236,16 +224,12 @@ def coco(settings, file_list, mode, batch_size, shuffle): def pascalvoc(settings, file_list, mode, batch_size, shuffle): - flist = open(file_list) - images = [line.strip() for line in flist] - print("{} on {} with {} images".format(mode, settings.dataset, len(images))) - def reader(): if mode == 'train' and shuffle: - np.random.shuffle(images) + np.random.shuffle(file_list) batch_out = [] cnt = 0 - for image in images: + for image in file_list: image_path, label_path = image.split() image_path = os.path.join(settings.data_dir, image_path) label_path = os.path.join(settings.data_dir, label_path) @@ -299,52 +283,55 @@ def train(settings, file_list, batch_size, shuffle=True, - use_multiprocessing=True, num_workers=8, - max_queue=24, enable_ce=False): - file_list = os.path.join(settings.data_dir, file_list) + file_path = os.path.join(settings.data_dir, file_list) + readers = [] if 'coco' in settings.dataset: - generator = coco(settings, file_list, "train", batch_size, shuffle) - else: - generator = pascalvoc(settings, file_list, "train", batch_size, shuffle) + # cocoapi + from pycocotools.coco import COCO + coco_api = COCO(file_path) + image_ids = coco_api.getImgIds() + images = coco_api.loadImgs(image_ids) + n = int(math.ceil(len(images) // num_workers)) + image_lists = [images[i:i + n] for i in range(0, len(images), n)] - def infinite_reader(): - while True: - for data in generator(): - yield data - - def reader(): - try: - enqueuer = GeneratorEnqueuer( - infinite_reader(), use_multiprocessing=use_multiprocessing) - enqueuer.start(max_queue_size=max_queue, workers=num_workers) - generator_output = None - while True: - while enqueuer.is_running(): - if not enqueuer.queue.empty(): - generator_output = enqueuer.queue.get() - 
break - else: - time.sleep(0.02) - yield generator_output - generator_output = None - finally: - if enqueuer is not None: - enqueuer.stop() - - if enable_ce: - return infinite_reader + if '2014' in file_list: + sub_dir = "train2014" + elif '2017' in file_list: + sub_dir = "train2017" + data_dir = os.path.join(settings.data_dir, sub_dir) + for l in image_lists: + readers.append( + coco(settings, coco_api, l, 'train', batch_size, shuffle, + data_dir)) else: - return reader + images = [line.strip() for line in open(file_path)] + n = int(math.ceil(len(images) // num_workers)) + image_lists = [images[i:i + n] for i in range(0, len(images), n)] + for l in image_lists: + readers.append(pascalvoc(settings, l, 'train', batch_size, shuffle)) + + return paddle.reader.multiprocess_reader(readers, False) def test(settings, file_list, batch_size): file_list = os.path.join(settings.data_dir, file_list) if 'coco' in settings.dataset: - return coco(settings, file_list, 'test', batch_size, False) + from pycocotools.coco import COCO + coco_api = COCO(file_list) + image_ids = coco_api.getImgIds() + images = coco_api.loadImgs(image_ids) + if '2014' in file_list: + sub_dir = "val2014" + elif '2017' in file_list: + sub_dir = "val2017" + data_dir = os.path.join(settings.data_dir, sub_dir) + return coco(settings, coco_api, images, 'test', batch_size, False, + data_dir) else: - return pascalvoc(settings, file_list, 'test', batch_size, False) + image_list = [line.strip() for line in open(file_list)] + return pascalvoc(settings, image_list, 'test', batch_size, False) def infer(settings, image_path): diff --git a/fluid/PaddleCV/object_detection/train.py b/fluid/PaddleCV/object_detection/train.py index 2d830bcdf1d7900ca2f27055a9ec7568f75b6211..6fb7ce0236dc63f39de597788bc425f8dfa5ae6d 100644 --- a/fluid/PaddleCV/object_detection/train.py +++ b/fluid/PaddleCV/object_detection/train.py @@ -105,7 +105,7 @@ def build_program(main_prog, startup_prog, train_params, is_train): with fluid.unique_name.guard("inference"): nmsed_out = fluid.layers.detection_output( locs, confs, box, box_var, nms_threshold=0.45) - map_eval = fluid.evaluator.DetectionMAP( + map_eval = fluid.metrics.DetectionMAP( nmsed_out, gt_label, gt_box, @@ -141,7 +141,6 @@ def train(args, batch_size = train_params['batch_size'] epoc_num = train_params['epoc_num'] batch_size_per_device = batch_size // devices_num - iters_per_epoc = train_params["train_images"] // batch_size num_workers = 8 startup_prog = fluid.Program() @@ -186,9 +185,7 @@ def train(args, train_file_list, batch_size_per_device, shuffle=is_shuffle, - use_multiprocessing=True, num_workers=num_workers, - max_queue=24, enable_ce=enable_ce) test_reader = reader.test(data_args, val_file_list, batch_size) train_py_reader.decorate_paddle_reader(train_reader) @@ -205,7 +202,7 @@ def train(args, def test(epoc_id, best_map): _, accum_map = map_eval.get_map_var() map_eval.reset(exe) - every_epoc_map=[] + every_epoc_map=[] # for CE test_py_reader.start() try: batch_id = 0 @@ -218,22 +215,23 @@ def train(args, except fluid.core.EOFException: test_py_reader.reset() mean_map = np.mean(every_epoc_map) - print("Epoc {0}, test map {1}".format(epoc_id, test_map)) + print("Epoc {0}, test map {1}".format(epoc_id, test_map[0])) if test_map[0] > best_map: best_map = test_map[0] save_model('best_model', test_prog) return best_map, mean_map - train_py_reader.start() total_time = 0.0 - try: - for epoc_id in range(epoc_num): - epoch_idx = epoc_id + 1 - start_time = time.time() - prev_start_time = start_time - 
every_epoc_loss = [] - for batch_id in range(iters_per_epoc): + for epoc_id in range(epoc_num): + epoch_idx = epoc_id + 1 + start_time = time.time() + prev_start_time = start_time + every_epoc_loss = [] + batch_id = 0 + train_py_reader.start() + while True: + try: prev_start_time = start_time start_time = time.time() if parallel: @@ -242,34 +240,35 @@ def train(args, loss_v, = exe.run(train_prog, fetch_list=[loss]) loss_v = np.mean(np.array(loss_v)) every_epoc_loss.append(loss_v) - if batch_id % 20 == 0: + if batch_id % 10 == 0: print("Epoc {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format( epoc_id, batch_id, loss_v, start_time - prev_start_time)) - end_time = time.time() - total_time += end_time - start_time - - best_map, mean_map = test(epoc_id, best_map) - print("Best test map {0}".format(best_map)) - if epoc_id % 10 == 0 or epoc_id == epoc_num - 1: - save_model(str(epoc_id), train_prog) - - if enable_ce and epoc_id == epoc_num - 1: - train_avg_loss = np.mean(every_epoc_loss) - if devices_num == 1: - print("kpis train_cost %s" % train_avg_loss) - print("kpis test_acc %s" % mean_map) - print("kpis train_speed %s" % (total_time / epoch_idx)) - else: - print("kpis train_cost_card%s %s" % - (devices_num, train_avg_loss)) - print("kpis test_acc_card%s %s" % - (devices_num, mean_map)) - print("kpis train_speed_card%s %f" % - (devices_num, total_time / epoch_idx)) - - except (fluid.core.EOFException, StopIteration): - train_reader().close() - train_py_reader.reset() + batch_id += 1 + except (fluid.core.EOFException, StopIteration): + train_reader().close() + train_py_reader.reset() + break + + end_time = time.time() + total_time += end_time - start_time + best_map, mean_map = test(epoc_id, best_map) + print("Best test map {0}".format(best_map)) + if epoc_id % 10 == 0 or epoc_id == epoc_num - 1: + save_model(str(epoc_id), train_prog) + + if enable_ce: + train_avg_loss = np.mean(every_epoc_loss) + if devices_num == 1: + print("kpis train_cost %s" % train_avg_loss) + print("kpis test_acc %s" % mean_map) + print("kpis train_speed %s" % (total_time / epoch_idx)) + else: + print("kpis train_cost_card%s %s" % + (devices_num, train_avg_loss)) + print("kpis test_acc_card%s %s" % + (devices_num, mean_map)) + print("kpis train_speed_card%s %f" % + (devices_num, total_time / epoch_idx)) if __name__ == '__main__': diff --git a/fluid/PaddleCV/ocr_recognition/README.md b/fluid/PaddleCV/ocr_recognition/README.md index 8b2d95694631e46d541d46c3f4950fd9a99ce0e3..1c9553993e84d10376441407704088ec4dd66c0c 100644 --- a/fluid/PaddleCV/ocr_recognition/README.md +++ b/fluid/PaddleCV/ocr_recognition/README.md @@ -202,5 +202,5 @@ env CUDA_VISIBLE_DEVICE=0 python infer.py \ |模型| 错误率| |- |:-: | -|[ocr_ctc_params](https://drive.google.com/open?id=1gsg2ODO2_F2pswXwW5MXpf8RY8-BMRyZ) | 22.3% | -|[ocr_attention_params](https://drive.google.com/open?id=1Bx7-94mngyTaMA5kVjzYHDPAdXxOYbRm) | 15.8%| +|[ocr_ctc_params](https://paddle-ocr-models.bj.bcebos.com/ocr_ctc.zip) | 22.3% | +|[ocr_attention_params](https://paddle-ocr-models.bj.bcebos.com/ocr_attention.zip) | 15.8%| diff --git a/fluid/PaddleCV/faster_rcnn/.gitignore b/fluid/PaddleCV/rcnn/.gitignore similarity index 100% rename from fluid/PaddleCV/faster_rcnn/.gitignore rename to fluid/PaddleCV/rcnn/.gitignore diff --git a/fluid/PaddleCV/faster_rcnn/.run_ce.sh b/fluid/PaddleCV/rcnn/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/faster_rcnn/.run_ce.sh rename to fluid/PaddleCV/rcnn/.run_ce.sh diff --git a/fluid/PaddleCV/faster_rcnn/README.md 
b/fluid/PaddleCV/rcnn/README.md
similarity index 77%
rename from fluid/PaddleCV/faster_rcnn/README.md
rename to fluid/PaddleCV/rcnn/README.md
index 0a5f68c34adda54ba0e27f44f16c18cafe057830..824709be70af4e7628f8c92e0c7ee7a5b0edf0d8 100644
--- a/fluid/PaddleCV/faster_rcnn/README.md
+++ b/fluid/PaddleCV/rcnn/README.md
@@ -1,4 +1,4 @@
-# Faster RCNN Objective Detection
+# RCNN Object Detection
---

## Table of Contents
@@ -9,7 +9,6 @@
- [Training](#training)
- [Evaluation](#evaluation)
- [Inference and Visualization](#inference-and-visualization)
-- [Appendix](#appendix)

## Installation
@@ -17,17 +16,20 @@ Running sample code in this directory requires PaddelPaddle Fluid v.1.0.0 and la

## Introduction

-[Faster Rcnn](https://arxiv.org/abs/1506.01497) is a typical two stage detector. The total framework of network can be divided into four parts, as shown below:

-[figure: "Faster RCNN model" diagram]
+Region Convolutional Neural Network (RCNN) models are two-stage detectors: they first generate region proposals, then extract features from those proposals to obtain classes and more precise boxes.
+The RCNN family here currently contains two representative models: Faster RCNN and Mask RCNN.
+
+[Faster RCNN](https://arxiv.org/abs/1506.01497): the overall network can be divided into four parts:
1. Base conv layer. As a CNN object detector, Faster RCNN extracts feature maps using a basic convolutional network. The feature maps can then be shared by the RPN and fc layers. This sample uses [ResNet-50](https://arxiv.org/abs/1512.03385) as the base conv layer.
2. Region Proposal Network (RPN). The RPN generates proposals for detection. This block generates anchors from a set of sizes and ratios, classifies anchors into foreground and background by softmax, and then refines the anchors with box regression to obtain more precise proposals.
3. RoI Align. This layer takes feature maps and proposals as input. The proposals are mapped onto the feature maps and pooled to the same size, and the outputs are sent to fc layers for classification and regression. Either RoIPool or RoIAlign can be used for this layer, set via roi\_func in config.py.
4. Detection layer. The output of RoI pooling is used to compute the class and location of each proposal in two fc layers.
+
+[Mask RCNN](https://arxiv.org/abs/1703.06870) is a classical instance segmentation model and an extension of Faster RCNN.
+
+Mask RCNN is a two-stage model as well. The first stage generates proposals from the input image; the second stage predicts the class and bounding box as in Faster RCNN, plus a mask produced by a segmentation branch added to the original Faster RCNN model, which decouples mask prediction from classification.
+
## Data preparation

Train the model on [MS-COCO dataset](http://cocodataset.org/#download), download dataset as below:
@@ -62,12 +64,24 @@ To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed.

After data preparation, one can start the training step by:

+- Faster RCNN

    python train.py \
       --model_save_dir=output/ \
-       --pretrained_model=${path_to_pretrain_model}
-       --data_dir=${path_to_data}
+       --pretrained_model=${path_to_pretrain_model} \
+       --data_dir=${path_to_data} \
+       --MASK_ON=False
+
+- Mask RCNN
+
+    python train.py \
+       --model_save_dir=output/ \
+       --pretrained_model=${path_to_pretrain_model} \
+       --data_dir=${path_to_data} \
+       --MASK_ON=True

- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs for training.
+- Set ```MASK_ON``` to choose between the Faster RCNN and Mask RCNN models.
- For more help on arguments:

    python train.py --help
@@ -93,7 +107,6 @@ After data preparation, one can start the training step by:
* In the first 500 iterations, the learning rate increases linearly from 0.00333 to 0.01; lr is then decayed at iterations 120000 and 160000 with multipliers 0.1 and 0.01, and the maximum iteration is 180000. We also released a 2x model trained for 360000 iterations, with lr decayed at 240000 and 320000. These configurations can be set via max_iter and lr_steps in config.py.
* Set the learning rate of bias to two times the global lr in non-basic convolutional layers.
* In basic convolutional layers, parameters of affine layers and the res body are not updated.
-* Use Nvidia Tesla V100 8GPU, total time for training is about 40 hours.

## Evaluation
@@ -101,14 +114,27 @@ Evaluation is to evaluate the performance of a trained model.
This sample provides `eval_coco_map.py` as the main executor for evaluation; one can start the evaluation step by:

+- Faster RCNN

    python eval_coco_map.py \
        --dataset=coco2017 \
        --pretrained_model=${path_to_pretrain_model} \
+        --MASK_ON=False
+
+- Mask RCNN
+
+    python eval_coco_map.py \
+        --dataset=coco2017 \
+        --pretrained_model=${path_to_pretrain_model} \
+        --MASK_ON=True

- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for evaluation.
+- Set ```MASK_ON``` to choose between the Faster RCNN and Mask RCNN models.

Evaluation results are shown below:

+Faster RCNN:
+
| Model | RoI function | Batch size | Max iteration | mAP |
| :--------------- | :--------: | :------------: | :------------------: | ------: |
| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 |
@@ -121,6 +147,14 @@ Evaluation results are shown below:
* Fluid RoIAlign no padding: Images without padding.
* Fluid RoIAlign no padding 2x: Images without padding, trained for 360000 iterations; the learning rate is decayed at 240000 and 320000.
+Mask RCNN:
+
+| Model | Batch size | Max iteration | box mAP | mask mAP |
+| :--------------- | :--------: | :------------: | :--------: | ------: |
+| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 |
+
+* Fluid mask no padding: Uses RoIAlign; images without padding.
+
## Inference and Visualization

Inference is used to get prediction scores or image features based on trained models. `infer.py` is the main executor for inference; one can start the infer step by:
@@ -135,8 +169,12 @@ Inference is used to get prediction score or image features based on trained mod
Visualization of infer result is shown as below:

[figure: Faster RCNN detection example images]
Faster RCNN Visualization Examples

+[figure: Mask RCNN detection and mask example images]

diff --git a/fluid/PaddleCV/faster_rcnn/README_cn.md b/fluid/PaddleCV/rcnn/README_cn.md similarity index 76% rename from fluid/PaddleCV/faster_rcnn/README_cn.md rename to fluid/PaddleCV/rcnn/README_cn.md index 29adfcfd274b82f2ddaba1894be6ad1c7ece1e6a..57e622e60f049d2976038bddb4cb39bd8fbf9756 100644 --- a/fluid/PaddleCV/faster_rcnn/README_cn.md +++ b/fluid/PaddleCV/rcnn/README_cn.md @@ -1,4 +1,4 @@ -# Faster RCNN 目标检测 +# RCNN 系列目标检测 --- ## 内容 @@ -9,25 +9,27 @@ - [模型训练](#模型训练) - [模型评估](#模型评估) - [模型推断及可视化](#模型推断及可视化) -- [附录](#附录) ## 安装 在当前目录下运行样例代码需要PadddlePaddle Fluid的v.1.0.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/0.15.0/beginners_guide/install/install_doc.html#paddlepaddle)中的说明来更新PaddlePaddle。 ## 简介 +区域卷积神经网络(RCNN)系列模型为两阶段目标检测器。通过对图像生成候选区域,提取特征,判别特征类别并修正候选框位置。 +RCNN系列目前包含两个代表模型:Faster RCNN,Mask RCNN -[Faster Rcnn](https://arxiv.org/abs/1506.01497) 是典型的两阶段目标检测器。如下图所示,整体网络可以分为4个主要内容: -

-[图示:Faster RCNN 目标检测模型]
+[Faster RCNN](https://arxiv.org/abs/1506.01497) 整体网络可以分为4个主要内容: 1. 基础卷积层。作为一种卷积神经网络目标检测方法,Faster RCNN首先使用一组基础的卷积网络提取图像的特征图。特征图被后续RPN层和全连接层共享。本示例采用[ResNet-50](https://arxiv.org/abs/1512.03385)作为基础卷积层。 2. 区域生成网络(RPN)。RPN网络用于生成候选区域(proposals)。该层通过一组固定的尺寸和比例得到一组锚点(anchors), 通过softmax判断锚点属于前景或者背景,再利用区域回归修正锚点从而获得精确的候选区域。 3. RoI Align。该层收集输入的特征图和候选区域,将候选区域映射到特征图中并池化为统一大小的区域特征图,送入全连接层判定目标类别, 该层可选用RoIPool和RoIAlign两种方式,在config.py中设置roi\_func。 4. 检测层。利用区域特征图计算候选区域的类别,同时再次通过区域回归获得检测框最终的精确位置。 +[Mask RCNN](https://arxiv.org/abs/1703.06870) 扩展自Faster RCNN,是经典的实例分割模型。 + +Mask RCNN同样为两阶段框架,第一阶段扫描图像生成候选框;第二阶段根据候选框得到分类结果,边界框,同时在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。 + + ## 数据准备 在[MS-COCO数据集](http://cocodataset.org/#download)上进行训练,通过如下方式下载数据集。 @@ -61,12 +63,24 @@ Faster RCNN 目标检测模型 数据准备完毕后,可以通过如下的方式启动训练: +- Faster RCNN + python train.py \ --model_save_dir=output/ \ - --pretrained_model=${path_to_pretrain_model} - --data_dir=${path_to_data} + --pretrained_model=${path_to_pretrain_model} \ + --data_dir=${path_to_data} \ + --MASK_ON=False + +- Mask RCNN + + python train.py \ + --model_save_dir=output/ \ + --pretrained_model=${path_to_pretrain_model} \ + --data_dir=${path_to_data} \ + --MASK_ON=True - 通过设置export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7指定8卡GPU训练。 +- 通过设置```MASK_ON```选择Faster RCNN和Mask RCNN模型。 - 可选参数见: python train.py --help @@ -83,11 +97,10 @@ Faster RCNN 目标检测模型 **训练策略:** -* 采用momentum优化算法训练Faster RCNN,momentum=0.9。 +* 采用momentum优化算法训练,momentum=0.9。 * 权重衰减系数为0.0001,前500轮学习率从0.00333线性增加至0.01。在120000,160000轮时使用0.1,0.01乘子进行学习率衰减,最大训练180000轮。同时我们也提供了2x模型,该模型采用更多的迭代轮数进行训练,训练360000轮,学习率在240000,320000轮衰减,其他参数不变,训练最大轮数和学习率策略可以在config.py中对max_iter和lr_steps进行设置。 * 非基础卷积层卷积bias学习率为整体学习率2倍。 * 基础卷积层中,affine_layers参数不更新,res2层参数不更新。 -* 使用Nvidia Tesla V100 8卡并行,总共训练时长大约40小时。 ## 模型评估 @@ -95,14 +108,27 @@ Faster RCNN 目标检测模型 `eval_coco_map.py`是评估模块的主要执行程序,调用示例如下: +- Faster RCNN + python eval_coco_map.py \ --dataset=coco2017 \ --pretrained_model=${path_to_pretrain_model} \ + --MASK_ON=False + +- Mask RCNN + + python eval_coco_map.py \ + --dataset=coco2017 \ + --pretrained_model=${path_to_pretrain_model} \ + --MASK_ON=True - 通过设置export CUDA\_VISIBLE\_DEVICES=0指定单卡GPU评估。 +- 通过设置```MASK_ON```选择Faster RCNN和Mask RCNN模型。 下表为模型评估结果: +Faster RCNN + | 模型 | RoI处理方式 | 批量大小 | 迭代次数 | mAP | | :--------------- | :--------: | :------------: | :------------------: |------: | | [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 | @@ -117,6 +143,14 @@ Faster RCNN 目标检测模型 * Fluid RoIAlign no padding: 使用RoIAlign,不对图像做填充处理。 * Fluid RoIAlign no padding 2x: 使用RoIAlign,不对图像做填充处理。训练360000轮,学习率在240000,320000轮衰减。 +Mask RCNN: + +| 模型 | 批量大小 | 迭代次数 | box mAP | mask mAP | +| :--------------- | :--------: | :------------: | :--------: |------: | +| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 | + +* Fluid mask no padding: 使用RoIAlign,不对图像做填充处理 + ## 模型推断及可视化 模型推断可以获取图像中的物体及其对应的类别,`infer.py`是主要执行程序,调用示例如下: @@ -131,8 +165,12 @@ Faster RCNN 目标检测模型 下图为模型可视化预测结果:

[图示:Faster RCNN 检测结果示例]
Faster RCNN 预测可视化

+[图示:Mask RCNN 检测与掩码结果示例]

diff --git a/fluid/PaddleCV/faster_rcnn/__init__.py b/fluid/PaddleCV/rcnn/__init__.py similarity index 100% rename from fluid/PaddleCV/faster_rcnn/__init__.py rename to fluid/PaddleCV/rcnn/__init__.py diff --git a/fluid/PaddleCV/faster_rcnn/_ce.py b/fluid/PaddleCV/rcnn/_ce.py similarity index 78% rename from fluid/PaddleCV/faster_rcnn/_ce.py rename to fluid/PaddleCV/rcnn/_ce.py index 9d5850fd22c3d023eb866fa474b6f6f586ca326e..e331d1bb7cccce5ac914dfa3417fe9090bd9cf99 100644 --- a/fluid/PaddleCV/faster_rcnn/_ce.py +++ b/fluid/PaddleCV/rcnn/_ce.py @@ -6,18 +6,19 @@ sys.path.append(os.environ['ceroot']) from kpi import CostKpi from kpi import DurationKpi - -each_pass_duration_card1_kpi = DurationKpi('each_pass_duration_card1', 0.08, 0, actived=True) +each_pass_duration_card1_kpi = DurationKpi( + 'each_pass_duration_card1', 0.08, 0, actived=True) train_loss_card1_kpi = CostKpi('train_loss_card1', 0.08, 0) -each_pass_duration_card4_kpi = DurationKpi('each_pass_duration_card4', 0.08, 0, actived=True) +each_pass_duration_card4_kpi = DurationKpi( + 'each_pass_duration_card4', 0.08, 0, actived=True) train_loss_card4_kpi = CostKpi('train_loss_card4', 0.08, 0) tracking_kpis = [ - each_pass_duration_card1_kpi, - train_loss_card1_kpi, - each_pass_duration_card4_kpi, - train_loss_card4_kpi, - ] + each_pass_duration_card1_kpi, + train_loss_card1_kpi, + each_pass_duration_card4_kpi, + train_loss_card4_kpi, +] def parse_log(log): diff --git a/fluid/PaddleCV/faster_rcnn/box_utils.py b/fluid/PaddleCV/rcnn/box_utils.py similarity index 88% rename from fluid/PaddleCV/faster_rcnn/box_utils.py rename to fluid/PaddleCV/rcnn/box_utils.py index 64d7d96948b856f4ae5c28594e9fb19a3a18480e..bb3fe9c8f0cb261004578abba651ad7210518a22 100644 --- a/fluid/PaddleCV/faster_rcnn/box_utils.py +++ b/fluid/PaddleCV/rcnn/box_utils.py @@ -69,6 +69,7 @@ def clip_xyxy_to_image(x1, y1, x2, y2, height, width): y2 = np.minimum(height - 1., np.maximum(0., y2)) return x1, y1, x2, y2 + def nms(dets, thresh): """Apply classic DPM-style greedy NMS.""" if dets.shape[0] == 0: @@ -123,3 +124,21 @@ def nms(dets, thresh): return np.where(suppressed == 0)[0] + +def expand_boxes(boxes, scale): + """Expand an array of boxes by a given scale.""" + w_half = (boxes[:, 2] - boxes[:, 0]) * .5 + h_half = (boxes[:, 3] - boxes[:, 1]) * .5 + x_c = (boxes[:, 2] + boxes[:, 0]) * .5 + y_c = (boxes[:, 3] + boxes[:, 1]) * .5 + + w_half *= scale + h_half *= scale + + boxes_exp = np.zeros(boxes.shape) + boxes_exp[:, 0] = x_c - w_half + boxes_exp[:, 2] = x_c + w_half + boxes_exp[:, 1] = y_c - h_half + boxes_exp[:, 3] = y_c + h_half + + return boxes_exp diff --git a/fluid/PaddleCV/rcnn/colormap.py b/fluid/PaddleCV/rcnn/colormap.py new file mode 100644 index 0000000000000000000000000000000000000000..8c2447794fc2e9841b30c2cdf11e8fc70d20d764 --- /dev/null +++ b/fluid/PaddleCV/rcnn/colormap.py @@ -0,0 +1,61 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
+# +# Based on: +# -------------------------------------------------------- +# Detectron +# Copyright (c) 2017-present, Facebook, Inc. +# Licensed under the Apache License, Version 2.0; +# Written by Ross Girshick +# -------------------------------------------------------- + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import numpy as np + + +def colormap(rgb=False): + color_list = np.array([ + 0.000, 0.447, 0.741, 0.850, 0.325, 0.098, 0.929, 0.694, 0.125, 0.494, + 0.184, 0.556, 0.466, 0.674, 0.188, 0.301, 0.745, 0.933, 0.635, 0.078, + 0.184, 0.300, 0.300, 0.300, 0.600, 0.600, 0.600, 1.000, 0.000, 0.000, + 1.000, 0.500, 0.000, 0.749, 0.749, 0.000, 0.000, 1.000, 0.000, 0.000, + 0.000, 1.000, 0.667, 0.000, 1.000, 0.333, 0.333, 0.000, 0.333, 0.667, + 0.000, 0.333, 1.000, 0.000, 0.667, 0.333, 0.000, 0.667, 0.667, 0.000, + 0.667, 1.000, 0.000, 1.000, 0.333, 0.000, 1.000, 0.667, 0.000, 1.000, + 1.000, 0.000, 0.000, 0.333, 0.500, 0.000, 0.667, 0.500, 0.000, 1.000, + 0.500, 0.333, 0.000, 0.500, 0.333, 0.333, 0.500, 0.333, 0.667, 0.500, + 0.333, 1.000, 0.500, 0.667, 0.000, 0.500, 0.667, 0.333, 0.500, 0.667, + 0.667, 0.500, 0.667, 1.000, 0.500, 1.000, 0.000, 0.500, 1.000, 0.333, + 0.500, 1.000, 0.667, 0.500, 1.000, 1.000, 0.500, 0.000, 0.333, 1.000, + 0.000, 0.667, 1.000, 0.000, 1.000, 1.000, 0.333, 0.000, 1.000, 0.333, + 0.333, 1.000, 0.333, 0.667, 1.000, 0.333, 1.000, 1.000, 0.667, 0.000, + 1.000, 0.667, 0.333, 1.000, 0.667, 0.667, 1.000, 0.667, 1.000, 1.000, + 1.000, 0.000, 1.000, 1.000, 0.333, 1.000, 1.000, 0.667, 1.000, 0.167, + 0.000, 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, + 0.000, 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, + 0.000, 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, + 0.833, 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.167, 0.000, 0.000, + 0.333, 0.000, 0.000, 0.500, 0.000, 0.000, 0.667, 0.000, 0.000, 0.833, + 0.000, 0.000, 1.000, 0.000, 0.000, 0.000, 0.143, 0.143, 0.143, 0.286, + 0.286, 0.286, 0.429, 0.429, 0.429, 0.571, 0.571, 0.571, 0.714, 0.714, + 0.714, 0.857, 0.857, 0.857, 1.000, 1.000, 1.000 + ]).astype(np.float32) + color_list = color_list.reshape((-1, 3)) * 255 + if not rgb: + color_list = color_list[:, ::-1] + return color_list diff --git a/fluid/PaddleCV/faster_rcnn/config.py b/fluid/PaddleCV/rcnn/config.py similarity index 93% rename from fluid/PaddleCV/faster_rcnn/config.py rename to fluid/PaddleCV/rcnn/config.py index 44b35f7509eeb1adf316e3e725aef8a729bf6499..2a8ebdf7c1871f5863facd6e2138993ed4d7ffd1 100644 --- a/fluid/PaddleCV/faster_rcnn/config.py +++ b/fluid/PaddleCV/rcnn/config.py @@ -90,6 +90,9 @@ _C.TRAIN.freeze_at = 2 # min area of ground truth box _C.TRAIN.gt_min_area = -1 +# Use horizontally-flipped images during training? +_C.TRAIN.use_flipped = True + # # Inference options # @@ -120,7 +123,7 @@ _C.TEST.rpn_post_nms_top_n = 1000 _C.TEST.rpn_min_size = 0.0 # max number of detections -_C.TEST.detectiions_per_im = 100 +_C.TEST.detections_per_im = 100 # NMS threshold used on RPN proposals _C.TEST.rpn_nms_thresh = 0.7 @@ -129,6 +132,9 @@ _C.TEST.rpn_nms_thresh = 0.7 # Model options # +# Whether use mask rcnn head +_C.MASK_ON = True + # weight for bbox regression targets _C.bbox_reg_weights = [0.1, 0.1, 0.2, 0.2] @@ -156,6 +162,15 @@ _C.roi_resolution = 14 # spatial scale _C.spatial_scale = 1. / 16. 
+# resolution to represent mask labels +_C.resolution = 14 + +# Number of channels in the mask head +_C.dim_reduced = 256 + +# Threshold for converting soft masks to hard masks +_C.mrcnn_thresh_binarize = 0.5 + # # SOLVER options # @@ -204,12 +219,6 @@ _C.pixel_means = [102.9801, 115.9465, 122.7717] # clip box to prevent overflowing _C.bbox_clip = np.log(1000. / 16.) -# dataset path -_C.train_file_list = 'annotations/instances_train2017.json' -_C.train_data_dir = 'train2017' -_C.val_file_list = 'annotations/instances_val2017.json' -_C.val_data_dir = 'val2017' - def merge_cfg_from_args(args, mode): """Merge config keys, values in args into the global config.""" diff --git a/fluid/PaddleCV/faster_rcnn/data_utils.py b/fluid/PaddleCV/rcnn/data_utils.py similarity index 100% rename from fluid/PaddleCV/faster_rcnn/data_utils.py rename to fluid/PaddleCV/rcnn/data_utils.py diff --git a/fluid/PaddleCV/faster_rcnn/dataset/coco/download.sh b/fluid/PaddleCV/rcnn/dataset/coco/download.sh similarity index 100% rename from fluid/PaddleCV/faster_rcnn/dataset/coco/download.sh rename to fluid/PaddleCV/rcnn/dataset/coco/download.sh diff --git a/fluid/PaddleCV/faster_rcnn/edict.py b/fluid/PaddleCV/rcnn/edict.py similarity index 100% rename from fluid/PaddleCV/faster_rcnn/edict.py rename to fluid/PaddleCV/rcnn/edict.py diff --git a/fluid/PaddleCV/faster_rcnn/eval_coco_map.py b/fluid/PaddleCV/rcnn/eval_coco_map.py similarity index 57% rename from fluid/PaddleCV/faster_rcnn/eval_coco_map.py rename to fluid/PaddleCV/rcnn/eval_coco_map.py index f8c755a3d0f880a47791f1c43aa161cfa0e5ff98..a8b2556bec4e2a551381b85e797210339b8caa8b 100644 --- a/fluid/PaddleCV/faster_rcnn/eval_coco_map.py +++ b/fluid/PaddleCV/rcnn/eval_coco_map.py @@ -18,8 +18,7 @@ from __future__ import print_function import os import time import numpy as np -from eval_helper import get_nmsed_box -from eval_helper import get_dt_res +from eval_helper import * import paddle import paddle.fluid as fluid import reader @@ -30,21 +29,21 @@ import json from pycocotools.coco import COCO from pycocotools.cocoeval import COCOeval, Params from config import cfg +from roidbs import DatasetPath def eval(): - if '2014' in cfg.dataset: - test_list = 'annotations/instances_val2014.json' - elif '2017' in cfg.dataset: - test_list = 'annotations/instances_val2017.json' + + data_path = DatasetPath('val') + test_list = data_path.get_file_list() image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size] class_nums = cfg.class_num devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" devices_num = len(devices.split(",")) total_batch_size = devices_num * cfg.TRAIN.im_per_batch - cocoGt = COCO(os.path.join(cfg.data_dir, test_list)) - numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())} + cocoGt = COCO(test_list) + num_id_to_cat_id_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())} category_ids = cocoGt.getCatIds() label_list = { item['id']: item['name'] @@ -52,51 +51,82 @@ def eval(): } label_list[0] = ['background'] - model = model_builder.FasterRCNN( + model = model_builder.RCNN( add_conv_body_func=resnet.add_ResNet50_conv4_body, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, use_pyreader=False, is_train=False) model.build_model(image_shape) - rpn_rois, confs, locs = model.eval_out() + pred_boxes = model.eval_bbox_out() + if cfg.MASK_ON: + masks = model.eval_mask_out() place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace() exe = fluid.Executor(place) + exe.run(fluid.default_startup_program()) # yapf: disable if cfg.pretrained_model: def 
if_exist(var): return os.path.exists(os.path.join(cfg.pretrained_model, var.name)) fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist) + # yapf: enable test_reader = reader.test(total_batch_size) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) dts_res = [] - fetch_list = [rpn_rois, confs, locs] + segms_res = [] + if cfg.MASK_ON: + fetch_list = [pred_boxes, masks] + else: + fetch_list = [pred_boxes] + eval_start = time.time() for batch_id, batch_data in enumerate(test_reader()): start = time.time() im_info = [] for data in batch_data: im_info.append(data[1]) - rpn_rois_v, confs_v, locs_v = exe.run( - fetch_list=[v.name for v in fetch_list], - feed=feeder.feed(batch_data), - return_numpy=False) - new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v, - class_nums, im_info, - numId_to_catId_map) + result = exe.run(fetch_list=[v.name for v in fetch_list], + feed=feeder.feed(batch_data), + return_numpy=False) + + pred_boxes_v = result[0] + if cfg.MASK_ON: + masks_v = result[1] - dts_res += get_dt_res(total_batch_size, new_lod, nmsed_out, batch_data) + new_lod = pred_boxes_v.lod() + nmsed_out = pred_boxes_v + + dts_res += get_dt_res(total_batch_size, new_lod[0], nmsed_out, + batch_data, num_id_to_cat_id_map) + + if cfg.MASK_ON and np.array(masks_v).shape != (1, 1): + segms_out = segm_results(nmsed_out, masks_v, im_info) + segms_res += get_segms_res(total_batch_size, new_lod[0], segms_out, + batch_data, num_id_to_cat_id_map) end = time.time() print('batch id: {}, time: {}'.format(batch_id, end - start)) - with open("detection_result.json", 'w') as outfile: + eval_end = time.time() + total_time = eval_end - eval_start + print('average time of eval is: {}'.format(total_time / (batch_id + 1))) + with open("detection_bbox_result.json", 'w') as outfile: json.dump(dts_res, outfile) - print("start evaluate using coco api") - cocoDt = cocoGt.loadRes("detection_result.json") + print("start evaluate bbox using coco api") + cocoDt = cocoGt.loadRes("detection_bbox_result.json") cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') cocoEval.evaluate() cocoEval.accumulate() cocoEval.summarize() + if cfg.MASK_ON: + with open("detection_segms_result.json", 'w') as outfile: + json.dump(segms_res, outfile) + print("start evaluate mask using coco api") + cocoDt = cocoGt.loadRes("detection_segms_result.json") + cocoEval = COCOeval(cocoGt, cocoDt, 'segm') + cocoEval.evaluate() + cocoEval.accumulate() + cocoEval.summarize() + if __name__ == '__main__': args = parse_args() diff --git a/fluid/PaddleCV/faster_rcnn/eval_helper.py b/fluid/PaddleCV/rcnn/eval_helper.py similarity index 53% rename from fluid/PaddleCV/faster_rcnn/eval_helper.py rename to fluid/PaddleCV/rcnn/eval_helper.py index 852b52955915bf268f930ce3b0fa35de5734b1ea..92f76e68ea697ac2fb987ddb10c815a82d746020 100644 --- a/fluid/PaddleCV/faster_rcnn/eval_helper.py +++ b/fluid/PaddleCV/rcnn/eval_helper.py @@ -21,6 +21,10 @@ from PIL import Image from PIL import ImageDraw from PIL import ImageFont from config import cfg +import pycocotools.mask as mask_util +import six +from colormap import colormap +import cv2 def box_decoder(deltas, boxes, weights): @@ -80,8 +84,7 @@ def clip_tiled_boxes(boxes, im_shape): return boxes -def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info, - numId_to_catId_map): +def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info): lod = rpn_rois.lod()[0] rpn_rois_v = np.array(rpn_rois) variance_v = np.array(cfg.bbox_reg_weights) @@ -106,38 +109,41 @@ def get_nmsed_box(rpn_rois, confs, locs, 
class_nums, im_info, inds = np.where(scores_n[:, j] > cfg.TEST.score_thresh)[0] scores_j = scores_n[inds, j] rois_j = rois_n[inds, j * 4:(j + 1) * 4] - dets_j = np.hstack((rois_j, scores_j[:, np.newaxis])).astype( + dets_j = np.hstack((scores_j[:, np.newaxis], rois_j)).astype( np.float32, copy=False) keep = box_utils.nms(dets_j, cfg.TEST.nms_thresh) nms_dets = dets_j[keep, :] #add labels - cat_id = numId_to_catId_map[j] - label = np.array([cat_id for _ in range(len(keep))]) + label = np.array([j for _ in range(len(keep))]) nms_dets = np.hstack((nms_dets, label[:, np.newaxis])).astype( np.float32, copy=False) cls_boxes[j] = nms_dets # Limit to max_per_image detections **over all classes** image_scores = np.hstack( - [cls_boxes[j][:, -2] for j in range(1, class_nums)]) - if len(image_scores) > cfg.TEST.detectiions_per_im: - image_thresh = np.sort(image_scores)[-cfg.TEST.detectiions_per_im] + [cls_boxes[j][:, 1] for j in range(1, class_nums)]) + if len(image_scores) > cfg.TEST.detections_per_im: + image_thresh = np.sort(image_scores)[-cfg.TEST.detections_per_im] for j in range(1, class_nums): - keep = np.where(cls_boxes[j][:, -2] >= image_thresh)[0] + keep = np.where(cls_boxes[j][:, 1] >= image_thresh)[0] cls_boxes[j] = cls_boxes[j][keep, :] im_results_n = np.vstack([cls_boxes[j] for j in range(1, class_nums)]) im_results[i] = im_results_n new_lod.append(len(im_results_n) + new_lod[-1]) - boxes = im_results_n[:, :-2] - scores = im_results_n[:, -2] - labels = im_results_n[:, -1] + boxes = im_results_n[:, 2:] + scores = im_results_n[:, 1] + labels = im_results_n[:, 0] im_results = np.vstack([im_results[k] for k in range(len(lod) - 1)]) return new_lod, im_results -def get_dt_res(batch_size, lod, nmsed_out, data): +def get_dt_res(batch_size, lod, nmsed_out, data, num_id_to_cat_id_map): dts_res = [] nmsed_out_v = np.array(nmsed_out) + if nmsed_out_v.shape == ( + 1, + 1, ): + return dts_res assert (len(lod) == batch_size + 1), \ "Error Lod Tensor offset dimension. Lod({}) vs. 
batch_size({})"\ .format(len(lod), batch_size) @@ -150,7 +156,8 @@ def get_dt_res(batch_size, lod, nmsed_out, data): for j in range(dt_num_this_img): dt = nmsed_out_v[k] k = k + 1 - xmin, ymin, xmax, ymax, score, category_id = dt.tolist() + num_id, score, xmin, ymin, xmax, ymax = dt.tolist() + category_id = num_id_to_cat_id_map[num_id] w = xmax - xmin + 1 h = ymax - ymin + 1 bbox = [xmin, ymin, w, h] @@ -164,24 +171,131 @@ def get_dt_res(batch_size, lod, nmsed_out, data): return dts_res -def draw_bounding_box_on_image(image_path, nms_out, draw_threshold, label_list): - image = Image.open(image_path) +def get_segms_res(batch_size, lod, segms_out, data, num_id_to_cat_id_map): + segms_res = [] + segms_out_v = np.array(segms_out) + k = 0 + for i in range(batch_size): + dt_num_this_img = lod[i + 1] - lod[i] + image_id = int(data[i][-1]) + for j in range(dt_num_this_img): + dt = segms_out_v[k] + k = k + 1 + segm, num_id, score = dt.tolist() + cat_id = num_id_to_cat_id_map[num_id] + if six.PY3: + if 'counts' in segm: + segm['counts'] = segm['counts'].decode("utf8") + segm_res = { + 'image_id': image_id, + 'category_id': cat_id, + 'segmentation': segm, + 'score': score + } + segms_res.append(segm_res) + return segms_res + + +def draw_bounding_box_on_image(image_path, + nms_out, + draw_threshold, + label_list, + num_id_to_cat_id_map, + image=None): + if image is None: + image = Image.open(image_path) draw = ImageDraw.Draw(image) im_width, im_height = image.size - for dt in nms_out: - xmin, ymin, xmax, ymax, score, category_id = dt.tolist() + for dt in np.array(nms_out): + num_id, score, xmin, ymin, xmax, ymax = dt.tolist() + category_id = num_id_to_cat_id_map[num_id] if score < draw_threshold: continue - bbox = dt[:4] - xmin, ymin, xmax, ymax = bbox draw.line( [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)], - width=4, + width=2, fill='red') if image.mode == 'RGB': draw.text((xmin, ymin), label_list[int(category_id)], (255, 255, 0)) image_name = image_path.split('/')[-1] print("image with bbox drawed saved as {}".format(image_name)) image.save(image_name) + + +def draw_mask_on_image(image_path, segms_out, draw_threshold, alpha=0.7): + image = Image.open(image_path) + draw = ImageDraw.Draw(image) + im_width, im_height = image.size + mask_color_id = 0 + w_ratio = .4 + image = np.array(image).astype('float32') + for dt in np.array(segms_out): + segm, num_id, score = dt.tolist() + if score < draw_threshold: + continue + mask = mask_util.decode(segm) * 255 + color_list = colormap(rgb=True) + color_mask = color_list[mask_color_id % len(color_list), 0:3] + mask_color_id += 1 + for c in range(3): + color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255 + idx = np.nonzero(mask) + image[idx[0], idx[1], :] *= 1.0 - alpha + image[idx[0], idx[1], :] += alpha * color_mask + image = Image.fromarray(image.astype('uint8')) + return image + + +def segm_results(im_results, masks, im_info): + im_results = np.array(im_results) + class_num = cfg.class_num + M = cfg.resolution + scale = (M + 2.0) / M + lod = masks.lod()[0] + masks_v = np.array(masks) + boxes = im_results[:, 2:] + labels = im_results[:, 0] + segms_results = [[] for _ in range(len(lod) - 1)] + sum = 0 + for i in range(len(lod) - 1): + im_results_n = im_results[lod[i]:lod[i + 1]] + cls_segms = [] + masks_n = masks_v[lod[i]:lod[i + 1]] + boxes_n = boxes[lod[i]:lod[i + 1]] + labels_n = labels[lod[i]:lod[i + 1]] + im_h = int(round(im_info[i][0] / im_info[i][2])) + im_w = int(round(im_info[i][1] / im_info[i][2])) + boxes_n = 
box_utils.expand_boxes(boxes_n, scale) + boxes_n = boxes_n.astype(np.int32) + padded_mask = np.zeros((M + 2, M + 2), dtype=np.float32) + for j in range(len(im_results_n)): + class_id = int(labels_n[j]) + padded_mask[1:-1, 1:-1] = masks_n[j, class_id, :, :] + + ref_box = boxes_n[j, :] + w = ref_box[2] - ref_box[0] + 1 + h = ref_box[3] - ref_box[1] + 1 + w = np.maximum(w, 1) + h = np.maximum(h, 1) + + mask = cv2.resize(padded_mask, (w, h)) + mask = np.array(mask > cfg.mrcnn_thresh_binarize, dtype=np.uint8) + im_mask = np.zeros((im_h, im_w), dtype=np.uint8) + + x_0 = max(ref_box[0], 0) + x_1 = min(ref_box[2] + 1, im_w) + y_0 = max(ref_box[1], 0) + y_1 = min(ref_box[3] + 1, im_h) + im_mask[y_0:y_1, x_0:x_1] = mask[(y_0 - ref_box[1]):(y_1 - ref_box[ + 1]), (x_0 - ref_box[0]):(x_1 - ref_box[0])] + sum += im_mask.sum() + rle = mask_util.encode( + np.array( + im_mask[:, :, np.newaxis], order='F'))[0] + cls_segms.append(rle) + segms_results[i] = np.array(cls_segms)[:, np.newaxis] + segms_results = np.vstack([segms_results[k] for k in range(len(lod) - 1)]) + im_results = np.hstack([segms_results, im_results]) + return im_results[:, :3] diff --git a/fluid/PaddleCV/faster_rcnn/image/000000000139.jpg b/fluid/PaddleCV/rcnn/image/000000000139.jpg similarity index 100% rename from fluid/PaddleCV/faster_rcnn/image/000000000139.jpg rename to fluid/PaddleCV/rcnn/image/000000000139.jpg diff --git a/fluid/PaddleCV/rcnn/image/000000000139_mask.jpg b/fluid/PaddleCV/rcnn/image/000000000139_mask.jpg new file mode 100644 index 0000000000000000000000000000000000000000..47dfa9a435bf81c8585e8100413cfc0d6719754c Binary files /dev/null and b/fluid/PaddleCV/rcnn/image/000000000139_mask.jpg differ diff --git a/fluid/PaddleCV/faster_rcnn/image/000000127517.jpg b/fluid/PaddleCV/rcnn/image/000000127517.jpg similarity index 100% rename from fluid/PaddleCV/faster_rcnn/image/000000127517.jpg rename to fluid/PaddleCV/rcnn/image/000000127517.jpg diff --git a/fluid/PaddleCV/rcnn/image/000000127517_mask.jpg b/fluid/PaddleCV/rcnn/image/000000127517_mask.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c0284591deadf6010bf780acf16124231c42d677 Binary files /dev/null and b/fluid/PaddleCV/rcnn/image/000000127517_mask.jpg differ diff --git a/fluid/PaddleCV/faster_rcnn/image/000000203864.jpg b/fluid/PaddleCV/rcnn/image/000000203864.jpg similarity index 100% rename from fluid/PaddleCV/faster_rcnn/image/000000203864.jpg rename to fluid/PaddleCV/rcnn/image/000000203864.jpg diff --git a/fluid/PaddleCV/faster_rcnn/image/000000515077.jpg b/fluid/PaddleCV/rcnn/image/000000515077.jpg similarity index 100% rename from fluid/PaddleCV/faster_rcnn/image/000000515077.jpg rename to fluid/PaddleCV/rcnn/image/000000515077.jpg diff --git a/fluid/PaddleCV/faster_rcnn/infer.py b/fluid/PaddleCV/rcnn/infer.py similarity index 60% rename from fluid/PaddleCV/faster_rcnn/infer.py rename to fluid/PaddleCV/rcnn/infer.py index 3c7200f9de57bbd8d42df9dcb7d72c8fdca7e253..53365c015401dae7fb9facb640c8076fa5748c2f 100644 --- a/fluid/PaddleCV/faster_rcnn/infer.py +++ b/fluid/PaddleCV/rcnn/infer.py @@ -1,9 +1,7 @@ import os import time import numpy as np -from eval_helper import get_nmsed_box -from eval_helper import get_dt_res -from eval_helper import draw_bounding_box_on_image +from eval_helper import * import paddle import paddle.fluid as fluid import reader @@ -14,17 +12,16 @@ import json from pycocotools.coco import COCO from pycocotools.cocoeval import COCOeval, Params from config import cfg +from roidbs import DatasetPath def infer(): - if 
'2014' in cfg.dataset: - test_list = 'annotations/instances_val2014.json' - elif '2017' in cfg.dataset: - test_list = 'annotations/instances_val2017.json' + data_path = DatasetPath('val') + test_list = data_path.get_file_list() - cocoGt = COCO(os.path.join(cfg.data_dir, test_list)) - numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())} + cocoGt = COCO(test_list) + num_id_to_cat_id_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())} category_ids = cocoGt.getCatIds() label_list = { item['id']: item['name'] @@ -34,13 +31,15 @@ def infer(): image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size] class_nums = cfg.class_num - model = model_builder.FasterRCNN( + model = model_builder.RCNN( add_conv_body_func=resnet.add_ResNet50_conv4_body, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, use_pyreader=False, is_train=False) model.build_model(image_shape) - rpn_rois, confs, locs = model.eval_out() + pred_boxes = model.eval_bbox_out() + if cfg.MASK_ON: + masks = model.eval_mask_out() place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace() exe = fluid.Executor(place) # yapf: disable @@ -53,17 +52,29 @@ def infer(): feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) dts_res = [] - fetch_list = [rpn_rois, confs, locs] + segms_res = [] + if cfg.MASK_ON: + fetch_list = [pred_boxes, masks] + else: + fetch_list = [pred_boxes] data = next(infer_reader()) im_info = [data[0][1]] - rpn_rois_v, confs_v, locs_v = exe.run( - fetch_list=[v.name for v in fetch_list], - feed=feeder.feed(data), - return_numpy=False) - new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v, class_nums, - im_info, numId_to_catId_map) + result = exe.run(fetch_list=[v.name for v in fetch_list], + feed=feeder.feed(data), + return_numpy=False) + pred_boxes_v = result[0] + if cfg.MASK_ON: + masks_v = result[1] + new_lod = pred_boxes_v.lod() + nmsed_out = pred_boxes_v path = os.path.join(cfg.image_path, cfg.image_name) - draw_bounding_box_on_image(path, nmsed_out, cfg.draw_threshold, label_list) + image = None + if cfg.MASK_ON: + segms_out = segm_results(nmsed_out, masks_v, im_info) + image = draw_mask_on_image(path, segms_out, cfg.draw_threshold) + + draw_bounding_box_on_image(path, nmsed_out, cfg.draw_threshold, label_list, + num_id_to_cat_id_map, image) if __name__ == '__main__': diff --git a/fluid/PaddleCV/faster_rcnn/learning_rate.py b/fluid/PaddleCV/rcnn/learning_rate.py similarity index 100% rename from fluid/PaddleCV/faster_rcnn/learning_rate.py rename to fluid/PaddleCV/rcnn/learning_rate.py diff --git a/fluid/PaddleCV/faster_rcnn/models/__init__.py b/fluid/PaddleCV/rcnn/models/__init__.py similarity index 100% rename from fluid/PaddleCV/faster_rcnn/models/__init__.py rename to fluid/PaddleCV/rcnn/models/__init__.py diff --git a/fluid/PaddleCV/faster_rcnn/models/model_builder.py b/fluid/PaddleCV/rcnn/models/model_builder.py similarity index 59% rename from fluid/PaddleCV/faster_rcnn/models/model_builder.py rename to fluid/PaddleCV/rcnn/models/model_builder.py index 9be2f330a62081107d57566962aadc32e1ac687a..ddc1dc7539df063110d4e9b5b28baa4c2ec95c02 100644 --- a/fluid/PaddleCV/faster_rcnn/models/model_builder.py +++ b/fluid/PaddleCV/rcnn/models/model_builder.py @@ -16,11 +16,12 @@ import paddle.fluid as fluid from paddle.fluid.param_attr import ParamAttr from paddle.fluid.initializer import Constant from paddle.fluid.initializer import Normal +from paddle.fluid.initializer import MSRA from paddle.fluid.regularizer import L2Decay from config import cfg -class 
FasterRCNN(object): +class RCNN(object): def __init__(self, add_conv_body_func=None, add_roi_box_head_func=None, @@ -32,7 +33,6 @@ class FasterRCNN(object): self.is_train = is_train self.use_pyreader = use_pyreader self.use_random = use_random - #self.py_reader = None def build_model(self, image_shape): self.build_input(image_shape) @@ -41,31 +41,62 @@ class FasterRCNN(object): self.rpn_heads(body_conv) # Fast RCNN self.fast_rcnn_heads(body_conv) + if not self.is_train: + self.eval_bbox() + # Mask RCNN + if cfg.MASK_ON: + self.mask_rcnn_heads(body_conv) def loss(self): + losses = [] # Fast RCNN loss loss_cls, loss_bbox = self.fast_rcnn_loss() # RPN loss rpn_cls_loss, rpn_reg_loss = self.rpn_loss() - return loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss, + losses = [loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss] + rkeys = ['loss', 'loss_cls', 'loss_bbox', \ + 'loss_rpn_cls', 'loss_rpn_bbox',] + if cfg.MASK_ON: + loss_mask = self.mask_rcnn_loss() + losses = losses + [loss_mask] + rkeys = rkeys + ["loss_mask"] + loss = fluid.layers.sum(losses) + rloss = [loss] + losses + return rloss, rkeys - def eval_out(self): - cls_prob = fluid.layers.softmax(self.cls_score, use_cudnn=False) - return [self.rpn_rois, cls_prob, self.bbox_pred] + def eval_mask_out(self): + return self.mask_fcn_logits + + def eval_bbox_out(self): + return self.pred_result def build_input(self, image_shape): if self.use_pyreader: + in_shapes = [[-1] + image_shape, [-1, 4], [-1, 1], [-1, 1], + [-1, 3], [-1, 1]] + lod_levels = [0, 1, 1, 1, 0, 0] + dtypes = [ + 'float32', 'float32', 'int32', 'int32', 'float32', 'int32' + ] + if cfg.MASK_ON: + in_shapes.append([-1, 2]) + lod_levels.append(3) + dtypes.append('float32') self.py_reader = fluid.layers.py_reader( capacity=64, - shapes=[[-1] + image_shape, [-1, 4], [-1, 1], [-1, 1], [-1, 3], - [-1, 1]], - lod_levels=[0, 1, 1, 1, 0, 0], - dtypes=[ - "float32", "float32", "int32", "int32", "float32", "int32" - ], + shapes=in_shapes, + lod_levels=lod_levels, + dtypes=dtypes, use_double_buffer=True) - self.image, self.gt_box, self.gt_label, self.is_crowd, \ - self.im_info, self.im_id = fluid.layers.read_file(self.py_reader) + ins = fluid.layers.read_file(self.py_reader) + self.image = ins[0] + self.gt_box = ins[1] + self.gt_label = ins[2] + self.is_crowd = ins[3] + self.im_info = ins[4] + self.im_id = ins[5] + if cfg.MASK_ON: + self.gt_masks = ins[6] else: self.image = fluid.layers.data( name='image', shape=image_shape, dtype='float32') @@ -74,24 +105,55 @@ class FasterRCNN(object): self.gt_label = fluid.layers.data( name='gt_label', shape=[1], dtype='int32', lod_level=1) self.is_crowd = fluid.layers.data( - name='is_crowd', - shape=[-1], - dtype='int32', - lod_level=1, - append_batch_size=False) + name='is_crowd', shape=[1], dtype='int32', lod_level=1) self.im_info = fluid.layers.data( name='im_info', shape=[3], dtype='float32') self.im_id = fluid.layers.data( name='im_id', shape=[1], dtype='int32') + if cfg.MASK_ON: + self.gt_masks = fluid.layers.data( + name='gt_masks', shape=[2], dtype='float32', lod_level=3) def feeds(self): if not self.is_train: return [self.image, self.im_info, self.im_id] + if not cfg.MASK_ON: + return [ + self.image, self.gt_box, self.gt_label, self.is_crowd, + self.im_info, self.im_id + ] return [ self.image, self.gt_box, self.gt_label, self.is_crowd, self.im_info, - self.im_id + self.im_id, self.gt_masks ] + def eval_bbox(self): + self.im_scale = fluid.layers.slice( + self.im_info, [1], starts=[2], ends=[3]) + im_scale_lod = 
fluid.layers.sequence_expand(self.im_scale, + self.rpn_rois) + boxes = self.rpn_rois / im_scale_lod + cls_prob = fluid.layers.softmax(self.cls_score, use_cudnn=False) + bbox_pred_reshape = fluid.layers.reshape(self.bbox_pred, + (-1, cfg.class_num, 4)) + decoded_box = fluid.layers.box_coder( + prior_box=boxes, + prior_box_var=cfg.bbox_reg_weights, + target_box=bbox_pred_reshape, + code_type='decode_center_size', + box_normalized=False, + axis=1) + cliped_box = fluid.layers.box_clip( + input=decoded_box, im_info=self.im_info) + self.pred_result = fluid.layers.multiclass_nms( + bboxes=cliped_box, + scores=cls_prob, + score_threshold=cfg.TEST.score_thresh, + nms_top_k=-1, + nms_threshold=cfg.TEST.nms_thresh, + keep_top_k=cfg.TEST.detections_per_im, + normalized=False) + def rpn_heads(self, rpn_input): # RPN hidden representation dim_out = rpn_input.shape[1] @@ -157,7 +219,7 @@ class FasterRCNN(object): nms_thresh = param_obj.rpn_nms_thresh min_size = param_obj.rpn_min_size eta = param_obj.rpn_eta - rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals( + self.rpn_rois, self.rpn_roi_probs = fluid.layers.generate_proposals( scores=rpn_cls_score_prob, bbox_deltas=self.rpn_bbox_pred, im_info=self.im_info, @@ -168,10 +230,9 @@ class FasterRCNN(object): nms_thresh=nms_thresh, min_size=min_size, eta=eta) - self.rpn_rois = rpn_rois if self.is_train: outs = fluid.layers.generate_proposal_labels( - rpn_rois=rpn_rois, + rpn_rois=self.rpn_rois, gt_classes=self.gt_label, is_crowd=self.is_crowd, gt_boxes=self.gt_box, @@ -191,27 +252,28 @@ class FasterRCNN(object): self.bbox_inside_weights = outs[3] self.bbox_outside_weights = outs[4] + if cfg.MASK_ON: + mask_out = fluid.layers.generate_mask_labels( + im_info=self.im_info, + gt_classes=self.gt_label, + is_crowd=self.is_crowd, + gt_segms=self.gt_masks, + rois=self.rois, + labels_int32=self.labels_int32, + num_classes=cfg.class_num, + resolution=cfg.resolution) + self.mask_rois = mask_out[0] + self.roi_has_mask_int32 = mask_out[1] + self.mask_int32 = mask_out[2] + def fast_rcnn_heads(self, roi_input): if self.is_train: pool_rois = self.rois else: pool_rois = self.rpn_rois - if cfg.roi_func == 'RoIPool': - pool = fluid.layers.roi_pool( - input=roi_input, - rois=pool_rois, - pooled_height=cfg.roi_resolution, - pooled_width=cfg.roi_resolution, - spatial_scale=cfg.spatial_scale) - elif cfg.roi_func == 'RoIAlign': - pool = fluid.layers.roi_align( - input=roi_input, - rois=pool_rois, - pooled_height=cfg.roi_resolution, - pooled_width=cfg.roi_resolution, - spatial_scale=cfg.spatial_scale, - sampling_ratio=cfg.sampling_ratio) - rcnn_out = self.add_roi_box_head_func(pool) + self.res5_2_sum = self.add_roi_box_head_func(roi_input, pool_rois) + rcnn_out = fluid.layers.pool2d( + self.res5_2_sum, pool_type='avg', pool_size=7, name='res5_pool') self.cls_score = fluid.layers.fc(input=rcnn_out, size=cfg.class_num, act=None, @@ -237,15 +299,87 @@ class FasterRCNN(object): learning_rate=2., regularizer=L2Decay(0.))) + def SuffixNet(self, conv5): + mask_out = fluid.layers.conv2d_transpose( + input=conv5, + num_filters=cfg.dim_reduced, + filter_size=2, + stride=2, + act='relu', + param_attr=ParamAttr( + name='conv5_mask_w', initializer=MSRA(uniform=False)), + bias_attr=ParamAttr( + name='conv5_mask_b', learning_rate=2., regularizer=L2Decay(0.))) + act_func = None + if not self.is_train: + act_func = 'sigmoid' + mask_fcn_logits = fluid.layers.conv2d( + input=mask_out, + num_filters=cfg.class_num, + filter_size=1, + act=act_func, + param_attr=ParamAttr( + 
name='mask_fcn_logits_w', initializer=MSRA(uniform=False)), + bias_attr=ParamAttr( + name="mask_fcn_logits_b", + learning_rate=2., + regularizer=L2Decay(0.))) + + if not self.is_train: + mask_fcn_logits = fluid.layers.lod_reset(mask_fcn_logits, + self.pred_result) + return mask_fcn_logits + + def mask_rcnn_heads(self, mask_input): + if self.is_train: + conv5 = fluid.layers.gather(self.res5_2_sum, + self.roi_has_mask_int32) + self.mask_fcn_logits = self.SuffixNet(conv5) + else: + self.eval_bbox() + pred_res_shape = fluid.layers.shape(self.pred_result) + shape = fluid.layers.reduce_prod(pred_res_shape) + shape = fluid.layers.reshape(shape, [1, 1]) + ones = fluid.layers.fill_constant([1, 1], value=1, dtype='int32') + cond = fluid.layers.equal(x=shape, y=ones) + ie = fluid.layers.IfElse(cond) + + with ie.true_block(): + pred_res_null = ie.input(self.pred_result) + ie.output(pred_res_null) + with ie.false_block(): + pred_res = ie.input(self.pred_result) + pred_boxes = fluid.layers.slice( + pred_res, [1], starts=[2], ends=[6]) + im_scale_lod = fluid.layers.sequence_expand(self.im_scale, + pred_boxes) + mask_rois = pred_boxes * im_scale_lod + conv5 = self.add_roi_box_head_func(mask_input, mask_rois) + mask_fcn = self.SuffixNet(conv5) + ie.output(mask_fcn) + self.mask_fcn_logits = ie()[0] + + def mask_rcnn_loss(self): + mask_label = fluid.layers.cast(x=self.mask_int32, dtype='float32') + reshape_dim = cfg.class_num * cfg.resolution * cfg.resolution + mask_fcn_logits_reshape = fluid.layers.reshape(self.mask_fcn_logits, + (-1, reshape_dim)) + + loss_mask = fluid.layers.sigmoid_cross_entropy_with_logits( + x=mask_fcn_logits_reshape, + label=mask_label, + ignore_index=-1, + normalize=True) + loss_mask = fluid.layers.reduce_sum(loss_mask, name='loss_mask') + return loss_mask + def fast_rcnn_loss(self): labels_int64 = fluid.layers.cast(x=self.labels_int32, dtype='int64') labels_int64.stop_gradient = True - #loss_cls = fluid.layers.softmax_with_cross_entropy( - # logits=cls_score, - # label=labels_int64 - # ) - cls_prob = fluid.layers.softmax(self.cls_score, use_cudnn=False) - loss_cls = fluid.layers.cross_entropy(cls_prob, labels_int64) + loss_cls = fluid.layers.softmax_with_cross_entropy( + logits=self.cls_score, + label=labels_int64, + numeric_stable_mode=True, ) loss_cls = fluid.layers.reduce_mean(loss_cls) loss_bbox = fluid.layers.smooth_l1( x=self.bbox_pred, @@ -303,5 +437,4 @@ class FasterRCNN(object): norm = fluid.layers.reduce_prod(score_shape) norm.stop_gradient = True rpn_reg_loss = rpn_reg_loss / norm - return rpn_cls_loss, rpn_reg_loss diff --git a/fluid/PaddleCV/faster_rcnn/models/resnet.py b/fluid/PaddleCV/rcnn/models/resnet.py similarity index 88% rename from fluid/PaddleCV/faster_rcnn/models/resnet.py rename to fluid/PaddleCV/rcnn/models/resnet.py index e868a1506afe4124036d2ecef4acf83676ba02f9..8093470241b3297c44a2e42b5162e25cac1514be 100644 --- a/fluid/PaddleCV/faster_rcnn/models/resnet.py +++ b/fluid/PaddleCV/rcnn/models/resnet.py @@ -160,8 +160,22 @@ def add_ResNet50_conv4_body(body_input): return res4 -def add_ResNet_roi_conv5_head(head_input): - res5 = layer_warp(bottleneck, head_input, 512, 3, 2, name="res5") - res5_pool = fluid.layers.pool2d( - res5, pool_type='avg', pool_size=7, name='res5_pool') - return res5_pool +def add_ResNet_roi_conv5_head(head_input, rois): + if cfg.roi_func == 'RoIPool': + pool = fluid.layers.roi_pool( + input=head_input, + rois=rois, + pooled_height=cfg.roi_resolution, + pooled_width=cfg.roi_resolution, + spatial_scale=cfg.spatial_scale) + elif 
cfg.roi_func == 'RoIAlign': + pool = fluid.layers.roi_align( + input=head_input, + rois=rois, + pooled_height=cfg.roi_resolution, + pooled_width=cfg.roi_resolution, + spatial_scale=cfg.spatial_scale, + sampling_ratio=cfg.sampling_ratio) + + res5 = layer_warp(bottleneck, pool, 512, 3, 2, name="res5") + return res5 diff --git a/fluid/PaddleCV/faster_rcnn/pretrained/download.sh b/fluid/PaddleCV/rcnn/pretrained/download.sh similarity index 100% rename from fluid/PaddleCV/faster_rcnn/pretrained/download.sh rename to fluid/PaddleCV/rcnn/pretrained/download.sh diff --git a/fluid/PaddleCV/faster_rcnn/profile.py b/fluid/PaddleCV/rcnn/profile.py similarity index 77% rename from fluid/PaddleCV/faster_rcnn/profile.py rename to fluid/PaddleCV/rcnn/profile.py index 73634bd6773ecb1606a43b297f0966e2d55506b3..92f089b4238a545595723bd8251c6a2e715a59d3 100644 --- a/fluid/PaddleCV/faster_rcnn/profile.py +++ b/fluid/PaddleCV/rcnn/profile.py @@ -37,18 +37,15 @@ def train(): devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" devices_num = len(devices.split(",")) total_batch_size = devices_num * cfg.TRAIN.im_per_batch - model = model_builder.FasterRCNN( + model = model_builder.RCNN( add_conv_body_func=resnet.add_ResNet50_conv4_body, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, use_pyreader=cfg.use_pyreader, use_random=False) model.build_model(image_shape) - loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss = model.loss() - loss_cls.persistable = True - loss_bbox.persistable = True - rpn_cls_loss.persistable = True - rpn_reg_loss.persistable = True - loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss + losses, keys = model.loss() + loss = losses[0] + fetch_list = [loss] boundaries = cfg.lr_steps gamma = cfg.lr_gamma @@ -95,8 +92,6 @@ def train(): train_reader = reader.train(batch_size=total_batch_size, shuffle=False) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) - fetch_list = [loss, loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss] - def run(iterations): reader_time = [] run_time = [] @@ -109,20 +104,16 @@ def train(): reader_time.append(end_time - start_time) start_time = time.time() if cfg.parallel: - losses = train_exe.run(fetch_list=[v.name for v in fetch_list], - feed=feeder.feed(data)) + outs = train_exe.run(fetch_list=[v.name for v in fetch_list], + feed=feeder.feed(data)) else: - losses = exe.run(fluid.default_main_program(), - fetch_list=[v.name for v in fetch_list], - feed=feeder.feed(data)) + outs = exe.run(fluid.default_main_program(), + fetch_list=[v.name for v in fetch_list], + feed=feeder.feed(data)) end_time = time.time() run_time.append(end_time - start_time) total_images += len(data) - - lr = np.array(fluid.global_scope().find_var('learning_rate') - .get_tensor()) - print("Batch {:d}, lr {:.6f}, loss {:.6f} ".format(batch_id, lr[0], - losses[0][0])) + print("Batch {:d}, loss {:.6f} ".format(batch_id, np.mean(outs[0]))) return reader_time, run_time, total_images def run_pyreader(iterations): @@ -135,18 +126,16 @@ def train(): for batch_id in range(iterations): start_time = time.time() if cfg.parallel: - losses = train_exe.run( + outs = train_exe.run( fetch_list=[v.name for v in fetch_list]) else: - losses = exe.run(fluid.default_main_program(), - fetch_list=[v.name for v in fetch_list]) + outs = exe.run(fluid.default_main_program(), + fetch_list=[v.name for v in fetch_list]) end_time = time.time() run_time.append(end_time - start_time) total_images += devices_num - lr = np.array(fluid.global_scope().find_var('learning_rate') - .get_tensor()) - print("Batch {:d}, lr 
{:.6f}, loss {:.6f} ".format(batch_id, lr[ - 0], losses[0][0])) + print("Batch {:d}, loss {:.6f} ".format(batch_id, + np.mean(outs[0]))) except fluid.core.EOFException: py_reader.reset() diff --git a/fluid/PaddleCV/faster_rcnn/reader.py b/fluid/PaddleCV/rcnn/reader.py similarity index 61% rename from fluid/PaddleCV/faster_rcnn/reader.py rename to fluid/PaddleCV/rcnn/reader.py index 50b3d88b3995442c49833e6f69c7d6f04ea84064..a15e774d6cb3473390014da567e6657e39faed9f 100644 --- a/fluid/PaddleCV/faster_rcnn/reader.py +++ b/fluid/PaddleCV/rcnn/reader.py @@ -27,6 +27,46 @@ from collections import deque from roidbs import JsonDataset import data_utils from config import cfg +import segm_utils + + +def roidb_reader(roidb, mode): + im, im_scales = data_utils.get_image_blob(roidb, mode) + im_id = roidb['id'] + im_height = np.round(roidb['height'] * im_scales) + im_width = np.round(roidb['width'] * im_scales) + im_info = np.array([im_height, im_width, im_scales], dtype=np.float32) + if mode == 'val' or mode == 'infer': + return im, im_info, im_id + + gt_boxes = roidb['gt_boxes'].astype('float32') + gt_classes = roidb['gt_classes'].astype('int32') + is_crowd = roidb['is_crowd'].astype('int32') + segms = roidb['segms'] + + outs = (im, gt_boxes, gt_classes, is_crowd, im_info, im_id) + + if cfg.MASK_ON: + gt_masks = [] + valid = True + segms = roidb['segms'] + assert len(segms) == is_crowd.shape[0] + for i in range(len(roidb['segms'])): + segm, iscrowd = segms[i], is_crowd[i] + gt_segm = [] + if iscrowd: + gt_segm.append([[0, 0]]) + else: + for poly in segm: + if len(poly) == 0: + valid = False + break + gt_segm.append(np.array(poly).reshape(-1, 2)) + if (not valid) or len(gt_segm) == 0: + break + gt_masks.append(gt_segm) + outs = outs + (gt_masks, ) + return outs def coco(mode, @@ -34,48 +74,16 @@ def coco(mode, total_batch_size=None, padding_total=False, shuffle=False): - if 'coco2014' in cfg.dataset: - cfg.train_file_list = 'annotations/instances_train2014.json' - cfg.train_data_dir = 'train2014' - cfg.val_file_list = 'annotations/instances_val2014.json' - cfg.val_data_dir = 'val2014' - elif 'coco2017' in cfg.dataset: - cfg.train_file_list = 'annotations/instances_train2017.json' - cfg.train_data_dir = 'train2017' - cfg.val_file_list = 'annotations/instances_val2017.json' - cfg.val_data_dir = 'val2017' - else: - raise NotImplementedError('Dataset {} not supported'.format( - cfg.dataset)) cfg.mean_value = np.array(cfg.pixel_means)[np.newaxis, np.newaxis, :].astype('float32') total_batch_size = total_batch_size if total_batch_size else batch_size if mode != 'infer': assert total_batch_size % batch_size == 0 - if mode == 'train': - cfg.train_file_list = os.path.join(cfg.data_dir, cfg.train_file_list) - cfg.train_data_dir = os.path.join(cfg.data_dir, cfg.train_data_dir) - elif mode == 'test' or mode == 'infer': - cfg.val_file_list = os.path.join(cfg.data_dir, cfg.val_file_list) - cfg.val_data_dir = os.path.join(cfg.data_dir, cfg.val_data_dir) - json_dataset = JsonDataset(train=(mode == 'train')) + json_dataset = JsonDataset(mode) roidbs = json_dataset.get_roidb() print("{} on {} with {} roidbs".format(mode, cfg.dataset, len(roidbs))) - def roidb_reader(roidb, mode): - im, im_scales = data_utils.get_image_blob(roidb, mode) - im_id = roidb['id'] - im_height = np.round(roidb['height'] * im_scales) - im_width = np.round(roidb['width'] * im_scales) - im_info = np.array([im_height, im_width, im_scales], dtype=np.float32) - if mode == 'test' or mode == 'infer': - return im, im_info, im_id - gt_boxes = 
roidb['gt_boxes'].astype('float32') - gt_classes = roidb['gt_classes'].astype('int32') - is_crowd = roidb['is_crowd'].astype('int32') - return im, gt_boxes, gt_classes, is_crowd, im_info, im_id - def padding_minibatch(batch_data): if len(batch_data) == 1: return batch_data @@ -93,39 +101,53 @@ def coco(mode, def reader(): if mode == "train": - roidb_perm = deque(np.random.permutation(roidbs)) + if shuffle: + roidb_perm = deque(np.random.permutation(roidbs)) + else: + roidb_perm = deque(roidbs) roidb_cur = 0 + count = 0 batch_out = [] + device_num = total_batch_size / batch_size while True: roidb = roidb_perm[0] roidb_cur += 1 roidb_perm.rotate(-1) if roidb_cur >= len(roidbs): - roidb_perm = deque(np.random.permutation(roidbs)) + if shuffle: + roidb_perm = deque(np.random.permutation(roidbs)) + else: + roidb_perm = deque(roidbs) roidb_cur = 0 - im, gt_boxes, gt_classes, is_crowd, im_info, im_id = roidb_reader( - roidb, mode) - if gt_boxes.shape[0] == 0: + # im, gt_boxes, gt_classes, is_crowd, im_info, im_id, gt_masks + datas = roidb_reader(roidb, mode) + if datas[1].shape[0] == 0: continue - batch_out.append( - (im, gt_boxes, gt_classes, is_crowd, im_info, im_id)) + if cfg.MASK_ON: + if len(datas[-1]) != datas[1].shape[0]: + continue + batch_out.append(datas) if not padding_total: if len(batch_out) == batch_size: yield padding_minibatch(batch_out) + count += 1 batch_out = [] else: if len(batch_out) == total_batch_size: batch_out = padding_minibatch(batch_out) - for i in range(total_batch_size / batch_size): + for i in range(device_num): sub_batch_out = [] for j in range(batch_size): sub_batch_out.append(batch_out[i * batch_size + j]) yield sub_batch_out + count += 1 sub_batch_out = [] batch_out = [] - - elif mode == "test": + iter_id = count // device_num + if iter_id >= cfg.max_iter: + return + elif mode == "val": batch_out = [] for roidb in roidbs: im, im_info, im_id = roidb_reader(roidb, mode) @@ -153,7 +175,7 @@ def train(batch_size, total_batch_size=None, padding_total=False, shuffle=True): def test(batch_size, total_batch_size=None, padding_total=False): - return coco('test', batch_size, total_batch_size, shuffle=False) + return coco('val', batch_size, total_batch_size, shuffle=False) def infer(): diff --git a/fluid/PaddleCV/faster_rcnn/roidbs.py b/fluid/PaddleCV/rcnn/roidbs.py similarity index 83% rename from fluid/PaddleCV/faster_rcnn/roidbs.py rename to fluid/PaddleCV/rcnn/roidbs.py index b21dc9ed1fb01275aa57b158b0151a56ae297dc7..bd7e581999e6e7052238cadc7472b852ec88dab8 100644 --- a/fluid/PaddleCV/faster_rcnn/roidbs.py +++ b/fluid/PaddleCV/rcnn/roidbs.py @@ -36,24 +36,39 @@ import matplotlib matplotlib.use('Agg') from pycocotools.coco import COCO import box_utils +import segm_utils from config import cfg logger = logging.getLogger(__name__) +class DatasetPath(object): + def __init__(self, mode): + self.mode = mode + mode_name = 'train' if mode == 'train' else 'val' + if cfg.dataset != 'coco2014' and cfg.dataset != 'coco2017': + raise NotImplementedError('Dataset {} not supported'.format( + cfg.dataset)) + self.sub_name = mode_name + cfg.dataset[-4:] + + def get_data_dir(self): + return os.path.join(cfg.data_dir, self.sub_name) + + def get_file_list(self): + sfile_list = 'annotations/instances_' + self.sub_name + '.json' + return os.path.join(cfg.data_dir, sfile_list) + + class JsonDataset(object): """A class representing a COCO json dataset.""" - def __init__(self, train=False): + def __init__(self, mode): print('Creating: {}'.format(cfg.dataset)) self.name = cfg.dataset - 
self.is_train = train - if self.is_train: - data_dir = cfg.train_data_dir - file_list = cfg.train_file_list - else: - data_dir = cfg.val_data_dir - file_list = cfg.val_file_list + self.is_train = mode == 'train' + data_path = DatasetPath(mode) + data_dir = data_path.get_data_dir() + file_list = data_path.get_file_list() self.image_directory = data_dir self.COCO = COCO(file_list) # Set up dataset classes @@ -91,8 +106,9 @@ class JsonDataset(object): end_time = time.time() print('_add_gt_annotations took {:.3f}s'.format(end_time - start_time)) - print('Appending horizontally-flipped training examples...') - self._extend_with_flipped_entries(roidb) + if cfg.TRAIN.use_flipped: + print('Appending horizontally-flipped training examples...') + self._extend_with_flipped_entries(roidb) print('Loaded dataset: {:s}'.format(self.name)) print('{:d} roidb entries'.format(len(roidb))) if self.is_train: @@ -111,6 +127,7 @@ class JsonDataset(object): entry['gt_classes'] = np.empty((0), dtype=np.int32) entry['gt_id'] = np.empty((0), dtype=np.int32) entry['is_crowd'] = np.empty((0), dtype=np.bool) + entry['segms'] = [] # Remove unwanted fields that come from the json file (if they exist) for k in ['date_captured', 'url', 'license', 'file_name']: if k in entry: @@ -126,9 +143,15 @@ class JsonDataset(object): objs = self.COCO.loadAnns(ann_ids) # Sanitize bboxes -- some are invalid valid_objs = [] + valid_segms = [] width = entry['width'] height = entry['height'] for obj in objs: + if isinstance(obj['segmentation'], list): + # Valid polygons have >= 3 points, so require >= 6 coordinates + obj['segmentation'] = [ + p for p in obj['segmentation'] if len(p) >= 6 + ] if obj['area'] < cfg.TRAIN.gt_min_area: continue if 'ignore' in obj and obj['ignore'] == 1: @@ -141,6 +164,8 @@ class JsonDataset(object): if obj['area'] > 0 and x2 > x1 and y2 > y1: obj['clean_bbox'] = [x1, y1, x2, y2] valid_objs.append(obj) + valid_segms.append(obj['segmentation']) + num_valid_objs = len(valid_objs) gt_boxes = np.zeros((num_valid_objs, 4), dtype=entry['gt_boxes'].dtype) @@ -158,6 +183,7 @@ class JsonDataset(object): entry['gt_classes'] = np.append(entry['gt_classes'], gt_classes) entry['gt_id'] = np.append(entry['gt_id'], gt_id) entry['is_crowd'] = np.append(entry['is_crowd'], is_crowd) + entry['segms'].extend(valid_segms) def _extend_with_flipped_entries(self, roidb): """Flip each entry in the given roidb and return a new roidb that is the @@ -175,11 +201,13 @@ class JsonDataset(object): gt_boxes[:, 2] = width - oldx1 - 1 assert (gt_boxes[:, 2] >= gt_boxes[:, 0]).all() flipped_entry = {} - dont_copy = ('gt_boxes', 'flipped') + dont_copy = ('gt_boxes', 'flipped', 'segms') for k, v in entry.items(): if k not in dont_copy: flipped_entry[k] = v flipped_entry['gt_boxes'] = gt_boxes + flipped_entry['segms'] = segm_utils.flip_segms( + entry['segms'], entry['height'], entry['width']) flipped_entry['flipped'] = True flipped_roidb.append(flipped_entry) roidb.extend(flipped_roidb) diff --git a/fluid/PaddleCV/rcnn/scripts/eval.sh b/fluid/PaddleCV/rcnn/scripts/eval.sh new file mode 100644 index 0000000000000000000000000000000000000000..922380acf52e594931506e791990319d152d9260 --- /dev/null +++ b/fluid/PaddleCV/rcnn/scripts/eval.sh @@ -0,0 +1,17 @@ +#!/bin/bash +export CUDA_VISIBLE_DEVICES=0 + +model=$1 # faster_rcnn, mask_rcnn +if [ "$model" = "faster_rcnn" ]; then + mask_on="--MASK_ON False" +elif [ "$model" = "mask_rcnn" ]; then + mask_on="--MASK_ON True" +else + echo "Invalid model provided. 
Please use one of {faster_rcnn, mask_rcnn}"
+    exit 1
+fi
+
+python -u ../eval_coco_map.py \
+    $mask_on \
+    --pretrained_model=../output/model_iter179999 \
+    --data_dir=../dataset/coco/
diff --git a/fluid/PaddleCV/rcnn/scripts/infer.sh b/fluid/PaddleCV/rcnn/scripts/infer.sh
new file mode 100644
index 0000000000000000000000000000000000000000..6f0e02730b9db07568c31a280825f75e321eab64
--- /dev/null
+++ b/fluid/PaddleCV/rcnn/scripts/infer.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+export CUDA_VISIBLE_DEVICES=0
+
+model=$1 # faster_rcnn, mask_rcnn
+if [ "$model" = "faster_rcnn" ]; then
+    mask_on="--MASK_ON False"
+elif [ "$model" = "mask_rcnn" ]; then
+    mask_on="--MASK_ON True"
+else
+    echo "Invalid model provided. Please use one of {faster_rcnn, mask_rcnn}"
+    exit 1
+fi
+
+python -u ../infer.py \
+    $mask_on \
+    --pretrained_model=../output/model_iter179999 \
+    --image_path=../dataset/coco/val2017/ \
+    --image_name=000000000139.jpg \
+    --draw_threshold=0.6
diff --git a/fluid/PaddleCV/rcnn/scripts/train.sh b/fluid/PaddleCV/rcnn/scripts/train.sh
new file mode 100755
index 0000000000000000000000000000000000000000..83c67e6c39121c0fecec5cd7c037d14ab53c619d
--- /dev/null
+++ b/fluid/PaddleCV/rcnn/scripts/train.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+
+model=$1 # faster_rcnn, mask_rcnn
+if [ "$model" = "faster_rcnn" ]; then
+    mask_on="--MASK_ON False"
+elif [ "$model" = "mask_rcnn" ]; then
+    mask_on="--MASK_ON True"
+else
+    echo "Invalid model provided. Please use one of {faster_rcnn, mask_rcnn}"
+    exit 1
+fi
+
+python -u ../train.py \
+    $mask_on \
+    --model_save_dir=../output/ \
+    --pretrained_model=../imagenet_resnet50_fusebn/ \
+    --data_dir=../dataset/coco/
+
diff --git a/fluid/PaddleCV/rcnn/segm_utils.py b/fluid/PaddleCV/rcnn/segm_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..17b72228bc4284dc5936d4a3fda5c2422c4aa958
--- /dev/null
+++ b/fluid/PaddleCV/rcnn/segm_utils.py
@@ -0,0 +1,88 @@
+# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Based on:
+# --------------------------------------------------------
+# Detectron
+# Copyright (c) 2017-present, Facebook, Inc.
+# Licensed under the Apache License, Version 2.0;
+# Written by Ross Girshick
+# --------------------------------------------------------
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import numpy as np
+import pycocotools.mask as mask_util
+import cv2
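+
+
+# The helpers below distinguish the two COCO segmentation encodings. As an
+# illustrative sketch (these literals are made up, not taken from the
+# dataset), a polygon segm is a list of flat [x0, y0, x1, y1, ...]
+# coordinate lists, e.g.
+#     [[10.0, 10.0, 40.0, 10.0, 40.0, 40.0]]
+# while an RLE segm is a dict, e.g.
+#     {'size': [height, width], 'counts': <RLE-encoded bytes or list>}
+def is_poly(segm):
+    """Determine if segm is a polygon.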
    A valid segm is either a polygon (list) or an RLE (dict)."""
+    assert isinstance(segm, (list, dict)), \
+        'Invalid segm type: {}'.format(type(segm))
+    return isinstance(segm, list)
+
+
+def segms_to_rle(segms, height, width):
+    rle = segms
+    if isinstance(segms, list):
+        # polygon -- a single object might consist of multiple parts
+        # we merge all parts into one mask rle code
+        rles = mask_util.frPyObjects(segms, height, width)
+        rle = mask_util.merge(rles)
+    elif isinstance(segms['counts'], list):
+        # uncompressed RLE
+        rle = mask_util.frPyObjects(segms, height, width)
+    return rle
+
+
+def segms_to_mask(segms, iscrowd, height, width):
+    # Crowd regions get an all-zero mask; everything else is decoded from RLE.
+    if iscrowd:
+        return [[0 for i in range(width)] for j in range(height)]
+    rle = segms_to_rle(segms, height, width)
+    mask = mask_util.decode(rle)
+    return mask
+
+
+def flip_segms(segms, height, width):
+    """Left/right flip each mask in a list of masks."""
+
+    def _flip_poly(poly, width):
+        # Mirror x coordinates as width - x - 1; y coordinates are unchanged.
+        flipped_poly = np.array(poly)
+        flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
+        return flipped_poly.tolist()
+
+    def _flip_rle(rle, height, width):
+        if 'counts' in rle and type(rle['counts']) == list:
+            # Magic RLE format handling painfully discovered by looking at the
+            # COCO API showAnns function.
+            rle = mask_util.frPyObjects([rle], height, width)
+        mask = mask_util.decode(rle)
+        mask = mask[:, ::-1, :]
+        rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
+        return rle
+
+    flipped_segms = []
+    for segm in segms:
+        if is_poly(segm):
+            # Polygon format
+            flipped_segms.append([_flip_poly(poly, width) for poly in segm])
+        else:
+            # RLE format
+            flipped_segms.append(_flip_rle(segm, height, width))
+    return flipped_segms
diff --git a/fluid/PaddleCV/faster_rcnn/train.py b/fluid/PaddleCV/rcnn/train.py
similarity index 62%
rename from fluid/PaddleCV/faster_rcnn/train.py
rename to fluid/PaddleCV/rcnn/train.py
index b840d2855c09e1df91601d30df1503a6003aeef5..8404de31d0be066fb41e0cbd44166bd53787c7ee 100644
--- a/fluid/PaddleCV/faster_rcnn/train.py
+++ b/fluid/PaddleCV/rcnn/train.py
@@ -20,7 +20,8 @@ import sys
 import numpy as np
 import time
 import shutil
-from utility import parse_args, print_arguments, SmoothedValue
+from utility import parse_args, print_arguments, SmoothedValue, TrainingStats, now_time
+import collections
 
 import paddle
 import paddle.fluid as fluid
@@ -35,7 +36,7 @@ def train():
     learning_rate = cfg.learning_rate
     image_shape = [3, cfg.TRAIN.max_size, cfg.TRAIN.max_size]
 
-    if cfg.debug or cfg.enable_ce:
+    if cfg.enable_ce:
         fluid.default_startup_program().random_seed = 1000
         fluid.default_main_program().random_seed = 1000
         import random
@@ -49,36 +50,36 @@ def train():
     use_random = True
     if cfg.enable_ce:
         use_random = False
-    model = model_builder.FasterRCNN(
+    model = model_builder.RCNN(
         add_conv_body_func=resnet.add_ResNet50_conv4_body,
         add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
         use_pyreader=cfg.use_pyreader,
         use_random=use_random)
     model.build_model(image_shape)
-    loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss = model.loss()
-    loss_cls.persistable = True
-    loss_bbox.persistable = True
-    rpn_cls_loss.persistable = True
-    rpn_reg_loss.persistable = True
-    loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss
+    losses, keys = model.loss()
+    loss = losses[0]
+    fetch_list = losses
 
     boundaries = cfg.lr_steps
     gamma = cfg.lr_gamma
     step_num = len(cfg.lr_steps)
     values = [learning_rate * (gamma**i) for i in range(step_num + 1)]
 
+    lr = exponential_with_warmup_decay(
+        learning_rate=learning_rate,
+        boundaries=boundaries,
values=values, + warmup_iter=cfg.warm_up_iter, + warmup_factor=cfg.warm_up_factor) optimizer = fluid.optimizer.Momentum( - learning_rate=exponential_with_warmup_decay( - learning_rate=learning_rate, - boundaries=boundaries, - values=values, - warmup_iter=cfg.warm_up_iter, - warmup_factor=cfg.warm_up_factor), + learning_rate=lr, regularization=fluid.regularizer.L2Decay(cfg.weight_decay), momentum=cfg.momentum) optimizer.minimize(loss) + fetch_list = fetch_list + [lr] - fluid.memory_optimize(fluid.default_main_program()) + fluid.memory_optimize( + fluid.default_main_program(), skip_opt_set=set(fetch_list)) place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace() exe = fluid.Executor(place) @@ -107,7 +108,8 @@ def train(): py_reader = model.py_reader py_reader.decorate_paddle_reader(train_reader) else: - train_reader = reader.train(batch_size=total_batch_size, shuffle=shuffle) + train_reader = reader.train( + batch_size=total_batch_size, shuffle=shuffle) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) def save_model(postfix): @@ -116,88 +118,72 @@ def train(): shutil.rmtree(model_path) fluid.io.save_persistables(exe, model_path) - fetch_list = [loss, rpn_cls_loss, rpn_reg_loss, loss_cls, loss_bbox] - def train_loop_pyreader(): py_reader.start() - smoothed_loss = SmoothedValue(cfg.log_window) + train_stats = TrainingStats(cfg.log_window, keys) try: start_time = time.time() prev_start_time = start_time - total_time = 0 - last_loss = 0 - every_pass_loss = [] for iter_id in range(cfg.max_iter): prev_start_time = start_time start_time = time.time() - losses = train_exe.run(fetch_list=[v.name for v in fetch_list]) - every_pass_loss.append(np.mean(np.array(losses[0]))) - smoothed_loss.add_value(np.mean(np.array(losses[0]))) - lr = np.array(fluid.global_scope().find_var('learning_rate') - .get_tensor()) - print("Iter {:d}, lr {:.6f}, loss {:.6f}, time {:.5f}".format( - iter_id, lr[0], - smoothed_loss.get_median_value( - ), start_time - prev_start_time)) - end_time = time.time() - total_time += end_time - start_time - last_loss = np.mean(np.array(losses[0])) - + outs = train_exe.run(fetch_list=[v.name for v in fetch_list]) + stats = {k: np.array(v).mean() for k, v in zip(keys, outs[:-1])} + train_stats.update(stats) + logs = train_stats.log() + strs = '{}, iter: {}, lr: {:.5f}, {}, time: {:.3f}'.format( + now_time(), iter_id, + np.mean(outs[-1]), logs, start_time - prev_start_time) + print(strs) sys.stdout.flush() if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0: save_model("model_iter{}".format(iter_id)) - # only for ce + end_time = time.time() + total_time = end_time - start_time + last_loss = np.array(outs[0]).mean() if cfg.enable_ce: gpu_num = devices_num epoch_idx = iter_id + 1 loss = last_loss print("kpis\teach_pass_duration_card%s\t%s" % - (gpu_num, total_time / epoch_idx)) - print("kpis\ttrain_loss_card%s\t%s" % - (gpu_num, loss)) - - except fluid.core.EOFException: + (gpu_num, total_time / epoch_idx)) + print("kpis\ttrain_loss_card%s\t%s" % (gpu_num, loss)) + except (StopIteration, fluid.core.EOFException): py_reader.reset() - return np.mean(every_pass_loss) def train_loop(): start_time = time.time() prev_start_time = start_time start = start_time - total_time = 0 - last_loss = 0 - every_pass_loss = [] - smoothed_loss = SmoothedValue(cfg.log_window) + train_stats = TrainingStats(cfg.log_window, keys) for iter_id, data in enumerate(train_reader()): prev_start_time = start_time start_time = time.time() - losses = train_exe.run(fetch_list=[v.name for v in fetch_list], - 
feed=feeder.feed(data))
-            loss_v = np.mean(np.array(losses[0]))
-            every_pass_loss.append(loss_v)
-            smoothed_loss.add_value(loss_v)
-            lr = np.array(fluid.global_scope().find_var('learning_rate')
-                          .get_tensor())
-            end_time = time.time()
-            total_time += end_time - start_time
-            last_loss = loss_v
-            print("Iter {:d}, lr {:.6f}, loss {:.6f}, time {:.5f}".format(
-                iter_id, lr[0],
-                smoothed_loss.get_median_value(), start_time - prev_start_time))
+            outs = train_exe.run(fetch_list=[v.name for v in fetch_list],
+                                 feed=feeder.feed(data))
+            stats = {k: np.array(v).mean() for k, v in zip(keys, outs[:-1])}
+            train_stats.update(stats)
+            logs = train_stats.log()
+            strs = '{}, iter: {}, lr: {:.5f}, {}, time: {:.3f}'.format(
+                now_time(), iter_id,
+                np.mean(outs[-1]), logs, start_time - prev_start_time)
+            print(strs)
             sys.stdout.flush()
             if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0:
                 save_model("model_iter{}".format(iter_id))
             if (iter_id + 1) == cfg.max_iter:
                 break
+        end_time = time.time()
+        total_time = end_time - start_time
+        last_loss = np.array(outs[0]).mean()
         # only for ce
         if cfg.enable_ce:
             gpu_num = devices_num
             epoch_idx = iter_id + 1
             loss = last_loss
             print("kpis\teach_pass_duration_card%s\t%s" %
-                  (gpu_num, total_time / epoch_idx))
-            print("kpis\ttrain_loss_card%s\t%s" %
-                  (gpu_num, loss))
+                  (gpu_num, total_time / epoch_idx))
+            print("kpis\ttrain_loss_card%s\t%s" % (gpu_num, loss))
 
-        return np.mean(every_pass_loss)
diff --git a/fluid/PaddleCV/faster_rcnn/utility.py b/fluid/PaddleCV/rcnn/utility.py
similarity index 83%
rename from fluid/PaddleCV/faster_rcnn/utility.py
rename to fluid/PaddleCV/rcnn/utility.py
index f428de4c17ac9a6bd1600f52267d6718426adc78..7948bc13fb9c540a92603bca8f423d02aecf81c6 100644
--- a/fluid/PaddleCV/faster_rcnn/utility.py
+++ b/fluid/PaddleCV/rcnn/utility.py
@@ -22,7 +22,9 @@ import sys
 import distutils.util
 import numpy as np
 import six
+import collections
 from collections import deque
+import datetime
 from paddle.fluid import core
 import argparse
 import functools
@@ -85,6 +87,37 @@ class SmoothedValue(object):
         return np.median(self.deque)
 
 
+def now_time():
+    return datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')
+
+
+class TrainingStats(object):
+    def __init__(self, window_size, stats_keys):
+        self.smoothed_losses_and_metrics = {
+            key: SmoothedValue(window_size)
+            for key in stats_keys
+        }
+
+    def update(self, stats):
+        # Feed the latest per-key values into their smoothing windows.
+        for k, v in self.smoothed_losses_and_metrics.items():
+            v.add_value(stats[k])
+
+    def get(self, extras=None):
+        stats = collections.OrderedDict()
+        if extras:
+            for k, v in extras.items():
+                stats[k] = v
+        for k, v in self.smoothed_losses_and_metrics.items():
+            stats[k] = round(v.get_median_value(), 3)
+
+        return stats
+
+    def log(self, extras=None):
+        d = self.get(extras)
+        strs = ', '.join(str(dict({x: y})).strip('{}') for x, y in d.items())
+        return strs
+
+
 def parse_args():
     """return all args
     """
@@ -108,14 +141,15 @@ def parse_args():
     add_arg('learning_rate', float, 0.01, "Learning rate.")
     add_arg('max_iter', int, 180000, "Iter number.")
     add_arg('log_window', int, 20, "Log smooth window, set 1 for debug, set 20 for train.")
-    # FAST RCNN
+    # RCNN
     # RPN
     add_arg('anchor_sizes', int, [32,64,128,256,512], "The size of anchors.")
     add_arg('aspect_ratios', float, [0.5,1.0,2.0], "The ratio of anchors.")
     add_arg('variance', float, [1.,1.,1.,1.], "The variance of anchors.")
     add_arg('rpn_stride', float, [16.,16.], "Stride of the feature map that RPN is attached.")
     add_arg('rpn_nms_thresh', float, 0.7, "NMS threshold used on RPN proposals")
-    # TRAIN TEST
INFER + # TRAIN VAL INFER + add_arg('MASK_ON', bool, False, "Option for different models. If False, choose faster_rcnn. If True, choose mask_rcnn") add_arg('im_per_batch', int, 1, "Minibatch size.") add_arg('max_size', int, 1333, "The resized image height.") add_arg('scales', int, [800], "The resized image height.") @@ -124,7 +158,6 @@ def parse_args(): add_arg('nms_thresh', float, 0.5, "NMS threshold.") add_arg('score_thresh', float, 0.05, "score threshold for NMS.") add_arg('snapshot_stride', int, 10000, "save model every snapshot stride.") - add_arg('debug', bool, False, "Debug mode") # SINGLE EVAL AND DRAW add_arg('draw_threshold', float, 0.8, "Confidence threshold to draw bbox.") add_arg('image_path', str, 'dataset/coco/val2017', "The image path used to inference and visualize.") @@ -138,5 +171,5 @@ def parse_args(): if 'train' in file_name or 'profile' in file_name: merge_cfg_from_args(args, 'train') else: - merge_cfg_from_args(args, 'test') + merge_cfg_from_args(args, 'val') return args diff --git a/fluid/PaddleCV/video/.gitignore b/fluid/PaddleCV/video/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..7052bdda1c76c2ab1adebd204bdef9ebf1a39755 --- /dev/null +++ b/fluid/PaddleCV/video/.gitignore @@ -0,0 +1,5 @@ +checkpoints +output* +*.pyc +*.swp +*_result diff --git a/fluid/PaddleCV/video/README.md b/fluid/PaddleCV/video/README.md new file mode 100644 index 0000000000000000000000000000000000000000..20a0993aee6a4cf6e8fbb72a4b41268e5ee46d87 --- /dev/null +++ b/fluid/PaddleCV/video/README.md @@ -0,0 +1,7 @@ +# VideoClassification +Video Classification + +To run train: + bash ./scripts/train/train_${model_name}.sh +To run test: + bash ./scripts/test/test_${model_name}.sh diff --git a/fluid/PaddleCV/video/config.py b/fluid/PaddleCV/video/config.py new file mode 100755 index 0000000000000000000000000000000000000000..a534536c35c9446ed7dd4139c831757654e02222 --- /dev/null +++ b/fluid/PaddleCV/video/config.py @@ -0,0 +1,58 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
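+# Config files are INI-style: a [MODEL] section plus per-phase sections
+# ([TRAIN], [VALID], [TEST], [INFER]); see the files under configs/ for the
+# full key sets. A minimal illustrative sketch (keys vary per model):
+#
+#     [MODEL]
+#     name = "TSN"
+#     num_classes = 400
+#
+#     [TRAIN]
+#     batch_size = 256
+#     filelist = "./dataset/kinetics/train.list"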
+ +try: + from configparser import ConfigParser +except: + from ConfigParser import ConfigParser + +from utils import AttrDict + +CONFIG_SECS = [ + 'train', + 'valid', + 'test', + 'infer', + ] + + +def parse_config(cfg_file): + parser = ConfigParser() + cfg = AttrDict() + parser.read(cfg_file) + for sec in parser.sections(): + sec_dict = AttrDict() + for k, v in parser.items(sec): + try: + v = eval(v) + except: + pass + setattr(sec_dict, k, v) + setattr(cfg, sec.upper(), sec_dict) + + return cfg + +def merge_configs(cfg, sec, args_dict): + assert sec in CONFIG_SECS, "invalid config section {}".format(sec) + sec_dict = getattr(cfg, sec.upper()) + for k, v in args_dict.items(): + if v is None: + continue + try: + if hasattr(sec_dict, k): + setattr(sec_dict, k, v) + except: + pass + return cfg + diff --git a/fluid/PaddleCV/video/configs/attention_cluster.txt b/fluid/PaddleCV/video/configs/attention_cluster.txt new file mode 100755 index 0000000000000000000000000000000000000000..0ce7c4b213fe4008e9f02beeb124d11a0ec1f785 --- /dev/null +++ b/fluid/PaddleCV/video/configs/attention_cluster.txt @@ -0,0 +1,33 @@ +[MODEL] +name = "AttentionCluster" +dataset = "YouTube-8M" +bone_network = None +drop_rate = 0.5 +feature_num = 2 +feature_names = ['rgb', 'audio'] +feature_dims = [1024, 128] +seg_num = 100 +cluster_nums = [32, 32] +num_classes = 3862 +topk = 20 + +[TRAIN] +epoch = 5 +learning_rate = 0.001 +pretrain_base = None +batch_size = 2048 +use_gpu = True +num_gpus = 8 +filelist = "dataset/youtube8m/train.list" + +[VALID] +batch_size = 2048 +filelist = "dataset/youtube8m/val.list" + +[TEST] +batch_size = 256 +filelist = "dataset/youtube8m/test.list" + +[INFER] +batch_size = 1 +filelist = "dataset/youtube8m/infer.list" diff --git a/fluid/PaddleCV/video/configs/attention_lstm.txt b/fluid/PaddleCV/video/configs/attention_lstm.txt new file mode 100755 index 0000000000000000000000000000000000000000..9154fe2c17282e1066f248a797b50ece080994e7 --- /dev/null +++ b/fluid/PaddleCV/video/configs/attention_lstm.txt @@ -0,0 +1,37 @@ +[MODEL] +name = "AttentionLSTM" +dataset = "YouTube-8M" +bone_nework = None +drop_rate = 0.5 +feature_num = 2 +feature_names = ['rgb', 'audio'] +feature_dims = [1024, 128] +embedding_size = 512 +lstm_size = 1024 +num_classes = 3862 +topk = 20 + +[TRAIN] +epoch = 10 +learning_rate = 0.001 +decay_epochs = [5] +decay_gamma = 0.1 +weight_decay = 0.0008 +num_samples = 5000000 +pretrain_base = None +batch_size = 1024 +use_gpu = True +num_gpus = 8 +filelist = "dataset/youtube8m/train.list" + +[VALID] +batch_size = 1024 +filelist = "dataset/youtube8m/val.list" + +[TEST] +batch_size = 128 +filelist = "dataset/youtube8m/test.list" + +[INFER] +batch_size = 1 +filelist = "dataset/youtube8m/infer.list" diff --git a/fluid/PaddleCV/video/configs/nextvlad.txt b/fluid/PaddleCV/video/configs/nextvlad.txt new file mode 100755 index 0000000000000000000000000000000000000000..18779b1f2eaf78cf9db3c25d5fbd991e16e2ed54 --- /dev/null +++ b/fluid/PaddleCV/video/configs/nextvlad.txt @@ -0,0 +1,39 @@ +[MODEL] +name = "NEXTVLAD" +num_classes = 3862 +topk = 20 +video_feature_size = 1024 +audio_feature_size = 128 +cluster_size = 128 +hidden_size = 2048 +groups = 8 +expansion = 2 +drop_rate = 0.5 +gating_reduction = 8 +eigen_file = "./dataset/youtube8m/yt8m_pca/eigenvals.npy" + +[TRAIN] +epoch = 6 +learning_rate = 0.0002 +lr_boundary_examples = 2000000 +max_iter = 700000 +learning_rate_decay = 0.8 +l2_penalty = 1e-5 +gradient_clip_norm = 1.0 +use_gpu = True +num_gpus = 4 +batch_size = 160 +filelist = 
"./dataset/youtube8m/train.list" + +[VALID] +batch_size = 160 +filelist = "./dataset/youtube8m/val.list" + +[TEST] +batch_size = 40 +filelist = "./dataset/youtube8m/test.list" + +[INFER] +batch_size = 1 +filelist = "./dataset/youtube8m/infer.list" + diff --git a/fluid/PaddleCV/video/configs/stnet.txt b/fluid/PaddleCV/video/configs/stnet.txt new file mode 100755 index 0000000000000000000000000000000000000000..ff3e4ddd25202b0d75c4fb53425dfe41a8f4222a --- /dev/null +++ b/fluid/PaddleCV/video/configs/stnet.txt @@ -0,0 +1,51 @@ +[MODEL] +name = "STNET" +format = "pkl" +num_classes = 400 +seg_num = 7 +seglen = 5 +image_mean = [0.485, 0.456, 0.406] +image_std = [0.229, 0.224, 0.225] +num_layers = 50 + +[TRAIN] +epoch = 60 +short_size = 256 +target_size = 224 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 128 +num_gpus = 8 +use_gpu = True +filelist = "./dataset/kinetics/train.list" +learning_rate = 0.01 +learning_rate_decay = 0.1 +l2_weight_decay = 1e-4 +momentum = 0.9 +total_videos = 224684 +pretrain_base = "./dataset/pretrained/ResNet50_pretrained" + +[VALID] +short_size = 256 +target_size = 224 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 128 +filelist = "./dataset/kinetics/val.list" + +[TEST] +short_size = 256 +target_size = 256 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 16 +filelist = "./dataset/kinetics/test.list" + +[INFER] +short_size = 256 +target_size = 256 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 1 +filelist = "./dataset/kinetics/infer.list" + diff --git a/fluid/PaddleCV/video/configs/tsn.txt b/fluid/PaddleCV/video/configs/tsn.txt new file mode 100755 index 0000000000000000000000000000000000000000..bca5ff349a9792bb07b18c815d7f994419cb82f5 --- /dev/null +++ b/fluid/PaddleCV/video/configs/tsn.txt @@ -0,0 +1,50 @@ +[MODEL] +name = "TSN" +format = "pkl" +num_classes = 400 +seg_num = 3 +seglen = 1 +image_mean = [0.485, 0.456, 0.406] +image_std = [0.229, 0.224, 0.225] +num_layers = 50 + +[TRAIN] +epoch = 45 +short_size = 256 +target_size = 224 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 256 +use_gpu = True +num_gpus = 8 +filelist = "./dataset/kinetics/train.list" +learning_rate = 0.01 +learning_rate_decay = 0.1 +l2_weight_decay = 1e-4 +momentum = 0.9 +total_videos = 224684 + +[VALID] +short_size = 256 +target_size = 224 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 256 +filelist = "./dataset/kinetics/val.list" + +[TEST] +short_size = 256 +target_size = 224 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 32 +filelist = "./dataset/kinetics/test.list" + +[INFER] +short_size = 256 +target_size = 224 +num_reader_threads = 12 +buf_size = 1024 +batch_size = 1 +filelist = "./dataset/kinetics/infer.list" + diff --git a/fluid/PaddleCV/video/datareader/__init__.py b/fluid/PaddleCV/video/datareader/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8f94515902de7768ea7d9a4c67a5f3c7595facaf --- /dev/null +++ b/fluid/PaddleCV/video/datareader/__init__.py @@ -0,0 +1,12 @@ +from .reader_utils import regist_reader, get_reader +from .feature_reader import FeatureReader +from .kinetics_reader import KineticsReader +from .nonlocal_reader import NonlocalReader + +regist_reader("ATTENTIONCLUSTER", FeatureReader) +regist_reader("NEXTVLAD", FeatureReader) +regist_reader("ATTENTIONLSTM", FeatureReader) +regist_reader("TSN", KineticsReader) +regist_reader("TSM", KineticsReader) +regist_reader("STNET", KineticsReader) +regist_reader("NONLOCAL", NonlocalReader) diff --git 
a/fluid/PaddleCV/video/datareader/feature_reader.py b/fluid/PaddleCV/video/datareader/feature_reader.py new file mode 100644 index 0000000000000000000000000000000000000000..9f465c09474446529dd6804a26d3c71204b2fcfa --- /dev/null +++ b/fluid/PaddleCV/video/datareader/feature_reader.py @@ -0,0 +1,135 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import sys +from .reader_utils import DataReader +try: + import cPickle as pickle + from cStringIO import StringIO +except ImportError: + import pickle + from io import BytesIO +import numpy as np +import random + +python_ver = sys.version_info + + +class FeatureReader(DataReader): + """ + Data reader for youtube-8M dataset, which was stored as features extracted by prior networks + This is for the three models: lstm, attention cluster, nextvlad + + dataset cfg: num_classes + batch_size + list + NextVlad only: eigen_file + """ + + def __init__(self, name, mode, cfg): + self.name = name + self.mode = mode + self.num_classes = cfg.MODEL.num_classes + + # set batch size and file list + self.batch_size = cfg[mode.upper()]['batch_size'] + self.filelist = cfg[mode.upper()]['filelist'] + self.eigen_file = cfg.MODEL.get('eigen_file', None) + self.seg_num = cfg.MODEL.get('seg_num', None) + + def create_reader(self): + fl = open(self.filelist).readlines() + fl = [line.strip() for line in fl if line.strip() != ''] + if self.mode == 'train': + random.shuffle(fl) + + def reader(): + batch_out = [] + for filepath in fl: + if python_ver < (3, 0): + data = pickle.load(open(filepath, 'rb')) + else: + data = pickle.load(open(filepath, 'rb'), encoding='bytes') + indexes = list(range(len(data))) + if self.mode == 'train': + random.shuffle(indexes) + for i in indexes: + record = data[i] + nframes = record[b'nframes'] + rgb = record[b'feature'].astype(float) + audio = record[b'audio'].astype(float) + if self.mode != 'infer': + label = record[b'label'] + one_hot_label = make_one_hot(label, self.num_classes) + video = record[b'video'] + + rgb = rgb[0:nframes, :] + audio = audio[0:nframes, :] + + rgb = dequantize( + rgb, max_quantized_value=2., min_quantized_value=-2.) + audio = dequantize( + audio, max_quantized_value=2, min_quantized_value=-2) + + if self.name == 'NEXTVLAD': + # add the effect of eigen values + eigen_file = self.eigen_file + eigen_val = np.sqrt(np.load(eigen_file) + [:1024, 0]).astype(np.float32) + eigen_val = eigen_val + 1e-4 + rgb = (rgb - 4. 
/ 512) * eigen_val
+                    if self.name == 'ATTENTIONCLUSTER':
+                        sample_inds = generate_random_idx(rgb.shape[0],
+                                                          self.seg_num)
+                        rgb = rgb[sample_inds]
+                        audio = audio[sample_inds]
+                    if self.mode != 'infer':
+                        batch_out.append((rgb, audio, one_hot_label))
+                    else:
+                        batch_out.append((rgb, audio, video))
+                    if len(batch_out) == self.batch_size:
+                        yield batch_out
+                        batch_out = []
+
+        return reader
+
+
+def dequantize(feat_vector, max_quantized_value=2., min_quantized_value=-2.):
+    """
+    Dequantize the feature from the byte format to the float format
+    """
+
+    assert max_quantized_value > min_quantized_value
+    quantized_range = max_quantized_value - min_quantized_value
+    scalar = quantized_range / 255.0
+    bias = (quantized_range / 512.0) + min_quantized_value
+
+    return feat_vector * scalar + bias
+
+
+def make_one_hot(label, dim=3862):
+    one_hot_label = np.zeros(dim)
+    one_hot_label = one_hot_label.astype(float)
+    for ind in label:
+        one_hot_label[int(ind)] = 1
+    return one_hot_label
+
+
+def generate_random_idx(feature_len, seg_num):
+    idxs = []
+    stride = float(feature_len) / seg_num
+    for i in range(seg_num):
+        pos = (i + np.random.random()) * stride
+        idxs.append(min(feature_len - 1, int(pos)))
+    return idxs
diff --git a/fluid/PaddleCV/video/datareader/kinetics_reader.py b/fluid/PaddleCV/video/datareader/kinetics_reader.py
new file mode 100644
index 0000000000000000000000000000000000000000..c7bbf17241383ffc32778330db9ac78308683b46
--- /dev/null
+++ b/fluid/PaddleCV/video/datareader/kinetics_reader.py
@@ -0,0 +1,353 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import os
+import sys
+import math
+import random
+import functools
+import cv2  # required by mp4_loader below to decode mp4 input
+try:
+    import cPickle as pickle
+    from cStringIO import StringIO
+except ImportError:
+    import pickle
+    from io import BytesIO
+import numpy as np
+import paddle
+from PIL import Image, ImageEnhance
+import logging
+
+from .reader_utils import DataReader
+
+logger = logging.getLogger(__name__)
+python_ver = sys.version_info
+
+
+class KineticsReader(DataReader):
+    """
+    Data reader for the kinetics dataset, which comes in two formats: mp4 and pkl.
+    1. mp4, the original format of kinetics400
+    2. pkl, the mp4 decoded beforehand and stored as pkl
+    In both cases, load the data, then return the frame data as a numpy array
+    and the label as an integer.
+
+    dataset cfg: format
+                 num_classes
+                 seg_num
+                 seglen
+                 short_size
+                 target_size
+                 num_reader_threads
+                 buf_size
+                 image_mean
+                 image_std
+                 batch_size
+                 list
+    """
+
+    def __init__(self, name, mode, cfg):
+        self.name = name
+        self.mode = mode
+        self.format = cfg.MODEL.format
+        self.num_classes = cfg.MODEL.num_classes
+        self.seg_num = cfg.MODEL.seg_num
+        self.seglen = cfg.MODEL.seglen
+        self.short_size = cfg[mode.upper()]['short_size']
+        self.target_size = cfg[mode.upper()]['target_size']
+        self.num_reader_threads = cfg[mode.upper()]['num_reader_threads']
+        self.buf_size = cfg[mode.upper()]['buf_size']
+
+        self.img_mean = np.array(cfg.MODEL.image_mean).reshape(
+            [3, 1, 1]).astype(np.float32)
+        self.img_std = np.array(cfg.MODEL.image_std).reshape(
+            [3, 1, 1]).astype(np.float32)
+        # set batch size and file list
+        self.batch_size = cfg[mode.upper()]['batch_size']
+        self.filelist = cfg[mode.upper()]['filelist']
+
+    def create_reader(self):
+        _reader = _reader_creator(self.filelist, self.mode, seg_num=self.seg_num, seglen = self.seglen, \
+                        short_size = self.short_size, target_size = self.target_size, \
+                        img_mean = self.img_mean, img_std = self.img_std, \
+                        shuffle = (self.mode == 'train'), \
+                        num_threads = self.num_reader_threads, \
+                        buf_size = self.buf_size, format = self.format)
+
+        def _batch_reader():
+            batch_out = []
+            for imgs, label in _reader():
+                if imgs is None:
+                    continue
+                batch_out.append((imgs, label))
+                if len(batch_out) == self.batch_size:
+                    yield batch_out
+                    batch_out = []
+
+        return _batch_reader
+
+
+def _reader_creator(pickle_list,
+                    mode,
+                    seg_num,
+                    seglen,
+                    short_size,
+                    target_size,
+                    img_mean,
+                    img_std,
+                    shuffle=False,
+                    num_threads=1,
+                    buf_size=1024,
+                    format='pkl'):
+    def reader():
+        with open(pickle_list) as flist:
+            lines = [line.strip() for line in flist]
+            if shuffle:
+                random.shuffle(lines)
+            for line in lines:
+                pickle_path = line.strip()
+                yield [pickle_path]
+
+    if format == 'pkl':
+        decode_func = decode_pickle
+    elif format == 'mp4':
+        decode_func = decode_mp4
+    else:
+        raise NotImplementedError('Unsupported data format: {}'.format(format))
+
+    mapper = functools.partial(
+        decode_func,
+        mode=mode,
+        seg_num=seg_num,
+        seglen=seglen,
+        short_size=short_size,
+        target_size=target_size,
+        img_mean=img_mean,
+        img_std=img_std)
+
+    return paddle.reader.xmap_readers(mapper, reader, num_threads, buf_size)
+
+
+def decode_mp4(sample, mode, seg_num, seglen, short_size, target_size, img_mean,
+               img_std):
+    sample = sample[0].split(' ')
+    mp4_path = sample[0]
+    # when infer, we store vid as label
+    label = int(sample[1])
+    try:
+        imgs = mp4_loader(mp4_path, seg_num, seglen, mode)
+        if len(imgs) < 1:
+            logger.error('{} frame length {} less than 1.'.format(mp4_path,
+                                                                  len(imgs)))
+            return None, None
+    except Exception:
+        logger.error('Error when loading {}'.format(mp4_path))
+        return None, None
+
+    return imgs_transform(imgs, label, mode, seg_num, seglen, \
+                 short_size, target_size, img_mean, img_std)
+
+
+def decode_pickle(sample, mode, seg_num, seglen, short_size, target_size,
+                  img_mean, img_std):
+    pickle_path = sample[0]
+    try:
+        if python_ver < (3, 0):
+            data_loaded = pickle.load(open(pickle_path, 'rb'))
+        else:
+            data_loaded = pickle.load(open(pickle_path, 'rb'), encoding='bytes')
+
+        vid, label, frames = data_loaded
+        if len(frames) < 1:
+            logger.error('{} frame length {} less than 1.'.format(pickle_path,
+                                                                  len(frames)))
+            return None, None
+    except Exception:
+        logger.info('Error when loading {}'.format(pickle_path))
+        return None, None
+
+    if mode == 'train' or mode == 'valid' or mode == 'test':
ret_label = label + elif mode == 'infer': + ret_label = vid + + imgs = video_loader(frames, seg_num, seglen, mode) + return imgs_transform(imgs, ret_label, mode, seg_num, seglen, \ + short_size, target_size, img_mean, img_std) + + +def imgs_transform(imgs, label, mode, seg_num, seglen, short_size, target_size, + img_mean, img_std): + imgs = group_scale(imgs, short_size) + + if mode == 'train': + imgs = group_random_crop(imgs, target_size) + imgs = group_random_flip(imgs) + else: + imgs = group_center_crop(imgs, target_size) + + np_imgs = (np.array(imgs[0]).astype('float32').transpose( + (2, 0, 1))).reshape(1, 3, target_size, target_size) / 255 + for i in range(len(imgs) - 1): + img = (np.array(imgs[i + 1]).astype('float32').transpose( + (2, 0, 1))).reshape(1, 3, target_size, target_size) / 255 + np_imgs = np.concatenate((np_imgs, img)) + imgs = np_imgs + imgs -= img_mean + imgs /= img_std + imgs = np.reshape(imgs, (seg_num, seglen * 3, target_size, target_size)) + + return imgs, label + + +def group_random_crop(img_group, target_size): + w, h = img_group[0].size + th, tw = target_size, target_size + + assert (w >= target_size) and (h >= target_size), \ + "image width({}) and height({}) should be larger than crop size".format(w, h, target_size) + + out_images = [] + x1 = random.randint(0, w - tw) + y1 = random.randint(0, h - th) + + for img in img_group: + if w == tw and h == th: + out_images.append(img) + else: + out_images.append(img.crop((x1, y1, x1 + tw, y1 + th))) + + return out_images + + +def group_random_flip(img_group): + v = random.random() + if v < 0.5: + ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group] + return ret + else: + return img_group + + +def group_center_crop(img_group, target_size): + img_crop = [] + for img in img_group: + w, h = img.size + th, tw = target_size, target_size + assert (w >= target_size) and (h >= target_size), \ + "image width({}) and height({}) should be larger than crop size".format(w, h, target_size) + x1 = int(round((w - tw) / 2.)) + y1 = int(round((h - th) / 2.)) + img_crop.append(img.crop((x1, y1, x1 + tw, y1 + th))) + + return img_crop + + +def group_scale(imgs, target_size): + resized_imgs = [] + for i in range(len(imgs)): + img = imgs[i] + w, h = img.size + if (w <= h and w == target_size) or (h <= w and h == target_size): + resized_imgs.append(img) + continue + + if w < h: + ow = target_size + oh = int(target_size * 4.0 / 3.0) + resized_imgs.append(img.resize((ow, oh), Image.BILINEAR)) + else: + oh = target_size + ow = int(target_size * 4.0 / 3.0) + resized_imgs.append(img.resize((ow, oh), Image.BILINEAR)) + + return resized_imgs + + +def imageloader(buf): + if isinstance(buf, str): + img = Image.open(StringIO(buf)) + else: + img = Image.open(BytesIO(buf)) + + return img.convert('RGB') + + +def video_loader(frames, nsample, seglen, mode): + videolen = len(frames) + average_dur = int(videolen / nsample) + + imgs = [] + for i in range(nsample): + idx = 0 + if mode == 'train': + if average_dur >= seglen: + idx = random.randint(0, average_dur - seglen) + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + else: + if average_dur >= seglen: + idx = (average_dur - seglen) // 2 + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + + for jj in range(idx, idx + seglen): + imgbuf = frames[int(jj % videolen)] + img = imageloader(imgbuf) + imgs.append(img) + + return imgs + + +def mp4_loader(filepath, nsample, seglen, mode): + cap = cv2.VideoCapture(filepath) 
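+    # decode every frame up front, then sample nsample windows of seglen
+    # consecutive frames below (random offsets in train mode, centered
+    # offsets otherwise), wrapping around with a modulo when a window
+    # runs past the end of the clip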
+ videolen = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) + average_dur = int(videolen / nsample) + sampledFrames = [] + for i in range(videolen): + ret, frame = cap.read() + # maybe first frame is empty + if ret == False: + continue + img = frame[:, :, ::-1] + sampledFrames.append(img) + + imgs = [] + for i in range(nsample): + idx = 0 + if mode == 'train': + if average_dur >= seglen: + idx = random.randint(0, average_dur - seglen) + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + else: + if average_dur >= seglen: + idx = (average_dur - 1) // 2 + idx += i * average_dur + elif average_dur >= 1: + idx += i * average_dur + else: + idx = i + + for jj in range(idx, idx + seglen): + imgbuf = sampledFrames[int(jj % videolen)] + img = Image.fromarray(imgbuf, mode='RGB') + imgs.append(img) + + return imgs diff --git a/fluid/PaddleCV/video/datareader/nonlocal_reader.py b/fluid/PaddleCV/video/datareader/nonlocal_reader.py new file mode 100644 index 0000000000000000000000000000000000000000..8edb5ac2ab7fa279e6f04830c8543b9846f23b38 --- /dev/null +++ b/fluid/PaddleCV/video/datareader/nonlocal_reader.py @@ -0,0 +1,338 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import os +import random +import time +import multiprocessing +import numpy as np +import cv2 +import logging + +from .reader_utils import DataReader + +logger = logging.getLogger(__name__) + + +class NonlocalReader(DataReader): + """ + Data reader for kinetics dataset, which read mp4 file and decode into numpy. + This is for nonlocal neural network model. 
+ cfg: num_classes + num_reader_threads + image_mean + image_std + batch_size + list + crop_size + sample_rate + video_length + jitter_scales + Test only cfg: num_test_clips + use_multi_crop + """ + + def __init__(self, name, mode, cfg): + self.name = name + self.mode = mode + self.cfg = cfg + + def create_reader(self): + cfg = self.cfg + mode = self.mode + num_reader_threads = cfg[mode.upper()]['num_reader_threads'] + assert num_reader_threads >=1, \ + "number of reader threads({}) should be a positive integer".format(num_reader_threads) + if num_reader_threads == 1: + reader_func = make_reader + else: + reader_func = make_multi_reader + + dataset_args = {} + dataset_args['image_mean'] = cfg.MODEL.image_mean + dataset_args['image_std'] = cfg.MODEL.image_std + dataset_args['crop_size'] = cfg[mode.upper()]['crop_size'] + dataset_args['sample_rate'] = cfg[mode.upper()]['sample_rate'] + dataset_args['video_length'] = cfg[mode.upper()]['video_length'] + dataset_args['min_size'] = cfg[mode.upper()]['jitter_scales'][0] + dataset_args['max_size'] = cfg[mode.upper()]['jitter_scales'][1] + dataset_args['num_reader_threads'] = num_reader_threads + filelist = cfg[mode.upper()]['list'] + batch_size = cfg[mode.upper()]['batch_size'] + + if self.mode == 'train': + sample_times = 1 + return reader_func(filelist, batch_size, sample_times, True, True, + **dataset_args) + elif self.mode == 'valid': + sample_times = 1 + return reader_func(filelist, batch_size, sample_times, False, False, + **dataset_args) + elif self.mode == 'test': + sample_times = cfg['TEST']['num_test_clips'] + if cfg['TEST']['use_multi_crop'] == 1: + sample_times = int(sample_times / 3) + if cfg['TEST']['use_multi_crop'] == 2: + sample_times = int(sample_times / 6) + return reader_func(filelist, batch_size, sample_times, False, False, + **dataset_args) + else: + logger.info('Not implemented') + raise NotImplementedError + + +def video_fast_get_frame(video_path, + sampling_rate=1, + length=64, + start_frm=-1, + sample_times=1): + cap = cv2.VideoCapture(video_path) + frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) + width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) + height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) + + sampledFrames = [] + + video_output = np.ndarray(shape=[length, height, width, 3], dtype=np.uint8) + + use_start_frm = start_frm + if start_frm < 0: + if (frame_cnt - length * sampling_rate > 0): + use_start_frm = random.randint(0, + frame_cnt - length * sampling_rate) + else: + use_start_frm = 0 + else: + frame_gaps = float(frame_cnt) / float(sample_times) + use_start_frm = int(frame_gaps * start_frm) % frame_cnt + + for i in range(frame_cnt): + ret, frame = cap.read() + # maybe first frame is empty + if ret == False: + continue + img = frame[:, :, ::-1] + sampledFrames.append(img) + + for idx in range(length): + i = use_start_frm + idx * sampling_rate + i = i % len(sampledFrames) + video_output[idx] = sampledFrames[i] + + cap.release() + return video_output + + +def apply_resize(rgbdata, min_size, max_size): + length, height, width, channel = rgbdata.shape + ratio = 1.0 + # generate random scale between [min_size, max_size] + if min_size == max_size: + side_length = min_size + else: + side_length = np.random.randint(min_size, max_size) + if height > width: + ratio = float(side_length) / float(width) + else: + ratio = float(side_length) / float(height) + out_height = int(height * ratio) + out_width = int(width * ratio) + outdata = np.zeros( + (length, out_height, out_width, channel), dtype=rgbdata.dtype) + for i in 
range(length): + outdata[i] = cv2.resize(rgbdata[i], (out_width, out_height)) + return outdata + + +def crop_mirror_transform(rgbdata, + mean, + std, + cropsize=224, + use_mirror=True, + center_crop=False, + spatial_pos=-1): + channel, length, height, width = rgbdata.shape + assert height >= cropsize, "crop size should not be larger than video height" + assert width >= cropsize, "crop size should not be larger than video width" + # crop to specific scale + if center_crop: + h_off = int((height - cropsize) / 2) + w_off = int((width - cropsize) / 2) + if spatial_pos >= 0: + now_pos = spatial_pos % 3 + if h_off > 0: + h_off = h_off * now_pos + else: + w_off = w_off * now_pos + else: + h_off = np.random.randint(0, height - cropsize) + w_off = np.random.randint(0, width - cropsize) + outdata = np.zeros( + (channel, length, cropsize, cropsize), dtype=rgbdata.dtype) + outdata[:, :, :, :] = rgbdata[:, :, h_off:h_off + cropsize, w_off:w_off + + cropsize] + # apply mirror + mirror_indicator = (np.random.rand() > 0.5) + mirror_me = use_mirror and mirror_indicator + if spatial_pos > 0: + mirror_me = (int(spatial_pos / 3) > 0) + if mirror_me: + outdata = outdata[:, :, :, ::-1] + # substract mean and divide std + outdata = outdata.astype(np.float32) + outdata = (outdata - mean) / std + return outdata + + +def make_reader(filelist, batch_size, sample_times, is_training, shuffle, + **dataset_args): + # should add smaple_times param + fl = open(filelist).readlines() + fl = [line.strip() for line in fl if line.strip() != ''] + + if shuffle: + random.shuffle(fl) + + def reader(): + batch_out = [] + for line in fl: + # start_time = time.time() + line_items = line.split(' ') + fn = line_items[0] + label = int(line_items[1]) + if len(line_items) > 2: + start_frm = int(line_items[2]) + spatial_pos = int(line_items[3]) + in_sample_times = sample_times + else: + start_frm = -1 + spatial_pos = -1 + in_sample_times = 1 + label = np.array([label]).astype(np.int64) + # 1, get rgb data for fixed length of frames + try: + rgbdata = video_fast_get_frame(fn, \ + sampling_rate = dataset_args['sample_rate'], length = dataset_args['video_length'], \ + start_frm = start_frm, sample_times = in_sample_times) + except: + logger.info('Error when loading {}, just skip this file'.format( + fn)) + continue + # add prepocessing + # 2, reszie to randomly scale between [min_size, max_size] when training, or cgf.TEST.SCALE when inference + min_size = dataset_args['min_size'] + max_size = dataset_args['max_size'] + rgbdata = apply_resize(rgbdata, min_size, max_size) + # transform [length, height, width, channel] to [channel, length, height, width] + rgbdata = np.transpose(rgbdata, [3, 0, 1, 2]) + + # 3 crop, mirror and transform + rgbdata = crop_mirror_transform(rgbdata, mean = dataset_args['image_mean'], \ + std = dataset_args['image_std'], cropsize = dataset_args['crop_size'], \ + use_mirror = is_training, center_crop = (not is_training), \ + spatial_pos = spatial_pos) + + batch_out.append((rgbdata, label)) + if len(batch_out) == batch_size: + yield batch_out + batch_out = [] + + return reader + + +def make_multi_reader(filelist, batch_size, sample_times, is_training, shuffle, + **dataset_args): + fl = open(filelist).readlines() + fl = [line.strip() for line in fl if line.strip() != ''] + + if shuffle: + random.shuffle(fl) + + n = dataset_args['num_reader_threads'] + queue_size = 20 + reader_lists = [None] * n + file_num = int(len(fl) // n) + for i in range(n): + if i < len(reader_lists) - 1: + tmp_list = fl[i * file_num:(i + 1) * 
file_num] + else: + tmp_list = fl[i * file_num:] + reader_lists[i] = tmp_list + + def read_into_queue(flq, queue): + batch_out = [] + for line in flq: + line_items = line.split(' ') + fn = line_items[0] + label = int(line_items[1]) + if len(line_items) > 2: + start_frm = int(line_items[2]) + spatial_pos = int(line_items[3]) + in_sample_times = sample_times + else: + start_frm = -1 + spatial_pos = -1 + in_sample_times = 1 + label = np.array([label]).astype(np.int64) + # 1, get rgb data for fixed length of frames + try: + rgbdata = video_fast_get_frame(fn, \ + sampling_rate = dataset_args['sample_rate'], length = dataset_args['video_length'], \ + start_frm = start_frm, sample_times = in_sample_times) + except: + logger.info('Error when loading {}, just skip this file'.format( + fn)) + continue + # add prepocessing + # 2, reszie to randomly scale between [min_size, max_size] when training, or cgf.TEST.SCALE when inference + min_size = dataset_args['min_size'] + max_size = dataset_args['max_size'] + rgbdata = apply_resize(rgbdata, min_size, max_size) + # transform [length, height, width, channel] to [channel, length, height, width] + rgbdata = np.transpose(rgbdata, [3, 0, 1, 2]) + + # 3 crop, mirror and transform + rgbdata = crop_mirror_transform(rgbdata, mean = dataset_args['image_mean'], \ + std = dataset_args['image_std'], cropsize = dataset_args['crop_size'], \ + use_mirror = is_training, center_crop = (not is_training), \ + spatial_pos = spatial_pos) + + batch_out.append((rgbdata, label)) + if len(batch_out) == batch_size: + queue.put(batch_out) + batch_out = [] + queue.put(None) + + def queue_reader(): + queue = multiprocessing.Queue(queue_size) + p_list = [None] * len(reader_lists) + # for reader_list in reader_lists: + for i in range(len(reader_lists)): + reader_list = reader_lists[i] + p_list[i] = multiprocessing.Process( + target=read_into_queue, args=(reader_list, queue)) + p_list[i].start() + reader_num = len(reader_lists) + finish_num = 0 + while finish_num < reader_num: + sample = queue.get() + if sample is None: + finish_num += 1 + else: + yield sample + for i in range(len(p_list)): + p_list[i].terminate() + p_list[i].join() + + return queue_reader diff --git a/fluid/PaddleCV/video/datareader/reader_utils.py b/fluid/PaddleCV/video/datareader/reader_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..4c8b436a74335c9b3b3361947123b1a3bb3d43dd --- /dev/null +++ b/fluid/PaddleCV/video/datareader/reader_utils.py @@ -0,0 +1,75 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
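+
+# reader_utils implements a small registry ("reader zoo"): each concrete
+# reader class is registered under a model name, and get_reader() looks the
+# name up and returns a ready-made batch reader. A usage sketch (the
+# registration itself is assumed to happen in the datareader package's
+# __init__, which is not part of this diff):
+#
+#     regist_reader("KINETICS", KineticsReader)
+#     train_reader = get_reader("KINETICS", "train", cfg)
+#     for batch in train_reader():
+#         ...  # batch is a list of batch_size samples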
+
+import pickle
+import cv2
+import numpy as np
+import random
+
+
+class ReaderNotFoundError(Exception):
+    "Error: reader not found"
+
+    def __init__(self, reader_name, avail_readers):
+        super(ReaderNotFoundError, self).__init__()
+        self.reader_name = reader_name
+        self.avail_readers = avail_readers
+
+    def __str__(self):
+        msg = "Reader {} Not Found.\nAvailable readers:\n".format(
+            self.reader_name)
+        for reader in self.avail_readers:
+            msg += "  {}\n".format(reader)
+        return msg
+
+
+class DataReader(object):
+    """data reader for video input"""
+
+    def __init__(self, model_name, mode, cfg):
+        """Not implemented"""
+        pass
+
+    def create_reader(self):
+        """Not implemented"""
+        pass
+
+
+class ReaderZoo(object):
+    def __init__(self):
+        self.reader_zoo = {}
+
+    def regist(self, name, reader):
+        assert reader.__base__ == DataReader, "Unknown reader type {}".format(
+            type(reader))
+        self.reader_zoo[name] = reader
+
+    def get(self, name, mode, cfg):
+        for k, v in self.reader_zoo.items():
+            if k == name:
+                return v(name, mode, cfg)
+        raise ReaderNotFoundError(name, self.reader_zoo.keys())
+
+
+# singleton reader_zoo
+reader_zoo = ReaderZoo()
+
+
+def regist_reader(name, reader):
+    reader_zoo.regist(name, reader)
+
+
+def get_reader(name, mode, cfg):
+    reader_model = reader_zoo.get(name, mode, cfg)
+    return reader_model.create_reader()
diff --git a/fluid/PaddleCV/video/dataset/kinetics/README.md b/fluid/PaddleCV/video/dataset/kinetics/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..25eaee375dd126cd58f1188ed348619a7675f513
--- /dev/null
+++ b/fluid/PaddleCV/video/dataset/kinetics/README.md
@@ -0,0 +1,5 @@
+1. download kinetics-400_train.csv and kinetics-400_val.csv
+2. ffmpeg is required to decode mp4
+3. transfer mp4 video to pkl file, with each pkl storing [video_id, images, label]
+    python generate_label.py kinetics-400_train.csv kinetics400_label.txt  # generate label file
+    python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir $NUM_THREADS
diff --git a/fluid/PaddleCV/video/dataset/kinetics/generate_label.py b/fluid/PaddleCV/video/dataset/kinetics/generate_label.py
new file mode 100644
index 0000000000000000000000000000000000000000..4f7c504c56821527cde57bacf7e9a2d07c666c8f
--- /dev/null
+++ b/fluid/PaddleCV/video/dataset/kinetics/generate_label.py
@@ -0,0 +1,31 @@
+import sys
+
+# kinetics-400_train.csv should be downloaded first and set as sys.argv[1]
+# sys.argv[2] can be set as kinetics400_label.txt
+# python generate_label.py kinetics-400_train.csv kinetics400_label.txt
+
+num_classes = 400
+
+fname = sys.argv[1]
+outname = sys.argv[2]
+fl = open(fname).readlines()
+fl = fl[1:]
+outf = open(outname, 'w')
+
+label_list = []
+for line in fl:
+    label = line.strip().split(',')[0].strip('"')
+    if label in label_list:
+        continue
+    else:
+        label_list.append(label)
+
+assert len(label_list) == num_classes, \
+    "there should be {} labels in list, but got {}".format(num_classes,
+                                                           len(label_list))
+
+label_list.sort()
+for i in range(num_classes):
+    outf.write('{} {}'.format(label_list[i], i) + '\n')
+
+outf.close()
diff --git a/fluid/PaddleCV/video/dataset/kinetics/video2pkl.py b/fluid/PaddleCV/video/dataset/kinetics/video2pkl.py
new file mode 100644
index 0000000000000000000000000000000000000000..881857c40c4ece2f192e681526e2622ef1ce2f81
--- /dev/null
+++ b/fluid/PaddleCV/video/dataset/kinetics/video2pkl.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import os +import sys +import glob +import cPickle +from multiprocessing import Pool + +# example command line: python generate_k400_pkl.py kinetics-400_train.csv 8 +# +# kinetics-400_train.csv is the training set file of K400 official release +# each line contains laebl,youtube_id,time_start,time_end,split,is_cc + +assert (len(sys.argv) == 5) + +f = open(sys.argv[1]) +source_dir = sys.argv[2] +target_dir = sys.argv[3] +num_threads = sys.argv[4] +all_video_entries = [x.strip().split(',') for x in f.readlines()] +all_video_entries = all_video_entries[1:] +f.close() + +category_label_map = {} +f = open('kinetics400_label.txt') +for line in f: + ens = line.strip().split(' ') + category = " ".join(ens[0:-1]) + label = int(ens[-1]) + category_label_map[category] = label +f.close() + + +def generate_pkl(entry): + mode = entry[4] + category = entry[0].strip('"') + category_dir = category + video_path = os.path.join( + './', + entry[1] + "_%06d" % int(entry[2]) + "_%06d" % int(entry[3]) + ".mp4") + video_path = os.path.join(source_dir, category_dir, video_path) + label = category_label_map[category] + + vid = './' + video_path.split('/')[-1].split('.')[0] + if os.path.exists(video_path): + if not os.path.exists(vid): + os.makedirs(vid) + os.system('ffmpeg -i ' + video_path + ' -q 0 ' + vid + '/%06d.jpg') + else: + print("File not exists {}".format(video_path)) + return + + images = sorted(glob.glob(vid + '/*.jpg')) + ims = [] + for img in images: + f = open(img) + ims.append(f.read()) + f.close() + + output_pkl = vid + ".pkl" + output_pkl = os.path.join(target_dir, output_pkl) + f = open(output_pkl, 'w') + cPickle.dump((vid, label, ims), f, -1) + f.close() + + os.system('rm -rf %s' % vid) + + +pool = Pool(processes=int(sys.argv[4])) +pool.map(generate_pkl, all_video_entries) +pool.close() +pool.join() diff --git a/fluid/PaddleCV/video/dataset/youtube8m/README.md b/fluid/PaddleCV/video/dataset/youtube8m/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e9f2d2c9a617c55a5ab4f48752057d0baf03b723 --- /dev/null +++ b/fluid/PaddleCV/video/dataset/youtube8m/README.md @@ -0,0 +1,2 @@ +1. Tensorflow is required to process tfrecords +2. python tf2pkl.py $Source_dir $Target_dir diff --git a/fluid/PaddleCV/video/dataset/youtube8m/tf2pkl.py b/fluid/PaddleCV/video/dataset/youtube8m/tf2pkl.py new file mode 100644 index 0000000000000000000000000000000000000000..3b32e3b41a705d6e294581ca3b92c911d238798f --- /dev/null +++ b/fluid/PaddleCV/video/dataset/youtube8m/tf2pkl.py @@ -0,0 +1,278 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. 
+#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +"""Provides readers configured for different datasets.""" +import os, sys +import numpy as np +import tensorflow as tf +from tensorflow import logging +import cPickle + +from tensorflow.python.platform import gfile + +assert (len(sys.argv) == 3) +source_dir = sys.argv[1] +target_dir = sys.argv[2] + + +def Dequantize(feat_vector, max_quantized_value=2, min_quantized_value=-2): + """Dequantize the feature from the byte format to the float format. + + Args: + feat_vector: the input 1-d vector. + max_quantized_value: the maximum of the quantized value. + min_quantized_value: the minimum of the quantized value. + + Returns: + A float vector which has the same shape as feat_vector. + """ + assert max_quantized_value > min_quantized_value + quantized_range = max_quantized_value - min_quantized_value + scalar = quantized_range / 255.0 + bias = (quantized_range / 512.0) + min_quantized_value + return feat_vector * scalar + bias + + +def resize_axis(tensor, axis, new_size, fill_value=0): + """Truncates or pads a tensor to new_size on on a given axis. + + Truncate or extend tensor such that tensor.shape[axis] == new_size. If the + size increases, the padding will be performed at the end, using fill_value. + + Args: + tensor: The tensor to be resized. + axis: An integer representing the dimension to be sliced. + new_size: An integer or 0d tensor representing the new value for + tensor.shape[axis]. + fill_value: Value to use to fill any new entries in the tensor. Will be + cast to the type of tensor. + + Returns: + The resized tensor. + """ + tensor = tf.convert_to_tensor(tensor) + shape = tf.unstack(tf.shape(tensor)) + + pad_shape = shape[:] + pad_shape[axis] = tf.maximum(0, new_size - shape[axis]) + + shape[axis] = tf.minimum(shape[axis], new_size) + shape = tf.stack(shape) + + resized = tf.concat([ + tf.slice(tensor, tf.zeros_like(shape), shape), + tf.fill(tf.stack(pad_shape), tf.cast(fill_value, tensor.dtype)) + ], axis) + + # Update shape. + new_shape = tensor.get_shape().as_list() # A copy is being made. + new_shape[axis] = new_size + resized.set_shape(new_shape) + return resized + + +class BaseReader(object): + """Inherit from this class when implementing new readers.""" + + def prepare_reader(self, unused_filename_queue): + """Create a thread for generating prediction and label tensors.""" + raise NotImplementedError() + + +class YT8MFrameFeatureReader(BaseReader): + """Reads TFRecords of SequenceExamples. + + The TFRecords must contain SequenceExamples with the sparse in64 'labels' + context feature and a fixed length byte-quantized feature vector, obtained + from the features in 'feature_names'. The quantized features will be mapped + back into a range between min_quantized_value and max_quantized_value. + """ + + def __init__(self, + num_classes=3862, + feature_sizes=[1024], + feature_names=["inc3"], + max_frames=300): + """Construct a YT8MFrameFeatureReader. + + Args: + num_classes: a positive integer for the number of classes. + feature_sizes: positive integer(s) for the feature dimensions as a list. + feature_names: the feature name(s) in the tensorflow record as a list. 
+ max_frames: the maximum number of frames to process. + """ + + assert len(feature_names) == len(feature_sizes), \ + "length of feature_names (={}) != length of feature_sizes (={})".format( \ + len(feature_names), len(feature_sizes)) + + self.num_classes = num_classes + self.feature_sizes = feature_sizes + self.feature_names = feature_names + self.max_frames = max_frames + + def get_video_matrix(self, features, feature_size, max_frames, + max_quantized_value, min_quantized_value): + """Decodes features from an input string and quantizes it. + + Args: + features: raw feature values + feature_size: length of each frame feature vector + max_frames: number of frames (rows) in the output feature_matrix + max_quantized_value: the maximum of the quantized value. + min_quantized_value: the minimum of the quantized value. + + Returns: + feature_matrix: matrix of all frame-features + num_frames: number of frames in the sequence + """ + decoded_features = tf.reshape( + tf.cast(tf.decode_raw(features, tf.uint8), tf.float32), + [-1, feature_size]) + + num_frames = tf.minimum(tf.shape(decoded_features)[0], max_frames) + + feature_matrix = decoded_features + + return feature_matrix, num_frames + + def prepare_reader(self, + filename_queue, + max_quantized_value=2, + min_quantized_value=-2): + """Creates a single reader thread for YouTube8M SequenceExamples. + + Args: + filename_queue: A tensorflow queue of filename locations. + max_quantized_value: the maximum of the quantized value. + min_quantized_value: the minimum of the quantized value. + + Returns: + A tuple of video indexes, video features, labels, and padding data. + """ + reader = tf.TFRecordReader() + _, serialized_example = reader.read(filename_queue) + + contexts, features = tf.parse_single_sequence_example( + serialized_example, + context_features={ + "id": tf.FixedLenFeature([], tf.string), + "labels": tf.VarLenFeature(tf.int64) + }, + sequence_features={ + feature_name: tf.FixedLenSequenceFeature( + [], dtype=tf.string) + for feature_name in self.feature_names + }) + + # read ground truth labels + labels = (tf.cast( + tf.sparse_to_dense( + contexts["labels"].values, (self.num_classes, ), + 1, + validate_indices=False), + tf.bool)) + + # loads (potentially) different types of features and concatenates them + num_features = len(self.feature_names) + assert num_features > 0, "No feature selected: feature_names is empty!" 
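+        # each name in feature_names (e.g. "rgb", "audio") is decoded into its
+        # own frame-level matrix below; all matrices are expected to describe
+        # the same number of frames, later capped at self.max_frames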
+ + assert len(self.feature_names) == len(self.feature_sizes), \ + "length of feature_names (={}) != length of feature_sizes (={})".format( \ + len(self.feature_names), len(self.feature_sizes)) + + num_frames = -1 # the number of frames in the video + feature_matrices = [None + ] * num_features # an array of different features + + for feature_index in range(num_features): + feature_matrix, num_frames_in_this_feature = self.get_video_matrix( + features[self.feature_names[feature_index]], + self.feature_sizes[feature_index], self.max_frames, + max_quantized_value, min_quantized_value) + if num_frames == -1: + num_frames = num_frames_in_this_feature + #else: + # tf.assert_equal(num_frames, num_frames_in_this_feature) + + feature_matrices[feature_index] = feature_matrix + + # cap the number of frames at self.max_frames + num_frames = tf.minimum(num_frames, self.max_frames) + + # concatenate different features + video_matrix = feature_matrices[0] + audio_matrix = feature_matrices[1] + + return contexts["id"], video_matrix, audio_matrix, labels, num_frames + + +def main(files_pattern): + data_files = gfile.Glob(files_pattern) + filename_queue = tf.train.string_input_producer( + data_files, num_epochs=1, shuffle=False) + + reader = YT8MFrameFeatureReader( + feature_sizes=[1024, 128], feature_names=["rgb", "audio"]) + vals = reader.prepare_reader(filename_queue) + + with tf.Session() as sess: + sess.run(tf.initialize_local_variables()) + sess.run(tf.initialize_all_variables()) + coord = tf.train.Coordinator() + threads = tf.train.start_queue_runners(sess=sess, coord=coord) + + vid_num = 0 + all_data = [] + try: + while not coord.should_stop(): + vid, features, audios, labels, nframes = sess.run(vals) + label_index = np.where(labels == True)[0].tolist() + vid_num += 1 + + #print vid, features.shape, audios.shape, label_index, nframes + + features_int = features.astype(np.uint8) + audios_int = audios.astype(np.uint8) + + value_dict = {} + value_dict['video'] = vid + value_dict['feature'] = features_int + value_dict['audio'] = audios_int + value_dict['label'] = label_index + value_dict['nframes'] = nframes + all_data.append(value_dict) + + except tf.errors.OutOfRangeError: + print('Finished extracting.') + + finally: + coord.request_stop() + coord.join(threads) + + print vid_num + + record_name = files_pattern.split('/')[-1].split('.')[0] + outputdir = target_dir + fn = '%s.pkl' % record_name + outp = open(os.path.join(outputdir, fn), 'wb') + cPickle.dump(all_data, outp, protocol=cPickle.HIGHEST_PROTOCOL) + outp.close() + + +if __name__ == '__main__': + record_dir = source_dir + record_files = os.listdir(record_dir) + for f in record_files: + record_path = os.path.join(record_dir, f) + main(record_path) diff --git a/fluid/PaddleCV/video/dataset/youtube8m/yt8m_pca/eigenvals.npy b/fluid/PaddleCV/video/dataset/youtube8m/yt8m_pca/eigenvals.npy new file mode 100644 index 0000000000000000000000000000000000000000..632506b9ad68f030d64643cc8100868b21c3eb98 Binary files /dev/null and b/fluid/PaddleCV/video/dataset/youtube8m/yt8m_pca/eigenvals.npy differ diff --git a/fluid/PaddleCV/video/infer.py b/fluid/PaddleCV/video/infer.py new file mode 100755 index 0000000000000000000000000000000000000000..43470cede76a39f7b7ffdcb43c0481e25aeca11f --- /dev/null +++ b/fluid/PaddleCV/video/infer.py @@ -0,0 +1,152 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. 
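+
+    # one pkl per tfrecord: a list of dicts with keys 'video', 'feature',
+    # 'audio', 'label' and 'nframes', i.e. exactly the record layout that
+    # datareader/feature_reader.py reads back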
+#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import os +import sys +import time +import logging +import argparse +import numpy as np +try: + import cPickle as pickle +except: + import pickle +import paddle.fluid as fluid + +from config import * +import models +from datareader import get_reader + +logging.root.handlers = [] +FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s' +logging.basicConfig(level=logging.DEBUG, format=FORMAT, stream=sys.stdout) +logger = logging.getLogger(__name__) + + +def parse_args(): + parser = argparse.ArgumentParser() + parser.add_argument( + '--model-name', + type=str, + default='AttentionCluster', + help='name of model to train.') + parser.add_argument( + '--config', + type=str, + default='configs/attention_cluster.txt', + help='path to config file of model') + parser.add_argument( + '--use-gpu', type=bool, default=True, help='default use gpu.') + parser.add_argument( + '--weights', + type=str, + default=None, + help='weight path, None to use weights from Paddle.') + parser.add_argument( + '--batch-size', + type=int, + default=1, + help='sample number in a batch for inference.') + parser.add_argument( + '--filelist', + type=str, + default=None, + help='path to inferenece data file lists file.') + parser.add_argument( + '--log-interval', + type=int, + default=1, + help='mini-batch interval to log.') + parser.add_argument( + '--infer-topk', + type=int, + default=20, + help='topk predictions to restore.') + parser.add_argument( + '--save-dir', type=str, default='./', help='directory to store results') + args = parser.parse_args() + return args + + +def infer(args): + # parse config + config = parse_config(args.config) + infer_config = merge_configs(config, 'infer', vars(args)) + infer_model = models.get_model(args.model_name, infer_config, mode='infer') + + infer_model.build_input(use_pyreader=False) + infer_model.build_model() + infer_feeds = infer_model.feeds() + infer_outputs = infer_model.outputs() + + place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace() + exe = fluid.Executor(place) + + filelist = args.filelist or infer_config.INFER.filelist + assert os.path.exists(filelist), "{} not exist.".format(args.filelist) + + # get infer reader + infer_reader = get_reader(args.model_name.upper(), 'infer', infer_config) + + if args.weights: + assert os.path.exists( + args.weights), "Given weight dir {} not exist.".format(args.weights) + # if no weight files specified, download weights from paddle + weights = args.weights or infer_model.get_weights() + + def if_exist(var): + return os.path.exists(os.path.join(weights, var.name)) + + fluid.io.load_vars(exe, weights, predicate=if_exist) + + infer_feeder = fluid.DataFeeder(place=place, feed_list=infer_feeds) + fetch_list = [x.name for x in infer_outputs] + + periods = [] + results = [] + cur_time = time.time() + for infer_iter, data in enumerate(infer_reader()): + data_feed_in = [items[:-1] for items in data] + video_id = [items[-1] for items in data] + infer_outs = exe.run(fetch_list=fetch_list, + feed=infer_feeder.feed(data_feed_in)) + predictions = np.array(infer_outs[0]) + for i in range(len(predictions)): + topk_inds = 
predictions[i].argsort()[0 - args.infer_topk:] + topk_inds = topk_inds[::-1] + preds = predictions[i][topk_inds] + results.append( + (video_id[i], preds.tolist(), topk_inds.tolist())) + prev_time = cur_time + cur_time = time.time() + period = cur_time - prev_time + periods.append(period) + if args.log_interval > 0 and infer_iter % args.log_interval == 0: + logger.info('Processed {} samples'.format((infer_iter) * len( + predictions))) + + logger.info('[INFER] infer finished. average time: {}'.format( + np.mean(periods))) + + if not os.path.isdir(args.save_dir): + os.mkdir(args.save_dir) + result_file_name = os.path.join(args.save_dir, + "{}_infer_result".format(args.model_name)) + pickle.dump(results, open(result_file_name, 'wb')) + +if __name__ == "__main__": + args = parse_args() + logger.info(args) + + infer(args) diff --git a/fluid/PaddleCV/video/metrics/__init__.py b/fluid/PaddleCV/video/metrics/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..0d1df762bdf3d3b920fc1e00d15a3a2ecdcdbe55 --- /dev/null +++ b/fluid/PaddleCV/video/metrics/__init__.py @@ -0,0 +1 @@ +from .metrics_util import get_metrics diff --git a/fluid/PaddleCV/video/metrics/kinetics/__init__.py b/fluid/PaddleCV/video/metrics/kinetics/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/fluid/PaddleCV/video/metrics/kinetics/accuracy_metrics.py b/fluid/PaddleCV/video/metrics/kinetics/accuracy_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..d79bf2ee18ca8de7d5219dd5d1ab6452aec3fe5f --- /dev/null +++ b/fluid/PaddleCV/video/metrics/kinetics/accuracy_metrics.py @@ -0,0 +1,107 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import unicode_literals +from __future__ import print_function +from __future__ import division + +import numpy as np +import datetime +import logging + +logger = logging.getLogger(__name__) + + +class MetricsCalculator(): + def __init__(self, name, mode): + self.name = name + self.mode = mode # 'train', 'val', 'test' + self.reset() + + def reset(self): + logger.info('Resetting {} metrics...'.format(self.mode)) + self.aggr_acc1 = 0.0 + self.aggr_acc5 = 0.0 + self.aggr_loss = 0.0 + self.aggr_batch_size = 0 + + def finalize_metrics(self): + self.avg_acc1 = self.aggr_acc1 / self.aggr_batch_size + self.avg_acc5 = self.aggr_acc5 / self.aggr_batch_size + self.avg_loss = self.aggr_loss / self.aggr_batch_size + + def get_computed_metrics(self): + json_stats = {} + json_stats['avg_loss'] = self.avg_loss + json_stats['avg_acc1'] = self.avg_acc1 + json_stats['avg_acc5'] = self.avg_acc5 + return json_stats + + def calculate_metrics(self, loss, softmax, labels): + accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100. + accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100. 
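+        # per-batch percentages in [0, 100]; epoch-level averages are produced
+        # by accumulate()/finalize_metrics() below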
+
+        return accuracy1, accuracy5
+
+    def accumulate(self, loss, softmax, labels):
+        cur_batch_size = softmax.shape[0]
+        # if the returned loss is None (e.g. in test mode), just set it to 0.
+        if loss is None:
+            cur_loss = 0.
+        else:
+            cur_loss = np.mean(np.array(loss))
+        self.aggr_batch_size += cur_batch_size
+        self.aggr_loss += cur_loss * cur_batch_size
+
+        accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100.
+        accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100.
+        self.aggr_acc1 += accuracy1 * cur_batch_size
+        self.aggr_acc5 += accuracy5 * cur_batch_size
+
+        return
+
+
+# ----------------------------------------------
+# other utils
+# ----------------------------------------------
+def compute_topk_correct_hits(top_k, preds, labels):
+    '''Compute the number of correct hits'''
+    batch_size = preds.shape[0]
+
+    top_k_preds = np.zeros((batch_size, top_k), dtype=np.float32)
+    for i in range(batch_size):
+        top_k_preds[i, :] = np.argsort(-preds[i, :])[:top_k]
+
+    correctness = np.zeros(batch_size, dtype=np.int32)
+    for i in range(batch_size):
+        if labels[i] in top_k_preds[i, :].astype(np.int32).tolist():
+            correctness[i] = 1
+    correct_hits = sum(correctness)
+
+    return correct_hits
+
+
+def compute_topk_accuracy(softmax, labels, top_k):
+    assert labels.shape[0] == softmax.shape[0], "Batch size mismatch."
+    aggr_batch_size = labels.shape[0]
+    aggr_top_k_correct_hits = compute_topk_correct_hits(top_k, softmax, labels)
+
+    # normalize results
+    computed_metrics = \
+        float(aggr_top_k_correct_hits) / aggr_batch_size
+
+    return computed_metrics
diff --git a/fluid/PaddleCV/video/metrics/metrics_util.py b/fluid/PaddleCV/video/metrics/metrics_util.py
new file mode 100644
index 0000000000000000000000000000000000000000..f769349136cc1c0b3dc5be9e44d4a5e186f2a39f
--- /dev/null
+++ b/fluid/PaddleCV/video/metrics/metrics_util.py
@@ -0,0 +1,196 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
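+
+# metrics_util mirrors the reader registry in datareader/reader_utils.py: each
+# model name is mapped to a Metrics subclass at the bottom of this file, and
+# get_metrics(name, mode, cfg) returns the matching calculator instance.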
+ +from __future__ import absolute_import +from __future__ import unicode_literals +from __future__ import print_function +from __future__ import division + +import logging + +import numpy as np +from metrics.youtube8m import eval_util as youtube8m_metrics +from metrics.kinetics import accuracy_metrics as kinetics_metrics +from metrics.multicrop_test import multicrop_test_metrics as multicrop_test_metrics + +logger = logging.getLogger(__name__) + + +class Metrics(object): + def __init__(self, name, mode, metrics_args): + """Not implemented""" + pass + + def calculate_and_log_out(self, loss, pred, label, info=''): + """Not implemented""" + pass + + def accumulate(self, loss, pred, label, info=''): + """Not implemented""" + pass + + def finalize_and_log_out(self, info=''): + """Not implemented""" + pass + + def reset(self): + """Not implemented""" + pass + + +class Youtube8mMetrics(Metrics): + def __init__(self, name, mode, metrics_args): + self.name = name + self.mode = mode + self.num_classes = metrics_args['MODEL']['num_classes'] + self.topk = metrics_args['MODEL']['topk'] + self.calculator = youtube8m_metrics.EvaluationMetrics(self.num_classes, + self.topk) + + def calculate_and_log_out(self, loss, pred, label, info=''): + loss = np.mean(np.array(loss)) + hit_at_one = youtube8m_metrics.calculate_hit_at_one(pred, label) + perr = youtube8m_metrics.calculate_precision_at_equal_recall_rate(pred, + label) + gap = youtube8m_metrics.calculate_gap(pred, label) + logger.info(info + ' , loss = {0}, Hit@1 = {1}, PERR = {2}, GAP = {3}'.format(\ + '%.6f' % loss, '%.2f' % hit_at_one, '%.2f' % perr, '%.2f' % gap)) + + def accumulate(self, loss, pred, label, info=''): + self.calculator.accumulate(loss, pred, label) + + def finalize_and_log_out(self, info=''): + epoch_info_dict = self.calculator.get() + logger.info(info + '\tavg_hit_at_one: {0},\tavg_perr: {1},\tavg_loss :{2},\taps: {3},\tgap:{4}'\ + .format(epoch_info_dict['avg_hit_at_one'], epoch_info_dict['avg_perr'], \ + epoch_info_dict['avg_loss'], epoch_info_dict['aps'], epoch_info_dict['gap'])) + + def reset(self): + self.calculator.clear() + + +class Kinetics400Metrics(Metrics): + def __init__(self, name, mode, metrics_args): + self.name = name + self.mode = mode + self.calculator = kinetics_metrics.MetricsCalculator(name, mode.lower()) + + def calculate_and_log_out(self, loss, pred, label, info=''): + if loss is not None: + loss = np.mean(np.array(loss)) + else: + loss = 0. 
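+        # log per-batch top-1/top-5 only; accumulated epoch-level metrics are
+        # emitted by finalize_and_log_out()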
+ acc1, acc5 = self.calculator.calculate_metrics(loss, pred, label) + logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \ + '%.2f' % acc1, '%.2f' % acc5)) + + def accumulate(self, loss, pred, label, info=''): + self.calculator.accumulate(loss, pred, label) + + def finalize_and_log_out(self, info=''): + self.calculator.finalize_metrics() + metrics_dict = self.calculator.get_computed_metrics() + loss = metrics_dict['avg_loss'] + acc1 = metrics_dict['avg_acc1'] + acc5 = metrics_dict['avg_acc5'] + logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \ + '%.2f' % acc1, '%.2f' % acc5)) + + def reset(self): + self.calculator.reset() + + +class MulticropMetrics(Metrics): + def __init__(self, name, mode, metrics_args): + self.name = name + self.mode = mode + if mode == 'test': + args = {} + args['num_test_clips'] = metrics_args.TEST.num_test_clips + args['dataset_size'] = metrics_args.TEST.dataset_size + args['filename_gt'] = metrics_args.TEST.filename_gt + args['checkpoint_dir'] = metrics_args.TEST.checkpoint_dir + args['num_classes'] = metrics_args.MODEL.num_classes + self.calculator = multicrop_test_metrics.MetricsCalculator( + name, mode.lower(), **args) + else: + self.calculator = kinetics_metrics.MetricsCalculator(name, + mode.lower()) + + def calculate_and_log_out(self, loss, pred, label, info=''): + if self.mode == 'test': + pass + else: + if loss is not None: + loss = np.mean(np.array(loss)) + else: + loss = 0. + acc1, acc5 = self.calculator.calculate_metrics(loss, pred, label) + logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \ + '%.2f' % acc1, '%.2f' % acc5)) + + def accumulate(self, loss, pred, label): + self.calculator.accumulate(loss, pred, label) + + def finalize_and_log_out(self, info=''): + if self.mode == 'test': + self.calculator.finalize_metrics() + else: + self.calculator.finalize_metrics() + metrics_dict = self.calculator.get_computed_metrics() + loss = metrics_dict['avg_loss'] + acc1 = metrics_dict['avg_acc1'] + acc5 = metrics_dict['avg_acc5'] + logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \ + '%.2f' % acc1, '%.2f' % acc5)) + + def reset(self): + self.calculator.reset() + + +class MetricsZoo(object): + def __init__(self): + self.metrics_zoo = {} + + def regist(self, name, metrics): + assert metrics.__base__ == Metrics, "Unknow model type {}".format( + type(metrics)) + self.metrics_zoo[name] = metrics + + def get(self, name, mode, cfg): + for k, v in self.metrics_zoo.items(): + if k == name: + return v(name, mode, cfg) + raise MetricsNotFoundError(name, self.metrics_zoo.keys()) + + +# singleton metrics_zoo +metrics_zoo = MetricsZoo() + + +def regist_metrics(name, metrics): + metrics_zoo.regist(name, metrics) + + +def get_metrics(name, mode, cfg): + return metrics_zoo.get(name, mode, cfg) + + +regist_metrics("NEXTVLAD", Youtube8mMetrics) +regist_metrics("ATTENTIONLSTM", Youtube8mMetrics) +regist_metrics("ATTENTIONCLUSTER", Youtube8mMetrics) +regist_metrics("TSN", Kinetics400Metrics) +regist_metrics("TSM", Kinetics400Metrics) +regist_metrics("STNET", Kinetics400Metrics) +regist_metrics("NONLOCAL", MulticropMetrics) diff --git a/fluid/PaddleCV/video/metrics/multicrop_test/__init__.py b/fluid/PaddleCV/video/metrics/multicrop_test/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/fluid/PaddleCV/video/metrics/multicrop_test/multicrop_test_metrics.py 
b/fluid/PaddleCV/video/metrics/multicrop_test/multicrop_test_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..9da8826cef1be209fce0bfac8d2c7f0b6d70d4a4 --- /dev/null +++ b/fluid/PaddleCV/video/metrics/multicrop_test/multicrop_test_metrics.py @@ -0,0 +1,213 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +from __future__ import absolute_import +from __future__ import unicode_literals +from __future__ import print_function +from __future__ import division + +import sys +import os +import numpy as np +import datetime +import logging +from collections import defaultdict +import pickle + +logger = logging.getLogger(__name__) + + +class MetricsCalculator(): + def __init__(self, name, mode, **metrics_args): + """ + metrics args: + num_test_clips, number of clips of each video when test + dataset_size, total number of videos in the dataset + filename_gt, a file with each line stores the groud truth of each video + checkpoint_dir, dir where to store the test results + num_classes, number of classes of the dataset + """ + self.name = name + self.mode = mode # 'train', 'val', 'test' + self.metrics_args = metrics_args + + self.num_test_clips = metrics_args['num_test_clips'] + self.dataset_size = metrics_args['dataset_size'] + self.filename_gt = metrics_args['filename_gt'] + self.checkpoint_dir = metrics_args['checkpoint_dir'] + self.num_classes = metrics_args['num_classes'] + self.reset() + + def reset(self): + logger.info('Resetting {} metrics...'.format(self.mode)) + self.aggr_acc1 = 0.0 + self.aggr_acc5 = 0.0 + self.aggr_loss = 0.0 + self.aggr_batch_size = 0 + self.seen_inds = defaultdict(int) + self.results = [] + + def calculate_metrics(self, loss, pred, labels): + pass + + def accumulate(self, loss, pred, labels): + labels = labels.astype(int) + for i in range(pred.shape[0]): + probs = pred[i, :].tolist() + vid = labels[i] + self.seen_inds[vid] += 1 + if self.seen_inds[vid] > self.num_test_clips: + logger.warning('Video id {} have been seen. 
Skip.'.format(vid))
+                continue
+            save_pairs = [vid, probs]
+            self.results.append(save_pairs)
+        logger.info("({0} / {1}) videos".format(\
+            len(self.seen_inds), self.dataset_size))
+
+    def finalize_metrics(self):
+        if self.filename_gt is not None:
+            evaluate_results(self.results, self.filename_gt, self.dataset_size, \
+                       self.num_classes, self.num_test_clips)
+        # save temporary file
+        pkl_path = os.path.join(self.checkpoint_dir, "results_probs.pkl")
+
+        # binary mode is required for pickle under python3
+        with open(pkl_path, 'wb') as f:
+            pickle.dump(self.results, f)
+        logger.info('Temporary file saved to: {}'.format(pkl_path))
+
+
+def read_groundtruth(filename_gt):
+    f = open(filename_gt, 'r')
+    labels = []
+    for line in f:
+        rows = line.split()
+        labels.append(int(rows[1]))
+    f.close()
+    return labels
+
+
+def evaluate_results(results, filename_gt, test_dataset_size, num_classes,
+                     num_test_clips):
+    gt_labels = read_groundtruth(filename_gt)
+    sample_num = test_dataset_size
+    class_num = num_classes
+    sample_video_times = num_test_clips
+    counts = np.zeros(sample_num, dtype=np.int32)
+    probs = np.zeros((sample_num, class_num))
+
+    assert (len(gt_labels) == sample_num)
+    """
+    clip_accuracy: the (e.g.) 10*19761 clips' average accuracy
+    clip1_accuracy: the 1st clip's accuracy (starting from frame 0)
+    """
+    clip_accuracy = 0
+    clip1_accuracy = 0
+    clip1_count = 0
+    seen_inds = defaultdict(int)
+
+    # evaluate
+    for entry in results:
+        vid = entry[0]
+        prob = np.array(entry[1])
+        probs[vid] += prob[0:class_num]
+        counts[vid] += 1
+
+        idx = prob.argmax()
+        if idx == gt_labels[vid]:
+            # clip accuracy
+            clip_accuracy += 1
+
+        # clip1 accuracy
+        seen_inds[vid] += 1
+        if seen_inds[vid] == 1:
+            clip1_count += 1
+            if idx == gt_labels[vid]:
+                clip1_accuracy += 1
+
+    # sanity check
+    max_clips = 0
+    min_clips = sys.maxsize
+    count_empty = 0
+    count_corrupted = 0
+    for i in range(sample_num):
+        max_clips = max(max_clips, counts[i])
+        min_clips = min(min_clips, counts[i])
+        if counts[i] != sample_video_times:
+            count_corrupted += 1
+            logger.warning('Id: {} count: {}'.format(i, counts[i]))
+        if counts[i] == 0:
+            count_empty += 1
+
+    logger.info('Num of empty videos: {}'.format(count_empty))
+    logger.info('Num of corrupted videos: {}'.format(count_corrupted))
+    logger.info('Max num of clips in a video: {}'.format(max_clips))
+    logger.info('Min num of clips in a video: {}'.format(min_clips))
+
+    # clip1 accuracy for sanity (print clip1 first as it is lowest)
+    logger.info('Clip1 accuracy: {:.2f} percent ({}/{})'.format(
+        100. * clip1_accuracy / clip1_count, clip1_accuracy, clip1_count))
+
+    # clip accuracy for sanity
+    logger.info('Clip accuracy: {:.2f} percent ({}/{})'.format(
+        100. * clip_accuracy / len(results), clip_accuracy, len(results)))
+
+    # compute accuracy
+    accuracy = 0
+    accuracy_top5 = 0
+    for i in range(sample_num):
+        prob = probs[i]
+
+        # top-1
+        idx = prob.argmax()
+        if idx == gt_labels[i] and counts[i] > 0:
+            accuracy = accuracy + 1
+
+        ids = np.argsort(prob)[::-1]
+        for j in range(5):
+            if ids[j] == gt_labels[i] and counts[i] > 0:
+                accuracy_top5 = accuracy_top5 + 1
+                break
+
+    accuracy = float(accuracy) / float(sample_num)
+    accuracy_top5 = float(accuracy_top5) / float(sample_num)
+
+    logger.info('-' * 80)
+    logger.info('top-1 accuracy: {:.2f} percent'.format(accuracy * 100))
+    logger.info('top-5 accuracy: {:.2f} percent'.format(accuracy_top5 * 100))
+    logger.info('-' * 80)
+
+    return
diff --git a/fluid/PaddleCV/video/metrics/youtube8m/__init__.py b/fluid/PaddleCV/video/metrics/youtube8m/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/fluid/PaddleCV/video/metrics/youtube8m/average_precision_calculator.py b/fluid/PaddleCV/video/metrics/youtube8m/average_precision_calculator.py
new file mode 100644
index 0000000000000000000000000000000000000000..9bad69dd0aff1906e3548fb0322203f0bc5b408d
--- /dev/null
+++ b/fluid/PaddleCV/video/metrics/youtube8m/average_precision_calculator.py
@@ -0,0 +1,275 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS-IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Calculate or keep track of the interpolated average precision.
+
+It provides an interface for calculating interpolated average precision for an
+entire list or the top-n ranked items. For the definition of the
+(non-)interpolated average precision:
+http://trec.nist.gov/pubs/trec15/appendices/CE.MEASURES06.pdf
+
+Example usages:
+1) Use it as a static function call to directly calculate average precision for
+a short ranked list in the memory.
+
+```
+import random
+
+p = np.array([random.random() for _ in xrange(10)])
+a = np.array([random.choice([0, 1]) for _ in xrange(10)])
+
+ap = average_precision_calculator.AveragePrecisionCalculator.ap(p, a)
+```
+
+2) Use it as an object for long ranked list that cannot be stored in memory or
+the case where partial predictions can be observed at a time (Tensorflow
+predictions). In this case, we first call the function accumulate many times
+to process parts of the ranked list.
After processing all the parts, we call +peek_ap_at_n. +``` +p1 = np.array([random.random() for _ in xrange(5)]) +a1 = np.array([random.choice([0, 1]) for _ in xrange(5)]) +p2 = np.array([random.random() for _ in xrange(5)]) +a2 = np.array([random.choice([0, 1]) for _ in xrange(5)]) + +# average precision at 10 +calculator = average_precision_calculator.AveragePrecisionCalculator(10) +calculator.accumulate(p1, a1) +calculator.accumulate(p2, a2) +ap3 = calculator.peek_ap_at_n() +``` +""" + +import heapq +import random +import numbers + +import numpy + + +class AveragePrecisionCalculator(object): + """Calculate the average precision and average precision at n.""" + + def __init__(self, top_n=None): + """Construct an AveragePrecisionCalculator to calculate average precision. + + This class is used to calculate the average precision for a single label. + + Args: + top_n: A positive Integer specifying the average precision at n, or + None to use all provided data points. + + Raises: + ValueError: An error occurred when the top_n is not a positive integer. + """ + if not ((isinstance(top_n, int) and top_n >= 0) or top_n is None): + raise ValueError("top_n must be a positive integer or None.") + + self._top_n = top_n # average precision at n + self._total_positives = 0 # total number of positives seen so far + self._heap = [] # min heap keeping the top-n (prediction, actual) pairs + + @property + def heap_size(self): + """Gets the heap size maintained in the class.""" + return len(self._heap) + + @property + def num_accumulated_positives(self): + """Gets the number of positive samples that have been accumulated.""" + return self._total_positives + + def accumulate(self, predictions, actuals, num_positives=None): + """Accumulate the predictions and their ground truth labels. + + After the function call, we may call peek_ap_at_n to actually calculate + the average precision. + Note predictions and actuals must have the same shape. + + Args: + predictions: a list storing the prediction scores. + actuals: a list storing the ground truth labels. Any value + larger than 0 will be treated as positives, otherwise as negatives. + num_positives: If the 'predictions' and 'actuals' inputs aren't complete, + then it's possible some true positives were missed in them. In that case, + you can provide 'num_positives' in order to accurately track recall. + + Raises: + ValueError: An error occurred when the format of the input is not the + numpy 1-D array or the shape of predictions and actuals does not match. + """ + if len(predictions) != len(actuals): + raise ValueError( + "the shape of predictions and actuals does not match.") + + if num_positives is not None: + if not isinstance(num_positives, + numbers.Number) or num_positives < 0: + raise ValueError( + "'num_positives' was provided but it wasn't a nonnegative number." + ) + + if num_positives is not None: + self._total_positives += num_positives + else: + self._total_positives += numpy.size(numpy.where(actuals > 0)) + topk = self._top_n + heap = self._heap + + for i in range(numpy.size(predictions)): + if topk is None or len(heap) < topk: + heapq.heappush(heap, (predictions[i], actuals[i])) + else: + if predictions[i] > heap[0][0]: # heap[0] is the smallest + heapq.heappop(heap) + heapq.heappush(heap, (predictions[i], actuals[i])) + + def clear(self): + """Clear the accumulated predictions.""" + self._heap = [] + self._total_positives = 0 + + def peek_ap_at_n(self): + """Peek the non-interpolated average precision at n.
+ + Returns: + The non-interpolated average precision at n (default 0). + If n is larger than the length of the ranked list, + the average precision will be returned. + """ + if self.heap_size <= 0: + return 0 + predlists = numpy.array(list(zip(*self._heap))) + + ap = self.ap_at_n( + predlists[0], + predlists[1], + n=self._top_n, + total_num_positives=self._total_positives) + return ap + + @staticmethod + def ap(predictions, actuals): + """Calculate the non-interpolated average precision. + + Args: + predictions: a numpy 1-D array storing the sparse prediction scores. + actuals: a numpy 1-D array storing the ground truth labels. Any value + larger than 0 will be treated as positives, otherwise as negatives. + + Returns: + The non-interpolated average precision over the full ranked list. + + Raises: + ValueError: An error occurred when the format of the input is not the + numpy 1-D array or the shape of predictions and actuals does not match. + """ + return AveragePrecisionCalculator.ap_at_n(predictions, actuals, n=None) + + @staticmethod + def ap_at_n(predictions, actuals, n=20, total_num_positives=None): + """Calculate the non-interpolated average precision. + + Args: + predictions: a numpy 1-D array storing the sparse prediction scores. + actuals: a numpy 1-D array storing the ground truth labels. Any value + larger than 0 will be treated as positives, otherwise as negatives. + n: the top n items to be considered in ap@n. + total_num_positives: (optional) the total number of positives in the + list. If specified, it will be used in the calculation. + + Returns: + The non-interpolated average precision at n. + If n is larger than the length of the ranked list, + the average precision will be returned. + + Raises: + ValueError: An error occurred when + 1) the format of the input is not the numpy 1-D array; + 2) the shape of predictions and actuals does not match; + 3) the input n is not a positive integer. + """ + if len(predictions) != len(actuals): + raise ValueError( + "the shape of predictions and actuals does not match.") + + if n is not None: + if not isinstance(n, int) or n <= 0: + raise ValueError("n must be 'None' or a positive integer." + " It was '%s'." % n) + + ap = 0.0 + + predictions = numpy.array(predictions) + actuals = numpy.array(actuals) + + # add a shuffler to avoid overestimating the ap + predictions, actuals = AveragePrecisionCalculator._shuffle(predictions, + actuals) + sortidx = sorted( + range(len(predictions)), key=lambda k: predictions[k], reverse=True) + + if total_num_positives is None: + numpos = numpy.size(numpy.where(actuals > 0)) + else: + numpos = total_num_positives + + if numpos == 0: + return 0 + + if n is not None: + numpos = min(numpos, n) + delta_recall = 1.0 / numpos + poscount = 0.0 + + # calculate the ap + r = len(sortidx) + if n is not None: + r = min(r, n) + for i in range(r): + if actuals[sortidx[i]] > 0: + poscount += 1 + ap += poscount / (i + 1) * delta_recall + return ap + + @staticmethod + def _shuffle(predictions, actuals): + random.seed(0) + shuffle_idx = random.sample(range(len(predictions)), len(predictions)) + predictions = predictions[shuffle_idx] + actuals = actuals[shuffle_idx] + return predictions, actuals + + @staticmethod + def _zero_one_normalize(predictions, epsilon=1e-7): + """Normalize the predictions to the range between 0.0 and 1.0.
+ + For some predictions like SVM predictions, we need to normalize them before + calculating the interpolated average precision. The normalization will not + change the rank in the original list and thus won't change the average + precision. + + Args: + predictions: a numpy 1-D array storing the sparse prediction scores. + epsilon: a small constant to avoid denominator being zero. + + Returns: + The normalized prediction. + """ + denominator = numpy.max(predictions) - numpy.min(predictions) + ret = (predictions - numpy.min(predictions)) / numpy.maximum(denominator, + epsilon) + return ret diff --git a/fluid/PaddleCV/video/metrics/youtube8m/eval_util.py b/fluid/PaddleCV/video/metrics/youtube8m/eval_util.py new file mode 100644 index 0000000000000000000000000000000000000000..f7742236f1176073eae84fdc7c3a3a1a2e294fe0 --- /dev/null +++ b/fluid/PaddleCV/video/metrics/youtube8m/eval_util.py @@ -0,0 +1,245 @@ +# Copyright 2016 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS-IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Provides functions to help with evaluating models.""" +import datetime +import numpy + +from . import mean_average_precision_calculator as map_calculator +from . import average_precision_calculator as ap_calculator + + +def flatten(l): + """ Merges a list of lists into a single list. """ + return [item for sublist in l for item in sublist] + + +def calculate_hit_at_one(predictions, actuals): + """Performs a local (numpy) calculation of the hit at one. + + Args: + predictions: Matrix containing the outputs of the model. + Dimensions are 'batch' x 'num_classes'. + actuals: Matrix containing the ground truth labels. + Dimensions are 'batch' x 'num_classes'. + + Returns: + float: The average hit at one across the entire batch. + """ + top_prediction = numpy.argmax(predictions, 1) + hits = actuals[numpy.arange(actuals.shape[0]), top_prediction] + return numpy.average(hits) + + +def calculate_precision_at_equal_recall_rate(predictions, actuals): + """Performs a local (numpy) calculation of the PERR. + + Args: + predictions: Matrix containing the outputs of the model. + Dimensions are 'batch' x 'num_classes'. + actuals: Matrix containing the ground truth labels. + Dimensions are 'batch' x 'num_classes'. + + Returns: + float: The average precision at equal recall rate across the entire batch. + """ + aggregated_precision = 0.0 + num_videos = actuals.shape[0] + for row in numpy.arange(num_videos): + num_labels = int(numpy.sum(actuals[row])) + top_indices = numpy.argpartition(predictions[row], + -num_labels)[-num_labels:] + item_precision = 0.0 + for label_index in top_indices: + if predictions[row][label_index] > 0: + item_precision += actuals[row][label_index] + item_precision /= top_indices.size + aggregated_precision += item_precision + aggregated_precision /= num_videos + return aggregated_precision + + +def calculate_gap(predictions, actuals, top_k=20): + """Performs a local (numpy) calculation of the global average precision. + + Only the top_k predictions are taken for each of the videos.
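+ All retained (prediction, label) pairs from the whole batch are then merged
+ into one ranked list and a single average precision is computed over it,
+ which is what makes this a "global" AP rather than a per-class mean.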
+ + Args: + predictions: Matrix containing the outputs of the model. + Dimensions are 'batch' x 'num_classes'. + actuals: Matrix containing the ground truth labels. + Dimensions are 'batch' x 'num_classes'. + top_k: How many predictions to use per video. + + Returns: + float: The global average precision. + """ + gap_calculator = ap_calculator.AveragePrecisionCalculator() + sparse_predictions, sparse_labels, num_positives = top_k_by_class( + predictions, actuals, top_k) + gap_calculator.accumulate( + flatten(sparse_predictions), flatten(sparse_labels), sum(num_positives)) + return gap_calculator.peek_ap_at_n() + + +def top_k_by_class(predictions, labels, k=20): + """Extracts the top k predictions for each video, sorted by class. + + Args: + predictions: A numpy matrix containing the outputs of the model. + Dimensions are 'batch' x 'num_classes'. + labels: A numpy matrix containing the ground truth labels. + Dimensions are 'batch' x 'num_classes'. + k: the top k non-zero entries to preserve in each prediction. + + Returns: + A tuple (predictions, labels, true_positives). 'predictions' and 'labels' + are lists of lists of floats. 'true_positives' is a list of scalars. The + length of the lists are equal to the number of classes. The entries in the + predictions variable are probability predictions, and + the corresponding entries in the labels variable are the ground truth for + those predictions. The entries in 'true_positives' are the number of true + positives for each class in the ground truth. + + Raises: + ValueError: An error occurred when the k is not a positive integer. + """ + if k <= 0: + raise ValueError("k must be a positive integer.") + k = min(k, predictions.shape[1]) + num_classes = predictions.shape[1] + prediction_triplets = [] + for video_index in range(predictions.shape[0]): + prediction_triplets.extend( + top_k_triplets(predictions[video_index], labels[video_index], k)) + out_predictions = [[] for v in range(num_classes)] + out_labels = [[] for v in range(num_classes)] + for triplet in prediction_triplets: + out_predictions[triplet[0]].append(triplet[1]) + out_labels[triplet[0]].append(triplet[2]) + out_true_positives = [numpy.sum(labels[:, i]) for i in range(num_classes)] + + return out_predictions, out_labels, out_true_positives + + +def top_k_triplets(predictions, labels, k=20): + """Get the top_k for a 1-d numpy array. Returns a sparse list of tuples in + (class_index, prediction, label) format""" + m = len(predictions) + k = min(k, m) + indices = numpy.argpartition(predictions, -k)[-k:] + return [(index, predictions[index], labels[index]) for index in indices] + + +class EvaluationMetrics(object): + """A class to store the evaluation metrics.""" + + def __init__(self, num_class, top_k): + """Construct an EvaluationMetrics object to store the evaluation metrics. + + Args: + num_class: A positive integer specifying the number of classes. + top_k: A positive integer specifying how many predictions are considered per video. + + Raises: + ValueError: An error occurred when MeanAveragePrecisionCalculator cannot + be constructed. + """ + self.sum_hit_at_one = 0.0 + self.sum_perr = 0.0 + self.sum_loss = 0.0 + self.map_calculator = map_calculator.MeanAveragePrecisionCalculator( + num_class) + self.global_ap_calculator = ap_calculator.AveragePrecisionCalculator() + self.top_k = top_k + self.num_examples = 0 + + def accumulate(self, loss, predictions, labels): + """Accumulate the metrics calculated locally for this mini-batch. + + Args: + predictions: A numpy matrix containing the outputs of the model.
+ Dimensions are 'batch' x 'num_classes'. + labels: A numpy matrix containing the ground truth labels. + Dimensions are 'batch' x 'num_classes'. + loss: A numpy array containing the loss for each sample. + + Returns: + dictionary: A dictionary storing the metrics for the mini-batch. + + Raises: + ValueError: An error occurred when the shape of predictions and actuals + does not match. + """ + batch_size = labels.shape[0] + mean_hit_at_one = calculate_hit_at_one(predictions, labels) + mean_perr = calculate_precision_at_equal_recall_rate(predictions, + labels) + mean_loss = numpy.mean(loss) + + # Take the top 20 predictions. + sparse_predictions, sparse_labels, num_positives = top_k_by_class( + predictions, labels, self.top_k) + self.map_calculator.accumulate(sparse_predictions, sparse_labels, + num_positives) + self.global_ap_calculator.accumulate( + flatten(sparse_predictions), + flatten(sparse_labels), sum(num_positives)) + + self.num_examples += batch_size + self.sum_hit_at_one += mean_hit_at_one * batch_size + self.sum_perr += mean_perr * batch_size + self.sum_loss += mean_loss * batch_size + + return { + "hit_at_one": mean_hit_at_one, + "perr": mean_perr, + "loss": mean_loss + } + + def get(self): + """Calculate the evaluation metrics for the whole epoch. + + Raises: + ValueError: If no examples were accumulated. + + Returns: + dictionary: a dictionary storing the evaluation metrics for the epoch. The + dictionary has the fields: avg_hit_at_one, avg_perr, avg_loss, and + aps (default nan). + """ + if self.num_examples <= 0: + raise ValueError("total_sample must be positive.") + avg_hit_at_one = self.sum_hit_at_one / self.num_examples + avg_perr = self.sum_perr / self.num_examples + avg_loss = self.sum_loss / self.num_examples + + aps = self.map_calculator.peek_map_at_n() + gap = self.global_ap_calculator.peek_ap_at_n() + + epoch_info_dict = {} + return { + "avg_hit_at_one": avg_hit_at_one, + "avg_perr": avg_perr, + "avg_loss": avg_loss, + "aps": aps, + "gap": gap + } + + def clear(self): + """Clear the evaluation metrics and reset the EvaluationMetrics object.""" + self.sum_hit_at_one = 0.0 + self.sum_perr = 0.0 + self.sum_loss = 0.0 + self.map_calculator.clear() + self.global_ap_calculator.clear() + self.num_examples = 0 diff --git a/fluid/PaddleCV/video/metrics/youtube8m/mean_average_precision_calculator.py b/fluid/PaddleCV/video/metrics/youtube8m/mean_average_precision_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..0ae8b0ed3717aba13b7ed35b4af025be40423967 --- /dev/null +++ b/fluid/PaddleCV/video/metrics/youtube8m/mean_average_precision_calculator.py @@ -0,0 +1,114 @@ +# Copyright 2016 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS-IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Calculate the mean average precision. + +It provides an interface for calculating mean average precision +for an entire list or the top-n ranked items. + +Example usages: +We first call the function accumulate many times to process parts of the ranked +list. 
After processing all the parts, we call peek_map_at_n +to calculate the mean average precision. + +``` +import random + +p = np.array([[random.random() for _ in xrange(50)] for _ in xrange(1000)]) +a = np.array([[random.choice([0, 1]) for _ in xrange(50)] + for _ in xrange(1000)]) + +# mean average precision for 50 classes. +calculator = mean_average_precision_calculator.MeanAveragePrecisionCalculator( + num_class=50) +calculator.accumulate(p, a) +aps = calculator.peek_map_at_n() +``` +""" + +import numpy +from . import average_precision_calculator + + +class MeanAveragePrecisionCalculator(object): + """This class calculates the mean average precision. + """ + + def __init__(self, num_class): + """Construct a calculator to calculate the (macro) average precision. + + Args: + num_class: A positive Integer specifying the number of classes. + + Raises: + ValueError: An error occurred when num_class is not a positive integer. + """ + if not isinstance(num_class, int) or num_class <= 1: + raise ValueError("num_class must be an integer larger than 1.") + + self._ap_calculators = [] # one AveragePrecisionCalculator per class + self._num_class = num_class # total number of classes + for i in range(num_class): + self._ap_calculators.append( + average_precision_calculator.AveragePrecisionCalculator()) + + def accumulate(self, predictions, actuals, num_positives=None): + """Accumulate the predictions and their ground truth labels. + + Args: + predictions: A list of lists storing the prediction scores. The outer + dimension corresponds to classes. + actuals: A list of lists storing the ground truth labels. The dimensions + should correspond to the predictions input. Any value + larger than 0 will be treated as positives, otherwise as negatives. + num_positives: If provided, it is a list of numbers representing the + number of true positives for each class. If not provided, the number of + true positives will be inferred from the 'actuals' array. + + Raises: + ValueError: An error occurred when the shape of predictions and actuals + does not match. + """ + if not num_positives: + num_positives = [None for _ in range(len(predictions))] + + calculators = self._ap_calculators + for i in range(len(predictions)): + calculators[i].accumulate(predictions[i], actuals[i], + num_positives[i]) + + def clear(self): + for calculator in self._ap_calculators: + calculator.clear() + + def is_empty(self): + return ([calculator.heap_size for calculator in self._ap_calculators] == + [0 for _ in range(self._num_class)]) + + def peek_map_at_n(self): + """Peek the non-interpolated mean average precision at n. + + Returns: + An array of non-interpolated average precision at n (default 0) for each + class.
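+ Averaging these per-class values (e.g. numpy.mean over the valid entries)
+ yields the final mAP score.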
+ """ + aps = [ + self._ap_calculators[i].peek_ap_at_n() + for i in range(self._num_class) + ] + return aps diff --git a/fluid/PaddleCV/video/models/__init__.py b/fluid/PaddleCV/video/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ae3da375a60279edd0b8be104ef399d9b0c7d725 --- /dev/null +++ b/fluid/PaddleCV/video/models/__init__.py @@ -0,0 +1,13 @@ +from .model import regist_model, get_model +from .attention_cluster import AttentionCluster +from .nextvlad import NEXTVLAD +from .tsn import TSN +from .stnet import STNET +from .attention_lstm import AttentionLSTM + +# regist models +regist_model("AttentionCluster", AttentionCluster) +regist_model("NEXTVLAD", NEXTVLAD) +regist_model("TSN", TSN) +regist_model("STNET", STNET) +regist_model("AttentionLSTM", AttentionLSTM) diff --git a/fluid/PaddleCV/video/models/attention_cluster/__init__.py b/fluid/PaddleCV/video/models/attention_cluster/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..bd7ef3d595d3b3d286d765a24a0c44c4b29dc6c6 --- /dev/null +++ b/fluid/PaddleCV/video/models/attention_cluster/__init__.py @@ -0,0 +1,3 @@ +from __future__ import absolute_import + +from .attention_cluster import * diff --git a/fluid/PaddleCV/video/models/attention_cluster/attention_cluster.py b/fluid/PaddleCV/video/models/attention_cluster/attention_cluster.py new file mode 100755 index 0000000000000000000000000000000000000000..84282544c95b21231a043e22f2c8dadd25579e8f --- /dev/null +++ b/fluid/PaddleCV/video/models/attention_cluster/attention_cluster.py @@ -0,0 +1,139 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import paddle.fluid as fluid +from paddle.fluid import ParamAttr + +from ..model import ModelBase +from .shifting_attention import ShiftingAttentionModel +from .logistic_model import LogisticModel + +__all__ = ["AttentionCluster"] + + +class AttentionCluster(ModelBase): + def __init__(self, name, cfg, mode='train'): + super(AttentionCluster, self).__init__(name, cfg, mode) + self.get_config() + + def get_config(self): + # get model configs + self.feature_num = self.cfg.MODEL.feature_num + self.feature_names = self.cfg.MODEL.feature_names + self.feature_dims = self.cfg.MODEL.feature_dims + self.cluster_nums = self.cfg.MODEL.cluster_nums + self.seg_num = self.cfg.MODEL.seg_num + self.class_num = self.cfg.MODEL.num_classes + self.drop_rate = self.cfg.MODEL.drop_rate + + if self.mode == 'train': + self.learning_rate = self.get_config_from_sec('train', + 'learning_rate', 1e-3) + + def build_input(self, use_pyreader): + if use_pyreader: + assert self.mode != 'infer', \ + 'pyreader is not recommendated when infer, please set use_pyreader to be false.' 
+ shapes = [] + for dim in self.feature_dims: + shapes.append([-1, self.seg_num, dim]) + shapes.append([-1, self.class_num]) # label + self.py_reader = fluid.layers.py_reader( + capacity=1024, + shapes=shapes, + lod_levels=[0] * (self.feature_num + 1), + dtypes=['float32'] * (self.feature_num + 1), + name='train_py_reader' + if self.is_training else 'test_py_reader', + use_double_buffer=True) + inputs = fluid.layers.read_file(self.py_reader) + self.feature_input = inputs[:self.feature_num] + self.label_input = inputs[-1] + else: + self.feature_input = [] + for name, dim in zip(self.feature_names, self.feature_dims): + self.feature_input.append( + fluid.layers.data( + shape=[self.seg_num, dim], dtype='float32', name=name)) + if self.mode == 'infer': + self.label_input = None + else: + self.label_input = fluid.layers.data( + shape=[self.class_num], dtype='float32', name='label') + + def build_model(self): + att_outs = [] + for i, (input_dim, cluster_num, feature) in enumerate( + zip(self.feature_dims, self.cluster_nums, self.feature_input)): + att = ShiftingAttentionModel(input_dim, self.seg_num, cluster_num, + "satt{}".format(i)) + att_out = att.forward(feature) + att_outs.append(att_out) + out = fluid.layers.concat(att_outs, axis=1) + + if self.drop_rate > 0.: + out = fluid.layers.dropout( + out, self.drop_rate, is_test=(not self.is_training)) + + fc1 = fluid.layers.fc( + out, + size=1024, + act='tanh', + param_attr=ParamAttr( + name="fc1.weights", + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=ParamAttr( + name="fc1.bias", initializer=fluid.initializer.MSRA())) + fc2 = fluid.layers.fc( + fc1, + size=4096, + act='tanh', + param_attr=ParamAttr( + name="fc2.weights", + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=ParamAttr( + name="fc2.bias", initializer=fluid.initializer.MSRA())) + + aggregate_model = LogisticModel() + + self.output, self.logit = aggregate_model.build_model( + model_input=fc2, + vocab_size=self.class_num, + is_training=self.is_training) + + def optimizer(self): + assert self.mode == 'train', "optimizer only can be get in train mode" + return fluid.optimizer.AdamOptimizer(self.learning_rate) + + def loss(self): + assert self.mode != 'infer', "invalid loss calculationg in infer mode" + cost = fluid.layers.sigmoid_cross_entropy_with_logits( + x=self.logit, label=self.label_input) + cost = fluid.layers.reduce_sum(cost, dim=-1) + self.loss_ = fluid.layers.mean(x=cost) + return self.loss_ + + def outputs(self): + return [self.output, self.logit] + + def feeds(self): + return self.feature_input if self.mode == 'infer' else self.feature_input + [ + self.label_input + ] + + def weights_info(self): + return ( + "attention_cluster_youtube8m", + "https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz" + ) diff --git a/fluid/PaddleCV/video/models/attention_cluster/logistic_model.py b/fluid/PaddleCV/video/models/attention_cluster/logistic_model.py new file mode 100755 index 0000000000000000000000000000000000000000..6fad2a44ffc7df2049eeb04341a88b9c342c70ce --- /dev/null +++ b/fluid/PaddleCV/video/models/attention_cluster/logistic_model.py @@ -0,0 +1,47 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. 
+#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import paddle +import paddle.fluid as fluid + + +class LogisticModel(object): + """Logistic model.""" + def build_model(self, + model_input, + vocab_size, + **unused_params): + """Creates a logistic model. + + Args: + model_input: 'batch' x 'num_features' matrix of input features. + vocab_size: The number of classes in the dataset. + + Returns: + A dictionary with a tensor containing the probability predictions of the + model in the 'predictions' key. The dimensions of the tensor are + batch_size x num_classes.""" + logit = fluid.layers.fc( + input=model_input, + size=vocab_size, + act=None, + name='logits_clf', + param_attr=fluid.ParamAttr( + name='logistic.weights', + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=fluid.ParamAttr( + name='logistic.bias', + initializer=fluid.initializer.MSRA(uniform=False))) + output = fluid.layers.sigmoid(logit) + return output, logit diff --git a/fluid/PaddleCV/video/models/attention_cluster/shifting_attention.py b/fluid/PaddleCV/video/models/attention_cluster/shifting_attention.py new file mode 100755 index 0000000000000000000000000000000000000000..e27ad8dd58b882eb96fbb9763eecccc36ddfe28a --- /dev/null +++ b/fluid/PaddleCV/video/models/attention_cluster/shifting_attention.py @@ -0,0 +1,95 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import paddle.fluid as fluid +from paddle.fluid import ParamAttr +import numpy as np + + +class ShiftingAttentionModel(object): + """Shifting Attention Model""" + + def __init__(self, input_dim, seg_num, n_att, name): + self.n_att = n_att + self.input_dim = input_dim + self.seg_num = seg_num + self.name = name + self.gnorm = np.sqrt(n_att) + + def softmax_m1(self, x): + x_shape = fluid.layers.shape(x) + x_shape.stop_gradient = True + flat_x = fluid.layers.reshape(x, shape=(-1, self.seg_num)) + flat_softmax = fluid.layers.softmax(flat_x) + return fluid.layers.reshape( + flat_softmax, shape=x.shape, actual_shape=x_shape) + + def glorot(self, n): + return np.sqrt(1.0 / np.sqrt(n)) + + def forward(self, x): + """Forward shifting attention model. + + Args: + x: input features in shape of [N, L, F]. 
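+            Here N is the batch size, L is the number of segments (seg_num)
+            and F is the feature dimension (input_dim); C below denotes the
+            number of attention clusters (n_att).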
+ + Returns: + out: output features in shape of [N, F * C] + """ + + trans_x = fluid.layers.transpose(x, perm=[0, 2, 1]) + # scores and weight in shape [N, C, L], sum(weights, -1) = 1 + trans_x = fluid.layers.unsqueeze(trans_x, [-1]) + scores = fluid.layers.conv2d( + trans_x, + self.n_att, + filter_size=1, + param_attr=ParamAttr( + name=self.name + ".conv.weight", + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=ParamAttr( + name=self.name + ".conv.bias", + initializer=fluid.initializer.MSRA())) + scores = fluid.layers.squeeze(scores, [-1]) + weights = self.softmax_m1(scores) + + glrt = self.glorot(self.n_att) + self.w = fluid.layers.create_parameter( + shape=(self.n_att, ), + dtype=x.dtype, + attr=ParamAttr(self.name + ".shift_w"), + default_initializer=fluid.initializer.Normal(0.0, glrt)) + self.b = fluid.layers.create_parameter( + shape=(self.n_att, ), + dtype=x.dtype, + attr=ParamAttr(name=self.name + ".shift_b"), + default_initializer=fluid.initializer.Normal(0.0, glrt)) + + outs = [] + for i in range(self.n_att): + # slice weight and expand to shape [N, L, C] + weight = fluid.layers.slice( + weights, axes=[1], starts=[i], ends=[i + 1]) + weight = fluid.layers.transpose(weight, perm=[0, 2, 1]) + weight = fluid.layers.expand(weight, [1, 1, self.input_dim]) + + w_i = fluid.layers.slice(self.w, axes=[0], starts=[i], ends=[i + 1]) + b_i = fluid.layers.slice(self.b, axes=[0], starts=[i], ends=[i + 1]) + shift = fluid.layers.reduce_sum(x * weight, dim=1) * w_i + b_i + + l2_norm = fluid.layers.l2_normalize(shift, axis=-1) + outs.append(l2_norm / self.gnorm) + + out = fluid.layers.concat(outs, axis=1) + return out diff --git a/fluid/PaddleCV/video/models/attention_lstm/__init__.py b/fluid/PaddleCV/video/models/attention_lstm/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..cb872f0e43ab52054b42970896e5791a0eeb691d --- /dev/null +++ b/fluid/PaddleCV/video/models/attention_lstm/__init__.py @@ -0,0 +1 @@ +from .attention_lstm import * diff --git a/fluid/PaddleCV/video/models/attention_lstm/attention_lstm.py b/fluid/PaddleCV/video/models/attention_lstm/attention_lstm.py new file mode 100755 index 0000000000000000000000000000000000000000..88bb6f334e8f32ff93e037b61fb7bcf673268f76 --- /dev/null +++ b/fluid/PaddleCV/video/models/attention_lstm/attention_lstm.py @@ -0,0 +1,150 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
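For intuition, here is a minimal numpy sketch of what a single head of the ShiftingAttentionModel above computes per sample; the function name, parameters and shapes are illustrative only, not part of the repository:

```python
import numpy as np

def shifting_attention_head(x, score_w, score_b, w_i, b_i):
    """One attention head: softmax-weighted sum over segments, then shift."""
    scores = x @ score_w + score_b             # [L] per-segment attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over the L segments
    shifted = w_i * (weights @ x) + b_i        # [F] weighted sum, scaled and shifted
    return shifted / np.linalg.norm(shifted)   # l2-normalize (the fluid model also
                                               # divides by sqrt(C) before concat)

rng = np.random.RandomState(0)
x = rng.randn(10, 128)                         # L=10 segments, F=128 feature dim
out = shifting_attention_head(x, rng.randn(128), 0.1, 0.5, 0.05)
print(out.shape)                               # (128,); C heads get concatenated
```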
+ +import paddle.fluid as fluid +from paddle.fluid import ParamAttr + +from ..model import ModelBase +from .lstm_attention import LSTMAttentionModel + +__all__ = ["AttentionLSTM"] + + +class AttentionLSTM(ModelBase): + def __init__(self, name, cfg, mode='train'): + super(AttentionLSTM, self).__init__(name, cfg, mode) + self.get_config() + + def get_config(self): + # get model configs + self.feature_num = self.cfg.MODEL.feature_num + self.feature_names = self.cfg.MODEL.feature_names + self.feature_dims = self.cfg.MODEL.feature_dims + self.num_classes = self.cfg.MODEL.num_classes + self.embedding_size = self.cfg.MODEL.embedding_size + self.lstm_size = self.cfg.MODEL.lstm_size + self.drop_rate = self.cfg.MODEL.drop_rate + + # get mode configs + self.batch_size = self.get_config_from_sec(self.mode, 'batch_size', 1) + self.num_gpus = self.get_config_from_sec(self.mode, 'num_gpus', 1) + + if self.mode == 'train': + self.learning_rate = self.get_config_from_sec('train', + 'learning_rate', 1e-3) + self.weight_decay = self.get_config_from_sec('train', + 'weight_decay', 8e-4) + self.num_samples = self.get_config_from_sec('train', 'num_samples', + 5000000) + self.decay_epochs = self.get_config_from_sec('train', + 'decay_epochs', [5]) + self.decay_gamma = self.get_config_from_sec('train', 'decay_gamma', + 0.1) + + def build_input(self, use_pyreader): + if use_pyreader: + assert self.mode != 'infer', \ + 'pyreader is not recommendated when infer, please set use_pyreader to be false.' + shapes = [] + for dim in self.feature_dims: + shapes.append([-1, dim]) + shapes.append([-1, self.num_classes]) # label + self.py_reader = fluid.layers.py_reader( + capacity=1024, + shapes=shapes, + lod_levels=[1] * self.feature_num + [0], + dtypes=['float32'] * (self.feature_num + 1), + name='train_py_reader' + if self.is_training else 'test_py_reader', + use_double_buffer=True) + inputs = fluid.layers.read_file(self.py_reader) + self.feature_input = inputs[:self.feature_num] + self.label_input = inputs[-1] + else: + self.feature_input = [] + for name, dim in zip(self.feature_names, self.feature_dims): + self.feature_input.append( + fluid.layers.data( + shape=[dim], lod_level=1, dtype='float32', name=name)) + if self.mode == 'infer': + self.label_input = None + else: + self.label_input = fluid.layers.data( + shape=[self.num_classes], dtype='float32', name='label') + + def build_model(self): + att_outs = [] + for i, (input_dim, feature + ) in enumerate(zip(self.feature_dims, self.feature_input)): + att = LSTMAttentionModel(input_dim, self.embedding_size, + self.lstm_size, self.drop_rate) + att_out = att.forward(feature, is_training=(self.mode == 'train')) + att_outs.append(att_out) + out = fluid.layers.concat(att_outs, axis=1) + + fc1 = fluid.layers.fc( + input=out, + size=8192, + act='relu', + bias_attr=ParamAttr( + regularizer=fluid.regularizer.L2Decay(0.0), + initializer=fluid.initializer.NormalInitializer(scale=0.0))) + fc2 = fluid.layers.fc( + input=fc1, + size=4096, + act='tanh', + bias_attr=ParamAttr( + regularizer=fluid.regularizer.L2Decay(0.0), + initializer=fluid.initializer.NormalInitializer(scale=0.0))) + + self.logit = fluid.layers.fc(input=fc2, size=self.num_classes, act=None, \ + bias_attr=ParamAttr(regularizer=fluid.regularizer.L2Decay(0.0), + initializer=fluid.initializer.NormalInitializer(scale=0.0))) + + self.output = fluid.layers.sigmoid(self.logit) + + def optimizer(self): + assert self.mode == 'train', "optimizer only can be get in train mode" + values = [ + self.learning_rate * (self.decay_gamma**i) 
+ for i in range(len(self.decay_epochs) + 1) + ] + iter_per_epoch = self.num_samples / self.batch_size + boundaries = [e * iter_per_epoch for e in self.decay_epochs] + return fluid.optimizer.RMSProp( + learning_rate=fluid.layers.piecewise_decay( + values=values, boundaries=boundaries), + centered=True, + regularization=fluid.regularizer.L2Decay(self.weight_decay)) + + def loss(self): + assert self.mode != 'infer', "invalid loss calculationg in infer mode" + cost = fluid.layers.sigmoid_cross_entropy_with_logits( + x=self.logit, label=self.label_input) + cost = fluid.layers.reduce_sum(cost, dim=-1) + sum_cost = fluid.layers.reduce_sum(cost) + self.loss_ = fluid.layers.scale( + sum_cost, scale=self.num_gpus, bias_after_scale=False) + return self.loss_ + + def outputs(self): + return [self.output, self.logit] + + def feeds(self): + return self.feature_input if self.mode == 'infer' else self.feature_input + [ + self.label_input + ] + + def weights_info(self): + return (None, None) diff --git a/fluid/PaddleCV/video/models/attention_lstm/lstm_attention.py b/fluid/PaddleCV/video/models/attention_lstm/lstm_attention.py new file mode 100755 index 0000000000000000000000000000000000000000..5fce85c5c75ea176a3bf371de5f4eea5f02f25b3 --- /dev/null +++ b/fluid/PaddleCV/video/models/attention_lstm/lstm_attention.py @@ -0,0 +1,78 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
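As a side note on the AttentionLSTM optimizer above, the piecewise_decay schedule it assembles expands to plain Python as below; learning_rate, decay_gamma, decay_epochs and num_samples follow the config defaults read in get_config, while batch_size=128 is an illustrative value:

```python
# Sketch of the learning-rate schedule built in AttentionLSTM.optimizer.
learning_rate = 1e-3
decay_gamma = 0.1
decay_epochs = [5]
num_samples = 5000000
batch_size = 128  # illustrative; the real value comes from the config

values = [learning_rate * (decay_gamma ** i)
          for i in range(len(decay_epochs) + 1)]      # [1e-3, 1e-4]
iter_per_epoch = num_samples // batch_size
boundaries = [e * iter_per_epoch for e in decay_epochs]
print(boundaries, values)  # the lr drops by 10x after epoch 5
```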
+ +import paddle.fluid as fluid +from paddle.fluid import ParamAttr +import numpy as np + + +class LSTMAttentionModel(object): + """LSTM Attention Model""" + + def __init__(self, + input_dim, # feature dim of the input; unused here, kept for the callers' positional signature + embedding_size=512, + lstm_size=1024, + drop_rate=0.5): + self.lstm_size = lstm_size + self.embedding_size = embedding_size + self.drop_rate = drop_rate + + def forward(self, input, is_training): + input_fc = fluid.layers.fc( + input=input, + size=self.embedding_size, + act='tanh', + bias_attr=ParamAttr( + regularizer=fluid.regularizer.L2Decay(0.0), + initializer=fluid.initializer.NormalInitializer(scale=0.0))) + + lstm_forward_fc = fluid.layers.fc( + input=input_fc, + size=self.lstm_size * 4, + act=None, + bias_attr=ParamAttr( + regularizer=fluid.regularizer.L2Decay(0.0), + initializer=fluid.initializer.NormalInitializer(scale=0.0))) + lstm_forward, _ = fluid.layers.dynamic_lstm( + input=lstm_forward_fc, size=self.lstm_size * 4, is_reverse=False) + + lstm_backward_fc = fluid.layers.fc( + input=input_fc, + size=self.lstm_size * 4, + act=None, + bias_attr=ParamAttr( + regularizer=fluid.regularizer.L2Decay(0.0), + initializer=fluid.initializer.NormalInitializer(scale=0.0))) + lstm_backward, _ = fluid.layers.dynamic_lstm( + input=lstm_backward_fc, size=self.lstm_size * 4, is_reverse=True) + + lstm_concat = fluid.layers.concat( + input=[lstm_forward, lstm_backward], axis=1) + + lstm_dropout = fluid.layers.dropout( + x=lstm_concat, dropout_prob=self.drop_rate, is_test=(not is_training)) + + lstm_weight = fluid.layers.fc( + input=lstm_dropout, + size=1, + act='sequence_softmax', + bias_attr=ParamAttr( + regularizer=fluid.regularizer.L2Decay(0.0), + initializer=fluid.initializer.NormalInitializer(scale=0.0))) + scaled = fluid.layers.elementwise_mul( + x=lstm_dropout, y=lstm_weight, axis=0) + lstm_pool = fluid.layers.sequence_pool(input=scaled, pool_type='sum') + + return lstm_pool diff --git a/fluid/PaddleCV/video/models/model.py b/fluid/PaddleCV/video/models/model.py new file mode 100755 index 0000000000000000000000000000000000000000..44f888ef39ef1445fae0a6f0e3622002bf6cb66a --- /dev/null +++ b/fluid/PaddleCV/video/models/model.py @@ -0,0 +1,181 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License.
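The model.py file that follows defines the ModelBase interface and a ModelZoo registry; a hypothetical driver using its regist_model/get_model helpers might look like the sketch below, where `cfg` stands in for the parsed configuration object used throughout this package:

```python
# Hypothetical usage sketch for the registry defined below; `cfg` is assumed
# to be the parsed configuration (an AttrDict) loaded elsewhere in the repo.
from models import get_model  # as exported by models/__init__.py

model = get_model("AttentionCluster", cfg, mode='train')
model.build_input(use_pyreader=True)
model.build_model()
loss = model.loss()
model.optimizer().minimize(loss)
py_reader = model.pyreader()  # feed training batches through this reader
```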
+ +import os +import logging +try: + from configparser import ConfigParser +except ImportError: + from ConfigParser import ConfigParser + +import paddle.fluid as fluid +from datareader import get_reader +from metrics import get_metrics +from .utils import download, AttrDict + +WEIGHT_DIR = os.path.expanduser("~/.paddle/weights") + +logger = logging.getLogger(__name__) + + +class NotImplementError(Exception): + "Error: model function not implemented" + + def __init__(self, model, function): + super(NotImplementError, self).__init__() + self.model = model.__class__.__name__ + self.function = function.__name__ + + def __str__(self): + return "Function {}() is not implemented in model {}".format( + self.function, self.model) + + +class ModelNotFoundError(Exception): + "Error: model not found" + + def __init__(self, model_name, avail_models): + super(ModelNotFoundError, self).__init__() + self.model_name = model_name + self.avail_models = avail_models + + def __str__(self): + msg = "Model {} Not Found.\nAvailable models:\n".format( + self.model_name) + for model in self.avail_models: + msg += " {}\n".format(model) + return msg + + +class ModelBase(object): + def __init__(self, name, cfg, mode='train'): + assert mode in ['train', 'valid', 'test', 'infer'], \ + "Unknown mode type {}".format(mode) + self.name = name + self.is_training = (mode == 'train') + self.mode = mode + self.py_reader = None + self.cfg = cfg + + def build_model(self): + "build model struct" + raise NotImplementError(self, self.build_model) + + def build_input(self, use_pyreader): + "build input Variable" + raise NotImplementError(self, self.build_input) + + def optimizer(self): + "get model optimizer" + raise NotImplementError(self, self.optimizer) + + def outputs(self): + "get output variable" + raise NotImplementError(self, self.outputs) + + def loss(self): + "get loss variable" + raise NotImplementError(self, self.loss) + + def feeds(self): + "get feed inputs list" + raise NotImplementError(self, self.feeds) + + def weights_info(self): + "get model weight default path and download url" + raise NotImplementError(self, self.weights_info) + + def get_weights(self): + "get model weight file path, download weight from Paddle if not exist" + path, url = self.weights_info() + path = os.path.join(WEIGHT_DIR, path) + if os.path.exists(path): + return path + + logger.info("Download weights of {} from {}".format(self.name, url)) + download(url, path) + return path + + def pyreader(self): + return self.py_reader + + def epoch_num(self): + "get train epoch num" + return self.cfg.TRAIN.epoch + + def pretrain_info(self): + "get pretrain base model directory" + return (None, None) + + def get_pretrain_weights(self): + "get pretrain weight file path, download weight from Paddle if not exist" + path, url = self.pretrain_info() + if not path: + return None + + path = os.path.join(WEIGHT_DIR, path) + if os.path.exists(path): + return path + + logger.info("Download pretrain weights of {} from {}".format( + self.name, url)) + download(url, path) + return path + + def load_pretrain_params(self, exe, pretrain, prog, place): + logger.info("Load pretrain weights from {}".format(pretrain)) + fluid.io.load_params(exe, pretrain, main_program=prog) + + def get_config_from_sec(self, sec, item,
default=None): + if sec.upper() not in self.cfg: + return default + return self.cfg[sec.upper()].get(item, default) + + +class ModelZoo(object): + def __init__(self): + self.model_zoo = {} + + def regist(self, name, model): + assert model.__base__ == ModelBase, "Unknow model type {}".format( + type(model)) + self.model_zoo[name] = model + + def get(self, name, cfg, mode='train'): + for k, v in self.model_zoo.items(): + if k == name: + return v(name, cfg, mode) + raise ModelNotFoundError(name, self.model_zoo.keys()) + + +# singleton model_zoo +model_zoo = ModelZoo() + + +def regist_model(name, model): + model_zoo.regist(name, model) + + +def get_model(name, cfg, mode='train'): + return model_zoo.get(name, cfg, mode) + diff --git a/fluid/PaddleCV/video/models/nextvlad/__init__.py b/fluid/PaddleCV/video/models/nextvlad/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d9a233374a1ac7069280801413872e9227f820a8 --- /dev/null +++ b/fluid/PaddleCV/video/models/nextvlad/__init__.py @@ -0,0 +1,3 @@ +from __future__ import absolute_import + +from .nextvlad import * diff --git a/fluid/PaddleCV/video/models/nextvlad/clf_model.py b/fluid/PaddleCV/video/models/nextvlad/clf_model.py new file mode 100644 index 0000000000000000000000000000000000000000..70728dfb1139a1d32e6a5d921629ba018ed6cea9 --- /dev/null +++ b/fluid/PaddleCV/video/models/nextvlad/clf_model.py @@ -0,0 +1,50 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import paddle +import paddle.fluid as fluid + + +class LogisticModel(object): + """Logistic model with L2 regularization.""" + + def create_model(self, + model_input, + vocab_size, + l2_penalty=None, + **unused_params): + """Creates a logistic model. + + Args: + model_input: 'batch' x 'num_features' matrix of input features. + vocab_size: The number of classes in the dataset. + + Returns: + A dictionary with a tensor containing the probability predictions of the + model in the 'predictions' key. The dimensions of the tensor are + batch_size x num_classes.""" + logits = fluid.layers.fc( + input=model_input, + size=vocab_size, + act=None, + name='logits_clf', + param_attr=fluid.ParamAttr( + name='logits_clf_weights', + initializer=fluid.initializer.MSRA(uniform=False), + regularizer=fluid.regularizer.L2DecayRegularizer(l2_penalty)), + bias_attr=fluid.ParamAttr( + name='logits_clf_bias', + regularizer=fluid.regularizer.L2DecayRegularizer(l2_penalty))) + output = fluid.layers.sigmoid(logits) + return {'predictions': output, 'logits': logits} diff --git a/fluid/PaddleCV/video/models/nextvlad/nextvlad.py b/fluid/PaddleCV/video/models/nextvlad/nextvlad.py new file mode 100755 index 0000000000000000000000000000000000000000..62a96b5d1d61e3447699b5ec974662566c2e45f0 --- /dev/null +++ b/fluid/PaddleCV/video/models/nextvlad/nextvlad.py @@ -0,0 +1,167 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. 
+# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import paddle.fluid as fluid +from paddle.fluid import ParamAttr + +from ..model import ModelBase +from .clf_model import LogisticModel +from . import nextvlad_model + +__all__ = ["NEXTVLAD"] + + +class NEXTVLAD(ModelBase): + def __init__(self, name, cfg, mode='train'): + super(NEXTVLAD, self).__init__(name, cfg, mode=mode) + self.get_config() + + def get_config(self): + # model params + self.num_classes = self.get_config_from_sec('model', 'num_classes') + self.video_feature_size = self.get_config_from_sec('model', + 'video_feature_size') + self.audio_feature_size = self.get_config_from_sec('model', + 'audio_feature_size') + self.cluster_size = self.get_config_from_sec('model', 'cluster_size') + self.hidden_size = self.get_config_from_sec('model', 'hidden_size') + self.groups = self.get_config_from_sec('model', 'groups') + self.expansion = self.get_config_from_sec('model', 'expansion') + self.drop_rate = self.get_config_from_sec('model', 'drop_rate') + self.gating_reduction = self.get_config_from_sec('model', + 'gating_reduction') + self.eigen_file = self.get_config_from_sec('model', 'eigen_file') + # training params + self.base_learning_rate = self.get_config_from_sec('train', + 'learning_rate') + self.lr_boundary_examples = self.get_config_from_sec( + 'train', 'lr_boundary_examples') + self.max_iter = self.get_config_from_sec('train', 'max_iter') + self.learning_rate_decay = self.get_config_from_sec( + 'train', 'learning_rate_decay') + self.l2_penalty = self.get_config_from_sec('train', 'l2_penalty') + self.gradient_clip_norm = self.get_config_from_sec('train', + 'gradient_clip_norm') + self.use_gpu = self.get_config_from_sec('train', 'use_gpu') + self.num_gpus = self.get_config_from_sec('train', 'num_gpus') + + # other params + self.batch_size = self.get_config_from_sec(self.mode, 'batch_size') + + def build_input(self, use_pyreader=True): + rgb_shape = [self.video_feature_size] + audio_shape = [self.audio_feature_size] + label_shape = [self.num_classes] + if use_pyreader: + assert self.mode != 'infer', \ + 'pyreader is not recommendated when infer, please set use_pyreader to be false.' 
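+            # rgb and audio arrive as variable-length frame sequences, hence
+            # the lod_level of 1 for those two slots below; the multi-hot label
+            # is a fixed-size [num_classes] vector, hence lod_level 0.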
+ py_reader = fluid.layers.py_reader( + capacity=100, + shapes=[[-1] + rgb_shape, [-1] + audio_shape, + [-1] + label_shape], + lod_levels=[1, 1, 0], + dtypes=['float32', 'float32', 'float32'], + name='train_py_reader' + if self.is_training else 'test_py_reader', + use_double_buffer=True) + rgb, audio, label = fluid.layers.read_file(py_reader) + self.py_reader = py_reader + else: + rgb = fluid.layers.data( + name='train_rgb' if self.is_training else 'test_rgb', + shape=rgb_shape, + dtype='float32', + lod_level=1) + audio = fluid.layers.data( + name='train_audio' if self.is_training else 'test_audio', + shape=audio_shape, + dtype='float32', + lod_level=1) + if self.mode == 'infer': + label = None + else: + label = fluid.layers.data( + name='train_label' if self.is_training else 'test_label', + shape=label_shape, + dtype='float32') + self.feature_input = [rgb, audio] + self.label_input = label + + def create_model_args(self): + model_args = {} + model_args['class_dim'] = self.num_classes + model_args['cluster_size'] = self.cluster_size + model_args['hidden_size'] = self.hidden_size + model_args['groups'] = self.groups + model_args['expansion'] = self.expansion + model_args['drop_rate'] = self.drop_rate + model_args['gating_reduction'] = self.gating_reduction + model_args['l2_penalty'] = self.l2_penalty + return model_args + + def build_model(self): + model_args = self.create_model_args() + videomodel = nextvlad_model.NeXtVLADModel() + rgb = self.feature_input[0] + audio = self.feature_input[1] + out = videomodel.create_model( + rgb, audio, is_training=(self.mode == 'train'), **model_args) + self.logits = out['logits'] + self.predictions = out['predictions'] + self.network_outputs = [out['predictions']] + + def optimizer(self): + assert self.mode == 'train', "optimizer only can be get in train mode" + im_per_batch = self.batch_size + lr_bounds, lr_values = get_learning_rate_decay_list( + self.base_learning_rate, self.learning_rate_decay, self.max_iter, + self.lr_boundary_examples, im_per_batch) + return fluid.optimizer.AdamOptimizer( + learning_rate=fluid.layers.piecewise_decay( + boundaries=lr_bounds, values=lr_values)) + + def loss(self): + assert self.mode != 'infer', "invalid loss calculationg in infer mode" + cost = fluid.layers.sigmoid_cross_entropy_with_logits( + x=self.logits, label=self.label_input) + cost = fluid.layers.reduce_sum(cost, dim=-1) + self.loss_ = fluid.layers.mean(x=cost) + return self.loss_ + + def outputs(self): + return self.network_outputs + + def feeds(self): + return self.feature_input if self.mode == 'infer' else self.feature_input + [ + self.label_input + ] + + def weights_info(self): + return ('nextvlad_youtube8m', + 'https://paddlemodels.bj.bcebos.com/video_classification/nextvlad_youtube8m.tar.gz') + + +def get_learning_rate_decay_list(base_learning_rate, decay, max_iter, + decay_examples, total_batch_size): + decay_step = decay_examples // total_batch_size + lr_bounds = [] + lr_values = [base_learning_rate] + i = 1 + while True: + if i * decay_step >= max_iter: + break + lr_bounds.append(i * decay_step) + lr_values.append(base_learning_rate * (decay**i)) + i += 1 + return lr_bounds, lr_values diff --git a/fluid/PaddleCV/video/models/nextvlad/nextvlad_model.py b/fluid/PaddleCV/video/models/nextvlad/nextvlad_model.py new file mode 100644 index 0000000000000000000000000000000000000000..9e9efe83fd936989b0b94ff2fadaf487c37c86b7 --- /dev/null +++ b/fluid/PaddleCV/video/models/nextvlad/nextvlad_model.py @@ -0,0 +1,231 @@ +# Copyright (c) 2019 PaddlePaddle Authors. 
All Rights Reserved. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import numpy as np +import paddle +import paddle.fluid as fluid +from . import clf_model + + +class NeXtVLAD(object): + """ + This is a PaddlePaddle implementation of the NeXtVLAD model. For more + information, please refer to the paper, + https://static.googleusercontent.com/media/research.google.com/zh-CN//youtube8m/workshop2018/p_c03.pdf + """ + + def __init__(self, + feature_size, + cluster_size, + is_training=True, + expansion=2, + groups=None, + inputname='video'): + self.feature_size = feature_size + self.cluster_size = cluster_size + self.is_training = is_training + self.expansion = expansion + self.groups = groups + self.name = inputname + '_' + + def forward(self, input): + input = fluid.layers.fc( + input=input, + size=self.expansion * self.feature_size, + act=None, + name=self.name + 'fc_expansion', + param_attr=fluid.ParamAttr( + name=self.name + 'fc_expansion_w', + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=fluid.ParamAttr( + name=self.name + 'fc_expansion_b', + initializer=fluid.initializer.Constant(value=0.))) + + # attention factor for each group + attention = fluid.layers.fc( + input=input, + size=self.groups, + act='sigmoid', + name=self.name + 'fc_group_attention', + param_attr=fluid.ParamAttr( + name=self.name + 'fc_group_attention_w', + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=fluid.ParamAttr( + name=self.name + 'fc_group_attention_b', + initializer=fluid.initializer.Constant(value=0.))) + + # calculate the activation factor for each group and cluster + feature_size = self.feature_size * self.expansion // self.groups + cluster_weights = fluid.layers.create_parameter( + shape=[ + self.expansion * self.feature_size, + self.groups * self.cluster_size + ], + dtype=input.dtype, + attr=fluid.ParamAttr(name=self.name + 'cluster_weights'), + default_initializer=fluid.initializer.MSRA(uniform=False)) + + activation = fluid.layers.matmul(input, cluster_weights) + activation = fluid.layers.batch_norm( + activation, is_test=(not self.is_training)) + + # reshape activation + activation = fluid.layers.reshape(activation, + [-1, self.groups, self.cluster_size]) + # softmax over the cluster dimension + activation = fluid.layers.softmax(activation) + activation = fluid.layers.elementwise_mul(activation, attention, axis=0) + a_sum = fluid.layers.sequence_pool(activation, 'sum') + a_sum = fluid.layers.reduce_sum(a_sum, dim=1) + + # create cluster_weights2 + cluster_weights2 = fluid.layers.create_parameter( + shape=[self.cluster_size, feature_size], + dtype=input.dtype, + attr=fluid.ParamAttr(name=self.name + 'cluster_weights2'), + default_initializer=fluid.initializer.MSRA(uniform=False)) + + # expand a_sum dimension from [-1, self.cluster_size] to be [-1, self.cluster_size, feature_size] + a_sum = fluid.layers.reshape(a_sum, [-1, self.cluster_size, 1]) + a_sum = fluid.layers.expand(a_sum, [1, 1, feature_size]) + + # element-wise multiply a_sum and cluster_weights2 + a = fluid.layers.elementwise_mul( +
a_sum, cluster_weights2, + axis=1) # output shape [-1, self.cluster_size, feature_size] + + # transpose activation from [-1, self.groups, self.cluster_size] to [-1, self.cluster_size, self.groups] + activation2 = fluid.layers.transpose(activation, perm=[0, 2, 1]) + # transpose op will clear the lod information, so it should be reset + activation = fluid.layers.lod_reset(activation2, activation) + + # reshape input from [-1, self.expansion * self.feature_size] to [-1, self.groups, feature_size] + reshaped_input = fluid.layers.reshape(input, + [-1, self.groups, feature_size]) + # matrix multiply activation and reshaped_input + vlad = fluid.layers.matmul( + activation, + reshaped_input) # output shape [-1, self.cluster_size, feature_size] + vlad = fluid.layers.sequence_pool(vlad, 'sum') + vlad = fluid.layers.elementwise_sub(vlad, a) + + # l2 normalization + vlad = fluid.layers.transpose(vlad, [0, 2, 1]) + vlad = fluid.layers.l2_normalize(vlad, axis=1) + + # reshape and batch norm + vlad = fluid.layers.reshape(vlad, + [-1, self.cluster_size * feature_size]) + vlad = fluid.layers.batch_norm(vlad, is_test=(not self.is_training)) + + return vlad + + +class NeXtVLADModel(object): + """ + Creates a NeXtVLAD based model. + Args: + video_input, audio_input: LoDTensors of frame-level video / audio features. + class_dim: the number of classes in the dataset. + """ + + def __init__(self): + pass + + def create_model(self, + video_input, + audio_input, + is_training=True, + class_dim=None, + cluster_size=None, + hidden_size=None, + groups=None, + expansion=None, + drop_rate=None, + gating_reduction=None, + l2_penalty=None, + **unused_params): + + # calculate vlad for video and audio + video_nextvlad = NeXtVLAD( + 1024, + cluster_size, + is_training, + expansion=expansion, + groups=groups, + inputname='video') + audio_nextvlad = NeXtVLAD( + 128, + cluster_size, + is_training, + expansion=expansion, + groups=groups, + inputname='audio') + vlad_video = video_nextvlad.forward(video_input) + vlad_audio = audio_nextvlad.forward(audio_input) + + # concat video and audio + vlad = fluid.layers.concat([vlad_video, vlad_audio], axis=1) + + # dropout + if drop_rate > 0.: + vlad = fluid.layers.dropout( + vlad, drop_rate, is_test=(not is_training)) + + # add fc + activation = fluid.layers.fc( + input=vlad, + size=hidden_size, + act=None, + name='hidden1_fc', + param_attr=fluid.ParamAttr( + name='hidden1_fc_weights', + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=False) + activation = fluid.layers.batch_norm( + activation, is_test=(not is_training)) + + # add fc, gate 1 + gates = fluid.layers.fc( + input=activation, + size=hidden_size // gating_reduction, + act=None, + name='gating_fc1', + param_attr=fluid.ParamAttr( + name='gating_fc1_weights', + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=False) + gates = fluid.layers.batch_norm( + gates, is_test=(not is_training), act='relu') + + # add fc, gate 2 + gates = fluid.layers.fc( + input=gates, + size=hidden_size, + act='sigmoid', + name='gating_fc2', + param_attr=fluid.ParamAttr( + name='gating_fc2_weights', + initializer=fluid.initializer.MSRA(uniform=False)), + bias_attr=False) + + activation = fluid.layers.elementwise_mul(activation, gates) + aggregate_model = clf_model.LogisticModel # set classification model + + return aggregate_model().create_model( + model_input=activation, + vocab_size=class_dim, + l2_penalty=l2_penalty, + is_training=is_training, + **unused_params) diff --git a/fluid/PaddleCV/video/models/stnet/__init__.py
b/fluid/PaddleCV/video/models/stnet/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..db952550a12b34853556fa42bba04c823bc7cbe4 --- /dev/null +++ b/fluid/PaddleCV/video/models/stnet/__init__.py @@ -0,0 +1 @@ +from .stnet import * diff --git a/fluid/PaddleCV/video/models/stnet/stnet.py b/fluid/PaddleCV/video/models/stnet/stnet.py new file mode 100644 index 0000000000000000000000000000000000000000..e20ad0bd2adf21d6920a7660f86ff4a026b74060 --- /dev/null +++ b/fluid/PaddleCV/video/models/stnet/stnet.py @@ -0,0 +1,146 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +import numpy as np +import paddle.fluid as fluid + +from ..model import ModelBase +from .stnet_res_model import StNet_ResNet + +import logging +logger = logging.getLogger(__name__) + +__all__ = ["STNET"] + + +class STNET(ModelBase): + def __init__(self, name, cfg, mode='train'): + super(STNET, self).__init__(name, cfg, mode=mode) + self.get_config() + + def get_config(self): + self.num_classes = self.get_config_from_sec('model', 'num_classes') + self.seg_num = self.get_config_from_sec('model', 'seg_num') + self.seglen = self.get_config_from_sec('model', 'seglen') + self.image_mean = self.get_config_from_sec('model', 'image_mean') + self.image_std = self.get_config_from_sec('model', 'image_std') + self.num_layers = self.get_config_from_sec('model', 'num_layers') + + self.num_epochs = self.get_config_from_sec('train', 'epoch') + self.total_videos = self.get_config_from_sec('train', 'total_videos') + self.base_learning_rate = self.get_config_from_sec('train', + 'learning_rate') + self.learning_rate_decay = self.get_config_from_sec( + 'train', 'learning_rate_decay') + self.l2_weight_decay = self.get_config_from_sec('train', + 'l2_weight_decay') + self.momentum = self.get_config_from_sec('train', 'momentum') + + self.target_size = self.get_config_from_sec(self.mode, 'target_size') + self.batch_size = self.get_config_from_sec(self.mode, 'batch_size') + + def build_input(self, use_pyreader=True): + image_shape = [3, self.target_size, self.target_size] + image_shape[0] = image_shape[0] * self.seglen + image_shape = [self.seg_num] + image_shape + self.use_pyreader = use_pyreader + if use_pyreader: + assert self.mode != 'infer', \ + 'pyreader is not recommended in infer mode, please set use_pyreader to False.'
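+            # Editor's note (added comment, not in the original patch): the + # py_reader below yields (image, label) batches; given the shapes + # assembled above, image is [N, seg_num, seglen * 3, target_size, + # target_size] and label is one int64 per clip.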
+ py_reader = fluid.layers.py_reader( + capacity=100, + shapes=[[-1] + image_shape, [-1] + [1]], + dtypes=['float32', 'int64'], + name='train_py_reader' + if self.is_training else 'test_py_reader', + use_double_buffer=True) + image, label = fluid.layers.read_file(py_reader) + self.py_reader = py_reader + else: + image = fluid.layers.data( + name='image', shape=image_shape, dtype='float32') + if self.mode != 'infer': + label = fluid.layers.data( + name='label', shape=[1], dtype='int64') + else: + label = None + self.feature_input = [image] + self.label_input = label + + def create_model_args(self): + cfg = {} + cfg['layers'] = self.num_layers + cfg['class_dim'] = self.num_classes + cfg['seg_num'] = self.seg_num + cfg['seglen'] = self.seglen + return cfg + + def build_model(self): + cfg = self.create_model_args() + videomodel = StNet_ResNet(layers = cfg['layers'], seg_num = cfg['seg_num'], \ + seglen = cfg['seglen'], is_training = (self.mode == 'train')) + out = videomodel.net(input=self.feature_input[0], + class_dim=cfg['class_dim']) + self.network_outputs = [out] + + def optimizer(self): + epoch_points = [self.num_epochs / 3, self.num_epochs * 2 / 3] + total_videos = self.total_videos + step = int(total_videos / self.batch_size + 1) + bd = [e * step for e in epoch_points] + base_lr = self.base_learning_rate + lr_decay = self.learning_rate_decay + lr = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay] + l2_weight_decay = self.l2_weight_decay + momentum = self.momentum + optimizer = fluid.optimizer.Momentum( + learning_rate=fluid.layers.piecewise_decay( + boundaries=bd, values=lr), + momentum=momentum, + regularization=fluid.regularizer.L2Decay(l2_weight_decay)) + + return optimizer + + def loss(self): + cost = fluid.layers.cross_entropy(input=self.network_outputs[0], \ + label=self.label_input, ignore_index=-1) + self.loss_ = fluid.layers.mean(x=cost) + return self.loss_ + + def outputs(self): + return self.network_outputs + + def feeds(self): + return self.feature_input if self.mode == 'infer' else self.feature_input + [ + self.label_input + ] + + def pretrain_info(self): + return ('ResNet50_pretrained', 'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz') + + def load_pretrain_params(self, exe, pretrain, prog, place): + def is_parameter(var): + if isinstance(var, fluid.framework.Parameter): + return isinstance(var, fluid.framework.Parameter) and (not ("fc_0" in var.name)) \ + and (not ("batch_norm" in var.name)) and (not ("xception" in var.name)) and (not ("conv3d" in var.name)) + + logger.info("Load pretrain weights from {}, exclude fc, batch_norm, xception, conv3d layers.".format(pretrain)) + vars = filter(is_parameter, prog.list_vars()) + fluid.io.load_vars(exe, pretrain, vars=vars, main_program=prog) + + param_tensor = fluid.global_scope().find_var( + "conv1_weights").get_tensor() + param_numpy = np.array(param_tensor) + param_numpy = np.mean(param_numpy, axis=1, keepdims=True) / self.seglen + param_numpy = np.repeat(param_numpy, 3 * self.seglen, axis=1) + param_tensor.set(param_numpy.astype(np.float32), place) diff --git a/fluid/PaddleCV/video/models/stnet/stnet_res_model.py b/fluid/PaddleCV/video/models/stnet/stnet_res_model.py new file mode 100644 index 0000000000000000000000000000000000000000..71a22c4f869161a8b92ec5c79a69b85bc68d4c86 --- /dev/null +++ b/fluid/PaddleCV/video/models/stnet/stnet_res_model.py @@ -0,0 +1,312 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. 
+# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import os +import time +import sys +import paddle.fluid as fluid +import math + + +class StNet_ResNet(): + def __init__(self, layers=50, seg_num=7, seglen=5, is_training=True): + self.layers = layers + self.seglen = seglen + self.seg_num = seg_num + self.is_training = is_training + + def temporal_conv_bn( + self, + input, #(B*seg_num, c, h, w) + num_filters, + filter_size=(3, 1, 1), + padding=(1, 0, 0)): + #(B, seg_num, c, h, w) + in_reshape = fluid.layers.reshape( + x=input, + shape=[ + -1, self.seg_num, input.shape[-3], input.shape[-2], + input.shape[-1] + ]) + in_transpose = fluid.layers.transpose(in_reshape, perm=[0, 2, 1, 3, 4]) + + conv = fluid.layers.conv3d( + input=in_transpose, + num_filters=num_filters, + filter_size=filter_size, + stride=1, + groups=1, + padding=padding, + act='relu', + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.MSRAInitializer()), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0.0))) + + out = fluid.layers.batch_norm( + input=conv, + act=None, + is_test=(not self.is_training), + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=1.0)), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0.0))) + out = out + in_transpose + out = fluid.layers.transpose(out, perm=[0, 2, 1, 3, 4]) + out = fluid.layers.reshape(x=out, shape=input.shape) + return out + + def xception(self, input): #(B, C, seg_num,1) + bn = fluid.layers.batch_norm( + input=input, + act=None, + name="xception_bn", + is_test=(not self.is_training), + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=1.0)), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0.0))) + att_conv = fluid.layers.conv2d( + input=bn, + num_filters=2048, + filter_size=[3, 1], + stride=[1, 1], + padding=[1, 0], + groups=2048, + name="xception_att_conv", + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.MSRAInitializer()), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0))) + att_2 = fluid.layers.conv2d( + input=att_conv, + num_filters=1024, + filter_size=[1, 1], + stride=[1, 1], + name="xception_att_2", + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.MSRAInitializer()), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0))) + bndw = fluid.layers.batch_norm( + input=att_2, + act="relu", + name="xception_bndw", + is_test=(not self.is_training), + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=1.0)), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0.0))) + att1 = fluid.layers.conv2d( + input=bndw, + num_filters=1024, + filter_size=[3, 1], + stride=[1, 1], + padding=[1, 0], + groups=1024, + 
name="xception_att1", + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.MSRAInitializer()), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0))) + att1_2 = fluid.layers.conv2d( + input=att1, + num_filters=1024, + filter_size=[1, 1], + stride=[1, 1], + name="xception_att1_2", + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.MSRAInitializer()), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0))) + dw = fluid.layers.conv2d( + input=bn, + num_filters=1024, + filter_size=[1, 1], + stride=[1, 1], + name="xception_dw", + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.MSRAInitializer()), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0))) + add_to = dw + att1_2 + bn2 = fluid.layers.batch_norm( + input=add_to, + act=None, + name='xception_bn2', + is_test=(not self.is_training), + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=1.0)), + bias_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.ConstantInitializer(value=0.0))) + return fluid.layers.relu(bn2) + + def conv_bn_layer(self, + input, + num_filters, + filter_size, + stride=1, + groups=1, + act=None, + name=None): + conv = fluid.layers.conv2d( + input=input, + num_filters=num_filters, + filter_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + act=None, + param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"), + bias_attr=False, + #name = name+".conv2d.output.1" + ) + if name == "conv1": + bn_name = "bn_" + name + else: + bn_name = "bn" + name[3:] + return fluid.layers.batch_norm( + input=conv, + act=act, + is_test=(not self.is_training), + #name=bn_name+'.output.1', + param_attr=fluid.param_attr.ParamAttr(name=bn_name + "_scale"), + bias_attr=fluid.param_attr.ParamAttr(bn_name + '_offset'), + moving_mean_name=bn_name + "_mean", + moving_variance_name=bn_name + '_variance') + + def shortcut(self, input, ch_out, stride, name): + ch_in = input.shape[1] + if ch_in != ch_out or stride != 1: + return self.conv_bn_layer(input, ch_out, 1, stride, name=name) + else: + return input + + def bottleneck_block(self, input, num_filters, stride, name): + conv0 = self.conv_bn_layer( + input=input, + num_filters=num_filters, + filter_size=1, + act='relu', + name=name + "_branch2a") + conv1 = self.conv_bn_layer( + input=conv0, + num_filters=num_filters, + filter_size=3, + stride=stride, + act='relu', + name=name + "_branch2b") + conv2 = self.conv_bn_layer( + input=conv1, + num_filters=num_filters * 4, + filter_size=1, + act=None, + name=name + "_branch2c") + + short = self.shortcut( + input, num_filters * 4, stride, name=name + "_branch1") + + return fluid.layers.elementwise_add( + x=short, + y=conv2, + act='relu', + #name=".add.output.5" + ) + + def net(self, input, class_dim=101): + layers = self.layers + seg_num = self.seg_num + seglen = self.seglen + + supported_layers = [50, 101, 152] + if layers not in supported_layers: + print("supported layers are", supported_layers, \ + "but input layer is ", layers) + exit() + + # reshape input + # [B, seg_num, seglen*c, H, W] --> [B*seg_num, seglen*c, H, W] + channels = input.shape[2] + short_size = input.shape[3] + input = fluid.layers.reshape( + x=input, shape=[-1, channels, short_size, short_size]) + + if layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + 
elif layers == 152: + depth = [3, 8, 36, 3] + num_filters = [64, 128, 256, 512] + + conv = self.conv_bn_layer( + input=input, + num_filters=64, + filter_size=7, + stride=2, + act='relu', + name='conv1') + conv = fluid.layers.pool2d( + input=conv, + pool_size=3, + pool_stride=2, + pool_padding=1, + pool_type='max') + + for block in range(len(depth)): + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + conv = self.bottleneck_block( + input=conv, + num_filters=num_filters[block], + stride=2 if i == 0 and block != 0 else 1, + name=conv_name) + if block == 1: + #insert the first temporal modeling block + conv = self.temporal_conv_bn(input=conv, num_filters=512) + if block == 2: + #insert the second temporal modeling block + conv = self.temporal_conv_bn(input=conv, num_filters=1024) + + pool = fluid.layers.pool2d( + input=conv, pool_size=7, pool_type='avg', global_pooling=True) + + feature = fluid.layers.reshape( + x=pool, shape=[-1, seg_num, pool.shape[1], 1]) + feature = fluid.layers.transpose(feature, perm=[0, 2, 1, 3]) + + #append the temporal Xception block + xfeat = self.xception(feature) #(B, 1024, seg_num, 1) + out = fluid.layers.pool2d( + input=xfeat, + pool_size=(seg_num, 1), + pool_type='max', + global_pooling=True) + out = fluid.layers.reshape(x=out, shape=[-1, 1024]) + + stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0) + out = fluid.layers.fc(input=out, + size=class_dim, + act='softmax', + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.Uniform(-stdv, + stdv))) + return out diff --git a/fluid/PaddleCV/video/models/tsn/__init__.py b/fluid/PaddleCV/video/models/tsn/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..bd57d2687bc948e63dd88306e9d435bbbb5a7978 --- /dev/null +++ b/fluid/PaddleCV/video/models/tsn/__init__.py @@ -0,0 +1 @@ +from .tsn import * diff --git a/fluid/PaddleCV/video/models/tsn/tsn.py b/fluid/PaddleCV/video/models/tsn/tsn.py new file mode 100644 index 0000000000000000000000000000000000000000..5bc8aba3886df138fc5111965b344d47325063cd --- /dev/null +++ b/fluid/PaddleCV/video/models/tsn/tsn.py @@ -0,0 +1,142 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
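+# Editor's note (illustrative, not part of the original patch): TSN is +# normally driven through train.py with a config file rather than +# instantiated directly, e.g. +# +#   python train.py --model-name="TSN" --config=./configs/tsn.txt \ +#       --epoch-num=45 --valid-interval=1 --log-interval=10 +# +# The video-level score is the average of per-segment predictions (see the +# reduce_mean over seg_num in tsn_res_model.py).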
+ +import paddle.fluid as fluid +from paddle.fluid import ParamAttr + +from ..model import ModelBase +from .tsn_res_model import TSN_ResNet + +import logging +logger = logging.getLogger(__name__) + +__all__ = ["TSN"] + + +class TSN(ModelBase): + def __init__(self, name, cfg, mode='train'): + super(TSN, self).__init__(name, cfg, mode=mode) + self.get_config() + + def get_config(self): + self.num_classes = self.get_config_from_sec('model', 'num_classes') + self.seg_num = self.get_config_from_sec('model', 'seg_num') + self.seglen = self.get_config_from_sec('model', 'seglen') + self.image_mean = self.get_config_from_sec('model', 'image_mean') + self.image_std = self.get_config_from_sec('model', 'image_std') + self.num_layers = self.get_config_from_sec('model', 'num_layers') + + self.num_epochs = self.get_config_from_sec('train', 'epoch') + self.total_videos = self.get_config_from_sec('train', 'total_videos') + self.base_learning_rate = self.get_config_from_sec('train', + 'learning_rate') + self.learning_rate_decay = self.get_config_from_sec( + 'train', 'learning_rate_decay') + self.l2_weight_decay = self.get_config_from_sec('train', + 'l2_weight_decay') + self.momentum = self.get_config_from_sec('train', 'momentum') + + self.target_size = self.get_config_from_sec(self.mode, 'target_size') + self.batch_size = self.get_config_from_sec(self.mode, 'batch_size') + + def build_input(self, use_pyreader=True): + image_shape = [3, self.target_size, self.target_size] + image_shape[0] = image_shape[0] * self.seglen + image_shape = [self.seg_num] + image_shape + self.use_pyreader = use_pyreader + if use_pyreader: + assert self.mode != 'infer', \ + 'pyreader is not recommended in infer mode, please set use_pyreader to False.' + py_reader = fluid.layers.py_reader( + capacity=100, + shapes=[[-1] + image_shape, [-1] + [1]], + dtypes=['float32', 'int64'], + name='train_py_reader' + if self.is_training else 'test_py_reader', + use_double_buffer=True) + image, label = fluid.layers.read_file(py_reader) + self.py_reader = py_reader + else: + image = fluid.layers.data( + name='image', shape=image_shape, dtype='float32') + if self.mode != 'infer': + label = fluid.layers.data( + name='label', shape=[1], dtype='int64') + else: + label = None + self.feature_input = [image] + self.label_input = label + + def create_model_args(self): + cfg = {} + cfg['layers'] = self.num_layers + cfg['class_dim'] = self.num_classes + cfg['seg_num'] = self.seg_num + return cfg + + def build_model(self): + cfg = self.create_model_args() + videomodel = TSN_ResNet( + layers=cfg['layers'], + seg_num=cfg['seg_num'], + is_training=(self.mode == 'train')) + out = videomodel.net(input=self.feature_input[0], + class_dim=cfg['class_dim']) + self.network_outputs = [out] + + def optimizer(self): + assert self.mode == 'train', "optimizer can only be obtained in train mode" + epoch_points = [self.num_epochs / 3, self.num_epochs * 2 / 3] + total_videos = self.total_videos + step = int(total_videos / self.batch_size + 1) + bd = [e * step for e in epoch_points] + base_lr = self.base_learning_rate + lr_decay = self.learning_rate_decay + lr = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay] + l2_weight_decay = self.l2_weight_decay + momentum = self.momentum + optimizer = fluid.optimizer.Momentum( + learning_rate=fluid.layers.piecewise_decay( + boundaries=bd, values=lr), + momentum=momentum, + regularization=fluid.regularizer.L2Decay(l2_weight_decay)) + + return optimizer + + def loss(self): + assert self.mode != 'infer', "invalid loss calculation in infer mode"
+ cost = fluid.layers.cross_entropy(input=self.network_outputs[0], \ + label=self.label_input, ignore_index=-1) + self.loss_ = fluid.layers.mean(x=cost) + return self.loss_ + + def outputs(self): + return self.network_outputs + + def feeds(self): + return self.feature_input if self.mode == 'infer' else self.feature_input + [ + self.label_input + ] + + def pretrain_info(self): + return ('ResNet50_pretrained', 'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz') + + def load_pretrain_params(self, exe, pretrain, prog, place): + def is_parameter(var): + return isinstance(var, fluid.framework.Parameter) and (not ("fc_0" in var.name)) + + logger.info("Load pretrain weights from {}, exclude fc layer.".format(pretrain)) + vars = filter(is_parameter, prog.list_vars()) + fluid.io.load_vars(exe, pretrain, vars=vars, main_program=prog) + diff --git a/fluid/PaddleCV/video/models/tsn/tsn_res_model.py b/fluid/PaddleCV/video/models/tsn/tsn_res_model.py new file mode 100644 index 0000000000000000000000000000000000000000..09dc54893f3305a0a1a94fe6e73aff32680915d9 --- /dev/null +++ b/fluid/PaddleCV/video/models/tsn/tsn_res_model.py @@ -0,0 +1,158 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License.
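+# Editor's note (added comment): net() expects input of shape +# [N, seg_num, seglen * 3, H, W]; it folds the segment dimension into the +# batch (reshape to [N * seg_num, C, H, W]), runs a plain ResNet over each +# frame group, then averages the seg_num predictions (reduce_mean) as the +# TSN segment consensus.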
+ +import os +import time +import sys +import paddle.fluid as fluid +import math + + +class TSN_ResNet(): + def __init__(self, layers=50, seg_num=7, is_training=True): + self.layers = layers + self.seg_num = seg_num + self.is_training = is_training + + def conv_bn_layer(self, + input, + num_filters, + filter_size, + stride=1, + groups=1, + act=None, + name=None): + conv = fluid.layers.conv2d( + input=input, + num_filters=num_filters, + filter_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=groups, + act=None, + param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"), + bias_attr=False) + if name == "conv1": + bn_name = "bn_" + name + else: + bn_name = "bn" + name[3:] + + return fluid.layers.batch_norm( + input=conv, + act=act, + is_test=(not self.is_training), + param_attr=fluid.param_attr.ParamAttr(name=bn_name + "_scale"), + bias_attr=fluid.param_attr.ParamAttr(bn_name + '_offset'), + moving_mean_name=bn_name + "_mean", + moving_variance_name=bn_name + '_variance') + + def shortcut(self, input, ch_out, stride, name): + ch_in = input.shape[1] + if ch_in != ch_out or stride != 1: + return self.conv_bn_layer(input, ch_out, 1, stride, name=name) + else: + return input + + def bottleneck_block(self, input, num_filters, stride, name): + conv0 = self.conv_bn_layer( + input=input, + num_filters=num_filters, + filter_size=1, + act='relu', + name=name + "_branch2a") + conv1 = self.conv_bn_layer( + input=conv0, + num_filters=num_filters, + filter_size=3, + stride=stride, + act='relu', + name=name + "_branch2b") + conv2 = self.conv_bn_layer( + input=conv1, + num_filters=num_filters * 4, + filter_size=1, + act=None, + name=name + "_branch2c") + + short = self.shortcut( + input, num_filters * 4, stride, name=name + "_branch1") + + return fluid.layers.elementwise_add(x=short, y=conv2, act='relu') + + def net(self, input, class_dim=101): + layers = self.layers + seg_num = self.seg_num + supported_layers = [50, 101, 152] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format(supported_layers, layers) + + # reshape input + channels = input.shape[2] + short_size = input.shape[3] + input = fluid.layers.reshape( + x=input, shape=[-1, channels, short_size, short_size]) + + if layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + num_filters = [64, 128, 256, 512] + + conv = self.conv_bn_layer( + input=input, + num_filters=64, + filter_size=7, + stride=2, + act='relu', + name='conv1') + conv = fluid.layers.pool2d( + input=conv, + pool_size=3, + pool_stride=2, + pool_padding=1, + pool_type='max') + + for block in range(len(depth)): + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + conv = self.bottleneck_block( + input=conv, + num_filters=num_filters[block], + stride=2 if i == 0 and block != 0 else 1, + name=conv_name) + + pool = fluid.layers.pool2d( + input=conv, pool_size=7, pool_type='avg', global_pooling=True) + + feature = fluid.layers.reshape( + x=pool, shape=[-1, seg_num, pool.shape[1]]) + out = fluid.layers.reduce_mean(feature, dim=1) + + stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0) + out = fluid.layers.fc(input=out, + size=class_dim, + act='softmax', + param_attr=fluid.param_attr.ParamAttr( + initializer=fluid.initializer.Uniform(-stdv, + stdv))) + return 
out diff --git a/fluid/PaddleCV/video/models/utils.py b/fluid/PaddleCV/video/models/utils.py new file mode 100755 index 0000000000000000000000000000000000000000..b02abfdf134c869fe4805f4a746d7357efd0b7b1 --- /dev/null +++ b/fluid/PaddleCV/video/models/utils.py @@ -0,0 +1,47 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import os +import wget +import tarfile + +__all__ = ['decompress', 'download', 'AttrDict'] + + +def decompress(path): + t = tarfile.open(path) + t.extractall(path='/'.join(path.split('/')[:-1])) + t.close() + os.remove(path) + + +def download(url, path): + weight_dir = '/'.join(path.split('/')[:-1]) + if not os.path.exists(weight_dir): + os.makedirs(weight_dir) + + path = path + ".tar.gz" + wget.download(url, path) + decompress(path) + + +class AttrDict(dict): + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value diff --git a/fluid/PaddleCV/video/scripts/infer/infer_attention_cluster.sh b/fluid/PaddleCV/video/scripts/infer/infer_attention_cluster.sh new file mode 100644 index 0000000000000000000000000000000000000000..be6045db83d29426d363bacc524434f60d45ea57 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/infer/infer_attention_cluster.sh @@ -0,0 +1,4 @@ +python infer.py --model-name="AttentionCluster" --config=./configs/attention_cluster.txt \ + --filelist=./data/youtube8m/infer.list \ + --weights=./checkpoints/AttentionCluster_epoch0 \ + --save-dir="./save" diff --git a/fluid/PaddleCV/video/scripts/infer/infer_attention_lstm.sh b/fluid/PaddleCV/video/scripts/infer/infer_attention_lstm.sh new file mode 100644 index 0000000000000000000000000000000000000000..019bb346617d844e33d4dc883d1a6f96a7a91f3a --- /dev/null +++ b/fluid/PaddleCV/video/scripts/infer/infer_attention_lstm.sh @@ -0,0 +1,4 @@ +python infer.py --model-name="AttentionLSTM" --config=./configs/attention_lstm.txt \ + --filelist=./data/youtube8m/infer.list \ + --weights=./checkpoints/AttentionLSTM_epoch0 \ + --save-dir="./save" diff --git a/fluid/PaddleCV/video/scripts/infer/infer_nextvlad.sh b/fluid/PaddleCV/video/scripts/infer/infer_nextvlad.sh new file mode 100644 index 0000000000000000000000000000000000000000..1a96980106bbda1fc3678785323904a4ccecaa65 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/infer/infer_nextvlad.sh @@ -0,0 +1,3 @@ +python infer.py --model-name="NEXTVLAD" --config=./configs/nextvlad.txt --filelist=./data/youtube8m/infer.list \ + --weights=./checkpoints/NEXTVLAD_epoch0 \ + --save-dir="./save" diff --git a/fluid/PaddleCV/video/scripts/infer/infer_stnet.sh b/fluid/PaddleCV/video/scripts/infer/infer_stnet.sh new file mode 100644 index 0000000000000000000000000000000000000000..8b27a234d9a650b3e4acf8d9dae5ba1bb68fc71b --- /dev/null +++ b/fluid/PaddleCV/video/scripts/infer/infer_stnet.sh @@ -0,0 +1,2 @@ +python infer.py --model-name="STNET" --config=./configs/stnet.txt --filelist=./data/kinetics/infer.list \ + 
--log-interval=10 --weights=./checkpoints/STNET_epoch0 --save-dir=./save diff --git a/fluid/PaddleCV/video/scripts/infer/infer_tsn.sh b/fluid/PaddleCV/video/scripts/infer/infer_tsn.sh new file mode 100644 index 0000000000000000000000000000000000000000..515feaf4a502bb35691d357a038f702345e9b9a2 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/infer/infer_tsn.sh @@ -0,0 +1,2 @@ +python infer.py --model-name="TSN" --config=./configs/tsn.txt --filelist=./data/kinetics/infer.list \ + --log-interval=10 --weights=./checkpoints/TSN_epoch0 --save-dir=./save diff --git a/fluid/PaddleCV/video/scripts/test/test_attention_cluster.sh b/fluid/PaddleCV/video/scripts/test/test_attention_cluster.sh new file mode 100644 index 0000000000000000000000000000000000000000..21df131934cb4306d185fd76374b4314767add68 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/test/test_attention_cluster.sh @@ -0,0 +1,2 @@ +python test.py --model-name="AttentionCluster" --config=./configs/attention_cluster.txt \ + --log-interval=5 --weights=./checkpoints/AttentionCluster_epoch0 diff --git a/fluid/PaddleCV/video/scripts/test/test_attention_lstm.sh b/fluid/PaddleCV/video/scripts/test/test_attention_lstm.sh new file mode 100644 index 0000000000000000000000000000000000000000..d728dbd1c172a5d3c9e19dd7dce457136a90f3d5 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/test/test_attention_lstm.sh @@ -0,0 +1,2 @@ +python test.py --model-name="AttentionLSTM" --config=./configs/attention_lstm.txt \ + --log-interval=5 --weights=./checkpoints/AttentionLSTM_epoch0 diff --git a/fluid/PaddleCV/video/scripts/test/test_nextvlad.sh b/fluid/PaddleCV/video/scripts/test/test_nextvlad.sh new file mode 100644 index 0000000000000000000000000000000000000000..239e9980153303a161511a217a09a4d63b216e3b --- /dev/null +++ b/fluid/PaddleCV/video/scripts/test/test_nextvlad.sh @@ -0,0 +1,2 @@ +python test.py --model-name="NEXTVLAD" --config=./configs/nextvlad.txt \ + --log-interval=10 --weights=./checkpoints/NEXTVLAD_epoch0 diff --git a/fluid/PaddleCV/video/scripts/test/test_stnet.sh b/fluid/PaddleCV/video/scripts/test/test_stnet.sh new file mode 100644 index 0000000000000000000000000000000000000000..6913ea6970f5448d83b72c5fe8f3b9c05925d9a8 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/test/test_stnet.sh @@ -0,0 +1,2 @@ +python test.py --model-name="STNET" --config=./configs/stnet.txt \ + --log-interval=10 --weights=./checkpoints/STNET_epoch0 diff --git a/fluid/PaddleCV/video/scripts/test/test_tsn.sh b/fluid/PaddleCV/video/scripts/test/test_tsn.sh new file mode 100644 index 0000000000000000000000000000000000000000..b66bcb2cf08fbcccef1954369dd53d6e61b0894a --- /dev/null +++ b/fluid/PaddleCV/video/scripts/test/test_tsn.sh @@ -0,0 +1,2 @@ +python test.py --model-name="TSN" --config=./configs/tsn.txt \ + --log-interval=10 --weights=./checkpoints/TSN_epoch0 diff --git a/fluid/PaddleCV/video/scripts/train/train_attention_cluster.sh b/fluid/PaddleCV/video/scripts/train/train_attention_cluster.sh new file mode 100644 index 0000000000000000000000000000000000000000..0a0b0bbb33ede34f56e7bda9f0dbce007e197aed --- /dev/null +++ b/fluid/PaddleCV/video/scripts/train/train_attention_cluster.sh @@ -0,0 +1,2 @@ +python train.py --model-name="AttentionCluster" --config=./configs/attention_cluster.txt --epoch-num=5 \ + --valid-interval=1 --log-interval=10 diff --git a/fluid/PaddleCV/video/scripts/train/train_attention_lstm.sh b/fluid/PaddleCV/video/scripts/train/train_attention_lstm.sh new file mode 100644 index 
0000000000000000000000000000000000000000..bb855b19cf1122fea3cbee0171531e6003fb64a9 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/train/train_attention_lstm.sh @@ -0,0 +1,2 @@ +python train.py --model-name="AttentionLSTM" --config=./configs/attention_lstm.txt --epoch-num=10 \ + --valid-interval=1 --log-interval=10 diff --git a/fluid/PaddleCV/video/scripts/train/train_nextvlad.sh b/fluid/PaddleCV/video/scripts/train/train_nextvlad.sh new file mode 100644 index 0000000000000000000000000000000000000000..b5857e9f35a47d89ce5185c6d42b2ed51207e390 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/train/train_nextvlad.sh @@ -0,0 +1,3 @@ +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python train.py --model-name="NEXTVLAD" --config=./configs/nextvlad.txt --epoch-num=6 \ + --valid-interval=1 --log-interval=10 diff --git a/fluid/PaddleCV/video/scripts/train/train_stnet.sh b/fluid/PaddleCV/video/scripts/train/train_stnet.sh new file mode 100644 index 0000000000000000000000000000000000000000..c595c10c025c517e9cdc4d70a9d316b853768fa9 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/train/train_stnet.sh @@ -0,0 +1,2 @@ +python train.py --model-name="STNET" --config=./configs/stnet.txt --epoch-num=60 \ + --valid-interval=1 --log-interval=10 diff --git a/fluid/PaddleCV/video/scripts/train/train_tsn.sh b/fluid/PaddleCV/video/scripts/train/train_tsn.sh new file mode 100644 index 0000000000000000000000000000000000000000..e476744d4626b0354cbf0ebb4f7c4b4ffa55a7f1 --- /dev/null +++ b/fluid/PaddleCV/video/scripts/train/train_tsn.sh @@ -0,0 +1,2 @@ +python train.py --model-name="TSN" --config=./configs/tsn.txt --epoch-num=45 \ + --valid-interval=1 --log-interval=10 diff --git a/fluid/PaddleCV/video/test.py b/fluid/PaddleCV/video/test.py new file mode 100755 index 0000000000000000000000000000000000000000..9698caecc21a26dc38256b145dd54d04a2e13c88 --- /dev/null +++ b/fluid/PaddleCV/video/test.py @@ -0,0 +1,124 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import os +import sys +import time +import logging +import argparse +import numpy as np +import paddle.fluid as fluid + +from config import * +import models +from datareader import get_reader +from metrics import get_metrics + +logging.root.handlers = [] +FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s' +logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout) +logger = logging.getLogger(__name__) + + +def parse_args(): + parser = argparse.ArgumentParser() + parser.add_argument( + '--model-name', + type=str, + default='AttentionCluster', + help='name of model to train.') + parser.add_argument( + '--config', + type=str, + default='configs/attention_cluster.txt', + help='path to config file of model') + parser.add_argument( + '--batch-size', + type=int, + default=None, + help='traing batch size per GPU. 
None to use config file setting.') + parser.add_argument( + '--use-gpu', type=bool, default=True, help='default use gpu.') + parser.add_argument( + '--weights', + type=str, + default=None, + help='weight path, None to use weights from Paddle.') + parser.add_argument( + '--log-interval', + type=int, + default=1, + help='mini-batch interval to log.') + args = parser.parse_args() + return args + + +def test(args): + # parse config + config = parse_config(args.config) + test_config = merge_configs(config, 'test', vars(args)) + + # build model + test_model = models.get_model(args.model_name, test_config, mode='test') + test_model.build_input(use_pyreader=False) + test_model.build_model() + test_feeds = test_model.feeds() + test_outputs = test_model.outputs() + loss = test_model.loss() + + place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace() + exe = fluid.Executor(place) + + if args.weights: + assert os.path.exists( + args.weights), "Given weight dir {} not exist.".format(args.weights) + weights = args.weights or test_model.get_weights() + + def if_exist(var): + return os.path.exists(os.path.join(weights, var.name)) + + fluid.io.load_vars(exe, weights, predicate=if_exist) + + # get reader and metrics + test_reader = get_reader(args.model_name.upper(), 'test', test_config) + test_metrics = get_metrics(args.model_name.upper(), 'test', test_config) + + test_feeder = fluid.DataFeeder(place=place, feed_list=test_feeds) + fetch_list = [loss.name] + [x.name + for x in test_outputs] + [test_feeds[-1].name] + + epoch_period = [] + for test_iter, data in enumerate(test_reader()): + cur_time = time.time() + test_outs = exe.run(fetch_list=fetch_list, + feed=test_feeder.feed(data)) + period = time.time() - cur_time + epoch_period.append(period) + loss = np.array(test_outs[0]) + pred = np.array(test_outs[1]) + label = np.array(test_outs[-1]) + test_metrics.accumulate(loss, pred, label) + + # metric here + if args.log_interval > 0 and test_iter % args.log_interval == 0: + info_str = '[EVAL] Batch {}'.format(test_iter) + test_metrics.calculate_and_log_out(loss, pred, label, info_str) + test_metrics.finalize_and_log_out("[EVAL] eval finished. 
") + + +if __name__ == "__main__": + args = parse_args() + logger.info(args) + + test(args) diff --git a/fluid/PaddleCV/video/tools/__init__.py b/fluid/PaddleCV/video/tools/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/fluid/PaddleCV/video/tools/train_utils.py b/fluid/PaddleCV/video/tools/train_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ad2b10f94827dacf1aad4822c5ce4dcb13fa94e9 --- /dev/null +++ b/fluid/PaddleCV/video/tools/train_utils.py @@ -0,0 +1,134 @@ +import os +import time +import numpy as np +import paddle +import paddle.fluid as fluid +import logging +import shutil + +logger = logging.getLogger(__name__) + + +def test_without_pyreader(test_exe, + test_reader, + test_feeder, + test_fetch_list, + test_metrics, + log_interval=0): + test_metrics.reset() + for test_iter, data in enumerate(test_reader()): + test_outs = test_exe.run(test_fetch_list, feed=test_feeder.feed(data)) + loss = np.array(test_outs[0]) + pred = np.array(test_outs[1]) + label = np.array(test_outs[-1]) + test_metrics.accumulate(loss, pred, label) + if log_interval > 0 and test_iter % log_interval == 0: + test_metrics.calculate_and_log_out(loss, pred, label, \ + info = '[TEST] test_iter {} '.format(test_iter)) + test_metrics.finalize_and_log_out("[TEST] Finish") + + +def test_with_pyreader(test_exe, + test_pyreader, + test_fetch_list, + test_metrics, + log_interval=0): + if not test_pyreader: + logger.error("[TEST] get pyreader failed.") + test_pyreader.start() + test_metrics.reset() + test_iter = 0 + try: + while True: + test_outs = test_exe.run(fetch_list=test_fetch_list) + loss = np.array(test_outs[0]) + pred = np.array(test_outs[1]) + label = np.array(test_outs[-1]) + test_metrics.accumulate(loss, pred, label) + if log_interval > 0 and test_iter % log_interval == 0: + test_metrics.calculate_and_log_out(loss, pred, label, \ + info = '[TEST] test_iter {} '.format(test_iter)) + test_iter += 1 + except fluid.core.EOFException: + test_metrics.finalize_and_log_out("[TEST] Finish") + finally: + test_pyreader.reset() + + +def train_without_pyreader(exe, train_prog, train_exe, train_reader, train_feeder, \ + train_fetch_list, train_metrics, epochs = 10, \ + log_interval = 0, valid_interval = 0, save_dir = './', \ + save_model_name = 'model', test_exe = None, test_reader = None, \ + test_feeder = None, test_fetch_list = None, test_metrics = None): + for epoch in range(epochs): + epoch_periods = [] + for train_iter, data in enumerate(train_reader()): + cur_time = time.time() + train_outs = train_exe.run(train_fetch_list, + feed=train_feeder.feed(data)) + period = time.time() - cur_time + epoch_periods.append(period) + loss = np.array(train_outs[0]) + pred = np.array(train_outs[1]) + label = np.array(train_outs[-1]) + if log_interval > 0 and (train_iter % log_interval == 0): + # eval here + train_metrics.calculate_and_log_out(loss, pred, label, \ + info = '[TRAIN] Epoch {}, iter {} '.format(epoch, train_iter)) + train_iter += 1 + logger.info('[TRAIN] Epoch {} training finished, average time: {}'. 
+ format(epoch, np.mean(epoch_periods))) + save_model(exe, train_prog, save_dir, save_model_name, + "_epoch{}".format(epoch)) + if test_exe and valid_interval > 0 and (epoch + 1) % valid_interval == 0: + test_without_pyreader(test_exe, test_reader, test_feeder, + test_fetch_list, test_metrics, log_interval) + + + +def train_with_pyreader(exe, train_prog, train_exe, train_pyreader, \ + train_fetch_list, train_metrics, epochs = 10, \ + log_interval = 0, valid_interval = 0, \ + save_dir = './', save_model_name = 'model', \ + test_exe = None, test_pyreader = None, \ + test_fetch_list = None, test_metrics = None): + if not train_pyreader: + logger.error("[TRAIN] get pyreader failed.") + for epoch in range(epochs): + train_pyreader.start() + train_metrics.reset() + try: + train_iter = 0 + epoch_periods = [] + while True: + cur_time = time.time() + train_outs = train_exe.run(fetch_list=train_fetch_list) + period = time.time() - cur_time + epoch_periods.append(period) + loss = np.array(train_outs[0]) + pred = np.array(train_outs[1]) + label = np.array(train_outs[-1]) + if log_interval > 0 and (train_iter % log_interval == 0): + # eval here + train_metrics.calculate_and_log_out(loss, pred, label, \ + info = '[TRAIN] Epoch {}, iter {} '.format(epoch, train_iter)) + train_iter += 1 + except fluid.core.EOFException: + # eval here + logger.info('[TRAIN] Epoch {} training finished, average time: {}'. + format(epoch, np.mean(epoch_periods))) + save_model(exe, train_prog, save_dir, save_model_name, + "_epoch{}".format(epoch)) + if test_exe and valid_interval > 0 and (epoch + 1) % valid_interval == 0: + test_with_pyreader(test_exe, test_pyreader, test_fetch_list, + test_metrics, log_interval) + finally: + epoch_period = [] + train_pyreader.reset() + + +def save_model(exe, program, save_dir, model_name, postfix=None): + model_path = os.path.join(save_dir, model_name + postfix) + if os.path.isdir(model_path): + shutil.rmtree(model_path) + fluid.io.save_persistables(exe, model_path, main_program=program) diff --git a/fluid/PaddleCV/video/train.py b/fluid/PaddleCV/video/train.py new file mode 100755 index 0000000000000000000000000000000000000000..154c51edd431286555b0e11d42a2c7a50ff4ee42 --- /dev/null +++ b/fluid/PaddleCV/video/train.py @@ -0,0 +1,226 @@ +# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
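+# Editor's note (illustrative, not part of the original patch): typical +# invocations mirror the helper scripts under scripts/train/, e.g. +# +#   python train.py --model-name="NEXTVLAD" --config=./configs/nextvlad.txt \ +#       --epoch-num=6 --valid-interval=1 --log-interval=10 +# +# --resume restarts from a checkpoint, while --pretrain initializes from +# pretrained weights; the two paths are handled separately in train() below.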
+ +import os +import sys +import time +import argparse +import logging +import numpy as np +import paddle.fluid as fluid + +from tools.train_utils import train_with_pyreader, train_without_pyreader +import models +from config import * +from datareader import get_reader +from metrics import get_metrics + +logging.root.handlers = [] +FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s' +logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout) +logger = logging.getLogger(__name__) + + +def parse_args(): + parser = argparse.ArgumentParser("Paddle Video train script") + parser.add_argument( + '--model-name', + type=str, + default='AttentionCluster', + help='name of model to train.') + parser.add_argument( + '--config', + type=str, + default='configs/attention_cluster.txt', + help='path to config file of model') + parser.add_argument( + '--batch-size', + type=int, + default=None, + help='training batch size. None to use config file setting.') + parser.add_argument( + '--learning-rate', + type=float, + default=None, + help='learning rate used for training. None to use config file setting.') + parser.add_argument( + '--pretrain', + type=str, + default=None, + help='path to pretrained weights. None to use default weights path in ~/.paddle/weights.' + ) + parser.add_argument( + '--resume', + type=str, + default=None, + help='path to resume training based on previous checkpoints. ' + 'None for not resuming any checkpoints.' + ) + parser.add_argument( + '--use-gpu', type=bool, default=True, help='whether to use GPU.') + parser.add_argument( + '--no-use-pyreader', + action='store_true', + default=False, + help='disable pyreader and feed data with a DataFeeder instead') + parser.add_argument( + '--no-memory-optimize', + action='store_true', + default=False, + help='disable memory optimization during training') + parser.add_argument( + '--epoch-num', + type=int, + default=0, + help='epoch number; 0 means read it from the config file') + parser.add_argument( + '--valid-interval', + type=int, + default=1, + help='validation epoch interval, 0 for no validation.') + parser.add_argument( + '--save-dir', + type=str, + default='checkpoints', + help='directory name to save training snapshots') + parser.add_argument( + '--log-interval', + type=int, + default=10, + help='mini-batch interval to log.') + args = parser.parse_args() + return args + + +def train(args): + # parse config + config = parse_config(args.config) + train_config = merge_configs(config, 'train', vars(args)) + valid_config = merge_configs(config, 'valid', vars(args)) + train_model = models.get_model(args.model_name, train_config, mode='train') + valid_model = models.get_model(args.model_name, valid_config, mode='valid') + + # build model + startup = fluid.Program() + train_prog = fluid.Program() + with fluid.program_guard(train_prog, startup): + with fluid.unique_name.guard(): + train_model.build_input(not args.no_use_pyreader) + train_model.build_model() + # the input has the form [data1, data2, ..., label], so train_feeds[-1] is the label + train_feeds = train_model.feeds() + train_feeds[-1].persistable = True + # the output of a classification model has the form [pred] + train_outputs = train_model.outputs() + for output in train_outputs: + output.persistable = True + train_loss = train_model.loss() + train_loss.persistable = True + # outputs, loss, label should be fetched, so set persistable to be true + optimizer = train_model.optimizer() + optimizer.minimize(train_loss) + train_pyreader = train_model.pyreader() + + if not args.no_memory_optimize: +
fluid.memory_optimize(train_prog) + + valid_prog = fluid.Program() + with fluid.program_guard(valid_prog, startup): + with fluid.unique_name.guard(): + valid_model.build_input(not args.no_use_pyreader) + valid_model.build_model() + valid_feeds = valid_model.feeds() + valid_outputs = valid_model.outputs() + valid_loss = valid_model.loss() + valid_pyreader = valid_model.pyreader() + + place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace() + exe = fluid.Executor(place) + exe.run(startup) + + if args.resume: + # if resume weights is given, load resume weights directly + assert os.path.exists(args.resume), \ + "Given resume weight dir {} not exist.".format(args.resume) + def if_exist(var): + return os.path.exists(os.path.join(args.resume, var.name)) + fluid.io.load_vars(exe, args.resume, predicate=if_exist, main_program=train_prog) + else: + # if not in resume mode, load pretrain weights + if args.pretrain: + assert os.path.exists(args.pretrain), \ + "Given pretrain weight dir {} not exist.".format(args.pretrain) + pretrain = args.pretrain or train_model.get_pretrain_weights() + if pretrain: + train_model.load_pretrain_params(exe, pretrain, train_prog, place) + + train_exe = fluid.ParallelExecutor( + use_cuda=args.use_gpu, + loss_name=train_loss.name, + main_program=train_prog) + valid_exe = fluid.ParallelExecutor( + use_cuda=args.use_gpu, + share_vars_from=train_exe, + main_program=valid_prog) + + # get reader + bs_denominator = 1 + if (not args.no_use_pyreader) and args.use_gpu: + bs_denominator = train_config.TRAIN.num_gpus + train_config.TRAIN.batch_size = int(train_config.TRAIN.batch_size / + bs_denominator) + valid_config.VALID.batch_size = int(valid_config.VALID.batch_size / + bs_denominator) + train_reader = get_reader(args.model_name.upper(), 'train', train_config) + valid_reader = get_reader(args.model_name.upper(), 'valid', valid_config) + + # get metrics + train_metrics = get_metrics(args.model_name.upper(), 'train', train_config) + valid_metrics = get_metrics(args.model_name.upper(), 'valid', valid_config) + + train_fetch_list = [train_loss.name] + [x.name for x in train_outputs + ] + [train_feeds[-1].name] + valid_fetch_list = [valid_loss.name] + [x.name for x in valid_outputs + ] + [valid_feeds[-1].name] + + epochs = args.epoch_num or train_model.epoch_num() + + if args.no_use_pyreader: + train_feeder = fluid.DataFeeder(place=place, feed_list=train_feeds) + valid_feeder = fluid.DataFeeder(place=place, feed_list=valid_feeds) + train_without_pyreader(exe, train_prog, train_exe, train_reader, train_feeder, + train_fetch_list, train_metrics, epochs = epochs, + log_interval = args.log_interval, valid_interval = args.valid_interval, + save_dir = args.save_dir, save_model_name = args.model_name, + test_exe = valid_exe, test_reader = valid_reader, test_feeder = valid_feeder, + test_fetch_list = valid_fetch_list, test_metrics = valid_metrics) + else: + train_pyreader.decorate_paddle_reader(train_reader) + valid_pyreader.decorate_paddle_reader(valid_reader) + train_with_pyreader(exe, train_prog, train_exe, train_pyreader, train_fetch_list, train_metrics, + epochs = epochs, log_interval = args.log_interval, + valid_interval = args.valid_interval, + save_dir = args.save_dir, save_model_name = args.model_name, + test_exe = valid_exe, test_pyreader = valid_pyreader, + test_fetch_list = valid_fetch_list, test_metrics = valid_metrics) + + +if __name__ == "__main__": + args = parse_args() + logger.info(args) + + if not os.path.exists(args.save_dir): + os.makedirs(args.save_dir) + + 
train(args) diff --git a/fluid/PaddleCV/video/utils.py b/fluid/PaddleCV/video/utils.py new file mode 100755 index 0000000000000000000000000000000000000000..3b07d606c60b9834429fef94d43c0a5619cd1db5 --- /dev/null +++ b/fluid/PaddleCV/video/utils.py @@ -0,0 +1,25 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +__all__ = ['AttrDict'] + +class AttrDict(dict): + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value diff --git a/fluid/PaddleNLP/LAC b/fluid/PaddleNLP/LAC index d2fc9e0b45b4e6cfc93e73054026fc5a8abfbfb9..a4eb73b2fb64d8aab8499a1184edf4fc386f8268 160000 --- a/fluid/PaddleNLP/LAC +++ b/fluid/PaddleNLP/LAC @@ -1 +1 @@ -Subproject commit d2fc9e0b45b4e6cfc93e73054026fc5a8abfbfb9 +Subproject commit a4eb73b2fb64d8aab8499a1184edf4fc386f8268 diff --git a/fluid/PaddleNLP/Senta b/fluid/PaddleNLP/Senta index 733c1d02085a3092dd262c4f396563962a514c3e..dc1af6a83dd1372055158ac6d17f6d14b3a0f0f8 160000 --- a/fluid/PaddleNLP/Senta +++ b/fluid/PaddleNLP/Senta @@ -1 +1 @@ -Subproject commit 733c1d02085a3092dd262c4f396563962a514c3e +Subproject commit dc1af6a83dd1372055158ac6d17f6d14b3a0f0f8 diff --git a/fluid/PaddleNLP/SimNet b/fluid/PaddleNLP/SimNet index 60b698a294c34420a7f0aab3112f27649aed1445..57b93859aa070ae6d96f10a470b1bdf2cfaea052 160000 --- a/fluid/PaddleNLP/SimNet +++ b/fluid/PaddleNLP/SimNet @@ -1 +1 @@ -Subproject commit 60b698a294c34420a7f0aab3112f27649aed1445 +Subproject commit 57b93859aa070ae6d96f10a470b1bdf2cfaea052 diff --git a/fluid/PaddleNLP/machine_reading_comprehension/run.py b/fluid/PaddleNLP/machine_reading_comprehension/run.py index 884549d106af7f44789728fb488b5e60e149e118..e9ba1d0b14023f75d7551728dd22571cf8b72fa4 100644 --- a/fluid/PaddleNLP/machine_reading_comprehension/run.py +++ b/fluid/PaddleNLP/machine_reading_comprehension/run.py @@ -523,8 +523,8 @@ def evaluate(logger, args): inference_program = main_program.clone(for_test=True) eval_loss, bleu_rouge = validation( - inference_program, avg_cost, s_probs, e_probs, feed_order, - place, dev_count, vocab, brc_data, logger, args) + inference_program, avg_cost, s_probs, e_probs, match, + feed_order, place, dev_count, vocab, brc_data, logger, args) logger.info('Dev eval loss {}'.format(eval_loss)) logger.info('Dev eval result: {}'.format(bleu_rouge)) logger.info('Predicted answers are saved to {}'.format( diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md b/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md index 59860114a101b54d8c5f148bd8d725d9bfe778bc..86d4a021baf11e04a9fd07c05dbf50425451efab 100644 --- a/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md +++ b/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md @@ -1,4 +1,4 @@ -运行本目录下的范例模型需要安装PaddlePaddle Fluid 1.0版。如果您的 PaddlePaddle 
安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新 PaddlePaddle 安装版本。 +运行本目录下的范例模型需要安装PaddlePaddle Fluid 1.0版。如果您的 PaddlePaddle 安装版本低于此要求,请按照[安装文档](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html)中的说明更新 PaddlePaddle 安装版本。 # 机器翻译:RNN Search @@ -24,7 +24,7 @@ 本目录下此范例模型的实现,旨在展示如何用Paddle Fluid实现一个带有注意力机制(Attention)的RNN模型来解决Seq2Seq类问题,以及如何使用带有Beam Search算法的解码器。如果您仅仅只是需要在机器翻译方面有着较好翻译效果的模型,则建议您参考[Transformer的Paddle Fluid实现](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer)。 ## 模型概览 -RNN Search模型使用了经典的编码器-解码器(Encoder-Decoder)的框架结构来解决Seq2Seq类问题。这种方法先用编码器将源序列编码成vector,再用解码器将该vector解码为目标序列。这其实模拟了人类在进行翻译类任务时的行为:先解析源语言,理解其含义,再根据该含义来写出目标语言的语句。编码器和解码器往往都使用RNN来实现。关于此方法的具体原理和数学表达式,可以参考[深度学习101](http://www.paddlepaddle.org/documentation/docs/zh/0.15.0/beginners_guide/basics/machine_translation/index.html). +RNN Search模型使用了经典的编码器-解码器(Encoder-Decoder)的框架结构来解决Seq2Seq类问题。这种方法先用编码器将源序列编码成vector,再用解码器将该vector解码为目标序列。这其实模拟了人类在进行翻译类任务时的行为:先解析源语言,理解其含义,再根据该含义来写出目标语言的语句。编码器和解码器往往都使用RNN来实现。关于此方法的具体原理和数学表达式,可以参考[深度学习101](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html). 本模型中,在编码器方面,我们的实现使用了双向循环神经网络(Bi-directional Recurrent Neural Network);在解码器方面,我们使用了带注意力(Attention)机制的RNN解码器,并同时提供了一个不带注意力机制的解码器实现作为对比;而在预测方面我们使用柱搜索(beam search)算法来生成翻译的目标语句。以下将分别介绍用到的这些方法。 @@ -45,7 +45,7 @@ RNN Search模型使用了经典的编码器-解码器(Encoder-Decoder)的框 ### 注意力机制 如果编码阶段的输出是一个固定维度的向量,会带来以下两个问题:1)不论源语言序列的长度是5个词还是50个词,如果都用固定维度的向量去编码其中的语义和句法结构信息,对模型来说是一个非常高的要求,特别是对长句子序列而言;2)直觉上,当人类翻译一句话时,会对与当前译文更相关的源语言片段上给予更多关注,且关注点会随着翻译的进行而改变。而固定维度的向量则相当于,任何时刻都对源语言所有信息给予了同等程度的关注,这是不合理的。因此,Bahdanau等人\[[4](#参考文献)\]引入注意力(attention)机制,可以对编码后的上下文片段进行解码,以此来解决长句子的特征学习问题。下面介绍在注意力机制下的解码器结构。 -与简单的解码器不同,这里$z_i$的计算公式为: +与简单的解码器不同,这里$z_i$的计算公式为 (由于Github原生不支持LaTeX公式,请您移步[这里](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html)查看): $$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$ @@ -131,4 +131,4 @@ python infer.py 5. Papineni K, Roukos S, Ward T, et al. [BLEU: a method for automatic evaluation of machine translation](http://dl.acm.org/citation.cfm?id=1073135)[C]//Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002: 311-318.
-知识共享许可协议
-本教程PaddlePaddle 创作,采用 知识共享 署名-相同方式共享 4.0 国际 许可协议进行许可。
\ No newline at end of file
+知识共享许可协议
本教程PaddlePaddle 创作,采用 知识共享 署名-相同方式共享 4.0 国际 许可协议进行许可。 diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md b/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md index 7e7a09e7a1e4e8dcfddc5dbbc27c94a757d80d9e..bdac7cb0b7c4f9d51bbc281b351232c6edc75a36 100644 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md +++ b/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md @@ -69,9 +69,9 @@ WMT 数据集是机器翻译领域公认的主流数据集,[WMT'16 EN-DE 数 └── subword-nmt # BPE 编码的代码 ``` -`gen_data/wmt16_ende_data_bpe` 中是我们最终使用的英德翻译数据,其中 `train.tok.clean.bpe.32000.en-de` 为训练数据,`newstest2016.tok.bpe.32000.en-de` 等为验证和测试数据,。`vocab_all.bpe.32000` 为相应的词典文件(已加入 `` 、`` 和 `` 这三个特殊符号,源语言和目标语言共享该词典文件)。 +`gen_data/wmt16_ende_data_bpe` 中是我们最终使用的英德翻译数据,其中 `train.tok.clean.bpe.32000.en-de` 为训练数据,`newstest2016.tok.bpe.32000.en-de` 等为验证和测试数据,`vocab_all.bpe.32000` 为相应的词典文件(已加入 `` 、`` 和 `` 这三个特殊符号,源语言和目标语言共享该词典文件)。另外我们也整理提供了一份处理好的 WMT'16 EN-DE 数据以供[下载](https://transformer-res.bj.bcebos.com/wmt16_ende_data_bpe_clean.tar.gz)使用(包含训练所需 BPE 数据和词典以及预测和评估所需的 BPE 数据和 tokenize 的数据)。 -对于其他自定义数据,转换为类似 `train.tok.clean.bpe.32000.en-de` 的数据格式(`\t` 分隔的源语言和目标语言句子对,句子中的 token 之间使用空格分隔)即可;如需使用 BPE 编码,可参考,亦可以使用类似 WMT,使用 `gen_data.sh` 进行处理。 +对于其他自定义数据,转换为类似 `train.tok.clean.bpe.32000.en-de` 的数据格式(`\t` 分隔的源语言和目标语言句子对,句子中的 token 之间使用空格分隔)即可;如需使用 BPE 编码,亦可以使用类似 WMT'16 EN-DE 原始数据的格式,参照 `gen_data.sh` 进行处理。 ### 模型训练 @@ -110,11 +110,9 @@ python -u train.py \ --batch_size 3200 \ --sort_type pool \ --pool_size 200000 \ - n_layer 6 \ n_head 16 \ d_model 1024 \ d_inner_hid 4096 \ - n_head 16 \ prepostprocess_dropout 0.3 ``` 有关这些参数更详细信息的请参考 `config.py` 中的注释说明。 @@ -144,30 +142,53 @@ python -u infer.py \ --token_delimiter ' ' \ --batch_size 32 \ model_path trained_models/iter_100000.infer.model \ - beam_size 4 \ + beam_size 5 \ max_out_len 255 ``` -和模型训练时类似,预测时也需要设置数据和 reader 相关的参数,并可以执行 `python infer.py --help` 查看这些参数的说明(部分参数意义和训练时略有不同);同样可以在预测命令中设置模型超参数,但应与模型训练时的设置一致;此外相比于模型训练,预测时还有一些额外的参数,如需要设置 `model_path` 来给出模型所在目录,可以设置 `beam_size` 和 `max_out_len` 来指定 Beam Search 算法的搜索宽度和最大深度(翻译长度),这些参数也可以在 `config.py` 中的 `InferTaskConfig` 内查阅注释说明并进行更改设置。 +和模型训练时类似,预测时也需要设置数据和 reader 相关的参数,并可以执行 `python infer.py --help` 查看这些参数的说明(部分参数意义和训练时略有不同);同样可以在预测命令中设置模型超参数,但应与模型训练时的设置一致,如训练时使用 big model 的参数设置,则预测时对应类似如下命令: +```sh +python -u infer.py \ + --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \ + --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \ + --special_token '' '' '' \ + --test_file_pattern gen_data/wmt16_ende_data_bpe/newstest2016.tok.bpe.32000.en-de \ + --token_delimiter ' ' \ + --batch_size 32 \ + model_path trained_models/iter_100000.infer.model \ + n_head 16 \ + d_model 1024 \ + d_inner_hid 4096 \ + prepostprocess_dropout 0.3 \ + beam_size 5 \ + max_out_len 255 +``` +此外相比于模型训练,预测时还有一些额外的参数,如需要设置 `model_path` 来给出模型所在目录,可以设置 `beam_size` 和 `max_out_len` 来指定 Beam Search 算法的搜索宽度和最大深度(翻译长度),这些参数也可以在 `config.py` 中的 `InferTaskConfig` 内查阅注释说明并进行更改设置。 执行以上预测命令会打印翻译结果到标准输出,每行输出是对应行输入的得分最高的翻译。对于使用 BPE 的英德数据,预测出的翻译结果也将是 BPE 表示的数据,要还原成原始的数据(这里指 tokenize 后的数据)才能进行正确的评估,可以使用以下命令来恢复 `predict.txt` 内的翻译结果到 `predict.tok.txt` 中(无需再次 tokenize 处理): ```sh sed -r 's/(@@ )|(@@ ?$)//g' predict.txt > predict.tok.txt ``` -接下来就可以使用参考翻译对翻译结果进行 BLEU 指标的评估了。以英德翻译 `newstest2016.tok.de` 数据为例,执行如下命令: +接下来就可以使用参考翻译对翻译结果进行 BLEU 指标的评估了,评估需要用到 mosesdecoder 中的脚本,可以通过以下命令获取: +```sh +git clone https://github.com/moses-smt/mosesdecoder.git +``` +以英德翻译 `newstest2014.tok.de` 数据为例,获取 mosesdecoder 后使用 
`multi-bleu.perl` 执行如下命令进行翻译结果评估: ```sh -perl gen_data/mosesdecoder/scripts/generic/multi-bleu.perl gen_data/wmt16_ende_data/newstest2016.tok.de < predict.tok.txt +perl gen_data/mosesdecoder/scripts/generic/multi-bleu.perl gen_data/wmt16_ende_data/newstest2014.tok.de < predict.tok.txt ``` -可以看到类似如下的结果(为单机两卡训练 200K 个 iteration 后模型的预测结果)。 +可以看到类似如下的结果: ``` -BLEU = 33.08, 64.2/39.2/26.4/18.5 (BP=0.994, ratio=0.994, hyp_len=61971, ref_len=62362) +BLEU = 26.35, 57.7/32.1/20.0/13.0 (BP=1.000, ratio=1.013, hyp_len=63903, ref_len=63078) ``` -目前在未使用 model average 的情况下,英德翻译 base model 八卡训练 100K 个 iteration 后测试 BLEU 值如下: +目前在未使用 model average 的情况下,英德翻译 base model 和 big model 八卡训练 100K 个 iteration 后测试 BLEU 值如下: | 测试集 | newstest2014 | newstest2015 | newstest2016 | |-|-|-|-| -| BLEU | 26.25 | 29.15 | 33.64 | +| Base | 26.35 | 29.07 | 33.30 | +| Big | 27.07 | 30.09 | 34.38 | +我们这里也提供了以上 [base model](https://transformer-res.bj.bcebos.com/base_model.tar.gz) 和 [big model](https://transformer-res.bj.bcebos.com/big_model.tar.gz) 模型的下载以供使用。 ### 分布式训练 diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/config.py b/fluid/PaddleNLP/neural_machine_translation/transformer/config.py index ca119aa6fd0878b1e2cea5c0eaba050b54348f79..823341ed9084e80b5fe74655bf8db897d72175f0 100644 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/config.py +++ b/fluid/PaddleNLP/neural_machine_translation/transformer/config.py @@ -164,7 +164,10 @@ input_descs = { # [batch_size * max_trg_len_in_batch, 1] "lbl_weight": [(batch_size * seq_len, 1), "float32"], # This input is used in beam-search decoder. - "init_score": [(batch_size, 1), "float32"], + "init_score": [(batch_size, 1), "float32", 2], + # This input is used in beam-search decoder for the first gather + # (cell states updation) + "init_idx": [(batch_size, ), "int32"], } # Names of word embedding table which might be reused for weight sharing. @@ -194,4 +197,5 @@ label_data_input_fields = ( fast_decoder_data_input_fields = ( "trg_word", "init_score", + "init_idx", "trg_src_attn_bias", ) diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/images/attention_formula.png b/fluid/PaddleNLP/neural_machine_translation/transformer/images/attention_formula.png new file mode 100644 index 0000000000000000000000000000000000000000..249857f524b4137bafc2d4d1b779ed62d1437b6d Binary files /dev/null and b/fluid/PaddleNLP/neural_machine_translation/transformer/images/attention_formula.png differ diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/infer.py b/fluid/PaddleNLP/neural_machine_translation/transformer/infer.py index 6fc04a9422c136d941559d1b45af8bd88c2d2460..57ea546a5c6eb6ef484bbd975312cf99fd0ba18b 100644 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/infer.py +++ b/fluid/PaddleNLP/neural_machine_translation/transformer/infer.py @@ -1,18 +1,20 @@ import argparse import ast +import multiprocessing import numpy as np +import os from functools import partial import paddle import paddle.fluid as fluid import model +import reader +from config import * from model import wrap_encoder as encoder from model import wrap_decoder as decoder from model import fast_decode as fast_decoder -from config import * -from train import pad_batch_data -import reader +from train import pad_batch_data, prepare_data_generator def parse_args(): @@ -54,6 +56,21 @@ def parse_args(): default=" ", help="The delimiter used to split tokens in source or target sentences. " "For EN-DE BPE data we provided, use spaces as token delimiter. 
") + parser.add_argument( + "--use_mem_opt", + type=ast.literal_eval, + default=True, + help="The flag indicating whether to use memory optimization.") + parser.add_argument( + "--use_py_reader", + type=ast.literal_eval, + default=True, + help="The flag indicating whether to use py_reader.") + parser.add_argument( + "--use_parallel_exe", + type=ast.literal_eval, + default=False, + help="The flag indicating whether to use ParallelExecutor.") parser.add_argument( 'opts', help='See config.py for all options', @@ -123,106 +140,185 @@ def prepare_batch_input(insts, data_input_names, src_pad_idx, bos_idx, n_head, trg_word, dtype="float32").reshape(-1, 1), place, [range(trg_word.shape[0] + 1)] * 2) trg_word = to_lodtensor(trg_word, place, [range(trg_word.shape[0] + 1)] * 2) + init_idx = np.asarray(range(len(insts)), dtype="int32") data_input_dict = dict( zip(data_input_names, [ src_word, src_pos, src_slf_attn_bias, trg_word, init_score, - trg_src_attn_bias + init_idx, trg_src_attn_bias ])) + return data_input_dict + + +def prepare_feed_dict_list(data_generator, count, place): + """ + Prepare the list of feed dict for multi-devices. + """ + feed_dict_list = [] + if data_generator is not None: # use_py_reader == False + data_input_names = encoder_data_input_fields + fast_decoder_data_input_fields + data = next(data_generator) + for idx, data_buffer in enumerate(data): + data_input_dict = prepare_batch_input( + data_buffer, data_input_names, ModelHyperParams.eos_idx, + ModelHyperParams.bos_idx, ModelHyperParams.n_head, + ModelHyperParams.d_model, place) + feed_dict_list.append(data_input_dict) + return feed_dict_list if len(feed_dict_list) == count else None + + +def py_reader_provider_wrapper(data_reader, place): + """ + Data provider needed by fluid.layers.py_reader. + """ - input_dict = dict(data_input_dict.items()) - return input_dict + def py_reader_provider(): + data_input_names = encoder_data_input_fields + fast_decoder_data_input_fields + for batch_id, data in enumerate(data_reader()): + data_input_dict = prepare_batch_input( + data, data_input_names, ModelHyperParams.eos_idx, + ModelHyperParams.bos_idx, ModelHyperParams.n_head, + ModelHyperParams.d_model, place) + yield [data_input_dict[item] for item in data_input_names] + return py_reader_provider -def fast_infer(test_data, trg_idx2word): + +def fast_infer(args): """ Inference by beam search decoder based solely on Fluid operators. """ - place = fluid.CUDAPlace(0) if InferTaskConfig.use_gpu else fluid.CPUPlace() - exe = fluid.Executor(place) + out_ids, out_scores, pyreader = fast_decoder( + ModelHyperParams.src_vocab_size, + ModelHyperParams.trg_vocab_size, + ModelHyperParams.max_length + 1, + ModelHyperParams.n_layer, + ModelHyperParams.n_head, + ModelHyperParams.d_key, + ModelHyperParams.d_value, + ModelHyperParams.d_model, + ModelHyperParams.d_inner_hid, + ModelHyperParams.prepostprocess_dropout, + ModelHyperParams.attention_dropout, + ModelHyperParams.relu_dropout, + ModelHyperParams.preprocess_cmd, + ModelHyperParams.postprocess_cmd, + ModelHyperParams.weight_sharing, + InferTaskConfig.beam_size, + InferTaskConfig.max_out_len, + ModelHyperParams.eos_idx, + use_py_reader=args.use_py_reader) + + # This is used here to set dropout to the test mode. 
+ infer_program = fluid.default_main_program().clone(for_test=True) - out_ids, out_scores = fast_decoder( - ModelHyperParams.src_vocab_size, ModelHyperParams.trg_vocab_size, - ModelHyperParams.max_length + 1, ModelHyperParams.n_layer, - ModelHyperParams.n_head, ModelHyperParams.d_key, - ModelHyperParams.d_value, ModelHyperParams.d_model, - ModelHyperParams.d_inner_hid, ModelHyperParams.prepostprocess_dropout, - ModelHyperParams.attention_dropout, ModelHyperParams.relu_dropout, - ModelHyperParams.preprocess_cmd, ModelHyperParams.postprocess_cmd, - ModelHyperParams.weight_sharing, InferTaskConfig.beam_size, - InferTaskConfig.max_out_len, ModelHyperParams.eos_idx) + if args.use_mem_opt: + fluid.memory_optimize(infer_program) + + if InferTaskConfig.use_gpu: + place = fluid.CUDAPlace(0) + dev_count = fluid.core.get_cuda_device_count() + else: + place = fluid.CPUPlace() + dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count())) + exe = fluid.Executor(place) + exe.run(fluid.default_startup_program()) fluid.io.load_vars( exe, InferTaskConfig.model_path, vars=[ - var for var in fluid.default_main_program().list_vars() + var for var in infer_program.list_vars() if isinstance(var, fluid.framework.Parameter) ]) - # This is used here to set dropout to the test mode. - infer_program = fluid.default_main_program().clone(for_test=True) + exec_strategy = fluid.ExecutionStrategy() + # For faster executor + exec_strategy.use_experimental_executor = True + exec_strategy.num_threads = 1 + build_strategy = fluid.BuildStrategy() + infer_exe = fluid.ParallelExecutor( + use_cuda=TrainTaskConfig.use_gpu, + main_program=infer_program, + build_strategy=build_strategy, + exec_strategy=exec_strategy) - for batch_id, data in enumerate(test_data.batch_generator()): - data_input = prepare_batch_input( - data, encoder_data_input_fields + fast_decoder_data_input_fields, - ModelHyperParams.eos_idx, ModelHyperParams.bos_idx, - ModelHyperParams.n_head, ModelHyperParams.d_model, place) - seq_ids, seq_scores = exe.run(infer_program, - feed=data_input, - fetch_list=[out_ids, out_scores], - return_numpy=False) - # How to parse the results: - # Suppose the lod of seq_ids is: - # [[0, 3, 6], [0, 12, 24, 40, 54, 67, 82]] - # then from lod[0]: - # there are 2 source sentences, beam width is 3. 
- # from lod[1]: - # the first source sentence has 3 hyps; the lengths are 12, 12, 16 - # the second source sentence has 3 hyps; the lengths are 14, 13, 15 - hyps = [[] for i in range(len(data))] - scores = [[] for i in range(len(data))] - for i in range(len(seq_ids.lod()[0]) - 1): # for each source sentence - start = seq_ids.lod()[0][i] - end = seq_ids.lod()[0][i + 1] - for j in range(end - start): # for each candidate - sub_start = seq_ids.lod()[1][start + j] - sub_end = seq_ids.lod()[1][start + j + 1] - hyps[i].append(" ".join([ - trg_idx2word[idx] - for idx in post_process_seq( - np.array(seq_ids)[sub_start:sub_end]) - ])) - scores[i].append(np.array(seq_scores)[sub_end - 1]) - print(hyps[i][-1]) - if len(hyps[i]) >= InferTaskConfig.n_best: - break - - -def infer(args, inferencer=fast_infer): - place = fluid.CUDAPlace(0) if InferTaskConfig.use_gpu else fluid.CPUPlace() - test_data = reader.DataReader( - src_vocab_fpath=args.src_vocab_fpath, - trg_vocab_fpath=args.trg_vocab_fpath, - fpattern=args.test_file_pattern, - token_delimiter=args.token_delimiter, - use_token_batch=False, - batch_size=args.batch_size, - pool_size=args.pool_size, - sort_type=reader.SortType.NONE, - shuffle=False, - shuffle_batch=False, - start_mark=args.special_token[0], - end_mark=args.special_token[1], - unk_mark=args.special_token[2], - # count start and end tokens out - max_length=ModelHyperParams.max_length - 2, - clip_last_batch=False) - trg_idx2word = test_data.load_dict( + # data reader settings for inference + args.train_file_pattern = args.test_file_pattern + args.use_token_batch = False + args.sort_type = reader.SortType.NONE + args.shuffle = False + args.shuffle_batch = False + test_data = prepare_data_generator( + args, + is_test=False, + count=dev_count, + pyreader=pyreader, + py_reader_provider_wrapper=py_reader_provider_wrapper, + place=place) + if args.use_py_reader: + pyreader.start() + data_generator = None + else: + data_generator = test_data() + trg_idx2word = reader.DataReader.load_dict( dict_path=args.trg_vocab_fpath, reverse=True) - inferencer(test_data, trg_idx2word) + + while True: + try: + feed_dict_list = prepare_feed_dict_list(data_generator, dev_count, + place) + if args.use_parallel_exe: + seq_ids, seq_scores = infer_exe.run( + fetch_list=[out_ids.name, out_scores.name], + feed=feed_dict_list, + return_numpy=False) + else: + seq_ids, seq_scores = exe.run( + program=infer_program, + fetch_list=[out_ids.name, out_scores.name], + feed=feed_dict_list[0] + if feed_dict_list is not None else None, + return_numpy=False, + use_program_cache=True) + seq_ids_list, seq_scores_list = [seq_ids], [ + seq_scores + ] if isinstance( + seq_ids, paddle.fluid.core.LoDTensor) else (seq_ids, seq_scores) + for seq_ids, seq_scores in zip(seq_ids_list, seq_scores_list): + # How to parse the results: + # Suppose the lod of seq_ids is: + # [[0, 3, 6], [0, 12, 24, 40, 54, 67, 82]] + # then from lod[0]: + # there are 2 source sentences, beam width is 3. 
+ # from lod[1]: + # the first source sentence has 3 hyps; the lengths are 12, 12, 16 + # the second source sentence has 3 hyps; the lengths are 14, 13, 15 + hyps = [[] for i in range(len(seq_ids.lod()[0]) - 1)] + scores = [[] for i in range(len(seq_scores.lod()[0]) - 1)] + for i in range(len(seq_ids.lod()[0]) - + 1): # for each source sentence + start = seq_ids.lod()[0][i] + end = seq_ids.lod()[0][i + 1] + for j in range(end - start): # for each candidate + sub_start = seq_ids.lod()[1][start + j] + sub_end = seq_ids.lod()[1][start + j + 1] + hyps[i].append(" ".join([ + trg_idx2word[idx] + for idx in post_process_seq( + np.array(seq_ids)[sub_start:sub_end]) + ])) + scores[i].append(np.array(seq_scores)[sub_end - 1]) + print(hyps[i][-1]) + if len(hyps[i]) >= InferTaskConfig.n_best: + break + except (StopIteration, fluid.core.EOFException): + # The data pass is over. + if args.use_py_reader: + pyreader.reset() + break if __name__ == "__main__": args = parse_args() - infer(args) + fast_infer(args) diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/model.py b/fluid/PaddleNLP/neural_machine_translation/transformer/model.py index 1e510bc620dc56f82e8e7303a56ca44a44b74650..bf68e089004cf53bfb8d2910b48664e25e44c10d 100644 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/model.py +++ b/fluid/PaddleNLP/neural_machine_translation/transformer/model.py @@ -7,6 +7,43 @@ import paddle.fluid.layers as layers from config import * +def wrap_layer_with_block(layer, block_idx): + """ + Make layer define support indicating block, by which we can add layers + to other blocks within current block. This will make it easy to define + cache among while loop. + """ + + class BlockGuard(object): + """ + BlockGuard class. + + BlockGuard class is used to switch to the given block in a program by + using the Python `with` keyword. + """ + + def __init__(self, block_idx=None, main_program=None): + self.main_program = fluid.default_main_program( + ) if main_program is None else main_program + self.old_block_idx = self.main_program.current_block().idx + self.new_block_idx = block_idx + + def __enter__(self): + self.main_program.current_block_idx = self.new_block_idx + + def __exit__(self, exc_type, exc_val, exc_tb): + self.main_program.current_block_idx = self.old_block_idx + if exc_type is not None: + return False # re-raise exception + return True + + def layer_wrapper(*args, **kwargs): + with BlockGuard(block_idx): + return layer(*args, **kwargs) + + return layer_wrapper + + def position_encoding_init(n_position, d_pos_vec): """ Generate the initial values for the sinusoid position encoding table. @@ -35,7 +72,9 @@ def multi_head_attention(queries, d_model, n_head=1, dropout_rate=0., - cache=None): + cache=None, + gather_idx=None, + static_kv=False): """ Multi-Head Attention. Note that attn_bias is added to the logit before computing softmax activiation to mask certain selected positions so that @@ -56,42 +95,86 @@ def multi_head_attention(queries, size=d_key * n_head, bias_attr=False, num_flatten_dims=2) - k = layers.fc(input=keys, - size=d_key * n_head, - bias_attr=False, - num_flatten_dims=2) - v = layers.fc(input=values, - size=d_value * n_head, - bias_attr=False, - num_flatten_dims=2) + # For encoder-decoder attention in inference, insert the ops and vars + # into global block to use as cache among beam search. 
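+    # (Editorial gloss) wrap_layer_with_block(layer, block_idx) returns a
+    # callable that temporarily switches the program's current block to
+    # block_idx before calling `layer`, so the vars created here live in the
+    # parent block and persist across steps of the decoding while-loop.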
+ fc_layer = wrap_layer_with_block( + layers.fc, fluid.default_main_program().current_block() + .parent_idx) if cache is not None and static_kv else layers.fc + k = fc_layer( + input=keys, + size=d_key * n_head, + bias_attr=False, + num_flatten_dims=2) + v = fc_layer( + input=values, + size=d_value * n_head, + bias_attr=False, + num_flatten_dims=2) return q, k, v - def __split_heads(x, n_head): + def __split_heads_qkv(queries, keys, values, n_head, d_key, d_value): """ - Reshape the last dimension of inpunt tensor x so that it becomes two - dimensions and then transpose. Specifically, input a tensor with shape - [bs, max_sequence_length, n_head * hidden_dim] then output a tensor + Reshape input tensors at the last dimension to split multi-heads + and then transpose. Specifically, transform the input tensor with shape + [bs, max_sequence_length, n_head * hidden_dim] to the output tensor with shape [bs, n_head, max_sequence_length, hidden_dim]. """ - if n_head == 1: - return x - - hidden_size = x.shape[-1] # The value 0 in shape attr means copying the corresponding dimension # size of the input as the output dimension size. - reshaped = layers.reshape( - x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True) - + reshaped_q = layers.reshape( + x=queries, shape=[0, 0, n_head, d_key], inplace=True) # permuate the dimensions into: # [batch_size, n_head, max_sequence_len, hidden_size_per_head] - return layers.transpose(x=reshaped, perm=[0, 2, 1, 3]) + q = layers.transpose(x=reshaped_q, perm=[0, 2, 1, 3]) + # For encoder-decoder attention in inference, insert the ops and vars + # into global block to use as cache among beam search. + reshape_layer = wrap_layer_with_block( + layers.reshape, + fluid.default_main_program().current_block() + .parent_idx) if cache is not None and static_kv else layers.reshape + transpose_layer = wrap_layer_with_block( + layers.transpose, + fluid.default_main_program().current_block(). + parent_idx) if cache is not None and static_kv else layers.transpose + reshaped_k = reshape_layer( + x=keys, shape=[0, 0, n_head, d_key], inplace=True) + k = transpose_layer(x=reshaped_k, perm=[0, 2, 1, 3]) + reshaped_v = reshape_layer( + x=values, shape=[0, 0, n_head, d_value], inplace=True) + v = transpose_layer(x=reshaped_v, perm=[0, 2, 1, 3]) + + if cache is not None: # only for faster inference + if static_kv: # For encoder-decoder attention in inference + cache_k, cache_v = cache["static_k"], cache["static_v"] + # To init the static_k and static_v in cache. + # Maybe we can use condition_op(if_else) to do these at the first + # step in while loop to replace these, however it might be less + # efficient. + static_cache_init = wrap_layer_with_block( + layers.assign, + fluid.default_main_program().current_block().parent_idx) + static_cache_init(k, cache_k) + static_cache_init(v, cache_v) + else: # For decoder self-attention in inference + cache_k, cache_v = cache["k"], cache["v"] + # gather cell states corresponding to selected parent + select_k = layers.gather(cache_k, index=gather_idx) + select_v = layers.gather(cache_v, index=gather_idx) + if not static_kv: + # For self attention in inference, use cache and concat time steps. 
+ select_k = layers.concat([select_k, k], axis=2) + select_v = layers.concat([select_v, v], axis=2) + # update cell states(caches) cached in global block + layers.assign(select_k, cache_k) + layers.assign(select_v, cache_v) + return q, select_k, select_v + return q, k, v def __combine_heads(x): """ Transpose and then reshape the last two dimensions of inpunt tensor x so that it becomes one dimension, which is reverse to __split_heads. """ - if len(x.shape) == 3: return x if len(x.shape) != 4: raise ValueError("Input(x) should be a 4-D Tensor.") @@ -107,8 +190,7 @@ def multi_head_attention(queries, """ Scaled Dot-Product Attention """ - scaled_q = layers.scale(x=q, scale=d_key**-0.5) - product = layers.matmul(x=scaled_q, y=k, transpose_y=True) + product = layers.matmul(x=q, y=k, transpose_y=True, alpha=d_key**-0.5) if attn_bias: product += attn_bias weights = layers.softmax(product) @@ -122,23 +204,7 @@ def multi_head_attention(queries, return out q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value) - - if cache is not None: # use cache and concat time steps - # Since the inplace reshape in __split_heads changes the shape of k and - # v, which is the cache input for next time step, reshape the cache - # input from the previous time step first. - k = cache["k"] = layers.concat( - [layers.reshape( - cache["k"], shape=[0, 0, d_key * n_head]), k], - axis=1) - v = cache["v"] = layers.concat( - [layers.reshape( - cache["v"], shape=[0, 0, d_value * n_head]), v], - axis=1) - - q = __split_heads(q, n_head) - k = __split_heads(k, n_head) - v = __split_heads(v, n_head) + q, k, v = __split_heads_qkv(q, k, v, n_head, d_key, d_value) ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_model, dropout_rate) @@ -327,7 +393,8 @@ def decoder_layer(dec_input, relu_dropout, preprocess_cmd, postprocess_cmd, - cache=None): + cache=None, + gather_idx=None): """ The layer to be stacked in decoder part. The structure of this module is similar to that in the encoder part except a multi-head attention is added to implement encoder-decoder attention. @@ -342,7 +409,8 @@ def decoder_layer(dec_input, d_model, n_head, attention_dropout, - cache, ) + cache=cache, + gather_idx=gather_idx) slf_attn_output = post_process_layer( dec_input, slf_attn_output, @@ -358,7 +426,10 @@ def decoder_layer(dec_input, d_value, d_model, n_head, - attention_dropout, ) + attention_dropout, + cache=cache, + gather_idx=gather_idx, + static_kv=True) enc_attn_output = post_process_layer( slf_attn_output, enc_attn_output, @@ -393,7 +464,8 @@ def decoder(dec_input, relu_dropout, preprocess_cmd, postprocess_cmd, - caches=None): + caches=None, + gather_idx=None): """ The decoder is composed of a stack of identical decoder_layer layers. """ @@ -413,7 +485,8 @@ def decoder(dec_input, relu_dropout, preprocess_cmd, postprocess_cmd, - cache=None if caches is None else caches[i]) + cache=None if caches is None else caches[i], + gather_idx=gather_idx) dec_input = dec_output dec_output = pre_process_layer(dec_output, preprocess_cmd, prepostprocess_dropout) @@ -610,7 +683,8 @@ def wrap_decoder(trg_vocab_size, weight_sharing, dec_inputs=None, enc_output=None, - caches=None): + caches=None, + gather_idx=None): """ The wrapper assembles together all needed layers for the decoder. 
""" @@ -646,7 +720,8 @@ def wrap_decoder(trg_vocab_size, relu_dropout, preprocess_cmd, postprocess_cmd, - caches=caches) + caches=caches, + gather_idx=gather_idx) # Reshape to 2D tensor to use GEMM instead of BatchedGEMM dec_output = layers.reshape( dec_output, shape=[-1, dec_output.shape[-1]], inplace=True) @@ -666,9 +741,43 @@ def wrap_decoder(trg_vocab_size, return predict -def fast_decode( +def fast_decode(src_vocab_size, + trg_vocab_size, + max_in_len, + n_layer, + n_head, + d_key, + d_value, + d_model, + d_inner_hid, + prepostprocess_dropout, + attention_dropout, + relu_dropout, + preprocess_cmd, + postprocess_cmd, + weight_sharing, + beam_size, + max_out_len, + eos_idx, + use_py_reader=False): + """ + Use beam search to decode. Caches will be used to store states of history + steps which can make the decoding faster. + """ + data_input_names = encoder_data_input_fields + fast_decoder_data_input_fields + + if use_py_reader: + all_inputs, reader = make_all_py_reader_inputs(data_input_names) + else: + all_inputs = make_all_inputs(data_input_names) + + enc_inputs_len = len(encoder_data_input_fields) + dec_inputs_len = len(fast_decoder_data_input_fields) + enc_inputs = all_inputs[0:enc_inputs_len] + dec_inputs = all_inputs[enc_inputs_len:enc_inputs_len + dec_inputs_len] + + enc_output = wrap_encoder( src_vocab_size, - trg_vocab_size, max_in_len, n_layer, n_head, @@ -682,64 +791,60 @@ def fast_decode( preprocess_cmd, postprocess_cmd, weight_sharing, - beam_size, - max_out_len, - eos_idx, ): - """ - Use beam search to decode. Caches will be used to store states of history - steps which can make the decoding faster. - """ - enc_output = wrap_encoder( - src_vocab_size, max_in_len, n_layer, n_head, d_key, d_value, d_model, - d_inner_hid, prepostprocess_dropout, attention_dropout, relu_dropout, - preprocess_cmd, postprocess_cmd, weight_sharing) - start_tokens, init_scores, trg_src_attn_bias = make_all_inputs( - fast_decoder_data_input_fields) + enc_inputs, ) + start_tokens, init_scores, parent_idx, trg_src_attn_bias = dec_inputs def beam_search(): max_len = layers.fill_constant( - shape=[1], dtype=start_tokens.dtype, value=max_out_len) + shape=[1], + dtype=start_tokens.dtype, + value=max_out_len, + force_cpu=True) step_idx = layers.fill_constant( - shape=[1], dtype=start_tokens.dtype, value=0) - cond = layers.less_than(x=step_idx, y=max_len) + shape=[1], dtype=start_tokens.dtype, value=0, force_cpu=True) + cond = layers.less_than(x=step_idx, y=max_len) # default force_cpu=True while_op = layers.While(cond) # array states will be stored for each step. ids = layers.array_write( layers.reshape(start_tokens, (-1, 1)), step_idx) scores = layers.array_write(init_scores, step_idx) # cell states will be overwrited at each step. - # caches contains states of history steps to reduce redundant - # computation in decoder. - caches = [{ - "k": layers.fill_constant_batch_size_like( - input=start_tokens, - shape=[-1, 0, d_model], - dtype=enc_output.dtype, - value=0), - "v": layers.fill_constant_batch_size_like( - input=start_tokens, - shape=[-1, 0, d_model], - dtype=enc_output.dtype, - value=0) - } for i in range(n_layer)] + # caches contains states of history steps in decoder self-attention + # and static encoder output projections in encoder-decoder attention + # to reduce redundant computation. 
+ caches = [ + { + "k": # for self attention + layers.fill_constant_batch_size_like( + input=start_tokens, + shape=[-1, n_head, 0, d_key], + dtype=enc_output.dtype, + value=0), + "v": # for self attention + layers.fill_constant_batch_size_like( + input=start_tokens, + shape=[-1, n_head, 0, d_value], + dtype=enc_output.dtype, + value=0), + "static_k": # for encoder-decoder attention + layers.create_tensor(dtype=enc_output.dtype), + "static_v": # for encoder-decoder attention + layers.create_tensor(dtype=enc_output.dtype) + } for i in range(n_layer) + ] + with while_op.block(): pre_ids = layers.array_read(array=ids, i=step_idx) - pre_ids = layers.reshape(pre_ids, (-1, 1, 1)) + # Since beam_search_op dosen't enforce pre_ids' shape, we can do + # inplace reshape here which actually change the shape of pre_ids. + pre_ids = layers.reshape(pre_ids, (-1, 1, 1), inplace=True) pre_scores = layers.array_read(array=scores, i=step_idx) - # sequence_expand can gather sequences according to lod thus can be - # used in beam search to sift states corresponding to selected ids. - pre_src_attn_bias = layers.sequence_expand( - x=trg_src_attn_bias, y=pre_scores) - pre_enc_output = layers.sequence_expand(x=enc_output, y=pre_scores) - pre_caches = [{ - "k": layers.sequence_expand( - x=cache["k"], y=pre_scores), - "v": layers.sequence_expand( - x=cache["v"], y=pre_scores), - } for cache in caches] + # gather cell states corresponding to selected parent + pre_src_attn_bias = layers.gather( + trg_src_attn_bias, index=parent_idx) pre_pos = layers.elementwise_mul( x=layers.fill_constant_batch_size_like( - input=pre_enc_output, # cann't use pre_ids here since it has lod + input=pre_src_attn_bias, # cann't use lod tensor here value=1, shape=[-1, 1, 1], dtype=pre_ids.dtype), @@ -761,35 +866,33 @@ def fast_decode( postprocess_cmd, weight_sharing, dec_inputs=(pre_ids, pre_pos, None, pre_src_attn_bias), - enc_output=pre_enc_output, - caches=pre_caches) - + enc_output=enc_output, + caches=caches, + gather_idx=parent_idx) + # intra-beam topK topk_scores, topk_indices = layers.topk( input=layers.softmax(logits), k=beam_size) accu_scores = layers.elementwise_add( - x=layers.log(topk_scores), - y=layers.reshape( - pre_scores, shape=[-1]), - axis=0) - # beam_search op uses lod to distinguish branches. + x=layers.log(topk_scores), y=pre_scores, axis=0) + # beam_search op uses lod to differentiate branches. topk_indices = layers.lod_reset(topk_indices, pre_ids) - selected_ids, selected_scores = layers.beam_search( + # topK reduction across beams, also contain special handle of + # end beams and end sentences(batch reduction) + selected_ids, selected_scores, gather_idx = layers.beam_search( pre_ids=pre_ids, pre_scores=pre_scores, ids=topk_indices, scores=accu_scores, beam_size=beam_size, - end_id=eos_idx) - + end_id=eos_idx, + return_parent_idx=True) layers.increment(x=step_idx, value=1.0, in_place=True) - # update states + # cell states(caches) have been updated in wrap_decoder, + # only need to update beam search states here. 
layers.array_write(selected_ids, i=step_idx, array=ids) layers.array_write(selected_scores, i=step_idx, array=scores) + layers.assign(gather_idx, parent_idx) layers.assign(pre_src_attn_bias, trg_src_attn_bias) - layers.assign(pre_enc_output, enc_output) - for i in range(n_layer): - layers.assign(pre_caches[i]["k"], caches[i]["k"]) - layers.assign(pre_caches[i]["v"], caches[i]["v"]) length_cond = layers.less_than(x=step_idx, y=max_len) finish_cond = layers.logical_not(layers.is_empty(x=selected_ids)) layers.logical_and(x=length_cond, y=finish_cond, out=cond) @@ -799,4 +902,4 @@ def fast_decode( return finished_ids, finished_scores finished_ids, finished_scores = beam_search() - return finished_ids, finished_scores + return finished_ids, finished_scores, reader if use_py_reader else None diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/profile.py b/fluid/PaddleNLP/neural_machine_translation/transformer/profile.py index 9a437725cb27c29b0233d6297e84781f5343aff1..76711ece132113863f1e42d4ac1529f63ed90ff3 100644 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/profile.py +++ b/fluid/PaddleNLP/neural_machine_translation/transformer/profile.py @@ -186,7 +186,7 @@ def main(args): # Since the token number differs among devices, customize gradient scale to # use token average cost among multi-devices. and the gradient scale is # `1 / token_number` for average cost. - build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized + # build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized train_exe = fluid.ParallelExecutor( use_cuda=TrainTaskConfig.use_gpu, loss_name=avg_cost.name, diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/train.py b/fluid/PaddleNLP/neural_machine_translation/transformer/train.py index 16d48238941a03309cc9ba269cd619bd21e0f561..4313f8b441ee194935c7c47abc52271589c7765d 100644 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/train.py +++ b/fluid/PaddleNLP/neural_machine_translation/transformer/train.py @@ -10,7 +10,6 @@ import time import numpy as np import paddle.fluid as fluid -from paddle.fluid.transpiler.details import program_to_code import reader from config import * @@ -258,7 +257,12 @@ def prepare_batch_input(insts, data_input_names, src_pad_idx, trg_pad_idx, return data_input_dict, np.asarray([num_token], dtype="float32") -def prepare_data_generator(args, is_test, count, pyreader): +def prepare_data_generator(args, + is_test, + count, + pyreader, + py_reader_provider_wrapper, + place=None): """ Data generator wrapper for DataReader. If use py_reader, set the data provider for py_reader @@ -319,7 +323,7 @@ def prepare_data_generator(args, is_test, count, pyreader): data_reader = split(data_reader, count) if args.use_py_reader: pyreader.decorate_tensor_provider( - py_reader_provider_wrapper(data_reader)) + py_reader_provider_wrapper(data_reader, place)) data_reader = None else: # Data generator for multi-devices data_reader = stack(data_reader, count) @@ -357,7 +361,7 @@ def prepare_feed_dict_list(data_generator, init_flag, count): return feed_dict_list if len(feed_dict_list) == count else None -def py_reader_provider_wrapper(data_reader): +def py_reader_provider_wrapper(data_reader, place): """ Data provider needed by fluid.layers.py_reader. 
""" @@ -370,8 +374,7 @@ def py_reader_provider_wrapper(data_reader): data, data_input_names, ModelHyperParams.eos_idx, ModelHyperParams.eos_idx, ModelHyperParams.n_head, ModelHyperParams.d_model) - total_dict = dict(data_input_dict.items()) - yield [total_dict[item] for item in data_input_names] + yield [data_input_dict[item] for item in data_input_names] return py_reader_provider @@ -406,7 +409,11 @@ def test_context(exe, train_exe, dev_count): is_test=True) test_prog = test_prog.clone(for_test=True) test_data = prepare_data_generator( - args, is_test=True, count=dev_count, pyreader=pyreader) + args, + is_test=True, + count=dev_count, + pyreader=pyreader, + py_reader_provider_wrapper=py_reader_provider_wrapper) exe.run(startup_prog) # to init pyreader for testing if TrainTaskConfig.ckpt_path: @@ -477,7 +484,11 @@ def train_loop(exe, logging.info("begin reader") train_data = prepare_data_generator( - args, is_test=False, count=dev_count, pyreader=pyreader) + args, + is_test=False, + count=dev_count, + pyreader=pyreader, + py_reader_provider_wrapper=py_reader_provider_wrapper) # For faster executor exec_strategy = fluid.ExecutionStrategy() diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/train.py b/fluid/PaddleNLP/sequence_tagging_for_ner/train.py index 0b61d6fda6551f99f442f4e13618ca00b33d9557..68e621371e09b654007134c8ce449e3491b9516f 100644 --- a/fluid/PaddleNLP/sequence_tagging_for_ner/train.py +++ b/fluid/PaddleNLP/sequence_tagging_for_ner/train.py @@ -30,7 +30,9 @@ def test(exe, chunk_evaluator, inference_program, test_data, test_fetch_list, num_infer = np.array(rets[0]) num_label = np.array(rets[1]) num_correct = np.array(rets[2]) - chunk_evaluator.update(num_infer[0], num_label[0], num_correct[0]) + chunk_evaluator.update(num_infer[0].astype('int64'), + num_label[0].astype('int64'), + num_correct[0].astype('int64')) return chunk_evaluator.eval() @@ -65,11 +67,11 @@ def main(train_data_file, input=feature_out, param_attr=fluid.ParamAttr(name='crfw')) (precision, recall, f1_score, num_infer_chunks, num_label_chunks, - num_correct_chunks) = fluid.layers.chunk_eval( - input=crf_decode, - label=target, - chunk_scheme="IOB", - num_chunk_types=int(math.ceil((label_dict_len - 1) / 2.0))) + num_correct_chunks) = fluid.layers.chunk_eval( + input=crf_decode, + label=target, + chunk_scheme="IOB", + num_chunk_types=int(math.ceil((label_dict_len - 1) / 2.0))) chunk_evaluator = fluid.metrics.ChunkEvaluator() inference_program = fluid.default_main_program().clone(for_test=True) @@ -134,8 +136,9 @@ def main(train_data_file, " pass_f1_score:" + str(test_pass_f1_score)) save_dirname = os.path.join(model_save_dir, "params_pass_%d" % pass_id) - fluid.io.save_inference_model(save_dirname, ['word', 'mark'], - crf_decode, exe) + if "CE_MODE_X" not in os.environ: + fluid.io.save_inference_model(save_dirname, ['word', 'mark'], + crf_decode, exe) if "CE_MODE_X" in os.environ: print("kpis train_precision %f" % pass_precision) diff --git a/fluid/PaddleNLP/text_classification/async_executor/data_generator/build_raw_data.py b/fluid/PaddleNLP/text_classification/async_executor/data_generator/build_raw_data.py new file mode 100644 index 0000000000000000000000000000000000000000..2c0c0981c93b3b1e9231c7efe1f0b49e178c060f --- /dev/null +++ b/fluid/PaddleNLP/text_classification/async_executor/data_generator/build_raw_data.py @@ -0,0 +1,62 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" +Build raw data +""" +from __future__ import print_function +import sys +import os +import random +import re +data_type = sys.argv[1] + +if not (data_type == "train" or data_type == "test"): + print("python %s [test/train]" % sys.argv[0], file=sys.stderr) + sys.exit(-1) + +pos_folder = "aclImdb/" + data_type + "/pos/" +neg_folder = "aclImdb/" + data_type + "/neg/" + +pos_train_list = [(pos_folder + x, "1") for x in os.listdir(pos_folder)] +neg_train_list = [(neg_folder + x, "0") for x in os.listdir(neg_folder)] + +all_train_list = pos_train_list + neg_train_list +random.shuffle(all_train_list) + + +def load_dict(dictfile): + """ + Load word id dict + """ + vocab = {} + wid = 0 + with open(dictfile) as f: + for line in f: + vocab[line.strip()] = str(wid) + wid += 1 + return vocab + + +vocab = load_dict("aclImdb/imdb.vocab") +unk_id = str(len(vocab)) +print("vocab size: ", len(vocab), file=sys.stderr) +pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))') + +for fitem in all_train_list: + label = str(fitem[1]) + fname = fitem[0] + with open(fname) as f: + sent = f.readline().lower().replace("
", " ").strip() + out_s = "%s | %s" % (sent, label) + print(out_s, file=sys.stdout) diff --git a/fluid/PaddleRec/gru4rec/README.md b/fluid/PaddleRec/gru4rec/README.md index 0ea3f838eaf9e2f46b7d1551a36aa1f6b462ce44..9c4a5247fe2ecb64e79ba96e0922f0fd1750aa8e 100644 --- a/fluid/PaddleRec/gru4rec/README.md +++ b/fluid/PaddleRec/gru4rec/README.md @@ -79,7 +79,7 @@ SessionId ItemId Time 2 214757407 1396850438.247 ``` -数据格式需要转换 运行脚本 +数据格式需要转换, 运行脚本如下 ``` python convert_format.py ``` @@ -101,7 +101,7 @@ python convert_format.py 根据训练和测试文件生成字典和对应的paddle输入文件 -注意需要将训练文件放到一个目录下面,测试文件放到一个目录下面,同时支持多训练文件 +需要将训练文件放到目录raw_train_data下,测试文件放到目录raw_test_data下,并生成对应的train_data,test_data和vocab.txt文件 ``` python text2paddle.py raw_train_data/ raw_test_data/ train_data test_data vocab.txt ``` diff --git a/fluid/PaddleRec/gru4rec/net.py b/fluid/PaddleRec/gru4rec/net.py index ebb512377eae865b90f3d0360931a744b1a0ad07..6a715443ff1e72ae77aba51d5eaffe4eefee9687 100644 --- a/fluid/PaddleRec/gru4rec/net.py +++ b/fluid/PaddleRec/gru4rec/net.py @@ -171,7 +171,8 @@ def train_cross_entropy_network(vocab_size, neg_size, hid_size, drop_out=0.2): ele_mul = fluid.layers.elementwise_mul(emb_label_drop, gru) red_sum = fluid.layers.reduce_sum(input=ele_mul, dim=1, keep_dim=True) - pre = fluid.layers.sequence_reshape(input=red_sum, new_dim=(neg_size + 1)) + pre_ = fluid.layers.sequence_reshape(input=red_sum, new_dim=(neg_size + 1)) + pre = fluid.layers.softmax(input=pre_) cost = fluid.layers.cross_entropy(input=pre, label=pos_label) cost_sum = fluid.layers.reduce_sum(input=cost) diff --git a/fluid/PaddleRec/gru4rec/train_sample_neg.py b/fluid/PaddleRec/gru4rec/train_sample_neg.py index eb7ec3d4901d9ab7916546d83570534c81a8b0ff..1b1736cf937723bc86693c0d8cd39e579735f129 100644 --- a/fluid/PaddleRec/gru4rec/train_sample_neg.py +++ b/fluid/PaddleRec/gru4rec/train_sample_neg.py @@ -68,9 +68,11 @@ def train(): # Train program if args.loss == 'bpr': + print('bpr loss') src, pos_label, label, avg_cost = net.train_bpr_network( neg_size=args.neg_size, vocab_size=vocab_size, hid_size=hid_size) else: + print('cross-entory loss') src, pos_label, label, avg_cost = net.train_cross_entropy_network( neg_size=args.neg_size, vocab_size=vocab_size, hid_size=hid_size) diff --git a/fluid/PaddleRec/gru4rec/utils.py b/fluid/PaddleRec/gru4rec/utils.py index 429026b831454e44869238744c890e6139a6074d..1cd6a313b2a5097b16c473722737e0e6936f4e31 100644 --- a/fluid/PaddleRec/gru4rec/utils.py +++ b/fluid/PaddleRec/gru4rec/utils.py @@ -45,8 +45,8 @@ def to_lodtensor_bpr(raw_data, neg_size, vocab_size, place): neg_data = np.tile(pos_data, neg_size) np.random.shuffle(neg_data) for ii in range(length * neg_size): - if neg_data[ii] == pos_data[ii / neg_size]: - neg_data[ii] = pos_data[length - 1 - ii / neg_size] + if neg_data[ii] == pos_data[ii // neg_size]: + neg_data[ii] = pos_data[length - 1 - ii // neg_size] label_data = np.column_stack( (pos_data.reshape(length, 1), neg_data.reshape(length, neg_size))) diff --git a/fluid/PaddleRec/ssr/infer.py b/fluid/PaddleRec/ssr/infer.py index d5c9ee1b5dc95eb403932e0ff7534bfadc7568d7..38fb5cd762e117409b12ce8bd202f110a1cdfcb4 100644 --- a/fluid/PaddleRec/ssr/infer.py +++ b/fluid/PaddleRec/ssr/infer.py @@ -81,7 +81,7 @@ def infer(args, vocab_size, test_reader): start_up_program = fluid.Program() with fluid.program_guard(main_program, start_up_program): acc = model(vocab_size, emb_size, hid_size) - for epoch in xrange(start_index, last_index + 1): + for epoch in range(start_index, last_index + 1): copy_program = main_program.clone() 
model_path = model_dir + "/epoch_" + str(epoch) fluid.io.load_params( diff --git a/fluid/PaddleRec/word2vec/reader.py b/fluid/PaddleRec/word2vec/reader.py index 3eae59cf510790ebc64f88d838863a3353a98ae4..df479a4b71bbb4c2b2297c4d04afe275ba1c9a81 100644 --- a/fluid/PaddleRec/word2vec/reader.py +++ b/fluid/PaddleRec/word2vec/reader.py @@ -70,8 +70,8 @@ class Word2VecReader(object): self.word_frequencys = [ float(count) / word_all_count for count in word_counts ] - print("dict_size = " + str( - self.dict_size)) + " word_all_count = " + str(word_all_count) + print("dict_size = " + str(self.dict_size) + " word_all_count = " + str( + word_all_count)) with io.open(dict_path + "_ptable", 'r', encoding='utf-8') as f2: for line in f2: diff --git a/fluid/PaddleRec/word2vec/train.py b/fluid/PaddleRec/word2vec/train.py index 97e6fa67a4f402d8543711c0186194e8c642a956..2c8512e4954c60a9f0f0c45533c08170d4e6e09b 100644 --- a/fluid/PaddleRec/word2vec/train.py +++ b/fluid/PaddleRec/word2vec/train.py @@ -203,7 +203,7 @@ def train_loop(args, train_program, reader, py_reader, loss, trainer_id): time.sleep(10) epoch_start = time.time() batch_id = 0 - start = time.clock() + start = time.time() try: while True: @@ -218,8 +218,8 @@ def train_loop(args, train_program, reader, py_reader, loss, trainer_id): loss_val.mean(), py_reader.queue.size())) if args.with_speed: if batch_id % 1000 == 0 and batch_id != 0: - elapsed = (time.clock() - start) - start = time.clock() + elapsed = (time.time() - start) + start = time.time() samples = 1001 * args.batch_size * int( os.getenv("CPU_NUM")) logger.info("Time used: {}, Samples/Sec: {}".format( diff --git a/fluid/README.cn.rst b/fluid/README.cn.rst index 811038c8aaedcce2b55fca54e647dcda98924db9..115b3e157e26c764ab44dcab71fffc42b2fb8dca 100644 --- a/fluid/README.cn.rst +++ b/fluid/README.cn.rst @@ -33,11 +33,14 @@ Fluid模型配置和参数文件的工具。 VOC `__\ 、\ `MS COCO `__\ 数据训练通用物体检测模型,当前介绍了SSD算法,SSD全称Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具有检测速度快且检测精度高的特点。 -开放环境中的检测人脸,尤其是小的、模糊的和部分遮挡的人脸也是一个具有挑战的任务。我们也介绍了如何基于 `WIDER FACE `_ 数据训练百度自研的人脸检测PyramidBox模型,该算法于2018年3月份在WIDER FACE的多项评测中均获得 `第一名 `_。 +开放环境中的检测人脸,尤其是小的、模糊的和部分遮挡的人脸也是一个具有挑战的任务。我们也介绍了如何基于 `WIDER FACE `_ 数据训练百度自研的人脸检测PyramidBox模型,该算法于2018年3月份在WIDER FACE的多项评测中均获得 `第一名 `_ 。 + +RCNN系列模型是典型的两阶段目标检测器,相较于传统提取区域的方法,RCNN中RPN网络通过共享卷积层参数大幅提高提取区域的效率,并提出高质量的候选区域。其中典型模型包括Faster RCNN和Mask RCNN。 - `Single Shot MultiBox Detector `__ - `Face Detector: PyramidBox `_ +- `RCNN `_ 图像语义分割 ------------ diff --git a/fluid/README.md b/fluid/README.md index 9bbcb9623695319d34d0b986ad85d9029ef0b0a5..bb76fea7cf2a21064ba669c50a8cbda1ce1a109b 100644 --- a/fluid/README.md +++ b/fluid/README.md @@ -28,11 +28,14 @@ Fluid模型配置和参数文件的工具。 开放环境中的检测人脸,尤其是小的、模糊的和部分遮挡的人脸也是一个具有挑战的任务。我们也介绍了如何基于 [WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace) 数据训练百度自研的人脸检测PyramidBox模型,该算法于2018年3月份在WIDER FACE的多项评测中均获得 [第一名](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html)。 -Faster RCNN 是典型的两阶段目标检测器,相较于传统提取区域的方法,Faster RCNN中RPN网络通过共享卷积层参数大幅提高提取区域的效率,并提出高质量的候选区域。 +Faster RCNN模型是典型的两阶段目标检测器,相较于传统提取区域的方法,通过RPN网络共享卷积层参数大幅提高提取区域的效率,并提出高质量的候选区域。 + +Mask RCNN模型是基于Faster RCNN模型的经典实例分割模型,在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。 - [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md) - [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md) -- [Faster 
RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/faster_rcnn/README_cn.md)
+- [Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)
+- [Mask RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md)

图像语义分割
------------
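
---

A closing editorial aside: the `AttrDict` helper added in `fluid/PaddleCV/video/utils.py` above is easiest to understand in use. Below is a minimal, self-contained sketch; the class body is copied verbatim from the diff, while the config names and values are illustrative assumptions.

```python
class AttrDict(dict):
    """Dict whose items can also be read and written as attributes
    (verbatim copy of the helper added in fluid/PaddleCV/video/utils.py)."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so genuine
        # instance attributes still take precedence over dict items.
        return self[key]

    def __setattr__(self, key, value):
        if key in self.__dict__:
            self.__dict__[key] = value
        else:
            # New names become dict items, keeping attribute access and
            # item access as two views of the same data.
            self[key] = value


# Illustrative values only (assumed, not taken from the diff):
cfg = AttrDict(batch_size=128)
cfg.learning_rate = 0.01        # stored as cfg["learning_rate"]
print(cfg.batch_size)           # 128  -- attribute access reads the dict item
print(cfg["learning_rate"])     # 0.01 -- item access sees the same value
```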