Running the sample programs in this directory requires PaddlePaddle v0.10.0. If your installed version of PaddlePaddle is older, please update it following the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Factorization Machine for Click-Through Rate Prediction
## Introduction
This model implements the DeepFM model proposed in the following paper:
```text
@inproceedings{guo2017deepfm,
title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
pages={1725--1731},
year={2017}
}
```
The DeepFM model combines the low-order feature interactions captured by a factorization machine with the high-order interactions learned by a deep neural network. For details on factorization machines, see the paper [Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
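For reference, the FM component scores an input vector $x$ as (following Rendle's formulation):

$$\hat{y}_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$

where the linear weights $w$ capture first-order feature importance and the latent vectors $v_i$ (whose dimension is the `factor_size` used below) capture the pairwise, second-order interactions.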
## Dataset
This example uses the Criteo dataset from the Kaggle [Display Advertising Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/).
Each row describes one ad impression: the first column is a label indicating whether the impression was clicked, followed by 39 features, of which 13 take integer values and the other 26 are categorical. The test set has no labels.
Download the dataset:
```bash
cd data && ./download.sh && cd ..
```
## Model
The DeepFM model consists of a factorization machine (FM) and a deep neural network (DNN). Every input feature is fed to both the FM and the DNN, and their outputs are combined to form the final prediction. The embedding layer that the DNN builds from the sparse features shares its parameters with the latent vectors (factors) of the FM layer.
The factorization machine layer in PaddlePaddle computes the second-order feature interactions. The following example combines the factorization machine layer with a fully connected layer to form a complete factorization machine:
```python
def fm_layer(input, factor_size):
first_order = paddle.layer.fc(input=input, size=1, act=paddle.activation.Linear())
second_order = paddle.layer.factorization_machine(input=input, factor_size=factor_size)
fm = paddle.layer.addto(input=[first_order, second_order],
act=paddle.activation.Linear(),
bias_attr=False)
return fm
```
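A hypothetical way to wire this layer up; the input name and the 13-dimensional dense vector below are illustrative assumptions, not taken from this example's `train.py`:
```python
# Hypothetical usage sketch; 'dense_input' and its width are assumptions.
dense_input = paddle.layer.data(
    name='dense_input', type=paddle.data_type.dense_vector(13))
fm = fm_layer(dense_input, factor_size=10)
```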
## Data preparation
Preprocess the raw dataset: integer features are scaled into [0, 1] with min-max normalization, and categorical features are one-hot encoded. The raw dataset is split into two parts: 90% for training and the remaining 10% for validation during training.
```bash
python preprocess.py --datadir ./data/raw --outdir ./data
```
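As a rough sketch of the two transformations described above (illustrative helpers only; `preprocess.py` may organize this differently):
```python
# Illustrative sketch of the preprocessing, not the actual preprocess.py.
def min_max_normalize(value, col_min, col_max):
    # scale an integer feature into [0, 1]
    if col_max == col_min:
        return 0.0
    return (value - col_min) / float(col_max - col_min)


def one_hot(index, size):
    # encode a categorical feature id as a one-hot vector
    vec = [0.0] * size
    vec[index] = 1.0
    return vec
```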
## Training
The command-line options for training can be listed with `python train.py -h`.
Train the model:
```bash
python train.py \
--train_data_path data/train.txt \
--test_data_path data/valid.txt \
2>&1 | tee train.log
```
After training for 40,000 batches in the 9th pass, the test AUC reaches 0.807178 and the cost is 0.445196.
## Inference
The command-line options for inference can be listed with `python infer.py -h`.
Run inference on the test set:
```bash
python infer.py \
--model_gz_path models/model-pass-9-batch-10000.tar.gz \
--data_path data/test.txt \
--prediction_output_path ./predict.txt
```
The code sample in this directory requires the latest develop branch of PaddlePaddle. If you are on an earlier version, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
### TODO
## Deep Automatic Speech Recognition
This project is still under active development.
### Introduction
TBD
### Installation
#### Kaldi
The decoder depends on [kaldi](https://github.com/kaldi-asr/kaldi); install it by following its instructions. Then
```shell
export KALDI_ROOT=<absolute path to kaldi>
```
#### Decoder
```shell
git clone https://github.com/PaddlePaddle/models.git
cd models/fluid/DeepASR/decoder
sh setup.sh
```
### Data preprocessing
TBD
### Training
TBD
### Inference & Decoding
TBD
### Question and Contribution
TBD
@@ -8,6 +8,7 @@ import numpy as np
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
class TestTransMeanVarianceNorm(unittest.TestCase):
@@ -112,5 +113,24 @@ class TestTransSplict(unittest.TestCase):
self.assertAlmostEqual(feature[i][j * 10 + k], cur_val)
class TestTransDelay(unittest.TestCase):
"""unittest TransDelay
"""
def test_perform(self):
label = np.zeros((10, 1), dtype="int64")
for i in xrange(10):
label[i][0] = i
trans = trans_delay.TransDelay(5)
(_, label, _) = trans.perform_trans((None, label, None))
for i in xrange(5):
self.assertAlmostEqual(label[i + 5][0], i)
for i in xrange(5):
self.assertAlmostEqual(label[i][0], 0)
if __name__ == '__main__':
unittest.main()
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
class TransDelay(object):
""" Delay label, and copy first label value in the front.
Attributes:
_delay_time : the delay frame num of label
"""
def __init__(self, delay_time):
"""init construction
Args:
delay_time : the delay frame num of label
"""
self._delay_time = delay_time
def perform_trans(self, sample):
"""
Args:
sample(object): input sample, containing the feature array, the label array, and the sample name list
Returns:
(feature, label, name)
"""
(feature, label, name) = sample
shape = label.shape
assert len(shape) == 2
label[self._delay_time:shape[0]] = label[0:shape[0] - self._delay_time]
for i in xrange(self._delay_time):
label[i][0] = label[self._delay_time][0]
return (feature, label, name)
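A quick sanity check of the transformation above, mirroring the unit test earlier in this diff:
```python
import numpy as np
import data_utils.augmentor.trans_delay as trans_delay

label = np.arange(10, dtype="int64").reshape(10, 1)  # labels 0..9
trans = trans_delay.TransDelay(5)
(_, label, _) = trans.perform_trans((None, label, None))
# Frames 5..9 now hold the original labels 0..4, and frames 0..4 are
# back-filled with the original first label (0).
print(label.flatten())  # [0 0 0 0 0 0 1 2 3 4]
```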
data_dir=~/.cache/paddle/dataset/speech/deep_asr_data/aishell
data_url='http://deep-asr-data.gz.bcebos.com/aishell_data.tar.gz'
lst_url='http://deep-asr-data.gz.bcebos.com/aishell_lst.tar.gz'
-md5=e017d858d9e509c8a84b73f673f08b9a
md5=17669b8d63331c9326f4a9393d289bfb
if [ ! -e $data_dir ]; then
mkdir -p $data_dir
......
-export CUDA_VISIBLE_DEVICES=2,3,4,5
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u ../../tools/profile.py --feature_lst data/train_feature.lst \
--label_lst data/train_label.lst \
--mean_var data/aishell/global_mean_var \
--parallel \
-                --frame_dim 2640 \
-                --class_num 101 \
--frame_dim 80 \
--class_num 3040 \
-export CUDA_VISIBLE_DEVICES=2,3,4,5
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u ../../train.py --train_feature_lst data/train_feature.lst \
--train_label_lst data/train_label.lst \
--val_feature_lst data/val_feature.lst \
--val_label_lst data/val_label.lst \
--mean_var data/aishell/global_mean_var \
--checkpoints checkpoints \
-                --frame_dim 2640 \
-                --class_num 101 \
--frame_dim 80 \
--class_num 3040 \
--infer_models '' \
-                --batch_size 128 \
-                --learning_rate 0.00016 \
--batch_size 64 \
--learning_rate 6.4e-5 \
--parallel
@@ -12,6 +12,7 @@ import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
import data_utils.async_data_reader as reader
from decoder.post_decode_faster import Decoder
from data_utils.util import lodtensor_to_ndarray
@@ -36,7 +37,7 @@ def parse_args():
parser.add_argument(
'--frame_dim',
type=int,
-        default=120 * 11,
default=80,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
@@ -179,7 +180,7 @@ def infer_from_ckpt(args):
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
-        trans_splice.TransSplice()
trans_splice.TransSplice(), trans_delay.TransDelay(5)
]
feature_t = fluid.LoDTensor()
......
@@ -32,25 +32,23 @@ def stacked_lstmp_model(frame_dim,
# network configuration
def _net_conf(feature, label):
-        seq_conv1 = fluid.layers.sequence_conv(
conv1 = fluid.layers.conv2d(
input=feature,
-            num_filters=1024,
num_filters=32,
filter_size=3,
-            filter_stride=1,
-            bias_attr=True)
-        bn1 = fluid.layers.batch_norm(
-            input=seq_conv1,
-            act="sigmoid",
-            is_test=not is_train,
-            momentum=0.9,
-            epsilon=1e-05,
-            data_layout='NCHW')
stride=1,
padding=1,
bias_attr=True,
act="relu")
-        stack_input = bn1
pool1 = fluid.layers.pool2d(
conv1, pool_size=3, pool_type="max", pool_stride=2, pool_padding=0)
stack_input = pool1
for i in range(stacked_num):
fc = fluid.layers.fc(input=stack_input,
size=hidden_dim * 4,
bias_attr=True)
bias_attr=None)
proj, cell = fluid.layers.dynamic_lstmp(
input=fc,
size=hidden_dim * 4,
@@ -62,7 +60,6 @@
proj_activation="tanh")
bn = fluid.layers.batch_norm(
input=proj,
act="sigmoid",
is_test=not is_train,
momentum=0.9,
epsilon=1e-05,
@@ -80,7 +77,10 @@
# data feeder
feature = fluid.layers.data(
name="feature", shape=[-1, frame_dim], dtype="float32", lod_level=1)
name="feature",
shape=[-1, 3, 11, frame_dim],
dtype="float32",
lod_level=1)
label = fluid.layers.data(
name="label", shape=[-1, 1], dtype="int64", lod_level=1)
......
@@ -13,6 +13,7 @@ import _init_paths
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
import data_utils.async_data_reader as reader
from model_utils.model import stacked_lstmp_model
from data_utils.util import lodtensor_to_ndarray
@@ -87,7 +88,7 @@ def parse_args():
parser.add_argument(
'--max_batch_num',
type=int,
-        default=10,
default=11,
help='Maximum number of batches for profiling. (default: %(default)d)')
parser.add_argument(
'--first_batches_to_skip',
@@ -146,10 +147,10 @@ def profile(args):
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
-        trans_splice.TransSplice()
trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5)
]
-    data_reader = reader.AsyncDataReader(args.feature_lst, args.label_lst)
data_reader = reader.AsyncDataReader(args.feature_lst, args.label_lst, -1)
data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
@@ -169,6 +170,8 @@
frames_seen = 0
# load_data
(features, labels, lod, _) = batch_data
features = np.reshape(features, (-1, 11, 3, args.frame_dim))
features = np.transpose(features, (0, 2, 1, 3))
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
......
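A small shape check for the reshape/transpose added above; reading the axes as 3 delta streams by 11 spliced frames is an assumption based on `TransAddDelta(2, 2)` and `TransSplice(5, 5)`:
```python
# Shape bookkeeping only; the axis semantics are an assumption.
import numpy as np

frame_dim = 80
features = np.zeros((128, 11 * 3 * frame_dim), dtype='float32')
features = np.reshape(features, (-1, 11, 3, frame_dim))  # (N, 11, 3, 80)
features = np.transpose(features, (0, 2, 1, 3))          # (N, 3, 11, 80), NCHW
```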
@@ -12,6 +12,7 @@ import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
import data_utils.async_data_reader as reader
from data_utils.util import lodtensor_to_ndarray
from model_utils.model import stacked_lstmp_model
@@ -33,7 +34,7 @@ def parse_args():
parser.add_argument(
'--frame_dim',
type=int,
-        default=120 * 11,
default=80,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
@@ -53,7 +54,7 @@ def parse_args():
parser.add_argument(
'--class_num',
type=int,
-        default=1749,
default=3040,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--pass_num',
@@ -157,6 +158,7 @@ def train(args):
# program for test
test_program = fluid.default_main_program().clone()
#optimizer = fluid.optimizer.Momentum(learning_rate=args.learning_rate, momentum=0.9)
optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
optimizer.minimize(avg_cost)
@@ -171,7 +173,7 @@ def train(args):
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
-        trans_splice.TransSplice()
trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5)
]
feature_t = fluid.LoDTensor()
@@ -193,6 +195,8 @@
args.minimum_batch_size)):
# load_data
(features, labels, lod, _) = batch_data
features = np.reshape(features, (-1, 11, 3, args.frame_dim))
features = np.transpose(features, (0, 2, 1, 3))
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
@@ -220,6 +224,8 @@
args.minimum_batch_size)):
# load_data
(features, labels, lod, name_lst) = batch_data
features = np.reshape(features, (-1, 11, 3, args.frame_dim))
features = np.transpose(features, (0, 2, 1, 3))
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
......
#-*- coding: utf-8 -*-
#File: DQN.py
from agent import Model
import gym
import argparse
from tqdm import tqdm
from expreplay import ReplayMemory, Experience
import numpy as np
import os
UPDATE_FREQ = 4
MEMORY_WARMUP_SIZE = 1000
def run_episode(agent, env, exp, train_or_test):
assert train_or_test in ['train', 'test'], train_or_test
total_reward = 0
state = env.reset()
for step in range(200):
action = agent.act(state, train_or_test)
next_state, reward, isOver, _ = env.step(action)
if train_or_test == 'train':
exp.append(Experience(state, action, reward, isOver))
# train model
# start training
if len(exp) > MEMORY_WARMUP_SIZE:
batch_idx = np.random.randint(
len(exp) - 1, size=(args.batch_size))
if step % UPDATE_FREQ == 0:
batch_state, batch_action, batch_reward, \
batch_next_state, batch_isOver = exp.sample(batch_idx)
agent.train(batch_state, batch_action, batch_reward, \
batch_next_state, batch_isOver)
total_reward += reward
state = next_state
if isOver:
break
return total_reward
def train_agent():
env = gym.make(args.env)
state_shape = env.observation_space.shape
exp = ReplayMemory(args.mem_size, state_shape)
action_dim = env.action_space.n
    agent = Model(state_shape[0], action_dim, gamma=args.gamma)  # honor the --gamma flag
while len(exp) < MEMORY_WARMUP_SIZE:
run_episode(agent, env, exp, train_or_test='train')
max_episode = 4000
# train
total_episode = 0
pbar = tqdm(total=max_episode)
recent_100_reward = []
for episode in xrange(max_episode):
# start epoch
total_reward = run_episode(agent, env, exp, train_or_test='train')
pbar.set_description('[train]exploration:{}'.format(agent.exploration))
pbar.update()
# recent 100 reward
total_reward = run_episode(agent, env, exp, train_or_test='test')
recent_100_reward.append(total_reward)
if len(recent_100_reward) > 100:
recent_100_reward = recent_100_reward[1:]
pbar.write("episode:{} test_reward:{}".format(\
episode, np.mean(recent_100_reward)))
pbar.close()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--env', type=str, default='MountainCar-v0', \
help='environment to train DQN model, e.g. CartPole-v0')
parser.add_argument('--gamma', type=float, default=0.99, \
help='discount factor for accumulated reward computation')
parser.add_argument('--mem_size', type=int, default=500000, \
help='memory size for experience replay')
parser.add_argument('--batch_size', type=int, default=192, \
help='batch size for training')
args = parser.parse_args()
train_agent()
#-*- coding: utf-8 -*-
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
import math
from tqdm import tqdm
from utils import fluid_flatten
class DQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
out = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_fc1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)
target_vars = filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program,
feed={
'state': state.astype('float32'),
'action': action.astype('int32'),
'reward': reward,
'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self):
self.exe.run(self._sync_program)
#-*- coding: utf-8 -*-
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_argmax, fluid_flatten
class DoubleDQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
        next_s_predict_value = self.get_DQN_prediction(next_s)
        greedy_action = fluid_argmax(next_s_predict_value)
predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim)
best_v = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value),
dim=1)
best_v.stop_gradient = True
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
out = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_fc1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)
target_vars = filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program,
feed={
'state': state.astype('float32'),
'action': action.astype('int32'),
'reward': reward,
'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self):
self.exe.run(self._sync_program)
#-*- coding: utf-8 -*-
#File: agent.py
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_flatten
-UPDATE_TARGET_STEPS = 200
-class Model(object):
-    def __init__(self, state_dim, action_dim, gamma):
-        self.global_step = 0
-        self.state_dim = state_dim
class DuelingDQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
-        self.exploration = 1.0
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
-        return [fluid.layers.data(\
-            name='state', shape=[self.state_dim], dtype='float32'),
-            fluid.layers.data(\
-            name='action', shape=[1], dtype='int32'),
-            fluid.layers.data(\
-            name='reward', shape=[], dtype='float32'),
-            fluid.layers.data(\
-            name='next_s', shape=[self.state_dim], dtype='float32'),
-            fluid.layers.data(\
-            name='isOver', shape=[], dtype='bool')]
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
-        pred_action_value = fluid.layers.reduce_sum(\
-            fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
-        target = reward + (1.0 - fluid.layers.cast(\
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
-        cost = fluid.layers.square_error_cost(\
-            input=pred_action_value, label=target)
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
-        optimizer = fluid.optimizer.Adam(1e-3)
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
-        place = fluid.CUDAPlace(0)
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
-    def get_DQN_prediction(self, state, target=False):
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
-        # layer fc1
-        param_attr = ParamAttr(name='{}_fc1'.format(variable_field))
-        bias_attr = ParamAttr(name='{}_fc1_b'.format(variable_field))
-        fc1 = fluid.layers.fc(input=state,
-                              size=256,
-                              act='relu',
-                              param_attr=param_attr,
-                              bias_attr=bias_attr)
-        param_attr = ParamAttr(name='{}_fc2'.format(variable_field))
-        bias_attr = ParamAttr(name='{}_fc2_b'.format(variable_field))
-        fc2 = fluid.layers.fc(input=fc1,
-                              size=128,
-                              act='tanh',
-                              param_attr=param_attr,
-                              bias_attr=bias_attr)
-        param_attr = ParamAttr(name='{}_fc3'.format(variable_field))
-        bias_attr = ParamAttr(name='{}_fc3_b'.format(variable_field))
-        value = fluid.layers.fc(input=fc2,
-                                size=self.action_dim,
-                                param_attr=param_attr,
-                                bias_attr=bias_attr)
-        return value
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
value = fluid.layers.fc(
input=flatten,
size=1,
param_attr=ParamAttr(name='{}_value_fc'.format(variable_field)),
bias_attr=ParamAttr(name='{}_value_fc_b'.format(variable_field)))
advantage = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_advantage_fc'.format(variable_field)),
bias_attr=ParamAttr(
name='{}_advantage_fc_b'.format(variable_field)))
Q = advantage + (value - fluid.layers.reduce_mean(
advantage, dim=1, keep_dim=True))
return Q
def _build_sync_target_network(self):
-        vars = fluid.default_main_program().list_vars()
-        policy_vars = []
-        target_vars = []
-        for var in vars:
-            if 'GRAD' in var.name: continue
-            if 'policy' in var.name:
-                policy_vars.append(var)
-            elif 'target' in var.name:
-                target_vars.append(var)
-        policy_vars.sort(key=lambda x: x.name.split('policy_')[1])
-        target_vars.sort(key=lambda x: x.name.split('target_')[1])
vars = list(fluid.default_main_program().list_vars())
policy_vars = filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)
target_vars = filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
@@ -122,26 +166,30 @@ class Model(object):
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
-            state = np.expand_dims(state, axis=0)
-            pred_Q = self.exe.run(self.predict_program,
-                                  feed={'state': state.astype('float32')},
-                                  fetch_list=[self.pred_value])[0]
-            pred_Q = np.squeeze(pred_Q, axis=0)
-            act = np.argmax(pred_Q)
-        self.exploration = max(0.1, self.exploration - 1e-6)
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
-        if self.global_step % UPDATE_TARGET_STEPS == 0:
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program, \
-                     feed={'state': state, \
-                           'action': action, \
feed={'state': state.astype('float32'), \
'action': action.astype('int32'), \
'reward': reward, \
-                           'next_s': next_state, \
'next_s': next_state.astype('float32'), \
'isOver': isOver})
def sync_target_network(self):
......
<img src="mountain_car.gif" width="300" height="200">
# Reproduce DQN, DoubleDQN, DuelingDQN model with fluid version of PaddlePaddle
-# Reproduce DQN model
+ DQN in:
[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+ DoubleDQN in:
[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+ DuelingDQN in:
[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
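For reference, the update targets the three variants optimize, matching the loss construction in the corresponding `*_agent.py` files:

$$y^{\text{DQN}} = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$$

$$y^{\text{DoubleDQN}} = r + \gamma \, Q_{\text{target}}\big(s', \arg\max_{a'} Q(s', a')\big)$$

$$Q^{\text{Dueling}}(s, a) = V(s) + A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a')$$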
-# Mountain-CAR benchmark & performance
-[MountainCar-v0](https://gym.openai.com/envs/MountainCar-v0/)
# Atari benchmark & performance
## [Atari games introduction](https://gym.openai.com/envs/#atari)
-A car is on a one-dimensional track, positioned between two "mountains". The goal is to drive up the mountain on the right; however, the car's engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum.
+ Pong game result
![DQN result](assets/dqn.png)
# How to use
+ Dependencies:
+ python2.7
+ gym
+ tqdm
+ paddlepaddle-gpu==0.12.0
+ Start Training:
```
# To train a model for Pong game with gpu (use DQN model as default)
python train.py --rom ./rom_files/pong.bin --use_cuda
<img src="curve.png" >
# To train a model for Pong with DoubleDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN
# To train a model for Pong with DuelingDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN
```
To train on more games, you can install more ROM files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms)
-# How to use
-+ Dependencies:
-    + python2.7
-    + gym
-    + tqdm
-    + paddle-fluid
-+ Start Training:
-```
-# use mountain-car enviroment as default
-python DQN.py
+ Start Testing:
```
# Play the game with saved model and calculate the average rewards
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong/stepXXXXX
-# use other enviorment
-python DQN.py --env CartPole-v0
-```
# Play the game with visualization
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong/stepXXXXX --viz 0.01
```
# -*- coding: utf-8 -*-
import numpy as np
import os
import cv2
import threading
import gym
from gym import spaces
from gym.envs.atari.atari_env import ACTION_MEANING
from ale_python_interface import ALEInterface
__all__ = ['AtariPlayer']
ROM_URL = "https://github.com/openai/atari-py/tree/master/atari_py/atari_roms"
_ALE_LOCK = threading.Lock()
"""
The following AtariPlayer are copied or modified from tensorpack/tensorpack:
https://github.com/tensorpack/tensorpack/blob/master/examples/DeepQNetwork/atari.py
"""
class AtariPlayer(gym.Env):
"""
A wrapper for ALE emulator, with configurations to mimic DeepMind DQN settings.
Info:
score: the accumulated reward in the current game
gameOver: True when the current game is Over
"""
def __init__(self,
rom_file,
viz=0,
frame_skip=4,
nullop_start=30,
live_lost_as_eoe=True,
max_num_frames=0):
"""
Args:
rom_file: path to the rom
frame_skip: skip every k frames and repeat the action
viz: visualization to be done.
Set to 0 to disable.
Set to a positive number to be the delay between frames to show.
Set to a string to be a directory to store frames.
nullop_start: start with random number of null ops.
live_lost_as_eoe: consider a loss of lives as the end of an episode. Useful for training.
max_num_frames: maximum number of frames per episode.
"""
super(AtariPlayer, self).__init__()
assert os.path.isfile(rom_file), \
"rom {} not found. Please download at {}".format(rom_file, ROM_URL)
try:
ALEInterface.setLoggerMode(ALEInterface.Logger.Error)
except AttributeError:
print "You're not using latest ALE"
# avoid simulator bugs: https://github.com/mgbellemare/Arcade-Learning-Environment/issues/86
with _ALE_LOCK:
self.ale = ALEInterface()
self.ale.setInt(b"random_seed", np.random.randint(0, 30000))
self.ale.setInt(b"max_num_frames_per_episode", max_num_frames)
self.ale.setBool(b"showinfo", False)
self.ale.setInt(b"frame_skip", 1)
self.ale.setBool(b'color_averaging', False)
# manual.pdf suggests otherwise.
self.ale.setFloat(b'repeat_action_probability', 0.0)
# viz setup
if isinstance(viz, str):
assert os.path.isdir(viz), viz
self.ale.setString(b'record_screen_dir', viz)
viz = 0
if isinstance(viz, int):
viz = float(viz)
self.viz = viz
if self.viz and isinstance(self.viz, float):
self.windowname = os.path.basename(rom_file)
cv2.startWindowThread()
cv2.namedWindow(self.windowname)
self.ale.loadROM(rom_file.encode('utf-8'))
self.width, self.height = self.ale.getScreenDims()
self.actions = self.ale.getMinimalActionSet()
self.live_lost_as_eoe = live_lost_as_eoe
self.frame_skip = frame_skip
self.nullop_start = nullop_start
self.action_space = spaces.Discrete(len(self.actions))
self.observation_space = spaces.Box(low=0,
high=255,
shape=(self.height, self.width),
dtype=np.uint8)
self._restart_episode()
def get_action_meanings(self):
return [ACTION_MEANING[i] for i in self.actions]
def _grab_raw_image(self):
"""
:returns: the current 3-channel image
"""
m = self.ale.getScreenRGB()
return m.reshape((self.height, self.width, 3))
def _current_state(self):
"""
returns: a gray-scale (h, w) uint8 image
"""
ret = self._grab_raw_image()
# avoid missing frame issue: max-pooled over the last screen
ret = np.maximum(ret, self.last_raw_screen)
if self.viz:
if isinstance(self.viz, float):
cv2.imshow(self.windowname, ret)
cv2.waitKey(int(self.viz * 1000))
ret = ret.astype('float32')
# 0.299, 0.587, 0.114. same as rgb2y in torch/image
ret = cv2.cvtColor(ret, cv2.COLOR_RGB2GRAY)
return ret.astype('uint8') # to save some memory
def _restart_episode(self):
with _ALE_LOCK:
self.ale.reset_game()
# random null-ops start
n = np.random.randint(self.nullop_start)
self.last_raw_screen = self._grab_raw_image()
for k in range(n):
if k == n - 1:
self.last_raw_screen = self._grab_raw_image()
self.ale.act(0)
def reset(self):
if self.ale.game_over():
self._restart_episode()
return self._current_state()
def step(self, act):
oldlives = self.ale.lives()
r = 0
for k in range(self.frame_skip):
if k == self.frame_skip - 1:
self.last_raw_screen = self._grab_raw_image()
r += self.ale.act(self.actions[act])
newlives = self.ale.lives()
if self.ale.game_over() or \
(self.live_lost_as_eoe and newlives < oldlives):
break
isOver = self.ale.game_over()
if self.live_lost_as_eoe:
isOver = isOver or newlives < oldlives
info = {'ale.lives': newlives}
return self._current_state(), r, isOver, info
# -*- coding: utf-8 -*-
import numpy as np
from collections import deque
import gym
from gym import spaces
_v0, _v1 = gym.__version__.split('.')[:2]
assert int(_v0) > 0 or int(_v1) >= 10, gym.__version__
"""
The following wrappers are copied or modified from openai/baselines:
https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py
"""
class MapState(gym.ObservationWrapper):
def __init__(self, env, map_func):
gym.ObservationWrapper.__init__(self, env)
self._func = map_func
def observation(self, obs):
return self._func(obs)
class FrameStack(gym.Wrapper):
def __init__(self, env, k):
"""Buffer observations and stack across channels (last axis)."""
gym.Wrapper.__init__(self, env)
self.k = k
self.frames = deque([], maxlen=k)
shp = env.observation_space.shape
chan = 1 if len(shp) == 2 else shp[2]
self.observation_space = spaces.Box(low=0,
high=255,
shape=(shp[0], shp[1], chan * k),
dtype=np.uint8)
def reset(self):
"""Clear buffer and re-fill by duplicating the first observation."""
ob = self.env.reset()
for _ in range(self.k - 1):
self.frames.append(np.zeros_like(ob))
self.frames.append(ob)
return self.observation()
def step(self, action):
ob, reward, done, info = self.env.step(action)
self.frames.append(ob)
return self.observation(), reward, done, info
def observation(self):
assert len(self.frames) == self.k
return np.stack(self.frames, axis=0)
class _FireResetEnv(gym.Wrapper):
def __init__(self, env):
"""Take action on reset for environments that are fixed until firing."""
gym.Wrapper.__init__(self, env)
assert env.unwrapped.get_action_meanings()[1] == 'FIRE'
assert len(env.unwrapped.get_action_meanings()) >= 3
def reset(self):
self.env.reset()
obs, _, done, _ = self.env.step(1)
if done:
self.env.reset()
obs, _, done, _ = self.env.step(2)
if done:
self.env.reset()
return obs
def step(self, action):
return self.env.step(action)
def FireResetEnv(env):
if isinstance(env, gym.Wrapper):
baseenv = env.unwrapped
else:
baseenv = env
if 'FIRE' in baseenv.get_action_meanings():
return _FireResetEnv(env)
return env
class LimitLength(gym.Wrapper):
def __init__(self, env, k):
gym.Wrapper.__init__(self, env)
self.k = k
def reset(self):
# This assumes that reset() will really reset the env.
# If the underlying env tries to be smart about reset
# (e.g. end-of-life), the assumption doesn't hold.
ob = self.env.reset()
self.cnt = 0
return ob
def step(self, action):
ob, r, done, info = self.env.step(action)
self.cnt += 1
if self.cnt == self.k:
done = True
return ob, r, done, info
#-*- coding: utf-8 -*-
#File: expreplay.py
# -*- coding: utf-8 -*-
from collections import namedtuple
import numpy as np
import copy
from collections import deque, namedtuple
Experience = namedtuple('Experience', ['state', 'action', 'reward', 'isOver'])
class ReplayMemory(object):
-    def __init__(self, max_size, state_shape):
def __init__(self, max_size, state_shape, context_len):
self.max_size = int(max_size)
self.state_shape = state_shape
self.context_len = int(context_len)
-        self.state = np.zeros((self.max_size, ) + state_shape, dtype='float32')
self.state = np.zeros((self.max_size, ) + state_shape, dtype='uint8')
self.action = np.zeros((self.max_size, ), dtype='int32')
self.reward = np.zeros((self.max_size, ), dtype='float32')
self.isOver = np.zeros((self.max_size, ), dtype='bool')
self._curr_size = 0
self._curr_pos = 0
self._context = deque(maxlen=context_len - 1)
def append(self, exp):
"""append a new experience into replay memory
"""
if self._curr_size < self.max_size:
self._assign(self._curr_pos, exp)
self._curr_size += 1
else:
self._assign(self._curr_pos, exp)
self._curr_pos = (self._curr_pos + 1) % self.max_size
if exp.isOver:
self._context.clear()
else:
self._context.append(exp)
def recent_state(self):
""" maintain recent state for training"""
lst = list(self._context)
states = [np.zeros(self.state_shape, dtype='uint8')] * \
(self._context.maxlen - len(lst))
states.extend([k.state for k in lst])
return states
def sample(self, idx):
""" return state, action, reward, isOver,
note that some frames in state may be generated from last episode,
they should be removed from state
"""
state = np.zeros(
(self.context_len + 1, ) + self.state_shape, dtype=np.uint8)
state_idx = np.arange(idx, idx + self.context_len + 1) % self._curr_size
# confirm that no frame was generated from last episode
has_last_episode = False
for k in range(self.context_len - 2, -1, -1):
to_check_idx = state_idx[k]
if self.isOver[to_check_idx]:
has_last_episode = True
state_idx = state_idx[k + 1:]
state[k + 1:] = self.state[state_idx]
break
if not has_last_episode:
state = self.state[state_idx]
real_idx = (idx + self.context_len - 1) % self._curr_size
action = self.action[real_idx]
reward = self.reward[real_idx]
isOver = self.isOver[real_idx]
return state, reward, action, isOver
def __len__(self):
return self._curr_size
def _assign(self, pos, exp):
self.state[pos] = exp.state
self.action[pos] = exp.action
self.reward[pos] = exp.reward
-        self.action[pos] = exp.action
self.isOver[pos] = exp.isOver
-    def __len__(self):
-        return self._curr_size
-    def sample(self, batch_idx):
-        # index mapping to avoid sampling lastest state
def sample_batch(self, batch_size):
"""sample a batch from replay memory for training
"""
batch_idx = np.random.randint(
self._curr_size - self.context_len - 1, size=batch_size)
batch_idx = (self._curr_pos + batch_idx) % self._curr_size
-        next_idx = (batch_idx + 1) % self._curr_size
-        state = self.state[batch_idx]
-        reward = self.reward[batch_idx]
-        action = self.action[batch_idx]
-        next_state = self.state[next_idx]
-        isOver = self.isOver[batch_idx]
-        return (state, action, reward, next_state, isOver)
batch_exp = [self.sample(i) for i in batch_idx]
return self._process_batch(batch_exp)
def _process_batch(self, batch_exp):
state = np.asarray([e[0] for e in batch_exp], dtype='uint8')
reward = np.asarray([e[1] for e in batch_exp], dtype='float32')
action = np.asarray([e[2] for e in batch_exp], dtype='int8')
isOver = np.asarray([e[3] for e in batch_exp], dtype='bool')
return [state, action, reward, isOver]
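A minimal usage sketch of this replay buffer (toy episode boundaries; real training uses `IMAGE_SIZE` and `CONTEXT_LEN` from train.py):
```python
import numpy as np
from expreplay import ReplayMemory, Experience

mem = ReplayMemory(max_size=1000, state_shape=(84, 84), context_len=4)
for t in range(100):
    frame = np.random.randint(0, 256, size=(84, 84)).astype('uint8')
    mem.append(Experience(frame, 0, 0.0, t % 50 == 49))  # episodes end at t=49, 99

context = mem.recent_state()  # list of context_len - 1 frames, zero-padded
state, action, reward, isOver = mem.sample_batch(32)
print(state.shape)  # (32, 5, 84, 84): context_len + 1 stacked frames
```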
#-*- coding: utf-8 -*-
import argparse
import os
import numpy as np
import paddle.fluid as fluid
from train import get_player
from tqdm import tqdm
def predict_action(exe, state, predict_program, feed_names, fetch_targets,
action_dim):
if np.random.randint(100) == 0:
act = np.random.randint(action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = exe.run(predict_program,
feed={feed_names[0]: state.astype('float32')},
fetch_list=fetch_targets)[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
return act
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--use_cuda', action='store_true', help='if set, use cuda')
parser.add_argument('--rom', type=str, required=True, help='atari rom')
parser.add_argument(
'--model_path', type=str, required=True, help='dirname to load model')
parser.add_argument(
'--viz',
type=float,
default=0,
help='''viz: visualization setting:
Set to 0 to disable;
Set to a positive number to be the delay between frames to show.
''')
args = parser.parse_args()
env = get_player(args.rom, viz=args.viz)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
with fluid.scope_guard(inference_scope):
[predict_program, feed_names,
fetch_targets] = fluid.io.load_inference_model(args.model_path, exe)
episode_reward = []
for _ in tqdm(xrange(30), desc='eval agent'):
state = env.reset()
total_reward = 0
while True:
action = predict_action(exe, state, predict_program, feed_names,
fetch_targets, env.action_space.n)
state, reward, isOver, info = env.step(action)
total_reward += reward
if isOver:
break
episode_reward.append(total_reward)
eval_reward = np.mean(episode_reward)
print('Average reward of 30 episodes: {}'.format(eval_reward))
#-*- coding: utf-8 -*-
from DQN_agent import DQNModel
from DoubleDQN_agent import DoubleDQNModel
from DuelingDQN_agent import DuelingDQNModel
from atari import AtariPlayer
import paddle.fluid as fluid
import gym
import argparse
import cv2
from tqdm import tqdm
from expreplay import ReplayMemory, Experience
import numpy as np
import os
from datetime import datetime
from atari_wrapper import FrameStack, MapState, FireResetEnv, LimitLength
from collections import deque
-UPDATE_FREQ = 4
-#MEMORY_WARMUP_SIZE = 2000
MEMORY_SIZE = 1e6
MEMORY_WARMUP_SIZE = MEMORY_SIZE // 20
IMAGE_SIZE = (84, 84)
CONTEXT_LEN = 4
ACTION_REPEAT = 4 # aka FRAME_SKIP
UPDATE_FREQ = 4
def run_train_episode(agent, env, exp):
total_reward = 0
state = env.reset()
step = 0
while True:
step += 1
context = exp.recent_state()
context.append(state)
context = np.stack(context, axis=0)
action = agent.act(context, train_or_test='train')
next_state, reward, isOver, _ = env.step(action)
exp.append(Experience(state, action, reward, isOver))
# train model
# start training
if len(exp) > MEMORY_WARMUP_SIZE:
if step % UPDATE_FREQ == 0:
batch_all_state, batch_action, batch_reward, batch_isOver = exp.sample_batch(
args.batch_size)
batch_state = batch_all_state[:, :CONTEXT_LEN, :, :]
batch_next_state = batch_all_state[:, 1:, :, :]
agent.train(batch_state, batch_action, batch_reward,
batch_next_state, batch_isOver)
total_reward += reward
state = next_state
if isOver:
break
return total_reward, step
def get_player(rom, viz=False, train=False):
env = AtariPlayer(
rom,
frame_skip=ACTION_REPEAT,
viz=viz,
live_lost_as_eoe=train,
max_num_frames=60000)
env = FireResetEnv(env)
env = MapState(env, lambda im: cv2.resize(im, IMAGE_SIZE))
if not train:
# in training, context is taken care of in expreplay buffer
env = FrameStack(env, CONTEXT_LEN)
return env
def eval_agent(agent, env):
episode_reward = []
for _ in tqdm(xrange(30), desc='eval agent'):
state = env.reset()
total_reward = 0
step = 0
while True:
step += 1
action = agent.act(state, train_or_test='test')
state, reward, isOver, info = env.step(action)
total_reward += reward
if isOver:
break
episode_reward.append(total_reward)
eval_reward = np.mean(episode_reward)
return eval_reward
def train_agent():
env = get_player(args.rom, train=True)
test_env = get_player(args.rom)
exp = ReplayMemory(args.mem_size, IMAGE_SIZE, CONTEXT_LEN)
action_dim = env.action_space.n
if args.alg == 'DQN':
agent = DQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
elif args.alg == 'DoubleDQN':
agent = DoubleDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
elif args.alg == 'DuelingDQN':
agent = DuelingDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
else:
print('Input algorithm name error!')
return
with tqdm(total=MEMORY_WARMUP_SIZE) as pbar:
while len(exp) < MEMORY_WARMUP_SIZE:
total_reward, step = run_train_episode(agent, env, exp)
pbar.update(step)
# train
test_flag = 0
save_flag = 0
pbar = tqdm(total=1e8)
recent_100_reward = []
total_step = 0
while True:
# start epoch
total_reward, step = run_train_episode(agent, env, exp)
total_step += step
pbar.set_description('[train]exploration:{}'.format(agent.exploration))
pbar.update(step)
if total_step // args.test_every_steps == test_flag:
pbar.write("testing")
eval_reward = eval_agent(agent, test_env)
test_flag += 1
print("eval_agent done, (steps, eval_reward): ({}, {})".format(
total_step, eval_reward))
if total_step // args.save_every_steps == save_flag:
save_flag += 1
save_path = os.path.join(args.model_dirname, '{}-{}'.format(
args.alg, os.path.basename(args.rom).split('.')[0]),
'step{}'.format(total_step))
fluid.io.save_inference_model(save_path, ['state'],
agent.pred_value, agent.exe,
agent.predict_program)
pbar.close()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--alg',
type=str,
default='DQN',
help='Reinforcement learning algorithm, support: DQN, DoubleDQN, DuelingDQN'
)
parser.add_argument(
'--use_cuda', action='store_true', help='if set, use cuda')
parser.add_argument(
'--gamma',
type=float,
default=0.99,
help='discount factor for accumulated reward computation')
parser.add_argument(
'--mem_size',
type=int,
default=1000000,
help='memory size for experience replay')
parser.add_argument(
'--batch_size', type=int, default=64, help='batch size for training')
parser.add_argument('--rom', help='atari rom', required=True)
parser.add_argument(
'--model_dirname',
type=str,
default='saved_model',
help='dirname to save model')
parser.add_argument(
'--save_every_steps',
type=int,
default=100000,
help='every steps number to save model')
parser.add_argument(
'--test_every_steps',
type=int,
default=100000,
help='every steps number to run test')
args = parser.parse_args()
train_agent()
#-*- coding: utf-8 -*-
#File: utils.py
import paddle.fluid as fluid
import numpy as np
def fluid_argmax(x):
"""
Get index of max value for the last dimension
"""
_, max_index = fluid.layers.topk(x, k=1)
return max_index
def fluid_flatten(x):
"""
Flatten fluid variable along the first dimension
"""
return fluid.layers.reshape(x, shape=[-1, np.prod(x.shape[1:])])
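For intuition, a NumPy analogue of `fluid_flatten` (illustration only):
```python
import numpy as np

x = np.zeros((8, 64, 7, 7), dtype='float32')
flat = x.reshape(-1, int(np.prod(x.shape[1:])))
print(flat.shape)  # (8, 3136)
```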
@@ -211,13 +211,12 @@ def main(train_data_file, test_data_file, model_save_dir, num_passes):
avg_cost, feature_out, word, mention, target = ner_net(word_dict_len,
label_dict_len)
-    crf_decode = fluid.layers.crf_decoding(
-        input=feature_out, param_attr=fluid.ParamAttr(name='crfw'))
sgd_optimizer = fluid.optimizer.SGD(learning_rate=1e-3)
sgd_optimizer.minimize(avg_cost)
crf_decode = fluid.layers.crf_decoding(
input=feature_out, param_attr=fluid.ParamAttr(
name='crfw', ))
(precision, recall, f1_score, num_infer_chunks, num_label_chunks,
num_correct_chunks) = fluid.layers.chunk_eval(
input=crf_decode,
@@ -289,8 +288,8 @@
+ str(f1))
save_dirname = os.path.join(model_save_dir,
"params_pass_%d" % pass_id)
-            fluid.io.save_inference_model(
-                save_dirname, ['word', 'mention', 'target'], [crf_decode], exe)
fluid.io.save_inference_model(save_dirname, ['word', 'mention'],
[crf_decode], exe)
if __name__ == "__main__":
......
model/
pretrained/
data/
label/
*.swp
*.log
infer_results/
from PIL import Image, ImageEnhance, ImageDraw
from PIL import ImageFile
import numpy as np
import random
import math
ImageFile.LOAD_TRUNCATED_IMAGES = True  # otherwise an IOError is raised when an image file is truncated
class sampler():
def __init__(self,
max_sample,
max_trial,
min_scale,
max_scale,
min_aspect_ratio,
max_aspect_ratio,
min_jaccard_overlap,
max_jaccard_overlap,
min_object_coverage,
max_object_coverage,
use_square=False):
self.max_sample = max_sample
self.max_trial = max_trial
self.min_scale = min_scale
self.max_scale = max_scale
self.min_aspect_ratio = min_aspect_ratio
self.max_aspect_ratio = max_aspect_ratio
self.min_jaccard_overlap = min_jaccard_overlap
self.max_jaccard_overlap = max_jaccard_overlap
self.min_object_coverage = min_object_coverage
self.max_object_coverage = max_object_coverage
self.use_square = use_square
class bbox():
def __init__(self, xmin, ymin, xmax, ymax):
self.xmin = xmin
self.ymin = ymin
self.xmax = xmax
self.ymax = ymax
def intersect_bbox(bbox1, bbox2):
if bbox2.xmin > bbox1.xmax or bbox2.xmax < bbox1.xmin or \
bbox2.ymin > bbox1.ymax or bbox2.ymax < bbox1.ymin:
intersection_box = bbox(0.0, 0.0, 0.0, 0.0)
else:
intersection_box = bbox(
max(bbox1.xmin, bbox2.xmin),
max(bbox1.ymin, bbox2.ymin),
min(bbox1.xmax, bbox2.xmax), min(bbox1.ymax, bbox2.ymax))
return intersection_box
def bbox_coverage(bbox1, bbox2):
inter_box = intersect_bbox(bbox1, bbox2)
intersect_size = bbox_area(inter_box)
if intersect_size > 0:
bbox1_size = bbox_area(bbox1)
return intersect_size / bbox1_size
else:
return 0.
def bbox_area(src_bbox):
if src_bbox.xmax < src_bbox.xmin or src_bbox.ymax < src_bbox.ymin:
return 0.
else:
width = src_bbox.xmax - src_bbox.xmin
height = src_bbox.ymax - src_bbox.ymin
return width * height
def generate_sample(sampler, image_width, image_height):
scale = random.uniform(sampler.min_scale, sampler.max_scale)
aspect_ratio = random.uniform(sampler.min_aspect_ratio,
sampler.max_aspect_ratio)
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
# guarantee a squared image patch after cropping
if sampler.use_square:
if image_height < image_width:
bbox_width = bbox_height * image_height / image_width
else:
bbox_height = bbox_width * image_width / image_height
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = random.uniform(0, xmin_bound)
ymin = random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = bbox(xmin, ymin, xmax, ymax)
return sampled_bbox
def data_anchor_sampling(sampler, bbox_labels, image_width, image_height,
scale_array, resize_width, resize_height):
num_gt = len(bbox_labels)
# np.random.randint range: [low, high)
rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0
if num_gt != 0:
norm_xmin = bbox_labels[rand_idx][0]
norm_ymin = bbox_labels[rand_idx][1]
norm_xmax = bbox_labels[rand_idx][2]
norm_ymax = bbox_labels[rand_idx][3]
xmin = norm_xmin * image_width
ymin = norm_ymin * image_height
wid = image_width * (norm_xmax - norm_xmin)
hei = image_height * (norm_ymax - norm_ymin)
range_size = 0
for scale_ind in range(0, len(scale_array) - 1):
area = wid * hei
if area > scale_array[scale_ind] ** 2 and area < \
scale_array[scale_ind + 1] ** 2:
range_size = scale_ind + 1
break
scale_choose = 0.0
if range_size == 0:
rand_idx_size = range_size + 1
else:
# np.random.randint range: [low, high)
rng_rand_size = np.random.randint(0, range_size)
rand_idx_size = rng_rand_size % range_size
scale_choose = random.uniform(scale_array[rand_idx_size] / 2.0,
2.0 * scale_array[rand_idx_size])
sample_bbox_size = wid * resize_width / scale_choose
w_off_orig = 0.0
h_off_orig = 0.0
if sample_bbox_size < max(image_height, image_width):
if wid <= sample_bbox_size:
w_off_orig = random.uniform(xmin + wid - sample_bbox_size, xmin)
else:
w_off_orig = random.uniform(xmin, xmin + wid - sample_bbox_size)
if hei <= sample_bbox_size:
h_off_orig = random.uniform(ymin + hei - sample_bbox_size, ymin)
else:
h_off_orig = random.uniform(ymin, ymin + hei - sample_bbox_size)
else:
w_off_orig = random.uniform(image_width - sample_bbox_size, 0.0)
h_off_orig = random.uniform(image_height - sample_bbox_size, 0.0)
w_off_orig = math.floor(w_off_orig)
h_off_orig = math.floor(h_off_orig)
# Figure out top left coordinates.
w_off = 0.0
h_off = 0.0
w_off = float(w_off_orig / image_width)
h_off = float(h_off_orig / image_height)
sampled_bbox = bbox(w_off, h_off,
w_off + float(sample_bbox_size / image_width),
h_off + float(sample_bbox_size / image_height))
return sampled_bbox
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox.xmin >= object_bbox.xmax or \
sample_bbox.xmax <= object_bbox.xmin or \
sample_bbox.ymin >= object_bbox.ymax or \
sample_bbox.ymax <= object_bbox.ymin:
return 0
intersect_xmin = max(sample_bbox.xmin, object_bbox.xmin)
intersect_ymin = max(sample_bbox.ymin, object_bbox.ymin)
intersect_xmax = min(sample_bbox.xmax, object_bbox.xmax)
intersect_ymax = min(sample_bbox.ymax, object_bbox.ymax)
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
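# Worked example (added for illustration):
#   A = bbox(0.0, 0.0, 0.5, 0.5), B = bbox(0.25, 0.25, 0.75, 0.75)
#   intersection = 0.25 * 0.25 = 0.0625, each box has area 0.25
#   jaccard_overlap(A, B) = 0.0625 / (0.25 + 0.25 - 0.0625) ~= 0.143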
def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
if sampler.min_jaccard_overlap == 0 and sampler.max_jaccard_overlap == 0:
has_jaccard_overlap = False
else:
has_jaccard_overlap = True
if sampler.min_object_coverage == 0 and sampler.max_object_coverage == 0:
has_object_coverage = False
else:
has_object_coverage = True
if not has_jaccard_overlap and not has_object_coverage:
return True
found = False
for i in range(len(bbox_labels)):
object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if has_jaccard_overlap:
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler.min_jaccard_overlap != 0 and \
overlap < sampler.min_jaccard_overlap:
continue
if sampler.max_jaccard_overlap != 0 and \
overlap > sampler.max_jaccard_overlap:
continue
found = True
if has_object_coverage:
object_coverage = bbox_coverage(object_bbox, sample_bbox)
if sampler.min_object_coverage != 0 and \
object_coverage < sampler.min_object_coverage:
continue
if sampler.max_object_coverage != 0 and \
object_coverage > sampler.max_object_coverage:
continue
found = True
if found:
return True
return found
def generate_batch_samples(batch_sampler, bbox_labels, image_width,
image_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for i in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = generate_sample(sampler, image_width, image_height)
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def generate_batch_random_samples(batch_sampler, bbox_labels, image_width,
image_height, scale_array, resize_width,
resize_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for i in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = data_anchor_sampling(
sampler, bbox_labels, image_width, image_height, scale_array,
resize_width, resize_height)
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def clip_bbox(src_bbox):
src_bbox.xmin = max(min(src_bbox.xmin, 1.0), 0.0)
src_bbox.ymin = max(min(src_bbox.ymin, 1.0), 0.0)
src_bbox.xmax = max(min(src_bbox.xmax, 1.0), 0.0)
src_bbox.ymax = max(min(src_bbox.ymax, 1.0), 0.0)
return src_bbox
def meet_emit_constraint(src_bbox, sample_bbox):
center_x = (src_bbox.xmax + src_bbox.xmin) / 2
center_y = (src_bbox.ymax + src_bbox.ymin) / 2
if center_x >= sample_bbox.xmin and \
center_x <= sample_bbox.xmax and \
center_y >= sample_bbox.ymin and \
center_y <= sample_bbox.ymax:
return True
return False
def project_bbox(object_bbox, sample_bbox):
if object_bbox.xmin >= sample_bbox.xmax or \
object_bbox.xmax <= sample_bbox.xmin or \
object_bbox.ymin >= sample_bbox.ymax or \
object_bbox.ymax <= sample_bbox.ymin:
return False
else:
proj_bbox = bbox(0, 0, 0, 0)
sample_width = sample_bbox.xmax - sample_bbox.xmin
sample_height = sample_bbox.ymax - sample_bbox.ymin
proj_bbox.xmin = (object_bbox.xmin - sample_bbox.xmin) / sample_width
proj_bbox.ymin = (object_bbox.ymin - sample_bbox.ymin) / sample_height
proj_bbox.xmax = (object_bbox.xmax - sample_bbox.xmin) / sample_width
proj_bbox.ymax = (object_bbox.ymax - sample_bbox.ymin) / sample_height
proj_bbox = clip_bbox(proj_bbox)
if bbox_area(proj_bbox) > 0:
return proj_bbox
else:
return False
def transform_labels(bbox_labels, sample_bbox):
sample_labels = []
for i in range(len(bbox_labels)):
sample_label = []
object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if not meet_emit_constraint(object_bbox, sample_bbox):
continue
proj_bbox = project_bbox(object_bbox, sample_bbox)
if proj_bbox:
sample_label.append(bbox_labels[i][0])
sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin))
sample_label.append(float(proj_bbox.xmax))
sample_label.append(float(proj_bbox.ymax))
sample_label = sample_label + bbox_labels[i][5:]
sample_labels.append(sample_label)
return sample_labels
def crop_image(img, bbox_labels, sample_bbox, image_width, image_height):
sample_bbox = clip_bbox(sample_bbox)
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
sample_img = img[ymin:ymax, xmin:xmax]
sample_labels = transform_labels(bbox_labels, sample_bbox)
return sample_img, sample_labels
def crop_image_sampling(img, bbox_labels, sample_bbox, image_width,
image_height, resize_width, resize_height):
# no clipping here
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
w_off = xmin
h_off = ymin
width = xmax - xmin
height = ymax - ymin
cross_xmin = max(0.0, float(w_off))
cross_ymin = max(0.0, float(h_off))
cross_xmax = min(float(w_off + width - 1.0), float(image_width))
cross_ymax = min(float(h_off + height - 1.0), float(image_height))
cross_width = cross_xmax - cross_xmin
cross_height = cross_ymax - cross_ymin
roi_xmin = 0 if w_off >= 0 else abs(w_off)
roi_ymin = 0 if h_off >= 0 else abs(h_off)
roi_width = cross_width
roi_height = cross_height
    sample_img = np.zeros((height, width, 3))
    # numpy images are indexed as [row (y), col (x)]; index with the vertical
    # coordinates first and cast the float offsets to int before slicing
    sample_img[int(roi_ymin):int(roi_ymin + roi_height),
               int(roi_xmin):int(roi_xmin + roi_width)] = \
        img[int(cross_ymin):int(cross_ymin + cross_height),
            int(cross_xmin):int(cross_xmin + cross_width)]
sample_img = cv2.resize(
sample_img, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
sample_labels = transform_labels(bbox_labels, sample_bbox)
return sample_img, sample_labels
def random_brightness(img, settings):
prob = random.uniform(0, 1)
if prob < settings.brightness_prob:
delta = random.uniform(-settings.brightness_delta,
settings.brightness_delta) + 1
img = ImageEnhance.Brightness(img).enhance(delta)
return img
def random_contrast(img, settings):
prob = random.uniform(0, 1)
if prob < settings.contrast_prob:
delta = random.uniform(-settings.contrast_delta,
settings.contrast_delta) + 1
img = ImageEnhance.Contrast(img).enhance(delta)
return img
def random_saturation(img, settings):
prob = random.uniform(0, 1)
if prob < settings.saturation_prob:
delta = random.uniform(-settings.saturation_delta,
settings.saturation_delta) + 1
img = ImageEnhance.Color(img).enhance(delta)
return img
def random_hue(img, settings):
prob = random.uniform(0, 1)
if prob < settings.hue_prob:
delta = random.uniform(-settings.hue_delta, settings.hue_delta)
img_hsv = np.array(img.convert('HSV'))
img_hsv[:, :, 0] = img_hsv[:, :, 0] + delta
img = Image.fromarray(img_hsv, mode='HSV').convert('RGB')
return img
def distort_image(img, settings):
prob = random.uniform(0, 1)
# Apply different distort order
if prob > 0.5:
img = random_brightness(img, settings)
img = random_contrast(img, settings)
img = random_saturation(img, settings)
img = random_hue(img, settings)
else:
img = random_brightness(img, settings)
img = random_saturation(img, settings)
img = random_hue(img, settings)
img = random_contrast(img, settings)
return img
def expand_image(img, bbox_labels, img_width, img_height, settings):
prob = random.uniform(0, 1)
if prob < settings.expand_prob:
if settings.expand_max_ratio - 1 >= 0.01:
expand_ratio = random.uniform(1, settings.expand_max_ratio)
height = int(img_height * expand_ratio)
width = int(img_width * expand_ratio)
h_off = math.floor(random.uniform(0, height - img_height))
w_off = math.floor(random.uniform(0, width - img_width))
expand_bbox = bbox(-w_off / img_width, -h_off / img_height,
(width - w_off) / img_width,
(height - h_off) / img_height)
expand_img = np.ones((height, width, 3))
expand_img = np.uint8(expand_img * np.squeeze(settings.img_mean))
expand_img = Image.fromarray(expand_img)
expand_img.paste(img, (int(w_off), int(h_off)))
bbox_labels = transform_labels(bbox_labels, expand_bbox)
return expand_img, bbox_labels, width, height
return img, bbox_labels, img_width, img_height
import os
import time
import numpy as np
import argparse
import functools
from PIL import Image
from PIL import ImageDraw
import paddle
import paddle.fluid as fluid
import reader
from pyramidbox import PyramidBox
from utility import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('use_pyramidbox', bool, False, "Whether use PyramidBox model.")
add_arg('confs_threshold', float, 0.25, "Confidence threshold to draw bbox.")
add_arg('image_path', str, '', "The data root path.")
add_arg('model_dir', str, '', "The model path.")
# yapf: enable
def draw_bounding_box_on_image(image_path, nms_out, confs_threshold):
image = Image.open(image_path)
draw = ImageDraw.Draw(image)
for dt in nms_out:
xmin, ymin, xmax, ymax, score = dt
if score < confs_threshold:
continue
(left, right, top, bottom) = (xmin, xmax, ymin, ymax)
draw.line(
[(left, top), (left, bottom), (right, bottom), (right, top),
(left, top)],
width=4,
fill='red')
image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2]
print("image with bbox drawed saved as {}".format(image_name))
image.save('./infer_results/' + image_class.encode('utf-8') + '/' +
image_name.encode('utf-8'))
def write_to_txt(image_path, f, nms_out):
image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2]
f.write('{:s}\n'.format(
image_class.encode('utf-8') + '/' + image_name.encode('utf-8')))
f.write('{:d}\n'.format(nms_out.shape[0]))
for dt in nms_out:
xmin, ymin, xmax, ymax, score = dt
f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'.format(xmin, ymin, (
xmax - xmin + 1), (ymax - ymin + 1), score))
print("image infer result saved {}".format(image_name[:-4]))
def get_round(x, loc):
str_x = str(x)
if '.' in str_x:
len_after = len(str_x.split('.')[1])
str_before = str_x.split('.')[0]
str_after = str_x.split('.')[1]
if len_after >= 3:
str_final = str_before + '.' + str_after[0:loc]
return float(str_final)
else:
return x
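# Truncation behavior (added for illustration):
#   get_round(1.8792, 2) -> 1.87  (at least three decimals: truncate to `loc`)
#   get_round(1.87, 2)   -> 1.87  (fewer than three decimals: returned as-is)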
def bbox_vote(det):
order = det[:, 4].ravel().argsort()[::-1]
det = det[order, :]
if det.shape[0] == 0:
dets = np.array([[10, 10, 20, 20, 0.002]])
det = np.empty(shape=[0, 5])
while det.shape[0] > 0:
# IOU
area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1)
xx1 = np.maximum(det[0, 0], det[:, 0])
yy1 = np.maximum(det[0, 1], det[:, 1])
xx2 = np.minimum(det[0, 2], det[:, 2])
yy2 = np.minimum(det[0, 3], det[:, 3])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
o = inter / (area[0] + area[:] - inter)
# get needed merge det and delete these det
merge_index = np.where(o >= 0.3)[0]
det_accu = det[merge_index, :]
det = np.delete(det, merge_index, 0)
if merge_index.shape[0] <= 1:
if det.shape[0] == 0:
try:
dets = np.row_stack((dets, det_accu))
except:
dets = det_accu
continue
det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4))
max_score = np.max(det_accu[:, 4])
det_accu_sum = np.zeros((1, 5))
det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4],
axis=0) / np.sum(det_accu[:, -1:])
det_accu_sum[:, 4] = max_score
try:
dets = np.row_stack((dets, det_accu_sum))
except:
dets = det_accu_sum
dets = dets[0:750, :]
return dets
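# Note on bbox_vote (added comment): detections are visited in descending
# score order; every remaining box with IoU >= 0.3 against the current top
# box is merged into one box whose coordinates are the score-weighted average
# of the merged group and whose score is the group's maximum. At most the
# top 750 merged detections are returned.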
def image_preprocess(image):
img = np.array(image)
# HWC to CHW
if len(img.shape) == 3:
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
    # RGB to BGR
img = img[[2, 1, 0], :, :]
img = img.astype('float32')
img -= np.array(
[104., 117., 123.])[:, np.newaxis, np.newaxis].astype('float32')
    img = img * 0.007843  # 0.007843 ~= 1 / 127.5, matching settings.scale in reader.py
img = [img]
img = np.array(img)
return img
def detect_face(image, shrink):
image_shape = [3, image.size[1], image.size[0]]
num_classes = 2
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
if shrink != 1:
image = image.resize((int(image_shape[2] * shrink),
int(image_shape[1] * shrink)), Image.ANTIALIAS)
image_shape = [
image_shape[0], int(image_shape[1] * shrink),
int(image_shape[2] * shrink)
]
print "image_shape:", image_shape
img = image_preprocess(image)
scope = fluid.core.Scope()
main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.scope_guard(scope):
with fluid.unique_name.guard():
with fluid.program_guard(main_program, startup_program):
fetches = []
network = PyramidBox(
image_shape,
num_classes,
sub_network=args.use_pyramidbox,
is_infer=True)
infer_program, nmsed_out = network.infer(main_program)
fetches = [nmsed_out]
fluid.io.load_persistables(
exe, args.model_dir, main_program=main_program)
detection, = exe.run(infer_program,
feed={'image': img},
fetch_list=fetches,
return_numpy=False)
detection = np.array(detection)
    # layout: xmin, ymin, xmax, ymax, score
det_conf = detection[:, 1]
det_xmin = image_shape[2] * detection[:, 2] / shrink
det_ymin = image_shape[1] * detection[:, 3] / shrink
det_xmax = image_shape[2] * detection[:, 4] / shrink
det_ymax = image_shape[1] * detection[:, 5] / shrink
det = np.column_stack((det_xmin, det_ymin, det_xmax, det_ymax, det_conf))
keep_index = np.where(det[:, 4] >= 0)[0]
det = det[keep_index, :]
return det
def flip_test(image, shrink):
img = image.transpose(Image.FLIP_LEFT_RIGHT)
det_f = detect_face(img, shrink)
det_t = np.zeros(det_f.shape)
# image.size: [width, height]
det_t[:, 0] = image.size[0] - det_f[:, 2]
det_t[:, 1] = det_f[:, 1]
det_t[:, 2] = image.size[0] - det_f[:, 0]
det_t[:, 3] = det_f[:, 3]
det_t[:, 4] = det_f[:, 4]
return det_t
def multi_scale_test(image, max_shrink):
    # shrink detection: the shrunk image is used to detect only big faces
st = 0.5 if max_shrink >= 0.75 else 0.5 * max_shrink
det_s = detect_face(image, st)
index = np.where(
np.maximum(det_s[:, 2] - det_s[:, 0] + 1, det_s[:, 3] - det_s[:, 1] + 1)
> 30)[0]
det_s = det_s[index, :]
    # first enlargement step (up to 2x)
bt = min(2, max_shrink) if max_shrink > 1 else (st + max_shrink) / 2
det_b = detect_face(image, bt)
    # keep enlarging the image to detect small faces
if max_shrink > 2:
bt *= 2
while bt < max_shrink:
det_b = np.row_stack((det_b, detect_face(image, bt)))
bt *= 2
det_b = np.row_stack((det_b, detect_face(image, max_shrink)))
    # the enlarged scales should only detect small faces
if bt > 1:
index = np.where(
np.minimum(det_b[:, 2] - det_b[:, 0] + 1,
det_b[:, 3] - det_b[:, 1] + 1) < 100)[0]
det_b = det_b[index, :]
else:
index = np.where(
np.maximum(det_b[:, 2] - det_b[:, 0] + 1,
det_b[:, 3] - det_b[:, 1] + 1) > 30)[0]
det_b = det_b[index, :]
return det_s, det_b
def get_im_shrink(image_shape):
max_shrink_v1 = (0x7fffffff / 577.0 /
(image_shape[1] * image_shape[2]))**0.5
max_shrink_v2 = (
(678 * 1024 * 2.0 * 2.0) / (image_shape[1] * image_shape[2]))**0.5
max_shrink = get_round(min(max_shrink_v1, max_shrink_v2), 2) - 0.3
if max_shrink >= 1.5 and max_shrink < 2:
max_shrink = max_shrink - 0.1
elif max_shrink >= 2 and max_shrink < 3:
max_shrink = max_shrink - 0.2
elif max_shrink >= 3 and max_shrink < 4:
max_shrink = max_shrink - 0.3
elif max_shrink >= 4 and max_shrink < 5:
max_shrink = max_shrink - 0.4
elif max_shrink >= 5:
max_shrink = max_shrink - 0.5
print 'max_shrink = ', max_shrink
shrink = max_shrink if max_shrink < 1 else 1
print "shrink = ", shrink
return shrink, max_shrink
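# Illustrative numeric walk-through (added, not part of the original file),
# assuming a 1024x768 input, i.e. image_shape = [3, 768, 1024]:
#   max_shrink_v1 = (0x7fffffff / 577.0 / 786432) ** 0.5 ~= 2.175
#   max_shrink_v2 = ((678 * 1024 * 2.0 * 2.0) / 786432) ** 0.5 ~= 1.879
#   min(v1, v2) ~= 1.879 -> get_round(..., 2) = 1.87 -> 1.87 - 0.3 = 1.57
#   1.5 <= 1.57 < 2, so another 0.1 is subtracted: max_shrink = 1.47
#   shrink = 1, since max_shrink >= 1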
def infer(args, batch_size, data_args):
if not os.path.exists(args.model_dir):
raise ValueError("The model path [%s] does not exist." %
(args.model_dir))
infer_reader = paddle.batch(
reader.test(data_args, file_list), batch_size=batch_size)
for batch_id, img in enumerate(infer_reader()):
image = img[0][0]
image_path = img[0][1]
# image.size: [width, height]
image_shape = [3, image.size[1], image.size[0]]
shrink, max_shrink = get_im_shrink(image_shape)
det0 = detect_face(image, shrink)
det1 = flip_test(image, shrink)
[det2, det3] = multi_scale_test(image, max_shrink)
det = np.row_stack((det0, det1, det2, det3))
dets = bbox_vote(det)
image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2]
if not os.path.exists('./infer_results/' + image_class.encode('utf-8')):
os.makedirs('./infer_results/' + image_class.encode('utf-8'))
f = open('./infer_results/' + image_class.encode('utf-8') + '/' +
image_name.encode('utf-8')[:-4] + '.txt', 'w')
write_to_txt(image_path, f, dets)
# draw_bounding_box_on_image(image_path, dets, args.confs_threshold)
print "Done"
if __name__ == '__main__':
args = parser.parse_args()
print_arguments(args)
data_dir = 'data/WIDERFACE/WIDER_val/images/'
file_list = 'label/val_gt_widerface.res'
data_args = reader.Settings(
data_dir=data_dir,
mean_value=[104., 117., 123],
apply_distort=False,
apply_expand=False,
ap_version='11point')
infer(args, batch_size=1, data_args=data_args)
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.initializer import Constant
from paddle.fluid.initializer import Bilinear
from paddle.fluid.regularizer import L2Decay
def conv_bn(input, filter, ksize, stride, padding, act='relu', bias_attr=False):
conv = fluid.layers.conv2d(
input=input,
filter_size=ksize,
num_filters=filter,
stride=stride,
padding=padding,
act=None,
bias_attr=bias_attr)
return fluid.layers.batch_norm(input=conv, act=act)
def conv_block(input, groups, filters, ksizes, strides=None, with_pool=True):
assert len(filters) == groups
assert len(ksizes) == groups
strides = [1] * groups if strides is None else strides
w_attr = ParamAttr(learning_rate=1., initializer=Xavier())
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
conv = input
for i in xrange(groups):
conv = fluid.layers.conv2d(
input=conv,
num_filters=filters[i],
filter_size=ksizes[i],
stride=strides[i],
padding=(ksizes[i] - 1) / 2,
param_attr=w_attr,
bias_attr=b_attr,
act='relu')
if with_pool:
pool = fluid.layers.pool2d(
input=conv,
pool_size=2,
pool_type='max',
pool_stride=2,
ceil_mode=True)
return conv, pool
else:
return conv
class PyramidBox(object):
def __init__(self,
data_shape,
num_classes,
use_transposed_conv2d=True,
is_infer=False,
sub_network=False):
"""
TODO(qingqing): add comments.
"""
self.data_shape = data_shape
self.min_sizes = [16., 32., 64., 128., 256., 512.]
self.steps = [4., 8., 16., 32., 64., 128.]
self.num_classes = num_classes
self.use_transposed_conv2d = use_transposed_conv2d
self.is_infer = is_infer
self.sub_network = sub_network
# the base network is VGG with atrous layers
self._input()
self._vgg()
if sub_network:
self._low_level_fpn()
self._cpm_module()
self._pyramidbox()
else:
self._vgg_ssd()
def feeds(self):
if self.is_infer:
return [self.image]
else:
return [
self.image, self.face_box, self.head_box, self.gt_label,
self.difficult
]
def _input(self):
self.image = fluid.layers.data(
name='image', shape=self.data_shape, dtype='float32')
if not self.is_infer:
self.face_box = fluid.layers.data(
name='face_box', shape=[4], dtype='float32', lod_level=1)
self.head_box = fluid.layers.data(
name='head_box', shape=[4], dtype='float32', lod_level=1)
self.gt_label = fluid.layers.data(
name='gt_label', shape=[1], dtype='int32', lod_level=1)
self.difficult = fluid.layers.data(
name='gt_difficult', shape=[1], dtype='int32', lod_level=1)
def _vgg(self):
self.conv1, self.pool1 = conv_block(self.image, 2, [64] * 2, [3] * 2)
self.conv2, self.pool2 = conv_block(self.pool1, 2, [128] * 2, [3] * 2)
#priorbox min_size is 16
self.conv3, self.pool3 = conv_block(self.pool2, 3, [256] * 3, [3] * 3)
#priorbox min_size is 32
self.conv4, self.pool4 = conv_block(self.pool3, 3, [512] * 3, [3] * 3)
#priorbox min_size is 64
self.conv5, self.pool5 = conv_block(self.pool4, 3, [512] * 3, [3] * 3)
# fc6 and fc7 in paper, priorbox min_size is 128
self.conv6 = conv_block(
self.pool5, 2, [1024, 1024], [3, 1], with_pool=False)
# conv6_1 and conv6_2 in paper, priorbox min_size is 256
self.conv7 = conv_block(
self.conv6, 2, [256, 512], [1, 3], [1, 2], with_pool=False)
        # conv7_1 and conv7_2 in paper, priorbox min_size is 512
self.conv8 = conv_block(
self.conv7, 2, [128, 256], [1, 3], [1, 2], with_pool=False)
def _low_level_fpn(self):
"""
Low-level feature pyramid network.
"""
def fpn(up_from, up_to):
ch = up_to.shape[1]
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
conv1 = fluid.layers.conv2d(
up_from, ch, 1, act='relu', bias_attr=b_attr)
if self.use_transposed_conv2d:
w_attr = ParamAttr(
learning_rate=0.,
regularizer=L2Decay(0.),
initializer=Bilinear())
upsampling = fluid.layers.conv2d_transpose(
conv1,
ch,
output_size=None,
filter_size=4,
padding=1,
stride=2,
groups=ch,
param_attr=w_attr,
bias_attr=False)
else:
upsampling = fluid.layers.resize_bilinear(
conv1, out_shape=up_to.shape[2:])
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
conv2 = fluid.layers.conv2d(
up_to, ch, 1, act='relu', bias_attr=b_attr)
if self.is_infer:
upsampling = fluid.layers.crop(upsampling, shape=conv2)
# eltwise mul
conv_fuse = upsampling * conv2
return conv_fuse
self.lfpn2_on_conv5 = fpn(self.conv6, self.conv5)
self.lfpn1_on_conv4 = fpn(self.lfpn2_on_conv5, self.conv4)
self.lfpn0_on_conv3 = fpn(self.lfpn1_on_conv4, self.conv3)
def _cpm_module(self):
"""
Context-sensitive Prediction Module
"""
def cpm(input):
# residual
branch1 = conv_bn(input, 1024, 1, 1, 0, None)
branch2a = conv_bn(input, 256, 1, 1, 0, act='relu')
branch2b = conv_bn(branch2a, 256, 3, 1, 1, act='relu')
branch2c = conv_bn(branch2b, 1024, 1, 1, 0, None)
sum = branch1 + branch2c
rescomb = fluid.layers.relu(x=sum)
# ssh
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
ssh_1 = fluid.layers.conv2d(rescomb, 256, 3, 1, 1, bias_attr=b_attr)
ssh_dimred = fluid.layers.conv2d(
rescomb, 128, 3, 1, 1, act='relu', bias_attr=b_attr)
ssh_2 = fluid.layers.conv2d(
ssh_dimred, 128, 3, 1, 1, bias_attr=b_attr)
ssh_3a = fluid.layers.conv2d(
ssh_dimred, 128, 3, 1, 1, act='relu', bias_attr=b_attr)
ssh_3b = fluid.layers.conv2d(ssh_3a, 128, 3, 1, 1, bias_attr=b_attr)
ssh_concat = fluid.layers.concat([ssh_1, ssh_2, ssh_3b], axis=1)
ssh_out = fluid.layers.relu(x=ssh_concat)
return ssh_out
self.ssh_conv3 = cpm(self.lfpn0_on_conv3)
self.ssh_conv4 = cpm(self.lfpn1_on_conv4)
self.ssh_conv5 = cpm(self.lfpn2_on_conv5)
self.ssh_conv6 = cpm(self.conv6)
self.ssh_conv7 = cpm(self.conv7)
self.ssh_conv8 = cpm(self.conv8)
def _l2_norm_scale(self, input, init_scale=1.0, channel_shared=False):
from paddle.fluid.layer_helper import LayerHelper
helper = LayerHelper("Scale")
l2_norm = fluid.layers.l2_normalize(
input, axis=1) # l2 norm along channel
shape = [1] if channel_shared else [input.shape[1]]
scale = helper.create_parameter(
attr=helper.param_attr,
shape=shape,
dtype=input.dtype,
default_initializer=Constant(init_scale))
out = fluid.layers.elementwise_mul(
x=l2_norm, y=scale, axis=-1 if channel_shared else 1)
return out
def _pyramidbox(self):
"""
Get prior-boxes and pyramid-box
"""
self.ssh_conv3_norm = self._l2_norm_scale(
self.ssh_conv3, init_scale=10.)
self.ssh_conv4_norm = self._l2_norm_scale(self.ssh_conv4, init_scale=8.)
self.ssh_conv5_norm = self._l2_norm_scale(self.ssh_conv5, init_scale=5.)
def permute_and_reshape(input, last_dim):
trans = fluid.layers.transpose(input, perm=[0, 2, 3, 1])
new_shape = [
trans.shape[0], np.prod(trans.shape[1:]) / last_dim, last_dim
]
return fluid.layers.reshape(trans, shape=new_shape)
face_locs, face_confs = [], []
head_locs, head_confs = [], []
boxes, vars = [], []
inputs = [
self.ssh_conv3_norm, self.ssh_conv4_norm, self.ssh_conv5_norm,
self.ssh_conv6, self.ssh_conv7, self.ssh_conv8
]
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
for i, input in enumerate(inputs):
mbox_loc = fluid.layers.conv2d(input, 8, 3, 1, 1, bias_attr=b_attr)
face_loc, head_loc = fluid.layers.split(
mbox_loc, num_or_sections=2, dim=1)
face_loc = permute_and_reshape(face_loc, 4)
head_loc = permute_and_reshape(head_loc, 4)
mbox_conf = fluid.layers.conv2d(input, 6, 3, 1, 1, bias_attr=b_attr)
face_conf1, face_conf3, head_conf = fluid.layers.split(
mbox_conf, num_or_sections=[1, 3, 2], dim=1)
face_conf3_maxin = fluid.layers.reduce_max(
face_conf3, dim=1, keep_dim=True)
face_conf = fluid.layers.concat(
[face_conf1, face_conf3_maxin], axis=1)
face_conf = permute_and_reshape(face_conf, 2)
head_conf = permute_and_reshape(head_conf, 2)
face_locs.append(face_loc)
face_confs.append(face_conf)
head_locs.append(head_loc)
head_confs.append(head_conf)
box, var = fluid.layers.prior_box(
input,
self.image,
min_sizes=[self.min_sizes[i]],
steps=[self.steps[i]] * 2,
aspect_ratios=[1.],
clip=False,
flip=True,
offset=0.5)
box = fluid.layers.reshape(box, shape=[-1, 4])
var = fluid.layers.reshape(var, shape=[-1, 4])
boxes.append(box)
vars.append(var)
self.face_mbox_loc = fluid.layers.concat(face_locs, axis=1)
self.face_mbox_conf = fluid.layers.concat(face_confs, axis=1)
self.head_mbox_loc = fluid.layers.concat(head_locs, axis=1)
self.head_mbox_conf = fluid.layers.concat(head_confs, axis=1)
self.prior_boxes = fluid.layers.concat(boxes)
self.box_vars = fluid.layers.concat(vars)
def _vgg_ssd(self):
self.conv3_norm = self._l2_norm_scale(self.conv3, init_scale=10.)
self.conv4_norm = self._l2_norm_scale(self.conv4, init_scale=8.)
self.conv5_norm = self._l2_norm_scale(self.conv5, init_scale=5.)
def permute_and_reshape(input, last_dim):
trans = fluid.layers.transpose(input, perm=[0, 2, 3, 1])
new_shape = [
trans.shape[0], np.prod(trans.shape[1:]) / last_dim, last_dim
]
return fluid.layers.reshape(trans, shape=new_shape)
locs, confs = [], []
boxes, vars = [], []
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
# conv3
mbox_loc = fluid.layers.conv2d(
self.conv3_norm, 4, 3, 1, 1, bias_attr=b_attr)
loc = permute_and_reshape(mbox_loc, 4)
mbox_conf = fluid.layers.conv2d(
self.conv3_norm, 4, 3, 1, 1, bias_attr=b_attr)
conf1, conf3 = fluid.layers.split(
mbox_conf, num_or_sections=[1, 3], dim=1)
conf3_maxin = fluid.layers.reduce_max(conf3, dim=1, keep_dim=True)
conf = fluid.layers.concat([conf1, conf3_maxin], axis=1)
conf = permute_and_reshape(conf, 2)
box, var = fluid.layers.prior_box(
self.conv3_norm,
self.image,
min_sizes=[16.],
steps=[4, 4],
aspect_ratios=[1.],
clip=False,
flip=True,
offset=0.5)
box = fluid.layers.reshape(box, shape=[-1, 4])
var = fluid.layers.reshape(var, shape=[-1, 4])
locs.append(loc)
confs.append(conf)
boxes.append(box)
vars.append(var)
min_sizes = [32., 64., 128., 256., 512.]
steps = [8., 16., 32., 64., 128.]
inputs = [
self.conv4_norm, self.conv5_norm, self.conv6, self.conv7, self.conv8
]
for i, input in enumerate(inputs):
mbox_loc = fluid.layers.conv2d(input, 4, 3, 1, 1, bias_attr=b_attr)
loc = permute_and_reshape(mbox_loc, 4)
mbox_conf = fluid.layers.conv2d(input, 2, 3, 1, 1, bias_attr=b_attr)
conf = permute_and_reshape(mbox_conf, 2)
box, var = fluid.layers.prior_box(
input,
self.image,
min_sizes=[min_sizes[i]],
steps=[steps[i]] * 2,
aspect_ratios=[1.],
clip=False,
flip=True,
offset=0.5)
box = fluid.layers.reshape(box, shape=[-1, 4])
var = fluid.layers.reshape(var, shape=[-1, 4])
locs.append(loc)
confs.append(conf)
boxes.append(box)
vars.append(var)
self.face_mbox_loc = fluid.layers.concat(locs, axis=1)
self.face_mbox_conf = fluid.layers.concat(confs, axis=1)
self.prior_boxes = fluid.layers.concat(boxes)
self.box_vars = fluid.layers.concat(vars)
def vgg_ssd_loss(self):
loss = fluid.layers.ssd_loss(
self.face_mbox_loc,
self.face_mbox_conf,
self.face_box,
self.gt_label,
self.prior_boxes,
self.box_vars,
overlap_threshold=0.35,
neg_overlap=0.35)
loss = fluid.layers.reduce_sum(loss)
return loss
def train(self):
face_loss = fluid.layers.ssd_loss(
self.face_mbox_loc,
self.face_mbox_conf,
self.face_box,
self.gt_label,
self.prior_boxes,
self.box_vars,
overlap_threshold=0.35,
neg_overlap=0.35)
head_loss = fluid.layers.ssd_loss(
self.head_mbox_loc,
self.head_mbox_conf,
self.head_box,
self.gt_label,
self.prior_boxes,
self.box_vars,
overlap_threshold=0.35,
neg_overlap=0.35)
face_loss = fluid.layers.reduce_sum(face_loss)
head_loss = fluid.layers.reduce_sum(head_loss)
total_loss = face_loss + head_loss
return face_loss, head_loss, total_loss
def infer(self, main_program=None):
if main_program is None:
test_program = fluid.default_main_program().clone(for_test=True)
else:
test_program = main_program.clone(for_test=True)
with fluid.program_guard(test_program):
face_nmsed_out = fluid.layers.detection_output(
self.face_mbox_loc,
self.face_mbox_conf,
self.prior_boxes,
self.box_vars,
nms_threshold=0.45)
return test_program, face_nmsed_out
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import image_util
from paddle.utils.image_util import *
import random
from PIL import Image
from PIL import ImageDraw
import numpy as np
import xml.etree.ElementTree
import os
import time
import copy
class Settings(object):
def __init__(self,
dataset=None,
data_dir=None,
label_file=None,
resize_h=None,
resize_w=None,
mean_value=[104., 117., 123.],
apply_distort=True,
apply_expand=True,
ap_version='11point',
toy=0):
self.dataset = dataset
self.ap_version = ap_version
self.toy = toy
self.data_dir = data_dir
self.apply_distort = apply_distort
self.apply_expand = apply_expand
self.resize_height = resize_h
self.resize_width = resize_w
self.img_mean = np.array(mean_value)[:, np.newaxis, np.newaxis].astype(
'float32')
self.expand_prob = 0.5
self.expand_max_ratio = 4
self.hue_prob = 0.5
self.hue_delta = 18
self.contrast_prob = 0.5
self.contrast_delta = 0.5
self.saturation_prob = 0.5
self.saturation_delta = 0.5
self.brightness_prob = 0.5
        # brightness_delta is 32 normalized by 256: 32 / 256 = 0.125
        self.brightness_delta = 0.125
self.scale = 0.007843 # 1 / 127.5
self.data_anchor_sampling_prob = 0.5
def preprocess(img, bbox_labels, mode, settings):
img_width, img_height = img.size
sampled_labels = bbox_labels
if mode == 'train':
if settings.apply_distort:
img = image_util.distort_image(img, settings)
if settings.apply_expand:
img, bbox_labels, img_width, img_height = image_util.expand_image(
img, bbox_labels, img_width, img_height, settings)
# sampling
batch_sampler = []
prob = random.uniform(0., 1.)
if prob > settings.data_anchor_sampling_prob:
scale_array = np.array([16, 32, 64, 128, 256, 512])
batch_sampler.append(
image_util.sampler(1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2,
0.0, True))
sampled_bbox = image_util.generate_batch_random_samples(
batch_sampler, bbox_labels, img_width, img_height, scale_array,
settings.resize_width, settings.resize_height)
img = np.array(img)
if len(sampled_bbox) > 0:
idx = int(random.uniform(0, len(sampled_bbox)))
                img, sampled_labels = image_util.crop_image_sampling(
                    img, bbox_labels, sampled_bbox[idx], img_width, img_height,
                    settings.resize_width, settings.resize_height)
img = Image.fromarray(img)
else:
            # hard-coded sampler settings
batch_sampler.append(
image_util.sampler(1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
sampled_bbox = image_util.generate_batch_samples(
batch_sampler, bbox_labels, img_width, img_height)
img = np.array(img)
if len(sampled_bbox) > 0:
idx = int(random.uniform(0, len(sampled_bbox)))
img, sampled_labels = image_util.crop_image(
img, bbox_labels, sampled_bbox[idx], img_width, img_height)
img = Image.fromarray(img)
img = img.resize((settings.resize_width, settings.resize_height),
Image.ANTIALIAS)
img = np.array(img)
if mode == 'train':
mirror = int(random.uniform(0, 2))
if mirror == 1:
img = img[:, ::-1, :]
for i in xrange(len(sampled_labels)):
tmp = sampled_labels[i][1]
sampled_labels[i][1] = 1 - sampled_labels[i][3]
sampled_labels[i][3] = 1 - tmp
# HWC to CHW
if len(img.shape) == 3:
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
    # RGB to BGR
img = img[[2, 1, 0], :, :]
img = img.astype('float32')
img -= settings.img_mean
img = img * settings.scale
return img, sampled_labels
def put_txt_in_dict(input_txt):
with open(input_txt, 'r') as f_dir:
lines_input_txt = f_dir.readlines()
dict_input_txt = {}
num_class = 0
for i in range(len(lines_input_txt)):
tmp_line_txt = lines_input_txt[i].strip('\n\t\r')
if '--' in tmp_line_txt:
if i != 0:
num_class += 1
dict_input_txt[num_class] = []
dict_name = tmp_line_txt
dict_input_txt[num_class].append(tmp_line_txt)
if '--' not in tmp_line_txt:
if len(tmp_line_txt) > 6:
split_str = tmp_line_txt.split(' ')
x1_min = float(split_str[0])
y1_min = float(split_str[1])
x2_max = float(split_str[2])
y2_max = float(split_str[3])
tmp_line_txt = str(x1_min) + ' ' + str(y1_min) + ' ' + str(
x2_max) + ' ' + str(y2_max)
dict_input_txt[num_class].append(tmp_line_txt)
else:
dict_input_txt[num_class].append(tmp_line_txt)
return dict_input_txt
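# Illustrative layout of the annotation file parsed above (assumed WIDER
# FACE style .res file, e.g. label/train_gt_widerface.res):
#   0--Parade/0_Parade_marchingband_1_849   <- image name, contains '--'
#   1                                       <- number of faces (short line)
#   449.0 330.0 122.0 149.0                 <- xmin ymin width height
# Lines containing '--' start a new image entry; lines longer than six
# characters are truncated to their first four numbers (x, y, w, h).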
def expand_bboxes(bboxes,
expand_left=2.,
expand_up=2.,
expand_right=2.,
expand_down=2.):
"""
    Expand bboxes, expanding 2 times by default.
"""
expand_boxes = []
for bbox in bboxes:
xmin = bbox[0]
ymin = bbox[1]
xmax = bbox[2]
ymax = bbox[3]
w = xmax - xmin
h = ymax - ymin
ex_xmin = max(xmin - w / expand_left, 0.)
ex_ymin = max(ymin - h / expand_up, 0.)
ex_xmax = min(xmax + w / expand_right, 1.)
ex_ymax = min(ymax + h / expand_down, 1.)
expand_boxes.append([ex_xmin, ex_ymin, ex_xmax, ex_ymax])
return expand_boxes
def pyramidbox(settings, file_list, mode, shuffle):
dict_input_txt = {}
dict_input_txt = put_txt_in_dict(file_list)
def reader():
if mode == 'train' and shuffle:
            # dict keys are the consecutive ints 0..n-1, so shuffle swaps values
            random.shuffle(dict_input_txt)
for index_image in range(len(dict_input_txt)):
image_name = dict_input_txt[index_image][0] + '.jpg'
image_path = os.path.join(settings.data_dir, image_name)
im = Image.open(image_path)
if im.mode == 'L':
im = im.convert('RGB')
im_width, im_height = im.size
# layout: label | xmin | ymin | xmax | ymax
if mode == 'train':
bbox_labels = []
for index_box in range(len(dict_input_txt[index_image])):
if index_box >= 2:
bbox_sample = []
temp_info_box = dict_input_txt[index_image][
index_box].split(' ')
xmin = float(temp_info_box[0])
ymin = float(temp_info_box[1])
w = float(temp_info_box[2])
h = float(temp_info_box[3])
xmax = xmin + w
ymax = ymin + h
bbox_sample.append(1)
bbox_sample.append(float(xmin) / im_width)
bbox_sample.append(float(ymin) / im_height)
bbox_sample.append(float(xmax) / im_width)
bbox_sample.append(float(ymax) / im_height)
bbox_labels.append(bbox_sample)
im, sample_labels = preprocess(im, bbox_labels, mode, settings)
sample_labels = np.array(sample_labels)
if len(sample_labels) == 0: continue
im = im.astype('float32')
boxes = sample_labels[:, 1:5]
lbls = [1] * len(boxes)
difficults = [1] * len(boxes)
yield im, boxes, expand_bboxes(boxes), lbls, difficults
if mode == 'test':
yield im, image_path
return reader
def train(settings, file_list, shuffle=True):
return pyramidbox(settings, file_list, 'train', shuffle)
def test(settings, file_list):
return pyramidbox(settings, file_list, 'test', False)
def infer(settings, image_path):
def batch_reader():
img = Image.open(image_path)
if img.mode == 'L':
            img = img.convert('RGB')
im_width, im_height = img.size
if settings.resize_width and settings.resize_height:
img = img.resize((settings.resize_width, settings.resize_height),
Image.ANTIALIAS)
img = np.array(img)
# HWC to CHW
if len(img.shape) == 3:
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
# RBG to BGR
img = img[[2, 1, 0], :, :]
img = img.astype('float32')
img -= settings.img_mean
img = img * settings.scale
return np.array([img])
return batch_reader
import os
import shutil
import numpy as np
import time
import argparse
import functools
import reader
import paddle
import paddle.fluid as fluid
from pyramidbox import PyramidBox
from utility import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('parallel', bool, True, "Whether to use ParallelExecutor for training.")
add_arg('learning_rate', float, 0.001, "Learning rate.")
add_arg('batch_size', int, 12, "Minibatch size.")
add_arg('num_passes', int, 120, "Epoch number.")
add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('use_pyramidbox', bool, True, "Whether use PyramidBox model.")
add_arg('model_save_dir', str, 'output', "The path to save model.")
add_arg('pretrained_model', str, './pretrained/', "The init model path.")
add_arg('resize_h', int, 640, "The resized image height.")
add_arg('resize_w', int, 640, "The resized image width.")
# yapf: enable
def train(args, config, train_file_list, optimizer_method):
learning_rate = args.learning_rate
batch_size = args.batch_size
num_passes = args.num_passes
height = args.resize_h
width = args.resize_w
use_gpu = args.use_gpu
use_pyramidbox = args.use_pyramidbox
model_save_dir = args.model_save_dir
pretrained_model = args.pretrained_model
num_classes = 2
image_shape = [3, height, width]
devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(","))
fetches = []
network = PyramidBox(image_shape, num_classes,
sub_network=use_pyramidbox)
if use_pyramidbox:
face_loss, head_loss, loss = network.train()
fetches = [face_loss, head_loss]
else:
loss = network.vgg_ssd_loss()
fetches = [loss]
    epocs = 12880 / batch_size  # 12880 images in the WIDER FACE training set
boundaries = [epocs * 40, epocs * 60, epocs * 80, epocs * 100]
values = [
learning_rate, learning_rate * 0.5, learning_rate * 0.25,
learning_rate * 0.1, learning_rate * 0.01
]
if optimizer_method == "momentum":
optimizer = fluid.optimizer.Momentum(
learning_rate=fluid.layers.piecewise_decay(
boundaries=boundaries, values=values),
momentum=0.9,
regularization=fluid.regularizer.L2Decay(0.0005),
)
else:
optimizer = fluid.optimizer.RMSProp(
learning_rate=fluid.layers.piecewise_decay(boundaries, values),
regularization=fluid.regularizer.L2Decay(0.0005),
)
optimizer.minimize(loss)
#fluid.memory_optimize(fluid.default_main_program())
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
start_pass = 0
if pretrained_model:
if pretrained_model.isdigit():
start_pass = int(pretrained_model) + 1
pretrained_model = os.path.join(model_save_dir, pretrained_model)
print("Resume from %s " %(pretrained_model))
if not os.path.exists(pretrained_model):
raise ValueError("The pre-trained model path [%s] does not exist." %
(pretrained_model))
def if_exist(var):
return os.path.exists(os.path.join(pretrained_model, var.name))
fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)
if args.parallel:
train_exe = fluid.ParallelExecutor(
use_cuda=use_gpu, loss_name=loss.name)
train_reader = paddle.batch(
reader.train(config, train_file_list), batch_size=batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=network.feeds())
def save_model(postfix):
model_path = os.path.join(model_save_dir, postfix)
if os.path.isdir(model_path):
shutil.rmtree(model_path)
print 'save models to %s' % (model_path)
fluid.io.save_persistables(exe, model_path)
for pass_id in range(start_pass, num_passes):
start_time = time.time()
prev_start_time = start_time
end_time = 0
for batch_id, data in enumerate(train_reader()):
prev_start_time = start_time
start_time = time.time()
if len(data) < 2 * devices_num: continue
if args.parallel:
fetch_vars = train_exe.run(fetch_list=[v.name for v in fetches],
feed=feeder.feed(data))
else:
fetch_vars = exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=fetches)
end_time = time.time()
fetch_vars = [np.mean(np.array(v)) for v in fetch_vars]
if batch_id % 1 == 0:
if not args.use_pyramidbox:
print("Pass {0}, batch {1}, loss {2}, time {3}".format(
pass_id, batch_id, fetch_vars[0],
start_time - prev_start_time))
else:
print("Pass {0}, batch {1}, face loss {2}, head loss {3}, " \
"time {4}".format(pass_id,
batch_id, fetch_vars[0], fetch_vars[1],
start_time - prev_start_time))
if pass_id % 1 == 0 or pass_id == num_passes - 1:
save_model(str(pass_id))
if __name__ == '__main__':
args = parser.parse_args()
print_arguments(args)
data_dir = 'data/WIDERFACE/WIDER_train/images/'
train_file_list = 'label/train_gt_widerface.res'
config = reader.Settings(
data_dir=data_dir,
resize_h=args.resize_h,
resize_w=args.resize_w,
apply_expand=False,
mean_value=[104., 117., 123],
ap_version='11point')
train(args, config, train_file_list, optimizer_method="momentum")
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
Running the program samples in this directory requires the latest develop version of PaddlePaddle. If your installed version of PaddlePaddle is lower than this requirement, please update your installation following the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
## Code structure
```
├── network.py    # network architecture definition
├── train.py      # training script
├── eval.py       # evaluation script
├── infer.py      # inference script
├── cityscape.py  # data preprocessing script
└── utils.py      # common utility functions
```
## Introduction
Image Cascade Network (ICNet) is designed for real-time semantic segmentation of images. Compared with other approaches that compress computation, ICNet takes both speed and accuracy into account.
The main idea of ICNet is to transform the input image into several resolutions, process each resolution with a sub-network of matching computational complexity, and then merge the results. ICNet consists of three sub-networks: the computationally heavy network processes the low-resolution input and the lightweight network processes the high-resolution input, striking a balance between the accuracy available from high-resolution images and the efficiency of low-complexity networks. A minimal sketch of this cascade is given after the figure below.
The overall network structure is shown below:
<p align="center">
<img src="images/icnet.png" width="620" hspace='10'/> <br/>
<strong>Figure 1</strong>
</p>
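A minimal sketch of the cascade idea, assuming the fixed `(3, 720, 720)` training shape defined in `cityscape.py`; the real definition lives in `network.py`, and the helper name `cascade_inputs` is hypothetical:

```python
import paddle.fluid as fluid

# Hypothetical helper sketching the multi-resolution cascade; this is not
# the actual icnet() implementation from network.py.
def cascade_inputs(image):
    # branch inputs at 1/2 and 1/4 of the original 720x720 resolution
    img_sub2 = fluid.layers.resize_bilinear(image, out_shape=[360, 360])
    img_sub4 = fluid.layers.resize_bilinear(img_sub2, out_shape=[180, 180])
    # the deepest sub-network runs on img_sub4, a medium one on img_sub2,
    # and a shallow one on the full image; features are fused coarse-to-fine
    return img_sub2, img_sub4

image = fluid.layers.data(name='image', shape=[3, 720, 720], dtype='float32')
img_sub2, img_sub4 = cascade_inputs(image)
```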
## Data preparation
This example uses the Cityscape dataset. Please register at the [Cityscape website](https://www.cityscapes-dataset.com) to download it. After downloading, process the data with the instructions and tools given [here](https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py#L3).
The processed data is organized as follows:
```
data/cityscape/
|-- gtFine
| |-- test
| |-- train
| `-- val
|-- leftImg8bit
| |-- test
| |-- train
| `-- val
|-- train.list
`-- val.list
```
Here, train.list and val.list are the list files used for training and testing respectively; the first column is the input image and the second column is the annotation, separated by a space. For example:
```
leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png
leftImg8bit/train/stuttgart/stuttgart_000072_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000072_000019_gtFine_labelTrainIds.png
```
After downloading and preparing the data, update the corresponding data paths in the `cityscape.py` script; the relevant constants are shown below.
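For reference, these are the path constants defined at the top of `cityscape.py`:

```python
DATA_PATH = "./data/cityscape"
TRAIN_LIST = DATA_PATH + "/train.list"
TEST_LIST = DATA_PATH + "/val.list"
```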
## Model training and inference
### Training
Run the following command to start training, specifying the checkpoint save path at the same time:
```
python train.py --batch_size=16 --use_gpu=True --checkpoint_path="./chkpnt/"
```
Use the following command to get more usage information:
```
python train.py --help
```
During training, the `loss` of each network branch on the training set is printed according to the user's settings. For example:
```
Iter[0]; train loss: 2.338; sub4_loss: 3.367; sub24_loss: 4.120; sub124_loss: 0.151
```
### Testing
Run the following command to evaluate on the `Cityscape` test dataset:
```
python eval.py --model_path="./model/" --use_gpu=True
```
The model files must be specified with the `--model_path` option.
The evaluation metric reported by the test script is [mean IoU]().
### Inference
Run the following command to run inference on the specified data:
```
python infer.py \
--model_path="./model" \
--images_path="./data/cityscape/" \
--images_list="./data/cityscape/infer.list"
```
Use the `--images_list` option to specify the list file; each line of the list file is the path of one image to be predicted.
By default, the prediction results are saved to the `output` folder under the current path.
## Experimental results
Figure 2 shows the training loss curve on the `CityScape` training set:
<p align="center">
<img src="images/train_loss.png" width="620" hspace='10'/> <br/>
<strong>Figure 2</strong>
</p>
Training on the training set and evaluating on the validation set gives mean_IoU = 67.0% (67.7% in the paper).
Figure 3 shows sample results produced by the `infer.py` script: the first row is the original input image, the second row is the human annotation, and the third row is the output of our model.
<p align="center">
<img src="images/result.png" width="620" hspace='10'/> <br/>
<strong>Figure 3</strong>
</p>
## Other information
|Dataset | Pretrained model |
|---|---|
|CityScape | [Model]()[md: ] |
## References
- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
"""Reader for Cityscape dataset.
"""
import os
import cv2
import numpy as np
import paddle.v2 as paddle
DATA_PATH = "./data/cityscape"
TRAIN_LIST = DATA_PATH + "/train.list"
TEST_LIST = DATA_PATH + "/val.list"
IGNORE_LABEL = 255
NUM_CLASSES = 19
TRAIN_DATA_SHAPE = (3, 720, 720)
TEST_DATA_SHAPE = (3, 1024, 2048)
IMG_MEAN = np.array((103.939, 116.779, 123.68), dtype=np.float32)
def train_data_shape():
return TRAIN_DATA_SHAPE
def test_data_shape():
return TEST_DATA_SHAPE
def num_classes():
return NUM_CLASSES
class DataGenerater:
def __init__(self, data_list, mode="train", flip=True, scaling=True):
self.flip = flip
self.scaling = scaling
self.image_label = []
with open(data_list, 'r') as f:
for line in f:
image_file, label_file = line.strip().split(' ')
self.image_label.append((image_file, label_file))
def create_train_reader(self, batch_size):
"""
Create a reader for train dataset.
"""
def reader():
np.random.shuffle(self.image_label)
images = []
labels_sub1 = []
labels_sub2 = []
labels_sub4 = []
count = 0
for image, label in self.image_label:
image, label_sub1, label_sub2, label_sub4 = self.process_train_data(
image, label)
count += 1
images.append(image)
labels_sub1.append(label_sub1)
labels_sub2.append(label_sub2)
labels_sub4.append(label_sub4)
if count == batch_size:
yield self.mask(
np.array(images),
np.array(labels_sub1),
np.array(labels_sub2), np.array(labels_sub4))
images = []
labels_sub1 = []
labels_sub2 = []
labels_sub4 = []
count = 0
if images:
yield self.mask(
np.array(images),
np.array(labels_sub1),
np.array(labels_sub2), np.array(labels_sub4))
return reader
def create_test_reader(self):
"""
Create a reader for test dataset.
"""
def reader():
for image, label in self.image_label:
image, label = self.load(image, label)
image = paddle.image.to_chw(image)[np.newaxis, :]
label = label[np.newaxis, :, :, np.newaxis].astype("float32")
label_mask = np.where((label != IGNORE_LABEL).flatten())[
0].astype("int32")
yield image, label, label_mask
return reader
def process_train_data(self, image, label):
"""
Process training data.
"""
image, label = self.load(image, label)
if self.flip:
image, label = self.random_flip(image, label)
if self.scaling:
image, label = self.random_scaling(image, label)
image, label = self.resize(image, label, out_size=TRAIN_DATA_SHAPE[1:])
label = label.astype("float32")
label_sub1 = paddle.image.to_chw(self.scale_label(label, factor=4))
label_sub2 = paddle.image.to_chw(self.scale_label(label, factor=8))
label_sub4 = paddle.image.to_chw(self.scale_label(label, factor=16))
image = paddle.image.to_chw(image)
return image, label_sub1, label_sub2, label_sub4
def load(self, image, label):
"""
Load image from file.
"""
image = paddle.image.load_image(
DATA_PATH + "/" + image, is_color=True).astype("float32")
image -= IMG_MEAN
label = paddle.image.load_image(
DATA_PATH + "/" + label, is_color=False).astype("float32")
return image, label
def random_flip(self, image, label):
"""
Flip image and label randomly.
"""
r = np.random.rand(1)
if r > 0.5:
image = paddle.image.left_right_flip(image, is_color=True)
label = paddle.image.left_right_flip(label, is_color=False)
return image, label
def random_scaling(self, image, label):
"""
Scale image and label randomly.
"""
scale = np.random.uniform(0.5, 2.0, 1)[0]
h_new = int(image.shape[0] * scale)
w_new = int(image.shape[1] * scale)
image = cv2.resize(image, (w_new, h_new))
label = cv2.resize(
label, (w_new, h_new), interpolation=cv2.INTER_NEAREST)
return image, label
def padding_as(self, image, h, w, is_color):
"""
Padding image.
"""
pad_h = max(image.shape[0], h) - image.shape[0]
pad_w = max(image.shape[1], w) - image.shape[1]
if is_color:
return np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), 'constant')
else:
return np.pad(image, ((0, pad_h), (0, pad_w)), 'constant')
def resize(self, image, label, out_size):
"""
Resize image and label by padding or cropping.
"""
ignore_label = IGNORE_LABEL
label = label - ignore_label
if len(label.shape) == 2:
label = label[:, :, np.newaxis]
combined = np.concatenate((image, label), axis=2)
combined = self.padding_as(
combined, out_size[0], out_size[1], is_color=True)
combined = paddle.image.random_crop(
combined, out_size[0], is_color=True)
image = combined[:, :, 0:3]
label = combined[:, :, 3:4] + ignore_label
return image, label
def scale_label(self, label, factor):
"""
Scale label according to factor.
"""
        h = label.shape[0] / factor
        w = label.shape[1] / factor
        # cv2.resize expects the destination size as (width, height)
        return cv2.resize(
            label, (w, h), interpolation=cv2.INTER_NEAREST)[:, :, np.newaxis]
def mask(self, image, label0, label1, label2):
"""
Get mask for valid pixels.
"""
mask_sub1 = np.where(((label0 < (NUM_CLASSES + 1)) & (
label0 != IGNORE_LABEL)).flatten())[0].astype("int32")
mask_sub2 = np.where(((label1 < (NUM_CLASSES + 1)) & (
label1 != IGNORE_LABEL)).flatten())[0].astype("int32")
mask_sub4 = np.where(((label2 < (NUM_CLASSES + 1)) & (
label2 != IGNORE_LABEL)).flatten())[0].astype("int32")
return image.astype(
"float32"), label0, mask_sub1, label1, mask_sub2, label2, mask_sub4
def train(batch_size=32, flip=True, scaling=True):
"""
Cityscape training set reader.
It returns a reader, in which each result is a batch with batch_size samples.
:param batch_size: The batch size of each result return by the reader.
:type batch_size: int
    :param flip: Whether to flip images randomly.
    :type flip: bool
    :param scaling: Whether to scale images randomly.
    :type scaling: bool
:return: Training reader.
:rtype: callable
"""
reader = DataGenerater(
TRAIN_LIST, flip=flip, scaling=scaling).create_train_reader(batch_size)
return reader
def test():
"""
Cityscape validation set reader.
It returns a reader, in which each result is a sample.
    :return: Test reader.
:rtype: callable
"""
reader = DataGenerater(TEST_LIST).create_test_reader()
return reader
def infer(image_list=TEST_LIST):
"""
Infer set reader.
It returns a reader, in which each result is a sample.
    :param image_list: The image list file in which each line is the path of an image to be inferred.
    :type image_list: str
:return: Infer reader.
:rtype: callable
"""
    reader = DataGenerater(image_list).create_test_reader()
    return reader
"""Evaluator for ICNet model."""
import paddle.fluid as fluid
import numpy as np
from utils import add_arguments, print_arguments, get_feeder_data
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.initializer import init_on_cpu
from icnet import icnet
import cityscape
import argparse
import functools
import sys
import os
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model_path', str, None, "Model path.")
add_arg('use_gpu', bool, True, "Whether use GPU to test.")
# yapf: enable
def cal_mean_iou(wrong, correct):
    total = wrong + correct
    true_num = (total != 0).sum()
    for i in range(len(total)):
        if total[i] == 0:
            total[i] = 1  # avoid division by zero for classes never seen
    return (correct.astype("float64") / total).sum() / true_num
def create_iou(predict, label, mask, num_classes, image_shape):
predict = fluid.layers.resize_bilinear(predict, out_shape=image_shape[1:3])
predict = fluid.layers.transpose(predict, perm=[0, 2, 3, 1])
predict = fluid.layers.reshape(predict, shape=[-1, num_classes])
label = fluid.layers.reshape(label, shape=[-1, 1])
_, predict = fluid.layers.topk(predict, k=1)
predict = fluid.layers.cast(predict, dtype="float32")
predict = fluid.layers.gather(predict, mask)
label = fluid.layers.gather(label, mask)
label = fluid.layers.cast(label, dtype="int32")
predict = fluid.layers.cast(predict, dtype="int32")
iou, out_w, out_r = fluid.layers.mean_iou(predict, label, num_classes)
return iou, out_w, out_r
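# Note (added): fluid.layers.mean_iou returns the mean IoU of the current
# batch together with per-class wrong/correct pixel counts; eval() below
# accumulates those counts over the whole dataset and recomputes the final
# mean IoU with cal_mean_iou().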
def eval(args):
data_shape = cityscape.test_data_shape()
num_classes = cityscape.num_classes()
# define network
images = fluid.layers.data(name='image', shape=data_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int32')
mask = fluid.layers.data(name='mask', shape=[-1], dtype='int32')
_, _, sub124_out = icnet(images, num_classes,
np.array(data_shape[1:]).astype("float32"))
iou, out_w, out_r = create_iou(sub124_out, label, mask, num_classes,
data_shape)
inference_program = fluid.default_main_program().clone(for_test=True)
# prepare environment
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
assert os.path.exists(args.model_path)
fluid.io.load_params(exe, args.model_path)
print "loaded model from: %s" % args.model_path
sys.stdout.flush()
fetch_vars = [iou, out_w, out_r]
out_wrong = np.zeros([num_classes]).astype("int64")
out_right = np.zeros([num_classes]).astype("int64")
count = 0
test_reader = cityscape.test()
for data in test_reader():
count += 1
result = exe.run(inference_program,
feed=get_feeder_data(
data, place, for_test=True),
fetch_list=fetch_vars)
out_wrong += result[1]
out_right += result[2]
print "count: %s; current iou: %.3f;\r" % (count, result[0]),
sys.stdout.flush()
iou = cal_mean_iou(out_wrong, out_right)
print "\nmean iou: %.3f" % iou
def main():
args = parser.parse_args()
print_arguments(args)
eval(args)
if __name__ == "__main__":
main()
import paddle.fluid as fluid
import numpy as np
import sys
def conv(input,
k_h,
k_w,
c_o,
s_h,
s_w,
relu=False,
padding="VALID",
biased=False,
name=None):
act = None
tmp = input
if relu:
act = "relu"
if padding == "SAME":
padding_h = max(k_h - s_h, 0)
padding_w = max(k_w - s_w, 0)
padding_top = padding_h / 2
padding_left = padding_w / 2
padding_bottom = padding_h - padding_top
padding_right = padding_w - padding_left
padding = [
0, 0, 0, 0, padding_top, padding_bottom, padding_left, padding_right
]
tmp = fluid.layers.pad(tmp, padding)
tmp = fluid.layers.conv2d(
tmp,
num_filters=c_o,
filter_size=[k_h, k_w],
stride=[s_h, s_w],
groups=1,
act=act,
bias_attr=biased,
use_cudnn=False,
name=name)
return tmp
def atrous_conv(input,
k_h,
k_w,
c_o,
dilation,
relu=False,
padding="VALID",
biased=False,
name=None):
act = None
if relu:
act = "relu"
    tmp = input
    if padding == "SAME":
        # the atrous convs in this network always use stride 1, so "SAME"
        # padding depends on the effective (dilated) kernel size alone:
        # k_eff = k + (k - 1) * (dilation - 1)
        k_h_eff = k_h + (k_h - 1) * (dilation - 1)
        k_w_eff = k_w + (k_w - 1) * (dilation - 1)
        padding_h = max(k_h_eff - 1, 0)
        padding_w = max(k_w_eff - 1, 0)
        padding_top = padding_h / 2
        padding_left = padding_w / 2
        padding_bottom = padding_h - padding_top
        padding_right = padding_w - padding_left
        padding = [
            0, 0, 0, 0, padding_top, padding_bottom, padding_left, padding_right
        ]
        tmp = fluid.layers.pad(tmp, padding)
    tmp = fluid.layers.conv2d(
        tmp,
num_filters=c_o,
filter_size=[k_h, k_w],
dilation=dilation,
groups=1,
act=act,
bias_attr=biased,
use_cudnn=False,
name=name)
return tmp
def zero_padding(input, padding):
return fluid.layers.pad(input,
[0, 0, 0, 0, padding, padding, padding, padding])
def bn(input, relu=False, name=None, is_test=False):
act = None
if relu:
act = 'relu'
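    # note: any passed-in name is ignored; the bn layer name is derived
    # from the name of the input layer below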
name = input.name.split(".")[0] + "_bn"
tmp = fluid.layers.batch_norm(
input, act=act, momentum=0.95, epsilon=1e-5, name=name)
return tmp
def avg_pool(input, k_h, k_w, s_h, s_w, name=None, padding=0):
temp = fluid.layers.pool2d(
input,
pool_size=[k_h, k_w],
pool_type="avg",
pool_stride=[s_h, s_w],
pool_padding=padding,
name=name)
return temp
def max_pool(input, k_h, k_w, s_h, s_w, name=None, padding=0):
temp = fluid.layers.pool2d(
input,
pool_size=[k_h, k_w],
pool_type="max",
pool_stride=[s_h, s_w],
pool_padding=padding,
name=name)
return temp
def interp(input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def dilation_convs(input):
tmp = res_block(input, filter_num=256, padding=1, name="conv3_2")
tmp = res_block(tmp, filter_num=256, padding=1, name="conv3_3")
tmp = res_block(tmp, filter_num=256, padding=1, name="conv3_4")
tmp = proj_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_1")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_2")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_3")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_4")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_5")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_6")
tmp = proj_block(
tmp, filter_num=1024, padding=4, dilation=4, name="conv5_1")
tmp = res_block(tmp, filter_num=1024, padding=4, dilation=4, name="conv5_2")
tmp = res_block(tmp, filter_num=1024, padding=4, dilation=4, name="conv5_3")
return tmp
def pyramis_pooling(input, input_shape):
shape = np.ceil(input_shape / 32).astype("int32")
h, w = shape
pool1 = avg_pool(input, h, w, h, w)
pool1_interp = interp(pool1, shape)
pool2 = avg_pool(input, h / 2, w / 2, h / 2, w / 2)
pool2_interp = interp(pool2, shape)
pool3 = avg_pool(input, h / 3, w / 3, h / 3, w / 3)
pool3_interp = interp(pool3, shape)
pool4 = avg_pool(input, h / 4, w / 4, h / 4, w / 4)
pool4_interp = interp(pool4, shape)
conv5_3_sum = input + pool4_interp + pool3_interp + pool2_interp + pool1_interp
return conv5_3_sum
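# Example (illustrative): for a 1024x2048 input the conv5_3 feature map is
# 32x64; the four average pools span the whole map, halves, thirds and
# quarters, and each is bilinearly resized back to 32x64 before summation.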
def shared_convs(image):
tmp = conv(image, 3, 3, 32, 2, 2, padding='SAME', name="conv1_1_3_3_s2")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 32, 1, 1, padding='SAME', name="conv1_2_3_3")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 64, 1, 1, padding='SAME', name="conv1_3_3_3")
tmp = bn(tmp, relu=True)
tmp = max_pool(tmp, 3, 3, 2, 2, padding=[1, 1])
tmp = proj_block(tmp, filter_num=128, padding=0, name="conv2_1")
tmp = res_block(tmp, filter_num=128, padding=1, name="conv2_2")
tmp = res_block(tmp, filter_num=128, padding=1, name="conv2_3")
tmp = proj_block(tmp, filter_num=256, padding=1, stride=2, name="conv3_1")
return tmp
def res_block(input, filter_num, padding=0, dilation=None, name=None):
tmp = conv(input, 1, 1, filter_num / 4, 1, 1, name=name + "_1_1_reduce")
tmp = bn(tmp, relu=True)
tmp = zero_padding(tmp, padding=padding)
if dilation is None:
tmp = conv(tmp, 3, 3, filter_num / 4, 1, 1, name=name + "_3_3")
else:
tmp = atrous_conv(
tmp, 3, 3, filter_num / 4, dilation, name=name + "_3_3")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 1, 1, filter_num, 1, 1, name=name + "_1_1_increase")
tmp = bn(tmp, relu=False)
tmp = input + tmp
tmp = fluid.layers.relu(tmp, name=name + "_relu")
return tmp
def proj_block(input, filter_num, padding=0, dilation=None, stride=1,
name=None):
proj = conv(
input, 1, 1, filter_num, stride, stride, name=name + "_1_1_proj")
proj_bn = bn(proj, relu=False)
tmp = conv(
input, 1, 1, filter_num / 4, stride, stride, name=name + "_1_1_reduce")
tmp = bn(tmp, relu=True)
tmp = zero_padding(tmp, padding=padding)
if padding == 0:
padding = 'SAME'
else:
padding = 'VALID'
if dilation is None:
tmp = conv(
tmp,
3,
3,
filter_num / 4,
1,
1,
padding=padding,
name=name + "_3_3")
else:
tmp = atrous_conv(
tmp,
3,
3,
filter_num / 4,
dilation,
padding=padding,
name=name + "_3_3")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 1, 1, filter_num, 1, 1, name=name + "_1_1_increase")
tmp = bn(tmp, relu=False)
tmp = proj_bn + tmp
tmp = fluid.layers.relu(tmp, name=name + "_relu")
return tmp
def sub_net_4(input, input_shape):
tmp = interp(input, out_shape=np.ceil(input_shape / 32))
tmp = dilation_convs(tmp)
tmp = pyramis_pooling(tmp, input_shape)
tmp = conv(tmp, 1, 1, 256, 1, 1, name="conv5_4_k1")
tmp = bn(tmp, relu=True)
tmp = interp(tmp, input_shape / 16)
return tmp
def sub_net_2(input):
tmp = conv(input, 1, 1, 128, 1, 1, name="conv3_1_sub2_proj")
tmp = bn(tmp, relu=False)
return tmp
def sub_net_1(input):
tmp = conv(input, 3, 3, 32, 2, 2, padding='SAME', name="conv1_sub1")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 32, 2, 2, padding='SAME', name="conv2_sub1")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 64, 2, 2, padding='SAME', name="conv3_sub1")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 1, 1, 128, 1, 1, name="conv3_sub1_proj")
tmp = bn(tmp, relu=False)
return tmp
def CCF24(sub2_out, sub4_out, input_shape):
tmp = zero_padding(sub4_out, padding=2)
tmp = atrous_conv(tmp, 3, 3, 128, 2, name="conv_sub4")
tmp = bn(tmp, relu=False)
tmp = tmp + sub2_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, input_shape / 8)
return tmp
def CCF124(sub1_out, sub24_out, input_shape):
tmp = zero_padding(sub24_out, padding=2)
tmp = atrous_conv(tmp, 3, 3, 128, 2, name="conv_sub2")
tmp = bn(tmp, relu=False)
tmp = tmp + sub1_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, input_shape / 4)
return tmp
def icnet(data, num_classes, input_shape):
image_sub1 = data
image_sub2 = interp(data, out_shape=input_shape * 0.5)
s_convs = shared_convs(image_sub2)
sub4_out = sub_net_4(s_convs, input_shape)
sub2_out = sub_net_2(s_convs)
sub1_out = sub_net_1(image_sub1)
sub24_out = CCF24(sub2_out, sub4_out, input_shape)
sub124_out = CCF124(sub1_out, sub24_out, input_shape)
conv6_cls = conv(
sub124_out, 1, 1, num_classes, 1, 1, biased=True, name="conv6_cls")
sub4_out = conv(
sub4_out, 1, 1, num_classes, 1, 1, biased=True, name="sub4_out")
sub24_out = conv(
sub24_out, 1, 1, num_classes, 1, 1, biased=True, name="sub24_out")
return sub4_out, sub24_out, conv6_cls
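# Note: the three returned branches feed the cascade losses in train.py
# (sub4/sub24/sub124); eval.py and infer.py consume only the final sub124 output.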
"""Infer for ICNet model."""
import cityscape
import argparse
import functools
import sys
import os
import cv2
import paddle.fluid as fluid
import paddle.v2 as paddle
from icnet import icnet
from utils import add_arguments, print_arguments, get_feeder_data
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.initializer import init_on_cpu
import numpy as np
IMG_MEAN = np.array((103.939, 116.779, 123.68), dtype=np.float32)
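# (assumption: the means above are the standard ImageNet channel means in
# BGR order, matching the cv2-based paddle.image loader used below)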
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model_path', str, None, "Model path.")
add_arg('images_list', str, None, "List file with images to be inferred.")
add_arg('images_path', str, None, "The images path.")
add_arg('out_path', str, "./output", "Output path.")
add_arg('use_gpu', bool, True, "Whether use GPU to test.")
# yapf: enable
data_shape = [3, 1024, 2048]
num_classes = 19
label_colours = [
    [128, 64, 128], [244, 35, 231], [69, 69, 69],       # 0 = road, 1 = sidewalk, 2 = building
    [102, 102, 156], [190, 153, 153], [153, 153, 153],  # 3 = wall, 4 = fence, 5 = pole
    [250, 170, 29], [219, 219, 0], [106, 142, 35],      # 6 = traffic light, 7 = traffic sign, 8 = vegetation
    [152, 250, 152], [69, 129, 180], [219, 19, 60],     # 9 = terrain, 10 = sky, 11 = person
    [255, 0, 0], [0, 0, 142], [0, 0, 69],               # 12 = rider, 13 = car, 14 = truck
    [0, 60, 100], [0, 79, 100], [0, 0, 230],            # 15 = bus, 16 = train, 17 = motorcycle
    [119, 10, 32],                                      # 18 = bicycle
]
def color(input):
"""
Convert infered result to color image.
"""
result = []
for i in input.flatten():
result.append(
[label_colours[i][2], label_colours[i][1], label_colours[i][0]])
result = np.array(result).reshape([input.shape[0], input.shape[1], 3])
return result
def infer(args):
data_shape = cityscape.test_data_shape()
num_classes = cityscape.num_classes()
# define network
images = fluid.layers.data(name='image', shape=data_shape, dtype='float32')
_, _, sub124_out = icnet(images, num_classes,
np.array(data_shape[1:]).astype("float32"))
predict = fluid.layers.resize_bilinear(
sub124_out, out_shape=data_shape[1:3])
predict = fluid.layers.transpose(predict, perm=[0, 2, 3, 1])
predict = fluid.layers.reshape(predict, shape=[-1, num_classes])
_, predict = fluid.layers.topk(predict, k=1)
predict = fluid.layers.reshape(
predict,
shape=[data_shape[1], data_shape[2], -1]) # batch_size should be 1
inference_program = fluid.default_main_program().clone(for_test=True)
# prepare environment
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
assert os.path.exists(args.model_path)
fluid.io.load_params(exe, args.model_path)
print "loaded model from: %s" % args.model_path
sys.stdout.flush()
if not os.path.isdir(args.out_path):
os.makedirs(args.out_path)
for line in open(args.images_list):
image_file = args.images_path + "/" + line.strip()
filename = os.path.basename(image_file)
image = paddle.image.load_image(
image_file, is_color=True).astype("float32")
image -= IMG_MEAN
img = paddle.image.to_chw(image)[np.newaxis, :]
image_t = fluid.core.LoDTensor()
image_t.set(img, place)
result = exe.run(inference_program,
feed={"image": image_t},
fetch_list=[predict])
cv2.imwrite(args.out_path + "/" + filename + "_result.png",
color(result[0]))
def main():
args = parser.parse_args()
print_arguments(args)
infer(args)
if __name__ == "__main__":
main()
"""Trainer for ICNet model."""
from icnet import icnet
import cityscape
import argparse
import functools
import sys
import time
import paddle.fluid as fluid
import numpy as np
from utils import add_arguments, print_arguments, get_feeder_data
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.initializer import init_on_cpu
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('batch_size', int, 16, "Minibatch size.")
add_arg('checkpoint_path', str, None, "Checkpoint save path.")
add_arg('init_model', str, None, "Pretrain model path.")
add_arg('use_gpu', bool, True, "Whether use GPU to train.")
add_arg('random_mirror', bool, True, "Whether prepare by random mirror.")
add_arg('random_scaling', bool, True, "Whether prepare by random scaling.")
# yapf: enable
LAMBDA1 = 0.16
LAMBDA2 = 0.4
LAMBDA3 = 1.0
LEARNING_RATE = 0.003
POWER = 0.9
LOG_PERIOD = 1
CHECKPOINT_PERIOD = 1000
TOTAL_STEP = 60000
no_grad_set = []
def create_loss(predict, label, mask, num_classes):
predict = fluid.layers.transpose(predict, perm=[0, 2, 3, 1])
predict = fluid.layers.reshape(predict, shape=[-1, num_classes])
label = fluid.layers.reshape(label, shape=[-1, 1])
predict = fluid.layers.gather(predict, mask)
label = fluid.layers.gather(label, mask)
label = fluid.layers.cast(label, dtype="int64")
loss = fluid.layers.softmax_with_cross_entropy(predict, label)
no_grad_set.append(label.name)
return fluid.layers.reduce_mean(loss)
def poly_decay():
global_step = _decay_step_counter()
with init_on_cpu():
decayed_lr = LEARNING_RATE * (fluid.layers.pow(
(1 - global_step / TOTAL_STEP), POWER))
return decayed_lr
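# Example (illustrative): halfway through training (step 30000 of the 60000
# TOTAL_STEP) the polynomial schedule gives lr = 0.003 * 0.5 ** 0.9 ≈ 0.00161.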
def train(args):
data_shape = cityscape.train_data_shape()
num_classes = cityscape.num_classes()
# define network
images = fluid.layers.data(name='image', shape=data_shape, dtype='float32')
label_sub1 = fluid.layers.data(name='label_sub1', shape=[1], dtype='int32')
label_sub2 = fluid.layers.data(name='label_sub2', shape=[1], dtype='int32')
label_sub4 = fluid.layers.data(name='label_sub4', shape=[1], dtype='int32')
mask_sub1 = fluid.layers.data(name='mask_sub1', shape=[-1], dtype='int32')
mask_sub2 = fluid.layers.data(name='mask_sub2', shape=[-1], dtype='int32')
mask_sub4 = fluid.layers.data(name='mask_sub4', shape=[-1], dtype='int32')
sub4_out, sub24_out, sub124_out = icnet(
images, num_classes, np.array(data_shape[1:]).astype("float32"))
loss_sub4 = create_loss(sub4_out, label_sub4, mask_sub4, num_classes)
loss_sub24 = create_loss(sub24_out, label_sub2, mask_sub2, num_classes)
loss_sub124 = create_loss(sub124_out, label_sub1, mask_sub1, num_classes)
reduced_loss = LAMBDA1 * loss_sub4 + LAMBDA2 * loss_sub24 + LAMBDA3 * loss_sub124
regularizer = fluid.regularizer.L2Decay(0.0001)
optimizer = fluid.optimizer.Momentum(
learning_rate=poly_decay(), momentum=0.9, regularization=regularizer)
_, params_grads = optimizer.minimize(reduced_loss, no_grad_set=no_grad_set)
# prepare environment
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if args.init_model is not None:
print "load model from: %s" % args.init_model
sys.stdout.flush()
fluid.io.load_params(exe, args.init_model)
iter_id = 0
t_loss = 0.
sub4_loss = 0.
sub24_loss = 0.
sub124_loss = 0.
train_reader = cityscape.train(
args.batch_size, flip=args.random_mirror, scaling=args.random_scaling)
while True:
# train a pass
for data in train_reader():
if iter_id > TOTAL_STEP:
return
iter_id += 1
results = exe.run(
feed=get_feeder_data(data, place),
fetch_list=[reduced_loss, loss_sub4, loss_sub24, loss_sub124])
t_loss += results[0]
sub4_loss += results[1]
sub24_loss += results[2]
sub124_loss += results[3]
# training log
if iter_id % LOG_PERIOD == 0:
print "Iter[%d]; train loss: %.3f; sub4_loss: %.3f; sub24_loss: %.3f; sub124_loss: %.3f" % (
iter_id, t_loss / LOG_PERIOD, sub4_loss / LOG_PERIOD,
sub24_loss / LOG_PERIOD, sub124_loss / LOG_PERIOD)
t_loss = 0.
sub4_loss = 0.
sub24_loss = 0.
sub124_loss = 0.
sys.stdout.flush()
if iter_id % CHECKPOINT_PERIOD == 0:
dir_name = args.checkpoint_path + "/" + str(iter_id)
fluid.io.save_persistables(exe, dirname=dir_name)
print "Saved checkpoint: %s" % (dir_name)
def main():
args = parser.parse_args()
print_arguments(args)
train(args)
if __name__ == "__main__":
main()
"""Contains common utility functions."""
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
import numpy as np
from paddle.fluid import core
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
def to_lodtensor(data, place):
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int32")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = core.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
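# Example (illustrative): two sequences of lengths 2 and 3 flatten to 5 rows
# with lod = [0, 2, 5] marking the sequence boundaries.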
def get_feeder_data(data, place, for_test=False):
feed_dict = {}
image_t = core.LoDTensor()
image_t.set(data[0], place)
feed_dict["image"] = image_t
if not for_test:
labels_sub1_t = core.LoDTensor()
labels_sub2_t = core.LoDTensor()
labels_sub4_t = core.LoDTensor()
mask_sub1_t = core.LoDTensor()
mask_sub2_t = core.LoDTensor()
mask_sub4_t = core.LoDTensor()
labels_sub1_t.set(data[1], place)
labels_sub2_t.set(data[3], place)
mask_sub1_t.set(data[2], place)
mask_sub2_t.set(data[4], place)
labels_sub4_t.set(data[5], place)
mask_sub4_t.set(data[6], place)
feed_dict["label_sub1"] = labels_sub1_t
feed_dict["label_sub2"] = labels_sub2_t
feed_dict["mask_sub1"] = mask_sub1_t
feed_dict["mask_sub2"] = mask_sub2_t
feed_dict["label_sub4"] = labels_sub4_t
feed_dict["mask_sub4"] = mask_sub4_t
else:
label_t = core.LoDTensor()
mask_t = core.LoDTensor()
label_t.set(data[1], place)
mask_t.set(data[2], place)
feed_dict["label"] = label_t
feed_dict["mask"] = mask_t
return feed_dict
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
# Image Classification and Model Zoo
Image classification, an important field of computer vision, assigns an image to one of a set of pre-defined labels. Recently, many researchers have developed different kinds of neural networks that greatly improve classification performance. This page introduces how to do image classification with PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-a-model), [finetuning](#finetuning), [evaluation](#evaluation) and [inference](#inference).
---
## Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training a model with flexible parameters](#training-a-model)
- [Finetuning](#finetuning)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Supported models and performances](#supported-models)
## Installation
This model, built with Paddle Fluid, is still under active development and is not the final version. We welcome feedback.
Running the sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) and make an update.
## Introduction
The current code supports the training of [SE-ResNeXt](https://arxiv.org/abs/1709.01507) (50/152 layers).
## Data preparation
An example for ImageNet classification is as follows. First of all, preparation of ImageNet data can be done as:
```
cd data/ILSVRC2012/
sh download_imagenet2012.sh
```
In the shell script ```download_imagenet2012.sh```, there are three steps to prepare data:

**step-1:** Register at ```image-net.org``` first in order to get a pair of ```Username``` and ```AccessKey```, which are used to download ImageNet data.

**step-2:** Download ImageNet-2012 dataset from the website. The training and validation data will be downloaded into folder "train" and "val" respectively. Please note that the size of the data is more than 40 GB; it will take quite some time to download. Users who have already downloaded the ImageNet data can organize it into ```data/ILSVRC2012``` directly.
```
cd data/
mkdir -p ILSVRC2012/
cd ILSVRC2012/
# get training set
wget http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_train.tar
# get validation set
wget http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar
# prepare directory
tar xf ILSVRC2012_img_train.tar
tar xf ILSVRC2012_img_val.tar
# unzip all classes data using unzip.sh
sh unzip.sh
```

**step-3:** Download training and validation label files from [ImageNet2012 url](https://pan.baidu.com/s/1Y6BCo0nmxsm_FsEqmx2hKQ) (password: ```wx99```) and untar them into workspace ```ILSVRC2012/```. The files include:

* *train_list.txt*: label file of the imagenet-2012 training set, with each line separated by ```SPACE```, like:
```
train/n02483708/n02483708_2436.jpeg 369
train/n03998194/n03998194_7015.jpeg 741
train/n04596742/n04596742_3032.jpeg 909
train/n03208938/n03208938_7065.jpeg 535
...
```
* *val_list.txt*: label file of the imagenet-2012 validation set, with each line separated by ```SPACE```, like:
```
val/ILSVRC2012_val_00000001.jpeg 65
val/ILSVRC2012_val_00000002.jpeg 970
val/ILSVRC2012_val_00000004.jpeg 809
val/ILSVRC2012_val_00000005.jpeg 516
...
```
* *synset_words.txt*: the semantic label of each class.
## Training a model with flexible parameters
After data preparation, one can start the training step by:
```
python train.py \
       --model=SE_ResNeXt50_32x4d \
       --batch_size=32 \
       --total_images=1281167 \
       --class_dim=1000 \
--image_shape=3,224,224 \
--model_save_dir=output/ \
--with_mem_opt=False \
--lr_strategy=piecewise_decay \
--lr=0.1
```
**parameter introduction:**
* **model**: name of the model to use. Default: "SE_ResNeXt50_32x4d".
* **num_epochs**: the number of epochs. Default: 120.
* **batch_size**: the size of each mini-batch. Default: 256.
* **use_gpu**: whether to use GPU or not. Default: True.
* **total_images**: total number of images in the training set. Default: 1281167.
* **class_dim**: the class number of the classification task. Default: 1000.
* **image_shape**: input size of the network. Default: "3,224,224".
* **model_save_dir**: the directory to save trained model. Default: "output".
* **with_mem_opt**: whether to use memory optimization or not. Default: False.
* **lr_strategy**: learning rate changing strategy. Default: "piecewise_decay".
* **lr**: initialized learning rate. Default: 0.1.
* **pretrained_model**: model path for pretraining. Default: None.
* **checkpoint**: the checkpoint path to resume. Default: None.
**data reader introduction:** The data reader is defined in ```reader.py```. In the [training stage](#training-a-model), random crop and flipping are used, while center crop is used in the [evaluation](#evaluation) and [inference](#inference) stages. Supported data augmentation includes (see the sketch after this list):
* rotation
* color jitter
* random crop
* center crop
* resize
* flipping
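A minimal sketch of the two preprocessing paths described above (illustrative only; the function names and crop logic here are assumptions, not the actual ```reader.py``` implementation):
```python
import random
from PIL import Image

def train_mapper(path, crop_size=224):
    """Training path: random crop plus random horizontal flip."""
    img = Image.open(path).convert('RGB')
    w, h = img.size
    # random crop: pick a random top-left corner inside the image
    x = random.randint(0, w - crop_size)
    y = random.randint(0, h - crop_size)
    img = img.crop((x, y, x + crop_size, y + crop_size))
    # random horizontal flip with probability 0.5
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img

def test_mapper(path, crop_size=224):
    """Evaluation/inference path: deterministic center crop."""
    img = Image.open(path).convert('RGB')
    w, h = img.size
    x = (w - crop_size) // 2
    y = (h - crop_size) // 2
    return img.crop((x, y, x + crop_size, y + crop_size))
```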
**training curve:** The training curve can be drawn from the training log. For example, the log from training AlexNet looks like this (a parsing sketch follows the log):
```
End pass 1, train_loss 6.23153877258, train_acc1 0.0150696625933, train_acc5 0.0552518665791, test_loss 5.41981744766, test_acc1 0.0519132651389, test_acc5 0.156150355935
End pass 2, train_loss 5.15442800522, train_acc1 0.0784279331565, train_acc5 0.211050540209, test_loss 4.45795249939, test_acc1 0.140469551086, test_acc5 0.333163291216
End pass 3, train_loss 4.51505613327, train_acc1 0.145300447941, train_acc5 0.331567406654, test_loss 3.86548018456, test_acc1 0.219443559647, test_acc5 0.446448504925
End pass 4, train_loss 4.12735557556, train_acc1 0.19437250495, train_acc5 0.405713528395, test_loss 3.56990146637, test_acc1 0.264536827803, test_acc5 0.507190704346
End pass 5, train_loss 3.87505435944, train_acc1 0.229518383741, train_acc5 0.453582793474, test_loss 3.35345435143, test_acc1 0.297349333763, test_acc5 0.54753267765
End pass 6, train_loss 3.6929500103, train_acc1 0.255628824234, train_acc5 0.487188398838, test_loss 3.17112898827, test_acc1 0.326953113079, test_acc5 0.581780135632
End pass 7, train_loss 3.55882954597, train_acc1 0.275381118059, train_acc5 0.511990904808, test_loss 3.03736782074, test_acc1 0.349035382271, test_acc5 0.606293857098
End pass 8, train_loss 3.45595097542, train_acc1 0.291462600231, train_acc5 0.530815005302, test_loss 2.96034455299, test_acc1 0.362228929996, test_acc5 0.617390751839
End pass 9, train_loss 3.3745200634, train_acc1 0.303871691227, train_acc5 0.545210540295, test_loss 2.93932366371, test_acc1 0.37129303813, test_acc5 0.623573005199
...
```
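As a rough illustration, the curve can be produced by parsing these lines; the log filename and the regular expression below are assumptions based on the format shown above:
```python
import re
import matplotlib.pyplot as plt

pattern = re.compile(
    r"End pass (\d+), train_loss ([\d.]+), train_acc1 ([\d.]+), "
    r"train_acc5 ([\d.]+), test_loss ([\d.]+), test_acc1 ([\d.]+), "
    r"test_acc5 ([\d.]+)")

passes, train_err1, test_err1 = [], [], []
with open("train.log") as f:  # assumed log path
    for line in f:
        m = pattern.search(line)
        if m:
            passes.append(int(m.group(1)))
            # error rate = 1 - accuracy
            train_err1.append(1.0 - float(m.group(3)))
            test_err1.append(1.0 - float(m.group(6)))

plt.plot(passes, train_err1, label="train top-1 error")
plt.plot(passes, test_err1, label="test top-1 error")
plt.xlabel("pass")
plt.ylabel("error rate")
plt.legend()
plt.savefig("curve.jpg")
```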
The error rate curves of AlexNet, ResNet50 and SE-ResNeXt-50 are shown in the figure below.
<p align="center">
<img src="images/curve.jpg" height=480 width=640 hspace='10'/> <br />
Training and validation curves
</p>
## Finetuning
Finetuning adapts pretrained model weights to a specific task. After setting ```path_to_pretrain_model```, one can finetune a model as:
```
python train.py \
--model=SE_ResNeXt50_32x4d \
--pretrained_model=${path_to_pretrain_model} \
--batch_size=32 \
--total_images=1281167 \
--class_dim=1000 \
--image_shape=3,224,224 \
--model_save_dir=output/ \
--with_mem_opt=True \
--lr_strategy=piecewise_decay \
--lr=0.1
```
## Evaluation
Evaluation measures the performance of a trained model on the validation set. One can download [pretrained models](#supported-models) and set their path in ```path_to_pretrain_model```. Then top1/top5 accuracy can be obtained by running the following command:
```
python eval.py \
--model=SE_ResNeXt50_32x4d \
--batch_size=32 \
--class_dim=1000 \
--image_shape=3,224,224 \
--with_mem_opt=True \
--pretrained_model=${path_to_pretrain_model}
```
With the above evaluation configuration, the output log is like:
```
Testbatch 0,loss 2.1786134243, acc1 0.625,acc5 0.8125,time 0.48 sec
Testbatch 10,loss 0.898496925831, acc1 0.75,acc5 0.9375,time 0.51 sec
Testbatch 20,loss 1.32524681091, acc1 0.6875,acc5 0.9375,time 0.37 sec
Testbatch 30,loss 1.46830511093, acc1 0.5,acc5 0.9375,time 0.51 sec
Testbatch 40,loss 1.12802267075, acc1 0.625,acc5 0.9375,time 0.35 sec
Testbatch 50,loss 0.881597697735, acc1 0.8125,acc5 1.0,time 0.32 sec
Testbatch 60,loss 0.300163716078, acc1 0.875,acc5 1.0,time 0.48 sec
Testbatch 70,loss 0.692037761211, acc1 0.875,acc5 1.0,time 0.35 sec
Testbatch 80,loss 0.0969972759485, acc1 1.0,acc5 1.0,time 0.41 sec
...
```
The SE-ResNeXt-50 model is trained with an initial learning rate of ```0.1``` decayed by ```0.1``` every ```10``` epochs. Its top-1/top-5 validation accuracy on ImageNet 2012 is listed below:

|model | [original paper(Fig.5)](https://arxiv.org/abs/1709.01507) | Pytorch | Paddle fluid
|- | :-: |:-: | -:
|SE-ResNeXt-50 | 77.6%/- | 77.71%/93.63% | 77.42%/93.50%
## Inference
Inference is used to get prediction scores or image features from a trained model.
```
python infer.py \
--model=SE_ResNeXt50_32x4d \
--batch_size=32 \
--class_dim=1000 \
--image_shape=3,224,224 \
--with_mem_opt=True \
--pretrained_model=${path_to_pretrain_model}
```
The output contains prediction results, including the maximum score (before softmax) and the corresponding predicted label.
```
Test-0-score: [13.168352], class [491]
Test-1-score: [7.913302], class [975]
Test-2-score: [16.959702], class [21]
Test-3-score: [14.197695], class [383]
Test-4-score: [12.607652], class [878]
Test-5-score: [17.725458], class [15]
Test-6-score: [12.678599], class [118]
Test-7-score: [12.353498], class [505]
Test-8-score: [20.828007], class [747]
Test-9-score: [15.135801], class [315]
Test-10-score: [14.585114], class [920]
Test-11-score: [13.739927], class [679]
Test-12-score: [15.040644], class [386]
...
```
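To turn a predicted class index into a human-readable label, one can look it up in the ```synset_words.txt``` file from the data preparation step. A hypothetical snippet; the file path and the one-label-per-line layout in class-index order are assumptions:
```python
# assumed path and layout: one "wnid description" line per class index
with open("data/ILSVRC2012/synset_words.txt") as f:
    labels = [line.strip() for line in f]

# e.g. for the first sample above: "Test-0-score: [13.168352], class [491]"
print(labels[491])
```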
## Supported models and performances
Models are trained with an initial learning rate of ```0.1``` decayed by ```0.1``` after each pre-defined number of epochs, unless otherwise specified. The available top-1/top-5 validation accuracy on ImageNet 2012 is listed in the table below. Pretrained models can be downloaded by clicking the related model names.
|model | top-1/top-5 accuracy
|- | -:
|[AlexNet](http://paddle-imagenet-models.bj.bcebos.com/alexnet_model.tar) | 57.21%/79.72%
|VGG11 | -
|VGG13 | -
|VGG16 | -
|VGG19 | -
|GoogleNet | -
|InceptionV4 | -
|MobileNet | -
|[ResNet50](http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar) | 76.63%/93.10%
|ResNet101 | -
|ResNet152 | -
|[SE_ResNeXt50_32x4d](http://paddle-imagenet-models.bj.bcebos.com/se_resnext_50_model.tar) | 78.33%/93.96%
|SE_ResNeXt101_32x4d | -
|SE_ResNeXt152_32x4d | -
|DPN68 | -
|DPN92 | -
|DPN98 | -
|DPN107 | -
|DPN131 | -
......@@ -20,8 +20,8 @@ def calc_diff(f1, f2):
d1 = np.load(f1)
d2 = np.load(f2)
print d1.shape
print d2.shape
#print d1.shape
#print d2.shape
#print d1[0, 0, 0:10, 0:10]
#print d2[0, 0, 0:10, 0:10]
#d1 = d1[:, :, 1:-2, 1:-2]
......
......@@ -19,4 +19,6 @@ if [[ $# -eq 3 ]];then
else
caffe_file="./results/${model_name}.caffe/${2}.npy"
fi
python ./compare.py $paddle_file $caffe_file
cmd="python ./compare.py $paddle_file $caffe_file"
echo $cmd
eval $cmd
......@@ -3,7 +3,7 @@
#function:
# a tool used to compare all layers' results
#
#set -x
if [[ $# -ne 1 ]];then
echo "usage:"
echo " bash $0 [model_name]"
......@@ -13,11 +13,20 @@ fi
model_name=$1
prototxt="models.caffe/$model_name/${model_name}.prototxt"
layers=$(cat $prototxt | perl -ne 'if(/^\s+name\s*:\s*\"([^\"]+)/){print $1."\n";}')
cat $prototxt | grep name | perl -ne 'if(/^\s*name\s*:\s+\"([^\"]+)/){ print $1."\n";}' >.layer_names
final_layer=$(cat $prototxt | perl -ne 'if(/^\s*top\s*:\s+\"([^\"]+)/){ print $1."\n";}' | tail -n1)
ret=$(grep "^$final_layer$" .layer_names | wc -l)
if [[ $ret -eq 0 ]];then
echo $final_layer >>.layer_names
fi
for i in $layers;do
for i in $(cat .layer_names);do
i=${i//\//_}
cf_npy="results/${model_name}.caffe/${i}.npy"
pd_npy="results/${model_name}.paddle/${i}.npy"
#pd_npy="results/${model_name}.paddle/${i}.npy"
#pd_npy=$(find results/${model_name}.paddle -iname "${i}*.npy" | head -n1)
pd_npy=$(find results/${model_name}.paddle -iname "${i}.*npy" | grep deleted -v | head -n1)
if [[ ! -e $cf_npy ]];then
echo "caffe's result not exist[$cf_npy]"
......
......@@ -71,7 +71,9 @@ if [[ -z $only_convert ]];then
if [[ -z $net_name ]];then
net_name="MyNet"
fi
$PYTHON ./infer.py dump $net_file $weight_file $imgfile $net_name
cmd="$PYTHON ./infer.py dump $net_file $weight_file $imgfile $net_name"
echo $cmd
eval $cmd
ret=$?
fi
exit $ret
#!/bin/bash
#
#script to test all models
#
models="alexnet vgg16 googlenet resnet152 resnet101 resnet50"
for i in $models;do
echo "begin to process $i"
bash ./tools/diff.sh $i 2>&1
echo "finished to process $i with ret[$?]"
done
......@@ -43,7 +43,7 @@ def axpy_layer(inputs, name):
x = inputs[1]
y = inputs[2]
output = fluid.layers.elementwise_mul(x, alpha, axis=0)
output = fluid.layers.elementwise_add(output, y)
output = fluid.layers.elementwise_add(output, y, name=name)
return output
......