Commit 7df53c9a authored by W wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/models into model_avg

The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Convolutional Sequence to Sequence Learning
This model implements the work in the following paper:
......
The code samples in this directory require PaddlePaddle v0.10.0. If your installed version of PaddlePaddle is older than this, please follow the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update your installation.
---
# Click-Through Rate Prediction
The following are the files contained in this example's directory, with corresponding descriptions:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Click-Through Rate Prediction
## Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Factorization Machine for Click-Through Rate prediction
## Introduction
......
The code samples in this directory require PaddlePaddle v0.10.0. If your installed version of PaddlePaddle is older than this, please follow the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update your installation.
---
# Deep Structured Semantic Models (DSSM)
DSSM uses a DNN to learn low-dimensional representation vectors for text in a continuous semantic space and to model the semantic similarity between two sentences. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings. The implementation supports a generic data format, so users can apply the model to real-world scenarios simply by substituting their own data.
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Structured Semantic Models (DSSM)
Deep Structured Semantic Models (DSSM) is a simple but powerful DNN-based model for matching web search queries and URL-based documents. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings.
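To make the modeling idea concrete, here is a minimal, hypothetical sketch (not this example's actual network): encode both strings with a shared encoder and score the pair with cosine similarity, which is how DSSM-style models measure semantic closeness.

# Illustrative sketch of the DSSM scoring idea (hypothetical code,
# not part of this example): a shared encoder plus cosine similarity.
import numpy as np

def embed(tokens, table):
    # Average word vectors as a crude stand-in for the shared DNN encoder.
    return np.mean([table[t] for t in tokens], axis=0)

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

table = {w: np.random.rand(8) for w in ["deep", "learning", "paddle"]}
score = cosine(embed(["deep", "learning"], table),
               embed(["paddle", "learning"], table))
print(score)  # semantic similarity score in [-1, 1]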
......
Deep ASR Kickoff
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
### TODO
This project is still under active development.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import sys, time
from six import reraise
from tblib import Traceback
from multiprocessing import Manager, Process
import posix_ipc, mmap
import numpy as np
@@ -35,21 +37,177 @@ def lodtensor_to_ndarray(lod_tensor):
return ret, lod_tensor.lod()
def batch_to_ndarray(batch_samples, lod):
frame_dim = batch_samples[0][0].shape[1]
batch_feature = np.zeros((lod[-1], frame_dim), dtype="float32")
batch_label = np.zeros((lod[-1], 1), dtype="int64")
start = 0
for sample in batch_samples:
frame_num = sample[0].shape[0]
batch_feature[start:start + frame_num, :] = sample[0]
batch_label[start:start + frame_num, :] = sample[1]
start += frame_num
return (batch_feature, batch_label)
def split_infer_result(infer_seq, lod):
infer_batch = []
    for i in range(0, len(lod[0]) - 1):
infer_batch.append(infer_seq[lod[0][i]:lod[0][i + 1]])
return infer_batch
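A small usage sketch for the helper above (the data is made up): `lod[0]` holds cumulative frame offsets, so sample `i` occupies rows `lod[0][i]:lod[0][i + 1]` of the flattened batch.

import numpy as np

infer_seq = np.arange(10).reshape(5, 2)  # 5 frames, 2-dim outputs
lod = [[0, 2, 5]]                        # two utterances: 2 and 3 frames
batch = split_infer_result(infer_seq, lod)
print(batch[0].shape, batch[1].shape)    # (2, 2) (3, 2)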
class DaemonProcessGroup(object):
def __init__(self, proc_num, target, args):
self._proc_num = proc_num
self._workers = [
Process(
            target=target, args=args) for _ in range(self._proc_num)
]
def start_all(self):
for w in self._workers:
w.daemon = True
w.start()
@property
def proc_num(self):
return self._proc_num
class EpochEndSignal(object):
pass
class CriticalException(Exception):
pass
class SharedNDArray(object):
"""SharedNDArray utilizes shared memory to avoid data serialization when
data object shared among different processes. We can reconstruct the
`ndarray` when memory address, shape and dtype provided.
Args:
name (str): Address name of shared memory.
whether_verify (bool): Whether to validate the writing operation.
"""
def __init__(self, name, whether_verify=False):
self._name = name
self._shm = None
self._buf = None
self._array = np.zeros(1, dtype=np.float32)
self._inited = False
self._whether_verify = whether_verify
def zeros_like(self, shape, dtype):
size = int(np.prod(shape)) * np.dtype(dtype).itemsize
if self._inited:
self._shm = posix_ipc.SharedMemory(self._name)
else:
self._shm = posix_ipc.SharedMemory(
self._name, posix_ipc.O_CREAT, size=size)
self._buf = mmap.mmap(self._shm.fd, size)
self._array = np.ndarray(shape, dtype, self._buf, order='C')
def copy(self, ndarray):
size = int(np.prod(ndarray.shape)) * np.dtype(ndarray.dtype).itemsize
self.zeros_like(ndarray.shape, ndarray.dtype)
self._array[:] = ndarray
self._buf.flush()
self._inited = True
if self._whether_verify:
shm = posix_ipc.SharedMemory(self._name)
buf = mmap.mmap(shm.fd, size)
array = np.ndarray(ndarray.shape, ndarray.dtype, buf, order='C')
np.testing.assert_array_equal(array, ndarray)
@property
def ndarray(self):
return self._array
def recycle(self, pool):
self._buf.close()
self._shm.close_fd()
self._inited = False
pool[self._name] = self
def __getstate__(self):
return (self._name, self._array.shape, self._array.dtype, self._inited,
self._whether_verify)
def __setstate__(self, state):
self._name = state[0]
self._inited = state[3]
self.zeros_like(state[1], state[2])
self._whether_verify = state[4]
class SharedMemoryPoolManager(object):
"""SharedMemoryPoolManager maintains a multiprocessing.Manager.dict object.
All available addresses are allocated once and will be reused. Though this
class is not process-safe, the pool can be shared between processes. All
    shared memory must be unlinked before the main process exits.
Args:
pool_size (int): Size of shared memory pool.
manager (dict): A multiprocessing.Manager object, the pool is
maintained by the proxy process.
name_prefix (str): Address prefix of shared memory.
"""
def __init__(self, pool_size, manager, name_prefix='/deep_asr'):
self._names = []
self._dict = manager.dict()
self._time_prefix = time.strftime('%Y%m%d%H%M%S')
        for i in range(pool_size):
name = name_prefix + '_' + self._time_prefix + '_' + str(i)
self._dict[name] = SharedNDArray(name)
self._names.append(name)
@property
def pool(self):
return self._dict
def __del__(self):
for name in self._names:
# have to unlink the shared memory
posix_ipc.unlink_shared_memory(name)
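A minimal usage sketch for SharedNDArray (assumes posix_ipc is installed; the segment name is illustrative). SharedMemoryPoolManager pre-allocates a set of such named slots and hands them between processes through the manager dict in the same way:

if __name__ == '__main__':
    src = np.arange(6, dtype=np.float32).reshape(3, 2)
    shared = SharedNDArray('/deep_asr_demo', whether_verify=True)
    shared.copy(src)       # allocates the segment and writes the data
    print(shared.ndarray)  # reconstructed view over shared memory
    posix_ipc.unlink_shared_memory('/deep_asr_demo')  # cleanup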
def suppress_signal(signo, stack_frame):
pass
def suppress_complaints(verbose, notify=None):
    def decorator_maker(func):
        def suppress_wrapper(*args, **kwargs):
            try:
                func(*args, **kwargs)
            except:
                et, ev, tb = sys.exc_info()
                if notify is not None:
                    notify(except_type=et, except_value=ev, traceback=tb)
                if verbose == 1 or isinstance(ev, CriticalException):
                    reraise(et, ev, Traceback(tb).as_traceback())

        return suppress_wrapper

    return decorator_maker
class ForceExitWrapper(object):
def __init__(self, exit_flag):
self._exit_flag = exit_flag
@suppress_complaints(verbose=0)
def __call__(self, *args, **kwargs):
self._exit_flag.value = True
def __eq__(self, flag):
return self._exit_flag.value == flag
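A brief usage sketch for `suppress_complaints` (the worker is hypothetical): with `verbose=0`, ordinary exceptions are swallowed so daemon workers exit quietly, while a `CriticalException` is always re-raised.

@suppress_complaints(verbose=0)
def flaky_worker():
    raise RuntimeError("swallowed when verbose=0 and not critical")

flaky_worker()  # returns silently instead of propagating the error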
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "post_decode_faster.h"
typedef kaldi::int32 int32;
using fst::SymbolTable;
using fst::VectorFst;
using fst::StdArc;
Decoder::Decoder(std::string word_syms_filename,
std::string fst_in_filename,
std::string logprior_rxfilename) {
const char* usage =
"Decode, reading log-likelihoods (of transition-ids or whatever symbol "
"is on the graph) as matrices.";
kaldi::ParseOptions po(usage);
binary = true;
acoustic_scale = 1.5;
allow_partial = true;
kaldi::FasterDecoderOptions decoder_opts;
decoder_opts.Register(&po, true); // true == include obscure settings.
po.Register("binary", &binary, "Write output in binary mode");
po.Register("allow-partial",
&allow_partial,
"Produce output even when final state was not reached");
po.Register("acoustic-scale",
&acoustic_scale,
"Scaling factor for acoustic likelihoods");
word_syms = NULL;
if (word_syms_filename != "") {
word_syms = fst::SymbolTable::ReadText(word_syms_filename);
if (!word_syms)
KALDI_ERR << "Could not read symbol table from file "
<< word_syms_filename;
}
std::ifstream is_logprior(logprior_rxfilename);
logprior.Read(is_logprior, false);
// It's important that we initialize decode_fst after loglikes_reader, as it
// can prevent crashes on systems installed without enough virtual memory.
// It has to do with what happens on UNIX systems if you call fork() on a
// large process: the page-table entries are duplicated, which requires a
// lot of virtual memory.
decode_fst = fst::ReadFstKaldi(fst_in_filename);
decoder = new kaldi::FasterDecoder(*decode_fst, decoder_opts);
}
Decoder::~Decoder() {
  if (word_syms) delete word_syms;
delete decode_fst;
delete decoder;
}
std::string Decoder::decode(
std::string key,
const std::vector<std::vector<kaldi::BaseFloat>>& log_probs) {
size_t num_frames = log_probs.size();
size_t dim_label = log_probs[0].size();
kaldi::Matrix<kaldi::BaseFloat> loglikes(
num_frames, dim_label, kaldi::kSetZero, kaldi::kStrideEqualNumCols);
for (size_t i = 0; i < num_frames; ++i) {
memcpy(loglikes.Data() + i * dim_label,
log_probs[i].data(),
sizeof(kaldi::BaseFloat) * dim_label);
}
return decode(key, loglikes);
}
std::vector<std::string> Decoder::decode(std::string posterior_rspecifier) {
kaldi::SequentialBaseFloatMatrixReader posterior_reader(posterior_rspecifier);
std::vector<std::string> decoding_results;
for (; !posterior_reader.Done(); posterior_reader.Next()) {
std::string key = posterior_reader.Key();
kaldi::Matrix<kaldi::BaseFloat> loglikes(posterior_reader.Value());
decoding_results.push_back(decode(key, loglikes));
}
return decoding_results;
}
std::string Decoder::decode(std::string key,
kaldi::Matrix<kaldi::BaseFloat>& loglikes) {
std::string decoding_result;
if (loglikes.NumRows() == 0) {
KALDI_WARN << "Zero-length utterance: " << key;
}
KALDI_ASSERT(loglikes.NumCols() == logprior.Dim());
loglikes.ApplyLog();
loglikes.AddVecToRows(-1.0, logprior);
kaldi::DecodableMatrixScaled decodable(loglikes, acoustic_scale);
decoder->Decode(&decodable);
VectorFst<kaldi::LatticeArc> decoded; // linear FST.
if ((allow_partial || decoder->ReachedFinal()) &&
decoder->GetBestPath(&decoded)) {
if (!decoder->ReachedFinal())
KALDI_WARN << "Decoder did not reach end-state, outputting partial "
"traceback.";
std::vector<int32> alignment;
std::vector<int32> words;
kaldi::LatticeWeight weight;
GetLinearSymbolSequence(decoded, &alignment, &words, &weight);
if (word_syms != NULL) {
for (size_t i = 0; i < words.size(); i++) {
std::string s = word_syms->Find(words[i]);
decoding_result += s;
if (s == "")
KALDI_ERR << "Word-id " << words[i] << " not in symbol table.";
}
}
}
return decoding_result;
}
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <string>
#include <vector>
#include "base/kaldi-common.h"
#include "base/timer.h"
#include "decoder/decodable-matrix.h"
#include "decoder/faster-decoder.h"
#include "fstext/fstext-lib.h"
#include "hmm/transition-model.h"
#include "lat/kaldi-lattice.h" // for {Compact}LatticeArc
#include "tree/context-dep.h"
#include "util/common-utils.h"
class Decoder {
public:
Decoder(std::string word_syms_filename,
std::string fst_in_filename,
std::string logprior_rxfilename);
~Decoder();
// Interface to accept the scores read from specifier and return
// the batch decoding results
std::vector<std::string> decode(std::string posterior_rspecifier);
// Accept the scores of one utterance and return the decoding result
std::string decode(
std::string key,
const std::vector<std::vector<kaldi::BaseFloat>> &log_probs);
private:
// For decoding one utterance
std::string decode(std::string key,
kaldi::Matrix<kaldi::BaseFloat> &loglikes);
fst::SymbolTable *word_syms;
fst::VectorFst<fst::StdArc> *decode_fst;
kaldi::FasterDecoder *decoder;
kaldi::Vector<kaldi::BaseFloat> logprior;
bool binary;
kaldi::BaseFloat acoustic_scale;
bool allow_partial;
};
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "post_decode_faster.h"
namespace py = pybind11;
PYBIND11_MODULE(post_decode_faster, m) {
m.doc() = "Decoder for Deep ASR model";
py::class_<Decoder>(m, "Decoder")
.def(py::init<std::string, std::string, std::string>())
.def("decode",
(std::vector<std::string> (Decoder::*)(std::string)) &
Decoder::decode,
"Decode for the probability matrices in specifier "
"and return the transcriptions.")
.def(
"decode",
(std::string (Decoder::*)(
std::string, const std::vector<std::vector<kaldi::BaseFloat>>&)) &
Decoder::decode,
"Decode one input probability matrix "
"and return the transcription.");
}
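Once the extension is built (see setup.py and the build script below), it can be driven from Python roughly as follows. This is a hedged sketch: the asset paths are placeholders for the decoding graph, vocabulary and log-prior files, and the dummy scores only stand in for real model outputs.

from post_decode_faster import Decoder

# Placeholder paths; the real files ship with the Deep ASR example's data.
decoder = Decoder("decoder/graph/words.txt", "decoder/graph/TLG.fst",
                  "decoder/logprior")
log_probs = [[1.0 / 1749] * 1749] * 50  # dummy per-frame scores
print(decoder.decode("utter#0", log_probs))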
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import glob
from distutils.core import setup, Extension
from distutils.sysconfig import get_config_vars
try:
kaldi_root = os.environ['KALDI_ROOT']
except KeyError:
    raise ValueError("Environment variable 'KALDI_ROOT' is not defined. Please "
"install kaldi and export KALDI_ROOT=<kaldi's root dir> .")
args = [
'-std=c++11', '-Wno-sign-compare', '-Wno-unused-variable',
'-Wno-unused-local-typedefs', '-Wno-unused-but-set-variable',
'-Wno-deprecated-declarations', '-Wno-unused-function'
]
# remove warning about -Wstrict-prototypes
(opt, ) = get_config_vars('OPT')
os.environ['OPT'] = " ".join(flag for flag in opt.split()
if flag != '-Wstrict-prototypes')
os.environ['CC'] = 'g++'
LIBS = [
'fst', 'kaldi-base', 'kaldi-util', 'kaldi-matrix', 'kaldi-tree',
'kaldi-hmm', 'kaldi-fstext', 'kaldi-decoder', 'kaldi-lat'
]
LIB_DIRS = [
'tools/openfst/lib', 'src/base', 'src/matrix', 'src/util', 'src/tree',
'src/hmm', 'src/fstext', 'src/decoder', 'src/lat'
]
LIB_DIRS = [os.path.join(kaldi_root, path) for path in LIB_DIRS]
LIB_DIRS = [os.path.abspath(path) for path in LIB_DIRS]
ext_modules = [
Extension(
'post_decode_faster',
['pybind.cc', 'post_decode_faster.cc'],
include_dirs=[
'pybind11/include', '.', os.path.join(kaldi_root, 'src'),
os.path.join(kaldi_root, 'tools/openfst/src/include')
],
language='c++',
libraries=LIBS,
library_dirs=LIB_DIRS,
runtime_library_dirs=LIB_DIRS,
extra_compile_args=args, ),
]
setup(
name='post_decode_faster',
version='0.0.1',
author='Paddle',
author_email='',
description='Decoder for Deep ASR model',
ext_modules=ext_modules, )
set -e
if [ ! -d pybind11 ]; then
git clone https://github.com/pybind/pybind11.git
fi
python setup.py build_ext -i
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import argparse
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.data_reader as reader
from data_utils.util import lodtensor_to_ndarray
from data_utils.util import split_infer_result
def parse_args():
parser = argparse.ArgumentParser("Inference for stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
        help='The number of sequences in a batch. (default: %(default)d)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--infer_feature_lst',
type=str,
default='data/infer_feature.lst',
help='The feature list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_label_lst',
type=str,
default='data/infer_label.lst',
help='The label list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_model_path',
type=str,
default='./infer_models/deep_asr.pass_0.infer.model/',
help='The directory for loading inference model. '
'(default: %(default)s)')
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
    for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def infer(args):
""" Gets one batch of feature data and predicts labels for each sample.
"""
if not os.path.exists(args.infer_model_path):
raise IOError("Invalid inference model path!")
place = fluid.CUDAPlace(0) if args.device == 'GPU' else fluid.CPUPlace()
exe = fluid.Executor(place)
# load model
[infer_program, feed_dict,
fetch_targets] = fluid.io.load_inference_model(args.infer_model_path, exe)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
infer_data_reader = reader.DataReader(args.infer_feature_lst,
args.infer_label_lst)
infer_data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
    one_batch = next(infer_data_reader.batch_iterator(args.batch_size, 1))
(features, labels, lod) = one_batch
feature_t.set(features, place)
feature_t.set_lod([lod])
results = exe.run(infer_program,
feed={feed_dict[0]: feature_t},
fetch_list=fetch_targets,
return_numpy=False)
probs, lod = lodtensor_to_ndarray(results[0])
preds = probs.argmax(axis=1)
infer_batch = split_infer_result(preds, lod)
for index, sample in enumerate(infer_batch):
print("result %d: " % index, sample, '\n')
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer(args)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import os
import numpy as np
import argparse
import time
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.async_data_reader as reader
from decoder.post_decode_faster import Decoder
from data_utils.util import lodtensor_to_ndarray
from model_utils.model import stacked_lstmp_model
from data_utils.util import split_infer_result
def parse_args():
parser = argparse.ArgumentParser("Run inference by using checkpoint.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
        help='The number of sequences in a batch. (default: %(default)d)')
parser.add_argument(
'--minimum_batch_size',
type=int,
default=1,
        help='The minimum number of sequences in a batch. '
'(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=120 * 11,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=0.00016,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--parallel', action='store_true', help='If set, run in parallel.')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--infer_feature_lst',
type=str,
default='data/infer_feature.lst',
help='The feature list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_label_lst',
type=str,
default='data/infer_label.lst',
help='The label list path for inference. (default: %(default)s)')
parser.add_argument(
'--checkpoint',
type=str,
default='./checkpoint',
help="The checkpoint path to init model. (default: %(default)s)")
parser.add_argument(
'--vocabulary',
type=str,
default='./decoder/graph/words.txt',
help="The path to vocabulary. (default: %(default)s)")
parser.add_argument(
'--graphs',
type=str,
default='./decoder/graph/TLG.fst',
help="The path to TLG graphs for decoding. (default: %(default)s)")
parser.add_argument(
'--log_prior',
type=str,
default="./decoder/logprior",
help="The log prior probs for training data. (default: %(default)s)")
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
    for arg, value in sorted(vars(args).items()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def infer_from_ckpt(args):
"""Inference by using checkpoint."""
if not os.path.exists(args.checkpoint):
raise IOError("Invalid checkpoint!")
prediction, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=args.class_num,
parallel=args.parallel)
infer_program = fluid.default_main_program().clone()
optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# load checkpoint.
fluid.io.load_persistables(exe, args.checkpoint)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
feature_t = fluid.LoDTensor()
label_t = fluid.LoDTensor()
# infer data reader
infer_data_reader = reader.AsyncDataReader(args.infer_feature_lst,
args.infer_label_lst)
infer_data_reader.set_transformers(ltrans)
infer_costs, infer_accs = [], []
for batch_id, batch_data in enumerate(
infer_data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
# load_data
(features, labels, lod) = batch_data
feature_t.set(features.ndarray, place)
feature_t.set_lod([lod.ndarray])
label_t.set(labels.ndarray, place)
label_t.set_lod([lod.ndarray])
infer_data_reader.recycle(features, labels, lod)
results = exe.run(infer_program,
feed={"feature": feature_t,
"label": label_t},
fetch_list=[prediction, avg_cost, accuracy],
return_numpy=False)
infer_costs.append(lodtensor_to_ndarray(results[1])[0])
infer_accs.append(lodtensor_to_ndarray(results[2])[0])
probs, lod = lodtensor_to_ndarray(results[0])
infer_batch = split_infer_result(probs, lod)
for index, sample in enumerate(infer_batch):
key = "utter#%d" % (batch_id * args.batch_size + index)
print(key, ": ", decoder.decode(key, sample), "\n")
print(np.mean(infer_costs), np.mean(infer_accs))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer_from_ckpt(args)
@@ -3,10 +3,11 @@ from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
def stacked_lstmp_model(frame_dim,
                        hidden_dim,
proj_dim,
stacked_num,
class_num,
@@ -20,12 +21,13 @@ def stacked_lstmp_model(hidden_dim,
label data respectively. And in inference, only `feature` is needed.
Args:
        frame_dim(int): The frame dimension of feature data.
        hidden_dim(int): The hidden state's dimension of the LSTMP layer.
        proj_dim(int): The projection size of the LSTMP layer.
        stacked_num(int): The number of stacked LSTMP layers.
        parallel(bool): Run in parallel or not, default `False`.
        is_train(bool): Run in training phase or not, default `True`.
        class_num(int): The number of output classes.
"""
# network configuration
@@ -78,7 +80,7 @@ def stacked_lstmp_model(hidden_dim,
# data feeder
feature = fluid.layers.data(
name="feature", shape=[-1, 120 * 11], dtype="float32", lod_level=1)
name="feature", shape=[-1, frame_dim], dtype="float32", lod_level=1)
label = fluid.layers.data(
name="label", shape=[-1, 1], dtype="int64", lod_level=1)
@@ -92,11 +94,12 @@ def stacked_lstmp_model(hidden_dim,
feat_ = pd.read_input(feature)
label_ = pd.read_input(label)
prediction, avg_cost, acc = _net_conf(feat_, label_)
            for out in [prediction, avg_cost, acc]:
pd.write_output(out)
        # get the mean loss and acc over all devices
        prediction, avg_cost, acc = pd()
prediction.stop_gradient = True
avg_cost = fluid.layers.mean(x=avg_cost)
acc = fluid.layers.mean(x=acc)
else:
......
@@ -7,13 +7,13 @@ import numpy as np
import argparse
import time
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
import _init_paths
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.async_data_reader as reader
from model_utils.model import stacked_lstmp_model
from data_utils.util import lodtensor_to_ndarray
@@ -31,6 +31,11 @@ def parse_args():
default=1,
        help='The minimum number of sequences in a batch. '
'(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=120 * 11,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
@@ -46,10 +51,15 @@ def parse_args():
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
        default=0.00016,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
@@ -119,14 +129,15 @@ def profile(args):
"arg 'first_batches_to_skip' must not be smaller than 0.")
_, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
        class_num=args.class_num,
parallel=args.parallel)
    optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
    optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
@@ -138,7 +149,7 @@ def profile(args):
trans_splice.TransSplice()
]
    data_reader = reader.AsyncDataReader(args.feature_lst, args.label_lst)
data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
@@ -158,17 +169,20 @@ def profile(args):
frames_seen = 0
# load_data
(features, labels, lod) = batch_data
        feature_t.set(features.ndarray, place)
        feature_t.set_lod([lod.ndarray])
        label_t.set(labels.ndarray, place)
        label_t.set_lod([lod.ndarray])
        frames_seen += lod.ndarray[-1]
data_reader.recycle(features, labels, lod)
outs = exe.run(fluid.default_main_program(),
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy]
if args.print_train_acc else [],
return_numpy=False)
if args.print_train_acc:
......
@@ -8,11 +8,11 @@ import numpy as np
import argparse
import time
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.async_data_reader as reader
from data_utils.util import lodtensor_to_ndarray
from model_utils.model import stacked_lstmp_model
@@ -30,21 +30,31 @@ def parse_args():
default=1,
        help='The minimum number of sequences in a batch. '
'(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=120 * 11,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
        help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
        help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
        help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--pass_num',
type=int,
@@ -58,7 +68,7 @@ def parse_args():
parser.add_argument(
'--learning_rate',
type=float,
        default=0.00016,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
@@ -72,33 +82,46 @@ def parse_args():
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
        help="The path for feature's global mean and variance. "
        "(default: %(default)s)")
parser.add_argument(
'--train_feature_lst',
type=str,
default='data/feature.lst',
        help='The feature list path for training. (default: %(default)s)')
parser.add_argument(
'--train_label_lst',
type=str,
default='data/label.lst',
        help='The label list path for training. (default: %(default)s)')
parser.add_argument(
'--val_feature_lst',
type=str,
default='data/val_feature.lst',
        help='The feature list path for validation. (default: %(default)s)')
parser.add_argument(
'--val_label_lst',
type=str,
default='data/val_label.lst',
        help='The label list path for validation. (default: %(default)s)')
parser.add_argument(
'--init_model_path',
type=str,
default=None,
help="The model (checkpoint) path which the training resumes from. "
"If None, train the model from scratch. (default: %(default)s)")
parser.add_argument(
        '--checkpoints',
        type=str,
        default='./checkpoints',
        help="The directory for saving checkpoints. Do not save checkpoints "
        "if set to ''. (default: %(default)s)")
parser.add_argument(
'--infer_models',
type=str,
default='./infer_models',
help="The directory for saving inference models. Do not save inference "
"models if set to ''. (default: %(default)s)")
args = parser.parse_args()
return args
@@ -114,27 +137,37 @@ def train(args):
"""train in loop.
"""
# paths check
if args.init_model_path is not None and \
not os.path.exists(args.init_model_path):
raise IOError("Invalid initial model path!")
if args.checkpoints != '' and not os.path.exists(args.checkpoints):
os.mkdir(args.checkpoints)
if args.infer_models != '' and not os.path.exists(args.infer_models):
os.mkdir(args.infer_models)
prediction, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=1749,
class_num=args.class_num,
parallel=args.parallel)
    # program for test
    test_program = fluid.default_main_program().clone()

    optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
    optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# resume training if initial model provided.
if args.init_model_path is not None:
fluid.io.load_persistables(exe, args.init_model_path)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
@@ -151,8 +184,8 @@ def train(args):
os.path.exists(args.val_label_lst)):
return -1.0, -1.0
# test data reader
        test_data_reader = reader.AsyncDataReader(args.val_feature_lst,
                                                  args.val_label_lst)
test_data_reader.set_transformers(ltrans)
test_costs, test_accs = [], []
for batch_id, batch_data in enumerate(
@@ -160,10 +193,12 @@ def train(args):
args.minimum_batch_size)):
# load_data
(features, labels, lod) = batch_data
            feature_t.set(features.ndarray, place)
            feature_t.set_lod([lod.ndarray])
            label_t.set(labels.ndarray, place)
            label_t.set_lod([lod.ndarray])
test_data_reader.recycle(features, labels, lod)
cost, acc = exe.run(test_program,
feed={"feature": feature_t,
@@ -175,8 +210,8 @@ def train(args):
return np.mean(test_costs), np.mean(test_accs)
# train data reader
    train_data_reader = reader.AsyncDataReader(args.train_feature_lst,
                                               args.train_label_lst, -1)
train_data_reader.set_transformers(ltrans)
# train
    for pass_id in range(args.pass_num):
@@ -186,30 +221,46 @@ def train(args):
args.minimum_batch_size)):
# load_data
(features, labels, lod) = batch_data
            feature_t.set(features.ndarray, place)
            feature_t.set_lod([lod.ndarray])
            label_t.set(labels.ndarray, place)
            label_t.set_lod([lod.ndarray])
train_data_reader.recycle(features, labels, lod)
to_print = batch_id > 0 and (batch_id % args.print_per_batches == 0)
outs = exe.run(fluid.default_main_program(),
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy] if to_print else [],
return_numpy=False)
            if to_print:
print("\nBatch %d, train cost: %f, train acc: %f" %
                      (batch_id, lodtensor_to_ndarray(outs[0])[0],
                       lodtensor_to_ndarray(outs[1])[0]))
# save the latest checkpoint
if args.checkpoints != '':
model_path = os.path.join(args.checkpoints,
"deep_asr.latest.checkpoint")
fluid.io.save_persistables(exe, model_path)
else:
sys.stdout.write('.')
sys.stdout.flush()
# run test
val_cost, val_acc = test(exe)
        # save checkpoint per pass
        if args.checkpoints != '':
            model_path = os.path.join(
                args.checkpoints,
                "deep_asr.pass_" + str(pass_id) + ".checkpoint")
fluid.io.save_persistables(exe, model_path)
# save inference model
if args.infer_models != '':
model_path = os.path.join(
args.infer_models,
"deep_asr.pass_" + str(pass_id) + ".infer.model")
fluid.io.save_inference_model(model_path, ["feature"],
[prediction], exe)
# cal pass time
@@ -224,7 +275,4 @@ if __name__ == '__main__':
args = parse_args()
print_arguments(args)
train(args)
# Paddle Fluid Models
---
The Paddle Fluid models are a collection of example models that use Paddle Fluid APIs. Currently, example codes in this directory are still under active development.
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Advbox
Advbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python and PaddlePaddle.
......
"""
A set of tools for generating adversarial examples on the PaddlePaddle platform
"""
from . import attacks
from . import models
from .adversary import Adversary
@@ -18,13 +18,15 @@ class Adversary(object):
"""
assert original is not None
        self.original_label = original_label
        self.target_label = None
        self.adversarial_label = None

        self.__original = original
        self.__target = None
        self.__is_targeted_attack = False
        self.__adversarial_example = None
        self.__bad_adversarial_example = None
def set_target(self, is_targeted_attack, target=None, target_label=None):
"""
@@ -38,10 +40,10 @@ class Adversary(object):
"""
assert (target_label is None) or is_targeted_attack
self.__is_targeted_attack = is_targeted_attack
        self.target_label = target_label
self.__target = target
if not is_targeted_attack:
            self.target_label = None
self.__target = None
def set_original(self, original, original_label=None):
@@ -53,10 +55,11 @@ class Adversary(object):
"""
if original != self.__original:
self.__original = original
            self.original_label = original_label
self.__adversarial_example = None
self.__bad_adversarial_example = None
if original is None:
                self.original_label = None
def _is_successful(self, adversarial_label):
"""
@@ -65,11 +68,11 @@ class Adversary(object):
:param adversarial_label: adversarial label.
:return: bool
"""
        if self.target_label is not None:
            return adversarial_label == self.target_label
else:
return (adversarial_label is not None) and \
                   (adversarial_label != self.original_label)
def is_successful(self):
"""
@@ -77,7 +80,7 @@ class Adversary(object):
:return: bool
"""
        return self._is_successful(self.adversarial_label)
def try_accept_the_example(self, adversarial_example, adversarial_label):
"""
@@ -93,7 +96,9 @@ class Adversary(object):
ok = self._is_successful(adversarial_label)
if ok:
self.__adversarial_example = adversarial_example
            self.adversarial_label = adversarial_label
else:
self.__bad_adversarial_example = adversarial_example
return ok
def perturbation(self, multiplying_factor=1.0):
@@ -104,9 +109,14 @@ class Adversary(object):
:return: The perturbation that is multiplied by multiplying_factor.
"""
assert self.__original is not None
assert (self.__adversarial_example is not None) or \
(self.__bad_adversarial_example is not None)
if self.__adversarial_example is not None:
return multiplying_factor * (
self.__adversarial_example - self.__original)
else:
return multiplying_factor * (
self.__bad_adversarial_example - self.__original)
@property
def is_targeted_attack(self):
@@ -115,20 +125,6 @@ class Adversary(object):
"""
return self.__is_targeted_attack
@property
def target(self):
"""
@@ -143,20 +139,6 @@ class Adversary(object):
"""
return self.__original
@property
def adversarial_example(self):
"""
@@ -164,23 +146,9 @@ class Adversary(object):
"""
return self.__adversarial_example
    @property
    def bad_adversarial_example(self):
        """
        :property: bad_adversarial_example
        """
        return self.__bad_adversarial_example
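To tie the pieces together, a hedged end-to-end sketch (the `model`, `image` and `label` objects are assumed inputs; `model` stands for any Advbox model wrapper exposing predict/gradient/bounds, and module paths may differ in your checkout):

from advbox.adversary import Adversary
from advbox.attacks.gradientsign import FGSM

attack = FGSM(model)                 # model: an Advbox model wrapper (assumed)
adversary = Adversary(image, label)  # image/label: one test sample (assumed)
adversary = attack(adversary)        # Attack.__call__ runs _apply
if adversary.is_successful():
    adv_image = adversary.adversarial_example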
"""
Attack methods __init__.py
"""
from .base import Attack
from .deepfool import DeepFoolAttack
from .gradientsign import FGSM
from .gradientsign import GradientSignAttack
from .iterator_gradientsign import IFGSM
from .iterator_gradientsign import IteratorGradientSignAttack
@@ -52,21 +52,23 @@ class Attack(object):
:param adversary: adversary
:return: None
"""
        if adversary.original_label is None:
            adversary.original_label = np.argmax(
                self.model.predict(adversary.original))
        if adversary.is_targeted_attack and adversary.target_label is None:
            if adversary.target is None:
                raise ValueError(
                    'When adversary.is_targeted_attack is True, '
                    'adversary.target_label or adversary.target must be set.')
            else:
                adversary.target_label = np.argmax(
                    self.model.predict(adversary.target))

        logging.info('adversary:'
                     '\n original_label: {}'
                     '\n target_label: {}'
                     '\n is_targeted_attack: {}'
''.format(adversary.original_label, adversary.target_label,
adversary.is_targeted_attack))
@@ -10,6 +10,8 @@ import numpy as np
from .base import Attack
__all__ = ['DeepFoolAttack']
class DeepFoolAttack(Attack):
"""
@@ -56,7 +58,7 @@ class DeepFoolAttack(Attack):
gradient_k = self.model.gradient(x, k)
w_k = gradient_k - gradient
f_k = f[k] - f[pre_label]
                w_k_norm = np.linalg.norm(w_k.flatten()) + 1e-8
pert_k = (np.abs(f_k) + 1e-8) / w_k_norm
if pert_k < pert:
pert = pert_k
@@ -70,9 +72,12 @@ class DeepFoolAttack(Attack):
f = self.model.predict(x)
gradient = self.model.gradient(x, pre_label)
adv_label = np.argmax(f)
            logging.info(
                'iteration={}, f[pre_label]={}, f[target_label]={}, '
                'f[adv_label]={}, pre_label={}, adv_label={}'.format(
                    iteration, f[pre_label],
                    f[adversary.target_label]
                    if adversary.is_targeted_attack else 'NaN',
                    f[adv_label], pre_label, adv_label))
if adversary.try_accept_the_example(x, adv_label):
return adversary
......
"""
This module provides gradient-based attack methods, including FGSM and its
iterative variants.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
__all__ = [
'GradientMethodAttack', 'FastGradientSignMethodAttack', 'FGSM',
'FastGradientSignMethodTargetedAttack', 'FGSMT',
'BasicIterativeMethodAttack', 'BIM',
'IterativeLeastLikelyClassMethodAttack', 'ILCM'
]
class GradientMethodAttack(Attack):
"""
This class implements gradient attack method, and is the base of FGSM, BIM,
ILCM, etc.
"""
def __init__(self, model, support_targeted=True):
"""
:param model(model): The model to be attacked.
        :param support_targeted(bool): Whether this attack method supports targeted attack.
"""
super(GradientMethodAttack, self).__init__(model)
self.support_targeted = support_targeted
def _apply(self, adversary, norm_ord=np.inf, epsilons=0.01, steps=100):
"""
Apply the gradient attack method.
:param adversary(Adversary):
The Adversary object.
:param norm_ord(int):
Order of the norm, such as np.inf, 1, 2, etc. It can't be 0.
:param epsilons(list|tuple|int):
Attack step size (input variation).
:param steps:
            The number of iteration steps.
:return:
adversary(Adversary): The Adversary object.
"""
if norm_ord == 0:
raise ValueError("L0 norm is not supported!")
if not self.support_targeted:
if adversary.is_targeted_attack:
raise ValueError(
"This attack method doesn't support targeted attack!")
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(epsilons, epsilons + 1e-10, num=steps)
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
assert self.model.channel_axis() == adversary.original.ndim
assert (self.model.channel_axis() == 1 or
self.model.channel_axis() == adversary.original.shape[0] or
self.model.channel_axis() == adversary.original.shape[-1])
step = 1
adv_img = adversary.original
for epsilon in epsilons[:steps]:
if epsilon == 0.0:
continue
if adversary.is_targeted_attack:
gradient = -self.model.gradient(adv_img, adversary.target_label)
else:
gradient = self.model.gradient(adv_img,
adversary.original_label)
if norm_ord == np.inf:
gradient_norm = np.sign(gradient)
else:
gradient_norm = gradient / self._norm(gradient, ord=norm_ord)
adv_img = adv_img + epsilon * gradient_norm * (max_ - min_)
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('step={}, epsilon = {:.5f}, pre_label = {}, '
'adv_label={}'.format(step, epsilon, pre_label,
adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
step += 1
return adversary
@staticmethod
def _norm(a, ord):
if a.ndim == 1:
return np.linalg.norm(a, ord=ord)
if a.ndim == a.shape[0]:
norm_shape = (a.ndim, reduce(np.dot, a.shape[1:]))
norm_axis = 1
else:
norm_shape = (reduce(np.dot, a.shape[:-1]), a.ndim)
norm_axis = 0
return np.linalg.norm(a.reshape(norm_shape), ord=ord, axis=norm_axis)
class FastGradientSignMethodTargetedAttack(GradientMethodAttack):
"""
"Fast Gradient Sign Method" is extended to support targeted attack.
"Fast Gradient Sign Method" was originally implemented by Goodfellow et
al. (2015) with the infinity norm.
Paper link: https://arxiv.org/abs/1412.6572
"""
def _apply(self, adversary, epsilons=0.03):
return GradientMethodAttack._apply(
self,
adversary=adversary,
norm_ord=np.inf,
epsilons=epsilons,
steps=1)
class FastGradientSignMethodAttack(FastGradientSignMethodTargetedAttack):
"""
This attack was originally implemented by Goodfellow et al. (2015) with the
infinity norm, and is known as the "Fast Gradient Sign Method".
Paper link: https://arxiv.org/abs/1412.6572
"""
def __init__(self, model):
super(FastGradientSignMethodAttack, self).__init__(model, False)
class IterativeLeastLikelyClassMethodAttack(GradientMethodAttack):
"""
"Iterative Least-likely Class Method (ILCM)" extends "BIM" to support
targeted attack.
"The Basic Iterative Method (BIM)" is to extend "FSGM". "BIM" iteratively
take multiple small steps while adjusting the direction after each step.
Paper link: https://arxiv.org/abs/1607.02533
"""
def _apply(self, adversary, epsilons=0.001, steps=1000):
return GradientMethodAttack._apply(
self,
adversary=adversary,
norm_ord=np.inf,
epsilons=epsilons,
steps=steps)
class BasicIterativeMethodAttack(IterativeLeastLikelyClassMethodAttack):
"""
    FGSM is a one-step method. "The Basic Iterative Method (BIM)" iteratively
    takes multiple small steps, adjusting the direction after each step.
Paper link: https://arxiv.org/abs/1607.02533
"""
def __init__(self, model):
super(BasicIterativeMethodAttack, self).__init__(model, False)
FGSM = FastGradientSignMethodAttack
FGSMT = FastGradientSignMethodTargetedAttack
BIM = BasicIterativeMethodAttack
ILCM = IterativeLeastLikelyClassMethodAttack
"""
This module provides the FGSM attack method.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
class GradientSignAttack(Attack):
"""
This attack was originally implemented by Goodfellow et al. (2015) with the
    infinity norm, and is known as the "Fast Gradient Sign Method".
Paper link: https://arxiv.org/abs/1412.6572
"""
def _apply(self, adversary, epsilons=1000):
"""
Apply the gradient sign attack.
Args:
adversary(Adversary): The Adversary object.
epsilons(list|tuple|int): The epsilon (input variation parameter).
Return:
adversary: The Adversary object.
"""
assert adversary is not None
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(0, 1, num=epsilons + 1)[1:]
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
if adversary.is_targeted_attack:
gradient = self.model.gradient(adversary.original,
adversary.target_label)
gradient_sign = -np.sign(gradient) * (max_ - min_)
else:
gradient = self.model.gradient(adversary.original,
adversary.original_label)
gradient_sign = np.sign(gradient) * (max_ - min_)
for epsilon in epsilons:
adv_img = adversary.original + epsilon * gradient_sign
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'.
format(epsilon, pre_label, adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
return adversary
FGSM = GradientSignAttack
"""
This module provides the iterative FGSM attack method.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
class IteratorGradientSignAttack(Attack):
"""
    This attack was originally implemented by Alexey Kurakin (Google Brain).
Paper link: https://arxiv.org/pdf/1607.02533.pdf
"""
def _apply(self, adversary, epsilons=100, steps=10):
"""
Apply the iterative gradient sign attack.
Args:
adversary(Adversary): The Adversary object.
epsilons(list|tuple|int): The epsilon (input variation parameter).
            steps(int): The number of iteration steps.
Return:
adversary(Adversary): The Adversary object.
"""
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(0, 1 / steps, num=epsilons + 1)[1:]
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
for epsilon in epsilons:
adv_img = adversary.original
for _ in range(steps):
                if adversary.is_targeted_attack:
                    # Re-evaluate the gradient at the current iterate, as the
                    # iterative method prescribes, rather than at the original.
                    gradient = self.model.gradient(adv_img,
                                                   adversary.target_label)
                    gradient_sign = -np.sign(gradient) * (max_ - min_)
                else:
                    gradient = self.model.gradient(adv_img,
                                                   adversary.original_label)
                    gradient_sign = np.sign(gradient) * (max_ - min_)
adv_img = adv_img + gradient_sign * epsilon
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'.
format(epsilon, pre_label, adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
return adversary
IFGSM = IteratorGradientSignAttack
"""
This module provides the "LBFGS" attack method.
"""
from __future__ import division
import logging
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
from .base import Attack
__all__ = ['LBFGSAttack', 'LBFGS']
class LBFGSAttack(Attack):
"""
Uses L-BFGS-B to minimize the cross-entropy and the distance between the
original and the adversary.
Paper link: https://arxiv.org/abs/1510.05328
"""
def __init__(self, model):
super(LBFGSAttack, self).__init__(model)
self._predicts_normalized = None
self._adversary = None # type: Adversary
def _apply(self, adversary, epsilon=0.001, steps=10):
self._adversary = adversary
if not adversary.is_targeted_attack:
raise ValueError("This attack method only support targeted attack!")
# finding initial c
logging.info('finding initial c...')
c = epsilon
x0 = adversary.original.flatten()
for i in range(30):
c = 2 * c
logging.info('c={}'.format(c))
is_adversary = self._lbfgsb(x0, c, steps)
if is_adversary:
break
if not is_adversary:
logging.info('Failed!')
return adversary
# binary search c
logging.info('binary search c...')
c_low = 0
c_high = c
while c_high - c_low >= epsilon:
logging.info('c_high={}, c_low={}, diff={}, epsilon={}'
.format(c_high, c_low, c_high - c_low, epsilon))
c_half = (c_low + c_high) / 2
is_adversary = self._lbfgsb(x0, c_half, steps)
if is_adversary:
c_high = c_half
else:
c_low = c_half
return adversary
def _is_predicts_normalized(self, predicts):
"""
        Determine whether the model's predictions are normalized probabilities.
:param predicts(np.array): the output of the model.
:return: bool
"""
if self._predicts_normalized is None:
if self.model.predict_name().lower() in [
'softmax', 'probabilities', 'probs'
]:
self._predicts_normalized = True
else:
if np.any(predicts < 0.0):
self._predicts_normalized = False
else:
s = np.sum(predicts.flatten())
if 0.999 <= s <= 1.001:
self._predicts_normalized = True
else:
self._predicts_normalized = False
assert self._predicts_normalized is not None
return self._predicts_normalized
def _loss(self, adv_x, c):
"""
To get the loss and gradient.
:param adv_x: the candidate adversarial example
:param c: parameter 'C' in the paper
:return: (loss, gradient)
"""
x = adv_x.reshape(self._adversary.original.shape)
# cross_entropy
logits = self.model.predict(x)
if not self._is_predicts_normalized(logits): # to softmax
e = np.exp(logits)
logits = e / np.sum(e)
e = np.exp(logits)
s = np.sum(e)
ce = np.log(s) - logits[self._adversary.target_label]
# L2 distance
min_, max_ = self.model.bounds()
d = np.sum((x - self._adversary.original).flatten() ** 2) \
/ ((max_ - min_) ** 2) / len(adv_x)
# gradient
gradient = self.model.gradient(x, self._adversary.target_label)
result = (c * ce + d).astype(float), gradient.flatten().astype(float)
return result
def _lbfgsb(self, x0, c, maxiter):
min_, max_ = self.model.bounds()
bounds = [(min_, max_)] * len(x0)
approx_grad_eps = (max_ - min_) / 100.0
x, f, d = fmin_l_bfgs_b(
self._loss,
x0,
args=(c, ),
bounds=bounds,
maxiter=maxiter,
epsilon=approx_grad_eps)
if np.amax(x) > max_ or np.amin(x) < min_:
x = np.clip(x, min_, max_)
shape = self._adversary.original.shape
adv_label = np.argmax(self.model.predict(x.reshape(shape)))
logging.info('pre_label = {}, adv_label={}'.format(
self._adversary.target_label, adv_label))
return self._adversary.try_accept_the_example(
x.reshape(shape), adv_label)
LBFGS = LBFGSAttack
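# A minimal usage sketch (illustrative only). LBFGS supports targeted attacks
# exclusively, so the adversary must carry a target label first; the
# set_target call mirrors the commented-out usage in the JSMA demo below, and
# `image`, `label` and the wrapped `model` are assumptions:
#
#   adversary = Adversary(image, label)
#   adversary.set_target(True, target_label=7)  # hypothetical target class
#   attack = LBFGS(model)
#   adversary = attack(adversary, epsilon=0.001, steps=10)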
"""
This module provides the implementation of the JSMA attack method.
"""
from __future__ import division
import logging
import random
import numpy as np
from .base import Attack
class SaliencyMapAttack(Attack):
"""
Implements the Saliency Map Attack.
The Jacobian-based Saliency Map Approach (Papernot et al. 2016).
Paper link: https://arxiv.org/pdf/1511.07528.pdf
"""
def _apply(self,
adversary,
max_iter=2000,
fast=True,
theta=0.1,
max_perturbations_per_pixel=7):
"""
Apply the JSMA attack.
Args:
adversary(Adversary): The Adversary object.
max_iter(int): The max iterations.
fast(bool): If True, skip evaluating the pixel influence on the sum of the residual classes.
theta(float): Perturbation per pixel relative to [min, max] range.
max_perturbations_per_pixel(int): The max count of perturbation per pixel.
Return:
adversary: The Adversary object.
"""
assert adversary is not None
if not adversary.is_targeted_attack or (adversary.target_label is None):
target_labels = self._generate_random_target(
adversary.original_label)
else:
target_labels = [adversary.target_label]
for target in target_labels:
original_image = adversary.original
# the mask defines the search domain
# each modified pixel with border value is set to zero in mask
mask = np.ones_like(original_image)
# count tracks how often each pixel was changed
counts = np.zeros_like(original_image)
labels = range(self.model.num_classes())
adv_img = original_image.copy()
min_, max_ = self.model.bounds()
for step in range(max_iter):
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
# stop if mask is all zero
if not any(mask.flatten()):
return adversary
logging.info('step = {}, original_label = {}, adv_label={}'.
format(step, adversary.original_label, adv_label))
# get pixel location with highest influence on class
idx, p_sign = self._saliency_map(
adv_img, target, labels, mask, fast=fast)
# apply perturbation
adv_img[idx] += -p_sign * theta * (max_ - min_)
# tracks number of updates for each pixel
counts[idx] += 1
# remove pixel from search domain if it hits the bound
if adv_img[idx] <= min_ or adv_img[idx] >= max_:
mask[idx] = 0
# remove pixel if it was changed too often
if counts[idx] >= max_perturbations_per_pixel:
mask[idx] = 0
adv_img = np.clip(adv_img, min_, max_)
return adversary
def _generate_random_target(self, original_label):
"""
Draw random target labels, all distinct and different from the original label.
Args:
original_label(int): Original label.
Return:
target_labels(list): random target labels
"""
num_random_target = 1
num_classes = self.model.num_classes()
assert num_random_target <= num_classes - 1
target_labels = random.sample(range(num_classes), num_random_target + 1)
target_labels = [t for t in target_labels if t != original_label]
target_labels = target_labels[:num_random_target]
return target_labels
def _saliency_map(self, image, target, labels, mask, fast=False):
"""
Get pixel location with highest influence on class.
Args:
image(numpy.ndarray): Image with shape (height, width, channels).
target(int): The target label.
labels(list): The candidate output labels over which influence is evaluated.
mask(numpy.ndarray): Mask of the search domain; pixels that hit a bound are zeroed.
fast(bool): If True, skip evaluating the pixel influence on the sum of the residual classes.
Return:
idx: The index of the optimal pixel.
pix_sign: The direction of the perturbation.
"""
# pixel influence on target class
alphas = self.model.gradient(image, target) * mask
# pixel influence on sum of residual classes(don't evaluate if fast == True)
if fast:
betas = -np.ones_like(alphas)
else:
betas = np.sum([
self.model.gradient(image, label) * mask - alphas
for label in labels
], 0)
# compute saliency map (take into account both pos. & neg. perturbations)
sal_map = np.abs(alphas) * np.abs(betas) * np.sign(alphas * betas)
# find optimal pixel & direction of perturbation
idx = np.argmin(sal_map)
idx = np.unravel_index(idx, mask.shape)
pix_sign = np.sign(alphas)[idx]
return idx, pix_sign
JSMA = SaliencyMapAttack
"""
Paddle model for target of attack
"""
from .base import Model
from .paddle import PaddleModel
"""
Models __init__.py
"""
......@@ -24,11 +24,21 @@ class Model(object):
assert len(bounds) == 2
assert channel_axis in [0, 1, 2, 3]
if preprocess is None:
preprocess = (0, 1)
self._bounds = bounds
self._channel_axis = channel_axis
self._preprocess = preprocess
# Make self._preprocess (0, 1) when possible, so that we don't need
# to subtract or divide.
if preprocess is not None:
sub, div = np.array(preprocess)
if not np.any(sub):
sub = 0
if np.all(div == 1):
div = 1
assert (div is None) or np.all(div)
self._preprocess = (sub, div)
else:
self._preprocess = (0, 1)
def bounds(self):
"""
......@@ -47,8 +57,7 @@ class Model(object):
sub, div = self._preprocess
if np.any(sub != 0):
res = input_ - sub
assert np.any(div != 0)
if np.any(div != 1):
if not np.all(div == 1):
if res is None: # "res = input_ - sub" is not executed!
res = input_ / div
else:
......@@ -97,3 +106,11 @@ class Model(object):
with the shape (height, width, channel).
"""
raise NotImplementedError
@abstractmethod
def predict_name(self):
"""
Get the predict name, such as "softmax", etc.
:return: string
"""
raise NotImplementedError
......@@ -4,7 +4,7 @@ Paddle model
from __future__ import absolute_import
import numpy as np
import paddle.v2.fluid as fluid
import paddle.fluid as fluid
from .base import Model
......@@ -16,7 +16,7 @@ class PaddleModel(Model):
instance of PaddleModel.
Args:
program(paddle.v2.fluid.framework.Program): The program of the model
program(paddle.fluid.framework.Program): The program of the model
which generate the adversarial sample.
input_name(string): The name of the input.
logits_name(string): The name of the logits.
......@@ -114,3 +114,10 @@ class PaddleModel(Model):
feed=feeder.feed([(scaled_data, label)]),
fetch_list=[self._gradient])
return grad.reshape(data.shape)
def predict_name(self):
"""
Get the predict name, such as "softmax", etc.
:return: string
"""
return self._program.block(0).var(self._predict_name).op.type
......@@ -2,7 +2,7 @@
CNN on mnist data using fluid api of paddlepaddle
"""
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
import paddle.fluid as fluid
def mnist_cnn_model(img):
......@@ -47,7 +47,9 @@ def main():
optimizer = fluid.optimizer.Adam(learning_rate=0.01)
optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=logits, label=label)
batch_size = fluid.layers.create_tensor(dtype='int64')
batch_acc = fluid.layers.accuracy(
input=logits, label=label, total=batch_size)
BATCH_SIZE = 50
PASS_NUM = 3
......@@ -63,20 +65,22 @@ def main():
feeder = fluid.DataFeeder(feed_list=[img, label], place=place)
exe.run(fluid.default_startup_program())
pass_acc = fluid.average.WeightedAverage()
for pass_id in range(PASS_NUM):
accuracy.reset(exe)
pass_acc.reset()
for data in train_reader():
loss, acc = exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_cost] + accuracy.metrics)
pass_acc = accuracy.eval(exe)
print("pass_id=" + str(pass_id) + " acc=" + str(acc) + " pass_acc="
+ str(pass_acc))
loss, acc, b_size = exe.run(
fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_cost, batch_acc, batch_size])
pass_acc.add(value=acc, weight=b_size)
print("pass_id=" + str(pass_id) + " acc=" + str(acc[0]) +
" pass_acc=" + str(pass_acc.eval()[0]))
if loss < LOSS_THRESHOLD and pass_acc > ACC_THRESHOLD:
break
pass_acc = accuracy.eval(exe)
print("pass_id=" + str(pass_id) + " pass_acc=" + str(pass_acc))
print("pass_id=" + str(pass_id) + " pass_acc=" + str(pass_acc.eval()[
0]))
fluid.io.save_params(
exe, dirname='./mnist', main_program=fluid.default_main_program())
print('train mnist done')
......
......@@ -3,10 +3,10 @@ FGSM demos on mnist using advbox tool.
"""
import matplotlib.pyplot as plt
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
import paddle.fluid as fluid
from advbox import Adversary
from advbox.attacks.gradientsign import GradientSignAttack
from advbox.adversary import Adversary
from advbox.attacks.gradient_method import FGSM
from advbox.models.paddle import PaddleModel
......@@ -73,7 +73,7 @@ def main():
# advbox demo
m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
logits.name, avg_cost.name, (-1, 1))
att = GradientSignAttack(m)
att = FGSM(m)
for data in train_reader():
# fgsm attack
adversary = att(Adversary(data[0][0], data[0][1]))
......
"""
FGSM demos on mnist using advbox tool.
"""
import matplotlib.pyplot as plt
import paddle.v2 as paddle
import paddle.fluid as fluid
import numpy as np
from advbox import Adversary
from advbox.attacks.saliency import SaliencyMapAttack
from advbox.models.paddle import PaddleModel
def cnn_model(img):
"""
Mnist cnn model
Args:
img(Variable): the input image to be recognized
Returns:
Variable: the label prediction
"""
# conv1 = fluid.nets.conv2d()
conv_pool_1 = fluid.nets.simple_img_conv_pool(
input=img,
num_filters=20,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
conv_pool_2 = fluid.nets.simple_img_conv_pool(
input=conv_pool_1,
num_filters=50,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
logits = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
return logits
def main():
"""
Advbox demo which demonstrate how to use advbox.
"""
IMG_NAME = 'img'
LABEL_NAME = 'label'
img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
# gradient should flow
img.stop_gradient = False
label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
logits = cnn_model(img)
cost = fluid.layers.cross_entropy(input=logits, label=label)
avg_cost = fluid.layers.mean(x=cost)
place = fluid.CPUPlace()
exe = fluid.Executor(place)
BATCH_SIZE = 1
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=500),
batch_size=BATCH_SIZE)
feeder = fluid.DataFeeder(
feed_list=[IMG_NAME, LABEL_NAME],
place=place,
program=fluid.default_main_program())
fluid.io.load_params(
exe, "./mnist/", main_program=fluid.default_main_program())
# advbox demo
m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
logits.name, avg_cost.name, (-1, 1))
attack = SaliencyMapAttack(m)
total_num = 0
success_num = 0
for data in train_reader():
total_num += 1
# adversary.set_target(True, target_label=target_label)
jsma_attack = attack(Adversary(data[0][0], data[0][1]))
if jsma_attack is not None and jsma_attack.is_successful():
# plt.imshow(jsma_attack.target, cmap='Greys_r')
# plt.show()
success_num += 1
print('original_label=%d, adversarial example label=%d' %
(data[0][1], jsma_attack.adversarial_label))
# np.save('adv_img', jsma_attack.adversarial_example)
print('total num = %d, success num = %d ' % (total_num, success_num))
if total_num == 100:
break
if __name__ == '__main__':
main()
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# SE-ResNeXt for image classification
This model, built with PaddlePaddle Fluid, is still under active development and is not
......
### Caffe2Fluid
This tool converts a Caffe model to a Fluid model.
### Howto
1. Prepare caffepb.py in ./proto if your python has no 'pycaffe' module. Two options are provided:
1) Generate it from caffe.proto using protoc:
bash ./proto/compile.sh
2) Download one from github directly:
cd proto/ && wget https://github.com/ethereon/caffe-tensorflow/blob/master/kaffe/caffe/caffepb.py
2. Convert the caffe model using 'convert.py', which will generate a python script and a weight (.npy) file.
3. Use the converted model to predict; see more detailed info in 'examples/xxx'.
### Tested models
- Lenet on mnist dataset
- ResNets: (ResNet-50, ResNet-101, ResNet-152)
model addr: https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777
- GoogleNet:
model addr: https://gist.github.com/jimmie33/7ea9f8ac0da259866b854460f4526034
- VGG:
model addr: https://gist.github.com/ksimonyan/211839e770f7b538e2d8
- AlexNet:
model addr: https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet
### Notes
Some of this code comes from here: https://github.com/ethereon/caffe-tensorflow
#!/usr/bin/env python
import os
import sys
import numpy as np
import argparse
from kaffe import KaffeError, print_stderr
from kaffe.paddle import Transformer
def fatal_error(msg):
""" fatal error encounted
"""
print_stderr(msg)
exit(-1)
def validate_arguments(args):
""" validate args
"""
if (args.data_output_path is not None) and (args.caffemodel is None):
fatal_error('No input data path provided.')
if (args.caffemodel is not None) and (args.data_output_path is None):
fatal_error('No output data path provided.')
if (args.code_output_path is None) and (args.data_output_path is None):
fatal_error('No output path specified.')
def convert(def_path, caffemodel_path, data_output_path, code_output_path,
phase):
""" convert caffe model to tf/paddle models
"""
try:
transformer = Transformer(def_path, caffemodel_path, phase=phase)
print_stderr('Converting data...')
if caffemodel_path is not None:
data = transformer.transform_data()
print_stderr('Saving data...')
with open(data_output_path, 'wb') as data_out:
np.save(data_out, data)
if code_output_path:
print_stderr('Saving source...')
with open(code_output_path, 'wb') as src_out:
src_out.write(transformer.transform_source())
print_stderr('Done.')
except KaffeError as err:
fatal_error('Error encountered: {}'.format(err))
return 0
def main():
""" main
"""
parser = argparse.ArgumentParser()
parser.add_argument('def_path', help='Model definition (.prototxt) path')
parser.add_argument('--caffemodel', help='Model data (.caffemodel) path')
parser.add_argument('--data-output-path', help='Converted data output path')
parser.add_argument(
'--code-output-path', help='Save generated source to this path')
parser.add_argument(
'-p',
'--phase',
default='test',
help='The phase to convert: test (default) or train')
args = parser.parse_args()
validate_arguments(args)
return convert(args.def_path, args.caffemodel, args.data_output_path,
args.code_output_path, args.phase)
if __name__ == '__main__':
ret = main()
sys.exit(ret)
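# Illustrative programmatic use of convert() above (a hedged sketch; the file
# paths are hypothetical and the module is assumed importable as `convert`):
#
#   from convert import convert
#   convert('lenet.prototxt', 'lenet.caffemodel',
#           data_output_path='lenet.npy',
#           code_output_path='lenet.py',
#           phase='test')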
A demo showing how to convert Caffe models trained on 'imagenet' using caffe2fluid
---
# How to use
1. prepare python environment
2. download caffe model to "models.caffe/xxx" which contains "xxx.caffemodel" and "xxx.prototxt"
3. run the tool
eg: bash ./run.sh resnet50 ./models.caffe/resnet50 ./models/resnet50
#!/bin/env python
#function:
# a demo to show how to use the converted model generated by caffe2fluid
#
#notes:
# only support imagenet data
import os
import sys
import inspect
import numpy as np
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
def load_data(imgfile, shape):
h, w = shape[1:]
from PIL import Image
im = Image.open(imgfile)
# The storage order of the loaded image is W(width),
# H(height), C(channel). PaddlePaddle requires
# the CHW order, so transpose them.
im = im.resize((w, h), Image.ANTIALIAS)
im = np.array(im).astype(np.float32)
im = im.transpose((2, 0, 1)) # CHW
im = im[(2, 1, 0), :, :] # BGR
# The mean to be subtracted from each image.
# By default, the per-channel ImageNet mean.
mean = np.array([104., 117., 124.], dtype=np.float32)
mean = mean.reshape([3, 1, 1])
im = im - mean
return im.reshape([1] + shape)
def build_model(net_file, net_name):
print('build model with net_file[%s] and net_name[%s]' %
(net_file, net_name))
net_path = os.path.dirname(net_file)
module_name = os.path.splitext(os.path.basename(net_file))[0]
if net_path not in sys.path:
sys.path.insert(0, net_path)
try:
m = __import__(module_name, fromlist=[net_name])
MyNet = getattr(m, net_name)
except Exception as e:
print('failed to load module[%s]' % (module_name))
print(e)
return None
input_name = 'data'
input_shape = MyNet.input_shapes()[input_name]
images = fluid.layers.data(name='image', shape=input_shape, dtype='float32')
#label = fluid.layers.data(name='label', shape=[1], dtype='int64')
net = MyNet({input_name: images})
input_shape = MyNet.input_shapes()[input_name]
return net, input_shape
def dump_results(results, names, root):
if os.path.exists(root) is False:
os.mkdir(root)
for i in range(len(names)):
n = names[i]
res = results[i]
filename = os.path.join(root, n)
np.save(filename + '.npy', res)
def infer(net_file, net_name, model_file, imgfile, debug=False):
""" do inference using a model which consist 'xxx.py' and 'xxx.npy'
"""
#1, build model
net, input_shape = build_model(net_file, net_name)
prediction = net.get_output()
#2, load weights for this model
place = fluid.CPUPlace()
exe = fluid.Executor(place)
startup_program = fluid.default_startup_program()
exe.run(startup_program)
if model_file.find('.npy') > 0:
net.load(data_path=model_file, exe=exe, place=place)
else:
net.load(data_path=model_file, exe=exe)
#3, test this model
test_program = fluid.default_main_program().clone()
fetch_list_var = []
fetch_list_name = []
if debug is False:
fetch_list_var.append(prediction)
else:
for k, v in net.layers.items():
fetch_list_var.append(v)
fetch_list_name.append(k)
np_images = load_data(imgfile, input_shape)
results = exe.run(program=test_program,
feed={'image': np_images},
fetch_list=fetch_list_var)
if debug is True:
dump_path = 'results.layers'
dump_results(results, fetch_list_name, dump_path)
print('all results dumped to [%s]' % (dump_path))
else:
result = results[0]
print('predicted class:', np.argmax(result))
if __name__ == "__main__":
""" maybe more convenient to use 'run.sh' to call this tool
"""
net_file = 'models/resnet50/resnet50.py'
weight_file = 'models/resnet50/resnet50.npy'
imgfile = 'data/65.jpeg'
net_name = 'ResNet50'
argc = len(sys.argv)
if argc == 5:
net_file = sys.argv[1]
weight_file = sys.argv[2]
imgfile = sys.argv[3]
net_name = sys.argv[4]
elif argc > 1:
print('usage:')
print('\tpython %s [net_file] [weight_file] [imgfile] [net_name]' %
(sys.argv[0]))
print('\teg:python %s %s %s %s %s' % (sys.argv[0], net_file,
weight_file, imgfile, net_name))
sys.exit(1)
infer(net_file, net_name, weight_file, imgfile)
#!/bin/bash
#function:
# a tool used to:
# 1, convert a caffe model
# 2, do inference using this model
#
#usage:
# bash run.sh resnet50 ./models.caffe/resnet50 ./models/resnet50
#
#set -x
if [[ $# -lt 3 ]];then
echo "usage:"
echo " bash $0 [model_name] [cf_model_path] [pd_model_path] [only_convert]"
echo " eg: bash $0 resnet50 ./models.caffe/resnet50 ./models/resnet50"
exit 1
else
model_name=$1
cf_model_path=$2
pd_model_path=$3
only_convert=$4
fi
proto_file=$cf_model_path/${model_name}.prototxt
caffemodel_file=$cf_model_path/${model_name}.caffemodel
weight_file=$pd_model_path/${model_name}.npy
net_file=$pd_model_path/${model_name}.py
if [[ ! -e $proto_file ]];then
echo "not found prototxt[$proto_file]"
exit 1
fi
if [[ ! -e $caffemodel_file ]];then
echo "not found caffemodel[$caffemodel_file]"
exit 1
fi
if [[ ! -e $pd_model_path ]];then
mkdir $pd_model_path
fi
PYTHON=`which cfpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
$PYTHON ../../convert.py \
$proto_file \
--caffemodel $caffemodel_file \
--data-output-path $weight_file\
--code-output-path $net_file
ret=$?
if [[ $ret -ne 0 ]];then
echo "failed to convert caffe model[$cf_model_path]"
exit $ret
else
echo "succeed to convert caffe model[$cf_model_path] to fluid model[$pd_model_path]"
fi
if [[ -z $only_convert ]];then
PYTHON=`which pdpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
imgfile="data/65.jpeg"
net_name=`grep "name" $proto_file | head -n1 | perl -ne 'if(/\"([^\"]+)\"/){ print $1."\n";}'`
$PYTHON ./infer.py $net_file $weight_file $imgfile $net_name
ret=$?
fi
exit $ret
A demo showing how to convert a Caffe model trained on 'mnist' using caffe2fluid
---
# How to use
1. prepare python environment
2. download caffe model to "models.caffe/lenet" which contains "lenet.caffemodel" and "lenet.prototxt"
3. run the tool
eg: bash ./run.sh lenet ./models.caffe/lenet ./models/lenet
#!/bin/env python
#function:
# a demo to show how to use a model converted by caffe2fluid
#
import sys
import os
import numpy as np
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
def test_model(exe, test_program, fetch_list, test_reader, feeder):
acc_set = []
for data in test_reader():
acc_np, pred = exe.run(program=test_program,
feed=feeder.feed(data),
fetch_list=fetch_list)
acc_set.append(float(acc_np))
acc_val = np.array(acc_set).mean()
return float(acc_val)
def evaluate(net_file, model_file):
""" main
"""
#1, build model
net_path = os.path.dirname(net_file)
if net_path not in sys.path:
sys.path.insert(0, net_path)
from lenet import LeNet as MyNet
with_gpu = False
paddle.init(use_gpu=with_gpu)
#1, define network topology
images = fluid.layers.data(name='image', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
net = MyNet({'data': images})
prediction = net.layers['prob']
acc = fluid.layers.accuracy(input=prediction, label=label)
place = fluid.CUDAPlace(0) if with_gpu is True else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
#2, load weights
if model_file.find('.npy') > 0:
net.load(data_path=model_file, exe=exe, place=place)
else:
net.load(data_path=model_file, exe=exe)
#3, test this model
test_program = fluid.default_main_program().clone()
test_reader = paddle.batch(paddle.dataset.mnist.test(), batch_size=128)
feeder = fluid.DataFeeder(feed_list=[images, label], place=place)
fetch_list = [acc, prediction]
print('go to test model using test set')
acc_val = test_model(exe, test_program, \
fetch_list, test_reader, feeder)
print('test accuracy is [%.4f], expected value[0.919]' % (acc_val))
if __name__ == "__main__":
net_file = 'models/lenet/lenet.py'
weight_file = 'models/lenet/lenet.npy'
argc = len(sys.argv)
if argc == 3:
net_file = sys.argv[1]
weight_file = sys.argv[2]
elif argc > 1:
print('usage:')
print('\tpython %s [net_file] [weight_file]' % (sys.argv[0]))
print('\teg:python %s %s %s' % (sys.argv[0], net_file, weight_file))
sys.exit(1)
evaluate(net_file, weight_file)
#!/bin/bash
#function:
# a tool used to:
# 1, convert a caffe model
# 2, do inference using this model
#
#usage:
# bash run.sh lenet ./models.caffe/lenet ./models/lenet
#
#set -x
if [[ $# -lt 3 ]];then
echo "usage:"
echo " bash $0 [model_name] [cf_model_path] [pd_model_path] [only_convert]"
echo " eg: bash $0 lenet ./models.caffe/lenet ./models/lenet"
exit 1
else
model_name=$1
cf_model_path=$2
pd_model_path=$3
only_convert=$4
fi
proto_file=$cf_model_path/${model_name}.prototxt
caffemodel_file=$cf_model_path/${model_name}.caffemodel
weight_file=$pd_model_path/${model_name}.npy
net_file=$pd_model_path/${model_name}.py
if [[ ! -e $proto_file ]];then
echo "not found prototxt[$proto_file]"
exit 1
fi
if [[ ! -e $caffemodel_file ]];then
echo "not found caffemodel[$caffemodel_file]"
exit 1
fi
if [[ ! -e $pd_model_path ]];then
mkdir $pd_model_path
fi
PYTHON=`which cfpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
$PYTHON ../../convert.py \
$proto_file \
--caffemodel $caffemodel_file \
--data-output-path $weight_file\
--code-output-path $net_file
ret=$?
if [[ $ret -ne 0 ]];then
echo "failed to convert caffe model[$cf_model_path]"
exit $ret
else
echo "succeed to convert caffe model[$cf_model_path] to fluid model[$pd_model_path]"
fi
if [[ -z $only_convert ]];then
PYTHON=`which pdpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
net_name=`grep "name" $proto_file | head -n1 | perl -ne 'if(/\"([^\"]+)\"/){ print $1."\n";}'`
if [[ $net_name != "LeNet" ]];then
echo "only support LeNet"
exit 1
fi
$PYTHON ./evaluate.py $net_file $weight_file
ret=$?
fi
exit $ret
from .graph import GraphBuilder, NodeMapper
from .errors import KaffeError, print_stderr
import os
from . import paddle
from .resolver import get_caffe_resolver, has_pycaffe
import os
import sys
SHARED_CAFFE_RESOLVER = None
def import_caffepb():
p = os.path.realpath(__file__)
p = os.path.dirname(p)
p = os.path.join(p, '../../proto')
sys.path.insert(0, p)
import caffepb
return caffepb
class CaffeResolver(object):
def __init__(self):
self.import_caffe()
def import_caffe(self):
self.caffe = None
try:
# Try to import PyCaffe first
import caffe
self.caffe = caffe
except ImportError:
# Fall back to the protobuf implementation
self.caffepb = import_caffepb()
show_fallback_warning()
if self.caffe:
# Use the protobuf code from the imported distribution.
# This way, Caffe variants with custom layers will work.
self.caffepb = self.caffe.proto.caffe_pb2
self.NetParameter = self.caffepb.NetParameter
def has_pycaffe(self):
return self.caffe is not None
def get_caffe_resolver():
global SHARED_CAFFE_RESOLVER
if SHARED_CAFFE_RESOLVER is None:
SHARED_CAFFE_RESOLVER = CaffeResolver()
return SHARED_CAFFE_RESOLVER
def has_pycaffe():
return get_caffe_resolver().has_pycaffe()
def show_fallback_warning():
msg = '''
------------------------------------------------------------
WARNING: PyCaffe not found!
Falling back to a pure protocol buffer implementation.
* Conversions will be drastically slower.
------------------------------------------------------------
'''
sys.stderr.write(msg)
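# Illustrative usage sketch: the resolver is a process-wide singleton, so
# repeated calls return the same instance, and NetParameter comes from pycaffe
# when available or from the bundled caffepb fallback otherwise:
#
#   resolver = get_caffe_resolver()
#   net_params = resolver.NetParameter()  # a protobuf NetParameter message
#   print(has_pycaffe())                  # True when the real pycaffe was found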
import sys
#debug level, can be 'warn', 'verbose'
log_level = 'warn'
class KaffeError(Exception):
pass
def print_stderr(msg):
sys.stderr.write('%s\n' % msg)
def debug(msg):
if log_level == 'verbose':
print_stderr('[DEBUG]' + msg)
def notice(msg):
print_stderr('[NOTICE]' + msg)
def warn(msg):
print_stderr('[WARNING]' + msg)
def set_loglevel(level):
global log_level
if 'warn' != level and 'verbose' != level:
raise Exception('unsupported log level[%s]' % (level))
log_level = level
from google.protobuf import text_format
from .caffe import get_caffe_resolver
from .errors import KaffeError, print_stderr
from .layers import LayerAdapter, LayerType, NodeKind, NodeDispatch
from .shapes import TensorShape
class Node(object):
def __init__(self, name, kind, layer=None):
self.name = name
self.kind = kind
self.layer = LayerAdapter(layer, kind) if layer else None
self.parents = []
self.children = []
self.data = None
self.output_shape = None
self.metadata = {}
def add_parent(self, parent_node):
assert parent_node not in self.parents
self.parents.append(parent_node)
if self not in parent_node.children:
parent_node.children.append(self)
def add_child(self, child_node):
assert child_node not in self.children
self.children.append(child_node)
if self not in child_node.parents:
child_node.parents.append(self)
def get_only_parent(self):
if len(self.parents) != 1:
raise KaffeError('Node (%s) expected to have 1 parent. Found %s.' %
(self, len(self.parents)))
return self.parents[0]
@property
def parameters(self):
if self.layer is not None:
return self.layer.parameters
return None
def __str__(self):
return '[%s] %s' % (self.kind, self.name)
def __repr__(self):
return '%s (0x%x)' % (self.name, id(self))
class Graph(object):
def __init__(self, nodes=None, name=None):
self.nodes = nodes or []
self.node_lut = {node.name: node for node in self.nodes}
self.name = name
def add_node(self, node):
self.nodes.append(node)
self.node_lut[node.name] = node
def get_node(self, name):
try:
return self.node_lut[name]
except KeyError:
raise KaffeError('Layer not found: %s' % name)
def get_input_nodes(self):
return [node for node in self.nodes if len(node.parents) == 0]
def get_output_nodes(self):
return [node for node in self.nodes if len(node.children) == 0]
def topologically_sorted(self):
sorted_nodes = []
unsorted_nodes = list(self.nodes)
temp_marked = set()
perm_marked = set()
def visit(node):
if node in temp_marked:
raise KaffeError('Graph is not a DAG.')
if node in perm_marked:
return
temp_marked.add(node)
for child in node.children:
visit(child)
perm_marked.add(node)
temp_marked.remove(node)
sorted_nodes.insert(0, node)
while len(unsorted_nodes):
visit(unsorted_nodes.pop())
return sorted_nodes
def compute_output_shapes(self):
sorted_nodes = self.topologically_sorted()
for node in sorted_nodes:
node.output_shape = TensorShape(
*NodeKind.compute_output_shape(node))
def replaced(self, new_nodes):
return Graph(nodes=new_nodes, name=self.name)
def transformed(self, transformers):
graph = self
for transformer in transformers:
graph = transformer(graph)
if graph is None:
raise KaffeError('Transformer failed: {}'.format(transformer))
assert isinstance(graph, Graph)
return graph
def __contains__(self, key):
return key in self.node_lut
def __str__(self):
hdr = '{:<20} {:<30} {:>20} {:>20}'.format('Type', 'Name', 'Param',
'Output')
s = [hdr, '-' * 94]
for node in self.topologically_sorted():
# If the node has learned parameters, display the first one's shape.
# In case of convolutions, this corresponds to the weights.
data_shape = node.data[0].shape if node.data else '--'
out_shape = node.output_shape or '--'
s.append('{:<20} {:<30} {:>20} {:>20}'.format(
node.kind, node.name, data_shape, tuple(out_shape)))
return '\n'.join(s)
class GraphBuilder(object):
'''Constructs a model graph from a Caffe protocol buffer definition.'''
def __init__(self, def_path, phase='test'):
'''
def_path: Path to the model definition (.prototxt)
data_path: Path to the model data (.caffemodel)
phase: Either 'test' or 'train'. Used for filtering phase-specific nodes.
'''
self.def_path = def_path
self.phase = phase
self.load()
def load(self):
'''Load the layer definitions from the prototxt.'''
self.params = get_caffe_resolver().NetParameter()
with open(self.def_path, 'rb') as def_file:
text_format.Merge(def_file.read(), self.params)
def filter_layers(self, layers):
'''Filter out layers based on the current phase.'''
phase_map = {0: 'train', 1: 'test'}
filtered_layer_names = set()
filtered_layers = []
for layer in layers:
phase = self.phase
if len(layer.include):
phase = phase_map[layer.include[0].phase]
if len(layer.exclude):
phase = phase_map[1 - layer.include[0].phase]
exclude = (phase != self.phase)
# Dropout layers appear in a fair number of Caffe
# test-time networks. These are just ignored. We'll
# filter them out here.
if (not exclude) and (phase == 'test'):
exclude = (layer.type == LayerType.Dropout)
if not exclude:
filtered_layers.append(layer)
# Guard against dupes.
assert layer.name not in filtered_layer_names
filtered_layer_names.add(layer.name)
return filtered_layers
def make_node(self, layer):
'''Create a graph node for the given layer.'''
kind = NodeKind.map_raw_kind(layer.type)
if kind is None:
raise KaffeError('Unknown layer type encountered: %s' % layer.type)
# We want to use the layer's top names (the "output" names), rather than the
# name attribute, which is more of a readability thing than a functional one.
# Other layers will refer to a node by its "top name".
return Node(layer.name, kind, layer=layer)
def make_input_nodes(self):
'''
Create data input nodes.
This method is for old-style inputs, where the input specification
was not treated as a first-class layer in the prototxt.
Newer models use the "Input layer" type.
'''
nodes = [Node(name, NodeKind.Data) for name in self.params.input]
if len(nodes):
input_dim = map(int, self.params.input_dim)
if not input_dim:
if len(self.params.input_shape) > 0:
input_dim = map(int, self.params.input_shape[0].dim)
else:
raise KaffeError('Dimensions for input not specified.')
for node in nodes:
node.output_shape = tuple(input_dim)
return nodes
def build(self):
'''
Builds the graph from the Caffe layer definitions.
'''
# Get the layers
layers = self.params.layers or self.params.layer
# Filter out phase-excluded layers
layers = self.filter_layers(layers)
# Get any separately-specified input layers
nodes = self.make_input_nodes()
nodes += [self.make_node(layer) for layer in layers]
# Initialize the graph
graph = Graph(nodes=nodes, name=self.params.name)
# Connect the nodes
#
# A note on layers and outputs:
# In Caffe, each layer can produce multiple outputs ("tops") from a set of inputs
# ("bottoms"). The bottoms refer to other layers' tops. The top can rewrite a bottom
# (in case of in-place operations). Note that the layer's name is not used for establishing
# any connectivity. It's only used for data association. By convention, a layer with a
# single top will often use the same name (although this is not required).
#
# The current implementation only supports single-output nodes (note that a node can still
# have multiple children, since multiple child nodes can refer to the single top's name).
node_outputs = {}
for layer in layers:
node = graph.get_node(layer.name)
for input_name in layer.bottom:
assert input_name != layer.name
parent_node = node_outputs.get(input_name)
if (parent_node is None) or (parent_node == node):
parent_node = graph.get_node(input_name)
node.add_parent(parent_node)
if len(layer.top) > 1:
raise KaffeError('Multiple top nodes are not supported.')
for output_name in layer.top:
if output_name == layer.name:
# Output is named the same as the node. No further action required.
continue
# There are two possibilities here:
#
# Case 1: output_name refers to another node in the graph.
# This is an "in-place operation" that overwrites an existing node.
# This would create a cycle in the graph. We'll undo the in-placing
# by substituting this node wherever the overwritten node is referenced.
#
# Case 2: output_name violates the convention layer.name == output_name.
# Since we are working in the single-output regime, we can rename it to
# match the layer name.
#
# In both cases, future references to this top are re-routed to this node.
node_outputs[output_name] = node
graph.compute_output_shapes()
return graph
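# Illustrative usage (the prototxt path is hypothetical): build and inspect a
# graph the same way Transformer.load does further below:
#
#   graph = GraphBuilder('deploy.prototxt', phase='test').build()
#   print(graph)  # tabular summary of type / name / param / output shape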
class NodeMapper(NodeDispatch):
def __init__(self, graph):
self.graph = graph
def map(self):
nodes = self.graph.topologically_sorted()
# Remove input nodes - we'll handle them separately.
input_nodes = self.graph.get_input_nodes()
nodes = [t for t in nodes if t not in input_nodes]
# Decompose DAG into chains.
chains = []
for node in nodes:
attach_to_chain = None
if len(node.parents) == 1:
parent = node.get_only_parent()
for chain in chains:
if chain[-1] == parent:
# Node is part of an existing chain.
attach_to_chain = chain
break
if attach_to_chain is None:
# Start a new chain for this node.
attach_to_chain = []
chains.append(attach_to_chain)
attach_to_chain.append(node)
# Map each chain.
mapped_chains = []
for chain in chains:
mapped_chains.append(self.map_chain(chain))
return self.commit(mapped_chains)
def map_chain(self, chain):
return [self.map_node(node) for node in chain]
def map_node(self, node):
map_func = self.get_handler(node.kind, 'map')
mapped_node = map_func(node)
assert mapped_node is not None
mapped_node.node = node
return mapped_node
def commit(self, mapped_chains):
raise NotImplementedError('Must be implemented by subclass.')
import re
import numbers
from collections import namedtuple
from .shapes import *
LAYER_DESCRIPTORS = {
# Caffe Types
'AbsVal': shape_identity,
'Accuracy': shape_scalar,
'ArgMax': shape_not_implemented,
'BatchNorm': shape_identity,
'BNLL': shape_not_implemented,
'Concat': shape_concat,
'ContrastiveLoss': shape_scalar,
'Convolution': shape_convolution,
'Deconvolution': shape_not_implemented,
'Data': shape_data,
'Dropout': shape_identity,
'DummyData': shape_data,
'EuclideanLoss': shape_scalar,
'Eltwise': shape_identity,
'Exp': shape_identity,
'Flatten': shape_not_implemented,
'HDF5Data': shape_data,
'HDF5Output': shape_identity,
'HingeLoss': shape_scalar,
'Im2col': shape_not_implemented,
'ImageData': shape_data,
'InfogainLoss': shape_scalar,
'InnerProduct': shape_inner_product,
'Input': shape_data,
'LRN': shape_identity,
'MemoryData': shape_mem_data,
'MultinomialLogisticLoss': shape_scalar,
'MVN': shape_not_implemented,
'Pooling': shape_pool,
'Power': shape_identity,
'ReLU': shape_identity,
'Scale': shape_identity,
'Sigmoid': shape_identity,
'SigmoidCrossEntropyLoss': shape_scalar,
'Silence': shape_not_implemented,
'Softmax': shape_identity,
'SoftmaxWithLoss': shape_scalar,
'Split': shape_not_implemented,
'Slice': shape_not_implemented,
'TanH': shape_identity,
'WindowData': shape_not_implemented,
'Threshold': shape_identity,
}
# layer types in 'V1LayerParameter'
# (v1layertype name, enum value, mapped to layer type)
v1_layertypes = [
('ABSVAL', 35),
('ACCURACY', 1),
('ARGMAX', 30),
('BNLL', 2),
('CONCAT', 3),
('CONVOLUTION', 4),
('DATA', 5),
('DECONVOLUTION', 39),
('DROPOUT', 6),
('ELTWISE', 25),
('EXP', 38),
('FLATTEN', 8),
('IM2COL', 11),
('INNERPRODUCT', 14),
('LRN', 15),
('MEMORYDATA', 29),
('MULTINOMIALLOGISTICLOSS', 16),
('MVN', 34),
('POOLING', 17),
('POWER', 26),
('RELU', 18),
('SIGMOID', 19),
('SIGMOIDCROSSENTROPYLOSS', 27),
('SILENCE', 36),
('SOFTMAX', 20),
('SPLIT', 22),
('SLICE', 33),
('TANH', 23),
('WINDOWDATA', 24),
('THRESHOLD', 31),
]
LAYER_TYPES = LAYER_DESCRIPTORS.keys()
LayerType = type('LayerType', (), {t: t for t in LAYER_TYPES})
#map the layer name in V1 to standard name
V1_LAYER_MAP = {'_not_init_': True}
def get_v1_layer_map():
global V1_LAYER_MAP
if '_not_init_' not in V1_LAYER_MAP:
return V1_LAYER_MAP
else:
del V1_LAYER_MAP['_not_init_']
name2layer = {}
for n in LAYER_TYPES:
name2layer[n.upper()] = n
for l in v1_layertypes:
n, v = l
if n in name2layer and v not in V1_LAYER_MAP:
V1_LAYER_MAP[v] = name2layer[n]
else:
raise KaffeError('v1 layer type %s not found' % n)
return V1_LAYER_MAP
class NodeKind(LayerType):
@staticmethod
def map_raw_kind(kind):
if kind in LAYER_TYPES:
return kind
v1_layers = get_v1_layer_map()
if kind in v1_layers:
return v1_layers[kind]
else:
return None
@staticmethod
def compute_output_shape(node):
try:
val = LAYER_DESCRIPTORS[node.kind](node)
return val
except NotImplementedError:
raise KaffeError(
'Output shape computation not implemented for type: %s' %
node.kind)
class NodeDispatchError(KaffeError):
pass
class NodeDispatch(object):
@staticmethod
def get_handler_name(node_kind):
if len(node_kind) <= 4:
# A catch-all for things like ReLU and tanh
return node_kind.lower()
# Convert from CamelCase to under_scored
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', node_kind)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
def get_handler(self, node_kind, prefix):
name = self.get_handler_name(node_kind)
name = '_'.join((prefix, name))
try:
return getattr(self, name)
except AttributeError:
raise NodeDispatchError(
'No handler found for node kind: %s (expected: %s)' %
(node_kind, name))
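# Worked examples of the CamelCase -> under_scored handler naming (illustrative):
#
#   NodeDispatch.get_handler_name('InnerProduct')  # -> 'inner_product'
#   NodeDispatch.get_handler_name('ReLU')          # -> 'relu' (short-name catch-all)
#
# so a mapper resolves get_handler('InnerProduct', 'map') to its
# map_inner_product method, as in TensorFlowMapper further below.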
class LayerAdapter(object):
def __init__(self, layer, kind):
self.layer = layer
self.kind = kind
@property
def parameters(self):
name = NodeDispatch.get_handler_name(self.kind)
name = '_'.join((name, 'param'))
try:
return getattr(self.layer, name)
except AttributeError:
raise NodeDispatchError(
'Caffe parameters not found for layer kind: %s' % (self.kind))
@staticmethod
def get_kernel_value(scalar, repeated, idx, default=None):
if scalar:
return scalar
if repeated:
if isinstance(repeated, numbers.Number):
return repeated
if len(repeated) == 1:
# Same value applies to all spatial dimensions
return int(repeated[0])
assert idx < len(repeated)
# Extract the value for the given spatial dimension
return repeated[idx]
if default is None:
raise ValueError('Unable to determine kernel parameter!')
return default
@property
def kernel_parameters(self):
assert self.kind in (NodeKind.Convolution, NodeKind.Pooling)
params = self.parameters
k_h = self.get_kernel_value(params.kernel_h, params.kernel_size, 0)
k_w = self.get_kernel_value(params.kernel_w, params.kernel_size, 1)
s_h = self.get_kernel_value(
params.stride_h, params.stride, 0, default=1)
s_w = self.get_kernel_value(
params.stride_w, params.stride, 1, default=1)
p_h = self.get_kernel_value(params.pad_h, params.pad, 0, default=0)
p_w = self.get_kernel_value(params.pad_w, params.pad, 1, default=0)
return KernelParameters(k_h, k_w, s_h, s_w, p_h, p_w)
KernelParameters = namedtuple('KernelParameters', [
'kernel_h', 'kernel_w', 'stride_h', 'stride_w', 'pad_h', 'pad_w'
])
from .transformer import Transformer
from .network import Network
import math
import os
import numpy as np
def import_fluid():
import paddle.v2.fluid as fluid
return fluid
def layer(op):
'''Decorator for composable network layers.'''
def layer_decorated(self, *args, **kwargs):
# Automatically set a name if not provided.
name = kwargs.setdefault('name', self.get_unique_name(op.__name__))
# Figure out the layer inputs.
if len(self.terminals) == 0:
raise RuntimeError('No input variables found for layer %s.' % name)
elif len(self.terminals) == 1:
layer_input = self.terminals[0]
else:
layer_input = list(self.terminals)
# Perform the operation and get the output.
layer_output = op(self, layer_input, *args, **kwargs)
# Add to layer LUT.
self.layers[name] = layer_output
# This output is now the input for the next layer.
self.feed(layer_output)
#print('output shape of %s:' % (name))
#print layer_output.shape
# Return self for chained calls.
return self
return layer_decorated
class Network(object):
def __init__(self, inputs, trainable=True):
# The input nodes for this network
self.inputs = inputs
# The current list of terminal nodes
self.terminals = []
# Mapping from layer names to layers
self.layers = dict(inputs)
# If true, the resulting variables are set as trainable
self.trainable = trainable
# Switch variable for dropout
self.paddle_env = None
self.setup()
def setup(self):
'''Construct the network. '''
raise NotImplementedError('Must be implemented by the subclass.')
def load(self, data_path, exe=None, place=None, ignore_missing=False):
'''Load network weights.
data_path: The path to the numpy-serialized network weights
ignore_missing: If true, serialized weights for missing layers are ignored.
'''
fluid = import_fluid()
#load fluid mode directly
if os.path.isdir(data_path):
assert (exe is not None), \
'must provide an executor to load fluid model'
fluid.io.load_persistables_if_exist(executor=exe, dirname=data_path)
return True
#load model from a npy file
if exe is None or place is None:
if self.paddle_env is None:
place = fluid.CPUPlace()
exe = fluid.Executor(place)
self.paddle_env = {'place': place, 'exe': exe}
exe.run(fluid.default_startup_program())
else:
place = self.paddle_env['place']
exe = self.paddle_env['exe']
data_dict = np.load(data_path).item()
for op_name in data_dict:
layer = self.layers[op_name]
for param_name, data in data_dict[op_name].iteritems():
try:
name = '%s_%s' % (op_name, param_name)
v = fluid.global_scope().find_var(name)
w = v.get_tensor()
w.set(data, place)
except ValueError:
if not ignore_missing:
raise
return True
def feed(self, *args):
'''Set the input(s) for the next operation by replacing the terminal nodes.
The arguments can be either layer names or the actual layers.
'''
assert len(args) != 0
self.terminals = []
for fed_layer in args:
if isinstance(fed_layer, basestring):
try:
fed_layer = self.layers[fed_layer]
except KeyError:
raise KeyError('Unknown layer name fed: %s' % fed_layer)
self.terminals.append(fed_layer)
return self
def get_output(self):
'''Returns the current network output.'''
return self.terminals[-1]
def get_unique_name(self, prefix):
'''Returns an index-suffixed unique name for the given prefix.
This is used for auto-generating layer names based on the type-prefix.
'''
ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1
return '%s_%d' % (prefix, ident)
@layer
def conv(self,
input,
k_h,
k_w,
c_o,
s_h,
s_w,
name,
relu=True,
padding=None,
group=1,
biased=True):
if padding is None:
padding = [0, 0]
# Get the number of channels in the input
c_i, h_i, w_i = input.shape[1:]
# Verify that the grouping parameter is valid
assert c_i % group == 0
assert c_o % group == 0
fluid = import_fluid()
prefix = name + '_'
output = fluid.layers.conv2d(
input=input,
filter_size=[k_h, k_w],
num_filters=c_o,
stride=[s_h, s_w],
padding=padding,
groups=group,
param_attr=fluid.ParamAttr(name=prefix + "weights"),
bias_attr=fluid.ParamAttr(name=prefix + "biases"),
act="relu" if relu is True else None)
return output
@layer
def relu(self, input, name):
fluid = import_fluid()
output = fluid.layers.relu(x=input)
return output
def _adjust_pad_if_needed(self, i_hw, k_hw, s_hw, p_hw):
#adjust the padding if needed
i_h, i_w = i_hw
k_h, k_w = k_hw
s_h, s_w = s_hw
p_h, p_w = p_hw
def is_consistent(i, k, s, p):
o = i + 2 * p - k
if o % s == 0:
return True
else:
return False
real_p_h = 0
real_p_w = 0
if is_consistent(i_h, k_h, s_h, p_h) is False:
real_p_h = int(k_h / 2)
if is_consistent(i_w, k_w, s_w, p_w) is False:
real_p_w = int(k_w / 2)
return [real_p_h, real_p_w]
def pool(self, pool_type, input, k_h, k_w, s_h, s_w, name, padding):
# Get the number of channels in the input
in_hw = input.shape[2:]
k_hw = [k_h, k_w]
s_hw = [s_h, s_w]
if padding is None:
#work around the padding-rounding difference between conv and pool
#more info: https://github.com/BVLC/caffe/issues/1318
padding = self._adjust_pad_if_needed(in_hw, k_hw, s_hw, [0, 0])
fluid = import_fluid()
output = fluid.layers.pool2d(
input=input,
pool_size=k_hw,
pool_stride=s_hw,
pool_padding=padding,
pool_type=pool_type)
return output
@layer
def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=None):
return self.pool('max', input, k_h, k_w, s_h, s_w, name, padding)
@layer
def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=None):
return self.pool('avg', input, k_h, k_w, s_h, s_w, name, padding)
@layer
def lrn(self, input, radius, alpha, beta, name, bias=1.0):
fluid = import_fluid()
output = fluid.layers.lrn(input=input, \
n=radius, k=bias, alpha=alpha, beta=beta, name=name)
return output
@layer
def concat(self, inputs, axis, name):
fluid = import_fluid()
output = fluid.layers.concat(input=inputs, axis=axis)
return output
@layer
def add(self, inputs, name):
fluid = import_fluid()
output = inputs[0]
for i in inputs[1:]:
output = fluid.layers.elementwise_add(x=output, y=i)
return output
@layer
def fc(self, input, num_out, name, relu=True, act=None):
fluid = import_fluid()
if act is None:
act = 'relu' if relu is True else None
prefix = name + '_'
output = fluid.layers.fc(
name=name,
input=input,
size=num_out,
act=act,
param_attr=fluid.ParamAttr(name=prefix + 'weights'),
bias_attr=fluid.ParamAttr(name=prefix + 'biases'))
return output
@layer
def softmax(self, input, name):
fluid = import_fluid()
output = fluid.layers.softmax(input)
return output
@layer
def batch_normalization(self, input, name, scale_offset=True, relu=False):
# NOTE: Currently, only inference is supported
fluid = import_fluid()
prefix = name + '_'
param_attr = None if scale_offset is False else fluid.ParamAttr(
name=prefix + 'scale')
bias_attr = None if scale_offset is False else fluid.ParamAttr(
name=prefix + 'offset')
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
output = fluid.layers.batch_norm(
name=name,
input=input,
is_test=True,
param_attr=param_attr,
bias_attr=bias_attr,
moving_mean_name=mean_name,
moving_variance_name=variance_name,
epsilon=1e-5,
act='relu' if relu is True else None)
return output
@layer
def dropout(self, input, drop_prob, name, is_test=True):
fluid = import_fluid()
output = fluid.layers.dropout(
input, dropout_prob=drop_prob, is_test=is_test, name=name)
return output
import numpy as np
from ..errors import KaffeError, print_stderr
from ..graph import GraphBuilder, NodeMapper
from ..layers import NodeKind
from ..transformers import (DataInjector, DataReshaper, NodeRenamer, ReLUFuser,
BatchNormScaleBiasFuser, BatchNormPreprocessor,
ParameterNamer)
from . import network
def get_padding_type(kernel_params, input_shape, output_shape):
'''Translates Caffe's numeric padding into the explicit [p_h, p_w] used by
the fluid emitter, returning None when the padding is zero.
This routine is inherited from caffe-tensorflow, where Caffe's arbitrary
padding values had to be mapped onto TensorFlow's 'SAME'/'VALID' modes; the
subtleties of those edge-cases are described here:
https://github.com/Yangqing/caffe2/blob/master/caffe2/proto/caffe2_legacy.proto
'''
k_h, k_w, s_h, s_w, p_h, p_w = kernel_params
if p_h * p_w > 0:
return [p_h, p_w]
else:
return None
class TensorFlowNode(object):
'''An intermediate representation for TensorFlow operations.'''
def __init__(self, op, *args, **kwargs):
# A string corresponding to the TensorFlow operation
self.op = op
# Positional arguments for the operation
self.args = args
# Keyword arguments for the operation
self.kwargs = list(kwargs.items())
# The source Caffe node
self.node = None
def format(self, arg):
'''Returns a string representation for the given value.'''
return "'%s'" % arg if isinstance(arg, basestring) else str(arg)
def pair(self, key, value):
'''Returns key=formatted(value).'''
return '%s=%s' % (key, self.format(value))
def emit(self):
'''Emits the Python source for this node.'''
# Format positional arguments
args = map(self.format, self.args)
# Format any keyword arguments
if self.kwargs:
args += [self.pair(k, v) for k, v in self.kwargs]
# Set the node name
args.append(self.pair('name', self.node.name))
args = ', '.join(args)
return '%s(%s)' % (self.op, args)
class MaybeActivated(object):
def __init__(self, node, default=True):
self.inject_kwargs = {}
if node.metadata.get('relu', False) != default:
self.inject_kwargs['relu'] = not default
def __call__(self, *args, **kwargs):
kwargs.update(self.inject_kwargs)
return TensorFlowNode(*args, **kwargs)
class TensorFlowMapper(NodeMapper):
def get_kernel_params(self, node):
kernel_params = node.layer.kernel_parameters
input_shape = node.get_only_parent().output_shape
padding = get_padding_type(kernel_params, input_shape,
node.output_shape)
# Only emit the padding if it's not the default value.
padding = {'padding': padding} if padding is not None else {}
return (kernel_params, padding)
def map_convolution(self, node):
(kernel_params, kwargs) = self.get_kernel_params(node)
h = kernel_params.kernel_h
w = kernel_params.kernel_w
c_o = node.output_shape[1]
c_i = node.parents[0].output_shape[1]
group = node.parameters.group
if group != 1:
kwargs['group'] = group
if not node.parameters.bias_term:
kwargs['biased'] = False
assert kernel_params.kernel_h == h
assert kernel_params.kernel_w == w
return MaybeActivated(node)(
'conv', kernel_params.kernel_h, kernel_params.kernel_w, c_o,
kernel_params.stride_h, kernel_params.stride_w, **kwargs)
def map_relu(self, node):
return TensorFlowNode('relu')
def map_pooling(self, node):
pool_type = node.parameters.pool
if pool_type == 0:
pool_op = 'max_pool'
elif pool_type == 1:
pool_op = 'avg_pool'
else:
# Stochastic pooling, for instance.
raise KaffeError('Unsupported pooling type.')
(kernel_params, padding) = self.get_kernel_params(node)
return TensorFlowNode(pool_op, kernel_params.kernel_h,
kernel_params.kernel_w, kernel_params.stride_h,
kernel_params.stride_w, **padding)
def map_inner_product(self, node):
#TODO: Axis
assert node.parameters.axis == 1
#TODO: Unbiased
assert node.parameters.bias_term == True
return MaybeActivated(node)('fc', node.parameters.num_output)
def map_softmax(self, node):
return TensorFlowNode('softmax')
def map_lrn(self, node):
params = node.parameters
# The window size must be an odd value. For a window
# size of (2*n+1), TensorFlow defines depth_radius = n.
assert params.local_size % 2 == 1
# Caffe scales by (alpha/(2*n+1)), whereas TensorFlow
# just scales by alpha (as does Krizhevsky's paper).
# We'll account for that here.
alpha = params.alpha / float(params.local_size)
return TensorFlowNode('lrn', params.local_size, alpha, params.beta)
def map_concat(self, node):
return TensorFlowNode('concat', node.parameters.axis)
def map_dropout(self, node):
return TensorFlowNode('dropout', node.parameters.dropout_ratio)
def map_batch_norm(self, node):
scale_offset = len(node.data) == 4
kwargs = {} if scale_offset else {'scale_offset': False}
return MaybeActivated(
node, default=False)('batch_normalization', **kwargs)
def map_eltwise(self, node):
operations = {0: 'multiply', 1: 'add', 2: 'max'}
op_code = node.parameters.operation
try:
return TensorFlowNode(operations[op_code])
except KeyError:
raise KaffeError('Unknown elementwise operation: {}'.format(
op_code))
def commit(self, chains):
return chains
class TensorFlowEmitter(object):
def __init__(self, tab=None):
self.tab = tab or ' ' * 4
self.prefix = ''
self.net_name = ''
def indent(self):
self.prefix += self.tab
def outdent(self):
self.prefix = self.prefix[:-len(self.tab)]
def statement(self, s):
return self.prefix + s + '\n'
def emit_imports(self):
import inspect
codes = []
codes.append(
'### generated by caffe2fluid, your net is in class "%s" ###\n' %
(self.net_name))
network_source = inspect.getsource(network)
codes.append(network_source + '\n')
return self.statement('\n'.join(codes))
def emit_class_def(self, name):
return self.statement('class %s(Network):' % (name))
def emit_setup_def(self):
return self.statement('def setup(self):')
def emit_shape_def(self, input_nodes):
self.outdent()
func_def = self.statement('@classmethod')
func_def += self.statement('def input_shapes(cls):')
self.indent()
input_shapes = {}
for n in input_nodes:
name = n.name
output_shape = n.output_shape
shape = [str(s) for s in output_shape[1:]]
input_shapes[name] = ', '.join(shape)
input_shapes = ['"%s": [%s]' % (n, l) for n, l in input_shapes.items()]
shape_str = ','.join(input_shapes)
func_def += self.statement('return {%s}' % (shape_str))
return '\n\n' + func_def
def emit_convert_def(self, input_nodes):
codes = []
inputs = {}
codes.append('shapes = cls.input_shapes()')
for n in input_nodes:
name = n.name
layer_var = name + '_layer'
layer_def = '%s = fluid.layers.data(name="%s", shape=shapes["%s"],'\
' dtype="float32")' % (layer_var, name, name)
#layer_var, layer_def = data_layer_def(n.name, n.output_shape)
codes.append(layer_def)
inputs[name] = layer_var
input_dict = ','.join(['"%s": %s' % (n, l) for n, l in inputs.items()])
codes.append('feed_data = {' + input_dict + '}')
codes.append('net = cls(feed_data)')
codes.append("place = fluid.CPUPlace()")
codes.append("exe = fluid.Executor(place)")
codes.append("exe.run(fluid.default_startup_program())")
codes.append("net.load(data_path=npy_model, exe=exe, place=place)")
codes.append(
"fluid.io.save_persistables(executor=exe, dirname=fluid_path)")
self.outdent()
func_def = self.statement('@classmethod')
func_def += self.statement('def convert(cls, npy_model, fluid_path):')
self.indent()
func_def += self.statement('import paddle.v2.fluid as fluid')
for l in codes:
func_def += self.statement(l)
return '\n' + func_def
def emit_main_def(self, name):
if name is None:
return ''
self.prefix = ''
main_def = self.statement('if __name__ == "__main__":')
self.indent()
main_def += self.statement("#usage: python xxxnet.py xxx.npy ./model\n")
main_def += self.statement("import sys")
main_def += self.statement("npy_weight = sys.argv[1]")
main_def += self.statement("fluid_model = sys.argv[2]")
main_def += self.statement("%s.convert(npy_weight, fluid_model)" %
(name))
main_def += self.statement("exit(0)")
return '\n\n' + main_def
def emit_parents(self, chain):
assert len(chain)
s = 'self.feed('
sep = ', \n' + self.prefix + (' ' * len(s))
s += sep.join(
["'%s'" % parent.name for parent in chain[0].node.parents])
return self.statement(s + ')')
def emit_node(self, node):
return self.statement('self.' + node.emit())
def emit(self, name, chains, input_nodes=None):
self.net_name = name
s = self.emit_imports()
s += self.emit_class_def(name)
self.indent()
s += self.emit_setup_def()
self.indent()
blocks = []
for chain in chains:
b = ''
b += self.emit_parents(chain)
for node in chain:
b += self.emit_node(node)
blocks.append(b[:-1])
s = s + '\n\n'.join(blocks)
s += self.emit_shape_def(input_nodes)
s += self.emit_convert_def(input_nodes)
s += self.emit_main_def(name)
return s
class Transformer(object):
def __init__(self, def_path, data_path, verbose=True, phase='test'):
self.verbose = verbose
self.phase = phase
self.load(def_path, data_path, phase)
self.params = None
self.source = None
def load(self, def_path, data_path, phase):
# Build the graph
graph = GraphBuilder(def_path, phase).build()
if data_path is not None:
# Load and associate learned parameters
graph = DataInjector(def_path, data_path)(graph)
# Transform the graph
transformers = [
# Fuse split batch normalization layers
BatchNormScaleBiasFuser(),
# Fuse ReLUs
# TODO: Move non-linearity application to layer wrapper, allowing
# any arbitrary operation to be optionally activated.
ReLUFuser(allowed_parent_types=[
NodeKind.Convolution, NodeKind.InnerProduct, NodeKind.BatchNorm
]),
# Rename nodes
# Slashes are used for scoping in TensorFlow. Replace slashes
# in node names with underscores.
# (Caffe's GoogLeNet implementation uses slashes)
NodeRenamer(lambda node: node.name.replace('/', '_'))
]
self.graph = graph.transformed(transformers)
# Display the graph
if self.verbose:
print_stderr(self.graph)
def transform_data(self):
if self.params is None:
transformers = [
# Reshape the parameters to the ordering expected by the target framework
DataReshaper({
# (c_o, c_i, h, w) is kept as-is here (an identity transpose)
NodeKind.Convolution: (0, 1, 2, 3),
# (c_o, c_i) -> (c_i, c_o)
NodeKind.InnerProduct: (1, 0)
}),
# Pre-process batch normalization data
BatchNormPreprocessor(),
# Convert parameters to dictionaries
ParameterNamer(),
]
self.graph = self.graph.transformed(transformers)
self.params = {
node.name: node.data
for node in self.graph.nodes if node.data
}
return self.params
def transform_source(self):
if self.source is None:
mapper = TensorFlowMapper(self.graph)
chains = mapper.map()
emitter = TensorFlowEmitter()
input_nodes = self.graph.get_input_nodes()
self.source = emitter.emit(self.graph.name, chains, input_nodes)
return self.source
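# Minimal usage sketch (file names are hypothetical): load a Caffe model,
# dump its parameters as a .npy file, and write the generated network source.
#
#   t = Transformer('alexnet.prototxt', 'alexnet.caffemodel')
#   np.save('alexnet.npy', t.transform_data())
#   with open('alexnet.py', 'w') as f:
#       f.write(t.transform_source())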
import math
from collections import namedtuple
from .errors import KaffeError
TensorShape = namedtuple('TensorShape',
['batch_size', 'channels', 'height', 'width'])
def get_filter_output_shape(i_h, i_w, params, round_func):
o_h = (i_h + 2 * params.pad_h - params.kernel_h
) / float(params.stride_h) + 1
o_w = (i_w + 2 * params.pad_w - params.kernel_w
) / float(params.stride_w) + 1
return (int(round_func(o_h)), int(round_func(o_w)))
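# Worked example (values chosen for illustration): a 224x224 input with a
# 7x7 kernel, stride 2 and padding 3 gives
#   o = (224 + 2*3 - 7) / 2 + 1 = 112.5
# which math.floor (used for convolution below) rounds to 112, while
# math.ceil (used for pooling) would round it up to 113.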
def get_strided_kernel_output_shape(node, round_func):
assert node.layer is not None
input_shape = node.get_only_parent().output_shape
o_h, o_w = get_filter_output_shape(input_shape.height, input_shape.width,
node.layer.kernel_parameters, round_func)
params = node.layer.parameters
has_c_o = hasattr(params, 'num_output')
c = params.num_output if has_c_o else input_shape.channels
return TensorShape(input_shape.batch_size, c, o_h, o_w)
def shape_not_implemented(node):
raise NotImplementedError
def shape_identity(node):
assert len(node.parents) > 0
return node.parents[0].output_shape
def shape_scalar(node):
return TensorShape(1, 1, 1, 1)
def shape_data(node):
if node.output_shape:
# Old-style input specification
return node.output_shape
try:
# New-style input specification
return list(map(int, node.parameters.shape[0].dim))
except (AttributeError, IndexError):
# We most likely have a data layer on our hands. The problem is,
# Caffe infers the dimensions of the data from the source (eg: LMDB).
# We want to avoid reading datasets here. Fail for now.
# This can be temporarily fixed by transforming the data layer to
# Caffe's "input" layer (as is usually used in the "deploy" version).
# TODO: Find a better solution for this.
raise KaffeError('Cannot determine dimensions of data layer.\n'
'See comments in function shape_data for more info.')
def shape_mem_data(node):
params = node.parameters
return TensorShape(params.batch_size, params.channels, params.height,
params.width)
def shape_concat(node):
axis = node.layer.parameters.axis
output_shape = None
for parent in node.parents:
if output_shape is None:
output_shape = list(parent.output_shape)
else:
output_shape[axis] += parent.output_shape[axis]
return TensorShape(*output_shape)
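# For example (illustrative shapes): concatenating parents with output
# shapes (1, 64, 56, 56) and (1, 32, 56, 56) along axis=1 yields
# (1, 96, 56, 56); every dimension other than `axis` must already agree.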
def shape_convolution(node):
return get_strided_kernel_output_shape(node, math.floor)
def shape_pool(node):
return get_strided_kernel_output_shape(node, math.ceil)
def shape_inner_product(node):
input_shape = node.get_only_parent().output_shape
return TensorShape(input_shape.batch_size, node.layer.parameters.num_output,
1, 1)
'''
A collection of graph transforms.
A transformer is a callable that accepts a graph and returns a transformed version.
'''
import os
import numpy as np
from .caffe import get_caffe_resolver, has_pycaffe
from .errors import KaffeError, debug, notice, warn
from .layers import NodeKind
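# A minimal sketch of the transformer protocol described in the module
# docstring above (the class name is hypothetical): any callable that maps a
# graph to a graph qualifies, including one that returns it unchanged.
#
#   class NoOpTransformer(object):
#       def __call__(self, graph):
#           return graph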
class DataInjector(object):
'''
Associates parameters loaded from a .caffemodel file with their corresponding nodes.
'''
def __init__(self, def_path, data_path):
# The .prototxt file defining the graph
self.def_path = def_path
# The .caffemodel file containing the learned parameters
self.data_path = data_path
# Set to true if the fallback protocol-buffer based backend was used
self.did_use_pb = False
# A list containing (layer name, parameters) tuples
self.params = None
# Load the parameters
self.load()
def load(self):
if has_pycaffe():
self.load_using_caffe()
else:
self.load_using_pb()
def load_using_caffe(self):
caffe = get_caffe_resolver().caffe
net = caffe.Net(self.def_path, self.data_path, caffe.TEST)
data = lambda blob: blob.data
self.params = [(k, map(data, v)) for k, v in net.params.items()]
def load_using_pb(self):
data = get_caffe_resolver().NetParameter()
with open(self.data_path, 'rb') as f:
data.MergeFromString(f.read())
pair = lambda layer: (layer.name, self.normalize_pb_data(layer))
layers = data.layers or data.layer
self.params = [pair(layer) for layer in layers if layer.blobs]
self.did_use_pb = True
def normalize_pb_data(self, layer):
transformed = []
for blob in layer.blobs:
if len(blob.shape.dim):
dims = blob.shape.dim
c_o, c_i, h, w = map(int, [1] * (4 - len(dims)) + list(dims))
else:
c_o = blob.num
c_i = blob.channels
h = blob.height
w = blob.width
data = np.array(blob.data, dtype=np.float32).reshape(c_o, c_i, h, w)
transformed.append(data)
return transformed
def adjust_parameters(self, node, data):
if not self.did_use_pb:
return data
# When using the protobuf-backend, each parameter initially has four dimensions.
# In certain cases (like FC layers), we want to eliminate the singleton dimensions.
# This implementation takes care of the common cases. However, it does leave the
# potential for future issues.
# The Caffe-backend does not suffer from this problem.
data = list(data)
squeeze_indices = [1] # Squeeze biases.
if node.kind == NodeKind.InnerProduct:
squeeze_indices.append(0) # Squeeze FC.
for idx in squeeze_indices:
if idx >= len(data):
continue
shape_old = data[idx].shape
data[idx] = np.squeeze(data[idx])
shape_new = data[idx].shape
if len(shape_old) != len(shape_new):
debug('squeeze idx:%d, with kind:%s,name:%s' % \
(idx, node.kind, node.name))
return data
def __call__(self, graph):
for layer_name, data in self.params:
if layer_name in graph:
node = graph.get_node(layer_name)
node.data = self.adjust_parameters(node, data)
else:
notice('Ignoring parameters for non-existent layer: %s' % \
layer_name)
return graph
class DataReshaper(object):
def __init__(self, mapping, replace=True):
# A dictionary mapping NodeKind to the transposed order.
self.mapping = mapping
# The node kinds eligible for reshaping
self.reshaped_node_types = self.mapping.keys()
# If true, the reshaped data will replace the old one.
# Otherwise, it's set to the reshaped_data attribute.
self.replace = replace
def has_spatial_parent(self, node):
try:
parent = node.get_only_parent()
s = parent.output_shape
return s.height > 1 or s.width > 1
except KaffeError:
return False
def map(self, node_kind):
try:
return self.mapping[node_kind]
except KeyError:
raise KaffeError('Ordering not found for node kind: {}'.format(node_kind))
def __call__(self, graph):
for node in graph.nodes:
if node.data is None:
continue
if node.kind not in self.reshaped_node_types:
# Check for 2+ dimensional data
if any(len(tensor.shape) > 1 for tensor in node.data):
notice('parameters not reshaped for node: {}'.format(node))
continue
transpose_order = self.map(node.kind)
weights = node.data[0]
if (node.kind == NodeKind.InnerProduct
) and self.has_spatial_parent(node):
# The FC layer connected to the spatial layer needs to be
# re-wired to match the new spatial ordering.
in_shape = node.get_only_parent().output_shape
fc_shape = weights.shape
output_channels = fc_shape[0]
weights = weights.reshape((output_channels, -1))
weights = weights.transpose(transpose_order)
node.reshaped_data = weights
else:
node.reshaped_data = weights.transpose(transpose_order)
if self.replace:
for node in graph.nodes:
if hasattr(node, 'reshaped_data'):
# Set the weights
node.data[0] = node.reshaped_data
del node.reshaped_data
return graph
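# Worked example (illustrative shapes): Caffe stores fully connected weights
# as (c_o, c_i), e.g. (4096, 9216); the (1, 0) order above transposes them
# to (9216, 4096), i.e. (input_dim, output_dim). With the (0, 1, 2, 3)
# identity order, convolution weights keep Caffe's (c_o, c_i, h, w) layout.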
class SubNodeFuser(object):
'''
An abstract helper for merging a single-child with its single-parent.
'''
def __call__(self, graph):
nodes = graph.nodes
fused_nodes = []
for node in nodes:
if len(node.parents) != 1:
# We're only fusing nodes with single parents
continue
parent = node.get_only_parent()
if len(parent.children) != 1:
# We can only fuse a node if its parent's
# value isn't used by any other node.
continue
if not self.is_eligible_pair(parent, node):
continue
# Rewrite the fused node's children to its parent.
for child in node.children:
child.parents.remove(node)
parent.add_child(child)
# Disconnect the fused node from the graph.
parent.children.remove(node)
fused_nodes.append(node)
# Let the sub-class merge the fused node in any arbitrary way.
self.merge(parent, node)
transformed_nodes = [node for node in nodes if node not in fused_nodes]
return graph.replaced(transformed_nodes)
def is_eligible_pair(self, parent, child):
'''Returns true if this parent/child pair is eligible for fusion.'''
raise NotImplementedError('Must be implemented by subclass.')
def merge(self, parent, child):
'''Merge the child node into the parent.'''
raise NotImplementedError('Must be implemented by subclass')
class ReLUFuser(SubNodeFuser):
'''
Fuses rectified linear units with their parent nodes.
'''
def __init__(self, allowed_parent_types=None):
# Fuse ReLUs when the parent node is one of the given types.
# If None, all node types are eligible.
self.allowed_parent_types = allowed_parent_types
def is_eligible_pair(self, parent, child):
return ((self.allowed_parent_types is None or \
parent.kind in self.allowed_parent_types) and \
child.kind == NodeKind.ReLU)
def merge(self, parent, _):
parent.metadata['relu'] = True
class BatchNormScaleBiasFuser(SubNodeFuser):
'''
The original batch normalization paper includes two learned
parameters: a scaling factor \gamma and a bias \beta.
Caffe's implementation does not include these two. However, it is commonly
replicated by adding a scaling+bias layer immediately after the batch norm.
This fuser merges the scaling+bias layer with the batch norm.
'''
def is_eligible_pair(self, parent, child):
return (parent.kind == NodeKind.BatchNorm and \
child.kind == NodeKind.Scale and \
child.parameters.axis == 1 and \
child.parameters.bias_term)
def merge(self, parent, child):
parent.scale_bias_node = child
class BatchNormPreprocessor(object):
'''
Prescale batch normalization parameters.
Concatenate gamma (scale) and beta (bias) terms if set.
'''
def __call__(self, graph):
for node in graph.nodes:
if node.kind != NodeKind.BatchNorm:
continue
assert node.data is not None
assert len(node.data) == 3
node.data = [np.squeeze(i) for i in node.data]
mean, variance, scale = node.data
# Prescale the stats
scaling_factor = 1.0 / scale if scale != 0 else 0
mean *= scaling_factor
variance *= scaling_factor
# Replace with the updated values
node.data = [mean, variance]
if hasattr(node, 'scale_bias_node'):
# Include the scale and bias terms
gamma, beta = node.scale_bias_node.data
node.data += [np.squeeze(i) for i in [gamma, beta]]
return graph
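# Numeric sketch of the prescaling above (illustrative values): Caffe keeps
# a moving-average scale factor alongside the accumulated statistics, so raw
# mean=2.0, variance=8.0 with scale=4.0 become mean = 2.0 * (1/4.0) = 0.5
# and variance = 8.0 * (1/4.0) = 2.0 after this pass.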
class NodeRenamer(object):
'''
Renames nodes in the graph using a given unary function that
accepts a node and returns its new name.
'''
def __init__(self, renamer):
self.renamer = renamer
def __call__(self, graph):
for node in graph.nodes:
node.name = self.renamer(node)
return graph
class ParameterNamer(object):
'''
Convert layer data arrays to a dictionary mapping parameter names to their values.
'''
def __call__(self, graph):
for node in graph.nodes:
if node.data is None:
continue
if node.kind in (NodeKind.Convolution, NodeKind.InnerProduct):
names = ('weights', )
if node.parameters.bias_term:
names += ('biases', )
elif node.kind == NodeKind.BatchNorm:
names = ('mean', 'variance')
if len(node.data) == 4:
names += ('scale', 'offset')
else:
warn('Unhandled parameters: {}'.format(node.kind))
continue
assert len(names) == len(node.data)
node.data = dict(zip(names, node.data))
return graph
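# For example (illustrative data): after this pass a Convolution node whose
# data list was [W, b] carries {'weights': W, 'biases': b}, and a BatchNorm
# node fused with its scale/bias layer carries
# {'mean': m, 'variance': v, 'scale': gamma, 'offset': beta}.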
#!/bin/bash
#function:
# script used to generate caffepb.py from caffe.proto using protoc
#
PROTOC=$(which protoc)
if [[ -z "$PROTOC" ]]; then
echo "protoc not found; please install it first (see https://github.com/google/protobuf/releases)"
exit 1
fi
WORK_ROOT=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")
PY_NAME="$WORK_ROOT/caffepb.py"
$PROTOC --proto_path=$WORK_ROOT --python_out=$WORK_ROOT $WORK_ROOT/caffe.proto
ret=$?
if [ $ret -eq 0 ];then
mv $WORK_ROOT/caffe_pb2.py $PY_NAME
fi
if [ -e "$PY_NAME" ];then
echo "succeed to generate [$PY_NAME]"
exit 0
else
echo "failed to generate [$PY_NAME]"
fi
exit $ret
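# Example run (assuming this script sits next to caffe.proto; the script
# name is hypothetical):
#   bash ./compile_caffe_proto.sh
#   # -> succeeded in generating [<WORK_ROOT>/caffepb.py]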
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
This is a collection of example models for neural machine translation and neural sequence modeling.
### TODO
This project is still under active development.
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# MobileNet-SSD
This model, built with PaddlePaddle Fluid, is still under active development and is not yet the final version. Feedback is welcome.