Commit 18e5edf6 authored by gangliao, committed by GitHub

Merge pull request #1976 from luotao1/release/0.10.0

Merge commits from the release/0.10.0 branch into the develop branch
# Release v0.10.0
We are glad to release version 0.10.0. In this version, we are happy to release the new
[Python API](http://research.baidu.com/paddlepaddles-new-api-simplifies-deep-learning-programs/).
- Our old Python API is out of date. It is hard to learn and hard to
  use. To write a PaddlePaddle program using the old API, we had to write
  at least two Python files: a `data provider` and another file that defines
  the network topology. Users start a PaddlePaddle job by running the
  `paddle_trainer` C++ program, which calls the Python interpreter to run the
  network topology configuration script and then starts the training loop,
  which iteratively calls the data provider function to load minibatches.
  This prevents us from writing a Python program in a modern way, e.g., in a
  Jupyter Notebook.
- The new API, which we often refer to as the *v2 API*, allows us to write
  much shorter Python programs that define the network and the data in a single
  .py file. Such a program can also run in a Jupyter Notebook, since the entry
  point is a Python program and PaddlePaddle runs as a shared library loaded
  and invoked by that Python program. A minimal sketch of such a single-file
  program follows this list.
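For illustration, here is a minimal sketch of a single-file v2 program, in the spirit of the Deep Learning 101 examples (the MNIST classifier and its hyperparameters are illustrative, not part of these release notes):

```python
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# The topology and the data pipeline live together in one .py file
# (or one notebook cell) -- no separate data provider script.
images = paddle.layer.data(
    name='pixel', type=paddle.data_type.dense_vector(784))
label = paddle.layer.data(
    name='label', type=paddle.data_type.integer_value(10))
predict = paddle.layer.fc(
    input=images, size=10, act=paddle.activation.Softmax())
cost = paddle.layer.classification_cost(input=predict, label=label)

parameters = paddle.parameters.create(cost)
trainer = paddle.trainer.SGD(
    cost=cost,
    parameters=parameters,
    update_equation=paddle.optimizer.Momentum(learning_rate=0.01))

# The entry point is plain Python: PaddlePaddle is a library call away.
trainer.train(
    reader=paddle.batch(paddle.dataset.mnist.train(), batch_size=128),
    num_passes=5)
```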
Based on the new API, we delivered an online interactive
book, [Deep Learning 101](http://book.paddlepaddle.org/index.en.html),
and [its Chinese version](http://book.paddlepaddle.org/).

We also worked on updating our online documentation to describe the new API,
but this is ongoing work; we will release more documentation improvements in
the next version.

We also worked on bringing the new API to distributed model training (via MPI
and Kubernetes). This work is ongoing; we will release more about it in the
next version.
## New Features
* Release the [new Python API](http://research.baidu.com/paddlepaddles-new-api-simplifies-deep-learning-programs/).
* Deep Learning 101 book in [English](http://book.paddlepaddle.org/index.en.html) and [Chinese](http://book.paddlepaddle.org/).
* Support rectangular input for CNNs.
* Support stride pooling for `seqlastin` and `seqfirstin`.
* Expose `seq_concat_layer/seq_reshape_layer` in `trainer_config_helpers`.
* Add dataset package: CIFAR, MNIST, IMDB, WMT14, CONLL05, movielens, imikolov.
* Add PriorBox layer for Single Shot Multibox Detection.
* Add smooth L1 cost (see the note after this list).
* Add data reader creator and data reader decorator for the v2 API (see the sketch after this list).
* Add the CPU implementation of cmrnorm projection.
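For reference, the smooth L1 cost is the usual Huber-style loss from the detection literature (a sketch of the standard definition; the exact scaling in Paddle's implementation may differ):

```latex
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
  0.5\,x^{2} & \text{if } |x| < 1,\\
  |x| - 0.5  & \text{otherwise.}
\end{cases}
```

And a minimal sketch of the reader creator / reader decorator pattern in the v2 API (`make_reader` is an illustrative helper, not a Paddle API; `paddle.reader.shuffle` and `paddle.batch` are the built-in decorators as we understand them):

```python
import paddle.v2 as paddle

# A "reader creator" returns a reader: a zero-argument function that
# yields one training sample per iteration.
def make_reader(samples):
    def reader():
        for features, label in samples:
            yield features, label
    return reader

# "Reader decorators" wrap a reader to add behavior such as shuffling
# and batching without touching the underlying data source.
train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=8192),
    batch_size=128)
```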
## Improvements
* Support Python virtualenv for `paddle_trainer`.
* Add pre-commit hooks to automatically format our code.
* Upgrade protobuf to version 3.x.
* Add an option to check data types in the Python data provider.
* Speed up the backward pass of the average layer on GPU.
* Documentation refinement.
* Check dead links in documents using Travis-CI.
* Add an example explaining `sparse_vector`.
* Simplify the data processing flow for Quick Start.
* Support cuDNN deconvolution.
* Add a data feeder in the v2 API.
* Support predicting samples from `sys.stdin` in the sentiment demo.
* Provide a multi-process interface for image preprocessing.
* Add a benchmark document for the v1 API.
* Add ReLU in `layer_math.py`.
* Add packages for automatically downloading public datasets.
* Rename `Argument::sumCost` to `Argument::sum`, since class `Argument` has nothing to do with cost.
* Expose `Argument::sum` to Python.
* Add a new `TensorExpression` implementation for matrix-related expression evaluations.
* Add lazy assignment for optimizing the calculation of a batch of multiple expressions.
* Add abstract class `Function` and its implementations:
  * `PadFunc` and `PadGradFunc`.
  * `ContextProjectionForwardFunc` and `ContextProjectionBackwardFunc`.
  * `CosSimForwardFunc` and `CosSimBackwardFunc`.
  * `CrossMapNormalFunc` and `CrossMapNormalGradFunc`.
  * `MulFunc`.
* Add classes `AutoCompare` and `FunctionCompare`, which make it easier to write unit tests comparing the GPU and CPU versions of a function.
* Generate `libpaddle_test_main.a` and remove the main function inside the test files.
* Support dense numpy vectors in `PyDataProvider2`.
* Clean up the code base and remove some copy-and-pasted code snippets:
  * Extract a `RowBuffer` class for `SparseRowMatrix`.
  * Clean the interface of `GradientMachine`.
  * Use the `override` keyword in layers.
  * Simplify `Evaluator::create`; use `ClassRegister` to create `Evaluator`s.
* Check the MD5 checksum when downloading demo datasets.
* Add `paddle::Error`, which intentionally replaces `LOG(FATAL)` in Paddle.
## Bug Fixes
* Check layer input types for `recurrent_group`.
* Don't run `clang-format` on .cu source files.
* Fix bugs in `LogActivation`.
* Fix the bug that ran `test_layerHelpers` multiple times.
* Fix the bug that made the seq2seq demo exceed the protobuf message size limit.
* Fix a bug in the data provider converter in GPU mode.
* Fix a bug in `GatedRecurrentLayer`.
* Fix a bug in `BatchNorm` when testing more than one model.
* Fix the broken unit test of `paramRelu`.
* Fix some compile-time warnings about `CpuSparseMatrix`.
* Fix `MultiGradientMachine` errors when `trainer_count > batch_size`.
* Fix bugs that prevented asynchronous data loading in `PyDataProvider2`.
# Release v0.9.0
......
set(CPACK_PACKAGE_NAME paddle)
-set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "")
set(CPACK_PACKAGE_VERSION_MAJOR ${PADDLE_MAJOR_VERSION})
set(CPACK_PACKAGE_VERSION_MINOR ${PADDLE_MINOR_VERSION})
set(CPACK_PACKAGE_VERSION_PATCH ${PADDLE_PATCH_VERSION})
@@ -10,8 +9,9 @@ set(CPACK_DEBIAN_PACKAGE_ARCHITECTURE amd64)
set(CPACK_DEBIAN_PACKAGE_MAINTAINER PaddlePaddle Dev <paddle-dev@baidu.com>)
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "Paddle")
set(CPACK_PACKAGE_DESCRIPTION "")
-set(CPACK_DEBIAN_PACKAGE_DEPENDS "libatlas3-base, libgflags2, libgoogle-glog0, libprotobuf8, libpython2.7, libstdc++6, python-numpy, python-pip, python-pip-whl, python-protobuf")
+set(CPACK_DEBIAN_PACKAGE_DEPENDS "libpython2.7-dev, libstdc++6, python-pip, curl, libgfortran3, python-pip-whl")
set(CPACK_DEBIAN_PACKAGE_SECTION Devel)
+set(CPACK_DEBIAN_PACKAGE_VERSION ${PADDLE_VERSION})
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${PROJ_ROOT}/paddle/scripts/deb/postinst")
#set(CPACK_GENERATOR "DEB")
# Start cpack
......
@@ -29,7 +29,7 @@ settings(
    batch_size=128,
    learning_rate=2e-3,
    learning_method=AdamOptimizer(),
-   average_window=0.5,
+   model_average=ModelAverage(0.5),
    regularization=L2Regularization(8e-4),
    gradient_clipping_threshold=25)
......
@@ -69,7 +69,8 @@ def gru_encoder_decoder(data_conf,
                        encoder_size=512,
                        decoder_size=512,
                        beam_size=3,
-                       max_length=250):
+                       max_length=250,
+                       error_clipping=50):
    """
    A wrapper for an attention version of GRU Encoder-Decoder network
    is_generating: whether this config is used for generating
@@ -90,9 +91,19 @@ def gru_encoder_decoder(data_conf,
        input=src_word_id,
        size=word_vector_dim,
        param_attr=ParamAttr(name='_source_language_embedding'))
-    src_forward = simple_gru(input=src_embedding, size=encoder_size)
+    src_forward = simple_gru(
+        input=src_embedding,
+        size=encoder_size,
+        naive=True,
+        gru_layer_attr=ExtraLayerAttribute(
+            error_clipping_threshold=error_clipping))
    src_backward = simple_gru(
-        input=src_embedding, size=encoder_size, reverse=True)
+        input=src_embedding,
+        size=encoder_size,
+        reverse=True,
+        naive=True,
+        gru_layer_attr=ExtraLayerAttribute(
+            error_clipping_threshold=error_clipping))
    encoded_vector = concat_layer(input=[src_forward, src_backward])
    with mixed_layer(size=decoder_size) as encoded_proj:
@@ -117,11 +128,13 @@ def gru_encoder_decoder(data_conf,
        decoder_inputs += full_matrix_projection(input=context)
        decoder_inputs += full_matrix_projection(input=current_word)
-        gru_step = gru_step_layer(
+        gru_step = gru_step_naive_layer(
            name='gru_decoder',
            input=decoder_inputs,
            output_mem=decoder_mem,
-            size=decoder_size)
+            size=decoder_size,
+            layer_attr=ExtraLayerAttribute(
+                error_clipping_threshold=error_clipping))
        with mixed_layer(
            size=target_dict_dim, bias_attr=True,
......
@@ -2,7 +2,8 @@
 ============
 .. toctree::
-  :maxdepth: 2
+  :maxdepth: 1
   build_and_install/index_cn.rst
-  basic_usage/index_cn.rst
+- `深度学习入门课程 <http://book.paddlepaddle.org/>`_
@@ -2,7 +2,8 @@ GET STARTED
 ============
 .. toctree::
-  :maxdepth: 2
+  :maxdepth: 1
   build_and_install/index_en.rst
-  basic_usage/index_en.rst
+- `Deep Learning 101 <http://book.paddlepaddle.org/index.en.html>`_
@@ -19,18 +19,18 @@
In PaddlePaddle, the following layers can take a double-level sequence as input and perform the corresponding computation.

-pooling_layer
-==============
+pooling
+========

-An example of using pooling_layer follows; see the :ref:`api_trainer_config_helpers_layers_pooling_layer` configuration API for details.
+An example of using pooling follows; see the :ref:`api_v2.layer_pooling` configuration API for details.

.. code-block:: bash

-    seq_pool = pooling_layer(input=layer,
-                             pooling_type=AvgPooling(),
+    seq_pool = pooling(input=layer,
+                       pooling_type=pooling.Max(),
                       agg_level=AggregateLevel.EACH_SEQUENCE)

-- `pooling_type` currently supports two kinds: MaxPooling() and AvgPooling().
+- `pooling_type` currently supports two kinds: pooling.Max() and pooling.Avg().

- When `agg_level=AggregateLevel.EACH_TIMESTEP` (the default):

@@ -47,7 +47,7 @@
last_seq and first_seq
=====================

-An example of using last_seq follows ( :ref:`api_trainer_config_helpers_layers_first_seq` is similar); see the :ref:`api_trainer_config_helpers_layers_last_seq` configuration API for details.
+An example of using last_seq follows ( :ref:`api_v2.layer_first_seq` is similar); see the :ref:`api_v2.layer_last_seq` configuration API for details.

.. code-block:: bash

@@ -65,14 +65,14 @@
- Input: must be a double-level sequence.
- Output: a single-level sequence, where each element is the last (or first) element of each subsequence in the double-level sequence.

-expand_layer
-============
+expand
+======

-An example of using expand_layer follows; see the :ref:`api_trainer_config_helpers_layers_expand_layer` configuration API for details.
+An example of using expand follows; see the :ref:`api_v2.layer_expand` configuration API for details.

.. code-block:: bash

-    expand = expand_layer(input=layer1,
+    ex = expand(input=layer1,
                expand_as=layer2,
                expand_level=ExpandLevel.FROM_TIMESTEP)
......
@@ -4,7 +4,6 @@ RNN相关模型
 .. toctree::
   :maxdepth: 1
-  rnn_config_cn.rst
   recurrent_group_cn.md
   hierarchical_layer_cn.rst
   hrnn_rnn_api_compare_cn.rst

RNN Models
==========
-.. toctree::
-  :maxdepth: 1
-  rnn_config_en.rst
@@ -5,7 +5,6 @@ PaddlePaddle 文档
   :maxdepth: 1
   getstarted/index_cn.rst
-  tutorials/index_cn.md
   howto/index_cn.rst
   api/index_cn.rst
   faq/index_cn.rst

@@ -5,8 +5,6 @@ PaddlePaddle Documentation
   :maxdepth: 1
   getstarted/index_en.rst
-  tutorials/index_en.md
   howto/index_en.rst
   api/index_en.rst
   about/index_en.rst
\ No newline at end of file
@@ -114,10 +114,7 @@
 </ul>
 </div>
 <ul class="site-page-links">
-  <li><a>Home</a></li>
-  <li><a>Get Started</a></li>
-  <li class="active"><a>Documentation</a></li>
-  <li><a>About Us</a></li>
+  <li><a href="/">Home</a></li>
 </ul>
 </div>
 <div class="doc-module">
@@ -137,7 +134,7 @@
 {{ toctree }}
 {% endblock %}
 </nav>
-{% if toc %}
+{% if False %}
 <nav class="local-toc">{{ toc }}</nav>
 {% endif %}
 <section class="doc-content-wrap">
@@ -168,7 +165,8 @@
 VERSION:'{{ release|e }}',
 COLLAPSE_INDEX:false,
 FILE_SUFFIX:'{{ '' if no_search_suffix else file_suffix }}',
-HAS_SOURCE: {{ has_source|lower }}
+HAS_SOURCE: {{ has_source|lower }},
+SOURCELINK_SUFFIX: ".txt",
 };
 </script>
 {%- for scriptfile in script_files %}
......
@@ -12,7 +12,7 @@ endif()
add_library(paddle_function STATIC ${cpp_files} ${cu_objs})
add_dependencies(paddle_function ${external_project_dependencies})
+add_dependencies(paddle_function gen_proto_cpp)
if(WITH_GPU)
if(WITH_TESTING)
......
@@ -48,8 +48,7 @@ lstm = lstmemory_group(
    size=hidden_dim,
    act=TanhActivation(),
    gate_act=SigmoidActivation(),
-    state_act=TanhActivation(),
-    lstm_layer_attr=ExtraLayerAttribute(error_clipping_threshold=50))
+    state_act=TanhActivation())
lstm_last = last_seq(input=lstm)
......
@@ -51,8 +51,7 @@ def lstm_group(lstm_group_input):
        size=hidden_dim,
        act=TanhActivation(),
        gate_act=SigmoidActivation(),
-        state_act=TanhActivation(),
-        lstm_layer_attr=ExtraLayerAttribute(error_clipping_threshold=50))
+        state_act=TanhActivation())
    return lstm_output
......
+#!/bin/bash
+set -e
+echo "Post install paddle debian package."
+echo "Install some python package used for paddle. You can run "
+echo "    pip install /usr/opt/paddle/share/wheels/*.whl to install them."
+find /usr/ -name '*paddle*.whl' | xargs pip install
@@ -5,13 +5,8 @@ set -e
# Set BASE_IMAGE according to env variables
if [ ${WITH_GPU} == "ON" ]; then
  BASE_IMAGE="nvidia/cuda:8.0-cudnn5-runtime-ubuntu14.04"
-  # additional packages to install when building gpu images
-  GPU_DOCKER_PKG="python-pip python-dev"
else
-  BASE_IMAGE="python:2.7.13-slim"
-  # FIXME: python base image uses different python version than WITH_GPU
-  # need to change PYTHONHOME to /usr/local when using python base image
-  CPU_DOCKER_PYTHON_HOME_ENV="ENV PYTHONHOME /usr/local"
+  BASE_IMAGE="ubuntu:14.04"
fi
DOCKERFILE_GPU_ENV=""
@@ -66,10 +61,7 @@ if [ ${WITH_DOC} == "ON" ]; then
  rm -rf /paddle/build_doc
fi
# generate deb package for current build
-# FIXME(typhoonzero): should we remove paddle/scripts/deb ?
-# FIXME: CPACK_DEBIAN_PACKAGE_DEPENDS removes all dev dependencies, must
-# install them in docker
-cpack -D CPACK_GENERATOR='DEB' -D CPACK_DEBIAN_PACKAGE_DEPENDS="" ..
+cpack -D CPACK_GENERATOR='DEB' ..
if [[ ${WOBOQ:-OFF} == 'ON' ]]; then
  apt-get install -y clang-3.8 llvm-3.8 libclang-3.8-dev
@@ -97,32 +89,30 @@ fi
paddle version
-if [[ -n ${APT_MIRROR} ]]; then
-  MIRROR_UPDATE="sed -i '${APT_MIRROR}' /etc/apt/sources.list && \\"
-else
-  MIRROR_UPDATE="\\"
-fi
cat > /paddle/build/Dockerfile <<EOF
FROM ${BASE_IMAGE}
MAINTAINER PaddlePaddle Authors <paddle-dev@baidu.com>
ENV HOME /root
ENV LANG en_US.UTF-8
# Use Fix locales to en_US.UTF-8
-RUN ${MIRROR_UPDATE}
-    apt-get update && \
-    apt-get install -y libgfortran3 libpython2.7 ${GPU_DOCKER_PKG} && \
-    apt-get clean -y && \
-    pip install --upgrade pip && \
-    pip install -U 'protobuf==3.1.0' requests numpy
+EOF
+
+if [[ -n ${APT_MIRROR} ]]; then
+cat >> /paddle/build/Dockerfile <<EOF
+RUN sed -i '${APT_MIRROR}' /etc/apt/sources.list
+EOF
+fi
+
+cat >> /paddle/build/Dockerfile <<EOF
# Use different deb file when building different type of images
-ADD *.deb /usr/local/opt/paddle/deb/
+ADD *.deb /
# run paddle version to install python packages first
-RUN dpkg -i /usr/local/opt/paddle/deb/*.deb && \
-    rm -f /usr/local/opt/paddle/deb/*.deb && \
-    find /usr/ -name '*paddle-*.whl' | xargs pip install && \
+RUN apt-get update &&\
+    apt-get install -y python-pip && pip install -U pip && \
+    dpkg -i /*.deb ; apt-get install -f -y && \
+    apt-get clean -y && \
+    rm -f /*.deb && \
    paddle version
-${CPU_DOCKER_PYTHON_HOME_ENV}
${DOCKERFILE_CUDNN_DSO}
${DOCKERFILE_GPU_ENV}
# default command shows the paddle version and exit
......
@@ -60,6 +60,7 @@ function deploy_docs() {
deploy_docs "master" "."
deploy_docs "develop" "./develop/"
+deploy_docs "release/0.10.0" "./release/0.10.0/"
# Check is there anything changed.
set +e
......
...@@ -23,7 +23,7 @@ setup(name="py_paddle", ...@@ -23,7 +23,7 @@ setup(name="py_paddle",
install_requires = [ install_requires = [
'nltk>=3.2.2', 'nltk>=3.2.2',
'numpy>=1.8.0', # The numpy is required. 'numpy>=1.8.0', # The numpy is required.
'protobuf>=${PROTOBUF_VERSION}' # The paddle protobuf version 'protobuf==${PROTOBUF_VERSION}' # The paddle protobuf version
], ],
url='http://www.paddlepaddle.org/', url='http://www.paddlepaddle.org/',
license='Apache 2.0', license='Apache 2.0',
......
@@ -208,12 +208,15 @@ class ExtraLayerAttribute(object):
                 drop_rate=None,
                 device=None):
        self.attr = dict()
-        if isinstance(error_clipping_threshold, float):
-            assert error_clipping_threshold > 0
-            self.attr["error_clipping_threshold"] = error_clipping_threshold
-        if isinstance(drop_rate, float):
-            assert drop_rate > 0
+        if error_clipping_threshold is not None:
+            error_clipping_threshold = float(error_clipping_threshold)
+            if error_clipping_threshold < 0:
+                raise ValueError("Error clipping must > 0")
+            self.attr['error_clipping_threshold'] = error_clipping_threshold
+        if drop_rate is not None:
+            drop_rate = float(drop_rate)
+            if drop_rate < 0:
+                raise ValueError("Dropout rate must > 0")
        self.attr["drop_rate"] = drop_rate
        if isinstance(device, int):
......
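As context for the hunk above: with the relaxed type check, error clipping can be configured per layer by passing any non-negative number. A hypothetical usage sketch (the layer names and sizes are illustrative, not from this commit):

```python
from paddle.trainer_config_helpers import *

data = data_layer(name='input', size=128)
# Clip the gradient (error) of this layer's output during the
# backward pass using the per-layer attribute.
fc = fc_layer(
    input=data,
    size=256,
    act=ReluActivation(),
    layer_attr=ExtraLayerAttribute(error_clipping_threshold=10.0))
```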
@@ -84,6 +84,7 @@ __all__ = [
    'GeneratedInput',
    'SubsequenceInput',
    'gru_step_layer',
+    'gru_step_naive_layer',
    'recurrent_layer',
    'BaseGeneratedInput',
    'conv_operator',
@@ -3086,6 +3087,78 @@ def gru_step_layer(input,
        activation=act)

+@wrap_bias_attr_default()
+@wrap_param_attr_default()
+@wrap_act_default(param_names=['gate_act'], act=SigmoidActivation())
+@wrap_act_default(act=TanhActivation())
+@wrap_name_default('gru_step_naive')
+@layer_support(ERROR_CLIPPING, DROPOUT)
+def gru_step_naive_layer(input,
+                         output_mem,
+                         size=None,
+                         name=None,
+                         act=None,
+                         gate_act=None,
+                         bias_attr=None,
+                         param_attr=None,
+                         layer_attr=None):
+    """
+    GRU Step Layer, but using MixedLayer to generate. It support ERROR_CLIPPING
+    and DROPOUT.
+
+    :param input:
+    :param output_mem:
+    :param size:
+    :param name:
+    :param act:
+    :param gate_act:
+    :param bias_attr:
+    :param param_attr:
+    :param layer_attr:
+    :return:
+    """
+    if input.size % 3 != 0:
+        raise ValueError("GruStep input size must be divided by 3")
+    if size is None:
+        size = input.size / 3
+
+    def __gate__(gate_name, offset):
+        with mixed_layer(
+                name=name + "_" + gate_name,
+                size=size,
+                layer_attr=layer_attr,
+                bias_attr=bias_attr,
+                act=gate_act) as gate:
+            gate += identity_projection(input=input, offset=offset)
+            gate += full_matrix_projection(
+                input=output_mem, param_attr=param_attr)
+        return gate
+
+    update_gate = __gate__("update", 0)
+    reset_gate = __gate__("reset", size)
+
+    with mixed_layer(
+            name=name + "_reset_output", bias_attr=False) as reset_output:
+        reset_output += dotmul_operator(a=output_mem, b=reset_gate)
+
+    with mixed_layer(
+            name=name + "_output_candidate",
+            size=size,
+            layer_attr=layer_attr,
+            bias_attr=bias_attr,
+            act=act) as output_candidate:
+        output_candidate += identity_projection(input=input, offset=2 * size)
+        output_candidate += full_matrix_projection(
+            input=reset_output, param_attr=param_attr)
+
+    with mixed_layer(name=name) as output:
+        output += identity_projection(output_mem)
+        output += dotmul_operator(a=output_mem, b=update_gate, scale=-1.0)
+        output += dotmul_operator(a=output_candidate, b=update_gate)
+
+    return output

@wrap_name_default()
@layer_support()
def get_output_layer(input, arg_name, name=None, layer_attr=None):
......
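Composed from `mixed_layer` projections, the `gru_step_naive_layer` added above computes the standard GRU step. As a sketch (assuming the three size-`size` blocks of `input` hold the update, reset, and candidate input projections, and the `U` matrices are the `full_matrix_projection` weights):

```latex
\begin{aligned}
u_t        &= \sigma\left(x_t^{(u)} + U_u h_{t-1} + b_u\right) && \text{update gate}\\
r_t        &= \sigma\left(x_t^{(r)} + U_r h_{t-1} + b_r\right) && \text{reset gate}\\
\tilde h_t &= \tanh\left(x_t^{(c)} + U_c\,(r_t \odot h_{t-1}) + b_c\right) && \text{candidate}\\
h_t        &= (1 - u_t) \odot h_{t-1} + u_t \odot \tilde h_t && \text{output}
\end{aligned}
```

The last line is exactly what the closing `mixed_layer` builds: `identity_projection(output_mem)` contributes h_{t-1}, the `scale=-1.0` dot-mul subtracts u_t ⊙ h_{t-1}, and the final dot-mul adds u_t ⊙ h̃_t.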
@@ -825,7 +825,8 @@ def gru_unit(input,
             gru_param_attr=None,
             act=None,
             gate_act=None,
-             gru_layer_attr=None):
+             gru_layer_attr=None,
+             naive=False):
    """
    Define calculations that a gated recurrent unit performs in a single time
    step. This function itself is not a recurrent layer, so that it can not be
@@ -857,7 +858,12 @@ def gru_unit(input,
    out_mem = memory(name=name, size=size)

-    gru_out = gru_step_layer(
+    if naive:
+        __step__ = gru_step_naive_layer
+    else:
+        __step__ = gru_step_layer
+
+    gru_out = __step__(
        name=name,
        input=input,
        output_mem=out_mem,
@@ -879,7 +885,8 @@ def gru_group(input,
              gru_param_attr=None,
              act=None,
              gate_act=None,
-              gru_layer_attr=None):
+              gru_layer_attr=None,
+              naive=False):
    """
    gru_group is a recurrent layer group version of Gated Recurrent Unit. It
    does exactly the same calculation as the grumemory layer does. A promising
@@ -928,7 +935,8 @@ def gru_group(input,
        gru_param_attr=gru_param_attr,
        act=act,
        gate_act=gate_act,
-        gru_layer_attr=gru_layer_attr)
+        gru_layer_attr=gru_layer_attr,
+        naive=naive)

    return recurrent_group(
        name='%s_recurrent_group' % name,
@@ -949,7 +957,8 @@ def simple_gru(input,
               gru_param_attr=None,
               act=None,
               gate_act=None,
-               gru_layer_attr=None):
+               gru_layer_attr=None,
+               naive=False):
    """
    You maybe see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
    simple_gru in network.py. The reason why there are so many interfaces is
@@ -1018,7 +1027,8 @@ def simple_gru(input,
        gru_param_attr=gru_param_attr,
        act=act,
        gate_act=gate_act,
-        gru_layer_attr=gru_layer_attr)
+        gru_layer_attr=gru_layer_attr,
+        naive=naive)

@wrap_name_default('simple_gru2')
......
@@ -320,6 +320,7 @@ layers {
  }
  }
  drop_rate: 0.5
+  error_clipping_threshold: 40.0
}
parameters {
  name: "___embedding_0__.w0"
......
@@ -356,6 +356,9 @@ def mixed(size=0,
    return MixedLayerV2(size, input, name, act, bias_attr, layer_attr)

+mixed.__doc__ = conf_helps.mixed_layer.__doc__

class RecurrentLayerInput(Layer):
    def __init__(self, recurrent_name, index, parent_layers):
        parents_len = len(parent_layers)
@@ -404,6 +407,8 @@ data.__name__ = 'data'
AggregateLevel = conf_helps.layers.AggregateLevel
ExpandLevel = conf_helps.layers.ExpandLevel
memory = MemoryV2
+memory.__name__ = 'memory'
+memory.__doc__ = conf_helps.memory.__doc__

def __layer_name_mapping__(inname):
@@ -512,6 +517,9 @@ def recurrent_group(step, input, name=None):
    return retv

+recurrent_group.__doc__ = conf_helps.recurrent_group.__doc__

@wrap_name_default()
def beam_search(step,
                input,
@@ -579,6 +587,8 @@ def beam_search(step,
    return tmp

+beam_search.__doc__ = conf_helps.beam_search.__doc__

__projection_names__ = filter(lambda x: x.endswith('_projection'),
                              dir(conf_helps))
......
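The hunk above copies `__doc__` (and, for `memory`, `__name__`) by hand so that the v2 wrappers inherit the documentation of their `conf_helps` originals. A generic sketch of the same idea using the standard library:

```python
import functools

def reexport(fn):
    # functools.wraps copies __name__, __doc__, and other metadata
    # from fn onto the wrapper -- what the diff above does manually.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper
```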
@@ -15,6 +15,9 @@ setup(name='paddle',
      description='Parallel Distributed Deep Learning',
      install_requires=[
          "requests",
+          "numpy",
+          "protobuf==${PROTOBUF_VERSION}",
+          "matplotlib",
      ],
      packages=packages,
      package_dir={
......