Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into write_new_op

8dded4f2 · qingqing01 · b3361194 · 75e16bd3 · b3361194 · b3361194
7 changed files
--- a/doc/about/index_cn.md
+++ b/doc/about/index_cn.md
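-About PaddlePaddle
-================
-
-PaddlePaddle is a parallel distributed deep learning platform originally developed by Baidu scientists and engineers. It combines ease of use, efficiency, flexibility, and scalability, and is already widely used by many product lines inside Baidu.
-PaddlePaddle is now open source but far from complete; we hope to keep improving, extending, and expanding it on this foundation.
-We also hope that developers will actively provide feedback and contribute source code, building an active open source community.
-
-Credits
--------
-
-Here we give special thanks to [all contributors](https://github.com/PaddlePaddle/Paddle/graphs/contributors) of PaddlePaddle.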
--- a/doc/about/index_en.rst
+++ b/doc/about/index_en.rst
-ABOUT
-=======
-
-PaddlePaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
-which is originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
-
-PaddlePaddle is now open source but far from complete, which is intended to be built upon, improved, scaled, and extended.
-We hope to build an active open source community both by providing feedback and by actively contributing to the source code.
-
-
-Credits
--------
-
-We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/graphs/contributors>`_ of PaddlePaddle!
--- a/doc/index_en.rst
+++ b/doc/index_en.rst
@@ -7,4 +7,3 @@ PaddlePaddle Documentation
  getstarted/index_en.rst
  howto/index_en.rst
  api/index_en.rst
-  about/index_en.rst
--- a/paddle/framework/backward.cc
+++ b/paddle/framework/backward.cc
@@ -124,6 +124,9 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
    std::list<Pos> insert_position;
    for (auto& dup_output_op : dup_output_ops) {
      const std::string& name = dup_output_op.first;
+      // duplicate @Empty@ outputs don't need to be added
+      if (name == kEmptyVarName) continue;
+
      auto& dup_op = dup_output_op.second;
      // no duplicate output
      if (dup_op.size() == 1) continue;
@@ -209,7 +212,7 @@ std::unique_ptr<OperatorBase> Backward(
    const OperatorBase& forwardOp,
    const std::unordered_set<std::string>& no_grad_vars) {
  std::unordered_set<std::string> no_grad_names;
-  no_grad_names.reserve(no_grad_vars.size());
+  no_grad_names.reserve(no_grad_vars.size() + 1);

  no_grad_names.insert(std::string(kEmptyVarName) + kGradVarSuffix);


--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
-## Operator/expression 's Backward
+# Operator/expression's Backward

-### Motivation
+## Motivation

-In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation lineage, the operator/ expression's Backward feature will generate the backward pass respect to forward pass.
+In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass is generated with respect to the forward pass.
  
-### Implement : gradient operator registry
+## Backward Operator Registry

-|                        | forward operator | backward operator                |
-| ---------------------- | ---------------- | -------------------------------- |
+A backward network is built from several backward operators. A backward operator takes the forward operator's inputs, outputs, and output gradients, and then calculates the forward operator's input gradients.
+
+|                        | forward operator | backward operator                |
+| ---------------------- | ---------------- | -------------------------------- |
 | **Operator::inputs_**  | Inputs           | Inputs, Outputs, OutputGradients |
 | **Operator::outputs_** | Outputs          | InputGradients                   |

-Inputs/Outputs means the input/output of the operator,  InputGradients/OutputGradients is the gradient respect to forward opeartor. Forward operator and Backward operator are isomorphic, save their corresponding needs into member attribute.
+In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded in a global hash map (`OpInfoMap`). To follow the philosophy of a minimal core and make operators pluggable, a registry mechanism is introduced.
+
+For example, given a `mul_op`, we can register its information and its corresponding backward operator with the following macro:
+
+```cpp
+REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
+```
+
+`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.
+
+`mul_grad` is the type of backward operator, and `MulOpGrad` is its class name.
+
+## Backward Operator Creating
+
+Given a certain forward operator, we can get its corresponding backward operator by calling:
+
+```cpp
+OperatorBase* bwd_op = BuildGradOp(fwd_op);  // fwd_op is a const OperatorBase*
+``` 
+
+The function `BuildGradOp` performs the following steps in order:
+
+1. Get the `type_` of the given forward operator, then look up the `OpInfoMap` to get the corresponding backward operator's type.
+
+2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` to the map `inputs`, except those that are not needed for gradient computation.
+
+3. Add the forward inputs' gradient variables to the map `outputs`, and add the forward outputs' gradient variables to the map `inputs`.
+
+4. Build the backward operator with `inputs`, `outputs`, and the forward operator's attributes.

-We use a global hash map record the gradient operators available, follow the philosophy  of minimum core, make operator pluggable unit. Each gradient is an operator and it needs to regist itself. 
+## Backward Network Building

-grad_op_builder(fengjiayi)
+A backward network is a series of backward operators. The main idea of building a backward network is to create backward operators in the reverse order of the forward operators and put them together.

-### Implement : Backward network
+In our design, the network itself is also a kind of operator, so the operators contained in a large network may themselves be smaller networks.

 Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.
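
The four `BuildGradOp` steps and the reverse-order network construction described in `backward.md` can be sketched roughly as follows. This is a simplified Python model, not the C++ implementation: the `Op` tuple, the `OP_INFO_MAP` dict, the `add`/`add_grad` entry, and the `@GRAD` suffix value are all assumptions made for illustration, and the sketch does not prune inputs that are unnecessary for gradient computation.

```python
from collections import namedtuple

# Assumed value for the gradient-name suffix (illustration only).
GRAD_SUFFIX = "@GRAD"

# Hypothetical stand-in for an operator: type, input/output maps, attributes.
Op = namedtuple("Op", ["type", "inputs", "outputs", "attrs"])

# Hypothetical stand-in for OpInfoMap: forward type -> backward type.
OP_INFO_MAP = {"mul": "mul_grad", "add": "add_grad"}


def build_grad_op(fwd_op):
    # Step 1: look up the backward operator's type.
    grad_type = OP_INFO_MAP[fwd_op.type]

    # Step 2: copy the forward inputs and outputs into the backward inputs.
    inputs = {}
    inputs.update(fwd_op.inputs)
    inputs.update(fwd_op.outputs)
    outputs = {}

    # Step 3: forward output gradients become backward inputs,
    # forward input gradients become backward outputs.
    for slot, names in fwd_op.outputs.items():
        inputs[slot + GRAD_SUFFIX] = [n + GRAD_SUFFIX for n in names]
    for slot, names in fwd_op.inputs.items():
        outputs[slot + GRAD_SUFFIX] = [n + GRAD_SUFFIX for n in names]

    # Step 4: build the backward operator, reusing the forward attributes.
    return Op(grad_type, inputs, outputs, dict(fwd_op.attrs))


def build_backward_net(fwd_ops):
    # Backward operators are created in the reverse order of the forward ones.
    return [build_grad_op(op) for op in reversed(fwd_ops)]
```

For a forward list `[mul, add]`, `build_backward_net` yields `[add_grad, mul_grad]`; a nested network would need the same reversal applied recursively, which this sketch leaves out.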


--- a/paddle/operators/net_op.cc
+++ b/paddle/operators/net_op.cc
@@ -31,10 +31,13 @@ void NetOp::CompleteAddOp(bool calc) {
  for (auto& op : ops_) {
    for (auto& ipt : op->Inputs()) {
      for (auto& var_name : ipt.second) {
-        if (!Contains(output_set, var_name)) {  // Not other op's output
-          input_set.insert(var_name);
-        } else {
+        // If the input variable is already in the output set, it is
+        // added to intermediate_outputs_. Otherwise, it is added to
+        // the input set.
+        if (Contains(output_set, var_name)) {
          intermediate_outputs_.insert(var_name);
+        } else {
+          input_set.insert(var_name);
        }
      }
    }
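
The classification rule in the `net_op.cc` hunk above can be modeled in a few lines of Python. This is only a sketch: representing each operator as a pair of name lists is an assumption, and so is the step that records an operator's outputs after its inputs are checked, since that part of `CompleteAddOp` is not shown in the hunk.

```python
def classify_variables(ops):
    """Classify variable names of a network.

    ops: list of (input_names, output_names) pairs in execution order
    (a hypothetical simplification of the operators held by a NetOp).
    """
    input_set, output_set, intermediate = set(), set(), set()
    for in_names, out_names in ops:
        for name in in_names:
            if name in output_set:
                # Produced by an earlier op: an intermediate output.
                intermediate.add(name)
            else:
                # Not produced inside the network: an external input.
                input_set.add(name)
        # Assumption: outputs are recorded after the inputs are checked.
        output_set.update(out_names)
    return input_set, output_set, intermediate
```

With two chained operators, `x, w -> h` followed by `h, b -> y`, the variable `h` ends up as an intermediate output while `x`, `w`, and `b` are network inputs.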

--- a/python/paddle/v2/parameters.py
+++ b/python/paddle/v2/parameters.py
@@ -14,6 +14,7 @@

 import numpy as np
 from paddle.proto.ParameterConfig_pb2 import ParameterConfig
+from collections import OrderedDict
 import paddle.trainer.config_parser as cp
 import struct
 import tarfile
@@ -42,9 +43,25 @@ def create(layers):

 class Parameters(object):
    """
-    Parameters is a dictionary contains Paddle's parameter. The key of
-    Parameters is the name of parameter. The value of Parameters is a plain
-    :code:`numpy.ndarry` .
+    `Parameters` manages all the learnable parameters in a neural network.
+    It stores parameters' information in an OrderedDict. The key is
+    the name of a parameter, and the value is the parameter's configuration
+    (in protobuf format), such as initialization mean and std, its size,
+    whether it is a static parameter, and so on.
+
+    :param __param_conf__: stores the configurations of learnable parameters in
+        the network in an OrderedDict. Parameters are added one by one into the
+        dict in the order they are created in the network: parameters of
+        the previous layers in a network are created first. You can visit the
+        parameters from bottom to top by iterating over this dict.
+    :type __param_conf__: OrderedDict
+    :param __gradient_machines__: all of the parameters in a neural network are
+        appended to a PaddlePaddle gradient machine, which is used internally to
+        copy parameter values between the C++ and Python ends.
+    :type __gradient_machines__: list
+    :param __tmp_params__: a dict to store dummy parameters if no
+        __gradient_machines__ is appended to `Parameters`.
+    :type __tmp_params__: dict

    Basic usage is

@@ -62,7 +79,7 @@ class Parameters(object):
    """

    def __init__(self):
-        self.__param_conf__ = dict()
+        self.__param_conf__ = OrderedDict()
        self.__gradient_machines__ = []
        self.__tmp_params__ = dict()

@@ -231,6 +248,9 @@ class Parameters(object):
        :rtype: np.ndarray
        """
        import py_paddle.swig_paddle as api
+        if self.__param_conf__[key].is_static:
+            return np.zeros(self.__param_conf__[key].size, dtype=np.float32)
+
        return self.__getter_inner(key, api.PARAMETER_GRADIENT)

    def set(self, parameter_name, value):
@@ -250,7 +270,7 @@ class Parameters(object):
        append gradient machine to parameters. This method is used internally in
        Trainer.train.

-        :param gradient_machine: Paddle C++ GradientMachine object.
+        :param gradient_machine: PaddlePaddle C++ GradientMachine object.
        :type gradient_machine: api.GradientMachine
        :return:
        """
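
The two behavioral changes in `parameters.py` (creation-ordered iteration and a zero gradient for static parameters) can be illustrated with a small standalone sketch. The `ParamConf` tuple is a hypothetical stand-in for the protobuf `ParameterConfig`, the parameter names are made up, and a plain list of zeros stands in for `np.zeros(size, dtype=np.float32)` so that the example needs only the standard library.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for ParameterConfig; only the two fields used in
# the diff (`size` and `is_static`) are modeled.
ParamConf = namedtuple("ParamConf", ["size", "is_static"])

# Parameters are inserted in creation order, bottom layer first.
param_conf = OrderedDict()
param_conf["fc1.w"] = ParamConf(size=4, is_static=False)
param_conf["embedding"] = ParamConf(size=3, is_static=True)
param_conf["fc2.w"] = ParamConf(size=2, is_static=False)


def get_grad(key, fetch_gradient):
    """Return the gradient for `key`.

    fetch_gradient is a hypothetical callback standing in for the lookup
    through the gradient machine.
    """
    conf = param_conf[key]
    if conf.is_static:
        # Static parameters are not updated, so return zeros of the
        # right size instead of querying the gradient machine.
        return [0.0] * conf.size
    return fetch_gradient(key)
```

Iterating over `param_conf` visits `fc1.w`, `embedding`, `fc2.w` in that order, which a plain `dict` did not guarantee on the Python versions this code targeted.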