Merge pull request #3709 from Canpio/complete_backward_doc

Complete backward doc

794c2f23 · fengjiayi · GitHub · 0af46431 · eaeb69f9 · 794c2f23

Showing 42 additions and 12 deletions

paddle/framework/backward.md +42 -12

--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
-## Operator/expression 's Backward
+# Operator/expression's Backward
-### Motivation
+## Motivation
-In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation lineage, the operator/ expression's Backward feature will generate the backward pass respect to forward pass.
+In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass is generated with respect to the forward pass.
-### Implement : gradient operator registry
+## Backward Operator Registry
-|                        | forward operator | backward operator                |
+A backward network is built from several backward operators. A backward operator takes its forward operator's inputs, outputs, and output gradients, and calculates the input gradients.
-| ---------------------- | ---------------- | -------------------------------- |
+|                        | forward operator | backward operator                |
+| ---------------------- | ---------------- | -------------------------------- |
 | **Operator::inputs_**  | Inputs           | Inputs, Outputs, OutputGradients |
 | **Operator::outputs_** | Outputs          | InputGradients                   |
-Inputs/Outputs means the input/output of the operator,  InputGradients/OutputGradients is the gradient respect to forward opeartor. Forward operator and Backward operator are isomorphic, save their corresponding needs into member attribute.
+In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded in a global hash map (`OpInfoMap`). To follow the philosophy of a minimal core and to make operators pluggable, a registry mechanism is introduced.
+For example, given a `mul_op`, we can register its information and its corresponding backward operator with the following macro:
+```cpp
+REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
+```
+`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.
+`mul_grad` is the type of backward operator, and `MulOpGrad` is its class name.
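The registry idea can be illustrated with a minimal sketch. It is not the actual `OpInfoMap` or `REGISTER_OP` implementation: `OpInfo`, `Registry`, `RegisterOp`, and `GradTypeOf` are hypothetical names chosen for this example, and only the forward-to-backward type mapping is modelled.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical record: only the field needed for this illustration.
struct OpInfo {
  std::string grad_op_type;  // type of the corresponding backward operator
};

// Global registry keyed by forward operator type (a stand-in for OpInfoMap).
inline std::unordered_map<std::string, OpInfo>& Registry() {
  static std::unordered_map<std::string, OpInfo> map;
  return map;
}

// Records a forward type and its backward type.
// Returns false if the forward type was already registered.
inline bool RegisterOp(const std::string& type, const std::string& grad_type) {
  assert(!type.empty());
  return Registry().emplace(type, OpInfo{grad_type}).second;
}

// Looks up the backward operator type; returns "" for unknown operators.
inline std::string GradTypeOf(const std::string& type) {
  auto it = Registry().find(type);
  return it == Registry().end() ? std::string() : it->second.grad_op_type;
}
```

Under these assumptions, `RegisterOp("mul", "mul_grad")` plays the role of the macro above, and `GradTypeOf("mul")` then yields `"mul_grad"`.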
+## Backward Operator Creating
+Given a forward operator, we can get its corresponding backward operator by calling:
+```cpp
+// fwd_op is a `const OperatorBase*` pointing to the forward operator.
+OperatorBase* bwd_op = BuildGradOp(fwd_op);
+``` 
+The function `BuildGradOp` performs the following steps in order:
+1. Get the `type_` of the given forward operator, then look up the corresponding backward operator's type in `OpInfoMap`.
+2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` into the map `inputs`, except those that are not necessary for gradient computation.
+3. Add the forward inputs' gradient variables to the map `outputs`, and the forward outputs' gradient variables to the map `inputs`.
+4. Build the backward operator with `inputs`, `outputs`, and the forward operator's attributes.
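The four steps above can be sketched as follows. This is a simplified illustration, not the real `BuildGradOp`: the `Op` struct, `VarMap`, `GradName`, `ToGrad`, and `BuildGradOpSketch` are hypothetical, the `@GRAD` suffix is an assumed naming convention, input and output slot names are assumed to be distinct, and the pruning of unneeded variables in step 2 is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Slot name -> variable names bound to that slot.
using VarMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical, simplified operator description.
struct Op {
  std::string type;
  VarMap inputs;
  VarMap outputs;
  std::map<std::string, std::string> attrs;
};

// Appends the assumed gradient suffix to a slot or variable name.
inline std::string GradName(const std::string& name) { return name + "@GRAD"; }

// Returns a copy of `vars` with every slot and variable renamed to its gradient.
inline VarMap ToGrad(const VarMap& vars) {
  VarMap grads;
  for (const auto& slot : vars) {
    for (const auto& var : slot.second) {
      grads[GradName(slot.first)].push_back(GradName(var));
    }
  }
  return grads;
}

inline Op BuildGradOpSketch(const Op& fwd,
                            const std::map<std::string, std::string>& grad_types) {
  // Step 1: look up the backward type for the forward type.
  auto it = grad_types.find(fwd.type);
  assert(it != grad_types.end());

  // Step 2: copy forward inputs and outputs into the backward inputs.
  // (Pruning of variables not needed for the gradient is omitted here.)
  VarMap inputs = fwd.inputs;
  inputs.insert(fwd.outputs.begin(), fwd.outputs.end());

  // Step 3: output gradients become inputs; input gradients become outputs.
  VarMap out_grads = ToGrad(fwd.outputs);
  inputs.insert(out_grads.begin(), out_grads.end());
  VarMap outputs = ToGrad(fwd.inputs);

  // Step 4: assemble the backward operator, reusing the forward attributes.
  return Op{it->second, inputs, outputs, fwd.attrs};
}
```

For a `mul` operator with input slots `X`, `Y` and output slot `Out`, the sketch produces a `mul_grad` operator whose inputs are `X`, `Y`, `Out`, `Out@GRAD` and whose outputs are `X@GRAD`, `Y@GRAD`, matching the table in the registry section.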
-We use a global hash map record the gradient operators available, follow the philosophy  of minimum core, make operator pluggable unit. Each gradient is an operator and it needs to regist itself. 
+## Backward Network Building
-grad_op_builder(fengjiayi)
+A backward network is a series of backward operators. The main idea of building a backward network is to create backward operators in reverse order and put them together.
-### Implement : Backward network
+In our design, the network itself is also a kind of operator, so the operators contained in a large network may themselves be smaller networks.
 Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.
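The reverse-order construction, including the case where a network contains smaller networks, can be sketched with a short recursive function. `NetOp`, `BackwardSketch`, and `Flatten` are hypothetical names for this illustration, and the `_grad` type suffix stands in for the registry lookup; the real implementation also has to wire gradient variables together, which is not shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node: a plain operator when `ops` is empty, otherwise a network.
struct NetOp {
  std::string type;
  std::vector<NetOp> ops;
};

// Builds the backward counterpart: sub-operators are visited in reverse
// order, and nested networks are handled by recursion.
inline NetOp BackwardSketch(const NetOp& fwd) {
  NetOp bwd{fwd.type + "_grad", {}};
  for (auto it = fwd.ops.rbegin(); it != fwd.ops.rend(); ++it) {
    bwd.ops.push_back(BackwardSketch(*it));
  }
  return bwd;
}

// Lists the plain operators of a (possibly nested) network in execution order.
inline std::vector<std::string> Flatten(const NetOp& op) {
  if (op.ops.empty()) return {op.type};
  std::vector<std::string> order;
  for (const auto& child : op.ops) {
    for (const auto& name : Flatten(child)) order.push_back(name);
  }
  return order;
}
```

With a forward network that runs `mul`, `add` (grouped in an inner network) and then `softmax`, flattening the result of `BackwardSketch` gives `softmax_grad`, `add_grad`, `mul_grad`.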