diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index c762811dfc190b255e0a3389885a081ce8315caf..0859bf1d9bce208a7f4f9ad6e159988a2ce7120a 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -2,9 +2,20 @@
 ## Motivation

-In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the gradient operators/expressions together with the chain rule. Every forward network needs a backward network to construct the full computation graph, the operator/expression's backward pass will be generated respect to forward pass.
+In neural networks, many models are currently solved by the backpropagation algorithm (known as BP). Technically, it calculates the gradient of the loss function and distributes it back through the network following the chain rule, so we need to compose the gradient operators/expressions according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass is generated with respect to its forward pass.

-## Backward Operator Registry
+## Implementation
+
+In this design doc, we export only one API for generating the backward pass:
+
+```c++
+std::unique_ptr<OperatorBase> Backward(const OperatorBase& forwardOp,
+                                       const std::unordered_set<std::string>& no_grad_vars);
+```
+
+The implementation behind it can be divided into two parts, namely **Backward Operator Creating** and **Backward Network Building**.
+
+### Backward Operator Registry

 A backward network is built up with several backward operators. Backward operators take forward operators' inputs outputs, and output gradients and then calculate its input gradients.

@@ -25,7 +36,7 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);

 `mul_grad` is the type of backward operator, and `MulOpGrad` is its class name.

-## Backward Opeartor Creating
+### Backward Operator Creating

 Given a certain forward operator, we can get its corresponding backward operator by calling:

@@ -43,40 +54,47 @@ The function `BuildGradOp` will sequentially execute following processes:

 4. Building backward operator with `inputs`, `outputs` and forward operator's attributes.

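Taken together, the exported `Backward` entry point, the `mul_grad` registration shown above, and `BuildGradOp` are meant to be used roughly as follows. This is only a usage sketch: the header paths, the `paddle::framework` namespace qualification, and the way the forward `NetOp` is obtained are illustrative assumptions, not details fixed by this document.

```c++
// Usage sketch only; header locations and namespace are assumptions.
#include <memory>
#include <string>
#include <unordered_set>

#include "paddle/framework/backward.h"  // assumed to declare Backward()
#include "paddle/framework/net.h"       // assumed to declare NetOp

using paddle::framework::Backward;
using paddle::framework::NetOp;
using paddle::framework::OperatorBase;

std::unique_ptr<OperatorBase> MakeFullGraph(const NetOp& forward_net) {
  // Variables whose gradients are not needed, e.g. the input data and labels.
  std::unordered_set<std::string> no_grad_vars{"X", "label"};

  // One call generates the whole backward pass; internally, each forward
  // operator's gradient op is created (via the registered mul_grad, etc.)
  // and the results are chained into a backward network.
  return Backward(forward_net, no_grad_vars);
}
```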
-## Backward Network Building
-
-A backward network is a series of backward operators. The main idea of building a backward network is creating backward operators in the inverted sequence and put them together.
+### Backward Network Building

-In our design, the network itself is also a kind of operator. So the operators contained by a big network may be some small network.
-
-given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`, `InputGradients`.
+A backward network is a series of backward operators. The main idea of building a backward network is to create backward operators in the inverted sequence and append them one by one. Some corner cases need special handling.

 1. Op

-   when the input forward network is an Op, return its gradient Operator Immediately.
+   When the input forward network is an Op, return its gradient operator immediately. If all of its outputs are in the no-gradient set, return a special `NoGradient` operator instead.

 2. NetOp

-   when the input forward network is a NetOp, it needs to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to the forward NetOp.
+   In our design, the network itself is also a kind of operator (**NetOp**), so the operators contained in a big network may themselves be small networks. When the input forward network is a NetOp, we call the backward function of each sub NetOp/Operator recursively. During this process, we collect the `OutputGradients` names according to the forward NetOp.
+
+3. RnnOp
+
+   RnnOp is a nested stepnet operator. The backward module needs to call `Backward` recursively for every stepnet.
+
+4. Shared Variable

    **shared variable**. As illustrated in the pictures, two operator's `Output` `Gradient` will overwrite their shared input variable.

-   [figure: 1. Shared variable in operators.]
+   [figure: pic 1. Shared variable in operators.]

+   Sharing a variable between operators, or using the same input variable in multiple operators, leads to a duplicate gradient variable. As the demo above shows, we need to rename the gradient names recursively and insert a generic `Add` operator to replace the overwriting links.

-   [figure: 2. Replace shared variable's gradient with `Add` operator.]
+   [figure: pic 2. Replace shared variable's gradient with `Add` operator.]

- 2. Replace shared variable's gradient with `Add` operator. +​ Because our framework find variable accord to its name, we need rename the output links. We add a suffix of number represent its position in clockwise. -
+5. Part of Gradient is Zero
+
+   In the whole graph, there are cases where one operator's gradient is not needed, but its input's gradient is a dependency of another operator. In such a position we need to fill in a gradient matrix of the same shape, so in our implementation we insert a special `fillZeroLike` operator.

-   Then collect the sub graph `OutputGradients`/`InputGradients` as the NetOp's and return it.
+Following the rules above, we then collect the sub-graph's `OutputGradients`/`InputGradients` as the NetOp's own and return them. (A rough sketch of this recursive procedure is given after the diff.)
diff --git a/paddle/framework/images/duplicate_op2.graffle b/paddle/framework/images/duplicate_op2.graffle
index ede3bca30ae17d5af52505fd94dc2f79b23b57e0..5cec3bc64dbd44dc99e348485969f29bd128ceb1 100644
Binary files a/paddle/framework/images/duplicate_op2.graffle and b/paddle/framework/images/duplicate_op2.graffle differ
diff --git a/paddle/framework/images/duplicate_op2.png b/paddle/framework/images/duplicate_op2.png
index 4e872dc2caf3b0cbd0d5176f11a14801b538dc86..21cdd5cabf1b5203e1435a75b57770d2f702fa92 100644
Binary files a/paddle/framework/images/duplicate_op2.png and b/paddle/framework/images/duplicate_op2.png differ
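The closing sentence above refers to the following sketch. It is a small, self-contained illustration of how the corner-case rules in the `Backward Network Building` hunk compose, not the actual Paddle implementation: `ToyOp`, `BuildBackward`, `GradName`, and the `@GRAD`/`@RENAME@` naming are assumptions made for this example, and rules 3 (RnnOp) and 5 (`fillZeroLike`) are omitted for brevity.

```c++
// Toy model of the recursive backward-building rules (illustration only).
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// A toy "operator": either a primitive op or a sub-network of ops (a NetOp).
struct ToyOp {
  std::string type;                  // e.g. "mul", "add", "net"
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
  std::vector<ToyOp> sub_ops;        // non-empty only when type == "net"
};

std::string GradName(const std::string& var) { return var + "@GRAD"; }

// Rules 1 and 2: create gradient ops in inverted order, recursing into NetOps.
// Rule 4: when two backward ops would write the same gradient variable,
// rename the later one and insert an "add" op that accumulates the copies.
void BuildBackward(const ToyOp& forward,
                   const std::unordered_set<std::string>& no_grad_vars,
                   std::vector<ToyOp>* backward,
                   std::unordered_map<std::string, int>* writers) {
  if (forward.type == "net") {  // NetOp: visit sub-operators in reverse order.
    for (auto it = forward.sub_ops.rbegin(); it != forward.sub_ops.rend(); ++it)
      BuildBackward(*it, no_grad_vars, backward, writers);
    return;
  }
  ToyOp grad_op{forward.type + "_grad", {}, {}, {}};
  for (const auto& out : forward.outputs)        // consumes output gradients
    grad_op.inputs.push_back(GradName(out));
  std::vector<ToyOp> pending_adds;
  for (const auto& in : forward.inputs) {        // produces input gradients
    if (no_grad_vars.count(in)) continue;        // gradient not requested
    std::string g = GradName(in);
    int& count = (*writers)[g];
    if (count > 0) {
      // Shared forward input: write to a renamed gradient, then sum it in.
      std::string renamed = g + "@RENAME@" + std::to_string(count);
      grad_op.outputs.push_back(renamed);
      pending_adds.push_back(ToyOp{"add", {g, renamed}, {g}, {}});
    } else {
      grad_op.outputs.push_back(g);
    }
    ++count;
  }
  backward->push_back(grad_op);
  for (const auto& add : pending_adds) backward->push_back(add);
}

int main() {
  // Forward net: out1 = mul(W, x); out2 = mul(W, out1).  W is shared.
  ToyOp net{"net", {}, {},
            {{"mul", {"W", "x"}, {"out1"}, {}},
             {"mul", {"W", "out1"}, {"out2"}, {}}}};
  std::vector<ToyOp> backward;
  std::unordered_map<std::string, int> writers;
  BuildBackward(net, {"x"}, &backward, &writers);
  for (const auto& op : backward) {
    std::cout << op.type << ":";
    for (const auto& o : op.outputs) std::cout << " " << o;
    std::cout << "\n";
  }
  return 0;
}
```

Running it on the two-operator net that shares `W` prints one `mul_grad` writing `W@GRAD`, a second `mul_grad` writing the renamed `W@GRAD@RENAME@1`, and an `add` op accumulating them into `W@GRAD`, which is the rewrite pictured in pic 2 above.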