diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md new file mode 100644 index 0000000000000000000000000000000000000000..74c001b06a9e7b2279abf998604f2acf1b1168e4 --- /dev/null +++ b/paddle/framework/backward.md @@ -0,0 +1,38 @@ +## Operator/expression 's Backward + +### Motivation + +In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation lineage, the operator/ expression's Backward feature will generate the backward pass respect to forward pass. + +### Implement : gradient operator registry + +| | forward operator | backward operator | +| ---------------------- | ---------------- | -------------------------------- | +| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients | +| **Operator::outputs_** | Outputs | InputGradients | + +Inputs/Outputs means the input/output of the operator, InputGradients/OutputGradients is the gradient respect to forward opeartor. Forward operator and Backward operator are isomorphic, save their corresponding needs into member attribute. + +We use a global hash map record the gradient operators available, follow the philosophy of minimum core, make operator pluggable unit. Each gradient is an operator and it needs to regist itself. + +grad_op_builder(fengjiayi) + +### Implement : Backward network + +given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`. + +1. bla bla bla (yuyang) + +2. NetOp + + when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively and ensure them done. During the process, we need to collect the `OutputGradients` name. + + We share variable in the same scope, as a result, duplicate operator `OutputGradients` will overwirte then duplicate variable. + + ![./images/duplicate_op]() + + Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator instead. + +![./images/duplicate_op2]() + +​ Then collect the sub graph OutputGradients/InputGradients as the NetOp's and return it. diff --git a/paddle/framework/images/duplicate_op.graffle b/paddle/framework/images/duplicate_op.graffle new file mode 100644 index 0000000000000000000000000000000000000000..5979f792e252f028a615729215529c2be42d9165 Binary files /dev/null and b/paddle/framework/images/duplicate_op.graffle differ diff --git a/paddle/framework/images/duplicate_op.png b/paddle/framework/images/duplicate_op.png new file mode 100644 index 0000000000000000000000000000000000000000..f299c5d37f260a1bb0daec886f0a4ee1c1f31c92 Binary files /dev/null and b/paddle/framework/images/duplicate_op.png differ diff --git a/paddle/framework/images/duplicate_op2.graffle b/paddle/framework/images/duplicate_op2.graffle new file mode 100644 index 0000000000000000000000000000000000000000..2b658085d6a55d368c320051ba7f94ec2900f13c Binary files /dev/null and b/paddle/framework/images/duplicate_op2.graffle differ diff --git a/paddle/framework/images/duplicate_op2.png b/paddle/framework/images/duplicate_op2.png new file mode 100644 index 0000000000000000000000000000000000000000..c5588015d1450fd8c1bda3580680d884494868bb Binary files /dev/null and b/paddle/framework/images/duplicate_op2.png differ