diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index c8fa3fefe5632a36d9044b4bccfd3dbb7c64dbf6..8aa6728a95bc464ab8884986f0cec6c817d3303b 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -1,23 +1,53 @@
-## Operator/expression 's Backward
+# Operator/expression's Backward
 
-### Motivation
+## Motivation
 
-In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation lineage, the operator/ expression's Backward feature will generate the backward pass respect to forward pass.
+In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass will be generated with respect to the forward pass.
+
+## Backward Operator Registry
 
-### Implement : gradient operator registry
+A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs, and output gradients, and then calculate their input gradients.
 
-|                        | forward operator | backward operator                |
-| ---------------------- | ---------------- | -------------------------------- |
-| **Operator::inputs_**  | Inputs           | Inputs, Outputs, OutputGradients |
-| **Operator::outputs_** | Outputs          | InputGradients                   |
+|                        | forward operator | backward operator                |
+| ---------------------- | ---------------- | -------------------------------- |
+| **Operator::inputs_**  | Inputs           | Inputs, Outputs, OutputGradients |
+| **Operator::outputs_** | Outputs          | InputGradients                   |
 
-Inputs/Outputs means the input/output of the operator, InputGradients/OutputGradients is the gradient respect to forward opeartor. Forward operator and Backward operator are isomorphic, save their corresponding needs into member attribute.
+In most cases, there is a one-to-one correspondence between a forward operator and its backward operator. These correspondences are recorded in a global hash map (`OpInfoMap`). To follow the philosophy of a minimal core and make operators pluggable, a registry mechanism is introduced.
 
-We use a global hash map record the gradient operators available, follow the philosophy of minimum core, make operator pluggable unit. Each gradient is an operator and it needs to regist itself.
+For example, given a `mul_op`, we can register its information and its corresponding backward operator with the following macro:
 
-grad_op_builder(fengjiayi)
+```cpp
+REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
+```
 
-### Implement : Backward network
+`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.
+
+`mul_grad` is the type of the backward operator, and `MulOpGrad` is its class name.
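+
+To make the registry concrete, the following is a minimal sketch of what such a global hash map might conceptually store. The `OpInfo` struct and its fields below are simplified illustrations, not PaddlePaddle's actual definitions:
+
+```cpp
+// Illustrative sketch only; the real OpInfoMap and REGISTER_OP differ.
+#include <functional>
+#include <string>
+#include <unordered_map>
+
+class OperatorBase;  // the operator base class, declaration only
+
+struct OpInfo {
+  std::string grad_op_type_;                // e.g. "mul_grad" for "mul"
+  std::function<OperatorBase*()> creator_;  // factory for the op class
+};
+
+// The global hash map, keyed by the forward operator's type.
+std::unordered_map<std::string, OpInfo>& OpInfoMap() {
+  static std::unordered_map<std::string, OpInfo> g_op_info_map;
+  return g_op_info_map;
+}
+
+// REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad) conceptually
+// inserts an entry at static-initialization time, e.g.:
+//   OpInfoMap()["mul"] = {"mul_grad", /* creator for MulOp */ nullptr};
+```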
+
+## Backward Operator Creating
+
+Given a certain forward operator, we can get its corresponding backward operator by calling:
+
+```cpp
+// `fwd_op` is a `const OperatorBase*` pointing to the forward operator.
+OperatorBase* bwd_op = BuildGradOp(fwd_op);
+```
+
+The function `BuildGradOp` will sequentially execute the following steps:
+
+1. Get the `type_` of the given forward operator, and then look up the corresponding backward operator's type in the `OpInfoMap`.
+
+2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` to the map `inputs`, except those that are not necessary for the gradient computation.
+
+3. Add the forward inputs' gradient variables to the map `outputs`, and add the forward outputs' gradient variables to the map `inputs`.
+
+4. Build the backward operator with `inputs`, `outputs`, and the forward operator's attributes.
+
+## Backward Network Building
+
+A backward network is a series of backward operators. The main idea of building a backward network is to create the backward operators in inverted order and put them together, as the sketch at the end of this section shows.
+
+In our design, the network itself is also a kind of operator, so the operators contained in a big network may themselves be small networks. Given a forward network, we generate its backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.
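+
+The inverted traversal can be sketched in a few lines. `BuildBackwardNet` and the plain `std::vector` network representation are hypothetical stand-ins for illustration; the actual PaddlePaddle interfaces differ:
+
+```cpp
+// Illustrative sketch only: create backward operators in inverted
+// order and collect them into a backward network.
+#include <memory>
+#include <vector>
+
+class OperatorBase {
+ public:
+  virtual ~OperatorBase() = default;
+};
+
+// Declared here for the sketch; defined by the framework.
+OperatorBase* BuildGradOp(const OperatorBase* fwd_op);
+
+std::vector<std::unique_ptr<OperatorBase>> BuildBackwardNet(
+    const std::vector<std::unique_ptr<OperatorBase>>& fwd_ops) {
+  std::vector<std::unique_ptr<OperatorBase>> bwd_ops;
+  // Walk the forward operators in reverse: the last forward operator
+  // is the first to receive output gradients, so its backward
+  // operator comes first in the backward network.
+  for (auto it = fwd_ops.rbegin(); it != fwd_ops.rend(); ++it) {
+    bwd_ops.emplace_back(BuildGradOp(it->get()));
+  }
+  return bwd_ops;
+}
+```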