# Operator/expression's Backward

## Motivation

In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass will be generated with respect to the forward pass.

## Backward Operator Registry

A backward network is built up from several backward operators. Backward operators take the forward operators' inputs, outputs, and output gradients, and then calculate their input gradients.

|                        | forward operator | backward operator                |
| ---------------------- | ---------------- | -------------------------------- |
| **Operator::inputs_**  | Inputs           | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs          | InputGradients                   |

In most cases, there is a one-to-one correspondence between the forward and backward operators. These correspondences are recorded in a global hash map (`OpInfoMap`). To follow the philosophy of a minimal core and keep operators pluggable, a registry mechanism is introduced.

For example, suppose we have a `mul_op`; we can register its information and its corresponding backward operator with the following macro:

```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
```

`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.

`mul_grad` is the type of backward operator, and `MulOpGrad` is its class name.

## Backward Operator Creating

Given a certain forward operator, we can get its corresponding backward operator by calling:

```cpp
OperatorBase* bwd_op = BuildGradOp(fwd_op);  // fwd_op is a const OperatorBase*
```

The function `BuildGradOp` will sequentially execute the following steps:

1. Get the `type_` of the given forward operator, then look up the corresponding backward operator's type in the `OpInfoMap`.

2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` into the map `inputs`, except for those that are not necessary for gradient computation.

3. Add the forward inputs' gradient variables to the map `outputs`, and the forward outputs' gradient variables to the map `inputs`.

4. Build the backward operator with `inputs`, `outputs`, and the forward operator's attributes.

## Backward Network Building

A backward network is a series of backward operators. The main idea of building a backward network is to create backward operators in reverse order and chain them together.

In our design, the network itself is also a kind of operator, so the operators contained in a big network may themselves be small networks.

Given a forward network, the building process generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.

1. Op 

   When the input forward network is an Op, return its gradient operator immediately.

2. NetOp 

   When the input forward network is a NetOp, it needs to call the backward function of its sub NetOps/Operators recursively. During this process, we need to collect the `OutputGradients` names according to the forward NetOp.

   **Shared variables**. As illustrated in the pictures below, two operators' output gradients will overwrite their shared input variable.

   <p align="center">
   <img src="./images/duplicate_op.png" width="50%" ><br/>

   1. Shared variable in operators. 

   </p>

   Sharing a variable between operators, or using the same input variable in multiple operators, leads to duplicate gradient variables. As the demo above shows, we need to rename the gradient variables recursively and insert a generic `Add` operator to replace the overwriting links.

   <p align="center">
   <img src="images/duplicate_op2.png" width="50%" ><br/>

   2. Replace shared variable's gradient with `Add` operator.

   </p>



   Then collect the sub-network's `OutputGradients`/`InputGradients` as the NetOp's own and return them.