@@ -52,18 +52,32 @@ In our design, the network itself is also a kind of operator. So the operators c
...
@@ -52,18 +52,32 @@ In our design, the network itself is also a kind of operator. So the operators c
given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`.
given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`.
1. bla bla bla (yuyang)
1. Op
when the input forward network is a Op, return its gradient Operator Immediately.
2. NetOp
2. NetOp
when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively and ensure them done. During the process, we need to collect the `OutputGradients` name.
when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to forward NetOp.
**shared variable**. As illustrated in the pictures, two operator's `Output``Gradient` will overwirte their shared input variable.
Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator replace the overwirte links.
We share variable in the same scope, as a result, duplicate operator `OutputGradients` will overwirte then duplicate variable.
2. replace shared variable gradient with `Add` Operator
![./images/duplicate_op]()
</p>
Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator instead.
![./images/duplicate_op2]()
Then collect the sub graph OutputGradients/InputGradients as the NetOp's and return it.
Then collect the sub graph `OutputGradients`/`InputGradients` as the NetOp's and return it.