Commit 40a3a89a authored by F fengjiayi

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into complete_backward_doc

...@@ -52,18 +52,32 @@ In our design, the network itself is also a kind of operator. So the operators c
Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.
1. Op
When the input forward network is an Op, return its gradient operator immediately.
2. NetOp
When the input forward network is a NetOp, it needs to call the backward functions of its sub NetOps/Operators recursively. During the process, we need to collect the `OutputGradients` names according to the forward NetOp (a sketch of this recursion is given at the end of this passage).
**Shared variables**. As illustrated in the pictures below, two operators' output gradients will overwrite their shared input variable.
<p align="center">
<img src="./images/duplicate_op.png" width="70%" ><br/>
1. shared variable in two operators.
</p>
Sharing a variable between operators, or using the same input variable in multiple operators, leads to duplicate gradient variables. As the demo above shows, we need to rename the gradient names recursively and add a generic add operator to replace the overwriting links (sketched after the figure below).
<p align="center">
<img src="images/duplicate_op2.png" width="90%" ><br/>
2. Replace the shared variable's gradient with an `Add` operator.
</p>
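
The de-duplication step can be pictured in a few lines of Python. This is only an illustrative sketch, not the PaddlePaddle implementation: the dict-based op representation, the `@RENAME@` aliases, and the helper name `insert_add_for_duplicates` are all invented for the example.

```python
from collections import defaultdict

def insert_add_for_duplicates(grad_ops):
    """Rename duplicated gradient outputs and merge them with a generic add op.

    `grad_ops` is a list of dicts with "type", "inputs", "outputs" keys; this
    representation is made up for the sketch.
    """
    writers = defaultdict(list)                  # gradient name -> ops writing it
    for op in grad_ops:
        for name in op["outputs"]:
            writers[name].append(op)

    add_ops = []
    for name, ops in writers.items():
        if len(ops) < 2:
            continue                             # a single writer needs no fix-up
        aliases = []
        for i, op in enumerate(ops):
            alias = f"{name}@RENAME@{i}"         # give each writer a private copy
            op["outputs"] = [alias if o == name else o for o in op["outputs"]]
            aliases.append(alias)
        # a generic add operator sums the private copies back into `name`
        add_ops.append({"type": "add", "inputs": aliases, "outputs": [name]})
    return grad_ops + add_ops

# Example matching the figure: two gradient ops both write W@GRAD.
ops = [{"type": "fc_grad", "inputs": ["Out1@GRAD"], "outputs": ["W@GRAD"]},
       {"type": "fc_grad", "inputs": ["Out2@GRAD"], "outputs": ["W@GRAD"]}]
print(insert_add_for_duplicates(ops))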
Then collect the sub graph's `OutputGradients`/`InputGradients` as the NetOp's and return them.
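
Putting the two cases together, the recursion described above can be sketched as follows. Again, this is a hedged illustration rather than the real implementation: the dict-based op representation, the `grad_op_of` registry, and the reuse of `insert_add_for_duplicates` from the previous snippet are assumptions made for the example.

```python
def backward(op, grad_op_of):
    """Build the backward counterpart of `op`.

    `grad_op_of` maps a forward op type to a function returning its gradient
    op; both it and the dict representation are invented for this sketch.
    """
    if op["type"] != "net":
        # Case 1: a plain Op -- return its gradient operator immediately.
        return grad_op_of[op["type"]](op)

    # Case 2: a NetOp -- recurse into sub ops in reverse order, then merge
    # duplicated gradients with the add-op helper sketched earlier.
    sub_grads = [backward(sub, grad_op_of) for sub in reversed(op["ops"])]
    sub_grads = insert_add_for_duplicates(sub_grads)

    # Collect the sub graph's OutputGradients/InputGradients as the NetOp's own.
    produced = {o for g in sub_grads for o in g["outputs"]}
    consumed = {i for g in sub_grads for i in g["inputs"]}
    return {"type": "net", "ops": sub_grads,
            "inputs": sorted(consumed - produced),   # gradients fed from outside
            "outputs": sorted(produced - consumed)}  # gradients exposed upward

# Example: two ops share the input X, so X@GRAD is written twice and merged.
grad_op_of = {"mul": lambda op: {"type": "mul_grad",
                                 "inputs": [o + "@GRAD" for o in op["outputs"]],
                                 "outputs": [i + "@GRAD" for i in op["inputs"]]}}
forward = {"type": "net", "ops": [
    {"type": "mul", "inputs": ["X", "W1"], "outputs": ["H1"]},
    {"type": "mul", "inputs": ["X", "W2"], "outputs": ["H2"]}]}
print(backward(forward, grad_op_of))
```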
...@@ -181,7 +181,7 @@ images = data_layer(name='pixel', dims=[BATCH_SIZE, 784])
labels = data_layer(name='label', dims=[BATCH_SIZE])
fc1 = fc_layer(net=forward_net, input=images, size=100, act="sigmoid")
fc2 = fc_layer(net=forward_net, input=fc1, size=100, act="sigmoid")
predict = fc_layer(net=forward_net, input=fc2, size=10, act="softmax")
cost = cross_entropy_layer(net=forward_net, input=predict, label=labels)
init_net.complete_add_op(True)
...