diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index d5dbd57d19fa955bb1a3396d2e972b47f169495c..b4205fed2e6e98282d60a25a83e5b2e2c5d0cfce 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -52,18 +52,32 @@ In our design, the network itself is also a kind of operator. So the operators c
given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`.
-1. bla bla bla (yuyang)
+1. Op
+
+ When the input forward network is an Op, return its gradient Operator immediately.
2. NetOp
- when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively and ensure them done. During the process, we need to collect the `OutputGradients` name.
+ When the input forward network is a NetOp, it needs to call the backward functions of its sub NetOps/Operators recursively. During this process, we need to collect the `OutputGradients` names according to the forward NetOp.
+
+ **shared variable**. As illustrated in the pictures, the `Output` `Gradient` of two operators will overwrite their shared input variable.
+
+
+
+
+ 1. shared variable in two operators.
+
+
+
+ Sharing a variable between operators, or using the same input variable in multiple operators, leads to duplicate gradient variables. As the demo above shows, we need to rename the gradient names recursively and add a generic add operator to replace the overwrite links, as sketched in the Python pseudo-code at the end of this section.
+
+
+
- We share variable in the same scope, as a result, duplicate operator `OutputGradients` will overwirte then duplicate variable.
+ 2. replace the shared variable's gradient with an `Add` Operator
- ![./images/duplicate_op]()
+
- Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator instead.
-![./images/duplicate_op2]()
- Then collect the sub graph OutputGradients/InputGradients as the NetOp's and return it.
+ Then collect the sub-graph's `OutputGradients`/`InputGradients` as the NetOp's own and return them.
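+
+The rules above can be summarized in a short sketch. This is illustrative Python pseudo-code under assumed names, not the framework's actual implementation: `Op`, `NetOp`, `grad_op_of`, the `@GRAD`/`@RENAME@` suffixes, and the `add` operator are stand-ins used only to show the control flow.
+
+```python
+import collections
+
+class Op:
+    """A plain operator: a name plus input/output variable names."""
+    def __init__(self, name, inputs, outputs):
+        self.name, self.inputs, self.outputs = name, list(inputs), list(outputs)
+
+class NetOp(Op):
+    """An operator that is itself a network of sub-operators."""
+    def __init__(self, name, ops):
+        super().__init__(name, [], [])
+        self.ops = ops
+
+def grad_op_of(op):
+    # Hypothetical lookup of the registered gradient operator of a plain Op.
+    return Op(op.name + "_grad",
+              inputs=[o + "@GRAD" for o in op.outputs],
+              outputs=[i + "@GRAD" for i in op.inputs])
+
+def backward(forward_op):
+    # Case 1: a plain Op -- return its gradient operator immediately.
+    if not isinstance(forward_op, NetOp):
+        return grad_op_of(forward_op)
+
+    # Case 2: a NetOp -- recurse over its sub-ops in reverse order.
+    grad_ops = [backward(op) for op in reversed(forward_op.ops)]
+
+    # A forward variable shared by several operators makes several gradient
+    # ops write the same gradient variable; count the writers first.
+    writers = collections.Counter(out for g in grad_ops for out in g.outputs)
+
+    # Rename every duplicated gradient output and insert a generic add
+    # operator that sums the renamed copies back into the original name,
+    # instead of letting later writes overwrite earlier ones.
+    add_ops = []
+    for name, count in writers.items():
+        if count < 2:
+            continue
+        copies = []
+        for g in grad_ops:
+            for i, out in enumerate(g.outputs):
+                if out == name:
+                    renamed = "%s@RENAME@%d" % (name, len(copies))
+                    g.outputs[i] = renamed
+                    copies.append(renamed)
+        add_ops.append(Op("add", inputs=copies, outputs=[name]))
+
+    net = NetOp(forward_op.name + "_grad", grad_ops + add_ops)
+    # Collect the sub-graph's gradient inputs/outputs as the NetOp's own.
+    net.inputs = sorted({i for g in net.ops for i in g.inputs})
+    net.outputs = sorted({o for g in net.ops for o in g.outputs})
+    return net
+```
+
+Under these assumptions, calling `backward(forward_net)` on the whole forward NetOp yields a backward NetOp in which every shared gradient is accumulated through an add operator instead of being overwritten.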
diff --git a/python/paddle/v2/framework/tests/mnist.py b/python/paddle/v2/framework/tests/mnist.py
index 9a0b109850e92c66e69f74c5cd0853a09b5551a1..a68f302f9c344bf6d63e8d9b48836d69338c3d0b 100644
--- a/python/paddle/v2/framework/tests/mnist.py
+++ b/python/paddle/v2/framework/tests/mnist.py
@@ -181,7 +181,7 @@ images = data_layer(name='pixel', dims=[BATCH_SIZE, 784])
labels = data_layer(name='label', dims=[BATCH_SIZE])
fc1 = fc_layer(net=forward_net, input=images, size=100, act="sigmoid")
fc2 = fc_layer(net=forward_net, input=fc1, size=100, act="sigmoid")
-predict = fc_layer(net=forward_net, input=fc2, size=100, act="softmax")
+predict = fc_layer(net=forward_net, input=fc2, size=10, act="softmax")
cost = cross_entropy_layer(net=forward_net, input=predict, label=labels)
init_net.complete_add_op(True)