From 3120ee5cfbbe6ecf3550b6a338a4c14afe6e4ebd Mon Sep 17 00:00:00 2001
From: dongzhihong
Date: Sat, 26 Aug 2017 18:46:06 -0700
Subject: [PATCH] fix backward doc

---
 paddle/framework/backward.md | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/paddle/framework/backward.md b/paddle/framework/backward.md
index 74c001b06a..c8fa3fefe5 100644
--- a/paddle/framework/backward.md
+++ b/paddle/framework/backward.md
@@ -21,18 +21,32 @@ grad_op_builder(fengjiayi)
 
 given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`.
 
-1. bla bla bla (yuyang)
+1. Op
+
+   when the input forward network is a Op, return its gradient Operator Immediately.
 
 2. NetOp
 
-   when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively and ensure them done. During the process, we need to collect the `OutputGradients` name.
+   when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to forward NetOp.
+
+   **shared variable**. As illustrated in the pictures, two operator's `Output` `Gradient` will overwirte their shared input variable.
+
+

+
+
+   1. shared variable in two operators.
+
+

+
+   Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator replace the overwirte links.
+
+

+
+
-   We share variable in the same scope, as a result, duplicate operator `OutputGradients` will overwirte then duplicate variable.
+   2. replace shared variable gradient with `Add` Operator
 
-   ![./images/duplicate_op]()
+

+
-   Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator instead.
 
-![./images/duplicate_op2]()
 
-​ Then collect the sub graph OutputGradients/InputGradients as the NetOp's and return it.
+​ Then collect the sub graph `OutputGradients`/`InputGradients` as the NetOp's and return it.
-- 
GitLab