diff --git a/doc/design/graph.md b/doc/design/graph.md
index 87f696f90f164a639ad5182823ddfb14aab7e065..51b7f87638f8ddff752328a562fe0dd0fe56cfd1 100644
--- a/doc/design/graph.md
+++ b/doc/design/graph.md
@@ -1,4 +1,4 @@
-# Design Doc: Computations as Graphs
+# Design Doc: Computations as a Graph
 
 A primary goal of the refactorization of PaddlePaddle is a more flexible representation of deep learning computation, in particular, a graph of operators and variables, instead of sequences of layers as before.
 
@@ -8,6 +8,8 @@ This document explains that the construction of a graph as three steps:
 - construct the backward part
 - construct the optimization part
 
+## The Construction of a Graph
+
 Let us take the problem of image classification as a simple example. The application program that trains the model looks like:
 
 ```python
@@ -25,7 +27,9 @@ The first four lines of above program build the forward part of the graph.
 
 ![](images/graph_construction_example_forward_only.png)
 
-In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x. `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b.
+In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x. `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b, and the initialization operators.
+
+Initialization operators are a kind of "run-once" operator: the `Run` method increments a class data member counter so that the operator runs at most once. This way, a parameter is not initialized repeatedly, say, in every minibatch.
 
 In this example, all operators are created as `OpDesc` protobuf messages, and all variables are `VarDesc`. These protobuf messages are saved in a `BlockDesc` protobuf message.
 
@@ -49,3 +53,18 @@ According to the chain rule of gradient computation, `ConstructBackwardGraph` wo
 For each parameter, like W and b created by `layer.fc`, marked as double circles in above graphs, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient. Here results in the complete graph:
 
 ![](images/graph_construction_example_all.png)
+
+## Block and Graph
+
+The words block and graph are interchangeable in the design of PaddlePaddle. A [Block](https://github.com/PaddlePaddle/Paddle/pull/3708) is a metaphor for the code and local variables in a pair of curly braces in programming languages, where operators are like statements or instructions. A graph of operators and variables is a representation of the block.
+
+A Block keeps operators in an array `BlockDesc::ops`
+
+```protobuf
+message BlockDesc {
+  repeated OpDesc ops = 1;
+  repeated VarDesc vars = 2;
+}
+```
+
+in the order that they appear in user programs, like the Python program at the beginning of this article. We can imagine that `ops` holds some forward operators, followed by some gradient operators, and then some optimization operators.
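The ordering described in the `Block and Graph` section can be illustrated with a minimal Python sketch. The dataclasses below only mirror the two fields of `BlockDesc` and are not PaddlePaddle's API. The `role` tag, the gradient and optimizer operator names (`MSE_grad`, `FC_grad`, `SGD`), the exact order of the forward operators, and the helper `is_phase_ordered` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for the protobuf messages; only the fields needed
# for this illustration are modeled.
@dataclass
class OpDesc:
    type: str
    role: str  # assumed tag: "forward", "backward", or "optimize"


@dataclass
class VarDesc:
    name: str


@dataclass
class BlockDesc:
    ops: List[OpDesc] = field(default_factory=list)
    vars: List[VarDesc] = field(default_factory=list)


PHASES = ["forward", "backward", "optimize"]


def build_example_block() -> BlockDesc:
    """Append operators in the order a user program would create them."""
    block = BlockDesc()
    block.vars = [VarDesc(n) for n in ("x", "l", "W", "b", "y", "cost")]
    # Forward part: operator names taken from the example graph.
    for op_type in ("Feed", "Feed", "Init", "Init", "FC", "MSE"):
        block.ops.append(OpDesc(op_type, "forward"))
    # Backward part: gradient operator names are assumptions.
    for op_type in ("MSE_grad", "FC_grad"):
        block.ops.append(OpDesc(op_type, "backward"))
    # Optimization part: one assumed SGD operator per parameter (W and b).
    for _param in ("W", "b"):
        block.ops.append(OpDesc("SGD", "optimize"))
    return block


def is_phase_ordered(block: BlockDesc) -> bool:
    """True if ops are forward ops, then gradient ops, then optimizer ops."""
    ranks = [PHASES.index(op.role) for op in block.ops]
    return ranks == sorted(ranks)


block = build_example_block()
```

Because `ops` is a plain array, the three phases are distinguished only by position; a consumer that needs just the forward part could take the prefix of `ops` up to the first gradient operator.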
diff --git a/doc/design/images/graph_construction_example.dot b/doc/design/images/graph_construction_example.dot
index bedb6de0111a8ccab4030d034d65cf72705fc25a..8d1b673abf6b78c851676fa379dc850c4818f0e5 100644
--- a/doc/design/images/graph_construction_example.dot
+++ b/doc/design/images/graph_construction_example.dot
@@ -2,6 +2,8 @@ digraph ImageClassificationGraph {
   ///////// The forward part /////////
   FeedX [label="Feed", color=blue, shape=box];
   FeedY [label="Feed", color=blue, shape=box];
+  InitW [label="Init", color=blue, shape=diamond];
+  Initb [label="Init", color=blue, shape=diamond];
   FC [label="FC", color=blue, shape=box];
   MSE [label="MSE", color=blue, shape=box];
 
@@ -14,6 +16,8 @@ digraph ImageClassificationGraph {
 
   FeedX -> x -> FC -> y -> MSE -> cost [color=blue];
   FeedY -> l [color=blue];
+  InitW -> W [color=blue];
+  Initb -> b [color=blue];
   W -> FC [color=blue];
   b -> FC [color=blue];
   l -> MSE [color=blue];
diff --git a/doc/design/images/graph_construction_example_all.png b/doc/design/images/graph_construction_example_all.png
index 18d8330b60e12720bb993c8cf588d64ff8db1ea9..181187503472d15779b87284105841168b3945c4 100644
Binary files a/doc/design/images/graph_construction_example_all.png and b/doc/design/images/graph_construction_example_all.png differ
diff --git a/doc/design/images/graph_construction_example_forward_backward.png b/doc/design/images/graph_construction_example_forward_backward.png
index 61c3a02a04bc8891ab5b921a889829bcce386df8..3049a9315fd616464dec54e33064cb75598ca536 100644
Binary files a/doc/design/images/graph_construction_example_forward_backward.png and b/doc/design/images/graph_construction_example_forward_backward.png differ
diff --git a/doc/design/images/graph_construction_example_forward_only.png b/doc/design/images/graph_construction_example_forward_only.png
index 14805df11fc09f64d6bc17f5e969f1400d615148..25d19088cbf0b5f68cf734f2ff21eba8af4a2860 100644
Binary files a/doc/design/images/graph_construction_example_forward_only.png and b/doc/design/images/graph_construction_example_forward_only.png differ
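The `Init` nodes added to the DOT graph correspond to the "run-once" initialization operators described in the graph.md change. A minimal Python sketch of that behavior follows, assuming a counter kept as a data member, as the design text describes. The class `InitOp`, its zero-fill initialization, the dictionary used as a scope, and the `train` loop are hypothetical and are not PaddlePaddle's actual implementation.

```python
from typing import Dict, List


class InitOp:
    """Hypothetical run-once operator that fills a parameter with zeros."""

    def __init__(self, var_name: str, size: int) -> None:
        self.var_name = var_name
        self.size = size
        self.run_count = 0  # data member counter, incremented on every run

    def run(self, scope: Dict[str, List[float]]) -> bool:
        """Initialize the parameter on the first call only.

        Returns True if initialization happened, False if it was skipped.
        """
        self.run_count += 1
        if self.run_count > 1:
            return False  # already initialized in an earlier minibatch
        scope[self.var_name] = [0.0] * self.size
        return True


def train(num_minibatches: int) -> Dict[str, List[float]]:
    """Run the same operator sequence once per minibatch."""
    scope: Dict[str, List[float]] = {}
    init_w = InitOp("W", 2)
    for _ in range(num_minibatches):
        init_w.run(scope)  # the op is in the graph, so it is invoked every time
        # Stand-in for an optimizer update: add 1.0 to every element of W.
        scope["W"] = [w + 1.0 for w in scope["W"]]
    return scope
```

In this sketch, `train(3)` leaves `W` at `[3.0, 3.0]`; if the operator re-initialized on every minibatch, the updates would be discarded and `W` would end at `[1.0, 1.0]`.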