@@ -17,22 +17,22 @@ The goals of refactoring include:
1. A graph is composed of *variables* and *operators*.
1. The description of graphs must be capable of being serialized/deserialized, so that:
1. The description of graphs must be serializable/deserializable, so that:
1. It can to be sent to the cloud for distributed execution, and
1. It can be sent to the cloud for distributed execution, and
1. It can be sent to clients for mobile or enterprise deployment.
1. The Python program does the following steps
1. The Python program does two things
1.*compilation*: run a Python program to generate a protobuf message representation of the graph and send it to
1.*Compilation* runs a Python program to generate a protobuf message representation of the graph and send it to
1. the C++ library `libpaddle.so` for local execution,
1. the master process of a distributed training job for training, or
1. the server process of a Kubernetes serving job for distributed serving.
1.*execution*: execute the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message.
1.*Execution* executes the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message.
## Description and Realization of Computation Graph
At compile time, the Python program generates a protobuf message representation of the graph, or the description of the graph.
At compile time, the Python program generates a protobuf message representation of the graph, or a description of the graph.
At runtime, the C++ program realizes the graph and runs it.
...
...
@@ -42,11 +42,11 @@ At runtime, the C++ program realizes the graph and runs it.
The word *graph* is interchangeable with *block* in this document. A graph represents computation steps and local variables similar to a C++/Java program block, or a pair of parentheses(`{` and `}`).
The word *graph* is interchangeable with *block* in this document. A graph consists of computation steps and local variables similar to a C++/Java program block, or a pair of parentheses(`{` and `}`).
## Compilation and Execution
1. Run an application Python program to describe the graph. In particular, the Python application program does the following:
1. Run a Python program to describe the graph. In particular, the Python application program does the following:
1. Create `VarDesc` to represent local/intermediate variables,
1. Create operators and set attributes,
...
...
@@ -54,10 +54,10 @@ The word *graph* is interchangeable with *block* in this document. A graph repr
1. Infer the type and the shape of variables,
1. Plan memory-reuse for variables,
1. Generate the backward graph
1.Optimize the computation graph.
1.Potentially, split the graph for distributed training.
1.Add optimization operators to the computation graph.
1.Optionally, split the graph for distributed training.
1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the application Python program does the following:
1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the Python program does the following:
1. Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block,
1. realize local variables defined in the BlockDesc message in the new scope,
*`Operator` is the fundamental building block of the user interface.
* Operator stores input/output variable names, and attributes.
* The `InferShape` interface is used to infer the shape of the output variable shapes based on the shapes of the input variables.
* Operator stores input/output variable names and attributes.
* The `InferShape` interface is used to infer the shape of the output variables based on the shapes of the input variables.
* Use `Run` to compute the `output` variables from the `input` variables.
---
...
...
@@ -139,7 +139,7 @@ Compile Time -> IR -> Runtime
* Limit the number of `tensor.device(dev) = ` in your code.
*`thrust::transform` and `std::transform`.
*`thrust` has the same API as C++ standard library. Using `transform`, one can quickly implement customized element-wise kernels.
*`thrust` also has more complex APIs, like `scan`, `reduce`, `reduce_by_key`.
*`thrust`, in addition, supports more complex APIs, like `scan`, `reduce`, `reduce_by_key`.
* Hand-writing `GPUKernel` and `CPU` code
* Do not write in header (`.h`) files. CPU Kernel should be in cpp source (`.cc`) and GPU kernels should be in cuda (`.cu`) files. (GCC cannot compile GPU code.)
---
...
...
@@ -185,10 +185,10 @@ Make sure the registration process is executed and linked.
1. Write an Op class and its gradient Op class, if required.
2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.
3. Invoke the macro `REGISTER_OP`. This macro will
1. Call maker class to complete the `proto` and the`checker`
1. Call maker class to complete `proto` and`checker`
2. Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap`
4. Invoke the `USE` macro in which the Op is used, to make sure that it is linked.
4. Invoke the `USE` macro in which the Op is used to make sure that it is linked.
---
# Backward Module (1/2)
...
...
@@ -199,13 +199,14 @@ Make sure the registration process is executed and linked.
---
# Backward Module (2/2)
### Build Backward Network
-**Input**: graph of forward operators
-**Output**: graph of backward operators
-**Input**: a graph of forward operators
-**Output**: a graph of backward operators
-**Corner cases in construction**
- Shared Variables => insert an `Add` operator to combine gradients
- No Gradient => insert a `fill_zero_grad` operator
- Recursive NetOp => call `Backward` recursively
- RNN Op => recursively call `Backward` on stepnet
- RNN Op => recursively call `Backward` on stepnet
---
...
...
@@ -215,10 +216,10 @@ Make sure the registration process is executed and linked.
*Only dims and data pointers are stored in `Tensor`.
*All operations on `Tensor` are written in `Operator` or global functions.