Commit b1435ec7, authored by Travis CI

Deploy to GitHub Pages: 5d6d2bc1

Parent 42819a3a
# Design Doc: Refactorization Overview # Design Doc: Refactorization Overview
The goal of refactorizaiton include: The goals of refactoring include:
1. Make it easy for external contributors to write new elementory computaiton operations. 1. Making it easy for external contributors to write new elementary computation operations.
1. Make the codebase clean and readable. 1. Making the codebase clean and readable.
1. Introduce a new design of computation representation -- a computation graph of operators and variables. 1. Designing a new computation representation -- a computation graph of operators and variables.
1. The graph representation helps implementing auto-scalable and auto fault recoverable distributed computing. 1. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs.
## Computation Graphs ## Computation Graphs
1. PaddlePaddle represent the computation, training and inference of DL models, by computation graphs. 1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs.
1. Please dig into [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a solid example. 1. Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example.
1. Users write Python programs to describe the graphs and run it (locally or remotely). 1. Users write Python programs to describe the graphs and run them (locally or remotely).
1. A graph is composed of *variables* and *operators*. 1. A graph is composed of *variables* and *operators*.
1. The description of graphs must be able to be serialized/deserialized, so it 1. The description of graphs must be capable of being serialized/deserialized, so that
1. could to be sent to the cloud for distributed execution, and 1. It can be sent to the cloud for distributed execution, and
1. be sent to clients for mobile or enterprise deployment. 1. It can be sent to clients for mobile or enterprise deployment.
1. The Python program do 1. The Python program does the following steps
1. *compilation*: runs a Python program to generate a protobuf message representation of the graph and send it to 1. *compilation*: run a Python program to generate a protobuf message representation of the graph and send it to
1. the C++ library `libpaddle.so` for local execution, 1. the C++ library `libpaddle.so` for local execution,
1. the master process of a distributed training job for training, or 1. the master process of a distributed training job for training, or
1. the server process of a Kubernetes serving job for distributed serving. 1. the server process of a Kubernetes serving job for distributed serving.
1. *execution*: according to the protobuf message, constructs instances of class `Variable` and `OperatorBase`, and run them. 1. *execution*: execute the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message.
## Description and Realization ## Description and Realization of Computation Graph
At compile time, the Python program generates protobuf message representation of the graph, or the description of the graph. At compile time, the Python program generates a protobuf message representation of the graph, or the description of the graph.
At runtime, the C++ program realizes the graph and run it. At runtime, the C++ program realizes the graph and runs it.
| | Representation (protobuf messages) | Realization (C++ class objects) | | | Representation (protobuf messages) | Realization (C++ class objects) |
|---|---|---| |---|---|---|
...@@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it. ...@@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it.
|Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)| |Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)|
|Block|BlockDesc|Block| |Block|BlockDesc|Block|
The word *graph* is exchangable with *block* in this document. A graph represent computation steps and local variables as a C++/Java program block, or a pair of { and }. The word *graph* is interchangeable with *block* in this document. A graph represents computation steps and local variables, similar to a C++/Java program block, or a pair of curly braces (`{` and `}`).
## Compilation and Execution ## Compilation and Execution
1. Run an applicaton Python program to describe the graph. In particular, 1. Run an application Python program to describe the graph. In particular, the Python application program does the following:
1. create VarDesc to represent local/intermediate variables, 1. Create `VarDesc` to represent local/intermediate variables,
1. create operators and set attributes, 1. Create operators and set attributes,
1. validate attribute values, 1. Validate attribute values,
1. inference the type and the shape of variables, 1. Infer the type and the shape of variables,
1. plan for memory-reuse for variables, 1. Plan memory-reuse for variables,
1. generate backward and optimization part of the Graph. 1. Generate the backward graph
1. possiblly split the graph for distributed training. 1. Optimize the computation graph.
1. Potentially, split the graph for distributed training.
1. The invocation of `train` or `infer` in the application Python program: 1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the application Python program does the following:
1. create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block, 1. Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block,
1. realize local variables defined in the BlockDesc message in the new scope, 1. realize local variables defined in the BlockDesc message in the new scope,
1. a scope is similar to the stack frame in programming languages, 1. a scope is similar to the stack frame in programming languages,
1. create an instance of class `Block`, in which, 1. Create an instance of class `Block`, in which,
1. realize operators in the BlockDesc message, 1. realize operators in the BlockDesc message,
1. run the Block by calling 1. Run the Block by calling
1. `Block::Eval(vector<Variable>* targets)` for forward and backward computations, or 1. `Block::Eval(vector<Variable>* targets)` for forward and backward computations, or
1. `Block::Eval(vector<Operator>* targets)` for optimization. 1. `Block::Eval(vector<Operator>* targets)` for optimization.
...@@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document. A graph represen ...@@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document. A graph represen
Compile Time -> IR -> Runtime Compile Time -> IR -> Runtime
``` ```
### Benefit ### Benefits of IR
- Optimization - Optimization
```text ```text
Compile Time -> IR -> Optimized IR -> Runtime Compile Time -> IR -> Optimized IR -> Runtime
``` ```
- Send automatically partitioned IR to different nodes. - Automatically send partitioned IR to different nodes.
- Automatic data parallel - Automatic Data Parallelism
```text ```text
Compile Time Compile Time
|-> Single GPU IR |-> Single GPU IR
...@@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime ...@@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime
|-> Node-1 (runs trainer-IR-1) |-> Node-1 (runs trainer-IR-1)
|-> Node-2 (runs pserver-IR) |-> Node-2 (runs pserver-IR)
``` ```
- Automatic model parallel (planned for future) - Automatic Model Parallelism (planned for future)
--- ---
...@@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime ...@@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime
# Operator # Operator
![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot) ![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot)
* `Operator` is the fundamental building block as the user interface. * `Operator` is the fundamental building block of the user interface.
* Operator stores input/output variable name, and attributes. * Operator stores input/output variable names, and attributes.
* The `InferShape` interface is used to infer output variable shapes by its input shapes. * The `InferShape` interface is used to infer the shapes of the output variables based on the shapes of the input variables.
* Use `Run` to compute `input variables` to `output variables`. * Use `Run` to compute the `output` variables from the `input` variables.
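The operator contract described above (store variable names and attributes, infer output shapes, then run) can be sketched in Python. This is only an illustration: `MulOp`, `infer_shape`, `run`, and the dict-based scope are hypothetical names chosen for the sketch, not the actual C++ `OperatorBase` interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
Matrix = List[List[float]]


@dataclass
class MulOp:
    """Hypothetical operator: it holds variable names and attributes, not data."""
    inputs: List[str]                      # names of the input variables
    outputs: List[str]                     # names of the output variables
    attrs: Dict[str, object] = field(default_factory=dict)

    def infer_shape(self, shapes: Dict[str, Shape]) -> None:
        # (m, k) x (k, n) -> (m, n); only shapes are touched, no data.
        (m, k), (k2, n) = shapes[self.inputs[0]], shapes[self.inputs[1]]
        if k != k2:
            raise ValueError(f"inner dimensions differ: {k} vs {k2}")
        shapes[self.outputs[0]] = (m, n)

    def run(self, scope: Dict[str, Matrix]) -> None:
        # Look the variables up by name, compute, and store the output by name.
        x, y = scope[self.inputs[0]], scope[self.inputs[1]]
        scope[self.outputs[0]] = [
            [sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x
        ]
```

Keeping only names inside the operator is what lets the same description be serialized at compile time and bound to real variables at runtime.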
--- ---
...@@ -126,30 +127,30 @@ Compile Time -> IR -> Runtime ...@@ -126,30 +127,30 @@ Compile Time -> IR -> Runtime
# Why separate Kernel and Operator # Why separate Kernel and Operator
* Separate GPU and CPU code. * Separate GPU and CPU code.
* Make Paddle can run without GPU. * Make Paddle capable of running without GPU.
* Make one operator (which is user interface) can contain many implementations. * Allow one operator (which is a user interface) to have many implementations.
* Same mul op, different FP16, FP32 Kernel. different MKL, eigen kernel. * For example, the same multiplication op can have different kernel implementations, such as FP16, FP32, MKL, and Eigen kernels.
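One way to picture the operator/kernel split is a per-operator table of kernels keyed by device and data type. The Python sketch below is a hypothetical illustration: `KERNELS`, `register_kernel`, `run_op`, and the `(place, dtype)` key are assumptions made for the example, not Paddle's real dispatch code.

```python
from typing import Callable, Dict, List, Tuple

KernelKey = Tuple[str, str]  # (place, dtype), e.g. ("CPU", "FP32")
Kernel = Callable[[List[float], List[float]], List[float]]

# Hypothetical table: op type -> {kernel key -> kernel function}.
KERNELS: Dict[str, Dict[KernelKey, Kernel]] = {}


def register_kernel(op_type: str, place: str, dtype: str):
    """Decorator that files a kernel under its op type and (place, dtype) key."""
    def decorator(fn: Kernel) -> Kernel:
        KERNELS.setdefault(op_type, {})[(place, dtype)] = fn
        return fn
    return decorator


@register_kernel("elementwise_mul", "CPU", "FP32")
def mul_cpu_fp32(x: List[float], y: List[float]) -> List[float]:
    return [a * b for a, b in zip(x, y)]


@register_kernel("elementwise_mul", "CPU", "FP16")
def mul_cpu_fp16(x: List[float], y: List[float]) -> List[float]:
    # Stand-in for reduced precision: round each product to 3 decimals.
    return [round(a * b, 3) for a, b in zip(x, y)]


def run_op(op_type: str, place: str, dtype: str,
           x: List[float], y: List[float]) -> List[float]:
    """The operator is the stable interface; the kernel is picked at run time."""
    try:
        kernel = KERNELS[op_type][(place, dtype)]
    except KeyError:
        raise NotImplementedError(f"no {place}/{dtype} kernel for {op_type}") from None
    return kernel(x, y)
```

Because no GPU kernel is registered in this sketch, asking for one fails cleanly instead of breaking the CPU path, which mirrors the goal of running without a GPU.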
--- ---
# Libraries for Kernel development # Libraries for Kernel development
* `Eigen::Tensor` contains basic math and element-wise functions. * `Eigen::Tensor` contains basic math and element-wise functions.
* Note that `Eigen::Tensor` has broadcast implementation. * Note that `Eigen::Tensor` has broadcast implementation.
* Limit number of `tensor.device(dev) = ` in your code. * Limit the number of `tensor.device(dev) = ` in your code.
* `thrust::transform` and `std::transform`. * `thrust::transform` and `std::transform`.
* `thrust` has the same API as C++ standard library. Using `transform` can quickly implement a customized elementwise kernel. * `thrust` has the same API as C++ standard library. Using `transform`, one can quickly implement customized elementwise kernels.
* `thrust` has more complex API, like `scan`, `reduce`, `reduce_by_key`. * `thrust` also has more complex APIs, like `scan`, `reduce`, `reduce_by_key`.
* Hand-writing `GPUKernel` and `CPU` code * Hand-writing `GPUKernel` and `CPU` code
* Do not write `.h`. CPU Kernel should be in `.cc`. GPU kernel should be in `.cu`. (`GCC` cannot compile GPU code.) * Do not write in header (`.h`) files. CPU Kernel should be in cpp source (`.cc`) and GPU kernels should be in cuda (`.cu`) files. (GCC cannot compile GPU code.)
--- ---
# Operator Register # Operator Registration
## Why register is necessary? ## Why registration is necessary?
We need a method to build mappings between Op type names and Op classes. We need a method to build mappings between Op type names and Op classes.
## How to do the register? ## How is registration implemented?
Maintain a map, whose key is the type name and value is corresponding Op constructor. Maintain a map whose key is the type name and whose value is the corresponding Op constructor.
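A minimal Python sketch of such a map follows. The names `OP_REGISTRY`, `register_op`, `create_op`, and `ScaleOp` are hypothetical; the real registration goes through the C++ `REGISTER_OP` macros and `OpInfoMap`.

```python
from typing import Callable, Dict


class OpBase:
    """Hypothetical base class; it only stores the attributes it is given."""
    def __init__(self, **attrs: object) -> None:
        self.attrs = attrs


# The registry: Op type name -> Op constructor.
OP_REGISTRY: Dict[str, Callable[..., OpBase]] = {}


def register_op(type_name: str):
    """Class decorator that plays the role of a registration macro."""
    def decorator(cls):
        if type_name in OP_REGISTRY:
            raise KeyError(f"op type {type_name!r} is already registered")
        OP_REGISTRY[type_name] = cls
        return cls
    return decorator


@register_op("scale")
class ScaleOp(OpBase):
    pass


def create_op(type_name: str, **attrs: object) -> OpBase:
    """Build an op from its type name, as a deserializer would need to."""
    return OP_REGISTRY[type_name](**attrs)
```

With the map in place, a graph description that only carries the string `"scale"` is enough to construct the matching operator object.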
--- ---
# The Registry Map # The Registry Map
...@@ -177,34 +178,34 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class) ...@@ -177,34 +178,34 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class)
REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class)
``` ```
### `USE` Macros ### USE Macros
make sure the registration process is executed and linked. Make sure the registration process is executed and linked.
--- ---
# Register Process # Registration Process
1. Write Op class, as well as its gradient Op class if there is. 1. Write an Op class and its gradient Op class, if required.
2. Write Op maker class. In the constructor, describe its inputs, outputs, and attributes. 2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.
3. Invoke macro `REGISTER_OP`. The macro will 3. Invoke the macro `REGISTER_OP`. This macro will
1. call maker class to complete `proto` and `checker` 1. Call maker class to complete the `proto` and the `checker`
2. with the completed `proto` and `checker`, build a new key-value pair in the `OpInfoMap` 2. Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap`
4. Invoke `USE` macro in where the Op is used to make sure it is linked. 4. Invoke the `USE` macro where the Op is used, to make sure that it is linked.
--- ---
# Backward Module (1/2) # Backward Module (1/2)
### Create Backward Operator ### Create Backward Operator
- Mapping from forwarding Op to backward Op - Mapping from forward Op to backward Op
![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png) ![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png)
--- ---
# Backward Module (2/2) # Backward Module (2/2)
### Build Backward Network ### Build Backward Network
- **Input** graph of forwarding operators - **Input**: graph of forwarding operators
- **Output** graph of backward operators - **Output**: graph of backward operators
- **corner case in construction** - **Corner cases in construction**
- shared variable => insert `Add` operator - Shared Variables => insert an `Add` operator to combine gradients
- no gradient => insert `fill_zero_grad` operator - No Gradient => insert a `fill_zero_grad` operator
- recursive netOp => call `Backward` recursively - Recursive NetOp => call `Backward` recursively
- RNN Op => recursively call `Backward` on stepnet - RNN Op => recursively call `Backward` on stepnet
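The shared-variable corner case can be illustrated with a small Python sketch. It assumes a simplified op representation, a tuple `(type, inputs, outputs)`, and an illustrative `@GRAD` naming scheme; `build_backward` is a hypothetical helper, and the other corner cases (`fill_zero_grad`, nested nets, RNN) are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Op = Tuple[str, List[str], List[str]]  # (type, input names, output names)


def build_backward(forward_ops: List[Op]) -> List[Op]:
    """Emit gradient ops in reverse order, summing gradients of shared inputs."""
    # Count how many times each variable is read by the forward ops.
    uses: Dict[str, int] = defaultdict(int)
    for _, ins, _ in forward_ops:
        for name in ins:
            uses[name] += 1

    backward: List[Op] = []
    partials: Dict[str, List[str]] = defaultdict(list)
    for op_type, ins, outs in reversed(forward_ops):
        grad_outs = []
        for name in ins:
            if uses[name] > 1:
                # Shared variable: each reader writes its own partial gradient.
                partial = f"{name}@GRAD@{len(partials[name])}"
                partials[name].append(partial)
                grad_outs.append(partial)
            else:
                grad_outs.append(f"{name}@GRAD")
        backward.append((f"{op_type}_grad", [f"{o}@GRAD" for o in outs], grad_outs))
        # Once every reader has contributed, combine the partials with an add op.
        for name in dict.fromkeys(ins):
            if uses[name] > 1 and len(partials[name]) == uses[name]:
                backward.append(("add", list(partials[name]), [f"{name}@GRAD"]))
    return backward
```

For a forward graph in which `x` feeds two `mul` ops, the sketch emits two `mul_grad` ops writing `x@GRAD@0` and `x@GRAD@1`, followed by one `add` that produces `x@GRAD`.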
...@@ -213,41 +214,41 @@ make sure the registration process is executed and linked. ...@@ -213,41 +214,41 @@ make sure the registration process is executed and linked.
* `Tensor` is an n-dimension array with type. * `Tensor` is an n-dimension array with type.
* Only dims and data pointers are stored in `Tensor`. * Only dims and data pointers are stored in `Tensor`.
* All operators on `Tensor` is written in `Operator` or global functions. * All operations on `Tensor` are written in `Operator` or global functions.
* variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) * Variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md)
* `Variable` is the inputs and outputs of an operator. Not just `Tensor`. * `Variable` instances are the inputs and the outputs of an operator. Not just `Tensor`.
* step_scopes in RNN is a variable and not a tensor. * `step_scopes` in RNN is a variable and not a tensor.
* `Scope` is where variables store at. * `Scope` is where variables are stored.
* map<string/*var name */, Variable> * map<string `variable_name`, Variable>
* `Scope` has a hierarchical structure. The local scope can get variable from its parent scope. * `Scope` has a hierarchical structure. The local scope can get variables from its parent scope.
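The hierarchical lookup can be sketched as a name-to-variable map with a parent link. The class below is a hypothetical Python stand-in (`new_var` and `find_var` are illustrative method names), not the C++ `Scope` class itself.

```python
from typing import Dict, Optional


class Scope:
    """Hypothetical scope: a name -> variable map with an optional parent."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self._vars: Dict[str, object] = {}
        self._parent = parent

    def new_var(self, name: str, value: object = None) -> None:
        # Variables are always created in the local scope.
        self._vars[name] = value

    def find_var(self, name: str) -> object:
        # Search the local scope first, then walk up through the ancestors.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope._vars:
                return scope._vars[name]
            scope = scope._parent
        raise KeyError(name)
```

A child scope can read a parameter created in its parent, while the parent never sees the child's temporaries, much like stack frames in a programming language.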
--- ---
# Block (in design) # Block (in design)
## the difference with original RNNOp ## the difference with original RNNOp
- as an operator is more intuitive than `RNNOp`, - As an operator is more intuitive than `RNNOp`,
- offers new interface `Eval(targets)` to deduce the minimal block to `Run`, - Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`,
- fits the compile-time/ runtime separation design. - Fits the compile-time/ runtime separation design paradigm.
- during the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serialize to a `BlockDesc` - During the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serializes them to a `BlockDesc`
- when graph executes, a Block with `BlockDesc` passed in creates `Op` and `Var` then `Run` - When graph executes, a Block with `BlockDesc` is passed. It then creates `Op` and `Var` instances and then invokes `Run`.
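Deducing the minimal block for a set of targets amounts to a backward walk over the operator list that keeps only the ops contributing to the targets. The Python sketch below assumes ops are stored in execution order as `(type, inputs, outputs)` tuples; `prune` is a hypothetical helper, not the actual `Eval` implementation.

```python
from typing import List, Set, Tuple

Op = Tuple[str, List[str], List[str]]  # (type, input names, output names)


def prune(ops: List[Op], targets: List[str]) -> List[Op]:
    """Keep only the ops needed to compute the target variables."""
    needed: Set[str] = set(targets)
    kept: List[Op] = []
    # Walk backwards: an op is needed if it produces a needed variable,
    # and in that case its inputs become needed as well.
    for op in reversed(ops):
        _, ins, outs = op
        if needed.intersection(outs):
            kept.append(op)
            needed.update(ins)
    kept.reverse()  # restore execution order
    return kept
```

Ops whose outputs are never reached from the targets are simply skipped, so requesting a forward output does not trigger unrelated branches of the block.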
--- ---
# Milestone # Milestone
- take Paddle/books as the main line, the requirement of the models motivates framework refactoring, - Take Paddle/books as the main line, the requirement of the models motivates framework refactoring,
- model migration - Model migration
- framework development gives **priority support** to model migration, for example, - Framework development gives **priority support** to model migration, for example,
- the MNIST demo needs a Python interface, - the MNIST demo needs a Python interface,
- the RNN models require the framework to support `LoDTensor`. - the RNN models require the framework to support `LoDTensor`.
- determine some timelines, - Determine some timelines,
- heavily-relied Ops need to be migrated first, - Frequently used Ops need to be migrated first,
- different models can be migrated parallelly. - Different models can be migrated in parallel.
- improve the framework at the same time - Improve the framework at the same time
- accept imperfection, concentrated on solving the specific problem at the right price. - Accept imperfection, concentrate on solving the specific problem at the right price.
--- ---
# Control the migration quality # Control the migration quality
- compare the performance of migrated models with old ones. - Compare the performance of migrated models with old ones.
- follow google C style - Follow the Google C++ style
- build the automatic workflow of generating Python/C++ documentations - Build the automatic workflow of generating Python/C++ documentations.
- the documentation of layers and ops should be written inside the code - The documentation of layers and ops should be written inside the code.
- take the documentation quality into account when doing PR - Take the documentation quality into account when submitting pull requests.
- preview the documentations, read and improve them from users' perspective - Preview the documentations, read and improve them from a user's perspective.
...@@ -179,72 +179,73 @@ ...@@ -179,72 +179,73 @@
<div class="section" id="design-doc-refactorization-overview"> <div class="section" id="design-doc-refactorization-overview">
<span id="design-doc-refactorization-overview"></span><h1>Design Doc: Refactorization Overview<a class="headerlink" href="#design-doc-refactorization-overview" title="Permalink to this headline"></a></h1> <span id="design-doc-refactorization-overview"></span><h1>Design Doc: Refactorization Overview<a class="headerlink" href="#design-doc-refactorization-overview" title="Permalink to this headline"></a></h1>
<p>The goal of refactorizaiton include:</p> <p>The goals of refactoring include:</p>
<ol class="simple"> <ol class="simple">
<li>Make it easy for external contributors to write new elementory computaiton operations.</li> <li>Making it easy for external contributors to write new elementary computation operations.</li>
<li>Make the codebase clean and readable.</li> <li>Making the codebase clean and readable.</li>
<li>Introduce a new design of computation representation &#8211; a computation graph of operators and variables.</li> <li>Designing a new computation representation &#8211; a computation graph of operators and variables.</li>
<li>The graph representation helps implementing auto-scalable and auto fault recoverable distributed computing.</li> <li>Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs.</li>
</ol> </ol>
<div class="section" id="computation-graphs"> <div class="section" id="computation-graphs">
<span id="computation-graphs"></span><h2>Computation Graphs<a class="headerlink" href="#computation-graphs" title="Permalink to this headline"></a></h2> <span id="computation-graphs"></span><h2>Computation Graphs<a class="headerlink" href="#computation-graphs" title="Permalink to this headline"></a></h2>
<ol class="simple"> <ol class="simple">
<li>PaddlePaddle represent the computation, training and inference of DL models, by computation graphs.</li> <li>PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs.</li>
<li>Please dig into <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md">computation graphs</a> for a solid example.</li> <li>Please refer to <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md">computation graphs</a> for a concrete example.</li>
<li>Users write Python programs to describe the graphs and run it (locally or remotely).</li> <li>Users write Python programs to describe the graphs and run them (locally or remotely).</li>
<li>A graph is composed of <em>variables</em> and <em>operators</em>.</li> <li>A graph is composed of <em>variables</em> and <em>operators</em>.</li>
<li>The description of graphs must be able to be serialized/deserialized, so it<ol> <li>The description of graphs must be capable of being serialized/deserialized, so that<ol>
<li>could to be sent to the cloud for distributed execution, and</li> <li>It can be sent to the cloud for distributed execution, and</li>
<li>be sent to clients for mobile or enterprise deployment.</li> <li>It can be sent to clients for mobile or enterprise deployment.</li>
</ol> </ol>
</li> </li>
<li>The Python program do<ol> <li>The Python program does the following steps<ol>
<li><em>compilation</em>: runs a Python program to generate a protobuf message representation of the graph and send it to<ol> <li><em>compilation</em>: run a Python program to generate a protobuf message representation of the graph and send it to<ol>
<li>the C++ library <code class="docutils literal"><span class="pre">libpaddle.so</span></code> for local execution,</li> <li>the C++ library <code class="docutils literal"><span class="pre">libpaddle.so</span></code> for local execution,</li>
<li>the master process of a distributed training job for training, or</li> <li>the master process of a distributed training job for training, or</li>
<li>the server process of a Kubernetes serving job for distributed serving.</li> <li>the server process of a Kubernetes serving job for distributed serving.</li>
</ol> </ol>
</li> </li>
<li><em>execution</em>: according to the protobuf message, constructs instances of class <code class="docutils literal"><span class="pre">Variable</span></code> and <code class="docutils literal"><span class="pre">OperatorBase</span></code>, and run them.</li> <li><em>execution</em>: execute the graph by constructing instances of class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24"><code class="docutils literal"><span class="pre">Variable</span></code></a> and <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70"><code class="docutils literal"><span class="pre">OperatorBase</span></code></a>, according to the protobuf message.</li>
</ol> </ol>
</li> </li>
</ol> </ol>
</div> </div>
<div class="section" id="description-and-realization"> <div class="section" id="description-and-realization-of-computation-graph">
<span id="description-and-realization"></span><h2>Description and Realization<a class="headerlink" href="#description-and-realization" title="Permalink to this headline"></a></h2> <span id="description-and-realization-of-computation-graph"></span><h2>Description and Realization of Computation Graph<a class="headerlink" href="#description-and-realization-of-computation-graph" title="Permalink to this headline"></a></h2>
<p>At compile time, the Python program generates protobuf message representation of the graph, or the description of the graph.</p> <p>At compile time, the Python program generates a protobuf message representation of the graph, or the description of the graph.</p>
<p>At runtime, the C++ program realizes the graph and run it.</p> <p>At runtime, the C++ program realizes the graph and runs it.</p>
<p>| | Representation (protobuf messages) | Realization (C++ class objects) | <p>| | Representation (protobuf messages) | Realization (C++ class objects) |
|&#8212;|&#8212;|&#8212;| |&#8212;|&#8212;|&#8212;|
|Data|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L107">VarDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24">Variable</a>| |Data|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L107">VarDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24">Variable</a>|
|Operation|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35">OpDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64">Operator</a>| |Operation|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35">OpDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64">Operator</a>|
|Block|BlockDesc|Block|</p> |Block|BlockDesc|Block|</p>
<p>The word <em>graph</em> is exchangable with <em>block</em> in this document. A graph represent computation steps and local variables as a C++/Java program block, or a pair of { and }.</p> <p>The word <em>graph</em> is interchangeable with <em>block</em> in this document. A graph represents computation steps and local variables, similar to a C++/Java program block, or a pair of curly braces (<code class="docutils literal"><span class="pre">{</span></code> and <code class="docutils literal"><span class="pre">}</span></code>).</p>
</div> </div>
<div class="section" id="compilation-and-execution"> <div class="section" id="compilation-and-execution">
<span id="compilation-and-execution"></span><h2>Compilation and Execution<a class="headerlink" href="#compilation-and-execution" title="Permalink to this headline"></a></h2> <span id="compilation-and-execution"></span><h2>Compilation and Execution<a class="headerlink" href="#compilation-and-execution" title="Permalink to this headline"></a></h2>
<ol class="simple"> <ol class="simple">
<li>Run an applicaton Python program to describe the graph. In particular,<ol> <li>Run an application Python program to describe the graph. In particular, the Python application program does the following:<ol>
<li>create VarDesc to represent local/intermediate variables,</li> <li>Create <code class="docutils literal"><span class="pre">VarDesc</span></code> to represent local/intermediate variables,</li>
<li>create operators and set attributes,</li> <li>Create operators and set attributes,</li>
<li>validate attribute values,</li> <li>Validate attribute values,</li>
<li>inference the type and the shape of variables,</li> <li>Infer the type and the shape of variables,</li>
<li>plan for memory-reuse for variables,</li> <li>Plan memory-reuse for variables,</li>
<li>generate backward and optimization part of the Graph.</li> <li>Generate the backward graph</li>
<li>possiblly split the graph for distributed training.</li> <li>Optimize the computation graph.</li>
<li>Potentially, split the graph for distributed training.</li>
</ol> </ol>
</li> </li>
<li>The invocation of <code class="docutils literal"><span class="pre">train</span></code> or <code class="docutils literal"><span class="pre">infer</span></code> in the application Python program:<ol> <li>The invocation of <code class="docutils literal"><span class="pre">train</span></code> or <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108"><code class="docutils literal"><span class="pre">infer</span></code></a> methods in the application Python program does the following:<ol>
<li>create a new Scope instance in the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md">scope hierarchy</a> for each run of a block,<ol> <li>Create a new Scope instance in the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md">scope hierarchy</a> for each run of a block,<ol>
<li>realize local variables defined in the BlockDesc message in the new scope,</li> <li>realize local variables defined in the BlockDesc message in the new scope,</li>
<li>a scope is similar to the stack frame in programming languages,</li> <li>a scope is similar to the stack frame in programming languages,</li>
</ol> </ol>
</li> </li>
<li>create an instance of class <code class="docutils literal"><span class="pre">Block</span></code>, in which,<ol> <li>Create an instance of class <code class="docutils literal"><span class="pre">Block</span></code>, in which,<ol>
<li>realize operators in the BlockDesc message,</li> <li>realize operators in the BlockDesc message,</li>
</ol> </ol>
</li> </li>
<li>run the Block by calling<ol> <li>Run the Block by calling<ol>
<li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Variable&gt;*</span> <span class="pre">targets)</span></code> for forward and backward computations, or</li> <li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Variable&gt;*</span> <span class="pre">targets)</span></code> for forward and backward computations, or</li>
<li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Operator&gt;*</span> <span class="pre">targets)</span></code> for optimization.</li> <li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Operator&gt;*</span> <span class="pre">targets)</span></code> for optimization.</li>
</ol> </ol>
...@@ -258,17 +259,17 @@ ...@@ -258,17 +259,17 @@
<div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Runtime <div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Runtime
</pre></div> </pre></div>
</div> </div>
<div class="section" id="benefit"> <div class="section" id="benefits-of-ir">
<span id="benefit"></span><h3>Benefit<a class="headerlink" href="#benefit" title="Permalink to this headline"></a></h3> <span id="benefits-of-ir"></span><h3>Benefits of IR<a class="headerlink" href="#benefits-of-ir" title="Permalink to this headline"></a></h3>
<ul> <ul>
<li><p class="first">Optimization</p> <li><p class="first">Optimization</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Optimized IR -&gt; Runtime <div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Optimized IR -&gt; Runtime
</pre></div> </pre></div>
</div> </div>
</li> </li>
<li><p class="first">Send automatically partitioned IR to different nodes.</p> <li><p class="first">Automatically send partitioned IR to different nodes.</p>
<ul> <ul>
<li><p class="first">Automatic data parallel</p> <li><p class="first">Automatic Data Parallelism</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time <div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time
|-&gt; Single GPU IR |-&gt; Single GPU IR
|-&gt; [trainer-IR-0, trainer-IR-1, pserver-IR] |-&gt; [trainer-IR-0, trainer-IR-1, pserver-IR]
...@@ -278,7 +279,7 @@ ...@@ -278,7 +279,7 @@
</pre></div> </pre></div>
</div> </div>
</li> </li>
<li><p class="first">Automatic model parallel (planned for future)</p> <li><p class="first">Automatic Model Parallelism (planned for future)</p>
</li> </li>
</ul> </ul>
</li> </li>
...@@ -296,10 +297,10 @@ ...@@ -296,10 +297,10 @@
<span id="operator"></span><h1>Operator<a class="headerlink" href="#operator" title="Permalink to this headline"></a></h1> <span id="operator"></span><h1>Operator<a class="headerlink" href="#operator" title="Permalink to this headline"></a></h1>
<p><img alt="class_diagram" src="http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot" /></p> <p><img alt="class_diagram" src="http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot" /></p>
<ul class="simple"> <ul class="simple">
<li><code class="docutils literal"><span class="pre">Operator</span></code> is the fundamental building block as the user interface.<ul> <li><code class="docutils literal"><span class="pre">Operator</span></code> is the fundamental building block of the user interface.<ul>
<li>Operator stores input/output variable name, and attributes.</li> <li>Operator stores input/output variable names, and attributes.</li>
<li>The <code class="docutils literal"><span class="pre">InferShape</span></code> interface is used to infer output variable shapes by its input shapes.</li> <li>The <code class="docutils literal"><span class="pre">InferShape</span></code> interface is used to infer the shapes of the output variables based on the shapes of the input variables.</li>
<li>Use <code class="docutils literal"><span class="pre">Run</span></code> to compute <code class="docutils literal"><span class="pre">input</span> <span class="pre">variables</span></code> to <code class="docutils literal"><span class="pre">output</span> <span class="pre">variables</span></code>.</li> <li>Use <code class="docutils literal"><span class="pre">Run</span></code> to compute the <code class="docutils literal"><span class="pre">output</span></code> variables from the <code class="docutils literal"><span class="pre">input</span></code> variables.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
...@@ -322,11 +323,11 @@ ...@@ -322,11 +323,11 @@
<span id="why-separate-kernel-and-operator"></span><h1>Why separate Kernel and Operator<a class="headerlink" href="#why-separate-kernel-and-operator" title="Permalink to this headline"></a></h1> <span id="why-separate-kernel-and-operator"></span><h1>Why separate Kernel and Operator<a class="headerlink" href="#why-separate-kernel-and-operator" title="Permalink to this headline"></a></h1>
<ul class="simple"> <ul class="simple">
<li>Separate GPU and CPU code.<ul> <li>Separate GPU and CPU code.<ul>
<li>Make Paddle can run without GPU.</li> <li>Make Paddle capable of running without GPU.</li>
</ul> </ul>
</li> </li>
<li>Make one operator (which is user interface) can contain many implementations.<ul> <li>Allow one operator (which is a user interface) to have many implementations.<ul>
<li>Same mul op, different FP16, FP32 Kernel. different MKL, eigen kernel.</li> <li>For example, the same multiplication op can have different kernel implementations, such as FP16, FP32, MKL, and Eigen kernels.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
...@@ -337,30 +338,30 @@ ...@@ -337,30 +338,30 @@
<ul class="simple"> <ul class="simple">
<li><code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> contains basic math and element-wise functions.<ul> <li><code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> contains basic math and element-wise functions.<ul>
<li>Note that <code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> has broadcast implementation.</li> <li>Note that <code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> has broadcast implementation.</li>
<li>Limit number of <code class="docutils literal"><span class="pre">tensor.device(dev)</span> <span class="pre">=</span></code> in your code.</li> <li>Limit the number of <code class="docutils literal"><span class="pre">tensor.device(dev)</span> <span class="pre">=</span></code> in your code.</li>
</ul> </ul>
</li> </li>
<li><code class="docutils literal"><span class="pre">thrust::transform</span></code> and <code class="docutils literal"><span class="pre">std::transform</span></code>.<ul> <li><code class="docutils literal"><span class="pre">thrust::transform</span></code> and <code class="docutils literal"><span class="pre">std::transform</span></code>.<ul>
<li><code class="docutils literal"><span class="pre">thrust</span></code> has the same API as C++ standard library. Using <code class="docutils literal"><span class="pre">transform</span></code> can quickly implement a customized elementwise kernel.</li> <li><code class="docutils literal"><span class="pre">thrust</span></code> has the same API as C++ standard library. Using <code class="docutils literal"><span class="pre">transform</span></code>, one can quickly implement customized elementwise kernels.</li>
<li><code class="docutils literal"><span class="pre">thrust</span></code> has more complex API, like <code class="docutils literal"><span class="pre">scan</span></code>, <code class="docutils literal"><span class="pre">reduce</span></code>, <code class="docutils literal"><span class="pre">reduce_by_key</span></code>.</li> <li><code class="docutils literal"><span class="pre">thrust</span></code> also has more complex APIs, like <code class="docutils literal"><span class="pre">scan</span></code>, <code class="docutils literal"><span class="pre">reduce</span></code>, <code class="docutils literal"><span class="pre">reduce_by_key</span></code>.</li>
</ul> </ul>
</li> </li>
<li>Hand-writing <code class="docutils literal"><span class="pre">GPUKernel</span></code> and <code class="docutils literal"><span class="pre">CPU</span></code> code<ul> <li>Hand-writing <code class="docutils literal"><span class="pre">GPUKernel</span></code> and <code class="docutils literal"><span class="pre">CPU</span></code> code<ul>
<li>Do not write <code class="docutils literal"><span class="pre">.h</span></code>. CPU Kernel should be in <code class="docutils literal"><span class="pre">.cc</span></code>. GPU kernel should be in <code class="docutils literal"><span class="pre">.cu</span></code>. (<code class="docutils literal"><span class="pre">GCC</span></code> cannot compile GPU code.)</li> <li>Do not write kernels in header (<code class="docutils literal"><span class="pre">.h</span></code>) files. CPU kernels should be in C++ source (<code class="docutils literal"><span class="pre">.cc</span></code>) files and GPU kernels should be in CUDA (<code class="docutils literal"><span class="pre">.cu</span></code>) files. (GCC cannot compile GPU code.)</li>
</ul> </ul>
</li> </li>
</ul> </ul>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
<div class="section" id="operator-register"> <div class="section" id="operator-registration">
<span id="operator-register"></span><h1>Operator Register<a class="headerlink" href="#operator-register" title="Permalink to this headline"></a></h1> <span id="operator-registration"></span><h1>Operator Registration<a class="headerlink" href="#operator-registration" title="Permalink to this headline"></a></h1>
<div class="section" id="why-register-is-necessary"> <div class="section" id="why-registration-is-necessary">
<span id="why-register-is-necessary"></span><h2>Why register is necessary?<a class="headerlink" href="#why-register-is-necessary" title="Permalink to this headline"></a></h2> <span id="why-registration-is-necessary"></span><h2>Why registration is necessary?<a class="headerlink" href="#why-registration-is-necessary" title="Permalink to this headline"></a></h2>
<p>We need a method to build mappings between Op type names and Op classes.</p> <p>We need a method to build mappings between Op type names and Op classes.</p>
</div> </div>
<div class="section" id="how-to-do-the-register"> <div class="section" id="how-is-registration-implemented">
<span id="how-to-do-the-register"></span><h2>How to do the register?<a class="headerlink" href="#how-to-do-the-register" title="Permalink to this headline"></a></h2> <span id="how-is-registration-implemented"></span><h2>How is registration implemented?<a class="headerlink" href="#how-is-registration-implemented" title="Permalink to this headline"></a></h2>
<p>Maintain a map, whose key is the type name and value is corresponding Op constructor.</p> <p>Maintain a map whose key is the type name and whose value is the corresponding Op constructor.</p>
</div> </div>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
...@@ -393,22 +394,22 @@ ...@@ -393,22 +394,22 @@
</div> </div>
</div> </div>
<div class="section" id="use-macros"> <div class="section" id="use-macros">
<span id="use-macros"></span><h2><code class="docutils literal"><span class="pre">USE</span></code> Macros<a class="headerlink" href="#use-macros" title="Permalink to this headline"></a></h2> <span id="use-macros"></span><h2>USE Macros<a class="headerlink" href="#use-macros" title="Permalink to this headline"></a></h2>
<p>make sure the registration process is executed and linked.</p> <p>Make sure the registration process is executed and linked.</p>
</div> </div>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
<div class="section" id="register-process"> <div class="section" id="registration-process">
<span id="register-process"></span><h1>Register Process<a class="headerlink" href="#register-process" title="Permalink to this headline"></a></h1> <span id="registration-process"></span><h1>Registration Process<a class="headerlink" href="#registration-process" title="Permalink to this headline"></a></h1>
<ol class="simple"> <ol class="simple">
<li>Write Op class, as well as its gradient Op class if there is.</li> <li>Write an Op class and its gradient Op class, if required.</li>
<li>Write Op maker class. In the constructor, describe its inputs, outputs, and attributes.</li> <li>Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.</li>
<li>Invoke macro <code class="docutils literal"><span class="pre">REGISTER_OP</span></code>. The macro will<ol> <li>Invoke the macro <code class="docutils literal"><span class="pre">REGISTER_OP</span></code>. This macro will<ol>
<li>call maker class to complete <code class="docutils literal"><span class="pre">proto</span></code> and <code class="docutils literal"><span class="pre">checker</span></code></li> <li>Call maker class to complete the <code class="docutils literal"><span class="pre">proto</span></code> and the <code class="docutils literal"><span class="pre">checker</span></code></li>
<li>with the completed <code class="docutils literal"><span class="pre">proto</span></code> and <code class="docutils literal"><span class="pre">checker</span></code>, build a new key-value pair in the <code class="docutils literal"><span class="pre">OpInfoMap</span></code></li> <li>Using the completed <code class="docutils literal"><span class="pre">proto</span></code> and <code class="docutils literal"><span class="pre">checker</span></code>, it will add a new key-value pair to the <code class="docutils literal"><span class="pre">OpInfoMap</span></code></li>
</ol> </ol>
</li> </li>
<li>Invoke <code class="docutils literal"><span class="pre">USE</span></code> macro in where the Op is used to make sure it is linked.</li> <li>Invoke the <code class="docutils literal"><span class="pre">USE</span></code> macro wherever the Op is used, to make sure that it is linked.</li>
</ol> </ol>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
...@@ -417,7 +418,7 @@ ...@@ -417,7 +418,7 @@
<div class="section" id="create-backward-operator"> <div class="section" id="create-backward-operator">
<span id="create-backward-operator"></span><h2>Create Backward Operator<a class="headerlink" href="#create-backward-operator" title="Permalink to this headline"></a></h2> <span id="create-backward-operator"></span><h2>Create Backward Operator<a class="headerlink" href="#create-backward-operator" title="Permalink to this headline"></a></h2>
<ul class="simple"> <ul class="simple">
<li>Mapping from forwarding Op to backward Op <li>Mapping from forward Op to backward Op
<img alt="backward" src="https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png" /></li> <img alt="backward" src="https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png" /></li>
</ul> </ul>
</div> </div>
...@@ -428,12 +429,12 @@ ...@@ -428,12 +429,12 @@
<div class="section" id="build-backward-network"> <div class="section" id="build-backward-network">
<span id="build-backward-network"></span><h2>Build Backward Network<a class="headerlink" href="#build-backward-network" title="Permalink to this headline"></a></h2> <span id="build-backward-network"></span><h2>Build Backward Network<a class="headerlink" href="#build-backward-network" title="Permalink to this headline"></a></h2>
<ul class="simple"> <ul class="simple">
<li><strong>Input</strong> graph of forwarding operators</li> <li><strong>Input</strong>: graph of forward operators</li>
<li><strong>Output</strong> graph of backward operators</li> <li><strong>Output</strong>: graph of backward operators</li>
<li><strong>corner case in construction</strong><ul> <li><strong>Corner cases in construction</strong><ul>
<li>shared variable =&gt; insert <code class="docutils literal"><span class="pre">Add</span></code> operator</li> <li>Shared Variables =&gt; insert an <code class="docutils literal"><span class="pre">Add</span></code> operator to combine gradients</li>
<li>no gradient =&gt; insert <code class="docutils literal"><span class="pre">fill_zero_grad</span></code> operator</li> <li>No Gradient =&gt; insert a <code class="docutils literal"><span class="pre">fill_zero_grad</span></code> operator</li>
<li>recursive netOp =&gt; call <code class="docutils literal"><span class="pre">Backward</span></code> recursively</li> <li>Recursive NetOp =&gt; call <code class="docutils literal"><span class="pre">Backward</span></code> recursively</li>
<li>RNN Op =&gt; recursively call <code class="docutils literal"><span class="pre">Backward</span></code> on stepnet</li> <li>RNN Op =&gt; recursively call <code class="docutils literal"><span class="pre">Backward</span></code> on stepnet</li>
</ul> </ul>
</li> </li>
...@@ -446,17 +447,17 @@ ...@@ -446,17 +447,17 @@
<ul class="simple"> <ul class="simple">
<li><code class="docutils literal"><span class="pre">Tensor</span></code> is an n-dimension array with type.<ul> <li><code class="docutils literal"><span class="pre">Tensor</span></code> is an n-dimension array with type.<ul>
<li>Only dims and data pointers are stored in <code class="docutils literal"><span class="pre">Tensor</span></code>.</li> <li>Only dims and data pointers are stored in <code class="docutils literal"><span class="pre">Tensor</span></code>.</li>
<li>All operators on <code class="docutils literal"><span class="pre">Tensor</span></code> is written in <code class="docutils literal"><span class="pre">Operator</span></code> or global functions.</li> <li>All operations on <code class="docutils literal"><span class="pre">Tensor</span></code> are written in <code class="docutils literal"><span class="pre">Operator</span></code> or global functions.</li>
<li>variable length Tensor design <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a></li> <li>Variable length Tensor design <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a></li>
</ul> </ul>
</li> </li>
<li><code class="docutils literal"><span class="pre">Variable</span></code> is the inputs and outputs of an operator. Not just <code class="docutils literal"><span class="pre">Tensor</span></code>.<ul> <li><code class="docutils literal"><span class="pre">Variable</span></code> instances are the inputs and the outputs of an operator. Not just <code class="docutils literal"><span class="pre">Tensor</span></code>.<ul>
<li>step_scopes in RNN is a variable and not a tensor.</li> <li><code class="docutils literal"><span class="pre">step_scopes</span></code> in RNN is a variable and not a tensor.</li>
</ul> </ul>
</li> </li>
<li><code class="docutils literal"><span class="pre">Scope</span></code> is where variables store at.<ul> <li><code class="docutils literal"><span class="pre">Scope</span></code> is where variables are stored.<ul>
<li>map&lt;string/*var name */, Variable&gt;</li> <li>map&lt;string <code class="docutils literal"><span class="pre">variable_name</span></code>, Variable&gt;</li>
<li><code class="docutils literal"><span class="pre">Scope</span></code> has a hierarchical structure. The local scope can get variable from its parent scope.</li> <li><code class="docutils literal"><span class="pre">Scope</span></code> has a hierarchical structure. The local scope can get variables from its parent scope.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
...@@ -467,11 +468,11 @@ ...@@ -467,11 +468,11 @@
<div class="section" id="the-difference-with-original-rnnop"> <div class="section" id="the-difference-with-original-rnnop">
<span id="the-difference-with-original-rnnop"></span><h2>the difference with original RNNOp<a class="headerlink" href="#the-difference-with-original-rnnop" title="Permalink to this headline"></a></h2> <span id="the-difference-with-original-rnnop"></span><h2>the difference with original RNNOp<a class="headerlink" href="#the-difference-with-original-rnnop" title="Permalink to this headline"></a></h2>
<ul class="simple"> <ul class="simple">
<li>as an operator is more intuitive than <code class="docutils literal"><span class="pre">RNNOp</span></code>,</li> <li>As an operator is more intuitive than <code class="docutils literal"><span class="pre">RNNOp</span></code>,</li>
<li>offers new interface <code class="docutils literal"><span class="pre">Eval(targets)</span></code> to deduce the minimal block to <code class="docutils literal"><span class="pre">Run</span></code>,</li> <li>Offers a new interface <code class="docutils literal"><span class="pre">Eval(targets)</span></code> to deduce the minimal block to <code class="docutils literal"><span class="pre">Run</span></code>,</li>
<li>fits the compile-time/ runtime separation design.<ul> <li>Fits the compile-time/ runtime separation design paradigm.<ul>
<li>during the compilation, <code class="docutils literal"><span class="pre">SymbolTable</span></code> stores <code class="docutils literal"><span class="pre">VarDesc</span></code>s and <code class="docutils literal"><span class="pre">OpDesc</span></code>s and serialize to a <code class="docutils literal"><span class="pre">BlockDesc</span></code></li> <li>During compilation, <code class="docutils literal"><span class="pre">SymbolTable</span></code> stores <code class="docutils literal"><span class="pre">VarDesc</span></code>s and <code class="docutils literal"><span class="pre">OpDesc</span></code>s and serializes them to a <code class="docutils literal"><span class="pre">BlockDesc</span></code></li>
<li>when graph executes, a Block with <code class="docutils literal"><span class="pre">BlockDesc</span></code> passed in creates <code class="docutils literal"><span class="pre">Op</span></code> and <code class="docutils literal"><span class="pre">Var</span></code> then <code class="docutils literal"><span class="pre">Run</span></code></li> <li>When the graph executes, a Block is constructed from the <code class="docutils literal"><span class="pre">BlockDesc</span></code> passed in. It then creates <code class="docutils literal"><span class="pre">Op</span></code> and <code class="docutils literal"><span class="pre">Var</span></code> instances and invokes <code class="docutils literal"><span class="pre">Run</span></code>.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
...@@ -481,32 +482,32 @@ ...@@ -481,32 +482,32 @@
<div class="section" id="milestone"> <div class="section" id="milestone">
<span id="milestone"></span><h1>Milestone<a class="headerlink" href="#milestone" title="Permalink to this headline"></a></h1> <span id="milestone"></span><h1>Milestone<a class="headerlink" href="#milestone" title="Permalink to this headline"></a></h1>
<ul class="simple"> <ul class="simple">
<li>take Paddle/books as the main line, the requirement of the models motivates framework refactoring,</li> <li>Take Paddle/books as the main line, the requirement of the models motivates framework refactoring,</li>
<li>model migration<ul> <li>Model migration<ul>
<li>framework development gives <strong>priority support</strong> to model migration, for example,<ul> <li>Framework development gives <strong>priority support</strong> to model migration, for example,<ul>
<li>the MNIST demo needs a Python interface,</li> <li>the MNIST demo needs a Python interface,</li>
<li>the RNN models require the framework to support <code class="docutils literal"><span class="pre">LoDTensor</span></code>.</li> <li>the RNN models require the framework to support <code class="docutils literal"><span class="pre">LoDTensor</span></code>.</li>
</ul> </ul>
</li> </li>
<li>determine some timelines,</li> <li>Determine some timelines,</li>
<li>heavily-relied Ops need to be migrated first,</li> <li>Frequently used Ops need to be migrated first,</li>
<li>different models can be migrated parallelly.</li> <li>Different models can be migrated in parallel.</li>
</ul> </ul>
</li> </li>
<li>improve the framework at the same time</li> <li>Improve the framework at the same time</li>
<li>accept imperfection, concentrated on solving the specific problem at the right price.</li> <li>Accept imperfection, concentrate on solving the specific problem at the right price.</li>
</ul> </ul>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
<div class="section" id="control-the-migration-quality"> <div class="section" id="control-the-migration-quality">
<span id="control-the-migration-quality"></span><h1>Control the migration quality<a class="headerlink" href="#control-the-migration-quality" title="Permalink to this headline"></a></h1> <span id="control-the-migration-quality"></span><h1>Control the migration quality<a class="headerlink" href="#control-the-migration-quality" title="Permalink to this headline"></a></h1>
<ul class="simple"> <ul class="simple">
<li>compare the performance of migrated models with old ones.</li> <li>Compare the performance of migrated models with old ones.</li>
<li>follow google C style</li> <li>Follow the Google C++ style guide.</li>
<li>build the automatic workflow of generating Python/C++ documentations<ul> <li>Build an automatic workflow for generating Python/C++ documentation.<ul>
<li>the documentation of layers and ops should be written inside the code</li> <li>The documentation of layers and ops should be written inside the code.</li>
<li>take the documentation quality into account when doing PR</li> <li>Take the documentation quality into account when submitting pull requests.</li>
<li>preview the documentations, read and improve them from users&#8217; perspective</li> <li>Preview the documentation, then read and improve it from a user&#8217;s perspective.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
......
The source diff is too large to display. You can view the blob instead.
# Design Doc: Refactorization Overview # Design Doc: Refactorization Overview
The goal of refactorizaiton include: The goals of refactoring include:
1. Make it easy for external contributors to write new elementory computaiton operations. 1. Making it easy for external contributors to write new elementary computation operations.
1. Make the codebase clean and readable. 1. Making the codebase clean and readable.
1. Introduce a new design of computation representation -- a computation graph of operators and variables. 1. Designing a new computation representation -- a computation graph of operators and variables.
1. The graph representation helps implementing auto-scalable and auto fault recoverable distributed computing. 1. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs.
## Computation Graphs ## Computation Graphs
1. PaddlePaddle represent the computation, training and inference of DL models, by computation graphs. 1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs.
1. Please dig into [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a solid example. 1. Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example.
1. Users write Python programs to describe the graphs and run it (locally or remotely). 1. Users write Python programs to describe the graphs and run them (locally or remotely).
1. A graph is composed of *variables* and *operators*. 1. A graph is composed of *variables* and *operators*.
1. The description of graphs must be able to be serialized/deserialized, so it 1. The description of graphs must be capable of being serialized/deserialized, so that
1. could to be sent to the cloud for distributed execution, and 1. It can be sent to the cloud for distributed execution, and
1. be sent to clients for mobile or enterprise deployment. 1. It can be sent to clients for mobile or enterprise deployment.
1. The Python program do 1. The Python program does the following steps
1. *compilation*: runs a Python program to generate a protobuf message representation of the graph and send it to 1. *compilation*: run a Python program to generate a protobuf message representation of the graph and send it to
1. the C++ library `libpaddle.so` for local execution, 1. the C++ library `libpaddle.so` for local execution,
1. the master process of a distributed training job for training, or 1. the master process of a distributed training job for training, or
1. the server process of a Kubernetes serving job for distributed serving. 1. the server process of a Kubernetes serving job for distributed serving.
1. *execution*: according to the protobuf message, constructs instances of class `Variable` and `OperatorBase`, and run them. 1. *execution*: execute the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message.
## Description and Realization ## Description and Realization of Computation Graph
At compile time, the Python program generates protobuf message representation of the graph, or the description of the graph. At compile time, the Python program generates a protobuf message representation of the graph, or the description of the graph.
At runtime, the C++ program realizes the graph and run it. At runtime, the C++ program realizes the graph and runs it.
| | Representation (protobuf messages) | Realization (C++ class objects) | | | Representation (protobuf messages) | Realization (C++ class objects) |
|---|---|---| |---|---|---|
...@@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it. ...@@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it.
|Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)| |Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)|
|Block|BlockDesc|Block| |Block|BlockDesc|Block|
The word *graph* is exchangable with *block* in this document. A graph represent computation steps and local variables as a C++/Java program block, or a pair of { and }. The word *graph* is interchangeable with *block* in this document. A graph represents computation steps and local variables similar to a C++/Java program block, or a pair of curly braces (`{` and `}`).
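
The split between description and realization can be illustrated with a small sketch. It is not PaddlePaddle code: the dataclasses below only borrow the names `VarDesc`, `OpDesc`, and `BlockDesc` from the table above, their fields are hypothetical, and JSON stands in for the protobuf wire format so the example runs with the standard library alone.

```python
import json
from dataclasses import asdict, dataclass, field


# Hypothetical, simplified descriptions; the real messages are defined in
# framework.proto and carry more fields than shown here.
@dataclass
class VarDesc:
    name: str
    shape: list


@dataclass
class OpDesc:
    type: str
    inputs: list
    outputs: list


@dataclass
class BlockDesc:
    vars: list = field(default_factory=list)
    ops: list = field(default_factory=list)


def serialize(block):
    # JSON is an assumption of this sketch; the document specifies protobuf.
    return json.dumps(asdict(block))


def deserialize(payload):
    raw = json.loads(payload)
    return BlockDesc(
        vars=[VarDesc(**v) for v in raw["vars"]],
        ops=[OpDesc(**o) for o in raw["ops"]],
    )


# "Compile time": describe a block with two variables and one operator.
desc = BlockDesc(
    vars=[VarDesc("x", [2, 3]), VarDesc("y", [2, 3])],
    ops=[OpDesc("relu", ["x"], ["y"])],
)

# The description survives a round trip, so it could be shipped elsewhere
# and realized there.
restored = deserialize(serialize(desc))
```

Because the description is plain data, the process that realizes it does not need the Python program that produced it.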
## Compilation and Execution ## Compilation and Execution
1. Run an applicaton Python program to describe the graph. In particular, 1. Run an application Python program to describe the graph. In particular, the Python application program does the following:
1. create VarDesc to represent local/intermediate variables, 1. Create `VarDesc` to represent local/intermediate variables,
1. create operators and set attributes, 1. Create operators and set attributes,
1. validate attribute values, 1. Validate attribute values,
1. inference the type and the shape of variables, 1. Infer the type and the shape of variables,
1. plan for memory-reuse for variables, 1. Plan memory-reuse for variables,
1. generate backward and optimization part of the Graph. 1. Generate the backward graph
1. possiblly split the graph for distributed training. 1. Optimize the computation graph.
1. Potentially, split the graph for distributed training.
1. The invocation of `train` or `infer` in the application Python program: 1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the application Python program does the following:
1. create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block, 1. Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block,
1. realize local variables defined in the BlockDesc message in the new scope, 1. realize local variables defined in the BlockDesc message in the new scope,
1. a scope is similar to the stack frame in programming languages, 1. a scope is similar to the stack frame in programming languages,
1. create an instance of class `Block`, in which, 1. Create an instance of class `Block`, in which,
1. realize operators in the BlockDesc message, 1. realize operators in the BlockDesc message,
1. run the Block by calling 1. Run the Block by calling
1. `Block::Eval(vector<Variable>* targets)` for forward and backward computations, or 1. `Block::Eval(vector<Variable>* targets)` for forward and backward computations, or
1. `Block::Eval(vector<Operator>* targets)` for optimization. 1. `Block::Eval(vector<Operator>* targets)` for optimization.
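
The runtime steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the C++ implementation: `Scope`, `Block`, `find_var`, and `eval` are hypothetical stand-ins, and operators are reduced to plain functions. The sketch shows a local scope falling back to its parent and an `eval` that runs only the operators needed for the requested targets.

```python
class Scope:
    """Hypothetical scope: a name-to-value map with an optional parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def find_var(self, name):
        # Look in the local scope first, then walk up the hierarchy.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise KeyError(name)


class Block:
    """Hypothetical block: ops are (output, function, inputs) in run order."""

    def __init__(self, ops):
        self.ops = ops

    def eval(self, scope, targets):
        # Walk backwards from the targets to select only the ops they need.
        needed = set(targets)
        selected = []
        for out, fn, ins in reversed(self.ops):
            if out in needed:
                selected.append((out, fn, ins))
                needed.update(ins)
        # Run the selected ops in their original order, writing results
        # into the local scope.
        ran = []
        for out, fn, ins in reversed(selected):
            scope.vars[out] = fn(*(scope.find_var(n) for n in ins))
            ran.append(out)
        return ran


global_scope = Scope()
global_scope.vars["w"] = 3           # e.g. a parameter shared across runs
local = Scope(parent=global_scope)   # a fresh scope for this run of the block
local.vars["x"] = 2

block = Block([
    ("h", lambda x, w: x * w, ["x", "w"]),
    ("unused", lambda h: h + 100, ["h"]),
    ("y", lambda h: h + 1, ["h"]),
])
ran = block.eval(local, ["y"])
```

Here the op producing `unused` is skipped because `y` does not depend on it, and the results stay in the local scope rather than leaking into the parent.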
...@@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document. A graph represen ...@@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document. A graph represen
Compile Time -> IR -> Runtime Compile Time -> IR -> Runtime
``` ```
### Benefit ### Benefits of IR
- Optimization - Optimization
```text ```text
Compile Time -> IR -> Optimized IR -> Runtime Compile Time -> IR -> Optimized IR -> Runtime
``` ```
- Send automatically partitioned IR to different nodes. - Automatically send partitioned IR to different nodes.
- Automatic data parallel - Automatic Data Parallelism
```text ```text
Compile Time Compile Time
|-> Single GPU IR |-> Single GPU IR
...@@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime ...@@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime
|-> Node-1 (runs trainer-IR-1) |-> Node-1 (runs trainer-IR-1)
|-> Node-2 (runs pserver-IR) |-> Node-2 (runs pserver-IR)
``` ```
- Automatic model parallel (planned for future) - Automatic Model Parallelism (planned for future)
--- ---
...@@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime ...@@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime
# Operator # Operator
![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot) ![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot)
* `Operator` is the fundamental building block as the user interface. * `Operator` is the fundamental building block of the user interface.
* Operator stores input/output variable name, and attributes. * Operator stores input/output variable names, and attributes.
* The `InferShape` interface is used to infer output variable shapes by its input shapes. * The `InferShape` interface is used to infer the shapes of the output variables based on the shapes of the input variables.
* Use `Run` to compute `input variables` to `output variables`. * Use `Run` to compute the `output` variables from the `input` variables.
--- ---
...@@ -126,30 +127,30 @@ Compile Time -> IR -> Runtime ...@@ -126,30 +127,30 @@ Compile Time -> IR -> Runtime
# Why separate Kernel and Operator # Why separate Kernel and Operator
* Separate GPU and CPU code. * Separate GPU and CPU code.
* Make Paddle can run without GPU. * Make Paddle capable of running without GPU.
* Make one operator (which is user interface) can contain many implementations. * Allow one operator (which is the user interface) to have many implementations.
* Same mul op, different FP16, FP32 Kernel. different MKL, eigen kernel. * For example, the same multiplication op can have different kernel implementations, such as FP16, FP32, MKL, and Eigen kernels.
--- ---
# Libraries for Kernel development # Libraries for Kernel development
* `Eigen::Tensor` contains basic math and element-wise functions. * `Eigen::Tensor` contains basic math and element-wise functions.
* Note that `Eigen::Tensor` has broadcast implementation. * Note that `Eigen::Tensor` has broadcast implementation.
* Limit number of `tensor.device(dev) = ` in your code. * Limit the number of `tensor.device(dev) = ` in your code.
* `thrust::transform` and `std::transform`. * `thrust::transform` and `std::transform`.
* `thrust` has the same API as C++ standard library. Using `transform` can quickly implement a customized elementwise kernel. * `thrust` has the same API as C++ standard library. Using `transform`, one can quickly implement customized elementwise kernels.
* `thrust` has more complex API, like `scan`, `reduce`, `reduce_by_key`. * `thrust` also has more complex APIs, like `scan`, `reduce`, `reduce_by_key`.
* Hand-writing `GPUKernel` and `CPU` code * Hand-writing `GPUKernel` and `CPU` code
* Do not write `.h`. CPU Kernel should be in `.cc`. GPU kernel should be in `.cu`. (`GCC` cannot compile GPU code.) * Do not write in header (`.h`) files. CPU Kernel should be in cpp source (`.cc`) and GPU kernels should be in cuda (`.cu`) files. (GCC cannot compile GPU code.)
--- ---
# Operator Register # Operator Registration
## Why register is necessary? ## Why registration is necessary?
We need a method to build mappings between Op type names and Op classes. We need a method to build mappings between Op type names and Op classes.
## How to do the register? ## How is registration implemented?
Maintain a map, whose key is the type name and value is corresponding Op constructor. By maintaining a map whose key is the type name and whose value is the corresponding Op constructor.
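
The map described above can be sketched as follows. This is only an illustration under stated assumptions: `OpBase`, `MulOp`, `OpRegistry`, `RegisterOp`, and `CreateOp` are hypothetical names invented for the example and are not the framework's actual classes or its `OpInfoMap`.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an operator base class (not Paddle's real one).
struct OpBase {
  virtual ~OpBase() = default;
  virtual std::string Type() const = 0;
};

// Hypothetical concrete operator used only to exercise the registry.
struct MulOp : OpBase {
  std::string Type() const override { return "mul"; }
};

// The "Op constructor" stored as the map value.
using OpCreator = std::function<std::unique_ptr<OpBase>()>;

// A function-local static sidesteps static-initialization-order issues.
inline std::unordered_map<std::string, OpCreator>& OpRegistry() {
  static std::unordered_map<std::string, OpCreator> registry;
  return registry;
}

// Returns false if the type name was already registered.
inline bool RegisterOp(const std::string& type, OpCreator creator) {
  return OpRegistry().emplace(type, std::move(creator)).second;
}

// Looks up the type name and invokes the stored constructor.
inline std::unique_ptr<OpBase> CreateOp(const std::string& type) {
  auto it = OpRegistry().find(type);
  if (it == OpRegistry().end()) {
    throw std::out_of_range("unregistered op type: " + type);
  }
  return it->second();
}
```

Rejecting duplicate type names at registration time is one possible design choice; it surfaces naming collisions early instead of silently replacing a constructor.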
--- ---
# The Registry Map # The Registry Map
...@@ -177,34 +178,34 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class) ...@@ -177,34 +178,34 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class)
REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class)
``` ```
### `USE` Macros ### USE Macros
make sure the registration process is executed and linked. Make sure the registration process is executed and linked.
--- ---
# Register Process # Registration Process
1. Write Op class, as well as its gradient Op class if there is. 1. Write an Op class and its gradient Op class, if required.
2. Write Op maker class. In the constructor, describe its inputs, outputs, and attributes. 2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.
3. Invoke macro `REGISTER_OP`. The macro will 3. Invoke the macro `REGISTER_OP`. This macro will
1. call maker class to complete `proto` and `checker` 1. Call maker class to complete the `proto` and the `checker`
2. with the completed `proto` and `checker`, build a new key-value pair in the `OpInfoMap` 2. Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap`
4. Invoke `USE` macro in where the Op is used to make sure it is linked. 4. Invoke the `USE` macro wherever the Op is used, to make sure that it is linked.
--- ---
# Backward Module (1/2) # Backward Module (1/2)
### Create Backward Operator ### Create Backward Operator
- Mapping from forwarding Op to backward Op - Mapping from forward Op to backward Op
![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png) ![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png)
--- ---
# Backward Module (2/2) # Backward Module (2/2)
### Build Backward Network ### Build Backward Network
- **Input** graph of forwarding operators - **Input**: graph of forward operators
- **Output** graph of backward operators - **Output**: graph of backward operators
- **corner case in construction** - **Corner cases in construction**
- shared variable => insert `Add` operator - Shared Variables => insert an `Add` operator to combine gradients
- no gradient => insert `fill_zero_grad` operator - No Gradient => insert a `fill_zero_grad` operator
- recursive netOp => call `Backward` recursively - Recursive NetOp => call `Backward` recursively
- RNN Op => recursively call `Backward` on stepnet - RNN Op => recursively call `Backward` on stepnet
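
The shared-variable case can be illustrated with a small sketch: when one variable feeds several operators, each consumer produces its own gradient contribution, and the contributions have to be summed, which is the role of the inserted `Add` operator. The function `AccumulateGrads` below is a hypothetical helper written for this illustration and is not part of the framework.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Each pair is (variable name, gradient contribution from one consumer op).
// Contributions that target the same variable are summed, mirroring what an
// inserted Add operator does in the backward graph.
inline std::map<std::string, double> AccumulateGrads(
    const std::vector<std::pair<std::string, double>>& contributions) {
  std::map<std::string, double> total;
  for (const auto& c : contributions) {
    total[c.first] += c.second;  // operator[] value-initializes to 0.0
  }
  return total;
}
```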
...@@ -213,41 +214,41 @@ make sure the registration process is executed and linked. ...@@ -213,41 +214,41 @@ make sure the registration process is executed and linked.
* `Tensor` is an n-dimension array with type. * `Tensor` is an n-dimension array with type.
* Only dims and data pointers are stored in `Tensor`. * Only dims and data pointers are stored in `Tensor`.
* All operators on `Tensor` is written in `Operator` or global functions. * All operations on `Tensor` are written in `Operator` or global functions.
* variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) * Variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md)
* `Variable` is the inputs and outputs of an operator. Not just `Tensor`. * `Variable` instances are the inputs and the outputs of an operator. Not just `Tensor`.
* step_scopes in RNN is a variable and not a tensor. * `step_scopes` in RNN is a variable and not a tensor.
* `Scope` is where variables store at. * `Scope` is where variables are stored.
* map<string/*var name */, Variable> * map<string `variable_name`, Variable>
* `Scope` has a hierarchical structure. The local scope can get variable from its parent scope. * `Scope` has a hierarchical structure. The local scope can get variables from its parent scope.
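
The parent lookup described above can be sketched as follows. This is a minimal illustration, assuming a placeholder `Variable` type and hypothetical method names `NewVar` and `FindVar`; it is not the framework's actual `Scope` class.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Placeholder payload; the real Variable can hold tensors and other types.
struct Variable {
  int value = 0;
};

class Scope {
 public:
  explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

  // Creates the variable in this scope if absent and returns it.
  Variable* NewVar(const std::string& name) {
    auto& slot = vars_[name];
    if (!slot) slot.reset(new Variable());
    return slot.get();
  }

  // Searches this scope first, then walks up the parent chain.
  // Returns nullptr when no enclosing scope defines the name.
  Variable* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second.get();
    return parent_ ? parent_->FindVar(name) : nullptr;
  }

 private:
  const Scope* parent_;
  std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
};
```

A local variable with the same name shadows the parent's variable, much like a stack frame shadows an outer block in a programming language.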
--- ---
# Block (in design) # Block (in design)
## the difference with original RNNOp ## the difference with original RNNOp
- as an operator is more intuitive than `RNNOp`, - As an operator is more intuitive than `RNNOp`,
- offers new interface `Eval(targets)` to deduce the minimal block to `Run`, - Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`,
- fits the compile-time/ runtime separation design. - Fits the compile-time/ runtime separation design paradigm.
- during the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serialize to a `BlockDesc` - During the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serializes them to a `BlockDesc`
- when graph executes, a Block with `BlockDesc` passed in creates `Op` and `Var` then `Run` - When the graph executes, a Block is constructed from the `BlockDesc` passed in. It then creates `Op` and `Var` instances and invokes `Run`.
--- ---
# Milestone # Milestone
- take Paddle/books as the main line, the requirement of the models motivates framework refactoring, - Take Paddle/books as the main line, the requirement of the models motivates framework refactoring,
- model migration - Model migration
- framework development gives **priority support** to model migration, for example, - Framework development gives **priority support** to model migration, for example,
- the MNIST demo needs a Python interface, - the MNIST demo needs a Python interface,
- the RNN models require the framework to support `LoDTensor`. - the RNN models require the framework to support `LoDTensor`.
- determine some timelines, - Determine some timelines,
- heavily-relied Ops need to be migrated first, - Frequently used Ops need to be migrated first,
- different models can be migrated parallelly. - Different models can be migrated in parallel.
- improve the framework at the same time - Improve the framework at the same time
- accept imperfection, concentrated on solving the specific problem at the right price. - Accept imperfection, concentrate on solving the specific problem at the right price.
--- ---
# Control the migration quality # Control the migration quality
- compare the performance of migrated models with old ones. - Compare the performance of migrated models with old ones.
- follow google C style - Follow the Google C++ style guide.
- build the automatic workflow of generating Python/C++ documentations - Build the automatic workflow of generating Python/C++ documentation.
- the documentation of layers and ops should be written inside the code - The documentation of layers and ops should be written inside the code.
- take the documentation quality into account when doing PR - Take the documentation quality into account when submitting pull requests.
- preview the documentations, read and improve them from users' perspective - Preview the documentation, then read and improve it from a user's perspective.
...@@ -193,72 +193,73 @@ ...@@ -193,72 +193,73 @@
<div class="section" id="design-doc-refactorization-overview"> <div class="section" id="design-doc-refactorization-overview">
<span id="design-doc-refactorization-overview"></span><h1>Design Doc: Refactorization Overview<a class="headerlink" href="#design-doc-refactorization-overview" title="永久链接至标题"></a></h1> <span id="design-doc-refactorization-overview"></span><h1>Design Doc: Refactorization Overview<a class="headerlink" href="#design-doc-refactorization-overview" title="永久链接至标题"></a></h1>
<p>The goal of refactorizaiton include:</p> <p>The goals of refactoring include:</p>
<ol class="simple"> <ol class="simple">
<li>Make it easy for external contributors to write new elementory computaiton operations.</li> <li>Making it easy for external contributors to write new elementary computation operations.</li>
<li>Make the codebase clean and readable.</li> <li>Making the codebase clean and readable.</li>
<li>Introduce a new design of computation representation &#8211; a computation graph of operators and variables.</li> <li>Designing a new computation representation &#8211; a computation graph of operators and variables.</li>
<li>The graph representation helps implementing auto-scalable and auto fault recoverable distributed computing.</li> <li>Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs.</li>
</ol> </ol>
<div class="section" id="computation-graphs"> <div class="section" id="computation-graphs">
<span id="computation-graphs"></span><h2>Computation Graphs<a class="headerlink" href="#computation-graphs" title="永久链接至标题"></a></h2> <span id="computation-graphs"></span><h2>Computation Graphs<a class="headerlink" href="#computation-graphs" title="永久链接至标题"></a></h2>
<ol class="simple"> <ol class="simple">
<li>PaddlePaddle represent the computation, training and inference of DL models, by computation graphs.</li> <li>PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs.</li>
<li>Please dig into <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md">computation graphs</a> for a solid example.</li> <li>Please refer to <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md">computation graphs</a> for a concrete example.</li>
<li>Users write Python programs to describe the graphs and run it (locally or remotely).</li> <li>Users write Python programs to describe the graphs and run them (locally or remotely).</li>
<li>A graph is composed of <em>variables</em> and <em>operators</em>.</li> <li>A graph is composed of <em>variables</em> and <em>operators</em>.</li>
<li>The description of graphs must be able to be serialized/deserialized, so it<ol> <li>The description of graphs must be capable of being serialized/deserialized, so that<ol>
<li>could to be sent to the cloud for distributed execution, and</li> <li>It can be sent to the cloud for distributed execution, and</li>
<li>be sent to clients for mobile or enterprise deployment.</li> <li>It can be sent to clients for mobile or enterprise deployment.</li>
</ol> </ol>
</li> </li>
<li>The Python program do<ol> <li>The Python program does the following steps<ol>
<li><em>compilation</em>: runs a Python program to generate a protobuf message representation of the graph and send it to<ol> <li><em>compilation</em>: run a Python program to generate a protobuf message representation of the graph and send it to<ol>
<li>the C++ library <code class="docutils literal"><span class="pre">libpaddle.so</span></code> for local execution,</li> <li>the C++ library <code class="docutils literal"><span class="pre">libpaddle.so</span></code> for local execution,</li>
<li>the master process of a distributed training job for training, or</li> <li>the master process of a distributed training job for training, or</li>
<li>the server process of a Kubernetes serving job for distributed serving.</li> <li>the server process of a Kubernetes serving job for distributed serving.</li>
</ol> </ol>
</li> </li>
<li><em>execution</em>: according to the protobuf message, constructs instances of class <code class="docutils literal"><span class="pre">Variable</span></code> and <code class="docutils literal"><span class="pre">OperatorBase</span></code>, and run them.</li> <li><em>execution</em>: execute the graph by constructing instances of class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24"><code class="docutils literal"><span class="pre">Variable</span></code></a> and <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70"><code class="docutils literal"><span class="pre">OperatorBase</span></code></a>, according to the protobuf message.</li>
</ol> </ol>
</li> </li>
</ol> </ol>
</div> </div>
<div class="section" id="description-and-realization"> <div class="section" id="description-and-realization-of-computation-graph">
<span id="description-and-realization"></span><h2>Description and Realization<a class="headerlink" href="#description-and-realization" title="永久链接至标题"></a></h2> <span id="description-and-realization-of-computation-graph"></span><h2>Description and Realization of Computation Graph<a class="headerlink" href="#description-and-realization-of-computation-graph" title="永久链接至标题"></a></h2>
<p>At compile time, the Python program generates protobuf message representation of the graph, or the description of the graph.</p> <p>At compile time, the Python program generates a protobuf message representation of the graph, or the description of the graph.</p>
<p>At runtime, the C++ program realizes the graph and run it.</p> <p>At runtime, the C++ program realizes the graph and runs it.</p>
<p>| | Representation (protobuf messages) | Realization (C++ class objects) | <p>| | Representation (protobuf messages) | Realization (C++ class objects) |
|&#8212;|&#8212;|&#8212;| |&#8212;|&#8212;|&#8212;|
|Data|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L107">VarDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24">Variable</a>| |Data|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L107">VarDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24">Variable</a>|
|Operation|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35">OpDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64">Operator</a>| |Operation|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35">OpDesc</a>|<a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64">Operator</a>|
|Block|BlockDesc|Block|</p> |Block|BlockDesc|Block|</p>
<p>The word <em>graph</em> is exchangable with <em>block</em> in this document. A graph represent computation steps and local variables as a C++/Java program block, or a pair of { and }.</p> <p>The word <em>graph</em> is interchangeable with <em>block</em> in this document. A graph represents computation steps and local variables similar to a C++/Java program block, or a pair of curly braces (<code class="docutils literal"><span class="pre">{</span></code> and <code class="docutils literal"><span class="pre">}</span></code>).</p>
</div> </div>
<div class="section" id="compilation-and-execution"> <div class="section" id="compilation-and-execution">
<span id="compilation-and-execution"></span><h2>Compilation and Execution<a class="headerlink" href="#compilation-and-execution" title="永久链接至标题"></a></h2> <span id="compilation-and-execution"></span><h2>Compilation and Execution<a class="headerlink" href="#compilation-and-execution" title="永久链接至标题"></a></h2>
<ol class="simple"> <ol class="simple">
<li>Run an applicaton Python program to describe the graph. In particular,<ol> <li>Run an application Python program to describe the graph. In particular, the Python application program does the following:<ol>
<li>create VarDesc to represent local/intermediate variables,</li> <li>Create <code class="docutils literal"><span class="pre">VarDesc</span></code> to represent local/intermediate variables,</li>
<li>create operators and set attributes,</li> <li>Create operators and set attributes,</li>
<li>validate attribute values,</li> <li>Validate attribute values,</li>
<li>inference the type and the shape of variables,</li> <li>Infer the type and the shape of variables,</li>
<li>plan for memory-reuse for variables,</li> <li>Plan memory-reuse for variables,</li>
<li>generate backward and optimization part of the Graph.</li> <li>Generate the backward graph</li>
<li>possiblly split the graph for distributed training.</li> <li>Optimize the computation graph.</li>
<li>Potentially, split the graph for distributed training.</li>
</ol> </ol>
</li> </li>
<li>The invocation of <code class="docutils literal"><span class="pre">train</span></code> or <code class="docutils literal"><span class="pre">infer</span></code> in the application Python program:<ol> <li>The invocation of <code class="docutils literal"><span class="pre">train</span></code> or <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108"><code class="docutils literal"><span class="pre">infer</span></code></a> methods in the application Python program does the following:<ol>
<li>create a new Scope instance in the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md">scope hierarchy</a> for each run of a block,<ol> <li>Create a new Scope instance in the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md">scope hierarchy</a> for each run of a block,<ol>
<li>realize local variables defined in the BlockDesc message in the new scope,</li> <li>realize local variables defined in the BlockDesc message in the new scope,</li>
<li>a scope is similar to the stack frame in programming languages,</li> <li>a scope is similar to the stack frame in programming languages,</li>
</ol> </ol>
</li> </li>
<li>create an instance of class <code class="docutils literal"><span class="pre">Block</span></code>, in which,<ol> <li>Create an instance of class <code class="docutils literal"><span class="pre">Block</span></code>, in which,<ol>
<li>realize operators in the BlockDesc message,</li> <li>realize operators in the BlockDesc message,</li>
</ol> </ol>
</li> </li>
<li>run the Block by calling<ol> <li>Run the Block by calling<ol>
<li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Variable&gt;*</span> <span class="pre">targets)</span></code> for forward and backward computations, or</li> <li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Variable&gt;*</span> <span class="pre">targets)</span></code> for forward and backward computations, or</li>
<li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Operator&gt;*</span> <span class="pre">targets)</span></code> for optimization.</li> <li><code class="docutils literal"><span class="pre">Block::Eval(vector&lt;Operator&gt;*</span> <span class="pre">targets)</span></code> for optimization.</li>
</ol> </ol>
...@@ -272,17 +273,17 @@ ...@@ -272,17 +273,17 @@
<div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Runtime <div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Runtime
</pre></div> </pre></div>
</div> </div>
<div class="section" id="benefit"> <div class="section" id="benefits-of-ir">
<span id="benefit"></span><h3>Benefit<a class="headerlink" href="#benefit" title="永久链接至标题"></a></h3> <span id="benefits-of-ir"></span><h3>Benefits of IR<a class="headerlink" href="#benefits-of-ir" title="永久链接至标题"></a></h3>
<ul> <ul>
<li><p class="first">Optimization</p> <li><p class="first">Optimization</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Optimized IR -&gt; Runtime <div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time -&gt; IR -&gt; Optimized IR -&gt; Runtime
</pre></div> </pre></div>
</div> </div>
</li> </li>
<li><p class="first">Send automatically partitioned IR to different nodes.</p> <li><p class="first">Automatically send partitioned IR to different nodes.</p>
<ul> <ul>
<li><p class="first">Automatic data parallel</p> <li><p class="first">Automatic Data Parallelism</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time <div class="highlight-text"><div class="highlight"><pre><span></span>Compile Time
|-&gt; Single GPU IR |-&gt; Single GPU IR
|-&gt; [trainer-IR-0, trainer-IR-1, pserver-IR] |-&gt; [trainer-IR-0, trainer-IR-1, pserver-IR]
...@@ -292,7 +293,7 @@ ...@@ -292,7 +293,7 @@
</pre></div> </pre></div>
</div> </div>
</li> </li>
<li><p class="first">Automatic model parallel (planned for future)</p> <li><p class="first">Automatic Model Parallelism (planned for future)</p>
</li> </li>
</ul> </ul>
</li> </li>
...@@ -310,10 +311,10 @@ ...@@ -310,10 +311,10 @@
<span id="operator"></span><h1>Operator<a class="headerlink" href="#operator" title="永久链接至标题"></a></h1> <span id="operator"></span><h1>Operator<a class="headerlink" href="#operator" title="永久链接至标题"></a></h1>
<p><img alt="class_diagram" src="http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot" /></p> <p><img alt="class_diagram" src="http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot" /></p>
<ul class="simple"> <ul class="simple">
<li><code class="docutils literal"><span class="pre">Operator</span></code> is the fundamental building block as the user interface.<ul> <li><code class="docutils literal"><span class="pre">Operator</span></code> is the fundamental building block of the user interface.<ul>
<li>Operator stores input/output variable name, and attributes.</li> <li>Operator stores input/output variable names, and attributes.</li>
<li>The <code class="docutils literal"><span class="pre">InferShape</span></code> interface is used to infer output variable shapes by its input shapes.</li> <li>The <code class="docutils literal"><span class="pre">InferShape</span></code> interface is used to infer the shapes of the output variables based on the shapes of the input variables.</li>
<li>Use <code class="docutils literal"><span class="pre">Run</span></code> to compute <code class="docutils literal"><span class="pre">input</span> <span class="pre">variables</span></code> to <code class="docutils literal"><span class="pre">output</span> <span class="pre">variables</span></code>.</li> <li>Use <code class="docutils literal"><span class="pre">Run</span></code> to compute the <code class="docutils literal"><span class="pre">output</span></code> variables from the <code class="docutils literal"><span class="pre">input</span></code> variables.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
...@@ -336,11 +337,11 @@ ...@@ -336,11 +337,11 @@
<span id="why-separate-kernel-and-operator"></span><h1>Why separate Kernel and Operator<a class="headerlink" href="#why-separate-kernel-and-operator" title="永久链接至标题"></a></h1> <span id="why-separate-kernel-and-operator"></span><h1>Why separate Kernel and Operator<a class="headerlink" href="#why-separate-kernel-and-operator" title="永久链接至标题"></a></h1>
<ul class="simple"> <ul class="simple">
<li>Separate GPU and CPU code.<ul> <li>Separate GPU and CPU code.<ul>
<li>Make Paddle can run without GPU.</li> <li>Make Paddle capable of running without GPU.</li>
</ul> </ul>
</li> </li>
<li>Make one operator (which is user interface) can contain many implementations.<ul> <li>Allow one operator (which is the user interface) to have many implementations.<ul>
<li>Same mul op, different FP16, FP32 Kernel. different MKL, eigen kernel.</li> <li>For example, the same multiplication op can have different kernel implementations, such as FP16, FP32, MKL, and Eigen kernels.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
...@@ -351,30 +352,30 @@ ...@@ -351,30 +352,30 @@
<ul class="simple"> <ul class="simple">
<li><code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> contains basic math and element-wise functions.<ul> <li><code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> contains basic math and element-wise functions.<ul>
<li>Note that <code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> has broadcast implementation.</li> <li>Note that <code class="docutils literal"><span class="pre">Eigen::Tensor</span></code> has broadcast implementation.</li>
<li>Limit number of <code class="docutils literal"><span class="pre">tensor.device(dev)</span> <span class="pre">=</span></code> in your code.</li> <li>Limit the number of <code class="docutils literal"><span class="pre">tensor.device(dev)</span> <span class="pre">=</span></code> in your code.</li>
</ul> </ul>
</li> </li>
<li><code class="docutils literal"><span class="pre">thrust::transform</span></code> and <code class="docutils literal"><span class="pre">std::transform</span></code>.<ul> <li><code class="docutils literal"><span class="pre">thrust::transform</span></code> and <code class="docutils literal"><span class="pre">std::transform</span></code>.<ul>
<li><code class="docutils literal"><span class="pre">thrust</span></code> has the same API as C++ standard library. Using <code class="docutils literal"><span class="pre">transform</span></code> can quickly implement a customized elementwise kernel.</li> <li><code class="docutils literal"><span class="pre">thrust</span></code> has the same API as C++ standard library. Using <code class="docutils literal"><span class="pre">transform</span></code>, one can quickly implement customized elementwise kernels.</li>
<li><code class="docutils literal"><span class="pre">thrust</span></code> has more complex API, like <code class="docutils literal"><span class="pre">scan</span></code>, <code class="docutils literal"><span class="pre">reduce</span></code>, <code class="docutils literal"><span class="pre">reduce_by_key</span></code>.</li> <li><code class="docutils literal"><span class="pre">thrust</span></code> also has more complex APIs, like <code class="docutils literal"><span class="pre">scan</span></code>, <code class="docutils literal"><span class="pre">reduce</span></code>, <code class="docutils literal"><span class="pre">reduce_by_key</span></code>.</li>
</ul> </ul>
</li> </li>
<li>Hand-writing <code class="docutils literal"><span class="pre">GPUKernel</span></code> and <code class="docutils literal"><span class="pre">CPU</span></code> code<ul> <li>Hand-writing <code class="docutils literal"><span class="pre">GPUKernel</span></code> and <code class="docutils literal"><span class="pre">CPU</span></code> code<ul>
<li>Do not write <code class="docutils literal"><span class="pre">.h</span></code>. CPU Kernel should be in <code class="docutils literal"><span class="pre">.cc</span></code>. GPU kernel should be in <code class="docutils literal"><span class="pre">.cu</span></code>. (<code class="docutils literal"><span class="pre">GCC</span></code> cannot compile GPU code.)</li> <li>Do not write in header (<code class="docutils literal"><span class="pre">.h</span></code>) files. CPU Kernel should be in cpp source (<code class="docutils literal"><span class="pre">.cc</span></code>) and GPU kernels should be in cuda (<code class="docutils literal"><span class="pre">.cu</span></code>) files. (GCC cannot compile GPU code.)</li>
</ul> </ul>
</li> </li>
</ul> </ul>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
<div class="section" id="operator-register"> <div class="section" id="operator-registration">
<span id="operator-register"></span><h1>Operator Register<a class="headerlink" href="#operator-register" title="永久链接至标题"></a></h1> <span id="operator-registration"></span><h1>Operator Registration<a class="headerlink" href="#operator-registration" title="永久链接至标题"></a></h1>
<div class="section" id="why-register-is-necessary"> <div class="section" id="why-registration-is-necessary">
<span id="why-register-is-necessary"></span><h2>Why register is necessary?<a class="headerlink" href="#why-register-is-necessary" title="永久链接至标题"></a></h2> <span id="why-registration-is-necessary"></span><h2>Why registration is necessary?<a class="headerlink" href="#why-registration-is-necessary" title="永久链接至标题"></a></h2>
<p>We need a method to build mappings between Op type names and Op classes.</p> <p>We need a method to build mappings between Op type names and Op classes.</p>
</div> </div>
<div class="section" id="how-to-do-the-register"> <div class="section" id="how-is-registration-implemented">
<span id="how-to-do-the-register"></span><h2>How to do the register?<a class="headerlink" href="#how-to-do-the-register" title="永久链接至标题"></a></h2> <span id="how-is-registration-implemented"></span><h2>How is registration implemented?<a class="headerlink" href="#how-is-registration-implemented" title="永久链接至标题"></a></h2>
<p>Maintain a map, whose key is the type name and value is corresponding Op constructor.</p> <p>By maintaining a map whose key is the type name and whose value is the corresponding Op constructor.</p>
</div> </div>
</div> </div>
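The type-name-to-constructor map described above can be sketched as follows. This is a minimal illustration, not PaddlePaddle's actual implementation: `OperatorBase`, `OpCreator`, `OpRegistry`, `RegisterOp`, `CreateOp` and `AddOp` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an operator base class.
struct OperatorBase {
  virtual ~OperatorBase() = default;
  virtual std::string Type() const = 0;
};

// The map value: a callable that constructs a new operator instance.
using OpCreator = std::function<std::unique_ptr<OperatorBase>()>;

// A function-local static keeps the map valid even when registration
// runs during static initialization of another translation unit.
inline std::unordered_map<std::string, OpCreator>& OpRegistry() {
  static std::unordered_map<std::string, OpCreator> registry;
  return registry;
}

// Adds a (type name, constructor) pair; a duplicate name is a bug.
inline bool RegisterOp(const std::string& type, OpCreator creator) {
  bool inserted = OpRegistry().emplace(type, std::move(creator)).second;
  assert(inserted && "operator type registered twice");
  return inserted;
}

// Looks up the type name and invokes the stored constructor.
// Returns nullptr when the type name is unknown.
inline std::unique_ptr<OperatorBase> CreateOp(const std::string& type) {
  auto it = OpRegistry().find(type);
  if (it == OpRegistry().end()) return nullptr;
  return it->second();
}

// Example operator, registered before main() starts.
struct AddOp : OperatorBase {
  std::string Type() const override { return "add"; }
};

static const bool kAddOpRegistered = RegisterOp(
    "add", [] { return std::unique_ptr<OperatorBase>(new AddOp); });
```

With this in place, `CreateOp("add")` yields an `AddOp` instance from nothing but the type name, which is the kind of lookup a serialized graph description needs.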
<hr class="docutils" /> <hr class="docutils" />
...@@ -407,22 +408,22 @@ ...@@ -407,22 +408,22 @@
</div> </div>
</div> </div>
<div class="section" id="use-macros"> <div class="section" id="use-macros">
<span id="use-macros"></span><h2><code class="docutils literal"><span class="pre">USE</span></code> Macros<a class="headerlink" href="#use-macros" title="永久链接至标题"></a></h2> <span id="use-macros"></span><h2>USE Macros<a class="headerlink" href="#use-macros" title="永久链接至标题"></a></h2>
<p>make sure the registration process is executed and linked.</p> <p>Make sure the registration process is executed and linked.</p>
</div> </div>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
<div class="section" id="register-process"> <div class="section" id="registration-process">
<span id="register-process"></span><h1>Register Process<a class="headerlink" href="#register-process" title="永久链接至标题"></a></h1> <span id="registration-process"></span><h1>Registration Process<a class="headerlink" href="#registration-process" title="永久链接至标题"></a></h1>
<ol class="simple"> <ol class="simple">
<li>Write Op class, as well as its gradient Op class if there is.</li> <li>Write an Op class and its gradient Op class, if required.</li>
<li>Write Op maker class. In the constructor, describe its inputs, outputs, and attributes.</li> <li>Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.</li>
<li>Invoke macro <code class="docutils literal"><span class="pre">REGISTER_OP</span></code>. The macro will<ol> <li>Invoke the macro <code class="docutils literal"><span class="pre">REGISTER_OP</span></code>. This macro will<ol>
<li>call maker class to complete <code class="docutils literal"><span class="pre">proto</span></code> and <code class="docutils literal"><span class="pre">checker</span></code></li> <li>Call maker class to complete the <code class="docutils literal"><span class="pre">proto</span></code> and the <code class="docutils literal"><span class="pre">checker</span></code></li>
<li>with the completed <code class="docutils literal"><span class="pre">proto</span></code> and <code class="docutils literal"><span class="pre">checker</span></code>, build a new key-value pair in the <code class="docutils literal"><span class="pre">OpInfoMap</span></code></li> <li>Using the completed <code class="docutils literal"><span class="pre">proto</span></code> and <code class="docutils literal"><span class="pre">checker</span></code>, it will add a new key-value pair to the <code class="docutils literal"><span class="pre">OpInfoMap</span></code></li>
</ol> </ol>
</li> </li>
<li>Invoke <code class="docutils literal"><span class="pre">USE</span></code> macro in where the Op is used to make sure it is linked.</li> <li>Invoke the <code class="docutils literal"><span class="pre">USE</span></code> macro wherever the Op is used, to make sure that it is linked.</li>
</ol> </ol>
</div> </div>
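The pairing of a registration macro with a `USE` macro can be sketched like this. It is a simplified illustration under assumed names (`DemoOpInfoMap`, `DemoOpRegistrar`, `REGISTER_DEMO_OP`, `USE_DEMO_OP` are all hypothetical), and the map stores a plain comment string rather than a real `proto` and `checker`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry: op type name -> a short description string.
inline std::map<std::string, std::string>& DemoOpInfoMap() {
  static std::map<std::string, std::string> info_map;
  return info_map;
}

// Constructing a registrar object adds one key-value pair to the map.
struct DemoOpRegistrar {
  DemoOpRegistrar(const char* type, const char* comment) {
    bool inserted = DemoOpInfoMap().emplace(type, comment).second;
    assert(inserted && "op type registered twice");
    (void)inserted;  // silence unused-variable warnings under NDEBUG
  }
};

// Defines a "touch" function and a static registrar whose constructor
// runs before main() and performs the registration.
#define REGISTER_DEMO_OP(op_type, comment)           \
  int TouchDemoOpRegistrar_##op_type() { return 0; } \
  static DemoOpRegistrar demo_registrar_##op_type(#op_type, comment)

// Referencing the touch function from the code that uses the op creates
// a symbol dependency, so the linker cannot drop the object file that
// contains the registrar.
#define USE_DEMO_OP(op_type)                   \
  extern int TouchDemoOpRegistrar_##op_type(); \
  static int demo_use_##op_type = TouchDemoOpRegistrar_##op_type()

// Normally these two lines live in different source files.
REGISTER_DEMO_OP(mul, "matrix multiplication");
USE_DEMO_OP(mul);
```

The `USE` side matters mostly for static libraries, where an object file that nothing references may be left out of the final binary, so its registrar would never run.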
<hr class="docutils" /> <hr class="docutils" />
...@@ -431,7 +432,7 @@ ...@@ -431,7 +432,7 @@
<div class="section" id="create-backward-operator"> <div class="section" id="create-backward-operator">
<span id="create-backward-operator"></span><h2>Create Backward Operator<a class="headerlink" href="#create-backward-operator" title="永久链接至标题"></a></h2> <span id="create-backward-operator"></span><h2>Create Backward Operator<a class="headerlink" href="#create-backward-operator" title="永久链接至标题"></a></h2>
<ul class="simple"> <ul class="simple">
<li>Mapping from forwarding Op to backward Op <li>Mapping from forward Op to backward Op
<img alt="backward" src="https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png" /></li> <img alt="backward" src="https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png" /></li>
</ul> </ul>
</div> </div>
...@@ -442,12 +443,12 @@ ...@@ -442,12 +443,12 @@
<div class="section" id="build-backward-network"> <div class="section" id="build-backward-network">
<span id="build-backward-network"></span><h2>Build Backward Network<a class="headerlink" href="#build-backward-network" title="永久链接至标题"></a></h2> <span id="build-backward-network"></span><h2>Build Backward Network<a class="headerlink" href="#build-backward-network" title="永久链接至标题"></a></h2>
<ul class="simple"> <ul class="simple">
<li><strong>Input</strong> graph of forwarding operators</li> <li><strong>Input</strong>: graph of forwarding operators</li>
<li><strong>Output</strong> graph of backward operators</li> <li><strong>Output</strong>: graph of backward operators</li>
<li><strong>corner case in construction</strong><ul> <li><strong>Corner cases in construction</strong><ul>
<li>shared variable =&gt; insert <code class="docutils literal"><span class="pre">Add</span></code> operator</li> <li>Shared Variables =&gt; insert an <code class="docutils literal"><span class="pre">Add</span></code> operator to combine gradients</li>
<li>no gradient =&gt; insert <code class="docutils literal"><span class="pre">fill_zero_grad</span></code> operator</li> <li>No Gradient =&gt; insert a <code class="docutils literal"><span class="pre">fill_zero_grad</span></code> operator</li>
<li>recursive netOp =&gt; call <code class="docutils literal"><span class="pre">Backward</span></code> recursively</li> <li>Recursive NetOp =&gt; call <code class="docutils literal"><span class="pre">Backward</span></code> recursively</li>
<li>RNN Op =&gt; recursively call <code class="docutils literal"><span class="pre">Backward</span></code> on stepnet</li> <li>RNN Op =&gt; recursively call <code class="docutils literal"><span class="pre">Backward</span></code> on stepnet</li>
</ul> </ul>
</li> </li>
...@@ -460,17 +461,17 @@ ...@@ -460,17 +461,17 @@
<ul class="simple"> <ul class="simple">
<li><code class="docutils literal"><span class="pre">Tensor</span></code> is an n-dimension array with type.<ul> <li><code class="docutils literal"><span class="pre">Tensor</span></code> is an n-dimension array with type.<ul>
<li>Only dims and data pointers are stored in <code class="docutils literal"><span class="pre">Tensor</span></code>.</li> <li>Only dims and data pointers are stored in <code class="docutils literal"><span class="pre">Tensor</span></code>.</li>
<li>All operators on <code class="docutils literal"><span class="pre">Tensor</span></code> is written in <code class="docutils literal"><span class="pre">Operator</span></code> or global functions.</li> <li>All operations on <code class="docutils literal"><span class="pre">Tensor</span></code> are written in <code class="docutils literal"><span class="pre">Operator</span></code> or global functions.</li>
<li>variable length Tensor design <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a></li> <li>Variable length Tensor design <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a></li>
</ul> </ul>
</li> </li>
<li><code class="docutils literal"><span class="pre">Variable</span></code> is the inputs and outputs of an operator. Not just <code class="docutils literal"><span class="pre">Tensor</span></code>.<ul> <li><code class="docutils literal"><span class="pre">Variable</span></code> instances are the inputs and the outputs of an operator. Not just <code class="docutils literal"><span class="pre">Tensor</span></code>.<ul>
<li>step_scopes in RNN is a variable and not a tensor.</li> <li><code class="docutils literal"><span class="pre">step_scopes</span></code> in RNN is a variable and not a tensor.</li>
</ul> </ul>
</li> </li>
<li><code class="docutils literal"><span class="pre">Scope</span></code> is where variables store at.<ul> <li><code class="docutils literal"><span class="pre">Scope</span></code> is where variables are stored.<ul>
<li>map&lt;string/*var name */, Variable&gt;</li> <li>map&lt;string <code class="docutils literal"><span class="pre">variable_name</span></code>, Variable&gt;</li>
<li><code class="docutils literal"><span class="pre">Scope</span></code> has a hierarchical structure. The local scope can get variable from its parent scope.</li> <li><code class="docutils literal"><span class="pre">Scope</span></code> has a hierarchical structure. The local scope can get variables from its parent scope.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
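The hierarchical lookup described for `Scope` can be sketched as below. This is an assumption-laden illustration rather than the framework's real class: `Variable` is an empty placeholder, and the method names `NewVar` and `FindVar` are chosen for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Placeholder for a variable; a real one could hold a Tensor or any
// other payload, such as the step scopes of an RNN.
struct Variable {};

class Scope {
 public:
  // A scope with no parent is a root scope.
  explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

  // Creates the variable in this scope, or returns the existing one.
  Variable* NewVar(const std::string& name) {
    assert(!name.empty());
    std::unique_ptr<Variable>& slot = vars_[name];
    if (!slot) slot.reset(new Variable());
    return slot.get();
  }

  // Searches this scope first, then walks up the parent chain.
  // Returns nullptr when no enclosing scope holds the name.
  Variable* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second.get();
    return parent_ ? parent_->FindVar(name) : nullptr;
  }

 private:
  const Scope* parent_;  // not owned
  std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
};
```

A local scope built as `Scope local(&global)` can read variables created in `global`, while variables created in `local` stay invisible to `global`.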
...@@ -481,11 +482,11 @@ ...@@ -481,11 +482,11 @@
<div class="section" id="the-difference-with-original-rnnop"> <div class="section" id="the-difference-with-original-rnnop">
<span id="the-difference-with-original-rnnop"></span><h2>the difference with original RNNOp<a class="headerlink" href="#the-difference-with-original-rnnop" title="永久链接至标题"></a></h2> <span id="the-difference-with-original-rnnop"></span><h2>the difference with original RNNOp<a class="headerlink" href="#the-difference-with-original-rnnop" title="永久链接至标题"></a></h2>
<ul class="simple"> <ul class="simple">
<li>as an operator is more intuitive than <code class="docutils literal"><span class="pre">RNNOp</span></code>,</li> <li>As an operator is more intuitive than <code class="docutils literal"><span class="pre">RNNOp</span></code>,</li>
<li>offers new interface <code class="docutils literal"><span class="pre">Eval(targets)</span></code> to deduce the minimal block to <code class="docutils literal"><span class="pre">Run</span></code>,</li> <li>Offers a new interface <code class="docutils literal"><span class="pre">Eval(targets)</span></code> to deduce the minimal block to <code class="docutils literal"><span class="pre">Run</span></code>,</li>
<li>fits the compile-time/ runtime separation design.<ul> <li>Fits the compile-time/ runtime separation design paradigm.<ul>
<li>during the compilation, <code class="docutils literal"><span class="pre">SymbolTable</span></code> stores <code class="docutils literal"><span class="pre">VarDesc</span></code>s and <code class="docutils literal"><span class="pre">OpDesc</span></code>s and serialize to a <code class="docutils literal"><span class="pre">BlockDesc</span></code></li> <li>During compilation, <code class="docutils literal"><span class="pre">SymbolTable</span></code> stores <code class="docutils literal"><span class="pre">VarDesc</span></code>s and <code class="docutils literal"><span class="pre">OpDesc</span></code>s and serializes them to a <code class="docutils literal"><span class="pre">BlockDesc</span></code></li>
<li>when graph executes, a Block with <code class="docutils literal"><span class="pre">BlockDesc</span></code> passed in creates <code class="docutils literal"><span class="pre">Op</span></code> and <code class="docutils literal"><span class="pre">Var</span></code> then <code class="docutils literal"><span class="pre">Run</span></code></li> <li>When graph executes, a Block with <code class="docutils literal"><span class="pre">BlockDesc</span></code> is passed. It then creates <code class="docutils literal"><span class="pre">Op</span></code> and <code class="docutils literal"><span class="pre">Var</span></code> instances and then invokes <code class="docutils literal"><span class="pre">Run</span></code>.</li>
</ul> </ul>
</li> </li>
</ul> </ul>
...@@ -495,32 +496,32 @@ ...@@ -495,32 +496,32 @@
<div class="section" id="milestone"> <div class="section" id="milestone">
<span id="milestone"></span><h1>Milestone<a class="headerlink" href="#milestone" title="永久链接至标题"></a></h1> <span id="milestone"></span><h1>Milestone<a class="headerlink" href="#milestone" title="永久链接至标题"></a></h1>
<ul class="simple"> <ul class="simple">
<li>take Paddle/books as the main line, the requirement of the models motivates framework refactoring,</li> <li>Take Paddle/books as the main line, the requirement of the models motivates framework refactoring,</li>
<li>model migration<ul> <li>Model migration<ul>
<li>framework development gives <strong>priority support</strong> to model migration, for example,<ul> <li>Framework development gives <strong>priority support</strong> to model migration, for example,<ul>
<li>the MNIST demo needs a Python interface,</li> <li>the MNIST demo needs a Python interface,</li>
<li>the RNN models require the framework to support <code class="docutils literal"><span class="pre">LoDTensor</span></code>.</li> <li>the RNN models require the framework to support <code class="docutils literal"><span class="pre">LoDTensor</span></code>.</li>
</ul> </ul>
</li> </li>
<li>determine some timelines,</li> <li>Determine some timelines,</li>
<li>heavily-relied Ops need to be migrated first,</li> <li>Frequently used Ops need to be migrated first,</li>
<li>different models can be migrated parallelly.</li> <li>Different models can be migrated in parallel.</li>
</ul> </ul>
</li> </li>
<li>improve the framework at the same time</li> <li>Improve the framework at the same time</li>
<li>accept imperfection, concentrated on solving the specific problem at the right price.</li> <li>Accept imperfection, concentrate on solving the specific problem at the right price.</li>
</ul> </ul>
</div> </div>
<hr class="docutils" /> <hr class="docutils" />
<div class="section" id="control-the-migration-quality"> <div class="section" id="control-the-migration-quality">
<span id="control-the-migration-quality"></span><h1>Control the migration quality<a class="headerlink" href="#control-the-migration-quality" title="永久链接至标题"></a></h1> <span id="control-the-migration-quality"></span><h1>Control the migration quality<a class="headerlink" href="#control-the-migration-quality" title="永久链接至标题"></a></h1>
<ul class="simple"> <ul class="simple">
<li>compare the performance of migrated models with old ones.</li> <li>Compare the performance of migrated models with old ones.</li>
<li>follow google C style</li> <li>Follow the Google C++ style guide.</li>
<li>build the automatic workflow of generating Python/C++ documentations<ul> <li>Build the automatic workflow of generating Python/C++ documentations.<ul>
<li>the documentation of layers and ops should be written inside the code</li> <li>The documentation of layers and ops should be written inside the code.</li>
<li>take the documentation quality into account when doing PR</li> <li>Take the documentation quality into account when submitting pull requests.</li>
<li>preview the documentations, read and improve them from users&#8217; perspective</li> <li>Preview the documentations, read and improve them from a user&#8217;s perspective.</li>
</ul> </ul>
</li> </li>
</ul> </ul>