@@ -17,7 +17,7 @@ The goals of refactoring include:
1. Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example.
1. Users write Python programs to describe the graphs and run them (locally or remotely).
1. A graph is composed of *variables* and *operators*.
1. The description of graphs must be capable of being serialized/deserialized, so that:
   1. It can be sent to the cloud for distributed execution, and
   1. It can be sent to clients for mobile or enterprise deployment.
...
@@ -137,19 +137,18 @@ Compile Time -> IR -> Runtime
* `Eigen::Tensor` contains basic math and element-wise functions.
  * Note that `Eigen::Tensor` has a broadcast implementation.
  * Limit the number of `tensor.device(dev) = ` assignments in your code: each one evaluates a full expression, so fuse work into as few assignments as possible (see the sketch after this list).
* `thrust::transform` and `std::transform`.
  * `thrust` has the same API as the C++ standard library. Using `transform`, one can quickly implement customized element-wise kernels.
  * `thrust` also has more complex APIs, like `scan`, `reduce`, and `reduce_by_key`.
* Hand-written `GPUKernel` and `CPU` code.
  * Do not write kernels in header (`.h`) files. CPU kernels should be in C++ source (`.cc`) files, and GPU kernels should be in CUDA (`.cu`) files, because GCC cannot compile GPU code.
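To make the two styles concrete, here is a minimal sketch, assuming hypothetical kernel names and a pre-sized output tensor (none of this is Paddle's actual operator code): an Eigen version that fuses the whole computation into a single `device(dev) =` assignment, and a `std::transform` version whose body would look the same with `thrust::transform` in a `.cu` file.

```cpp
#include <algorithm>
#include <vector>
#include <unsupported/Eigen/CXX11/Tensor>

// Eigen style: build one expression tree and assign it once, so
// y = x * a + b is evaluated in a single pass over the data.
// Assumes *y already has the same dimensions as x.
void ScaleAddEigen(const Eigen::Tensor<float, 1>& x,
                   const Eigen::Tensor<float, 1>& b, float a,
                   Eigen::Tensor<float, 1>* y) {
  Eigen::DefaultDevice dev;
  y->device(dev) = x * a + b;  // exactly one `device(dev) =` evaluation
}

// std::transform style: the same element-wise kernel on the CPU.
// Swapping in thrust::transform (with device iterators) in a .cu
// file would give the GPU version.
void ScaleAddTransform(const std::vector<float>& x,
                       const std::vector<float>& b, float a,
                       std::vector<float>* y) {
  y->resize(x.size());
  std::transform(x.begin(), x.end(), b.begin(), y->begin(),
                 [a](float xi, float bi) { return xi * a + bi; });
}
```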
---
# Operator Registration
## Why is registration necessary?
We need a method to build mappings between Op type names and Op classes.
## How is registration implemented?
Maintain a map whose key is the Op type name and whose value is the corresponding Op constructor, as sketched below.
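A minimal sketch of that idea, with illustrative names rather than Paddle's actual classes: a global map from type name to a creator function, populated by a macro that runs at static-initialization time.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Illustrative stand-ins; Paddle's real base class and macros differ.
class OperatorBase {
 public:
  virtual ~OperatorBase() = default;
};

using OpCreator = std::function<std::unique_ptr<OperatorBase>()>;

// The registry: Op type name -> Op constructor.
std::map<std::string, OpCreator>& OpRegistry() {
  static std::map<std::string, OpCreator> registry;
  return registry;
}

// A register macro expands to a static initializer that inserts the
// creator into the map before main() runs.
#define REGISTER_OP_SKETCH(type_name, OpClass)              \
  static const bool _registered_##OpClass = [] {            \
    OpRegistry()[type_name] = [] {                          \
      return std::unique_ptr<OperatorBase>(new OpClass());  \
    };                                                      \
    return true;                                            \
  }()

// Usage:    REGISTER_OP_SKETCH("scale", ScaleOp);
// Creation: auto op = OpRegistry().at("scale")();
```

Because the insertion happens in a static initializer, the object file containing it must actually be linked into the final binary, which is why the registration process must be executed and linked.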
---
...
@@ -170,7 +169,7 @@ Maintaining a map, whose key is the type name and the value is the corresponding
# Related Concepts
### Op_Maker
Its constructor takes `proto` and `checker`. They are completed during Op_Maker's construction. ([ScaleOpMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37))
### Register Macros
```cpp
...
```
@@ -200,7 +199,7 @@ Make sure the registration process is executed and linked.
---
# Backward Module (2/2)
### Build Backward Network
- **Input**: graph of forward operators
- **Output**: graph of backward operators
- **Corner cases in construction**
  - Shared Variables => insert an `Add` operator to combine gradients (each consumer of a shared variable contributes its own partial gradient, and these partials must be summed)
...
@@ -224,7 +223,7 @@ Make sure the registration process is executed and linked.
---
# Block (in design)
## The difference between the original RNNOp and Block
- As an operator, a Block is more intuitive than `RNNOp`,
- Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`,
- Fits the compile-time/runtime separation design paradigm.
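To make `Eval(targets)` concrete, here is a rough sketch under assumed data structures (an `OpDesc` with plain input/output name lists is an illustration, not Paddle's actual graph description): walk the block backwards from the targets and keep only the operators whose outputs are still needed.

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative operator description: just input/output variable names.
struct OpDesc {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Walk the block backwards: an operator is kept only if it produces a
// variable that is still needed, and keeping it makes all of its inputs
// needed in turn. The result is the minimal block to Run.
std::vector<OpDesc> MinimalBlock(const std::vector<OpDesc>& block,
                                 std::set<std::string> needed) {
  std::vector<OpDesc> kept;
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    bool produces_needed = false;
    for (const auto& out : it->outputs) {
      if (needed.count(out)) produces_needed = true;
    }
    if (!produces_needed) continue;
    needed.insert(it->inputs.begin(), it->inputs.end());
    kept.push_back(*it);
  }
  return {kept.rbegin(), kept.rend()};  // restore execution order
}
```

Running the returned block computes exactly the requested targets and nothing else, which is what makes the compile-time description / runtime execution split workable.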