提交 232aae0c 编写于 作者: T Travis CI

Deploy to GitHub Pages: 4d93f0fc

上级 1c192cb3
# Design Doc: Support new Device/Library
# Design Doc: Supporting new Device/Library
## Background
Deep learning has a high demand for computing resources. New high-performance device and computing library are coming constantly. The deep learning framework has to integrate these high-performance device and computing library flexibly.
Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries flexibly and efficiently.
On the one hand, hardware and computing library are not usually one-to-one coresponding relations. For example, in Intel CPU, there are Eigen and MKL computing library. And in Nvidia GPU, there are Eigen and cuDNN computing library. We have to implement specific kernels for an operator for each computing library.
On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example,Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library.
On the other hand, users usually do not want to care about the low-level hardware and computing library when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are independent on hardwares.
On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are hardware independent.
So, how to support a new Device/Library in Fluid becomes a challenge.
## Basic: Integrate A New Device/Library
For a general overview of fluid, please refer to [overview doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md).
For a general overview of fluid, please refer to the [overview doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md).
There are mainly there parts we have to consider in integrating a new device/library:
There are mainly three parts that we have to consider while integrating a new device/library:
- Place and DeviceContext: indicates the device id and manages hardware resources
- Memory and Tensor: malloc/free data on certain device
- Math Functor and OpKernel: implement computing unit on certain device/library
- Math Functor and OpKernel: implement computing unit on certain devices/libraries
### Place and DeviceContext
#### Place
Fluid use class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent specific device and computing library. There are inheritance relationships between different kinds of `Place`.
Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent different devices and computing libraries. There are inheritance relationships between different kinds of `Place`.
```
| CPUPlace --> MKLDNNPlace
......@@ -43,7 +43,7 @@ typedef boost::variant<CUDAPlace, CPUPlace, FPGAPlace> Place;
#### DeviceContext
Fluid use class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in certain hardware, such as CUDA stream in `CDUADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`.
Fluid uses class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in different hardwares, such as CUDA stream in `CDUADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`.
```
......@@ -52,7 +52,7 @@ DeviceContext ----> CUDADeviceContext --> CUDNNDeviceContext
\-> FPGADeviceContext
```
A example of Nvidia GPU is as follows:
An example of Nvidia GPU is as follows:
- DeviceContext
......@@ -93,7 +93,7 @@ class CUDNNDeviceContext : public CUDADeviceContext {
#### memory module
Fluid provide following [memory interfaces](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36):
Fluid provides the following [memory interfaces](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36):
```
template <typename Place>
......@@ -106,12 +106,12 @@ template <typename Place>
size_t Used(Place place);
```
To implementing these interfaces, we have to implement MemoryAllocator for specific Device
To implementing these interfaces, we have to implement MemoryAllocator for different Devices
#### Tensor
[Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36) holds data with some shape in certain Place.
[Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36) holds data with some shape in a specific Place.
```cpp
class Tensor {
......@@ -168,7 +168,7 @@ t.mutable_data(place);
### Math Functor and OpKernel
Fluid implements computing unit based on different DeviceContext. Some computing unit is shared between operators. These common part will be put in operators/math directory as basic Functors.
Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. This common part will be put in operators/math directory as basic Functors.
Let's take [MaxOutFunctor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27) as an example:
......@@ -183,7 +183,7 @@ class MaxOutFunctor {
};
```
CPU implement in .cc file
CPU implemention is in .cc file
```
template <typename T>
......@@ -197,7 +197,7 @@ class MaxOutFunctor<platform::CPUDeviceContext, T> {
};
```
CUDA implement in .cu file
CUDA implemention is in .cu file
```
template <typename T>
......@@ -212,11 +212,11 @@ class MaxOutFunctor<platform::CUDADeviceContext, T> {
```
We get computing handle from concrete DeviceContext, and make compution on tensors.
We get computing handle from a concrete DeviceContext, and make compution on tensors.
The implement of `OpKernel` is similar to math functors, the extra thing we need to do is registering the OpKernel to global map.
The implemention of `OpKernel` is similar to math functors, the extra thing we need to do is to register the OpKernel in a global map.
Fluid provides different register interface in op_registry.h
Fluid provides different register interfaces in op_registry.h
Let's take [Crop](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.cc#L134) operator as an example:
......@@ -240,7 +240,7 @@ REGISTER_OP_CUDA_KERNEL(
## Advanced topics: How to switch between different Device/Library
Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale in a specific Device. For example, crf operator can be only run at CPU, whereas most other operators can be run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.
Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.
We will discuss how to implement an efficient OpKernel switch policy.
......
......@@ -8,7 +8,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Design Doc: Support new Device/Library &mdash; PaddlePaddle documentation</title>
<title>Design Doc: Supporting new Device/Library &mdash; PaddlePaddle documentation</title>
......@@ -194,7 +194,7 @@
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li>Design Doc: Support new Device/Library</li>
<li>Design Doc: Supporting new Device/Library</li>
</ul>
</div>
......@@ -203,29 +203,29 @@
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="design-doc-support-new-device-library">
<span id="design-doc-support-new-device-library"></span><h1>Design Doc: Support new Device/Library<a class="headerlink" href="#design-doc-support-new-device-library" title="Permalink to this headline"></a></h1>
<div class="section" id="design-doc-supporting-new-device-library">
<span id="design-doc-supporting-new-device-library"></span><h1>Design Doc: Supporting new Device/Library<a class="headerlink" href="#design-doc-supporting-new-device-library" title="Permalink to this headline"></a></h1>
<div class="section" id="background">
<span id="background"></span><h2>Background<a class="headerlink" href="#background" title="Permalink to this headline"></a></h2>
<p>Deep learning has a high demand for computing resources. New high-performance device and computing library are coming constantly. The deep learning framework has to integrate these high-performance device and computing library flexibly.</p>
<p>On the one hand, hardware and computing library are not usually one-to-one coresponding relations. For example, in Intel CPU, there are Eigen and MKL computing library. And in Nvidia GPU, there are Eigen and cuDNN computing library. We have to implement specific kernels for an operator for each computing library.</p>
<p>On the other hand, users usually do not want to care about the low-level hardware and computing library when writing a neural network configuration. In Fluid, <code class="docutils literal"><span class="pre">Layer</span></code> is exposed in <code class="docutils literal"><span class="pre">Python</span></code>, and <code class="docutils literal"><span class="pre">Operator</span></code> is exposed in <code class="docutils literal"><span class="pre">C++</span></code>. Both <code class="docutils literal"><span class="pre">Layer</span></code> and <code class="docutils literal"><span class="pre">Operator</span></code> are independent on hardwares.</p>
<p>Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries flexibly and efficiently.</p>
<p>On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example,Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library.</p>
<p>On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, <code class="docutils literal"><span class="pre">Layer</span></code> is exposed in <code class="docutils literal"><span class="pre">Python</span></code>, and <code class="docutils literal"><span class="pre">Operator</span></code> is exposed in <code class="docutils literal"><span class="pre">C++</span></code>. Both <code class="docutils literal"><span class="pre">Layer</span></code> and <code class="docutils literal"><span class="pre">Operator</span></code> are hardware independent.</p>
<p>So, how to support a new Device/Library in Fluid becomes a challenge.</p>
</div>
<div class="section" id="basic-integrate-a-new-device-library">
<span id="basic-integrate-a-new-device-library"></span><h2>Basic: Integrate A New Device/Library<a class="headerlink" href="#basic-integrate-a-new-device-library" title="Permalink to this headline"></a></h2>
<p>For a general overview of fluid, please refer to <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md">overview doc</a>.</p>
<p>There are mainly there parts we have to consider in integrating a new device/library:</p>
<p>For a general overview of fluid, please refer to the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md">overview doc</a>.</p>
<p>There are mainly three parts that we have to consider while integrating a new device/library:</p>
<ul class="simple">
<li>Place and DeviceContext: indicates the device id and manages hardware resources</li>
<li>Memory and Tensor: malloc/free data on certain device</li>
<li>Math Functor and OpKernel: implement computing unit on certain device/library</li>
<li>Math Functor and OpKernel: implement computing unit on certain devices/libraries</li>
</ul>
<div class="section" id="place-and-devicecontext">
<span id="place-and-devicecontext"></span><h3>Place and DeviceContext<a class="headerlink" href="#place-and-devicecontext" title="Permalink to this headline"></a></h3>
<div class="section" id="place">
<span id="place"></span><h4>Place<a class="headerlink" href="#place" title="Permalink to this headline"></a></h4>
<p>Fluid use class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55">Place</a> to represent specific device and computing library. There are inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">Place</span></code>.</p>
<p>Fluid uses class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55">Place</a> to represent different devices and computing libraries. There are inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">Place</span></code>.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span> <span class="o">|</span> <span class="n">CPUPlace</span> <span class="o">--&gt;</span> <span class="n">MKLDNNPlace</span>
<span class="n">Place</span> <span class="o">--|</span> <span class="n">CUDAPlace</span> <span class="o">--&gt;</span> <span class="n">CUDNNPlace</span>
<span class="o">|</span> <span class="n">FPGAPlace</span>
......@@ -238,13 +238,13 @@
</div>
<div class="section" id="devicecontext">
<span id="devicecontext"></span><h4>DeviceContext<a class="headerlink" href="#devicecontext" title="Permalink to this headline"></a></h4>
<p>Fluid use class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30">DeviceContext</a> to manage the resources in certain hardware, such as CUDA stream in <code class="docutils literal"><span class="pre">CDUADeviceContext</span></code>. There are also inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">DeviceContext</span></code>.</p>
<p>Fluid uses class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30">DeviceContext</a> to manage the resources in different hardwares, such as CUDA stream in <code class="docutils literal"><span class="pre">CDUADeviceContext</span></code>. There are also inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">DeviceContext</span></code>.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span> <span class="o">/-&gt;</span> <span class="n">CPUDeviceContext</span> <span class="o">--&gt;</span> <span class="n">MKLDeviceContext</span>
<span class="n">DeviceContext</span> <span class="o">----&gt;</span> <span class="n">CUDADeviceContext</span> <span class="o">--&gt;</span> <span class="n">CUDNNDeviceContext</span>
\<span class="o">-&gt;</span> <span class="n">FPGADeviceContext</span>
</pre></div>
</div>
<p>A example of Nvidia GPU is as follows:</p>
<p>An example of Nvidia GPU is as follows:</p>
<ul class="simple">
<li>DeviceContext</li>
</ul>
......@@ -281,7 +281,7 @@
<span id="memory-and-tensor"></span><h3>Memory and Tensor<a class="headerlink" href="#memory-and-tensor" title="Permalink to this headline"></a></h3>
<div class="section" id="memory-module">
<span id="memory-module"></span><h4>memory module<a class="headerlink" href="#memory-module" title="Permalink to this headline"></a></h4>
<p>Fluid provide following <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36">memory interfaces</a>:</p>
<p>Fluid provides the following <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36">memory interfaces</a>:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">Place</span><span class="o">&gt;</span>
<span class="n">void</span><span class="o">*</span> <span class="n">Alloc</span><span class="p">(</span><span class="n">Place</span> <span class="n">place</span><span class="p">,</span> <span class="n">size_t</span> <span class="n">size</span><span class="p">);</span>
......@@ -292,11 +292,11 @@
<span class="n">size_t</span> <span class="n">Used</span><span class="p">(</span><span class="n">Place</span> <span class="n">place</span><span class="p">);</span>
</pre></div>
</div>
<p>To implementing these interfaces, we have to implement MemoryAllocator for specific Device</p>
<p>To implementing these interfaces, we have to implement MemoryAllocator for different Devices</p>
</div>
<div class="section" id="tensor">
<span id="tensor"></span><h4>Tensor<a class="headerlink" href="#tensor" title="Permalink to this headline"></a></h4>
<p><a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36">Tensor</a> holds data with some shape in certain Place.</p>
<p><a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36">Tensor</a> holds data with some shape in a specific Place.</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Tensor</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="cm">/*! Return a pointer to mutable memory block. */</span>
......@@ -349,7 +349,7 @@
</div>
<div class="section" id="math-functor-and-opkernel">
<span id="math-functor-and-opkernel"></span><h3>Math Functor and OpKernel<a class="headerlink" href="#math-functor-and-opkernel" title="Permalink to this headline"></a></h3>
<p>Fluid implements computing unit based on different DeviceContext. Some computing unit is shared between operators. These common part will be put in operators/math directory as basic Functors.</p>
<p>Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. This common part will be put in operators/math directory as basic Functors.</p>
<p>Let&#8217;s take <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27">MaxOutFunctor</a> as an example:</p>
<p>The interface is defined in header file.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">DeviceContext</span><span class="p">,</span> <span class="n">typename</span> <span class="n">T</span><span class="o">&gt;</span>
......@@ -360,7 +360,7 @@
<span class="p">};</span>
</pre></div>
</div>
<p>CPU implement in .cc file</p>
<p>CPU implemention is in .cc file</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="k">class</span> <span class="nc">MaxOutFunctor</span><span class="o">&lt;</span><span class="n">platform</span><span class="p">::</span><span class="n">CPUDeviceContext</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
<span class="n">public</span><span class="p">:</span>
......@@ -372,7 +372,7 @@
<span class="p">};</span>
</pre></div>
</div>
<p>CUDA implement in .cu file</p>
<p>CUDA implemention is in .cu file</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="k">class</span> <span class="nc">MaxOutFunctor</span><span class="o">&lt;</span><span class="n">platform</span><span class="p">::</span><span class="n">CUDADeviceContext</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
<span class="n">public</span><span class="p">:</span>
......@@ -384,9 +384,9 @@
<span class="p">};</span>
</pre></div>
</div>
<p>We get computing handle from concrete DeviceContext, and make compution on tensors.</p>
<p>The implement of <code class="docutils literal"><span class="pre">OpKernel</span></code> is similar to math functors, the extra thing we need to do is registering the OpKernel to global map.</p>
<p>Fluid provides different register interface in op_registry.h</p>
<p>We get computing handle from a concrete DeviceContext, and make compution on tensors.</p>
<p>The implemention of <code class="docutils literal"><span class="pre">OpKernel</span></code> is similar to math functors, the extra thing we need to do is to register the OpKernel in a global map.</p>
<p>Fluid provides different register interfaces in op_registry.h</p>
<p>Let&#8217;s take <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.cc#L134">Crop</a> operator as an example:</p>
<p>In .cc file:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">REGISTER_OP_CPU_KERNEL</span><span class="p">(</span><span class="n">crop</span><span class="p">,</span> <span class="n">ops</span><span class="p">::</span><span class="n">CropKernel</span><span class="o">&lt;</span><span class="nb">float</span><span class="o">&gt;</span><span class="p">);</span>
......@@ -404,7 +404,7 @@
</div>
<div class="section" id="advanced-topics-how-to-switch-between-different-device-library">
<span id="advanced-topics-how-to-switch-between-different-device-library"></span><h2>Advanced topics: How to switch between different Device/Library<a class="headerlink" href="#advanced-topics-how-to-switch-between-different-device-library" title="Permalink to this headline"></a></h2>
<p>Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale in a specific Device. For example, crf operator can be only run at CPU, whereas most other operators can be run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.</p>
<p>Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.</p>
<p>We will discuss how to implement an efficient OpKernel switch policy.</p>
<ul class="simple">
<li>TBD</li>
......
因为 它太大了无法显示 source diff 。你可以改为 查看blob
# Design Doc: Support new Device/Library
# Design Doc: Supporting new Device/Library
## Background
Deep learning has a high demand for computing resources. New high-performance device and computing library are coming constantly. The deep learning framework has to integrate these high-performance device and computing library flexibly.
Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries flexibly and efficiently.
On the one hand, hardware and computing library are not usually one-to-one coresponding relations. For example, in Intel CPU, there are Eigen and MKL computing library. And in Nvidia GPU, there are Eigen and cuDNN computing library. We have to implement specific kernels for an operator for each computing library.
On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example,Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library.
On the other hand, users usually do not want to care about the low-level hardware and computing library when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are independent on hardwares.
On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are hardware independent.
So, how to support a new Device/Library in Fluid becomes a challenge.
## Basic: Integrate A New Device/Library
For a general overview of fluid, please refer to [overview doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md).
For a general overview of fluid, please refer to the [overview doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md).
There are mainly there parts we have to consider in integrating a new device/library:
There are mainly three parts that we have to consider while integrating a new device/library:
- Place and DeviceContext: indicates the device id and manages hardware resources
- Memory and Tensor: malloc/free data on certain device
- Math Functor and OpKernel: implement computing unit on certain device/library
- Math Functor and OpKernel: implement computing unit on certain devices/libraries
### Place and DeviceContext
#### Place
Fluid use class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent specific device and computing library. There are inheritance relationships between different kinds of `Place`.
Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent different devices and computing libraries. There are inheritance relationships between different kinds of `Place`.
```
| CPUPlace --> MKLDNNPlace
......@@ -43,7 +43,7 @@ typedef boost::variant<CUDAPlace, CPUPlace, FPGAPlace> Place;
#### DeviceContext
Fluid use class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in certain hardware, such as CUDA stream in `CDUADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`.
Fluid uses class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in different hardwares, such as CUDA stream in `CDUADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`.
```
......@@ -52,7 +52,7 @@ DeviceContext ----> CUDADeviceContext --> CUDNNDeviceContext
\-> FPGADeviceContext
```
A example of Nvidia GPU is as follows:
An example of Nvidia GPU is as follows:
- DeviceContext
......@@ -93,7 +93,7 @@ class CUDNNDeviceContext : public CUDADeviceContext {
#### memory module
Fluid provide following [memory interfaces](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36):
Fluid provides the following [memory interfaces](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36):
```
template <typename Place>
......@@ -106,12 +106,12 @@ template <typename Place>
size_t Used(Place place);
```
To implementing these interfaces, we have to implement MemoryAllocator for specific Device
To implementing these interfaces, we have to implement MemoryAllocator for different Devices
#### Tensor
[Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36) holds data with some shape in certain Place.
[Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36) holds data with some shape in a specific Place.
```cpp
class Tensor {
......@@ -168,7 +168,7 @@ t.mutable_data(place);
### Math Functor and OpKernel
Fluid implements computing unit based on different DeviceContext. Some computing unit is shared between operators. These common part will be put in operators/math directory as basic Functors.
Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. This common part will be put in operators/math directory as basic Functors.
Let's take [MaxOutFunctor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27) as an example:
......@@ -183,7 +183,7 @@ class MaxOutFunctor {
};
```
CPU implement in .cc file
CPU implemention is in .cc file
```
template <typename T>
......@@ -197,7 +197,7 @@ class MaxOutFunctor<platform::CPUDeviceContext, T> {
};
```
CUDA implement in .cu file
CUDA implemention is in .cu file
```
template <typename T>
......@@ -212,11 +212,11 @@ class MaxOutFunctor<platform::CUDADeviceContext, T> {
```
We get computing handle from concrete DeviceContext, and make compution on tensors.
We get computing handle from a concrete DeviceContext, and make compution on tensors.
The implement of `OpKernel` is similar to math functors, the extra thing we need to do is registering the OpKernel to global map.
The implemention of `OpKernel` is similar to math functors, the extra thing we need to do is to register the OpKernel in a global map.
Fluid provides different register interface in op_registry.h
Fluid provides different register interfaces in op_registry.h
Let's take [Crop](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.cc#L134) operator as an example:
......@@ -240,7 +240,7 @@ REGISTER_OP_CUDA_KERNEL(
## Advanced topics: How to switch between different Device/Library
Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale in a specific Device. For example, crf operator can be only run at CPU, whereas most other operators can be run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.
Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.
We will discuss how to implement an efficient OpKernel switch policy.
......
......@@ -8,7 +8,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Design Doc: Support new Device/Library &mdash; PaddlePaddle 文档</title>
<title>Design Doc: Supporting new Device/Library &mdash; PaddlePaddle 文档</title>
......@@ -195,7 +195,7 @@
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li>Design Doc: Support new Device/Library</li>
<li>Design Doc: Supporting new Device/Library</li>
</ul>
</div>
......@@ -204,29 +204,29 @@
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="design-doc-support-new-device-library">
<span id="design-doc-support-new-device-library"></span><h1>Design Doc: Support new Device/Library<a class="headerlink" href="#design-doc-support-new-device-library" title="永久链接至标题"></a></h1>
<div class="section" id="design-doc-supporting-new-device-library">
<span id="design-doc-supporting-new-device-library"></span><h1>Design Doc: Supporting new Device/Library<a class="headerlink" href="#design-doc-supporting-new-device-library" title="永久链接至标题"></a></h1>
<div class="section" id="background">
<span id="background"></span><h2>Background<a class="headerlink" href="#background" title="永久链接至标题"></a></h2>
<p>Deep learning has a high demand for computing resources. New high-performance device and computing library are coming constantly. The deep learning framework has to integrate these high-performance device and computing library flexibly.</p>
<p>On the one hand, hardware and computing library are not usually one-to-one coresponding relations. For example, in Intel CPU, there are Eigen and MKL computing library. And in Nvidia GPU, there are Eigen and cuDNN computing library. We have to implement specific kernels for an operator for each computing library.</p>
<p>On the other hand, users usually do not want to care about the low-level hardware and computing library when writing a neural network configuration. In Fluid, <code class="docutils literal"><span class="pre">Layer</span></code> is exposed in <code class="docutils literal"><span class="pre">Python</span></code>, and <code class="docutils literal"><span class="pre">Operator</span></code> is exposed in <code class="docutils literal"><span class="pre">C++</span></code>. Both <code class="docutils literal"><span class="pre">Layer</span></code> and <code class="docutils literal"><span class="pre">Operator</span></code> are independent on hardwares.</p>
<p>Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries flexibly and efficiently.</p>
<p>On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example,Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library.</p>
<p>On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, <code class="docutils literal"><span class="pre">Layer</span></code> is exposed in <code class="docutils literal"><span class="pre">Python</span></code>, and <code class="docutils literal"><span class="pre">Operator</span></code> is exposed in <code class="docutils literal"><span class="pre">C++</span></code>. Both <code class="docutils literal"><span class="pre">Layer</span></code> and <code class="docutils literal"><span class="pre">Operator</span></code> are hardware independent.</p>
<p>So, how to support a new Device/Library in Fluid becomes a challenge.</p>
</div>
<div class="section" id="basic-integrate-a-new-device-library">
<span id="basic-integrate-a-new-device-library"></span><h2>Basic: Integrate A New Device/Library<a class="headerlink" href="#basic-integrate-a-new-device-library" title="永久链接至标题"></a></h2>
<p>For a general overview of fluid, please refer to <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md">overview doc</a>.</p>
<p>There are mainly there parts we have to consider in integrating a new device/library:</p>
<p>For a general overview of fluid, please refer to the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md">overview doc</a>.</p>
<p>There are mainly three parts that we have to consider while integrating a new device/library:</p>
<ul class="simple">
<li>Place and DeviceContext: indicates the device id and manages hardware resources</li>
<li>Memory and Tensor: malloc/free data on certain device</li>
<li>Math Functor and OpKernel: implement computing unit on certain device/library</li>
<li>Math Functor and OpKernel: implement computing unit on certain devices/libraries</li>
</ul>
<div class="section" id="place-and-devicecontext">
<span id="place-and-devicecontext"></span><h3>Place and DeviceContext<a class="headerlink" href="#place-and-devicecontext" title="永久链接至标题"></a></h3>
<div class="section" id="place">
<span id="place"></span><h4>Place<a class="headerlink" href="#place" title="永久链接至标题"></a></h4>
<p>Fluid use class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55">Place</a> to represent specific device and computing library. There are inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">Place</span></code>.</p>
<p>Fluid uses class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55">Place</a> to represent different devices and computing libraries. There are inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">Place</span></code>.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span> <span class="o">|</span> <span class="n">CPUPlace</span> <span class="o">--&gt;</span> <span class="n">MKLDNNPlace</span>
<span class="n">Place</span> <span class="o">--|</span> <span class="n">CUDAPlace</span> <span class="o">--&gt;</span> <span class="n">CUDNNPlace</span>
<span class="o">|</span> <span class="n">FPGAPlace</span>
......@@ -239,13 +239,13 @@
</div>
<div class="section" id="devicecontext">
<span id="devicecontext"></span><h4>DeviceContext<a class="headerlink" href="#devicecontext" title="永久链接至标题"></a></h4>
<p>Fluid use class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30">DeviceContext</a> to manage the resources in certain hardware, such as CUDA stream in <code class="docutils literal"><span class="pre">CDUADeviceContext</span></code>. There are also inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">DeviceContext</span></code>.</p>
<p>Fluid uses class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30">DeviceContext</a> to manage the resources in different hardwares, such as CUDA stream in <code class="docutils literal"><span class="pre">CDUADeviceContext</span></code>. There are also inheritance relationships between different kinds of <code class="docutils literal"><span class="pre">DeviceContext</span></code>.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span> <span class="o">/-&gt;</span> <span class="n">CPUDeviceContext</span> <span class="o">--&gt;</span> <span class="n">MKLDeviceContext</span>
<span class="n">DeviceContext</span> <span class="o">----&gt;</span> <span class="n">CUDADeviceContext</span> <span class="o">--&gt;</span> <span class="n">CUDNNDeviceContext</span>
\<span class="o">-&gt;</span> <span class="n">FPGADeviceContext</span>
</pre></div>
</div>
<p>A example of Nvidia GPU is as follows:</p>
<p>An example of Nvidia GPU is as follows:</p>
<ul class="simple">
<li>DeviceContext</li>
</ul>
......@@ -282,7 +282,7 @@
<span id="memory-and-tensor"></span><h3>Memory and Tensor<a class="headerlink" href="#memory-and-tensor" title="永久链接至标题"></a></h3>
<div class="section" id="memory-module">
<span id="memory-module"></span><h4>memory module<a class="headerlink" href="#memory-module" title="永久链接至标题"></a></h4>
<p>Fluid provide following <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36">memory interfaces</a>:</p>
<p>Fluid provides the following <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36">memory interfaces</a>:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">Place</span><span class="o">&gt;</span>
<span class="n">void</span><span class="o">*</span> <span class="n">Alloc</span><span class="p">(</span><span class="n">Place</span> <span class="n">place</span><span class="p">,</span> <span class="n">size_t</span> <span class="n">size</span><span class="p">);</span>
......@@ -293,11 +293,11 @@
<span class="n">size_t</span> <span class="n">Used</span><span class="p">(</span><span class="n">Place</span> <span class="n">place</span><span class="p">);</span>
</pre></div>
</div>
<p>To implementing these interfaces, we have to implement MemoryAllocator for specific Device</p>
<p>To implementing these interfaces, we have to implement MemoryAllocator for different Devices</p>
</div>
<div class="section" id="tensor">
<span id="tensor"></span><h4>Tensor<a class="headerlink" href="#tensor" title="永久链接至标题"></a></h4>
<p><a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36">Tensor</a> holds data with some shape in certain Place.</p>
<p><a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36">Tensor</a> holds data with some shape in a specific Place.</p>
<div class="highlight-cpp"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Tensor</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="cm">/*! Return a pointer to mutable memory block. */</span>
......@@ -350,7 +350,7 @@
</div>
<div class="section" id="math-functor-and-opkernel">
<span id="math-functor-and-opkernel"></span><h3>Math Functor and OpKernel<a class="headerlink" href="#math-functor-and-opkernel" title="永久链接至标题"></a></h3>
<p>Fluid implements computing unit based on different DeviceContext. Some computing unit is shared between operators. These common part will be put in operators/math directory as basic Functors.</p>
<p>Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators. This common part will be put in operators/math directory as basic Functors.</p>
<p>Let&#8217;s take <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27">MaxOutFunctor</a> as an example:</p>
<p>The interface is defined in header file.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">DeviceContext</span><span class="p">,</span> <span class="n">typename</span> <span class="n">T</span><span class="o">&gt;</span>
......@@ -361,7 +361,7 @@
<span class="p">};</span>
</pre></div>
</div>
<p>CPU implement in .cc file</p>
<p>CPU implemention is in .cc file</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="k">class</span> <span class="nc">MaxOutFunctor</span><span class="o">&lt;</span><span class="n">platform</span><span class="p">::</span><span class="n">CPUDeviceContext</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
<span class="n">public</span><span class="p">:</span>
......@@ -373,7 +373,7 @@
<span class="p">};</span>
</pre></div>
</div>
<p>CUDA implement in .cu file</p>
<p>CUDA implemention is in .cu file</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">template</span> <span class="o">&lt;</span><span class="n">typename</span> <span class="n">T</span><span class="o">&gt;</span>
<span class="k">class</span> <span class="nc">MaxOutFunctor</span><span class="o">&lt;</span><span class="n">platform</span><span class="p">::</span><span class="n">CUDADeviceContext</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
<span class="n">public</span><span class="p">:</span>
......@@ -385,9 +385,9 @@
<span class="p">};</span>
</pre></div>
</div>
<p>We get computing handle from concrete DeviceContext, and make compution on tensors.</p>
<p>The implement of <code class="docutils literal"><span class="pre">OpKernel</span></code> is similar to math functors, the extra thing we need to do is registering the OpKernel to global map.</p>
<p>Fluid provides different register interface in op_registry.h</p>
<p>We get computing handle from a concrete DeviceContext, and make compution on tensors.</p>
<p>The implemention of <code class="docutils literal"><span class="pre">OpKernel</span></code> is similar to math functors, the extra thing we need to do is to register the OpKernel in a global map.</p>
<p>Fluid provides different register interfaces in op_registry.h</p>
<p>Let&#8217;s take <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.cc#L134">Crop</a> operator as an example:</p>
<p>In .cc file:</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">REGISTER_OP_CPU_KERNEL</span><span class="p">(</span><span class="n">crop</span><span class="p">,</span> <span class="n">ops</span><span class="p">::</span><span class="n">CropKernel</span><span class="o">&lt;</span><span class="nb">float</span><span class="o">&gt;</span><span class="p">);</span>
......@@ -405,7 +405,7 @@
</div>
<div class="section" id="advanced-topics-how-to-switch-between-different-device-library">
<span id="advanced-topics-how-to-switch-between-different-device-library"></span><h2>Advanced topics: How to switch between different Device/Library<a class="headerlink" href="#advanced-topics-how-to-switch-between-different-device-library" title="永久链接至标题"></a></h2>
<p>Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale in a specific Device. For example, crf operator can be only run at CPU, whereas most other operators can be run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.</p>
<p>Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.</p>
<p>We will discuss how to implement an efficient OpKernel switch policy.</p>
<ul class="simple">
<li>TBD</li>
......
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册