# Design Doc: Support new Device/Library

## Background

Deep learning has a high demand for computing resources. New high-performance devices and computing libraries appear constantly, so a deep learning framework has to integrate them flexibly.
On the one hand, hardware and computing libraries do not usually have a one-to-one correspondence. For example, on an Intel CPU there are the Eigen and MKL computing libraries, and on an Nvidia GPU there are the Eigen and cuDNN computing libraries. We have to implement specific kernels of an operator for each computing library.
On the other hand, users usually do not want to care about the low-level hardware and computing library when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are hardware-independent.
So, supporting a new Device/Library in Fluid becomes a challenge.
## Basic: Integrate A New Device/Library
For a general overview of Fluid, please refer to the [overview doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/read_source.md).
There are mainly three parts to consider when integrating a new device/library:
- Place and DeviceContext: indicate the device id and manage hardware resources
- Memory and Tensor: allocate/free data on a certain device
- Math Functor and OpKernel: implement the computing unit on a certain device/library
### Place and DeviceContext
#### Place
Fluid uses the class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent a specific device and computing library. There are inheritance relationships between the different kinds of `Place`.
#### DeviceContext

Fluid uses the class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources of a certain piece of hardware, such as the CUDA stream in `CUDADeviceContext`. There are also inheritance relationships between the different kinds of `DeviceContext`.
### Memory and Tensor

#### memory module

Fluid provides the following [memory interfaces](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memory.h#L36):

To implement these interfaces, we have to implement a MemoryAllocator for each specific device.

#### Tensor

[Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.h#L36) holds data with some shape in a certain Place.

`Placeholder` is used to delay memory allocation; that is, we can first define a tensor, use `Resize` to configure its shape, and then call `mutable_data` to allocate the actual memory.
```cpp
paddle::framework::Tensor t;
paddle::platform::CPUPlace place;
// set size first
t.Resize({2, 3});
// allocate memory on CPU later
t.mutable_data(place);
```
### Math Functor and OpKernel
Fluid implements computing units based on different DeviceContexts. Some computing units are shared between operators; these common parts are put in the operators/math directory as basic Functors.
Let's take [MaxOutFunctor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27) as an example:

We get the computing handle from the concrete DeviceContext and perform the computation on tensors.

The implementation of `OpKernel` is similar to the math functors; the extra thing we need to do is register the OpKernel in a global map.

Fluid provides different register interfaces in op_registry.h.

Let's take the [Crop](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.cc#L134) operator as an example:
## Advanced topics: How to switch between different Device/Library
Generally, we implement an OpKernel for every Device/Library of an Operator, so we can easily train a Convolutional Neural Network on a GPU. However, some OpKernels are not suitable for a specific device. For example, the crf operator can only run on a CPU, whereas most other operators can run on a GPU. To achieve high performance in such circumstances, we have to switch between different Devices/Libraries.
We will discuss how to implement an efficient OpKernel switch policy.