# Design Doc: The Keys of Operator Kernel Type

## Problem

An operator can have different kernel implementations, and each operator maintains a map that stores its related kernels. Fluid uses `OpKernelType` as a key to identify a unique kernel. Before an operator runs, a certain kernel must be chosen via a key of `OpKernelType`. Currently, `OpKernelType` is defined as follows:
```cpp
struct OpKernelType {
  platform::Place place_;      // device (and its memory) where the kernel runs
  proto::DataType data_type_;  // element type of the tensors, e.g. FP32
};
```
For more details, please refer to the [code](https://github.com/PaddlePaddle/Paddle/blob/2d5ec16bc8a09fb8e0f62c89b116b0cd1d333907/paddle/framework/operator.h#L348-L374) on GitHub.

It contains two keys, `Place` and `DataType`, which are hashed together into a unique key that identifies a certain type of kernel. However, these two keys are not enough; we need a more complete representation of `OpKernelType`.
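As an illustration, here is a minimal sketch of folding two keys into one hash value, in the spirit of `boost::hash_combine`; the helper below and the integer encodings of the keys are simplifications for this sketch, not Fluid's exact code:

```cpp
#include <cstddef>
#include <functional>

// Mix one hash value into an accumulated seed (hash_combine-style).
inline void HashCombine(std::size_t* seed, std::size_t value) {
  *seed ^= value + 0x9e3779b9 + (*seed << 6) + (*seed >> 2);
}

// Illustrative hasher: assumes Place and DataType are encoded as ints.
struct OpKernelTypeHash {
  std::size_t operator()(int place_id, int data_type) const {
    std::size_t seed = 0;
    HashCombine(&seed, std::hash<int>()(place_id));   // Place key
    HashCombine(&seed, std::hash<int>()(data_type));  // DataType key
    return seed;
  }
};
```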
We often implement an operator's kernel with some computing library on a certain device (place). Note that computing libraries and devices do not correspond one-to-one: a device can support many computing libraries, and a computing library can also support several devices.

For example, the Eigen library supports Nvidia GPU, AMD GPU, and CPU, while the MKLDNN library supports Intel CPU and Intel FPGA. Therefore, both `Place` and `Library` should be keys of `OpKernelType`.

It is obvious that different DataTypes, like fp64/fp32/int8, need different kernels. But the data layout of a Tensor can also lead to different implementations; see the batch norm operator [kernels](https://github.com/PaddlePaddle/Paddle/blob/a948fac4d0ad7e0412d373b8aabeb711c2899563/paddle/operators/batch_norm_op.cc#L180-L209). So data layout should be taken into consideration as well.
## Solution
There are four keys that determine the kernel type of an operator: `Place`/`Library`/`DataType`/`Layout`.

`Place` represents the device memory where the data is located.
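Putting the four keys together, the extended `OpKernelType` could be sketched as follows; the field types and ordering are illustrative of the four-key design, not a final definition:

```cpp
// Hedged sketch of the extended kernel key with all four members.
struct OpKernelType {
  platform::Place place_;      // device memory where the data is located
  Library library_;            // computing library, e.g. Plain/MKLDNN/CUDNN
  proto::DataType data_type_;  // element type, e.g. FP32/FP64
  Layout layout_;              // tensor layout, e.g. NCHW
};
```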
### Library
One operator kernel is usually implemented based on one library. `Library` is defined as an enum variable:
```cpp
enum Library { Plain, MKLDNN, CUDNN };
```
We use the `Plain` enumerator to represent the default library. Since most operators in Fluid are implemented based on the `Eigen` library, we take `Eigen` as the `Plain` enumerator.

A library usually has a corresponding `DeviceContext`, which contains the handles needed by computation. Fluid now has two default DeviceContexts, for CPU and CUDA: `CPUDeviceContext` and `CUDADeviceContext`. `CPUDeviceContext` contains an Eigen library handle, and `CUDADeviceContext` contains an Eigen library handle and a cuBLAS handle.

If we want to support a new library, a new enumerator needs to be added to `Library`, and a new corresponding `LibraryDeviceContext` will be created.
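For example, supporting MKLDNN might pair the new enumerator with a context like the sketch below; the class name, base class, and engine member are assumptions for illustration, not Fluid's actual API:

```cpp
#include "mkldnn.hpp"

// Illustrative only: a DeviceContext that owns the handle MKLDNN
// kernels need, analogous to the cuBLAS handle in CUDADeviceContext.
class MKLDNNDeviceContext : public platform::CPUDeviceContext {
 public:
  MKLDNNDeviceContext() : engine_(mkldnn::engine::cpu, 0) {}

  // Kernels fetch the MKLDNN engine from the context at run time.
  const mkldnn::engine& GetEngine() const { return engine_; }

 private:
  mkldnn::engine engine_;  // handle required by MKLDNN primitives
};
```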
### DataType
`DataType` is defined in [framework.proto](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto). Currently, int32/int64/fp32/fp64 are supported.
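For illustration, resolving a runtime C++ element type to one of these enum values can be sketched as below; the helper name `ToDataType` is an assumption for this sketch:

```cpp
#include <cstdint>
#include <stdexcept>
#include <typeindex>

// Hedged sketch: map a runtime type_index to the proto DataType tag.
proto::DataType ToDataType(std::type_index type) {
  if (type == typeid(float))   return proto::DataType::FP32;
  if (type == typeid(double))  return proto::DataType::FP64;
  if (type == typeid(int32_t)) return proto::DataType::INT32;
  if (type == typeid(int64_t)) return proto::DataType::INT64;
  throw std::runtime_error("unsupported data type");
}
```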
### Layout
Actually, a Tensor is a view of a block of memory. Besides a pointer to the memory, we also need some other descriptions of this block of memory, such as shape (ddim), stride, and layout.

Different layouts lead to different implementations of an operator kernel. There are mainly four principles we have to follow to support layout in our Fluid framework:

- We take layout as a data member of Tensor. Layout is actually an enum variable. If Fluid is built with MKLDNN, then the memory formats in MKLDNN will be added into this enum variable too.
- Users have to set the layout for input data, and some operators, like fill_constant/random, also have to set the layout of the data they generate. Of course, we can have a default layout, like NCHW.
- The inference of Layout happens at run-time, not compile-time.
- Every operator has to implement different kernels for different layouts. Take MKLDNN as an example: if we want to implement an MKLDNN convolution operator, we have to realize all the kernels for the different layouts listed [here](http://01org.github.io/mkl-dnn/structmkldnn_1_1memory.html). And we will have a special macro to register kernels for MKLDNN operators.
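`Layout` is also defined as an enum variable. A sketch of it could look like the following; the MKLDNN-specific formats behind the preprocessor guard are illustrative examples, not an exhaustive set:

```cpp
// Hedged sketch of the Layout enum; MKLDNN formats are examples only.
enum Layout {
  kNCHW,  // default layout: batch, channel, height, width
  kNHWC,
#ifdef PADDLE_WITH_MKLDNN
  knChw8c,  // an MKLDNN blocked memory format, as one example
#endif
};
```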