# Design Doc: Add MKLDNN Kernel in Fluid Operator
## Principles
First of all, we should follow some basic principles:
1. [How to write a new operator](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md). We are adding a new kind of kernel to operators, so we should basically follow this doc.
2. [Supporting new Device/Library](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/support_new_device.md). Since MKLDNN is a new library for Fluid, we should add `MKLDNNDeviceContext` and maybe `mkldnn_helper.h`, just like [cudnn_helper.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/cudnn_helper.h).
3. [Switch Kernel](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md). Another important point is that we should ensure data synchronization between different kernel types, which is discussed in this [topic](https://github.com/PaddlePaddle/Paddle/issues/6549). So we should override the `GetExpectedKernelType` and `trans` functions to support switching kernels.
4. [The Keys of Operator Kernel Type](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md). Kernel Type is a pivotal concept that records the `Place`, `Library`, `DataType` and `Layout`.
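
To make the four keys of a kernel type concrete, here is a minimal sketch that models them as a plain struct. The enum members and the `NeedsTransform` helper are hypothetical stand-ins chosen for illustration, not Fluid's actual definitions, and the rule "any differing key requires a transform" is a simplifying assumption.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the four keys of a kernel type.
enum class Place { kCPU, kGPU };
enum class Library { kPlain, kMKLDNN, kCUDNN };
enum class DataType { kFP32, kFP64 };
enum class Layout { kNCHW, kNHWC, kMKLDNN };

struct KernelType {
  Place place;
  Library library;
  DataType data_type;
  Layout layout;
};

inline bool operator==(const KernelType& a, const KernelType& b) {
  return a.place == b.place && a.library == b.library &&
         a.data_type == b.data_type && a.layout == b.layout;
}

// In this simplified model, data produced under `actual` must be
// transformed before a kernel expecting `expected` can consume it
// whenever any of the four keys differ.
inline bool NeedsTransform(const KernelType& actual,
                           const KernelType& expected) {
  return !(actual == expected);
}
```

With this model, a plain CPU kernel type and an MKLDNN kernel type compare as different, which is the situation in which switching kernels has to convert the data.
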
## Solution
In general, running an MKL-DNN primitive takes four steps:
- Create a primitive descriptor that describes the operator
- Create the primitive itself from the primitive descriptor and the engine
- Create all the memory buffers that the primitive needs
- Launch a stream to execute the created primitive

More details can be found [here](http://01org.github.io/mkl-dnn).

It is better to avoid reinitializing the primitives and memory handles of the first three steps on every iteration. So we plan to create a map that records all the `primitive` and `memory` objects, which should not take too much memory, as discussed [here](https://github.com/PaddlePaddle/Paddle/issues/6822).
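
The create-once, reuse-afterwards idea behind the map can be sketched as follows. `BlobCache`, `Find` and `GetOrCreate` are hypothetical names, and the cached objects are type-erased with `std::shared_ptr<void>` only to keep the sketch independent of MKL-DNN; a real implementation would store the library's own primitive and memory types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical cache: maps a string key to a type-erased shared object.
class BlobCache {
 public:
  // Returns the cached object for `key`, or nullptr if it is absent.
  std::shared_ptr<void> Find(const std::string& key) const {
    auto it = blobs_.find(key);
    if (it == blobs_.end()) return nullptr;
    return it->second;
  }

  // Returns the cached object for `key`; `make` is only called on a miss.
  // The caller must request the same T that was stored under `key`.
  template <typename T, typename Factory>
  std::shared_ptr<T> GetOrCreate(const std::string& key, Factory&& make) {
    auto it = blobs_.find(key);
    if (it != blobs_.end()) return std::static_pointer_cast<T>(it->second);
    std::shared_ptr<T> created = make();
    assert(created != nullptr);
    blobs_[key] = created;
    return created;
  }

  std::size_t Size() const { return blobs_.size(); }

 private:
  std::unordered_map<std::string, std::shared_ptr<void>> blobs_;
};
```

Calling `GetOrCreate` with the same key on every iteration runs the factory only once, so the expensive setup is paid on the first iteration and later iterations only do a map lookup.
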
The following three conditions are assumed to hold:
1. There is a unique key for each operator instance, possibly the actual name of the `Output Tensor`.
2. The `Input Tensor` inside the `Compute` function is the one after conversion.
3. We can get the phase (e.g. `is_test`) inside the `Compute` function; otherwise we need to expose this attribute to the user.
### Compute
The algorithm of `Compute` is described as follows, taking conv as an example.
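```c++
PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), "It must use CPUPlace.");
PADDLE_ENFORCE(platform::is_mkldnn_library(ctx.GetLibrary()), "It must use MKLDNN Library.");
// `dev_ctx` (the device context holding the map) and `p` (the forward
// primitive) are defined by lines that are missing from this excerpt.
PADDLE_ENFORCE(p, "Should have forward Primitive");
PADDLE_ENFORCE(dev_ctx.findMemory(op_unique_key+"_input"), "Should have input memory");
PADDLE_ENFORCE(dev_ctx.findMemory(op_unique_key+"_output"), "Should have output memory");
PADDLE_ENFORCE(dev_ctx.findMemory(op_unique_key+"_filter"), "Should have filter memory");
PADDLE_ENFORCE(dev_ctx.findPrimitiveDesc(op_unique_key+"_fwd_PD"), "Should have forward PrimitiveDesc");
dev_ctx.execute();  // the convert primitive should already be included.
```

The `createPrimitiveDesc` function returns the primitive descriptor of this operator, built along the lines of `auto fwd_desc = mkldnn::conv_fwd::desc(/* all the setting above*/);`.
### MKLDNNDeviceContext
`MKLDNNDeviceContext`, which is very straightforward, should contain some basic information such as the `stream`, the `engine` and the map needed.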
### mkldnn_helper
Some helper functions will be put in `paddle/platform/mkldnn_helper.h`:
- create MKLDNN memories
- create MKLDNN primitives
- error-checking functions
- etc.
### Kernel Switch
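We should `reorder` data between layouts when it comes from or goes to another device. The `GetExpectedKernelType` and `trans` functions can help us implement this.

`GetExpectedKernelType` should take the context, so that the operator can return the best `KernelType`.

`trans` would include a check that the reorder primitive exists: `PADDLE_ENFORCE(p, "Should have Reorder Primitive");`.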
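
To illustrate what reordering between layouts means, the sketch below copies a dense NCHW tensor into NHWC order with plain loops. `ReorderNCHWToNHWC` is a hypothetical helper written for this example. It does not use MKL-DNN, whose reorder primitive would do this work in the actual design, and the layouts involved there may differ from the two shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies a dense tensor stored in NCHW order into a new buffer in NHWC order.
std::vector<float> ReorderNCHWToNHWC(const std::vector<float>& src,
                                     std::size_t n, std::size_t c,
                                     std::size_t h, std::size_t w) {
  assert(src.size() == n * c * h * w);
  std::vector<float> dst(src.size());
  for (std::size_t ni = 0; ni < n; ++ni) {
    for (std::size_t ci = 0; ci < c; ++ci) {
      for (std::size_t hi = 0; hi < h; ++hi) {
        for (std::size_t wi = 0; wi < w; ++wi) {
          // Same logical element, two different linear offsets.
          const std::size_t src_idx = ((ni * c + ci) * h + hi) * w + wi;
          const std::size_t dst_idx = ((ni * h + hi) * w + wi) * c + ci;
          dst[dst_idx] = src[src_idx];
        }
      }
    }
  }
  return dst;
}
```

For a 1x2x1x3 tensor, the two channel rows `{0, 1, 2}` and `{10, 11, 12}` become interleaved per pixel as `{0, 10, 1, 11, 2, 12}`. The element values are unchanged; only their position in memory differs, which is why a kernel switch needs this step before the next kernel reads the data.
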