- `framework::OpKernel`: Base class for Op computation kernel.
- `framework::OperatorWithKernel`: Inherited from OperatorBase, describing an operator with computation kernels.
An operator can be differentiated by whether it has kernel methods. An operator with kernel inherits from `OperatorWithKernel` while the ones without inherit from `OperatorBase`. This tutorial focuses on implementing operators with kernels. In short, an operator includes the following information:
Operators can be categorized into two groups: operators with kernel(s) and operators without kernel(s). An operator with kernel(s) inherits from `OperatorWithKernel`, while one without kernel(s) inherits from `OperatorBase`. This tutorial focuses on implementing operators with kernels. In short, an operator includes the following information:
Information | Where it is defined
Information | Where it is defined
...
@@ -32,7 +34,7 @@ Kernel implementation | The kernel methods shared between CPU and CUDA are
...
@@ -32,7 +34,7 @@ Kernel implementation | The kernel methods shared between CPU and CUDA are
Registering the Op | Ops are registered in `.cc` files; for kernel registration, `.cc` files contain the CPU implementation, while `.cu` files contain the CUDA implementation.
Registering the Op | Ops are registered in `.cc` files; for kernel registration, `.cc` files contain the CPU implementation, while `.cu` files contain the CUDA implementation.
New Operator implementations are added to the [paddle/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators) directory, with file names in the format `*_op.h` (if applicable), `*_op.cc`, `*_op.cu` (if applicable). **The system uses this naming scheme to automatically build operators and their corresponding Python extensions.**
New Operator implementations are added to the [paddle/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators) directory, with file names in the format `*_op.h` (if applicable), `*_op.cc`, `*_op.cu` (if applicable). **The system uses this naming scheme to automatically build operators and their corresponding Python extensions.**
Let's take the matrix multiplication operator, [MulOp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc), as an example of how to write an Operator with a Kernel.
Let's take the matrix multiplication operator, [MulOp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc), as an example of how to write an Operator with a Kernel.
...
@@ -156,7 +158,8 @@ Usually `OpProtoMaker` and `Op`'s type definitions are written in `.cc` files, w
...
@@ -156,7 +158,8 @@ Usually `OpProtoMaker` and `Op`'s type definitions are written in `.cc` files, w
- `typename T` denotes data type, such as `float` or `double`.
- `typename T` denotes data type, such as `float` or `double`.
`MulKernel` types must override the `Compute` interface.
`MulKernel` types must override the `Compute` interface.
- `Compute` takes one input variable `const framework::ExecutionContext& context`.
- `Compute` takes one input parameter: `const framework::ExecutionContext& context`.
- Compared with `InferShapeContext`, `ExecutionContext` includes device types, and can similarly extract input, output, and attribute variables.
- Compared with `InferShapeContext`, `ExecutionContext` includes device types, and can similarly extract input, output, and attribute variables.
- `Compute` implements the computation logic of an `OpKernel`.
- `Compute` implements the computation logic of an `OpKernel`.
...
@@ -177,7 +180,7 @@ Usually `OpProtoMaker` and `Op`'s type definitions are written in `.cc` files, w
...
@@ -177,7 +180,7 @@ Usually `OpProtoMaker` and `Op`'s type definitions are written in `.cc` files, w
};
};
```
```
Note that **different devices (CPU, CUDA) share an Op definition; whether or not they share the same `OpKernel` depends on whether `Compute` calls functions that support both devices.**
Note that **different devices (CPU, CUDA) share one Op definition; whether or not they share the same `OpKernel` depends on whether `Compute` calls functions that support both devices.**
`MulOp`'s CPU and CUDA share the same `Kernel`. A non-sharing `OpKernel` example can be seen in [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43).
`MulOp`'s CPU and CUDA share the same `Kernel`. A non-sharing `OpKernel` example can be seen in [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43).
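The sharing rule above can be illustrated with a minimal, self-contained sketch. It does not use PaddlePaddle's real classes: `ToyCpuPlace`, `ToyGpuPlace`, `ScaleFunctor`, and `ToyScaleKernel` are hypothetical names invented for this example, and the "GPU" specialization simply runs on the host. The point is only that a kernel templated on the device can be shared as long as everything `Compute` calls exists for each device.

```cpp
#include <cassert>
#include <vector>

// Hypothetical device tags; stand-ins, not PaddlePaddle's real place types.
struct ToyCpuPlace {};
struct ToyGpuPlace {};

// Device-aware helper: specialized once per device tag.
template <typename Place, typename T>
struct ScaleFunctor;

template <typename T>
struct ScaleFunctor<ToyCpuPlace, T> {
  void operator()(std::vector<T>* data, T factor) const {
    for (T& v : *data) v *= factor;
  }
};

// Runs on the host here; it stands in for a device-specific implementation.
template <typename T>
struct ScaleFunctor<ToyGpuPlace, T> {
  void operator()(std::vector<T>* data, T factor) const {
    for (T& v : *data) v *= factor;
  }
};

// One kernel body shared by every device: Compute only calls a helper that
// exists for each Place, so it does not need to be rewritten per device.
template <typename Place, typename T>
struct ToyScaleKernel {
  void Compute(std::vector<T>* data, T factor) const {
    ScaleFunctor<Place, T>()(data, factor);
  }
};
```

Instantiating `ToyScaleKernel<ToyCpuPlace, float>` and `ToyScaleKernel<ToyGpuPlace, float>` reuses the same `Compute` body. If `Compute` instead contained code that only one device could run, each device would need its own kernel class, which is the non-sharing case.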
...
@@ -188,13 +191,14 @@ This concludes the forward implementation of an operator. Next its operation and
...
@@ -188,13 +191,14 @@ This concludes the forward implementation of an operator. Next its operation and
The definition of its corresponding backward operator, if applicable, is similar to that of a forward operator. **Note that a backward operator does not include a `ProtoMaker`**.
The definition of its corresponding backward operator, if applicable, is similar to that of a forward operator. **Note that a backward operator does not include a `ProtoMaker`**.
### Registering Operator
### Registering Operator and OpKernel
- In `.cc` files, register forward and backward operator classes and the CPU kernel.
- In `.cc` files, register forward and backward operator classes and the CPU kernel.
@@ -204,6 +208,7 @@ The definition of its corresponding backward operator, if applicable, is similar
...
@@ -204,6 +208,7 @@ The definition of its corresponding backward operator, if applicable, is similar
- `REGISTER_OP` registers the `ops::MulOp` class under the type name `mul`, with `ops::MulOpMaker` as its `ProtoMaker`, and registers `ops::MulOpGrad` as `mul_grad`.
- `REGISTER_OP` registers the `ops::MulOp` class under the type name `mul`, with `ops::MulOpMaker` as its `ProtoMaker`, and registers `ops::MulOpGrad` as `mul_grad`.
- `REGISTER_OP_WITHOUT_GRADIENT` registers an operator without gradient.
- `REGISTER_OP_WITHOUT_GRADIENT` registers an operator without gradient.
- `REGISTER_OP_CPU_KERNEL` registers the `ops::MulKernel` class, specialized with the template types `paddle::platform::CPUPlace` and `float`; it is also used to register `ops::MulGradKernel`.
- `REGISTER_OP_CPU_KERNEL` registers the `ops::MulKernel` class, specialized with the template types `paddle::platform::CPUPlace` and `float`; it is also used to register `ops::MulGradKernel`.
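To make the idea behind registration macros concrete, here is a toy sketch of a kernel registry keyed by op type and device. It is an assumption-laden illustration, not how `REGISTER_OP` or `REGISTER_OP_CPU_KERNEL` are actually implemented: `ToyKernelRegistry`, `Device`, `KernelFn`, and `TOY_REGISTER_CPU_KERNEL` are all hypothetical names, and the "kernel" is just a function on two floats.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical device enum and kernel signature, for illustration only.
enum class Device { kCPU, kCUDA };
using KernelFn = std::function<float(float, float)>;

// Toy registry: maps (op type name, device) to a kernel function.
class ToyKernelRegistry {
 public:
  static ToyKernelRegistry& Instance() {
    static ToyKernelRegistry registry;  // constructed on first use
    return registry;
  }

  // Returns false if a kernel is already registered for this key.
  bool Register(const std::string& op_type, Device device, KernelFn fn) {
    return kernels_.emplace(std::make_pair(op_type, device), std::move(fn))
        .second;
  }

  // Returns nullptr when no kernel is registered for this key.
  const KernelFn* Find(const std::string& op_type, Device device) const {
    auto it = kernels_.find(std::make_pair(op_type, device));
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::pair<std::string, Device>, KernelFn> kernels_;
};

// Toy macro: registration happens as a side effect of initializing a
// file-scope static, so it runs before main().
#define TOY_REGISTER_CPU_KERNEL(op_type, fn)   \
  static const bool toy_registered_##op_type = \
      ToyKernelRegistry::Instance().Register(#op_type, Device::kCPU, fn)

// Registers a CPU kernel under the type name "mul".
TOY_REGISTER_CPU_KERNEL(mul, [](float a, float b) { return a * b; });
```

Looking up `("mul", Device::kCPU)` in this toy registry returns the registered function, while `("mul", Device::kCUDA)` returns `nullptr` because no CUDA kernel was registered. This mirrors the split described above, where CPU kernels are registered in `.cc` files and CUDA kernels in `.cu` files.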
...
@@ -225,6 +230,7 @@ The definition of its corresponding backward operator, if applicable, is similar
...
@@ -225,6 +230,7 @@ The definition of its corresponding backward operator, if applicable, is similar
<li><code class="docutils literal"><span class="pre">framework::OpKernel</span></code>: Base class for Op computation.</li>
<li><code class="docutils literal"><span class="pre">framework::OperatorWithKernel</span></code>: Inherited from OperatorBase, describing an operator with computation.</li>
<li><code class="docutils literal"><span class="pre">class</span> <span class="pre">OpProtoAndCheckerMaker</span></code>: Describes an Operator’s input, output, attributes and description, mainly used to interface with Python API.</li>
<li><code class="docutils literal"><span class="pre">class</span> <span class="pre">OpProtoAndCheckerMaker</span></code>: Describes an Operator’s input, output, attributes and description, mainly used to interface with Python API.</li>
<li><code class="docutils literal"><span class="pre">framework::OpKernel</span></code>: Base class for Op computation kernel.</li>
<li><code class="docutils literal"><span class="pre">framework::OperatorWithKernel</span></code>: Inherited from OperatorBase, describing an operator with computation kernels.</li>
</ul>
</ul>
<p>An operator can be differentiated by whether it has kernel methods. An operator with kernel inherits from <code class="docutils literal"><span class="pre">OperatorWithKernel</span></code> while the ones without inherit from <code class="docutils literal"><span class="pre">OperatorBase</span></code>. This tutorial focuses on implementing operators with kernels. In short, an operator includes the following information:</p>
<p>Operators can be categorized into two groups: operators with kernel(s) and operators without kernel(s). An operator with kernel(s) inherits from <code class="docutils literal"><span class="pre">OperatorWithKernel</span></code>, while one without kernel(s) inherits from <code class="docutils literal"><span class="pre">OperatorBase</span></code>. This tutorial focuses on implementing operators with kernels. In short, an operator includes the following information:</p>
OpProtoMake definition | <code class="docutils literal"><span class="pre">.cc</span></code> files, Backward Op does not need an OpProtoMake interface.
OpProtoMake definition | <code class="docutils literal"><span class="pre">.cc</span></code> files, Backward Op does not need an OpProtoMake interface.
Op definition | <code class="docutils literal"><span class="pre">.cc</span></code> files
Op definition | <code class="docutils literal"><span class="pre">.cc</span></code> files
Kernel implementation | The kernel methods shared between CPU and CUDA are defined in <code class="docutils literal"><span class="pre">.h</span></code> files. CPU-specific kernels live in <code class="docutils literal"><span class="pre">.cc</span></code> files, while CUDA-specific kernels are implemented in <code class="docutils literal"><span class="pre">.cu</span></code> files.
Kernel implementation | The kernel methods shared between CPU and CUDA are defined in <code class="docutils literal"><span class="pre">.h</span></code> files. CPU-specific kernels live in <code class="docutils literal"><span class="pre">.cc</span></code> files, while CUDA-specific kernels are implemented in <code class="docutils literal"><span class="pre">.cu</span></code> files.
Registering the Op | Ops are registered in <code class="docutils literal"><span class="pre">.cc</span></code> files; for kernel registration, <code class="docutils literal"><span class="pre">.cc</span></code> files contain the CPU implementation, while <code class="docutils literal"><span class="pre">.cu</span></code> files contain the CUDA implementation.</p>
Registering the Op | Ops are registered in <code class="docutils literal"><span class="pre">.cc</span></code> files; for kernel registration, <code class="docutils literal"><span class="pre">.cc</span></code> files contain the CPU implementation, while <code class="docutils literal"><span class="pre">.cu</span></code> files contain the CUDA implementation.</p>
<p>New Operator implementations are added to the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators">paddle/operators</a> directory, with file names in the format <code class="docutils literal"><span class="pre">*_op.h</span></code> (if applicable), <code class="docutils literal"><span class="pre">*_op.cc</span></code>, <code class="docutils literal"><span class="pre">*_op.cu</span></code> (if applicable). <strong>The system uses this naming scheme to automatically build operators and their corresponding Python extensions.</strong></p>
<p>New Operator implementations are added to the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators">paddle/operators</a> directory, with file names in the format <code class="docutils literal"><span class="pre">*_op.h</span></code> (if applicable), <code class="docutils literal"><span class="pre">*_op.cc</span></code>, <code class="docutils literal"><span class="pre">*_op.cu</span></code> (if applicable). <strong>The system uses this naming scheme to automatically build operators and their corresponding Python extensions.</strong></p>
<p>Let’s take the matrix multiplication operator, <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc">MulOp</a>, as an example of how to write an Operator with a Kernel.</p>
<p>Let’s take the matrix multiplication operator, <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/mul_op.cc">MulOp</a>, as an example of how to write an Operator with a Kernel.</p>
</div>
</div>
<div class="section" id="implementing-c-types">
<div class="section" id="implementing-c-types">
...
@@ -350,7 +351,7 @@ Registering the Op | Ops are registered in <code class="docutils liter
...
@@ -350,7 +351,7 @@ Registering the Op | Ops are registered in <code class="docutils liter
</ul>
</ul>
<p><code class="docutils literal"><span class="pre">MulKernel</span></code> types must override the <code class="docutils literal"><span class="pre">Compute</span></code> interface.</p>
<p><code class="docutils literal"><span class="pre">MulKernel</span></code> types must override the <code class="docutils literal"><span class="pre">Compute</span></code> interface.</p>
<ul class="simple">
<ul class="simple">
<li><code class="docutils literal"><span class="pre">Compute</span></code> takes one input variable <code class="docutils literal"><span class="pre">const</span> <span class="pre">framework::ExecutionContext&</span> <span class="pre">context</span></code>.</li>
<li><code class="docutils literal"><span class="pre">Compute</span></code> takes one input parameter: <code class="docutils literal"><span class="pre">const</span> <span class="pre">framework::ExecutionContext&</span> <span class="pre">context</span></code>.</li>
<li>Compared with <code class="docutils literal"><span class="pre">InferShapeContext</span></code>, <code class="docutils literal"><span class="pre">ExecutionContext</span></code> includes device types, and can similarly extract input, output, and attribute variables.</li>
<li>Compared with <code class="docutils literal"><span class="pre">InferShapeContext</span></code>, <code class="docutils literal"><span class="pre">ExecutionContext</span></code> includes device types, and can similarly extract input, output, and attribute variables.</li>
<li><code class="docutils literal"><span class="pre">Compute</span></code> implements the computation logic of an <code class="docutils literal"><span class="pre">OpKernel</span></code>.</li>
<li><code class="docutils literal"><span class="pre">Compute</span></code> implements the computation logic of an <code class="docutils literal"><span class="pre">OpKernel</span></code>.</li>
</ul>
</ul>
...
@@ -369,18 +370,19 @@ Registering the Op | Ops are registered in <code class="docutils liter
...
@@ -369,18 +370,19 @@ Registering the Op | Ops are registered in <code class="docutils liter
<span class="p">};</span>
<span class="p">};</span>
</pre></div>
</pre></div>
</div>
</div>
<p>Note that <strong>different devices (CPU, CUDA) share an Op definition; whether or not they share the same <code class="docutils literal"><span class="pre">OpKernel</span></code> depends on whether <code class="docutils literal"><span class="pre">Compute</span></code> calls functions that support both devices.</strong></p>
<p>Note that <strong>different devices (CPU, CUDA) share one Op definition; whether or not they share the same <code class="docutils literal"><span class="pre">OpKernel</span></code> depends on whether <code class="docutils literal"><span class="pre">Compute</span></code> calls functions that support both devices.</strong></p>
<p><code class="docutils literal"><span class="pre">MulOp</span></code>’s CPU and CUDA share the same <code class="docutils literal"><span class="pre">Kernel</span></code>. A non-sharing <code class="docutils literal"><span class="pre">OpKernel</span></code> example can be seen in <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43"><code class="docutils literal"><span class="pre">OnehotCrossEntropyOpKernel</span></code></a>.</p>
<p><code class="docutils literal"><span class="pre">MulOp</span></code>’s CPU and CUDA share the same <code class="docutils literal"><span class="pre">Kernel</span></code>. A non-sharing <code class="docutils literal"><span class="pre">OpKernel</span></code> example can be seen in <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43"><code class="docutils literal"><span class="pre">OnehotCrossEntropyOpKernel</span></code></a>.</p>
<p>To simplify writing <code class="docutils literal"><span class="pre">OpKernel</span></code> computations and to reuse code across devices, the <a class="reference external" href="https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default"><code class="docutils literal"><span class="pre">Eigen-unsupported</span> <span class="pre">Tensor</span></code></a> module is used to implement the <code class="docutils literal"><span class="pre">Compute</span></code> interface. To learn how the Eigen library is used in PaddlePaddle, please see the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md">usage document</a>.</p>
<p>To simplify writing <code class="docutils literal"><span class="pre">OpKernel</span></code> computations and to reuse code across devices, the <a class="reference external" href="https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default"><code class="docutils literal"><span class="pre">Eigen-unsupported</span> <span class="pre">Tensor</span></code></a> module is used to implement the <code class="docutils literal"><span class="pre">Compute</span></code> interface. To learn how the Eigen library is used in PaddlePaddle, please see the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md">usage document</a>.</p>
<p>This concludes the forward implementation of an operator. Next, its operation and kernel need to be registered in a <code class="docutils literal"><span class="pre">.cc</span></code> file.</p>
<p>This concludes the forward implementation of an operator. Next, its operation and kernel need to be registered in a <code class="docutils literal"><span class="pre">.cc</span></code> file.</p>
<p>The definition of its corresponding backward operator, if applicable, is similar to that of a forward operator. <strong>Note that a backward operator does not include a <code class="docutils literal"><span class="pre">ProtoMaker</span></code></strong>.</p>
<p>The definition of its corresponding backward operator, if applicable, is similar to that of a forward operator. <strong>Note that a backward operator does not include a <code class="docutils literal"><span class="pre">ProtoMaker</span></code></strong>.</p>
<span id="registering-operator"></span><h3>Registering Operator<a class="headerlink" href="#registering-operator" title="Permalink to this headline">¶</a></h3>
<span id="registering-operator-and-opkernel"></span><h3>Registering Operator and OpKernel<a class="headerlink" href="#registering-operator-and-opkernel" title="Permalink to this headline">¶</a></h3>
<ul>
<ul>
<li><p class="first">In <code class="docutils literal"><span class="pre">.cc</span></code> files, register forward and backward operator classes and the CPU kernel.</p>
<li><p class="first">In <code class="docutils literal"><span class="pre">.cc</span></code> files, register forward and backward operator classes and the CPU kernel.</p>