An operator can be differentiated by whether it has kernel methods. The table below summarizes where each part of an operator is defined:
Information | Where is it defined
-------------- | :----------------------
OpProtoMaker definition | `.cc` files; a backward Op does not need an OpProtoMaker interface.
Op definition | `.cc` files
Kernel implementation | The kernel methods shared between CPU and CUDA are defined in `.h` files. CPU-specific kernels live in `.cc` files, while CUDA-specific kernels are implemented in `.cu` files.
Registering the Op | Ops are registered in `.cc` files; for kernel registration, `.cc` files contain the CPU implementation, while `.cu` files contain the CUDA implementation.
New operator implementations are added under [paddle/operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators), with file names in the format `*_op.h` (if applicable), `*_op.cc`, and `*_op.cu` (if applicable). **The system uses this naming scheme to automatically build operators and their corresponding Python extensions.**
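For instance, for the matrix-multiplication operator `MulOp` used as the running example in this document, the convention yields a layout along the following lines (a sketch of the naming convention only; what goes into each file is summarized in the table above):

```
mul_op.h   // kernel code shared by the CPU and CUDA builds (if applicable)
mul_op.cc  // Op and OpProtoMaker definitions, CPU-specific kernel code, CPU registration
mul_op.cu  // CUDA-specific kernel code and CUDA registration (if applicable)
```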
...
...
### 3. Defining OpKernel

`MulKernel` inherits `framework::OpKernel`, which takes the following template parameters:
- `typename DeviceContext` denotes the device context type. When different devices, namely `CPUDeviceContext` and `CUDADeviceContext`, share the same kernel, this template parameter needs to be added; if they do not share a kernel, it must not be added. An example of a non-sharing kernel is [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43).
- `typename T` denotes the data type, such as `float` or `double`.

`MulKernel` types need to override the `Compute` interface.
...
...
`Compute` implements the computation logic of an `OpKernel`. In `MulKernel`, `Compute` reads the inputs `X` and `Y` from the execution context, allocates the output `Out`, and performs the matrix multiplication.
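A minimal sketch of that shape is shown below; the header path, the `Tensor` alias, and the placeholder body are illustrative assumptions rather than the actual `mul_op` sources.

```cpp
#include "paddle/framework/op_registry.h"  // header path assumed for this sketch

namespace paddle {
namespace operators {

using Tensor = framework::Tensor;

template <typename DeviceContext, typename T>
class MulKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    auto* x = context.Input<Tensor>("X");      // first input matrix
    auto* y = context.Input<Tensor>("Y");      // second input matrix
    auto* out = context.Output<Tensor>("Out");
    out->mutable_data<T>(context.GetPlace());  // allocate the output on the current place
    // ... invoke a device-independent matrix-multiplication routine here ...
  }
};

}  // namespace operators
}  // namespace paddle
```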
Note that **different devices (CPU, CUDA) share an Op definition; whether or not they share the same `OpKernel` depends on whether `Compute` calls functions that support both devices.**
`MulOp`'s CPU and CUDA share the same `Kernel`. A non-sharing `OpKernel` example can be seen in [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43).
To ease the writing of an `OpKernel`'s `Compute` method, and to reuse code across devices, the [`Eigen unsupported Tensor`](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default) module is used to implement the `Compute` interface. To learn how the Eigen library is used in PaddlePaddle, please see the [usage document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md).
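To make the cross-device point concrete, the sketch below shows how an elementwise kernel could express its computation with these Eigen helpers; the header paths, the `EigenVector::Flatten` helper, and the `eigen_device()` accessor are assumptions based on the Eigen usage document above, not verbatim API from this text.

```cpp
#include "paddle/framework/eigen.h"        // header path assumed
#include "paddle/framework/op_registry.h"  // header path assumed

namespace paddle {
namespace operators {

// Hypothetical elementwise-add kernel: because the Eigen expression below is
// device-independent, the same Compute body can serve both CPU and CUDA.
template <typename DeviceContext, typename T>
class ElementwiseAddKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto* x = ctx.Input<framework::Tensor>("X");
    auto* y = ctx.Input<framework::Tensor>("Y");
    auto* out = ctx.Output<framework::Tensor>("Out");
    out->mutable_data<T>(ctx.GetPlace());

    auto x_e = framework::EigenVector<T>::Flatten(*x);
    auto y_e = framework::EigenVector<T>::Flatten(*y);
    auto out_e = framework::EigenVector<T>::Flatten(*out);

    // The Eigen device object abstracts over CPU and CUDA execution.
    auto& place = *ctx.template device_context<DeviceContext>().eigen_device();
    out_e.device(place) = x_e + y_e;
  }
};

}  // namespace operators
}  // namespace paddle
```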
...
...
This concludes the forward implementation of an operator. Next, its operation and kernel need to be registered in a `.cc` file.

The definition of its corresponding backward operator, if applicable, is similar to that of a forward operator. **Note that a backward operator does not include a `ProtoMaker`**.
- In `.cc` files, register the forward and backward operator classes and the CPU kernel (a combined registration sketch follows this list).
  - `REGISTER_OP_CPU_KERNEL` registers the `ops::MulKernel` class specialized with the template types `paddle::platform::CPUPlace` and `float`, and likewise registers `ops::MulGradKernel`.
- Registering the CUDA kernel in `.cu` files
  - Note that if the CUDA kernel is implemented using the `Eigen unsupported` module, the macro definition `#define EIGEN_USE_GPU` must appear at the top of the `.cu` file, for example:
```cpp
// If the Eigen unsupported module is used, define this before including any header files.
#define EIGEN_USE_GPU
```
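For orientation, the `.cc`-side registration described in this list might look roughly like the sketch below; the argument order of `REGISTER_OP` and the `MulOpMaker`/`MulGradOp` class names are assumptions inferred from the surrounding text, not copied from it.

```cpp
// Sketch of .cc-file registration. REGISTER_OP's argument order and the
// MulOpMaker / MulGradOp class names are assumptions for illustration.
namespace ops = paddle::operators;

REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulGradOp);
REGISTER_OP_CPU_KERNEL(mul,
                       ops::MulKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(mul_grad,
                       ops::MulGradKernel<paddle::platform::CPUPlace, float>);
```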
Get its output, and compare it with the forward operator's own output.
The test code first loads the required packages. In addition, note the following (a minimal sketch of such a test is given after this list):
- `self.type = "mul" ` defines the type that is identical to what the operator's registered type.
- `self.inputs` defines input, with type `numpy.array` and initializes it.
- `self.outputs` defines output and completes the same operator computation in the Python script, and returns its result from the Python script.
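A minimal sketch of the attributes described above is given below; the base class, the input shapes, and the way the framework actually runs the operator are illustrative assumptions, since they are not spelled out here.

```python
# Sketch of a forward-operator test; only the attribute definitions described
# above are illustrated, with shapes chosen arbitrarily for the example.
import unittest

import numpy as np


class TestMulOp(unittest.TestCase):
    def setUp(self):
        self.op_type = "mul"  # identical to the Op's registered type
        self.inputs = {
            "X": np.random.random((32, 84)).astype("float32"),
            "Y": np.random.random((84, 100)).astype("float32"),
        }
        # Expected output: the same computation, done in Python with numpy.
        self.outputs = {"Out": np.dot(self.inputs["X"], self.inputs["Y"])}
```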
### Testing Backward Operators
A backward operator unit test inherits `GradientChecker`, which inherits `unittest.TestCase`. As a result, **a backward operator unit test needs to have the prefix `test_`**.
Some key points of the gradient check above include (a sketch of such a test follows this list):
- `create_op("mul")` creates the backward operator's corresponding forward operator.
- `test_normal` calls `check_grad` to validate the correctness and stability of the computed gradients through numerical methods.
  - The first argument, `["X", "Y"]`, specifies that the gradients with respect to `X` and `Y` are to be checked.
  - The second argument, `"Out"`, specifies the network's final output target `Out`.
...
...
- Each Op needs its own `*_op.h` (if applicable), `*_op.cc`, and `*_op.cu` (if applicable); compilation will fail if a single file contains multiple operators.
- The type with which an operator is registered must be identical to the Op's name; calling `REGISTER_OP(B, ...)` in `A_op.cc` will cause unit tests to fail.
- If the operator does not implement a CUDA kernel, please refrain from creating an empty `*_op.cu` file, or else unit tests will fail.
- If multiple operators rely on some shared methods, a file NOT named `*_op.*` can be created to store them, such as `gather.h`.