
# Add Kernels for a New Device


## Background


PaddlePaddle Fluid has hundreds of operators, and each operator can have one or more kernels. A kernel is an implementation of an operator for a certain device. The device can be a hardware device, e.g., the CUDA GPU, or a library that utilizes a device, e.g., Intel MKL, which makes full use of the Xeon CPU.


This document explains how to add an operator and its kernels. The kernels of an operator are indexed by a C++ type, `OpKernelType`, and the operator chooses the right kernel at runtime. This choosing mechanism is described here.
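As a rough illustration of that mechanism, the sketch below models the kernel key as a small struct and looks a kernel up in a map. The enum values, struct fields, and kernel names are simplified stand-ins for illustration, not Paddle's actual definitions (the real `OpKernelType` also carries the place and the data layout):

```cpp
#include <map>
#include <string>
#include <tuple>

// Simplified stand-ins for the attributes that make up a kernel key.
enum class DataType { kFP32, kFP64 };
enum class LibraryType { kPlain, kCUDNN };

// Toy OpKernelType: the tuple of attributes an operator uses to pick
// a kernel at runtime.
struct OpKernelType {
  DataType data_type;
  LibraryType library;
  bool operator<(const OpKernelType &o) const {
    return std::tie(data_type, library) < std::tie(o.data_type, o.library);
  }
};

// The operator keeps a table of kernels keyed by OpKernelType and
// selects one per call; strings stand in for callable kernels here.
const std::map<OpKernelType, std::string> kKernels = {
    {{DataType::kFP32, LibraryType::kPlain}, "gemm_conv_fp32"},
    {{DataType::kFP32, LibraryType::kCUDNN}, "cudnn_conv_fp32"},
};
```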


## Write Kernels for A New Device


### Add A New Device


For some historical reasons, we misuse the word *library* for *device*. For example, we call the device type the library type; an example is the header file `library_type.h`. We will correct this ASAP.


To register a new device, we need to add an enum value to `LibraryType`:

```cpp
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
};
```
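For instance, supporting a hypothetical FPGA device would mean appending one more enum value. In the self-contained sketch below, `kFPGA` and the `LibraryTypeToString` logging helper are illustrative, not existing Paddle entries:

```cpp
#include <string>

// Sketch: LibraryType extended with a hypothetical kFPGA entry.
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  kFPGA = 3,  // hypothetical new device/library
};

// Illustrative helper for logging the enum; not part of Paddle.
inline std::string LibraryTypeToString(LibraryType t) {
  switch (t) {
    case LibraryType::kPlain:  return "PLAIN";
    case LibraryType::kMKLDNN: return "MKLDNN";
    case LibraryType::kCUDNN:  return "CUDNN";
    case LibraryType::kFPGA:   return "FPGA";
  }
  return "UNKNOWN";
}
```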

### Add A New Place


If you have a new kind of device, you first need to add a new kind of `Place`. For example, `CUDAPlace`:

```cpp
struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}

  inline int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  inline bool operator==(const CUDAPlace &o) const {
    return device == o.device;
  }
  inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }

  int device;
};

typedef boost::variant<CUDAPlace, CPUPlace> Place;
```
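Because `Place` is a variant, code that needs the concrete place type dispatches with a visitor. The self-contained sketch below mirrors the structs above but uses C++17 `std::variant` instead of `boost::variant` so it compiles without Boost; `PlaceName` is an illustrative helper, not a Paddle API:

```cpp
#include <string>
#include <type_traits>
#include <variant>

struct CPUPlace {
  bool operator==(const CPUPlace &) const { return true; }
};

struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}
  int GetDeviceId() const { return device; }
  bool operator==(const CUDAPlace &o) const { return device == o.device; }
  int device;
};

using Place = std::variant<CUDAPlace, CPUPlace>;

// A visitor recovers the concrete place type at runtime.
inline std::string PlaceName(const Place &p) {
  return std::visit(
      [](const auto &pl) -> std::string {
        using T = std::decay_t<decltype(pl)>;
        if constexpr (std::is_same_v<T, CUDAPlace>)
          return "CUDAPlace(" + std::to_string(pl.GetDeviceId()) + ")";
        else
          return "CPUPlace";
      },
      p);
}
```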

### Add device context


After a new kind of device is added, you should add a corresponding `DeviceContext` for it:

```cpp
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;

  virtual void Wait() const {}
};
```
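A concrete context derives from `DeviceContext`, returns its `Place`, and overrides `Wait()` when the device executes asynchronously. The sketch below is self-contained (it re-declares a minimal `Place`) and uses a hypothetical `FPGAPlace`/`FPGADeviceContext` pair purely for illustration:

```cpp
#include <variant>

struct CPUPlace {};
struct FPGAPlace {  // hypothetical place for illustration
  int device = 0;
};
using Place = std::variant<CPUPlace, FPGAPlace>;

class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;
  virtual void Wait() const {}
};

// Hypothetical context for an FPGA device: it remembers its place and
// would block in Wait() until all queued device work finishes.
class FPGADeviceContext : public DeviceContext {
 public:
  explicit FPGADeviceContext(FPGAPlace p) : place_(p) {}
  Place GetPlace() const override { return place_; }
  void Wait() const override { /* drain the device command queue */ }

 private:
  FPGAPlace place_;
};
```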

### Implement new OpKernel for your Device


Detailed documentation can be found in new_op_and_kernel.

```cpp
class OpKernelBase {
 public:
  /**
   * ExecutionContext is the only parameter of the kernel's Run function.
   * Run gets input/output variables, state such as momentum, and
   * device resources such as the CUDA stream and cublas handle from
   * the ExecutionContext. The user should construct it before running
   * the operator.
   */

  virtual void Compute(const ExecutionContext& context) const = 0;

  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};
```
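To make the interface concrete, here is a minimal self-contained sketch of a kernel derived from `OpKernel<T>`. The tiny `ExecutionContext` struct and the ReLU kernel are illustrative stand-ins; the real `ExecutionContext` exposes variables and device resources rather than raw vectors:

```cpp
#include <vector>

// Stand-in ExecutionContext: just raw input/output buffers.
struct ExecutionContext {
  const std::vector<float> *in;
  std::vector<float> *out;
};

class OpKernelBase {
 public:
  virtual void Compute(const ExecutionContext &context) const = 0;
  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};

// A hypothetical elementwise ReLU kernel for the CPU: reads the input
// buffer from the context and writes max(v, 0) for each element.
class ReluCPUKernel : public OpKernel<float> {
 public:
  void Compute(const ExecutionContext &ctx) const override {
    ctx.out->clear();
    for (float v : *ctx.in) ctx.out->push_back(v > 0.f ? v : 0.f);
  }
};
```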

### Register the OpKernel to framework


After writing the components described above, we should register the kernel to the framework.


We use `REGISTER_OP_KERNEL` to do the registration.

```cpp
REGISTER_OP_KERNEL(
    op_type,
    library_type,
    place_type,
    kernel0, kernel1, ...)
```

`kernel0`, `kernel1`, ... are kernels that have the same `op_type`, `library_type`, and `place_type` but different `data_type`s.


Take `conv2d` as an example:

```cpp
REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);

REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
       paddle::operators::CUDNNConvOpKernel<float>,
       paddle::operators::CUDNNConvOpKernel<double>);
```

In the code above:

- `conv2d` is the type/name of the operator
- `CUDNN`/`CPU` is the `library`
- `paddle::platform::CUDAPlace`/`CPUPlace` is the `place`
- the template parameter `float`/`double` on `CUDNNConvOpKernel<T>` is the `data_type`