Add Kernels for a New Device

Background

PaddlePaddle Fluid has hundreds of operators. Each operator can have one or more kernels. A kernel is an implementation of the operator for a certain device, which can be a hardware device, e.g., a CUDA GPU, or a library that utilizes a device, e.g., Intel MKL, which makes full use of the Xeon CPU.

This document explains how to add an operator and its kernels. The kernels of an operator are indexed by a C++ type, OpKernelType. An operator chooses the right kernel at runtime; this selection mechanism is described here.
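To make the idea of indexing kernels by a key concrete, the following is a minimal sketch, not the framework's actual implementation. All names ending in Sketch are hypothetical, and the choice of key fields (data type, place, library) is an assumption based on the registration parameters described later in this document.

```cpp
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Simplified stand-ins for framework types; illustrative only.
enum class DataTypeSketch { kFloat, kDouble };
enum class PlaceSketch { kCPU, kCUDA };
enum class LibrarySketch { kPlain, kMKLDNN, kCUDNN };

// A key in the spirit of OpKernelType: it identifies one kernel of an operator.
struct KernelKeySketch {
  DataTypeSketch data_type;
  PlaceSketch place;
  LibrarySketch library;

  // Lexicographic ordering so the key can be used in std::map.
  bool operator<(const KernelKeySketch& o) const {
    return std::tie(data_type, place, library) <
           std::tie(o.data_type, o.place, o.library);
  }
};

// A kernel is modeled as a callable that reports which implementation ran.
using KernelFnSketch = std::function<std::string()>;

// All kernels of a single operator, indexed by key.
class KernelTableSketch {
 public:
  void Add(const KernelKeySketch& key, KernelFnSketch fn) {
    kernels_[key] = std::move(fn);
  }

  // Looks up the kernel at runtime; returns "" if no kernel matches the key.
  std::string Run(const KernelKeySketch& key) const {
    auto it = kernels_.find(key);
    return it == kernels_.end() ? std::string() : it->second();
  }

 private:
  std::map<KernelKeySketch, KernelFnSketch> kernels_;
};
```

An operator would fill such a table once and then call Run with the key that matches the current device and data type.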

Write Kernels for A New Device

Add A New Device

For historical reasons, we misuse the word library to mean device. For example, we refer to the device type as the library type, as in the header file library_type.h. We will correct this ASAP.

To register a new device, we need to add an enum value to LibraryType:

enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
};
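As a sketch of what adding a device might look like, the snippet below extends the enum with a hypothetical kMyDevice value. Both kMyDevice and the LibraryTypeName helper are assumptions made for illustration; they are not names taken from the framework.

```cpp
#include <string>

enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  kMyDevice = 3,  // hypothetical entry for the new device
};

// Hypothetical helper that maps each value to a readable name, which is
// handy for logging. Every enum value needs its own case here.
inline std::string LibraryTypeName(LibraryType type) {
  switch (type) {
    case LibraryType::kPlain:
      return "PLAIN";
    case LibraryType::kMKLDNN:
      return "MKLDNN";
    case LibraryType::kCUDNN:
      return "CUDNN";
    case LibraryType::kMyDevice:
      return "MYDEVICE";
  }
  return "UNKNOWN";  // reached only for values outside the enum
}
```

Keeping the existing numeric values unchanged and appending the new one at the end avoids disturbing any code that depends on the current numbering.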

Add A New Place

If you have a new kind of device, you first need to add a new kind of Place. For example, CUDAPlace is defined as follows:

struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}

  inline int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  inline bool operator==(const CUDAPlace &o) const {
    return device == o.device;
  }
  inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }

  int device;
};

typedef boost::variant<CUDAPlace, CPUPlace> Place;
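A new place could then be modeled on CUDAPlace and appended to the variant. The sketch below is written under two assumptions: MyDevicePlace is a hypothetical name, and std::variant (C++17) stands in for boost::variant so that the example compiles without Boost. The CPUPlace shown here is also a simplified stand-in.

```cpp
#include <variant>

// Simplified stand-in: all CPUPlace values compare equal.
struct CPUPlace {
  bool operator==(const CPUPlace&) const { return true; }
  bool operator!=(const CPUPlace&) const { return false; }
};

struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}
  int GetDeviceId() const { return device; }
  bool operator==(const CUDAPlace& o) const { return device == o.device; }
  bool operator!=(const CUDAPlace& o) const { return !(*this == o); }
  int device;
};

// Hypothetical place for the new device, following the CUDAPlace pattern.
struct MyDevicePlace {
  MyDevicePlace() : MyDevicePlace(0) {}
  explicit MyDevicePlace(int d) : device(d) {}
  int GetDeviceId() const { return device; }
  // Needed for variant equality comparison.
  bool operator==(const MyDevicePlace& o) const { return device == o.device; }
  bool operator!=(const MyDevicePlace& o) const { return !(*this == o); }
  int device;
};

// The new place is appended to the list of alternatives.
using Place = std::variant<CUDAPlace, CPUPlace, MyDevicePlace>;

// Hypothetical helper: true when the variant currently holds a MyDevicePlace.
inline bool IsMyDevicePlace(const Place& place) {
  return std::holds_alternative<MyDevicePlace>(place);
}
```

The equality operators matter because comparing two Place values compares the alternative they hold.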

Add device context

After a new kind of Device is added, you should add a corresponding DeviceContext for it.

class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;

  virtual void Wait() const {}
};
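A context for the new device would derive from DeviceContext and report its place. The following is a minimal sketch: MyDevicePlace and MyDeviceContext are hypothetical names, the Place type is a simplified std::variant (C++17) stand-in, and Wait only counts its calls, whereas a real implementation would presumably block until the device finishes its queued work.

```cpp
#include <variant>

// Simplified stand-ins for the place types.
struct CPUPlace {};
struct MyDevicePlace {  // hypothetical place of the new device
  explicit MyDevicePlace(int d = 0) : device(d) {}
  int device;
};
using Place = std::variant<CPUPlace, MyDevicePlace>;

// Base class as shown in this document.
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;

  virtual void Wait() const {}
};

// Hypothetical context for the new device.
class MyDeviceContext : public DeviceContext {
 public:
  explicit MyDeviceContext(MyDevicePlace place) : place_(place) {}

  Place GetPlace() const override { return place_; }

  // Placeholder for device synchronization; the sketch only records the call.
  void Wait() const override { ++wait_calls_; }

  int wait_calls() const { return wait_calls_; }

 private:
  MyDevicePlace place_;
  mutable int wait_calls_ = 0;
};
```

Resources owned by the device, such as streams or library handles, would naturally live in this class as well.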

Implement A New OpKernel for Your Device

Detailed documentation can be found in new_op_and_kernel.

class OpKernelBase {
 public:
  /**
   * ExecutionContext is the only parameter of the kernel's Compute function.
   * Compute gets input/output variables, state such as momentum, and
   * device resources such as the CUDA stream, cuBLAS handle, etc. from
   * ExecutionContext. Users should construct it before running the operator.
   */

  virtual void Compute(const ExecutionContext& context) const = 0;

  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};
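A kernel for the new device derives from OpKernel<T> and overrides Compute. The sketch below illustrates the pattern with a hypothetical MyDeviceScaleKernel that doubles each input element. The ExecutionContext here is a heavily simplified stand-in that carries only raw input/output buffers; its constructor and the Input, Output, and size accessors are assumptions for this example and do not reflect the real class.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the framework's ExecutionContext.
class ExecutionContext {
 public:
  ExecutionContext(const void* in, void* out, std::size_t n)
      : in_(in), out_(out), n_(n) {}

  template <typename T>
  const T* Input() const { return static_cast<const T*>(in_); }

  template <typename T>
  T* Output() const { return static_cast<T*>(out_); }

  std::size_t size() const { return n_; }

 private:
  const void* in_;
  void* out_;
  std::size_t n_;
};

// Base classes as shown in this document.
class OpKernelBase {
 public:
  virtual void Compute(const ExecutionContext& context) const = 0;
  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};

// Hypothetical kernel: writes 2 * input[i] into output[i].
template <typename T>
class MyDeviceScaleKernel : public OpKernel<T> {
 public:
  void Compute(const ExecutionContext& context) const override {
    const T* in = context.Input<T>();
    T* out = context.Output<T>();
    for (std::size_t i = 0; i < context.size(); ++i) {
      out[i] = in[i] * static_cast<T>(2);
    }
  }
};
```

A real kernel would launch device code and use resources from the device context instead of looping on the host.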

Register the OpKernel to framework

After writing the components described above, we should register the kernel to the framework.

We use REGISTER_OP_KERNEL to do the registration.

REGISTER_OP_KERNEL(
    op_type,
    library_type,
    place_type,
    kernel0, kernel1, ...)

kernel0 and kernel1 are kernels that share the same op_type, library_type, and place_type but have different data_types.

Take conv2d as an example:

```cpp
REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);

REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
       paddle::operators::CUDNNConvOpKernel<float>,
       paddle::operators::CUDNNConvOpKernel<double>);
```

In the code above:

  • conv2d is the type/name of the operator
  • CUDNN/CPU is the library type
  • paddle::platform::CUDAPlace/CPUPlace is the place type
  • the template parameter float/double of CUDNNConvOpKernel<T> is the data type.
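To illustrate what such a registration might do behind the macro, here is a minimal sketch of a registry that stores several kernels under one op/library/place combination and distinguishes them by element type. It is not the implementation of REGISTER_OP_KERNEL. Every name ending in Sketch is hypothetical, and using strings for the library and place is a simplification.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <typeindex>
#include <typeinfo>

struct KernelBaseSketch {
  virtual ~KernelBaseSketch() = default;
  virtual std::size_t ElementSize() const = 0;
};

// Hypothetical kernel template; ELEMENT_TYPE mirrors OpKernel<T>.
template <typename T>
struct ConvKernelSketch : KernelBaseSketch {
  using ELEMENT_TYPE = T;
  std::size_t ElementSize() const override { return sizeof(T); }
};

// Key: (op_type, library_type, place_type, data_type).
using RegistryKeySketch =
    std::tuple<std::string, std::string, std::string, std::type_index>;
using RegistrySketch =
    std::map<RegistryKeySketch, std::unique_ptr<KernelBaseSketch>>;

inline RegistrySketch& Registry() {
  static RegistrySketch registry;
  return registry;
}

// Registers every kernel in the pack under the same op, library, and place.
template <typename... Kernels>
void RegisterKernelsSketch(const std::string& op, const std::string& library,
                           const std::string& place) {
  // Pack expansion inside an array initializer runs one emplace per kernel.
  int expand[] = {
      0, (static_cast<void>(Registry().emplace(
              RegistryKeySketch(
                  op, library, place,
                  std::type_index(typeid(typename Kernels::ELEMENT_TYPE))),
              std::unique_ptr<KernelBaseSketch>(new Kernels()))),
          0)...};
  static_cast<void>(expand);
}

// Returns the kernel registered for element type T, or nullptr if none exists.
template <typename T>
const KernelBaseSketch* FindKernelSketch(const std::string& op,
                                         const std::string& library,
                                         const std::string& place) {
  auto it = Registry().find(
      RegistryKeySketch(op, library, place, std::type_index(typeid(T))));
  return it == Registry().end() ? nullptr : it->second.get();
}
```

With this sketch, RegisterKernelsSketch<ConvKernelSketch<float>, ConvKernelSketch<double>>("conv2d", "CPU", "CPUPlace") plays the role of the first registration above, and a lookup with a different library or place finds nothing.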