Add Kernels for a New Device¶
Background¶
PaddlePaddle Fluid has hundreds of operators. Each operator can have one or more kernels. A kernel is an implementation of the operator for a certain device, which could be a hardware device, e.g., a CUDA GPU, or a library that utilizes a device, e.g., Intel MKL, which makes full use of Xeon CPUs.
This document explains how to add an operator and its kernels. The kernels of an operator are indexed by the C++ type OpKernelType, and an operator chooses the right kernel at runtime. This choosing mechanism is described here.
Write Kernels for A New Device¶
Add A New Device¶
For some historical reasons, we misuse the word library for device. For example, we call the device type the library type. An example is the header file library_type.h. We will correct this ASAP.
To register a new device, we need to add an enum value to LibraryType:
```cpp
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
};
```
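As a sketch of the step above, the enum below adds a hypothetical kFPGA entry for a made-up FPGA device. The name kFPGA is an assumption for illustration only and does not exist in Paddle:

```cpp
// Sketch only: kFPGA is a hypothetical new device/library type,
// appended after the existing entries so old values keep their numbers.
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  kFPGA = 3,  // new entry for the hypothetical device
};
```

Appending the new enumerator at the end keeps the numeric values of the existing library types stable.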
Add A New Place¶
If you have a new kind of device, you first need to add a new kind of Place. For example, CUDAPlace:
```cpp
struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}

  inline int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  inline bool operator==(const CUDAPlace &o) const {
    return device == o.device;
  }
  inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }

  int device;
};

typedef boost::variant<CUDAPlace, CPUPlace> Place;
```
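Following that pattern, a place for a new device mirrors CUDAPlace and joins the variant. The sketch below uses an invented FPGAPlace name, and std::variant stands in for boost::variant so the example is self-contained:

```cpp
#include <variant>

// Stand-in for the existing CPUPlace so the sketch compiles on its own.
struct CPUPlace {
  bool operator==(const CPUPlace &) const { return true; }
};

// Hypothetical place for a new device, mirroring CUDAPlace above.
struct FPGAPlace {
  FPGAPlace() : FPGAPlace(0) {}
  explicit FPGAPlace(int d) : device(d) {}

  inline int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  inline bool operator==(const FPGAPlace &o) const { return device == o.device; }
  inline bool operator!=(const FPGAPlace &o) const { return !(*this == o); }

  int device;
};

// The new place is added to the variant so the framework can hold any place.
typedef std::variant<CPUPlace, FPGAPlace> Place;
```

At runtime a Place value can be inspected to see which device it names, which is how the kernel-selection machinery dispatches on place.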
Add device context¶
After a new kind of device is added, you should add a corresponding DeviceContext for it:
```cpp
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;

  virtual void Wait() const {}
};
```
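A context for the new device derives from DeviceContext and owns the device resources kernels need (for CUDA, these include the stream and cuBLAS/cuDNN handles). The sketch below is a minimal, hypothetical FPGADeviceContext; the place types are stripped-down stand-ins so it compiles on its own:

```cpp
#include <variant>

// Minimal stand-ins so the sketch is self-contained.
struct CPUPlace {};
struct FPGAPlace {  // hypothetical place for the new device
  explicit FPGAPlace(int d = 0) : device(d) {}
  int device;
};
typedef std::variant<CPUPlace, FPGAPlace> Place;

class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;
  virtual void Wait() const {}
};

// Hypothetical context for the new device; a real context would also
// own device resources such as streams and library handles.
class FPGADeviceContext : public DeviceContext {
 public:
  explicit FPGADeviceContext(FPGAPlace place) : place_(place) {}

  Place GetPlace() const override { return place_; }

  void Wait() const override {
    // Block here until all work queued on the device has finished.
  }

 private:
  FPGAPlace place_;
};
```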
Implement a New OpKernel for Your Device¶
Detailed documentation can be found in new_op_and_kernel.
```cpp
class OpKernelBase {
 public:
  /**
   * ExecutionContext is the only parameter of the kernel's Run function.
   * Run gets input/output variables, state such as momentum, and
   * device resources such as the CUDA stream and cublas handle from the
   * ExecutionContext. Users should construct it before running the operator.
   */
  virtual void Compute(const ExecutionContext& context) const = 0;

  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};
```
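A concrete kernel derives from OpKernel&lt;T&gt; and implements Compute, where all device-specific work (memory access, launching device code) happens. The sketch below uses an invented ReluKernel and a stripped-down ExecutionContext stand-in so it compiles on its own; in the real framework, inputs and outputs are fetched from framework::ExecutionContext:

```cpp
#include <vector>

// Stripped-down stand-in; the real ExecutionContext exposes input/output
// variables and the device context.
struct ExecutionContext {
  const std::vector<float> *input;
  std::vector<float> *output;
};

class OpKernelBase {
 public:
  virtual void Compute(const ExecutionContext &context) const = 0;
  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};

// Hypothetical ReLU kernel for a new device: the device-specific
// computation lives entirely in Compute.
template <typename T>
class ReluKernel : public OpKernel<T> {
 public:
  void Compute(const ExecutionContext &ctx) const override {
    ctx.output->clear();
    for (T v : *ctx.input) {
      ctx.output->push_back(v > T(0) ? v : T(0));
    }
  }
};
```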
Register the OpKernel to framework¶
After writing the components described above, we should register the kernel with the framework. We use the REGISTER_OP_KERNEL macro to do the registration:
```cpp
REGISTER_OP_KERNEL(
    op_type,
    library_type,
    place_type,
    kernel0, kernel1, ...)
```
kernel0, kernel1, ... are kernels that share the same op_type, library_type, and place_type but have different data_types. Take conv2d as an example:
```cpp
REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
        paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);
REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
       paddle::operators::CUDNNConvOpKernel<float>,
       paddle::operators::CUDNNConvOpKernel<double>);
```
In the code above:
- `conv2d` is the type/name of the operator.
- `CUDNN`/`CPU` is the library type.
- `paddle::platform::CUDAPlace`/`CPUPlace` is the place.
- The template parameter `float`/`double` on `CUDNNConvOpKernel<T>` is the data type.
