# Add Kernels for a New Device

## Background

PaddlePaddle Fluid has hundreds of operators. Each operator can have one or more kernels. A kernel is an implementation of the operator for a particular device, which can be a hardware device, e.g., a CUDA GPU, or a library that targets a device, e.g., Intel MKL, which makes full use of Xeon CPUs.

[This document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md) explains how to add an operator and its kernels. The kernels of an operator are indexed by the C++ type [`OpKernelType`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md). An operator chooses the right kernel at runtime; the selection mechanism is described [here](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md).

## Write Kernels for a New Device

### Add a New Device

For historical reasons, we misuse the word *library* for *device*. For example, we refer to the device type as the *library type*, as in the header file [`library_type.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/library_type.h#L24). We will correct this ASAP.

To register a new device, we need to add an enum value to `LibraryType`:

```cpp
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
};
```
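
As a minimal sketch of this step, the enum below appends one value for a made-up device. `kFPGA` and the helper `LibraryTypeName` are hypothetical names used only for illustration; they are not part of the framework.

```cpp
#include <cassert>
#include <string>

// The first three values are copied from the snippet above.
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  kFPGA = 3,  // hypothetical new device, appended so existing values keep their numbers
};

// Illustrative helper (an assumption, not the framework's API): maps each
// enum value to a printable name, which is handy in logs and error messages.
inline std::string LibraryTypeName(LibraryType type) {
  switch (type) {
    case LibraryType::kPlain:
      return "PLAIN";
    case LibraryType::kMKLDNN:
      return "MKLDNN";
    case LibraryType::kCUDNN:
      return "CUDNN";
    case LibraryType::kFPGA:
      return "FPGA";
  }
  assert(false && "unhandled LibraryType");
  return "UNKNOWN";
}
```

Appending the new value at the end, rather than inserting it in the middle, keeps the numeric values of the existing entries unchanged.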


### Add a New [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L53)

If you have a new kind of device, you first need to add a new kind of [`Place`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L53). For example, `CUDAPlace`:

```cpp
struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}

  inline int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  inline bool operator==(const CUDAPlace &o) const {
    return device == o.device;
  }
  inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }

  int device;
};

typedef boost::variant<CUDAPlace, CPUPlace> Place;
```
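
A new place would presumably follow the same pattern. The sketch below adds a hypothetical `FPGAPlace` and extends the variant with it. To stay self-contained it uses C++17 `std::variant` as a stand-in for `boost::variant`, defines a minimal `CPUPlace`, and introduces an illustrative helper `IsFPGAPlace`; all three are assumptions rather than framework code.

```cpp
#include <cassert>
#include <variant>

// Minimal stand-in for the framework's CPUPlace (assumption).
struct CPUPlace {
  bool operator==(const CPUPlace &) const { return true; }
  bool operator!=(const CPUPlace &) const { return false; }
};

// Same shape as the CUDAPlace shown above.
struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}
  int GetDeviceId() const { return device; }
  bool operator==(const CUDAPlace &o) const { return device == o.device; }
  bool operator!=(const CUDAPlace &o) const { return !(*this == o); }
  int device;
};

// Hypothetical place for a new device, mirroring CUDAPlace.
struct FPGAPlace {
  FPGAPlace() : FPGAPlace(0) {}
  explicit FPGAPlace(int d) : device(d) { assert(d >= 0); }
  int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  bool operator==(const FPGAPlace &o) const { return device == o.device; }
  bool operator!=(const FPGAPlace &o) const { return !(*this == o); }
  int device;
};

// std::variant stands in for boost::variant here; the new place is
// appended to the list of alternatives.
using Place = std::variant<CUDAPlace, CPUPlace, FPGAPlace>;

// Illustrative helper (hypothetical name): true if the place holds an FPGAPlace.
inline bool IsFPGAPlace(const Place &p) {
  return std::holds_alternative<FPGAPlace>(p);
}
```

The equality operators matter because comparing two variants compares the held alternatives, so every place type in the variant has to provide them.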

### Add a [Device Context](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L37)
After adding the new kind of device, you should add a corresponding [`DeviceContext`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L37) for it.

```cpp
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;

  virtual void Wait() const {}
};
```
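
A context for a new device would derive from this interface. The following is a rough sketch only: `FPGADeviceContext` is a hypothetical class, and `Place` is reduced to a plain struct holding a device id so that the example compiles on its own.

```cpp
#include <cassert>

// Simplified stand-in for the framework's Place variant (assumption).
struct Place {
  int device;
};

// Interface as shown above.
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;

  virtual void Wait() const {}
};

// Hypothetical context for a new device.
class FPGADeviceContext : public DeviceContext {
 public:
  explicit FPGADeviceContext(int device_id) : place_{device_id} {
    assert(device_id >= 0);
  }

  // Reports which device this context is bound to.
  Place GetPlace() const override { return place_; }

  // A real context would block here until all work queued on the device
  // has finished; this sketch has no queue, so there is nothing to wait for.
  void Wait() const override {}

 private:
  Place place_;
};
```

In a real context, device resources such as streams or library handles would typically also live in this class, so that kernels can obtain them through the context.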

### Implement a New [OpKernel](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L351) for Your Device

Detailed documentation is available in [`new_op_and_kernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_en.md).

```cpp
class OpKernelBase {
 public:
  /**
   * ExecutionContext is the only parameter of Kernel Run function.
   * Run will get input/output variables, state such as momentum and
   * device resource such as CUDA stream, cublas handle, etc. from
   * ExecutionContext. User should construct it before run the Operator.
   */

  virtual void Compute(const ExecutionContext& context) const = 0;

  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};
```
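
To illustrate the shape of a concrete kernel, here is a minimal sketch that derives from `OpKernel<T>` and overrides `Compute`. The `ExecutionContext` below is a heavily simplified stand-in (raw buffers plus an element count), and `FPGAScaleKernel` is a hypothetical kernel that doubles every element on the host; a real kernel would fetch tensors and device resources from the framework's `ExecutionContext` and launch the work on the device.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the framework's ExecutionContext (assumption):
// it only carries one input buffer, one output buffer, and their length.
struct ExecutionContext {
  const void *input;
  void *output;
  std::size_t numel;
};

// Base classes as shown above.
class OpKernelBase {
 public:
  virtual void Compute(const ExecutionContext &context) const = 0;
  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};

// Hypothetical kernel: writes 2 * input[i] to output[i].
template <typename T>
class FPGAScaleKernel : public OpKernel<T> {
 public:
  void Compute(const ExecutionContext &ctx) const override {
    assert(ctx.input != nullptr && ctx.output != nullptr);
    const T *in = static_cast<const T *>(ctx.input);
    T *out = static_cast<T *>(ctx.output);
    // A real device kernel would dispatch this loop to the device.
    for (std::size_t i = 0; i < ctx.numel; ++i) {
      out[i] = in[i] * static_cast<T>(2);
    }
  }
};
```

Because the element type is a template parameter, the same kernel class can be instantiated once per supported data type, e.g., `FPGAScaleKernel<float>` and `FPGAScaleKernel<double>`.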


### Register the OpKernel with the Framework

After writing the components described above, we need to register the kernel with the framework.

We use `REGISTER_OP_KERNEL` to do the registration.

```cpp
REGISTER_OP_KERNEL(
	op_type,
	library_type,
	place_type,
	kernel0, kernel1, ...)
```

`kernel0`, `kernel1`, ... are kernels that share the same `op_type`, `library_type`, and `place_type` but differ in `data_type`.

Take [`conv2d`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/conv_cudnn_op.cu.cc#L318) as an example:

```cpp
REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace,
    paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
    paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);

REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
    paddle::operators::CUDNNConvOpKernel<float>,
    paddle::operators::CUDNNConvOpKernel<double>);
```

In the code above:

 - `conv2d` is the type (name) of the operator.
 - `CPU` and `CUDNN` are the `library_type` values.
 - `paddle::platform::CPUPlace` and `paddle::platform::CUDAPlace` are the `place_type` values.
 - The template parameter `float` or `double`, e.g., on `CUDNNConvOpKernel<T>`, is the `data_type`.
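
Conceptually, registration stores each kernel under a key made of these four parts, and kernel selection later looks the key up. The sketch below models that idea with a plain map. It is not how `REGISTER_OP_KERNEL` is implemented: `KernelKey`, `KernelRegistry`, and `MakeConv2dRegistry` are hypothetical names, strings stand in for the library and place types, and each kernel is represented only by its name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <typeindex>
#include <typeinfo>

// Key: (op_type, library_type, place_type, data_type).
using KernelKey =
    std::tuple<std::string, std::string, std::string, std::type_index>;

// Hypothetical registry; a kernel is represented only by its name here.
class KernelRegistry {
 public:
  // Registers one kernel for data type T; duplicate keys are rejected.
  template <typename T>
  void Register(const std::string &op, const std::string &library,
                const std::string &place, const std::string &kernel_name) {
    KernelKey key(op, library, place, std::type_index(typeid(T)));
    bool inserted = kernels_.emplace(key, kernel_name).second;
    assert(inserted && "duplicate kernel registration");
    (void)inserted;  // avoids an unused-variable warning when NDEBUG is set
  }

  // Returns the registered kernel name, or nullptr if no kernel matches.
  template <typename T>
  const std::string *Find(const std::string &op, const std::string &library,
                          const std::string &place) const {
    auto it = kernels_.find(
        KernelKey(op, library, place, std::type_index(typeid(T))));
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<KernelKey, std::string> kernels_;
};

// Mirrors the two conv2d registrations above: one call per data type.
inline KernelRegistry MakeConv2dRegistry() {
  KernelRegistry registry;
  registry.Register<float>("conv2d", "CPU", "CPUPlace", "GemmConvKernel<float>");
  registry.Register<double>("conv2d", "CPU", "CPUPlace", "GemmConvKernel<double>");
  registry.Register<float>("conv2d", "CUDNN", "CUDAPlace", "CUDNNConvOpKernel<float>");
  registry.Register<double>("conv2d", "CUDNN", "CUDAPlace", "CUDNNConvOpKernel<double>");
  return registry;
}
```

With this model, looking up `conv2d` for `CUDNN`, `CUDAPlace`, and `float` finds the cuDNN kernel, while a lookup for an unregistered data type such as `int` finds nothing.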