Add Kernels for a New Device
Background
PaddlePaddle Fluid has hundreds of operators. Each operator can have one or more kernels. A kernel is an implementation of the operator for a certain device, which can be a hardware device, e.g., a CUDA GPU, or a library that utilizes a device, e.g., Intel MKL, which makes full use of Xeon CPUs.
This document explains how to add an operator and its kernels. The kernels of an operator are indexed by a C++ type, OpKernelType. An operator chooses the right kernel at runtime. This choosing mechanism is described here.
Write Kernels for A New Device
Add A New Device
For historical reasons, we misuse the word library for device. For example, we call the device type the library type, as in the header file library_type.h. We will correct this ASAP.
To register a new device, we need to add an enum value to LibraryType:
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
};
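As a rough sketch of this step, the enum below gains one extra value for a made-up device. The name kFPGA and the helper LibraryTypeToString are assumptions introduced for illustration only; neither is taken from library_type.h.

```cpp
#include <string>

// Sketch only: kFPGA is a hypothetical value for a new device.
enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
  kFPGA = 3,  // hypothetical new entry
};

// Hypothetical helper that maps each enum value to a readable name.
inline std::string LibraryTypeToString(LibraryType type) {
  switch (type) {
    case LibraryType::kPlain:
      return "PLAIN";
    case LibraryType::kMKLDNN:
      return "MKLDNN";
    case LibraryType::kCUDNN:
      return "CUDNN";
    case LibraryType::kFPGA:
      return "FPGA";
  }
  return "UNKNOWN";  // reached only for values outside the enum
}
```

Appending the new value at the end keeps the numeric values of the existing entries unchanged.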
Add A New Place
If you have a new kind of device, first add a new kind of Place for it. Take CUDAPlace as an example:
struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}

  inline int GetDeviceId() const { return device; }
  // needed for variant equality comparison
  inline bool operator==(const CUDAPlace &o) const {
    return device == o.device;
  }
  inline bool operator!=(const CUDAPlace &o) const { return !(*this == o); }

  int device;
};

typedef boost::variant<CUDAPlace, CPUPlace> Place;
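A new place could follow the same pattern. The sketch below is an assumption-laden illustration: FPGAPlace and IsFPGAPlace are hypothetical names, CPUPlace and CUDAPlace are reduced to minimal stand-ins, and std::variant (C++17) is swapped in for boost::variant so that the example depends only on the standard library.

```cpp
#include <variant>

// Minimal stand-ins for the existing places.
struct CPUPlace {
  bool operator==(const CPUPlace &) const { return true; }
  bool operator!=(const CPUPlace &) const { return false; }
};

struct CUDAPlace {
  CUDAPlace() : CUDAPlace(0) {}
  explicit CUDAPlace(int d) : device(d) {}
  bool operator==(const CUDAPlace &o) const { return device == o.device; }
  bool operator!=(const CUDAPlace &o) const { return !(*this == o); }
  int device;
};

// Hypothetical place for a new device, modeled on CUDAPlace.
struct FPGAPlace {
  FPGAPlace() : FPGAPlace(0) {}
  explicit FPGAPlace(int d) : device(d) {}
  int GetDeviceId() const { return device; }
  // Equality operators are needed for variant comparison.
  bool operator==(const FPGAPlace &o) const { return device == o.device; }
  bool operator!=(const FPGAPlace &o) const { return !(*this == o); }
  int device;
};

// Assumption: std::variant replaces boost::variant in this sketch.
using Place = std::variant<CUDAPlace, CPUPlace, FPGAPlace>;

// Hypothetical helper: true when the variant currently holds an FPGAPlace.
inline bool IsFPGAPlace(const Place &place) {
  return std::holds_alternative<FPGAPlace>(place);
}
```

The key point is that the new struct must be added to the Place variant's type list; otherwise it cannot be stored in or compared as a Place.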
Add device context
After a new kind of Device is added, you should add a corresponding DeviceContext for it.
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;

  virtual void Wait() const {}
};
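A derived context for a new device might look like the following sketch. FPGAPlace and FPGADeviceContext are hypothetical names, the Place variant is a simplified std::variant stand-in, and Wait is left empty because the sketch has no device queue to synchronize with.

```cpp
#include <variant>

// Simplified stand-ins for the place types.
struct CPUPlace {};
struct FPGAPlace {  // hypothetical place for the new device
  explicit FPGAPlace(int d = 0) : device(d) {}
  int device;
};
using Place = std::variant<CPUPlace, FPGAPlace>;

// Base class as shown above.
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual Place GetPlace() const = 0;
  virtual void Wait() const {}
};

// Hypothetical context for the new device.
class FPGADeviceContext : public DeviceContext {
 public:
  explicit FPGADeviceContext(FPGAPlace place) : place_(place) {}

  // Reports which place this context is bound to.
  Place GetPlace() const override { return place_; }

  // A real context would block until queued device work finishes;
  // this sketch has no queue, so there is nothing to wait for.
  void Wait() const override {}

 private:
  FPGAPlace place_;
};
```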
Implement new OpKernel for your Device
Detailed documentation can be found in new_op_and_kernel.
class OpKernelBase {
 public:
  /**
   * ExecutionContext is the only parameter of the kernel's Compute function.
   * Compute gets input/output variables, state such as momentum, and
   * device resources such as the CUDA stream, cuBLAS handle, etc. from
   * ExecutionContext. Users should construct it before running the operator.
   */
  virtual void Compute(const ExecutionContext& context) const = 0;

  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};
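To make the shape of a concrete kernel clearer, here is a minimal sketch. The ExecutionContext below is a drastically simplified stand-in (the real one exposes variables and device resources), and ScaleByTwoKernel is a hypothetical kernel invented for this example.

```cpp
#include <cstddef>

// Simplified stand-in for the framework's ExecutionContext:
// one type-erased input buffer, one output buffer, and an element count.
struct ExecutionContext {
  const void *input;
  void *output;
  std::size_t size;
};

class OpKernelBase {
 public:
  virtual void Compute(const ExecutionContext &context) const = 0;
  virtual ~OpKernelBase() = default;
};

template <typename T>
class OpKernel : public OpKernelBase {
 public:
  using ELEMENT_TYPE = T;
};

// Hypothetical kernel: writes 2 * input[i] into output[i].
template <typename T>
class ScaleByTwoKernel : public OpKernel<T> {
 public:
  void Compute(const ExecutionContext &context) const override {
    const T *in = static_cast<const T *>(context.input);
    T *out = static_cast<T *>(context.output);
    for (std::size_t i = 0; i < context.size; ++i) {
      out[i] = in[i] * T(2);
    }
  }
};
```

Because the kernel is a template over the element type, one class definition yields a separate kernel per data type, for example ScaleByTwoKernel<float> and ScaleByTwoKernel<double>.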
Register the OpKernel to framework
After writing the components described above, we should register the kernel with the framework.
We use REGISTER_OP_KERNEL to do the registration.
REGISTER_OP_KERNEL(
    op_type,
    library_type,
    place_type,
    kernel0, kernel1, ...)
kernel0 and kernel1 are kernels that have the same op_type, library_type, and place_type but different data_types.
Take conv2d as an example:
```cpp
REGISTER_OP_KERNEL(conv2d, CPU, paddle::platform::CPUPlace,
    paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
    paddle::operators::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);

REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
    paddle::operators::CUDNNConvOpKernel<float>,
    paddle::operators::CUDNNConvOpKernel<double>);
```
In the code above:
- conv2d is the type/name of the operator
- CUDNN/CPU is library
- paddle::platform::CUDAPlace/CPUPlace is place
- template parameter float/double on CUDNNConvOpKernel<T> is data_type.
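The four fields listed above together identify one kernel. The toy registry below sketches that idea under loud assumptions: KernelKey, KernelFn, and KernelRegistry are hypothetical names, the fields are plain strings, and the code does not reflect how REGISTER_OP_KERNEL or OpKernelType are actually implemented.

```cpp
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical key mirroring the four fields: operator, library, place, data type.
struct KernelKey {
  std::string op_type;
  std::string library;
  std::string place;
  std::string data_type;

  // Lexicographic ordering so the key can be used in std::map.
  bool operator<(const KernelKey &o) const {
    return std::tie(op_type, library, place, data_type) <
           std::tie(o.op_type, o.library, o.place, o.data_type);
  }
};

// Stand-in for a kernel: a callable that returns a label.
using KernelFn = std::function<std::string()>;

class KernelRegistry {
 public:
  // Stores the kernel under its key, replacing any earlier entry.
  void Register(const KernelKey &key, KernelFn fn) {
    kernels_[key] = std::move(fn);
  }

  // Returns a pointer to the matching kernel, or nullptr if none is registered.
  const KernelFn *Find(const KernelKey &key) const {
    auto it = kernels_.find(key);
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<KernelKey, KernelFn> kernels_;
};
```

With such a table, registering the float and double variants of a kernel amounts to two entries that differ only in data_type, and the runtime lookup reduces to building a key and calling Find.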