Multi-device support (#6403) · Issue · PaddlePaddle / Paddle

Multi-device support

Created by: wangkuiyi

TODO 1. Kernel Selection with Fallback

Our current kernel selection mechanism is defined here:

https://github.com/PaddlePaddle/Paddle/blob/7d85b6d36effbd5534b2ef7f70b0001a48844a00/paddle/framework/operator.cc#L404-L429

Please be aware that all our computational operators (i.e., all operators except control-flow operators such as WhileOp and IfElseOp, and I/O operators such as Send, Recv, ListenAndDo, and ReadFile) are derived from the base class OperatorWithKernel.

Each computational operator class has multiple kernels, each a function making use of a specific acceleration device, e.g., MKL, CUDA, etc.

The OperatorWithKernel::Run method linked above looks up a kernel by kernel_key and runs it.
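To make the lookup concrete, here is a minimal sketch of a single-key kernel lookup. It is not the code at the link above: KernelKey, KernelFn, Kernels, and RunSingleDevice are hypothetical names, and the key is simplified to a (device, data type) pair of strings purely for illustration.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified kernel key: (device name, data type name).
using KernelKey = std::pair<std::string, std::string>;

// A kernel is modeled as a callable that returns a short description of its run.
using KernelFn = std::function<std::string()>;

// Hypothetical per-operator registry mapping a key to its kernel.
using Kernels = std::map<KernelKey, KernelFn>;

// Looks up exactly one key and runs the matching kernel.
// Throws std::runtime_error if no kernel is registered under that key.
std::string RunSingleDevice(const Kernels& kernels, const KernelKey& key) {
  auto it = kernels.find(key);
  if (it == kernels.end()) {
    throw std::runtime_error("no kernel registered for " + key.first + "/" +
                             key.second);
  }
  return it->second();
}
```

In this sketch, a lookup with a key that has no registered kernel fails outright, because there is only one key to try.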

Our current implementation assumes that all computational operators in a program run on the same device. However, this assumption does not always hold. For example, it is technically difficult to implement CRFOp on CUDA, so our CRFOp has only a CPU kernel. As a result, if we assign a program that includes CRFOp to run on a CUDA device, it would crash.

Thus we need a fallback mechanism for finding the right kernel. In particular, we need to change the system so that a program is given a priority list of devices, for example [ROCm, CUDA, MKL, CPU], instead of a single device. We also need to change the implementation of OperatorWithKernel::Run to take such a priority list and to find and run the existing kernel of the highest priority.
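The proposed fallback could look roughly like the sketch below. This is only an illustration under stated assumptions, not a proposed patch: Device, Kernel, KernelMap, FindKernel, and RunWithFallback are hypothetical names, and a kernel is modeled as a plain callable rather than Paddle's real kernel type.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical device tags, named after the priority-list example in the text.
enum class Device { kROCm, kCUDA, kMKL, kCPU };

// A kernel is modeled as a callable returning the name of the device it used.
using Kernel = std::function<std::string()>;

// Hypothetical per-operator registry: at most one kernel per device.
using KernelMap = std::map<Device, Kernel>;

// Returns the kernel of the first device in `priority` that has one
// registered, or nullptr if none of the listed devices has a kernel.
const Kernel* FindKernel(const KernelMap& kernels,
                         const std::vector<Device>& priority) {
  for (Device d : priority) {
    auto it = kernels.find(d);
    if (it != kernels.end()) return &it->second;
  }
  return nullptr;
}

// Runs the highest-priority existing kernel; throws if there is none.
std::string RunWithFallback(const KernelMap& kernels,
                            const std::vector<Device>& priority) {
  const Kernel* kernel = FindKernel(kernels, priority);
  if (kernel == nullptr) {
    throw std::runtime_error("no kernel for any device in the priority list");
  }
  return (*kernel)();
}
```

With the priority list [ROCm, CUDA, MKL, CPU], an operator that registers only a CPU kernel would run on the CPU, while an operator that registers both CUDA and CPU kernels would run on CUDA. The sketch ignores the data transfer between devices that such a fallback would require.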
