@@ -54,11 +54,7 @@ For a long time, because the Paddle and Paddle-Lite operators are maintained sep
Therefore, this functional operator library will be jointly constructed by the training and inference teams, and will serve as an independent compilation component and underlying infrastructure (not yet independently split) that can serve the training, server-inference, and Paddle-Lite inference execution systems at the same time.
### 1.5 The adaptation of the new inference Runtime design 'infrt'
The inference team designed a new runtime, `infrt`, which is expected to unify the execution systems of Paddle-Inference and Paddle-Lite. `infrt` needs to directly call the operators in the jointly built PHI operator library, so adaptation to `infrt` must be considered in the design. (Currently the `infrt` project is temporarily on hold.)
### 1.6 Op and Kernel parameter normalization
### 1.5 Op and Kernel parameter normalization
The Python 2.0 API project in 2020 standardized the argument list of the Paddle Python-side API, making it concise, easy to use, and standard. However, due to cost considerations, the argument list at the Op level was not standardized, so many early operators differ greatly from the Python API in their arguments. For example, the Python `conv` API has only 8 arguments, but the corresponding C++ `Conv` Op has 29. An API and an Op are essentially the same layer of concept: both describe an operation, so their arguments should be consistent. To address this, 'the operator definition enhancement project' was launched and added the 'AsExtra' and 'AsQuant' declarations to some unnecessary arguments, but this did not fundamentally solve the problem, which is what the construction of the PHI operator library aims to do.
...
...
@@ -68,7 +64,7 @@ We hope to be able to achieve the same three-layer arguments of Python API -> Op
### 2.1 Location
The PHI code directory is inside the paddle directory, which is at the same level as fluid, rather than inside the fluid directory. PHI is a basic component that is called by various upper-layer runtime such as fluid, lite, and infrt, and it will be used later as a separately compiled dynamic library, therefore PHI is not suitable as the submodule of fluid.
The PHI code directory is inside the paddle directory, at the same level as fluid, rather than inside the fluid directory. PHI is a basic component that is called by various upper-layer runtimes such as fluid and lite, and it will later be used as a separately compiled dynamic library; therefore, PHI is not suitable as a submodule of fluid.
### 2.2 Directory Structure
...
...
@@ -86,15 +82,19 @@ Training and inference require a clear operator library directory structure:
- For example, a model uses `add` and `multiply` only, ideally it could be cropped to only 2 kernels.
- In the long run, support the requirement of easily reusing kernel implementation.
- Explanation: When reusing a kernel, the corresponding function implementation should be easy to introduce through `include`, rather than being hard to find because of a complex directory structure.
- In the long run, support the requirement of the unified writing method among cross-device kernels, and the writing method is intuitive and easy to use, without introducing unnecessary template parameters.
- Explanation: The Kernel Primitive API module is at the lower layer of the operator library. Its long-term vision is that each operation uses only one kernel to adapt to various devices, and the code that truly distinguishes devices lives only in the implementation of the Kernel Primitive API. In the future, when complex parameters are passed into a reused kernel, the template parameters should be kept as concise as possible.
- In terms of ease of use, developers can accurately understand where the newly added kernel should be placed, without ambiguity.
- Explanation: When developers add an API, they will not be confused about which directory they should put the corresponding kernel in. Moreover, different people should have no ambiguous understanding of where the same kernel should be placed.
- Do not introduce a lot of duplicate directory design.
- Explanation: Concept splitting is needed, but also with boundaries. Avoid subdirectories with the same name occurring in multiple directories. For example, if `eigen`, `funcs`, `math` directories are placed under the cpu directory, then they shouldn't be placed under the gpu directory. The directory design of the new operator library is mainly divided according to the device, and the directory splitting at other levels should be weakened as much as possible. For example, try not to split based on functions, try not to split based on fields, etc.
- Do not introduce too deep directory design.
...
...
@@ -136,7 +136,6 @@ Some directory structure description:
- `kernels`: Kernels related to each device.
- `cpu, gpu, ...`
##### 2.2.2.2 Kernels directory
```
...
...
@@ -170,6 +169,7 @@ The directory structure is described as follows:
- Auxiliary functions used only by the current kernel are always placed in the same backend folder as the kernel implementation, with a .h file used to manage the code. Auxiliary function code is no longer placed elsewhere unless the implementation is used in multiple places.
- Even if there are multiple callers, if usage is still limited to the same device, create the header file directly in the same directory.
- The implementation of the backward kernel and the forward kernel are placed in different files, and the file suffix is `*_grad_kernel.*`, which is convenient for cmake to separate and compile.
- No more directories are created for the backward kernel, otherwise directories such as cpu/gpu will also be created under the backward kernel directory.
- The implementation of the second-order derivative and the third-order derivative is also placed in the grad kernel implementation file.
- The top-layer is the API-level Tensor interface, which contains two pointer members, `TensorBase` and `AbstractAutogradMeta`.
- Both members are designed as Interface and do not depend on real Tensor and `Autograd` implementations.
- `AutogradMeta` is only meaningful in the dynamic graph API-level Tensor; it is not used in specific kernel computation, so it is placed in the top-layer Tensor interface.
- In addition, such a design facilitates data sharing and reduces copy overhead.
- When a Tensor is assigned to another Tensor, or Tensor is used as a function return value, only the pointer is actually copied, and no real data copy is performed.
- The top-layer C++ Tensor plays a similar role as the Python-side Tensor, and the interface design is as consistent as possible with the Python-side.
- Contains the basic property access and data access methods of Tensor.
- `shape`, `place`, `dtype`, `data`.
- Contains the `autograd` methods required by the dynamic graph Tensor.
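
The pointer-sharing behavior described in the list above can be illustrated with a minimal sketch. Everything here is a toy stand-in written for illustration: `ToyDenseTensor`, `Identity`, and the member names are assumptions, not the actual PHI classes. The point is only that an API-level Tensor holding two pointer members is cheap to copy and return by value.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two interfaces held by the API-level Tensor.
class TensorBase {
 public:
  virtual ~TensorBase() = default;
  virtual const std::vector<std::int64_t>& shape() const = 0;
};

class AbstractAutogradMeta {
 public:
  virtual ~AbstractAutogradMeta() = default;
};

// A toy dense implementation of TensorBase that owns its data.
class ToyDenseTensor : public TensorBase {
 public:
  ToyDenseTensor(std::vector<std::int64_t> shape, std::vector<float> data)
      : shape_(std::move(shape)), data_(std::move(data)) {}
  const std::vector<std::int64_t>& shape() const override { return shape_; }
  std::vector<float>& data() { return data_; }

 private:
  std::vector<std::int64_t> shape_;
  std::vector<float> data_;
};

// API-level Tensor: only two pointer members, so copying it is cheap.
class Tensor {
 public:
  explicit Tensor(std::shared_ptr<TensorBase> impl) : impl_(std::move(impl)) {}
  const std::vector<std::int64_t>& shape() const { return impl_->shape(); }
  const std::shared_ptr<TensorBase>& impl() const { return impl_; }
  // Only meaningful for dynamic graph use; kernels never look at it.
  const std::shared_ptr<AbstractAutogradMeta>& autograd_meta() const {
    return autograd_meta_;
  }

 private:
  std::shared_ptr<TensorBase> impl_;
  std::shared_ptr<AbstractAutogradMeta> autograd_meta_;
};

// Returning by value copies the pointers, not the underlying buffer.
inline Tensor Identity(const Tensor& x) { return x; }
```

After `Tensor b = Identity(a);`, both `a` and `b` refer to the same `ToyDenseTensor` object, so no element data has been copied.
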
...
...
@@ -277,6 +279,7 @@ Tensor ondnn() const;
```
- This conversion process may be `cast` or `copy`:
- `cast` if no data copy is required.
- `copy` if a data copy is required.
- Transformations are implemented by functional kernels.
...
...
@@ -334,12 +337,12 @@ Inherit other Tensors with high degrees of freedom: directly inherit `TensorBase
- `TensorBase` is an abstract class, which leaves a lot of room for describing specific Tensors. If the description of a traditional Tensor cannot meet the requirements, a specialized Tensor implementation can be designed.
#### 2.3.3 C++ API
##### 2.3.3.1 C++ API form
> Highlights of this section:
>
> 1. The C++ API corresponds to the Python 2.0 API: the function name, parameter name, parameter order, and return value are the same.
After investigation, we found that very few framework products are designed with the ease of use of the C++ API in mind. In the long term, if we want to attract more developers to build the Paddle ecosystem, it is also very important to provide a standardized and easy-to-use C++ API architecture. At the same time, the Python 2.0 API project has laid a good reference foundation for the C++ API, and we can directly inherit its achievements.
...
...
@@ -386,24 +389,24 @@ The key to C++ API generation lies in the configuration of the YAML file. Taking
```yaml
## Forward API configuration
- api : matmul
  args : (Tensor x, Tensor y, bool transpose_x=false, bool transpose_y=false)
  output : Tensor
  infer_meta :
    func : MatmulInferMeta
  kernel :
    func : matmul
  backward : matmul_grad

## Backward API configuration
- backward_api : matmul_grad
  forward : matmul (Tensor x, Tensor y, bool transpose_x=false, bool transpose_y=false) -> Tensor(out)
  args : (Tensor x, Tensor y, Tensor out_grad, bool transpose_x=false, bool transpose_y=false)
  output : Tensor(x_grad), Tensor(y_grad)
  infer_meta :
    func : MatmulGradInferMeta
  kernel :
    func : matmul_grad
```
The meaning of each configuration parameter:
...
...
@@ -426,6 +429,7 @@ Due to the large number of C++ APIs and their various forms and functions, some
##### 2.3.4.1 Kernel form
> Highlights of this section:
>
> 1. Notes on Kernel function form:
> (1) Data type `T` and `DeviceContext` (abbreviated as `Context`) as template parameters;
> (2) `Context` is the first parameter of Kernel;
...
...
@@ -470,14 +474,18 @@ Described as follows:
> FAQ:
> - Why does the first parameter need to be `DeviceContext`? Why must this parameter be passed in?
- The PHI kernel requires a pure function format: the variables used in the function are either passed in through parameters or created inside the function, and global singletons are not allowed inside the function. To adapt to various kernel requirements, the `DeviceContext` parameter, which stores context information, is necessary.
> - Why are two template parameters needed?
- To efficiently support the reuse of device-independent kernels. For example, if we implement a Fourier transform `fft` kernel that can be derived by combining basic kernels, the form `Xxx<T, Device>()` avoids dispatching on the device again at run time.
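
To make the kernel form and the FAQ answers above concrete, here is a minimal sketch under stated assumptions. `ToyTensor<T>` and `ToyCPUContext` are hypothetical stand-ins (the real `DenseTensor` is not a class template and real device contexts carry much more state), and `AxpyKernel` is an illustrative composite, not a kernel taken from PHI. It shows `T` and `Context` as template parameters, the context as the first argument, and a composite kernel reusing basic kernels through `Xxx<T, Context>()` so that the device is resolved at compile time.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins, for illustration only.
template <typename T>
struct ToyTensor {
  std::vector<T> data;
};

struct ToyCPUContext {};

// Basic kernel: the context comes first, inputs are const references,
// attributes are passed by value, and the output is a pointer.
template <typename T, typename Context>
void ScaleKernel(const Context& dev_ctx, const ToyTensor<T>& x, T scale,
                 ToyTensor<T>* out) {
  (void)dev_ctx;  // a real kernel would use it for allocation, streams, etc.
  out->data.resize(x.data.size());
  for (std::size_t i = 0; i < x.data.size(); ++i) {
    out->data[i] = x.data[i] * scale;
  }
}

// Another basic kernel with the same form; x and y must have equal sizes.
template <typename T, typename Context>
void AddKernel(const Context& dev_ctx, const ToyTensor<T>& x,
               const ToyTensor<T>& y, ToyTensor<T>* out) {
  (void)dev_ctx;
  out->data.resize(x.data.size());
  for (std::size_t i = 0; i < x.data.size(); ++i) {
    out->data[i] = x.data[i] + y.data[i];
  }
}

// Composite kernel computing a * x + y by reusing the basic kernels.
// Because T and Context are template parameters, the calls below are
// resolved statically; no device dispatch happens at run time.
template <typename T, typename Context>
void AxpyKernel(const Context& dev_ctx, T a, const ToyTensor<T>& x,
                const ToyTensor<T>& y, ToyTensor<T>* out) {
  ToyTensor<T> scaled;
  ScaleKernel<T, Context>(dev_ctx, x, a, &scaled);
  AddKernel<T, Context>(dev_ctx, scaled, y, out);
}
```

Instantiating `AxpyKernel<float, ToyCPUContext>` also instantiates the CPU versions of `ScaleKernel` and `AddKernel`, which is the reuse pattern the FAQ refers to.
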
##### 2.3.4.3 Kernel implementation
> Highlights of this section:
>
> 1. Kernel focuses on computing logic without mixing scheduling logic.
> 2. Kernel is fine-grained enough, with clear boundaries, no optional parameters, easy to reuse.
...
...
@@ -531,13 +539,14 @@ In addition to the change of kernel form from structure format to functional for
2. In the PHI kernel, memory for the output Tensor must be allocated with the `ctx.Alloc` or `ctx.HostAlloc` method; the original `mutable_data` is no longer used to allocate memory.
> FAQ
>
> 1. Why is `mutable_data` replaced by `ctx.Alloc`?
> Answer: The original `mutable_data` method calls the global method `memory::AllocShared`, which uses a global singleton for memory allocation and therefore does not conform to the pure function design principle mentioned above. In terms of business requirements, if a singleton inside the kernel determines how memory is allocated, then in the multi-threaded environment of inference, different threads cannot flexibly specify different memory allocation strategies.
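
The difference can be sketched with toy types. This is an illustration under assumptions, not the PHI implementation: `ToyAllocator`, `ToyContext`, `ToyTensor`, and `FullKernel` are hypothetical, and the toy `Alloc` returns nothing, whereas the real `ctx.Alloc` has a richer interface. The sketch only shows that when the allocator is reached through the context argument, each caller (for example, each inference thread) can supply its own allocation strategy.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical allocator that counts how often it is used.
class ToyAllocator {
 public:
  std::shared_ptr<std::vector<unsigned char>> Allocate(std::size_t bytes) {
    ++num_allocations_;
    return std::make_shared<std::vector<unsigned char>>(bytes);
  }
  int num_allocations() const { return num_allocations_; }

 private:
  int num_allocations_ = 0;
};

// Toy tensor that only tracks its element count and its buffer.
struct ToyTensor {
  std::size_t numel = 0;
  std::shared_ptr<std::vector<unsigned char>> holder;
};

// The context owns the choice of allocator; there is no global singleton.
class ToyContext {
 public:
  explicit ToyContext(ToyAllocator* allocator) : allocator_(allocator) {}

  template <typename T>
  void Alloc(ToyTensor* tensor) const {
    tensor->holder = allocator_->Allocate(tensor->numel * sizeof(T));
  }

 private:
  ToyAllocator* allocator_;
};

// A kernel allocates its output only through the context it was given.
template <typename T, typename Context>
void FullKernel(const Context& dev_ctx, std::size_t numel, ToyTensor* out) {
  out->numel = numel;
  dev_ctx.template Alloc<T>(out);
}
```

Two `ToyContext` objects built on different `ToyAllocator` instances can run the same kernel without sharing any allocation state.
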
##### 2.3.4.4 Kernel registration
> Highlights of this section:
>
> 1. Kernel needs to expose all its key information to the framework and record its input, output and attribute information, otherwise it will lead to unclear boundaries between framework scheduling and Kernel calculation.
When a fluid Kernel is registered, only its `place`, `layout`, and `dtype` are recorded; its inputs and outputs are managed uniformly by `ExecutionContext`, and no corresponding information about them is recorded. Now that the kernel is changed to a functional form, the inputs, outputs, and attributes of each function are explicit. We hope to record the information of each input and output at registration, which is also compatible with paddle-lite scheduling.
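
One way a functional kernel can expose its inputs, outputs, and attributes is to derive them from the function signature at registration time. The following is a minimal sketch of that idea, assuming the convention that `const DenseTensor&` parameters are inputs, `DenseTensor*` parameters are outputs, and everything else after the context is an attribute. `ArgCounts`, `RecordArg`, `KernelArgsParser`, and `ParseKernelArgs` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Toy stand-ins, for illustration only.
struct DenseTensor {};
struct CPUContext {};

struct ArgCounts {
  std::size_t inputs = 0;
  std::size_t attributes = 0;
  std::size_t outputs = 0;
};

// Classify one parameter type according to the assumed convention.
template <typename Arg>
void RecordArg(ArgCounts* counts) {
  if (std::is_same<Arg, const DenseTensor&>::value) {
    ++counts->inputs;
  } else if (std::is_same<Arg, DenseTensor*>::value) {
    ++counts->outputs;
  } else {
    ++counts->attributes;
  }
}

template <typename Fn>
struct KernelArgsParser;

// Match kernels of the form void(const Context&, Args...).
template <typename Context, typename... Args>
struct KernelArgsParser<void (*)(const Context&, Args...)> {
  static ArgCounts Parse() {
    ArgCounts counts;
    // Expand the parameter pack in order; the array is only a vehicle.
    int expand[] = {0, (RecordArg<Args>(&counts), 0)...};
    (void)expand;
    return counts;
  }
};

// Deduce the function pointer type from the kernel itself.
template <typename Fn>
ArgCounts ParseKernelArgs(Fn) {
  return KernelArgsParser<Fn>::Parse();
}

// A kernel declaration in the functional form (body omitted on purpose).
template <typename T, typename Context>
void ScaleKernel(const Context&, const DenseTensor&, float, DenseTensor*) {}
```

With this, `ParseKernelArgs(&ScaleKernel<float, CPUContext>)` reports one input, one attribute, and one output without the kernel author writing that information by hand.
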
...
...
@@ -655,6 +664,7 @@ In addition, only basic template adaptation has been implemented at present, and
##### 2.3.4.5 Kernel management
> Highlights of this section:
>
> 1. Introduce the design of the current Kernel management components
For the management of the new form of Kernel, described as follows:
...
...
@@ -663,10 +673,10 @@ For the management of the new form of Kernel, described as follows:
- `KernelKey` is similar to the original `OpKernelType`, but the `place` and `library_type` fields are combined into one field called `Backend`. The original `LibraryType` is a limited enumeration class that is strongly related to place, so splitting the two only increases the cost of understanding.
- `Kernel` holds more information than the original `OpKernel`. In addition to the function used during execution, it also holds information about the specific parameters, namely `KernelArgsDef`. For Tensor-type inputs and outputs, it saves the Tensor type information, device, data type, and data layout. For Attribute-type inputs and outputs, it saves the type information.
#### 2.3.5 Kernel Compilation and Dependencies
> Highlights of this section:
>
> 1. Introduce the compilation design of the kernel.
> 2. Introduce the establishment of kernel dependencies.
...
...
@@ -714,6 +724,7 @@ The original `InferShape` of fluid Op is the same as `OpKernel`, has the problem
We also rewrite `InferShape` into a functional form, so that different Ops can call the same `InferShape` function, which improves ease of use and reduces maintenance costs.
> FAQ:
>
> 1. Why call it `InferMeta` instead of continuing to call it `InferShape`?
> Answer: The `Meta` of `InferMeta` comes from the `meta` member in `DenseTensor`. In PHI, an op has two components, `InferMeta` and `Kernel`. `InferMeta` covers the functions of `InferShape`, but it is not limited to `InferShape`. In addition to the inference of dims and lod, `InferMeta` also infers dtype and layout, which is different from the original.
...
...
@@ -757,8 +768,8 @@ The purpose of using `MetaTensor` is to mask multiple Tensor types, and to be co
For the basic design of `MetaTensor`, see `paddle/phi/core/meta_tensor.h`. The base class `MetaTensor` has a `TensorBase` pointer member, so it is compatible with `DenseTensor`, `SelectedRows`, `SparseCsrTensor`, and other types in PHI.
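
As a rough illustration of this design, the sketch below wraps a `TensorBase` pointer in a `MetaTensor` and runs one functional `InferMeta` over two different tensor types. The types here are toy stand-ins: `ToyMeta`, `ToyDenseTensor`, `ToySelectedRows`, and the virtual `meta()`/`mutable_meta()` accessors are assumptions made to keep the example short, and the actual classes in `paddle/phi/core` are organized differently.

```cpp
#include <cstdint>
#include <vector>

enum class DataType { UNDEFINED, FLOAT32 };
enum class DataLayout { UNDEFINED, NCHW };

// Toy meta information: dims, dtype, and layout.
struct ToyMeta {
  std::vector<std::int64_t> dims;
  DataType dtype = DataType::UNDEFINED;
  DataLayout layout = DataLayout::UNDEFINED;
};

// Hypothetical base interface; real TensorBase differs.
class TensorBase {
 public:
  virtual ~TensorBase() = default;
  virtual const ToyMeta& meta() const = 0;
  virtual ToyMeta* mutable_meta() = 0;
};

class ToyDenseTensor : public TensorBase {
 public:
  const ToyMeta& meta() const override { return meta_; }
  ToyMeta* mutable_meta() override { return &meta_; }

 private:
  ToyMeta meta_;
};

class ToySelectedRows : public TensorBase {
 public:
  const ToyMeta& meta() const override { return meta_; }
  ToyMeta* mutable_meta() override { return &meta_; }

 private:
  ToyMeta meta_;
  std::vector<std::int64_t> rows_;  // extra state a dense tensor lacks
};

// MetaTensor masks the concrete tensor type behind a TensorBase pointer.
class MetaTensor {
 public:
  explicit MetaTensor(TensorBase* tensor) : tensor_(tensor) {}
  const std::vector<std::int64_t>& dims() const { return tensor_->meta().dims; }
  DataType dtype() const { return tensor_->meta().dtype; }
  DataLayout layout() const { return tensor_->meta().layout; }
  void set_dims(const std::vector<std::int64_t>& dims) {
    tensor_->mutable_meta()->dims = dims;
  }
  void set_dtype(DataType dtype) { tensor_->mutable_meta()->dtype = dtype; }
  void set_layout(DataLayout layout) { tensor_->mutable_meta()->layout = layout; }

 private:
  TensorBase* tensor_;
};

// A functional InferMeta that many ops can share: the output keeps the
// dims, dtype, and layout of the input.
inline void UnchangedInferMeta(const MetaTensor& x, MetaTensor* out) {
  out->set_dims(x.dims());
  out->set_dtype(x.dtype());
  out->set_layout(x.layout());
}
```

The same `UnchangedInferMeta` works whether the wrapped object is a `ToyDenseTensor` or a `ToySelectedRows`, because it only sees the `MetaTensor` interface.
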
> Note:
> This README covers only the content related to the design of PHI itself. If you want to know more about how PHI and fluid are made compatible, please refer to: