diff --git a/doc/design/support_new_device.md b/doc/design/support_new_device.md
index 4c5f10e2ecb9ec09b78926ca27552741d02d7cc9..b154f01d32c5952367f075a732e2f7729d3c8626 100644
--- a/doc/design/support_new_device.md
+++ b/doc/design/support_new_device.md
@@ -2,9 +2,9 @@
 
 ## Background
 
-Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries flexibly and efficiently.
+Deep learning has a high demand for computing resources. New high-performance devices and computing libraries are appearing very frequently. Deep learning frameworks have to integrate these high-performance devices and computing libraries in a flexible and efficient manner.
 
-On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example,Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library.
+On one hand, hardware and computing libraries usually do not have a one-to-one correspondence. For example, Intel CPUs support Eigen and MKL computing libraries while Nvidia GPUs support Eigen and cuDNN computing libraries. We have to implement operator specific kernels for each computing library.
 
 On the other hand, users usually do not want to care about the low-level hardware and computing libraries when writing a neural network configuration. In Fluid, `Layer` is exposed in `Python`, and `Operator` is exposed in `C++`. Both `Layer` and `Operator` are hardware independent.
 
@@ -17,7 +17,7 @@ For a general overview of fluid, please refer to the [overview doc](https://gith
 
 There are mainly three parts that we have to consider while integrating a new device/library:
 
-- Place and DeviceContext: indicates the device id and manages hardware resources
+- Place and DeviceContext: indicate the device id and manage hardware resources
 
 - Memory and Tensor: malloc/free data on certain device
 
@@ -25,10 +25,10 @@ There are mainly three parts that we have to consider while integrating a new de
 
 ### Place and DeviceContext
 
-Please remind that device and computing library are not one-to-one corresponding. A device can have a lot of computing libraries and a computing library can also support several devices.
+Please note that device and computing library are not one-to-one corresponding. A device can have a lot of computing libraries and a computing library can also support several devices.
 
 #### Place
-Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent the device memory where data is located. If we add another device, we have to add corresponding `DevicePlace`.
+Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent the device memory where data is located. If we add another device, we have to add the corresponding `DevicePlace`.
 
 ```
 |   CPUPlace
@@ -144,7 +144,7 @@ class Tensor {
 };
 ```
 
-`Placeholder` is used to delay memory allocation; that is, we can first define a tensor, using `Resize` to configurate its shape, and then call `mutuable_data` to allocate the actual memory.
+`Placeholder` is used to delay memory allocation; that is, we can first define a tensor, using `Resize` to configure its shape, and then call `mutuable_data` to allocate the actual memory.
 
 ```cpp
 paddle::framework::Tensor t;
@@ -163,7 +163,7 @@ Fluid implements computing units based on different DeviceContexts. Some computi
 
 Let's take [MaxOutFunctor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/math/maxouting.h#L27) as an example:
 
-The interface is defined in header file.
+The interface is defined in the header file.
 
 ```
 template <typename DeviceContext, typename T>
@@ -203,7 +203,7 @@ class MaxOutFunctor {
 ```
 
-We get computing handle from a concrete DeviceContext, and make compution on tensors.
+We first obtain the computing handle from a concrete DeviceContext, and then compute on tensors.
 
 The implemention of `OpKernel` is similar to math functors, the extra thing we need to do is to register the OpKernel in a global map.
 
@@ -231,7 +231,7 @@ REGISTER_OP_CUDA_KERNEL(
 
 ## Advanced topics: How to switch between different Device/Library
 
-Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.
+Generally, we will implement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not suitable on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run on GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.
 
 For more details, please refer to following docs:
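
For readers of the last two hunks, which touch the "register the OpKernel in a global map" and kernel-switching passages: the registration pattern they refer to looks roughly like the sketch below. This is a minimal illustration outside the patch itself, not text from the document; `my_op`, `MyOpKernel`, and the `"X"`/`"Out"` argument names are placeholders, and only registration macros and helpers known from the Fluid codebase of that period are assumed.

```cpp
// Sketch: one device-independent OpKernel body, registered once per
// Device/Library so the framework can pick the kernel that matches the
// Place the operator runs on. `my_op` and `MyOpKernel` are hypothetical.
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h"
#include "paddle/platform/device_context.h"

namespace paddle {
namespace operators {

template <typename DeviceContext, typename T>
class MyOpKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto* x = ctx.Input<framework::Tensor>("X");
    auto* out = ctx.Output<framework::Tensor>("Out");
    out->mutable_data<T>(ctx.GetPlace());  // allocate on the kernel's Place

    // Obtain the concrete DeviceContext (CPUDeviceContext or
    // CUDADeviceContext) and hand it to a math functor, as the doc describes.
    auto& dev_ctx = ctx.template device_context<DeviceContext>();
    math::SetConstant<DeviceContext, T> set_zero;
    set_zero(dev_ctx, out, static_cast<T>(0));  // trivial placeholder compute
    (void)x;
  }
};

}  // namespace operators
}  // namespace paddle

namespace ops = paddle::operators;
// In the real codebase the CUDA registration normally lives in the .cu file.
REGISTER_OP_CPU_KERNEL(
    my_op, ops::MyOpKernel<paddle::platform::CPUDeviceContext, float>);
REGISTER_OP_CUDA_KERNEL(
    my_op, ops::MyOpKernel<paddle::platform::CUDADeviceContext, float>);
```

Because both registrations share one templated kernel body, the framework can select whichever registered kernel matches the current Place, which is the switching behaviour the "Advanced topics" hunk points to.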