Refine current codes to support multi-devices 2
Created by: dzhwinter
First part:
- Implement CUDNNDeviceContext, MKLDeviceContext hierarchy. The decorator design pattern does not fit our needs well, so other tasks are being done first.
- define LayoutType key #6827 (closed) #6832
- define LibraryType key #6770 (closed) #6874
- define the new OpKernelType class (four keys: DataType, LayoutType, Place, LibraryType). #6769 (closed) #6879
- refine current kernel register mechanism
- refine CUDNN related operators, change them from operators to kernels https://github.com/PaddlePaddle/Paddle/pull/6660
- Remove CUDNNPlace and MKLDNNPlace to follow the new design
- rename GPUPlace to CUDAPlace
- add multi-kernel Python test support
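
To make the four-key OpKernelType idea concrete, here is a minimal Python sketch of a kernel registry keyed by DataType, LayoutType, Place and LibraryType. It is an illustration only, not Paddle's C++ implementation: the enum members, the `KERNELS` dictionary, and the `register_kernel` and `find_kernel` helpers are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum


# The enum members below are illustrative assumptions, not Paddle's actual values.
class DataType(Enum):
    FP32 = "fp32"
    FP64 = "fp64"


class LayoutType(Enum):
    NCHW = "NCHW"
    NHWC = "NHWC"


class Place(Enum):
    CPU = "CPUPlace"
    CUDA = "CUDAPlace"


class LibraryType(Enum):
    PLAIN = "plain"
    CUDNN = "cudnn"
    MKLDNN = "mkldnn"


@dataclass(frozen=True)
class OpKernelType:
    """Four-key kernel identifier; frozen so it can be used as a dict key."""
    data_type: DataType
    layout: LayoutType
    place: Place
    library: LibraryType


# Hypothetical registry: op name -> {OpKernelType: kernel callable}
KERNELS = {}


def register_kernel(op_name, kernel_type):
    """Decorator that stores a kernel under (op_name, kernel_type)."""
    def decorator(fn):
        KERNELS.setdefault(op_name, {})[kernel_type] = fn
        return fn
    return decorator


@register_kernel("conv2d", OpKernelType(DataType.FP32, LayoutType.NCHW,
                                        Place.CUDA, LibraryType.PLAIN))
def conv2d_cuda(x):
    return "conv2d/cuda/plain"


@register_kernel("conv2d", OpKernelType(DataType.FP32, LayoutType.NCHW,
                                        Place.CUDA, LibraryType.CUDNN))
def conv2d_cudnn(x):
    return "conv2d/cuda/cudnn"


def find_kernel(op_name, kernel_type):
    """Look up the kernel registered for exactly this kernel type."""
    return KERNELS[op_name][kernel_type]
```

With this layout, the cuDNN variant is just another kernel of the same operator that differs only in its LibraryType key, which is the reason a separate CUDNNPlace is no longer needed.
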
Second part:
- refine Tensor implementation, add a layout attribute #6765 (closed) https://github.com/PaddlePaddle/Paddle/pull/6955
- refine Python interface and some data operators to set layout.
- share Tensor layout in most operators
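
The layout items above can be pictured with a small Python sketch, assuming a simplified `Tensor` record rather than Paddle's real class. The `Tensor` fields and the `relu` operator here are hypothetical stand-ins; the point is only that a layout-agnostic operator passes the input layout through to its output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tensor:
    """Simplified stand-in for a tensor that carries a layout attribute."""
    data: List[float]
    shape: Tuple[int, ...]
    layout: str = "NCHW"  # assumed default for this sketch


def relu(x: Tensor) -> Tensor:
    """Element-wise op: the output shares the input's shape and layout."""
    out = [v if v > 0 else 0.0 for v in x.data]
    return Tensor(out, x.shape, x.layout)
```
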
Third part:
- DataTransform function interface.
- DataTransformFn register mechanism #6823 (closed)
- kernel hint and GetExpectedKernelType #6883
- memory switch mechanism #6989 (closed) #6991
- refine memory switch mechanism in local scope #7057 (closed) #7058
- implement some basic DataTransformFn
- CPU <---> CUDA #7050 (closed)
- layout transform, like mkldnn
- change batch norm to use new transform method
- add method to judge which inputs should be transformed refs
- Python interface setting kernel hint
- helper function of getting appropriate DeviceContext #7065 (closed) #7066
- Add an MNIST example that switches kernels
- Refine current CRF operator
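
As a rough picture of how a DataTransformFn register mechanism could fit together with the expected kernel type, here is a minimal Python sketch. It assumes a simplified two-field key (place, layout) in place of the full OpKernelType and represents tensors as plain dictionaries; `TRANSFORMS`, `register_transform` and `prepare_input` are hypothetical names, and the transform bodies only retag the tensor instead of copying memory.

```python
from typing import Callable, Dict, Tuple

# (place, layout): a simplified stand-in for the full OpKernelType.
KernelKey = Tuple[str, str]

# Hypothetical registry: (source key, destination key) -> transform function.
TRANSFORMS: Dict[Tuple[KernelKey, KernelKey], Callable] = {}


def register_transform(src: KernelKey, dst: KernelKey):
    """Decorator that registers a transform for one (src, dst) pair."""
    def decorator(fn):
        TRANSFORMS[(src, dst)] = fn
        return fn
    return decorator


@register_transform(("CPU", "NCHW"), ("CUDA", "NCHW"))
def cpu_to_cuda(tensor):
    # A real implementation would copy memory to the device; this only retags.
    return {**tensor, "place": "CUDA"}


@register_transform(("CUDA", "NCHW"), ("CPU", "NCHW"))
def cuda_to_cpu(tensor):
    return {**tensor, "place": "CPU"}


def prepare_input(tensor, expected: KernelKey):
    """Return the tensor unchanged if it matches, else apply a registered transform."""
    actual = (tensor["place"], tensor["layout"])
    if actual == expected:
        return tensor
    fn = TRANSFORMS.get((actual, expected))
    if fn is None:
        raise KeyError(f"no DataTransformFn registered for {actual} -> {expected}")
    return fn(tensor)
```

In this sketch the operator reports the kernel type it expects, and each input is transformed only when its actual key differs, so inputs that already match incur no copy.
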