Discussion of some key design questions for a DL framework
Created by: wangkuiyi
This issue comes from https://github.com/PaddlePaddle/Paddle/pull/2445#issuecomment-308007213
I think it would be easier to subdivide this document into the following topics and try to figure out an optimal design for each:
- memory management
  - class Place, inspired by Majel
  - Allocation and Allocators
  - a unified malloc/free API for GPU and CPU -- `p = malloc(pl, ...); use(p); free(p);` (see the sketch after this item)
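A minimal sketch of what such a unified API could look like, loosely following Majel's `Place` idea; the names here (`CpuPlace`, `GpuPlace`, `Alloc`, `Free`) are hypothetical, and the GPU branch is stubbed out so the snippet compiles without CUDA:

```cpp
#include <cstddef>
#include <cstdlib>
#include <variant>

struct CpuPlace {};                  // host memory
struct GpuPlace { int device_id; };  // one CUDA device

using Place = std::variant<CpuPlace, GpuPlace>;

// Allocate `size` bytes on the device described by `place`.
void* Alloc(const Place& place, std::size_t size) {
  if (std::holds_alternative<CpuPlace>(place)) {
    return std::malloc(size);
  }
  // A GpuPlace would call cudaSetDevice(device_id) and cudaMalloc here.
  return nullptr;
}

void Free(const Place& place, void* p) {
  if (std::holds_alternative<CpuPlace>(place)) {
    std::free(p);
  }
  // cudaFree(p) for a GpuPlace.
}
```

With this shape, the same calling code works for both devices, and the `Place` argument alone decides where the memory lives.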
- Tensor
  - consider re-using an existing implementation, such as mshadow or Eigen, porting Majel's, or writing a thin wrapper over one of these libraries (see the sketch after this item)
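For illustration, a minimal sketch of the wrapper option, assuming Eigen's unsupported Tensor module is available; the class shape here is hypothetical, merely in the spirit of TensorFlow's Tensor, which also wraps Eigen buffers:

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>
#include <unsupported/Eigen/CXX11/Tensor>

class Tensor {
 public:
  explicit Tensor(std::vector<long> dims) : dims_(std::move(dims)) {
    std::size_t n = 1;
    for (long d : dims_) n *= static_cast<std::size_t>(d);
    data_.reset(new float[n]);  // a real version would allocate via Place
  }

  // Expose the raw buffer as a rank-2 Eigen view for computation.
  Eigen::TensorMap<Eigen::Tensor<float, 2>> matrix() {
    return Eigen::TensorMap<Eigen::Tensor<float, 2>>(
        data_.get(), dims_[0], dims_[1]);
  }

 private:
  std::vector<long> dims_;
  std::unique_ptr<float[]> data_;
};
```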
- Expression Template
  - Expression Template is important for performance optimization: an expression like `d = a + b + c` is evaluated in a single fused loop, without temporaries for intermediate results (see the sketch after this list).
  - What are the differences between the above libraries, mshadow and Eigen? (Majel's implementation of Expression Template is incomplete.)
  - Which is better?
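Both mshadow and Eigen build on the same underlying idea; here is a minimal, self-contained sketch of it, with hypothetical `Vec`/`AddExpr` names, in the CRTP style mshadow uses:

```cpp
#include <cstddef>
#include <vector>

// CRTP base: every expression node knows its concrete type.
template <typename E>
struct Expr {
  const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing `lhs + rhs`; it stores references and copies no data.
template <typename L, typename R>
struct AddExpr : Expr<AddExpr<L, R>> {
  const L& lhs;
  const R& rhs;
  AddExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
  float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

struct Vec : Expr<Vec> {
  std::vector<float> data;
  explicit Vec(std::size_t n) : data(n) {}
  float operator[](std::size_t i) const { return data[i]; }

  // Assigning an expression runs ONE fused loop over the whole tree;
  // no temporary vectors are materialized for intermediate sums.
  template <typename E>
  Vec& operator=(const Expr<E>& e) {
    const E& v = e.self();
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = v[i];
    return *this;
  }
};

template <typename L, typename R>
AddExpr<L, R> operator+(const Expr<L>& a, const Expr<R>& b) {
  return AddExpr<L, R>(a.self(), b.self());
}
```

With this, `d = a + b + c;` builds only tiny `AddExpr` objects at compile time and is evaluated element-wise in a single pass over `d`.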
- Ops and Variables
  - TensorFlow's Ops take Tensors, a wrapper of Eigen, as their inputs and outputs.
  - Caffe2 and PyTorch take Variables as Ops' inputs and outputs.
  - A Variable bundles a tensor for the forward algorithm with a gradient tensor for the backward algorithm (see the sketch after this list). The difference seems to be that TensorFlow is a general-purpose framework, whereas Caffe2 and PyTorch are DL-specific.
  - Which approach should PaddlePaddle take?
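A sketch of the Variable option as described above; the field names are hypothetical:

```cpp
#include <memory>

class Tensor;  // whichever tensor type is chosen in the Tensor section

struct Variable {
  std::shared_ptr<Tensor> data;  // read and written by the forward algorithm
  std::shared_ptr<Tensor> grad;  // accumulated by the backward algorithm
};
```

Under the TensorFlow approach, Ops would instead exchange plain `Tensor`s, and gradients would simply be more tensors flowing through more Ops.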
- Ops and Gradient Ops
  - In TensorFlow, an Op is a general concept -- all computations, gradients included, are represented by Ops.
  - In Caffe2, each Op has one or more corresponding GradientOps (see the registry sketch after this list).
  - Which approach should PaddlePaddle take?
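The Caffe2 approach implies some registry mapping each Op type to a factory for its GradientOps. A hypothetical sketch, with all names invented here:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct OpBase {
  virtual ~OpBase() = default;
  virtual void Run() = 0;
};

// Given a forward Op, produce the Op(s) computing its gradients.
using GradientFactory =
    std::function<std::vector<std::unique_ptr<OpBase>>(const OpBase&)>;

std::map<std::string, GradientFactory>& GradientRegistry() {
  static std::map<std::string, GradientFactory> registry;
  return registry;
}

// Called while building the backward pass: look up the forward Op's
// type name and instantiate its gradient Ops.
std::vector<std::unique_ptr<OpBase>> MakeGradient(
    const std::string& op_type, const OpBase& forward_op) {
  return GradientRegistry().at(op_type)(forward_op);
}
```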
- Ops and Kernels
  - TensorFlow separates an Op's signature (as OpDef) from its implementations (as OpKernels), so one Op can have several kernels, e.g., one per device (see the sketch after this list).
  - Others might not have such a clear separation.
  - Which approach should PaddlePaddle take?
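A hypothetical sketch of what this separation buys: the signature is declared once, and each device registers its own kernel under the same Op name (all names here are invented):

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

struct OpDef {        // device-independent signature
  std::string name;   // e.g. "MatMul"
  int num_inputs;
  int num_outputs;
};

struct KernelContext;  // would carry inputs, outputs, device handles

using Kernel = std::function<void(KernelContext*)>;

// Keyed by (op name, device), e.g. ("MatMul", "GPU").
std::map<std::pair<std::string, std::string>, Kernel>& KernelRegistry() {
  static std::map<std::pair<std::string, std::string>, Kernel> registry;
  return registry;
}

void RunOp(const OpDef& def, const std::string& device, KernelContext* ctx) {
  KernelRegistry().at({def.name, device})(ctx);
}
```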
- Execution engine
  - How do TensorFlow and the other solutions parse the network definition and create the network?
  - How do TensorFlow and the other solutions execute the training algorithm over the network while creating/managing the memory of Variables? (A minimal sketch follows.)
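To make the second question concrete, a minimal, hypothetical sketch of an execution engine: it walks an already-sorted list of op descriptions and creates each output Variable in a scope on first use, which is where the memory-management questions above come in:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Variable { /* data and grad tensors, as sketched above */ };

struct OpDesc {                      // one entry of the network definition
  std::string type;                  // e.g. "mul", "add"
  std::vector<std::string> inputs;   // input variable names
  std::vector<std::string> outputs;  // output variable names
};

class Scope {  // owns Variables by name
 public:
  Variable* GetOrCreate(const std::string& name) {
    auto& slot = vars_[name];
    if (!slot) slot.reset(new Variable);
    return slot.get();
  }

 private:
  std::map<std::string, std::unique_ptr<Variable>> vars_;
};

void Run(const std::vector<OpDesc>& net, Scope* scope) {
  for (const OpDesc& op : net) {  // assume topological order
    for (const std::string& out : op.outputs) {
      scope->GetOrCreate(out);  // allocate output Variables lazily
    }
    // dispatch op.type to a kernel here (see the Ops-and-Kernels sketch)
  }
}
```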