Defects of Python API (#721) · Issue · PaddlePaddle / Paddle

Defects of Python API

Created by: reyoung

The Python API of Paddle was originally developed for Paddle model inference and for customising the Paddle training process. It was an experimental project that simply exposed C++ APIs as developers needed them. It is important to have a plan for the Python API and to fix its essential defects.

The defects of API are listed below:

Paddle has many global variables, which may cause errors when loading multiple models. Related issue: #601 (closed).
- The command line flags are global variables, but some of them should be configured per neural network rather than globally.
  - FLAGS_use_gpu configures whether a neural network uses the GPU. When using paddle train, Paddle trains only one neural network, so a global use_gpu doesn't matter. However, it is reasonable for users to train multiple neural networks in one program when using Paddle as a library.
  - FLAGS_gpu_id is inappropriate for the same reason. The candidate flags are in Flags.h.
- The global variables in the configuration file parser.
  - In trainer config helpers, some decorators use global variables. This should be fixed, or there should be an easy way to reset those global variables.
- There may be other global variables which are not listed here.
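The following is a minimal sketch of why global flags break down once two networks live in one program, and of one possible per-network alternative. Everything in it is hypothetical and for illustration only: `GLOBAL_FLAGS`, `GlobalFlagNetwork`, `NetworkConfig` and `ConfiguredNetwork` are not Paddle APIs.

```python
from dataclasses import dataclass

# Hypothetical stand-in for process-wide command line flags such as
# FLAGS_use_gpu and FLAGS_gpu_id; this is not Paddle's real flag storage.
GLOBAL_FLAGS = {"use_gpu": False, "gpu_id": 0}


class GlobalFlagNetwork:
    """Toy network that reads its device setting from the global flags."""

    def device(self):
        if GLOBAL_FLAGS["use_gpu"]:
            return "gpu:%d" % GLOBAL_FLAGS["gpu_id"]
        return "cpu"


@dataclass(frozen=True)
class NetworkConfig:
    """Hypothetical per-network settings that replace the global flags."""
    use_gpu: bool = False
    gpu_id: int = 0


class ConfiguredNetwork:
    """Toy network that owns its own configuration."""

    def __init__(self, config):
        self.config = config

    def device(self):
        if self.config.use_gpu:
            return "gpu:%d" % self.config.gpu_id
        return "cpu"


# With global flags, changing the setting for a second network also
# changes it for the first one.
first = GlobalFlagNetwork()
GLOBAL_FLAGS["use_gpu"] = True
second = GlobalFlagNetwork()
global_devices = (first.device(), second.device())

# With per-network configuration, the two networks stay independent.
cpu_net = ConfiguredNetwork(NetworkConfig(use_gpu=False))
gpu_net = ConfiguredNetwork(NetworkConfig(use_gpu=True, gpu_id=1))
configured_devices = (cpu_net.device(), gpu_net.device())
```

In the global variant both networks end up on the same device, even though only the second one was meant to use the GPU.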
The LOG(FATAL) and CHECK errors in Paddle.
- Paddle's design philosophy is to crash as soon as possible, so there are many LOG(FATAL) and CHECK statements in the Paddle code. Handling them correctly in the Paddle library is very important, because an illegal operation should not make the user's program exit.
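One possible direction, sketched here purely as an assumption, is to surface failed checks as Python exceptions that the caller can catch. `PaddleError`, `check`, `load_model` and `try_load` are hypothetical names invented for this example; they do not describe how Paddle currently behaves.

```python
class PaddleError(RuntimeError):
    """Hypothetical exception type; not part of the current Paddle API."""


def check(condition, message):
    """Python analogue of a C++ CHECK that raises instead of aborting."""
    if not condition:
        raise PaddleError(message)


def load_model(path):
    """Toy operation that validates its argument with check()."""
    check(isinstance(path, str) and path != "",
          "model path must be a non-empty string")
    return {"path": path}


def try_load(path):
    """Caller-side handling: the user program survives an illegal call."""
    try:
        return load_model(path), None
    except PaddleError as err:
        return None, str(err)
```

With this shape, an illegal call such as `try_load("")` returns an error message to the caller instead of terminating the whole process.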
PyDataProvider2 and the config file parsing inside the Paddle process.
- There is a Python interpreter embedded inside Paddle to parse the configuration file and load data. It forces users to split one Python script into three scripts when using the Python library.
- Also, the process model is very confusing. The call stack when using the Python library is `Python => Paddle => Python`. The two Python interpreters in one process may share some global variables, which can lead to unpredictable behaviour.
Not all C++ APIs are exposed. The levels of Paddle APIs are listed below.
- The Trainer API. The Trainer API is the highest-level API. It contains loadModel, trainOnePass, trainOneMiniBatch, etc. These are exposed, and we should check whether the current API is enough for Trainer.
- The GradientMachine API. The GradientMachine is an abstraction of the neural network. By using the GradientMachine API, users can customise the training process. For example, users can run a neural network forward 10 times and then do the backward pass. This API is partially exposed, and we should expose all of it.
- The Layer API. APIs to control the forward and backward passes of each layer. These APIs do not seem useful now, and none of them is exposed.
- The Matrix API. APIs to use Paddle matrices in Python. They are partially exposed and should work for feeding data to GradientMachine. We should check whether the current API is enough.
- The Util API. APIs for Paddle utilities, such as threading, networking, etc. Basically none of them is exposed, except some APIs for parsing command line arguments and initializing the Paddle process.
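To make the GradientMachine use case more concrete, here is a toy sketch of a custom training schedule with ten forward passes followed by one backward pass. `ToyGradientMachine` and its methods are hypothetical and written in pure Python; they only illustrate the kind of control the text asks for and do not reflect the real GradientMachine interface.

```python
class ToyGradientMachine:
    """Hypothetical one-parameter model y = w * x with squared error loss."""

    def __init__(self, w=0.0):
        self.w = w
        self.grad = 0.0
        self._cache = []

    def forward(self, x, target):
        """Run one forward pass, remember its inputs, and return the loss."""
        pred = self.w * x
        self._cache.append((x, pred - target))
        return 0.5 * (pred - target) ** 2

    def backward(self):
        """Accumulate the gradient of all cached forward passes."""
        for x, err in self._cache:
            self.grad += err * x
        self._cache = []

    def update(self, learning_rate):
        """Apply one gradient descent step and clear the gradient."""
        self.w -= learning_rate * self.grad
        self.grad = 0.0


machine = ToyGradientMachine(w=0.0)
samples = [(float(i), 2.0 * i) for i in range(1, 11)]  # targets follow y = 2x

# Custom schedule: ten forward passes, then one backward pass and update.
total_loss = sum(machine.forward(x, t) for x, t in samples)
machine.backward()
machine.update(learning_rate=0.001)
```

The point of the sketch is that the caller, not the library, decides when forward, backward and update happen.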
