Paddle Multiple Language API/SDK
Created by: reyoung
Paddle is a standalone application now, which makes it inconvenient to customize the training process. The current Paddle API only supports model inference.
We are considering rewriting the current API to make Paddle a standard Python library that could easily be ported to other programming languages.
There are several agreements and TODOs for this feature.
Using standard C99 API instead of SWIG
The SWIG API is excellent for Python bindings, but it does not seem to work smoothly for other languages, such as Julia and Go. Making Paddle easy to integrate into other systems is an essential requirement for the Paddle API.
Only expose GradientMachine.
The GradientMachine is an abstraction of a neural network that can perform forward/backward passes on multiple local devices (CPU cores, GPU cards). In a cluster environment, we should provide the same abstraction with some additional configuration, such as the node count.
The GradientMachine will always behave as a single-threaded program. We won't provide APIs for sending data from one GPU to another, using many CPUs, etc. We think such APIs are too low-level and do not need to be exposed.
There are a few rules for the GradientMachine API:
- Expose the GradientMachine in as much detail as possible.
- The ParameterUpdater is exposed in the C-API, but not to end users.
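To make the abstraction concrete, here is a minimal Python sketch of the interface described above. The class and method names informally mirror the C++ GradientMachine; ScaleMachine and its fields are purely hypothetical toys for illustrating the forward/backward call sequence, not part of Paddle.

```python
import abc


class GradientMachine(abc.ABC):
    """Single-threaded facade over a neural network; device details are hidden."""

    @abc.abstractmethod
    def forward(self, in_args):
        """Run one forward pass and return the output arguments (e.g. the cost)."""

    @abc.abstractmethod
    def backward(self):
        """Run one backward pass, accumulating parameter gradients."""


class ScaleMachine(GradientMachine):
    """Toy one-parameter model y = w * x, to show the call sequence only."""

    def __init__(self, w):
        self.w = w
        self.grad = 0.0
        self._last_input = None

    def forward(self, in_args):
        self._last_input = in_args
        return [self.w * x for x in in_args]

    def backward(self):
        # For cost = sum(y), d(cost)/dw = sum(x).
        self.grad = sum(self._last_input)


machine = ScaleMachine(2.0)
outputs = machine.forward([1.0, 3.0])  # forward pass
machine.backward()                     # accumulate the gradient
```

The point of the sketch is that callers only ever see `forward` and `backward`; which devices do the work stays inside the implementation.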
Wrap the C-API into a standard Python library.
Python is widely used in the neural network domain, so we will write a standard Python library as the first language binding.
However, the Python library can be considered a demo only; other language bindings are welcome contributions.
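As a sketch of what "wrapping a C-API" looks like in practice, here is the typical ctypes pattern, demonstrated with libc's `abs()` since the Paddle C-API symbols do not exist yet. This assumes a POSIX system where `CDLL(None)` exposes libc symbols; the eventual `paddle_*` functions would take the place of the libc call.

```python
import ctypes

# Load the symbols of the current process; on POSIX this includes libc.
libc = ctypes.CDLL(None)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def c_abs(x):
    """Python-level wrapper that hides the ctypes plumbing from end users."""
    return libc.abs(x)
```

The Python library's job is exactly this thin layer: declare argument/return types once, then present ordinary Python functions and classes to users.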
Possible Python API Demos
Here is a possible Python usage under the current design. It is subject to change.
```python
import paddle

@paddle.network(
    input_types={
        'img': dense_vector(784),
        'label': integer_value(10)
    }
)
def mnist_network(img, label):
    hidden1 = fc_layer(input=img, size=200)
    hidden2 = fc_layer(input=hidden1, size=200)
    inference = fc_layer(input=hidden2, size=10, act=SoftmaxActivation())
    cost = classification_cost(input=inference, label=label)
    return cost

@mnist_network.train_data(files=['dataset1.txt', 'dataset2.txt'])
@mnist_network.test_data(files=['dataset_test.txt'])
def provider(filename):
    with open(filename) as f:
        for each_sample in readFromFile(f):
            yield each_sample

if __name__ == '__main__':  # main function
    network = mnist_network()
    # trainer = network.createClusterTrainer("node0, node1")
    trainer = network.createLocalTrainer("gpu0, gpu1").withSGDOptimizer(
        learning_rate=0.001, batch_size=200)
    for _ in xrange(100):
        trainer.trainOnePass()
```
Tasks
Step 1. Single Machine Development.
To implement this feature, several tasks should be done.
- Remove all global variables in Paddle. Most of them are command line flags.
  - Related Issue #852 (closed). @reyoung @gangliao
- Find a way not to core-dump when `log(FATAL)` or `CHECK` fails.
  - Not exiting the program is not enough; we should also recover the process.
  - @hohdiy @jacquesqiao
- Expose a C-API for:
  - Paddle Matrix/SparseMatrix/Vector, with unit tests.
    - Used to feed data to the GradientMachine, so only get/set methods need to be exposed; the calculation methods are not urgent now.
  - Paddle Parameter/Argument, with unit tests.
    - Used to feed data, get parameters, etc.
  - Optimizers and parameter updaters, with unit tests.
    - Optimizers such as Adam and SGD.
    - Whether parameter updaters should be exposed from C++ or reimplemented in the other languages should be discussed.
  - GradientMachines, with unit tests.
- Python library [should be developed in parallel with the C-API exposure]
  - Python Matrix/SparseMatrix/Vector, with unit tests.
    - Exchange data with NumPy.
  - Parameter/Argument Python API, with unit tests.
  - Optimizers and parameter updaters in Python.
  - GradientMachines in Python.
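The Matrix tasks above call for exposing only data access, not calculation. A minimal sketch of such a get/set-only surface might look like the following; the class name, row-major layout, and method names are illustrative assumptions, not the actual C-API.

```python
class Matrix:
    """Minimal dense matrix exposing only data access, no calculation methods."""

    def __init__(self, height, width):
        self.height = height
        self.width = width
        self._data = [0.0] * (height * width)  # row-major storage

    def get(self, row, col):
        """Read one element."""
        return self._data[row * self.width + col]

    def set(self, row, col, value):
        """Write one element."""
        self._data[row * self.width + col] = value


m = Matrix(2, 3)
m.set(1, 2, 4.5)     # write one element
value = m.get(1, 2)  # read it back
```

Keeping the surface this small means the bindings only have to marshal element reads and writes; all computation stays behind the GradientMachine.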
Step 2. Cluster Development.
TBD