# PaddlePaddle Design Doc

## Ingredients
Our design principle is to start from the essence: how can we allow users to express and solve their problems as neural networks? Some essential concepts that our API has to provide include:
- A topology is an expression of layers.
- A layer could be any kind of computation, including cost.
- Some layers have parameters, some don’t. Most costs don’t have parameters.
- In some topologies, layers share parameters. For example, the network for training a ranking model.
- At programming time, users specify topologies and possible sharing of parameters. PaddlePaddle can figure out and create parameters required (and possibly shared) by one or more topologies.
## Starting from Examples
To summarize our discussion, let us present two examples here:
### Example 1. Sharing Parameters between Layers
We use the 3-branch ranking model in this example. For your convenience, I copy-and-paste the model's topology as follows:
```
A -> f -\
Q -> f --> cost
B -> f -/
```
The following program trains the topology, including the cost, and then uses a sub-network of the trained topology for inference:
```python
def f(x):
    e = paddle.layer.embedding(x, parameter_name="embedding")
    o = paddle.layer.softmax(e, parameter_name="semantic")
    return o

# Create 3 topologies (subnets); they share parameters because all
# corresponding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))

topology = paddle.layer.less_than(
    paddle.layer.cross_entropy(fA, fQ),
    paddle.layer.cross_entropy(fB, fQ))

# Derive parameters required in topology and create them in model.
parameters = paddle.parameters.create(topology)

# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)

# Inference using fA (or fB or fQ, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The semantic-vector of testA: ", paddle.infer(fA, parameters, testA)
```
### Example 2. Sharing Parameters between "Models"
We use GAN in this example. In the following example program, `d0` and `d1` correspond to the two networks of the GAN:
```python
def G(x):
    # over-simplified example as G has only one layer:
    return paddle.layer.fc(x, parameter_name="G")

def D(x):
    # again, over-simplified:
    return paddle.layer.fc(x, parameter_name="D")

# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))

# Construct a second topology d1, which contains only D. By
# training this topology, we update parameters of D. Note
# that d1 shares parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))

# Create parameters from a list of multiple topologies (models) for
# the chance to share parameters between these topologies.
parameters = paddle.parameters.create([d0, d1])

# Iterative training of GAN.
for ...:
    train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
    train(d1, parameters, reader=read_from_realistic_images)

# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
```
## Summarization
The above two programs reveal some important design concerns:
- Users describe a topology as an expression of layers. Every layer has a parameter name. If the users don't specify it explicitly, it's automatically generated as a unique name. By specifying the parameter name, users can specify the sharing of parameters between layers and even between topologies.
- `paddle.parameters.create` figures out the parameters required by one or more topologies from the parameter names of their layers. It creates these parameters and returns a `ParameterSet` object, which is in essence a map from parameter names to parameters.
- At training and inference time, `paddle.train` and `paddle.infer` require both a topology and the parameter set that holds the parameters of that topology. There are some reasons:
  - This prevents users from forgetting to call `paddle.parameters.create`.
  - `paddle.train` needs to know which parameter set to update.
  - Users could load another (pre-trained) parameter set and use it with a topology in `paddle.infer` (see the sketch after this list).
- By specifying the `immutable_parameters` parameter of `paddle.train`, we can forbid the update of these parameters.
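As an illustration of the last two points, here is a minimal sketch of reusing a pre-trained parameter set with the topology from Example 1. The `paddle.parameters.load` helper and the `ranking_model.tar` file name are assumptions made for this sketch and are not part of the API described above.

```python
# A sketch only: paddle.parameters.load and "ranking_model.tar" are
# hypothetical names, not part of the API described above.
pretrained = paddle.parameters.load("ranking_model.tar")

# Fine-tune the topology while keeping the "embedding" parameter fixed,
# using the immutable_parameters mechanism described above.
paddle.train(topology, pretrained,
             reader=read_ranking_model_data,
             immutable_parameters={"embedding"})

# Run inference with the (partially) fine-tuned parameter set.
print paddle.infer(fA, pretrained, testA)
```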
## Reader
Not all programming frameworks allow users to define I/O functions. An example is Google MapReduce, which can only read from text, SSTable, and RecordIO files. Hadoop MapReduce allows users to define readers and writers by deriving from base classes `Reader` and `Writer`. The former is less flexible but also less error-prone. We decided to provide users the flexibility to define their own readers.
There are some open questions here:
- Should a reader return a Python dictionary?
- How to map multiple outputs from a reader to multiple data layers?
- How to easily compose some existing readers to read more data and feed a topology with more data layers?
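As one tentative answer to these questions (an assumption of this note, not a decision), a reader could be a Python generator that yields a dictionary mapping data layer names to values, and composition could be a small wrapper that merges the dictionaries yielded by several readers. The names `read_images`, `read_labels`, and `compose` below are illustrative only.

```python
import numpy

def read_images():
    # Yield a dictionary keyed by the data layer name.
    for i in range(1000):
        yield {"image": numpy.random.rand(784)}

def read_labels():
    for i in range(1000):
        yield {"label": numpy.random.randint(10)}

def compose(*readers):
    # Merge the dictionaries yielded by several readers, so that one
    # composed reader can feed a topology with more data layers.
    def composed():
        for dicts in zip(*[r() for r in readers]):
            merged = {}
            for d in dicts:
                merged.update(d)
            yield merged
    return composed

# Hypothetical usage:
# paddle.train(topology, parameters, reader=compose(read_images, read_labels))
```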
## Training
The recommended way to train a model is to call `paddle.train`, which simply calls `paddle.trainer.Default`, a global variable of type `paddle.trainer.SGD`. Equivalently, we can do:
```python
opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
opt.train(topology, parameters, reader=read, ...)
```
## Updater
Please be aware that a trainer can accept an updater as its data member, where an updater is a class derived from `paddle.trainer.Updater`. This is to make it easier to customize trainers, as discussed here.
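To make the customization point concrete, here is a rough sketch of what a user-defined updater might look like. The `update(parameter, gradient)` method name and signature are assumptions for illustration; the actual `paddle.trainer.Updater` interface is not specified in this document.

```python
class ClippedSGDUpdater(paddle.trainer.Updater):
    """A hypothetical updater: plain SGD with gradient clipping.

    The update() signature below is an assumption for illustration,
    not the documented Updater interface.
    """
    def __init__(self, learning_rate=0.01, clip=1.0):
        self.learning_rate = learning_rate
        self.clip = clip

    def update(self, parameter, gradient):
        # Clip the gradient element-wise (assuming NumPy-like arrays),
        # then apply a plain SGD step in place.
        clipped = gradient.clip(-self.clip, self.clip)
        parameter -= self.learning_rate * clipped

# Usage, mirroring the snippet above:
# opt = paddle.trainer.SGD(..., ClippedSGDUpdater(learning_rate=0.01))
```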
## Event Handler
`paddle.train` and `paddle.trainer.XXX.train` take an optional parameter `event_handler`, which should be either `None` or a function that handles some events:
- `BeginTraining`
- `EndTraining`
- `BeginIteration`
- `EndIteration`
- `BeginPass`
- `EndPass`
where `EndPass` is sent if and only if the reader yields `end_pass=True`.

An example is as follows:
```python
def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        print paddle.test(...)

paddle.train(topology, parameters, reader, event_handler)
```
If we are writing a PaddlePaddle program in and for IPython/Jupyter, we can use matplotlib in the event handler to plot a curve of cost/error versus iterations, as shown here.
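A rough sketch of such a plotting handler is below. It assumes that `EndIteration` events carry a `cost` attribute and that `EndPass` lives in the same `paddle.event` namespace as `EndIteration`; both are assumptions of this sketch rather than something specified above.

```python
import matplotlib.pyplot as plt

costs = []

def plotting_event_handler(event):
    # Assumption: EndIteration events expose a `cost` attribute.
    if isinstance(event, paddle.event.EndIteration):
        costs.append(event.cost)
    if isinstance(event, paddle.event.EndPass):
        # Redraw the cost curve at the end of every pass.
        plt.clf()
        plt.plot(costs)
        plt.xlabel("iteration")
        plt.ylabel("cost")
        plt.show()

paddle.train(topology, parameters, reader, plotting_event_handler)
```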
## Distributed Training
If a user wants to do distributed training on a cluster, s/he should call `paddle.dist_train` and provide access tokens to the cluster as parameters.
For example, if the user has a TLS certificate that allows access to a Kubernetes cluster, s/he should be able to call:
```python
paddle.dist_train(model,
                  trainer=paddle.trainer.SGD(...,
                                             paddle.updater.Adam(...)),
                  reader=read,
                  k8s_user="yi",
                  k8s_token="kube_cluster_tls.pem",
                  k8s_job="hello",
                  num_parameter_servers=15)
```
The pseudo code of `paddle.dist_train` is as follows:
```python
def dist_train(topology, parameters, trainer, reader, ...):
    if os.getenv("KUBERNETES_SERVICE_HOST") is None:
        # Running on the user's personal computer: build and push a
        # Docker image, then launch the job on the cluster.
        image_name = k8s_user + '/' + k8s_job
        docker_build(image_name)
        docker_push()
        kube_ctrl_start_job(image_name, k8s_user, k8s_token)
    else:
        # Running inside the cluster: pick a role based on rank.
        rank = kube_list_containers_in_job_and_return_current_containers_rank()
        if rank == 0:
            master()
        elif rank < 15:
            parameter_server()
        else:
            trainer.train(topology, parameters, reader=reader)
```
Please be aware that if a process is running on the Kubernetes cluster, it will have some environment variables pre-defined.
If `dist_train` doesn't see these environment variables, it knows that it's running on a user's personal computer, and it should work as a launcher. Otherwise, it knows that it's running on the cluster and needs to figure out its role as either the master, a trainer, or a parameter server.