# PaddlePaddle Design Doc

## Ingredients
Our design principle is to start from the essence: how can we allow users to express and solve their problems as neural networks? Some essential concepts that our API has to provide include:
- A topology is an expression of layers.
- A layer could be any kind of computation, including cost.
- Some layers have parameters, some don’t. Most costs don’t have parameters.
- In some topologies, layers share parameters. For example, the network for training a ranking model.
- At programming time, users specify topologies and possible sharing of parameters. PaddlePaddle can figure out and create parameters required (and possibly shared) by one or more topologies.
## Starting from Examples
To summarize our discussion, let us present two examples here:
### Example 1. Sharing Parameters between Layers
We use the 3-branch ranking model in this example. For your convenience, I copy-and-paste the model's topology as follows:
```
A -> f -\
Q -> f --> cost
B -> f -/
```
The following program trains the topology, including the cost, and then uses a sub-network of the trained topology for inference:
```python
def f(x):
    e = paddle.layer.embedding(x, parameter_name="embedding")
    o = paddle.layer.softmax(e, parameter_name="semantic")
    return o

# Create 3 topologies (subnets); they share parameters because all
# corresponding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))

topology = paddle.layer.less_than(
    paddle.layer.cross_entropy(fA, fQ),
    paddle.layer.cross_entropy(fB, fQ))

# Derive parameters required in topology and create them in model.
parameters = paddle.parameters.create(topology)

# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)

# Inference using fA (or fB or fQ, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The semantic-vector of testA: ", paddle.infer(fA, parameters, testA)
```
### Example 2. Sharing Parameters between "Models"
We use GAN in this example. In the following example program, `d0` and `d1` correspond to the two networks of the GAN:
```python
def G(x):
    # over-simplified example as G has only one layer:
    return paddle.layer.fc(x, parameter_name="G")

def D(x):
    # again, over-simplified:
    return paddle.layer.fc(x, parameter_name="D")

# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))

# Construct a second topology d1, which contains only D. By
# training this topology, we update parameters of D. Note
# that d1 shares parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))

# Create parameters from a list of multiple topologies (models) for
# the chance to share parameters between these topologies.
parameters = paddle.parameters.create([d0, d1])

# Iterative training of GAN.
for ...:
    train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
    train(d1, parameters, reader=read_from_realistic_images)

# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
```
## Summarization
The above two programs reveal some important design concerns:
- Users describe a topology as an expression of layers. Every layer has a parameter name. If the users don't specify it explicitly, it's automatically generated as a unique name. By specifying the parameter name, users can specify the sharing of parameters between layers and even between topologies.
- `paddle.parameters.create` figures out the parameters required by one or more topologies from the parameter names of their layers. It creates these parameters and returns a `ParameterSet` object, which is in essence a map from parameter names to parameters.
- At training and inference time, `paddle.train` and `paddle.infer` require both a topology and the parameter set that holds the parameters of that topology. There are some reasons:
  - This prevents users from forgetting to call `paddle.parameters.create`.
  - `paddle.train` needs to know which parameter set to update.
  - Users could load another (pre-trained) parameter set and use it with a topology in `paddle.infer` (see the sketch after this list).
- By specifying the `immutable_parameters` parameter of `paddle.train`, we can forbid the update of these parameters.
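As an illustration of the last two points, here is a minimal sketch of reusing a pre-trained parameter set with the topology from Example 1. The `paddle.parameters.load` helper and the `ranking_model.tar` file name are assumptions made for this sketch and are not part of the API described above.

```python
# A sketch only: paddle.parameters.load and "ranking_model.tar" are
# hypothetical names, not part of the API described above.
pretrained = paddle.parameters.load("ranking_model.tar")

# Fine-tune the topology while keeping the "embedding" parameter fixed,
# using the immutable_parameters mechanism described above.
paddle.train(topology, pretrained,
             reader=read_ranking_model_data,
             immutable_parameters={"embedding"})

# Run inference with the (partially) fine-tuned parameter set.
print paddle.infer(fA, pretrained, testA)
```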
## Reader
Not all programming frameworks allow users to define I/O functions. An example is Google MapReduce, which can only read from text, SSTable, and RecordIO files. Hadoop MapReduce allows users to define readers and writers by deriving from base classes `Reader` and `Writer`. The former is less flexible but also less error-prone. We decided to provide users the flexibility to define their own readers.
There are some open questions here:
- Should a reader return a Python dictionary?
- How to map multiple outputs from a reader to multiple data layers?
- How to easily compose some existing readers to read more data and feed a topology with more data layers?
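As one tentative answer to these questions (an assumption of this note, not a decision), a reader could be a Python generator that yields a dictionary mapping data layer names to values, and composition could be a small wrapper that merges the dictionaries yielded by several readers. The names `read_images`, `read_labels`, and `compose` below are illustrative only.

```python
import numpy

def read_images():
    # Yield a dictionary keyed by the data layer name.
    for i in range(1000):
        yield {"image": numpy.random.rand(784)}

def read_labels():
    for i in range(1000):
        yield {"label": numpy.random.randint(10)}

def compose(*readers):
    # Merge the dictionaries yielded by several readers, so that one
    # composed reader can feed a topology with more data layers.
    def composed():
        for dicts in zip(*[r() for r in readers]):
            merged = {}
            for d in dicts:
                merged.update(d)
            yield merged
    return composed

# Hypothetical usage:
# paddle.train(topology, parameters, reader=compose(read_images, read_labels))
```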
## Training
The recommended way to train a model is to call `paddle.train`, which simply calls `paddle.trainer.Default`, a global variable of type `paddle.trainer.SGD`. Equivalently, we can do:
```python
opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
opt.train(topology, parameters, reader=read, ...)
```
## Updater
Please be aware that a trainer can accept an updater as its data member, where an updater is a class derived from `paddle.trainer.Updater`. This is to make it easier to customize trainers, as discussed here.
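To make the customization point concrete, here is a rough sketch of what a user-defined updater might look like. The `update(parameter, gradient)` method name and signature are assumptions for illustration; the actual `paddle.trainer.Updater` interface is not specified in this document.

```python
class ClippedSGDUpdater(paddle.trainer.Updater):
    """A hypothetical updater: plain SGD with gradient clipping.

    The update() signature below is an assumption for illustration,
    not the documented Updater interface.
    """
    def __init__(self, learning_rate=0.01, clip=1.0):
        self.learning_rate = learning_rate
        self.clip = clip

    def update(self, parameter, gradient):
        # Clip the gradient element-wise (assuming NumPy-like arrays),
        # then apply a plain SGD step in place.
        clipped = gradient.clip(-self.clip, self.clip)
        parameter -= self.learning_rate * clipped

# Usage, mirroring the snippet above:
# opt = paddle.trainer.SGD(..., ClippedSGDUpdater(learning_rate=0.01))
```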
## Event Handler
`paddle.train` and `paddle.trainer.XXX.train` take an optional parameter `event_handler`, which should be either `None` or a function that handles some events:
- `BeginTraining`
- `EndTraining`
- `BeginIteration`
- `EndIteration`
- `BeginPass`
- `EndPass`
where `EndPass` is sent if and only if the reader yields `end_pass=True`.

An example is as follows:
```python
def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        print paddle.test(...)

paddle.train(topology, parameters, reader, event_handler)
```
If we are writing a PaddlePaddle program in and for IPython/Jupyter, we can use matplotlib in the event handler to plot a curve of cost/error versus iterations, as shown here.
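A rough sketch of such a plotting handler is below. It assumes that `EndIteration` events carry a `cost` attribute and that `EndPass` lives in the same `paddle.event` namespace as `EndIteration`; both are assumptions of this sketch rather than something specified above.

```python
import matplotlib.pyplot as plt

costs = []

def plotting_event_handler(event):
    # Assumption: EndIteration events expose a `cost` attribute.
    if isinstance(event, paddle.event.EndIteration):
        costs.append(event.cost)
    if isinstance(event, paddle.event.EndPass):
        # Redraw the cost curve at the end of every pass.
        plt.clf()
        plt.plot(costs)
        plt.xlabel("iteration")
        plt.ylabel("cost")
        plt.show()

paddle.train(topology, parameters, reader, plotting_event_handler)
```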
## Distributed Training
If a user wants to do distributed training on a cluster, s/he should call `paddle.dist_train` and provide access tokens to the cluster as parameters.
For example, if the user has a TLS certificate that allows access to a Kubernetes cluster, s/he should be able to call:
```python
paddle.dist_train(model,
                  trainer=paddle.trainer.SGD(...,
                                             paddle.updater.Adam(...)),
                  reader=read,
                  k8s_user="yi",
                  k8s_token="kube_cluster_tls.pem",
                  k8s_job="hello",
                  num_parameter_servers=15)
```
The pseudo code of `paddle.dist_train` is as follows:
```python
def dist_train(topology, parameters, trainer, reader, ...):
    if os.getenv("KUBERNETES_SERVICE_HOST") is None:
        # Running on the user's personal computer: build and push a
        # Docker image, then launch the job on the cluster.
        image_name = k8s_user + '/' + k8s_job
        docker_build(image_name)
        docker_push()
        kube_ctrl_start_job(image_name, k8s_user, k8s_token)
    else:
        # Running inside the cluster: pick a role based on rank.
        rank = kube_list_containers_in_job_and_return_current_containers_rank()
        if rank == 0:
            master()
        elif rank < 15:
            parameter_server()
        else:
            trainer.train(topology, parameters, reader=reader)
```
Please be aware that if a process is running on the Kubernetes cluster, it will have some environment variables pre-defined.
If `dist_train` doesn't see these environment variables, it knows that it's running on a user's personal computer, and it should work as a launcher. Otherwise, it knows that it's running on the cluster and needs to figure out its role as either the master, a trainer, or a parameter server.