PaddlePaddle Design Doc
Ingredients
Our design principle is to start from the essence: how can we allow users to express and solve their problems as neural networks? Some essential concepts that our API has to provide include:
- A topology is an expression of layers.
- A layer could be any kind of computation, including cost.
- Some layers have parameters, some don’t. Most costs don’t have parameters.
- In some topologies, layers share parameters. For example, the network for training a ranking model.
- At programming time, users specify topologies and possible sharing of parameters. PaddlePaddle can figure out and create the parameters required (and possibly shared) by one or more topologies.
Starting from Examples
As a summary of our discussion, let us present two examples here:
Example 1. Sharing Parameters between Layers
We use the 3-branch ranking model in this example. For your convenience, I copy-and-paste the model’s topology as follows:
A -> f \
Q -> f --> cost
B -> f /
The following program trains the topology, including the cost, and then uses the sub-network of the trained topology for inference:
def f(input):
    e = paddle.layer.embedding(input, parameter_name="embedding")
    o = paddle.layer.softmax(e, parameter_name="semantic")
    return o

# Create 3 topologies (subnets); they share parameters because all
# corresponding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))

topology = paddle.layer.less_than(
    paddle.layer.cross_entropy(fA, fQ),
    paddle.layer.cross_entropy(fB, fQ))

# Derive parameters required in topology and create them in the model.
parameters = paddle.parameters.create(topology)

# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)

# Inference using fA (or fB or fQ, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The semantic-vector of testA: ", paddle.infer(fA, parameters, testA)
Example 2. Sharing Parameters between “Models”
We use GAN in this example. In the following example program, d0 and d1 correspond to the two networks in the following figure:
def G(input):
    # over-simplified example, as G has only one layer:
    return paddle.layer.fc(input, parameter_name="G")

def D(input):
    # again, over-simplified:
    return paddle.layer.fc(input, parameter_name="D")

# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))

# Construct a second topology d1, which contains only D. By
# training this topology, we update parameters of D. Note
# that d1 shares parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))

# Create parameters from a list of multiple topologies (models) for
# the chance to share parameters between these topologies.
parameters = paddle.parameters.create([d0, d1])

# Iterative training of GAN.
for ...:
    train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
    train(d1, parameters, reader=read_from_realistic_images)

# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
Summarization
The above two programs reveal some important design concerns:
- Users describe a topology as an expression of layers. Every layer has a parameter name. If the user doesn’t specify it explicitly, it’s automatically generated as a unique name. By specifying the parameter name, users can specify the sharing of parameters between layers and even between topologies.
- paddle.parameters.create figures out the parameters required by one or more topologies from the parameter names of layers. It creates these parameters and returns a ParameterSet object, which is in essence a map from parameter names to parameters (a minimal conceptual sketch of this idea follows the list).
- At training and inference time, paddle.train and paddle.infer require both a topology and the parameter set that holds the parameters of that topology. There are some reasons:
  1. This prevents users from forgetting to call paddle.parameters.create.
  2. paddle.train needs to know which parameter set to update.
  3. Users could load another (pre-trained) parameter set and use it with a topology in paddle.infer.
- By specifying the immutable_parameters parameter of paddle.train, we can forbid the update of these parameters.
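To make these points concrete, here is a minimal, purely conceptual sketch of what paddle.parameters.create could do and what a ParameterSet amounts to. The topology.layers traversal, the parameter_shape attribute, and the use of numpy arrays are illustrative assumptions, not part of the proposed API:

import numpy as np

class ParameterSet(object):
    """In essence, a map from parameter names to parameter values."""
    def __init__(self):
        self.params = {}  # parameter name -> numpy array

    def __getitem__(self, name):
        return self.params[name]

def create(topologies):
    # Accept one topology or a list of them, so that parameters can be
    # shared across topologies: a parameter name that appears in several
    # layers or topologies maps to a single entry.
    if not isinstance(topologies, list):
        topologies = [topologies]
    pset = ParameterSet()
    for topology in topologies:
        for layer in topology.layers:        # hypothetical traversal API
            name = layer.parameter_name      # as in the examples above
            if name is not None and name not in pset.params:
                pset.params[name] = np.zeros(layer.parameter_shape)  # hypothetical attribute
    return pset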
Reader
Not all programming frameworks allow users to define I/O functions. An example is Google MapReduce, which can only read from text, SSTable, and RecordIO files. Hadoop MapReduce allows users to define readers and writers by deriving from the base classes Reader and Writer. The former is less flexible but also less error-prone. We decided to give users the flexibility to define their own readers.
There are some open questions here:
- Should a reader return a Python dictionary?
- How to map multiple outputs from a reader to multiple data layers?
- How to easily compose some existing readers to read more data and feed a topology with more data layers?
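As a starting point for these questions, here is a minimal sketch of what a user-defined reader and a simple composition helper could look like. The dictionary convention, the read_data name, the data file, and the compose helper are assumptions for discussion rather than a settled design:

def read_data():
    # A reader could be a generator that yields one sample at a time,
    # here as a Python dictionary keyed by data-layer name.
    with open("ranking_data.txt") as f:  # hypothetical data file
        for line in f:
            a, b, q = line.strip().split("\t")
            yield {"A": a, "B": b, "Q": q}

def compose(*readers):
    # Merge samples from several readers into one dictionary per step,
    # so that a topology with more data layers can be fed from a few
    # existing readers.
    def reader():
        for samples in zip(*[r() for r in readers]):
            merged = {}
            for sample in samples:
                merged.update(sample)
            yield merged
    return reader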
Training
The recommended way to train a model is to call paddle.train, which simply calls paddle.trainer.Default, a global variable of type paddle.trainer.SGD. Equivalently, we can do
opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
opt.train(topology, parameters, reader=read, ...)
Updater
Please be aware that a trainer can accept an updater as its data member, where an updater is a class derived from paddle.trainer.Updater. This is to make it easier to customize trainers, as discussed here.
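This document does not specify the Updater interface; purely as an illustration of the customization it enables, a user-defined updater might look roughly like the sketch below. The MyClippedSGD class, the update() method name, and its arguments are all assumptions:

class MyClippedSGD(paddle.trainer.Updater):
    # A hypothetical updater that clips gradients before applying SGD.
    def __init__(self, learning_rate, clip):
        self.learning_rate = learning_rate
        self.clip = clip

    def update(self, parameter, gradient):  # assumed interface
        g = max(min(gradient, self.clip), -self.clip)
        return parameter - self.learning_rate * g

opt = paddle.trainer.SGD(..., MyClippedSGD(learning_rate=0.01, clip=1.0))
opt.train(topology, parameters, reader=read)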
Event Handler
paddle.train and paddle.trainer.XXX.train take an optional parameter event_handler, which should be either None or a function that handles some events:
- BeginTraining
- EndTraining
- BeginIteration
- EndIteration
- BeginPass
- EndPass
where EndPass is sent if and only if the reader yields end_pass=True.

An example is as follows:
def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        print paddle.test(...)

paddle.train(topology, parameters, reader, event_handler)
If we are writing a PaddlePaddle program in and for IPython/Jupyter, we can use matplotlib in the event handler to plot a curve of cost/error versus iterations, as shown here.
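For instance, such an event handler might look roughly like the sketch below; the event.cost attribute and paddle.event.EndPass are assumptions here, since this document does not specify what data each event carries:

import matplotlib.pyplot as plt

costs = []

def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        costs.append(event.cost)  # assumed attribute, not specified in this doc
    if isinstance(event, paddle.event.EndPass):
        # Redraw the cost curve at the end of every pass; in a Jupyter
        # notebook with the inline backend this updates the plot cell.
        plt.plot(costs)
        plt.xlabel("iteration")
        plt.ylabel("cost")
        plt.show()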
Distributed Training
If a user wants to do distributed training on a cluster, s/he should call paddle.dist_train and provide access tokens to the cluster as parameters.
For example, if the user has a TLS certificate that allows access to a Kubernetes cluster, s/he should be able to call
paddle.dist_train(model,
                  trainer=paddle.trainer.SGD(...,
                                             paddle.updater.Adam(...)),
                  reader=read,
                  k8s_user="yi",
                  k8s_token="kube_cluster_tls.pem",
                  k8s_job="hello",
                  num_parameter_servers=15)
The pseudo code of paddle.dist_train is as follows:
def dist_train(topology, parameters, trainer, reader, ...):
    if os.getenv("KUBERNETES_SERVICE_HOST") is None:
        image_name = k8s_user + '/' + k8s_job
        docker_build(image_name)
        docker_push()
        kube_ctrl_start_job(image_name, k8s_user, k8s_token)
    else:
        rank = kube_list_containers_in_job_and_return_current_containers_rank()
        if rank == 0:
            master()
        elif rank < 15:
            parameter_server()
        else:
            trainer.train(model, reader=read)
Please be aware that if a process is running on the Kubernetes cluster, it will have some environment variables pre-defined.
If dist_train doesn’t see these environment variables, it knows that it’s running on the user’s personal computer, and it should work as a launcher. Otherwise, it knows that it’s running on the cluster and needs to figure out its role as either the master, a trainer, or a parameter server.