Update api.md

dd229dc7 · Yi Wang · 36036c0e · dd229dc7

Showing 90 additions and 38 deletions

doc/design/api.md doc/design/api.md +90 -38

--- a/doc/design/api.md
+++ b/doc/design/api.md
-import yi_json
+# Design Doc: PaddlePaddle API
-g = 100
+## Ingredients
-def read():
-    queue q;
+As the first step of our design, we list important concepts in deep
-    # warmup q
+learning and try to figure out their relationships, as shown below:
-    for i = 0 : 1000
-        q.push(read())
+```
-    yield q.shuffle_get()
+Model = {topology, parameters}
-input = paddle.layer.data(...)
+Evaluator = {Model*, activations}
-intermediate = paddle.layers.fc(input)
+- forward
-output = paddle.layer.softmax(intermediate)
+- test
-model = paddle.model.create(output)
+GradientMachine = {Model*, gradients}
+- backward
-train(model, data_provider=read, cluster="clusterId")
+Optimizer = {Model*, Evaluator*, GradientMachine*}
-#--------------------------------------------------------------------------------
+- train
+- update
-# 1. package, docker build, docker push
+- checkpoint
-# 2. kubectl, clusterId Kuberentes job, 10 trainer containers, 5 parameter server containers
+```
-#--------------------------------------------------------------------------------
+where the pair of curly braces `{` and `}` indicates *composition*, `*`
+indicates a *reference*, and `-` marks a "class method".
-def train():
-    if os.environ["kube_api_server"] == nil:
-        docker_build()
+### Model
-        docker_push()
-        kube_ctrl_start_job()
+We used to think that parameters are part of the topology (or layers).
-    else:
+But that is not true, because multiple layers could share the same
-        rank = kube_mpi_rank()
+parameter matrix.  An example is a network that compares two text
-        if rank == 0:
+segments in a semantic space:
-            master()
-        elif rank >= 15:
+```
-            parameter_server()
+          semantic
-        else:
+text A -> projection ---\
-            _train()
+          layer A        \
+                          cosine
+                          similarity -> output
+                          layer
+          semantic       /
+text B -> projection ---/
+          layer B
+```
+In this network, the two semantic projection layers (A and B) share
+the same parameter matrix.
+For more information about our API that specifies topology and
+parameter sharing, please refer to [TODO: API].
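The separation of topology from parameters can be sketched as follows. This is a minimal illustration, not the PaddlePaddle API: the `Model` class, `add_layer`, `parameter_of`, and the layer and parameter names are all hypothetical, and a nested list stands in for a real parameter matrix.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical Model = {topology, parameters}."""

    topology: dict = field(default_factory=dict)    # layer name -> parameter name
    parameters: dict = field(default_factory=dict)  # parameter name -> matrix

    def add_layer(self, layer, param, rows, cols):
        # Create the matrix only once; later layers naming it reuse it.
        if param not in self.parameters:
            self.parameters[param] = [[0.0] * cols for _ in range(rows)]
        self.topology[layer] = param

    def parameter_of(self, layer):
        return self.parameters[self.topology[layer]]


model = Model()
# Both projection layers name the same parameter, so they share one matrix.
model.add_layer("projection_a", "shared_projection", rows=4, cols=3)
model.add_layer("projection_b", "shared_projection", rows=4, cols=3)
```

Because layers refer to parameters by name, the topology records two layers while the parameter store holds a single matrix.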
+### Evaluator
+Suppose that we have a trained ranking model and want to use it in
+our search engine.  The search engine's Web server is a concurrent
+program so that it can serve many HTTP requests simultaneously.  It
+doesn't make sense for each of these threads to have its own copy of
+the model, because that would duplicate topologies and parameters.
+However, each thread should be able to record layer outputs, i.e.,
+activations, computed from an input derived from the request.  With
+an *Evaluator* that saves activations, we can write the
+over-simplified server program as:
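+```python
+# `http` stands for some Web framework; it is not a specific library.
+m = paddle.model.load("trained.model")
+
+def handle(req):
+    e = paddle.evaluator.create(m)  # per-request state; the model is shared
+    e.forward(req)
+    return e.activation(layer="output")  # returns activations of layer "output"
+
+http.handle("/", handle)
+```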
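The sharing pattern behind the server program can be tried out with plain threads. The sketch below is an assumption-laden toy, not the proposed API: `Model`, `Evaluator`, and `handle` are hypothetical stand-ins, the "network" is a single dot product, and threads play the role of HTTP request handlers.

```python
import threading


class Model:
    """Hypothetical shared model; treated as read-only after loading."""

    def __init__(self, weights):
        self.weights = weights


class Evaluator:
    """Holds a reference to the model plus its own activations."""

    def __init__(self, model):
        self.model = model     # reference, not a copy
        self.activations = {}  # private to this evaluator

    def forward(self, features):
        self.activations["output"] = sum(
            w * x for w, x in zip(self.model.weights, features)
        )

    def activation(self, layer):
        return self.activations[layer]


model = Model([1.0, 2.0, 3.0])
results = {}


def handle(request_id, features):
    # One evaluator per request, all pointing at the same model.
    e = Evaluator(model)
    e.forward(features)
    results[request_id] = e.activation("output")


threads = [
    threading.Thread(target=handle, args=(i, [i, i, i])) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread writes only to its own evaluator's activations, so the shared model needs no locking as long as nothing mutates it while serving.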
+### GradientMachine
+Similar to evaluation, training needs to compute gradients in order
+to update model parameters.  Because an [optimizer](#optimizer) might
+run multiple simultaneous threads to update the same model, gradients
+should be kept separate from the model.  Because gradients are used
+only in training, not in serving, they should also be kept separate
+from Evaluator.  Hence the `GradientMachine`.
+### Optimizer
+None of Model, Evaluator, or GradientMachine implements the training
+loop, hence Optimizer.  We can define a concurrent optimizer that runs
+multiple simultaneous threads to train a model: just let each
+thread have its own GradientMachine object.
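A concurrent optimizer along these lines might look like the sketch below. Everything in it is hypothetical and chosen for illustration: the class layouts, the `train` and `update` signatures, the one-parameter model fitted to `y = 2 * x`, and the use of a lock around parameter writes are assumptions, not the design's actual interface.

```python
import threading


class Model:
    """Holds parameters only; here a single scalar weight `w`."""

    def __init__(self):
        self.parameters = {"w": 0.0}


class GradientMachine:
    """Per-thread gradient storage that references (does not copy) a Model."""

    def __init__(self, model):
        self.model = model
        self.gradients = {"w": 0.0}

    def backward(self, x, y):
        # Gradient of 0.5 * (w * x - y) ** 2 with respect to w.
        w = self.model.parameters["w"]
        self.gradients["w"] = (w * x - y) * x


class Optimizer:
    """Runs the training loop with one worker thread per data shard."""

    def __init__(self, model, learning_rate=0.1):
        self.model = model
        self.learning_rate = learning_rate
        self.machines = []  # kept only so the sketch can be inspected
        self._lock = threading.Lock()

    def update(self, machine):
        # Serialize writes to the shared parameters.
        with self._lock:
            for name, grad in machine.gradients.items():
                self.model.parameters[name] -= self.learning_rate * grad

    def _worker(self, shard, epochs):
        machine = GradientMachine(self.model)  # private to this thread
        self.machines.append(machine)
        for _ in range(epochs):
            for x, y in shard:
                machine.backward(x, y)
                self.update(machine)

    def train(self, shards, epochs=200):
        threads = [
            threading.Thread(target=self._worker, args=(shard, epochs))
            for shard in shards
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


model = Model()
optimizer = Optimizer(model)
# Two shards of samples drawn from y = 2 * x.
optimizer.train([[(1.0, 2.0), (0.5, 1.0)], [(0.25, 0.5), (0.75, 1.5)]])
```

In this sketch gradients are computed without holding the lock, so a thread may occasionally use a slightly stale weight; only the write to the shared parameters is serialized. Whether the real design locks, averages gradients, or tolerates races is left open by the text above.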