diff --git a/doc/design/api.md b/doc/design/api.md
index 5a90cd1c7562387bdf723181f34d70b2afc079a1..3cfb67cb00d7914fac7203b2a2e462af2138ed0c 100644
--- a/doc/design/api.md
+++ b/doc/design/api.md
@@ -1,38 +1,90 @@
-import yi_json
-
-g = 100
-def read():
-    queue q;
-    # warmup q
-    for i = 0 : 1000
-        q.push(read())
-    yield q.shuffle_get()
-
-input = paddle.layer.data(...)
-intermediate = paddle.layers.fc(input)
-output = paddle.layer.softmax(intermediate)
-
-model = paddle.model.create(output)
-
-train(model, data_provider=read, cluster="clusterId")
-
-#--------------------------------------------------------------------------------
-
-# 1. package, docker build, docker push
-# 2. kubectl, clusterId Kuberentes job, 10 trainer containers, 5 parameter server containers
-
-#--------------------------------------------------------------------------------
-
-def train():
-    if os.environ["kube_api_server"] == nil:
-        docker_build()
-        docker_push()
-        kube_ctrl_start_job()
-    else:
-        rank = kube_mpi_rank()
-        if rank == 0:
-            master()
-        elif rank >= 15:
-            parameter_server()
-        else:
-            _train()
+# Design Doc: PaddlePaddle API
+
+## Ingredients
+
+As the first step of our design, we list the important concepts in deep
+learning and try to figure out their relationships, as shown below:
+
+```
+Model = {topology, parameters}
+
+Evaluator = {Model*, activations}
+- forward
+- test
+
+GradientMachine = {Model*, gradients}
+- backward
+
+Optimizer = {Model*, Evaluator*, GradientMachine*}
+- train
+- update
+- checkpoint
+```
+
+where the pair of curly braces `{` and `}` indicates *composition*, `*`
+indicates a *reference*, and `-` marks a "class method".
+
+
+### Model
+
+We used to think that parameters are part of the topology (or layers).
+But that is not true, because multiple layers could share the same
+parameter matrix. An example is a network that compares two text
+segments in a semantic space:
+
+```
+          semantic
+text A -> projection ---\
+          layer A        \
+                          cosine
+                          similarity -> output
+                          layer
+          semantic       /
+text B -> projection ---/
+          layer B
+```
+
+In this network, the two semantic projection layers (A and B) share
+the same parameter matrix.
+
+For more information about our API that specifies topology and
+parameter sharing, please refer to [TODO: API].
+
+
+### Evaluator
+
+Suppose that we have a trained ranking model; we should be able to
+use it in our search engine. The search engine's Web server is a
+concurrent program that serves many HTTP requests simultaneously. It
+doesn't make sense for each of these threads to have its own copy of
+the model, because that would duplicate topologies and parameters.
+However, each thread should be able to record layer outputs, i.e.,
+activations, computed from an input derived from the request. With an
+*Evaluator* that saves activations, we can write the over-simplified
+server program as:
+
+```python
+m = paddle.model.load("trained.model")
+
+def serve(req):
+    # Each request gets its own Evaluator; the model is shared.
+    e = paddle.evaluator.create(m)
+    e.forward(req)
+    return e.activation(layer="output")  # activations of layer "output"
+
+http.handle("/", serve)
+```
+
+### GradientMachine
+
+Similar to evaluation, training needs to compute gradients so as to
+update model parameters. Because an [optimizer](#optimizer) might run
+multiple simultaneous threads to update the same model, gradients
+should be kept separate from the model. And because gradients are used
+only in training, not in serving, they should also be separate from the
+Evaluator. Hence the `GradientMachine`.
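+
+To make the separation concrete, here is a minimal sketch. It is an
+assumption, not settled API: the constructor name
+`paddle.gradientmachine.create` and the `backward(evaluator)` signature
+are made up for illustration only.
+
+```python
+m = paddle.model.load("initial.model")    # shared topology + parameters
+
+def train_step(batch):
+    # Hypothetical per-thread objects; only the model is shared.
+    e = paddle.evaluator.create(m)        # holds activations
+    g = paddle.gradientmachine.create(m)  # holds gradients
+    e.forward(batch)
+    g.backward(e)  # assumed signature: backward pass over e's activations
+    return g       # gradients live in the GradientMachine, not the model
+```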
+
+### Optimizer
+
+None of Model, Evaluator, or GradientMachine implements the training
+loop, hence the Optimizer. We can define a concurrent optimizer that
+runs multiple simultaneous threads to train a model -- just let each
+thread have its own GradientMachine object.
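+
+As a rough illustration only -- the thread structure and the names
+`paddle.optimizer.create`, `opt.update`, and `shards` below are
+assumptions, not part of the proposed API -- a concurrent optimizer
+could look like:
+
+```python
+import threading
+
+m = paddle.model.load("initial.model")  # one shared model
+opt = paddle.optimizer.create(m)        # hypothetical constructor
+
+def worker(batches):
+    # Each thread owns an Evaluator and a GradientMachine, so
+    # activations and gradients are never shared across threads.
+    e = paddle.evaluator.create(m)
+    g = paddle.gradientmachine.create(m)
+    for batch in batches:
+        e.forward(batch)
+        g.backward(e)
+        opt.update(g)  # assumed: applies g's gradients to the shared model
+
+# `shards` is assumed to be a list of per-thread minibatch streams.
+threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
+for t in threads:
+    t.start()
+for t in threads:
+    t.join()
+```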