Commit 790b9ce4 authored by Yu Yang, committed by GitHub

Update design doc for Python Layer (#4698)

* Update design doc for Python Layer

* Update document
Parent 186d1655
...@@ -179,40 +179,104 @@ init_attr={
`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message.
## Layer Function

A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers.

Layer functions take `Variable` and configuration parameters as their input and return the output variable(s).

For example, `FullyConnected` takes one or more variables as its input. The input could be input data or another layer's output. There are many configuration options for a `FullyConnected` layer, such as layer size, activation, parameter names, initialization strategies of parameters, and so on. The `FullyConnected` layer will return an output variable.
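As an illustration, a call to such a layer function might look like the sketch below. This is not runnable against an existing API; `fc_layer` is the function being designed in this document, and `images` is a hypothetical input variable.

```python
# Hypothetical usage sketch: `images` is assumed to be a Variable coming from
# a data source or a previous layer; the keyword arguments are configuration.
hidden = fc_layer(input=images, size=128, act="tanh", name="hidden_fc")
prediction = fc_layer(input=hidden, size=10, act="softmax")
```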
### Necessity for reusing code between layer functions
There is a lot of code that can be reused, such as:

* Giving default values for configuration, e.g., the default initialization strategy for parameters is uniform random with `min = -1.0`, `max = 1.0`, and the default initialization strategy for bias is to fill it with zeros.
* Appending the activation operator.
* Creating a temporary variable.
* Creating a parameter.
* Generating a unique name.
* Adding a bias.
* ...
A mechanism to reuse code between layer functions is necessary. Without any helper functions, writing a `FullyConnected` layer would take around [150 lines of code](https://github.com/PaddlePaddle/Paddle/pull/4724/files#diff-823b27e07e93914ada859232ae23f846R12).
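To make this concrete, the sketch below shows plausible shapes for two of these shared helpers. The names `unique_name` and `default_param_attr` come from the code example in the next section; their bodies here are assumptions for illustration only.

```python
import itertools

# Illustrative sketch only: one possible implementation of two shared helpers.
_name_counter = itertools.count()

def unique_name(prefix):
    # Generate a unique layer name such as "fc_0", "fc_1", ...
    return "%s_%d" % (prefix, next(_name_counter))

def default_param_attr(param_attr):
    # Apply the default initialization strategy when the user gives none:
    # uniform random with min = -1.0, max = 1.0, as listed above.
    if param_attr is None:
        param_attr = {"initializer": "uniform_random", "min": -1.0, "max": 1.0}
    return param_attr
```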
### Comparison between global functions and a helper class
If we provide global helper functions, the `FullyConnected` layer can be written as follows:
```python
def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
    if name is None:
        name = unique_name("fc")
    input = multiple_input(input)
    param_attr = default_param_attr(param_attr)
    param_attr = multiple_param_attr(param_attr, len(input))

    # mul
    mul_results = []
    for ipt, attr in zip(input, param_attr):
        shape = ipt.shape[1:] + [size]
        w = g_program.global_block().create_parameter(shape, ipt.dtype, name, attr)
        tmp = create_tmp_var(name)
        g_program.current_block().append_op("mul", {ipt, w}, {tmp})
        mul_results.append(tmp)

    # add sum
    ...
    # add bias
    ...
    # add activation
    ...
    return out
```
We can provide many helper functions for layer developers. However, global helper functions have several disadvantages:
1. We need a namespace for these methods so that layer developers can quickly figure out which methods they can use.
2. Global functions force layer developers to pass the same parameters over and over again.
So we provide a helper class, `LayerHelper`, to share code between layer functions. The `FullyConnected` layer will then be written as follows:
```python
def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
    helper = LayerHelper(**locals())  # pass all parameters to LayerHelper

    mul_results = []
    for ipt, param in helper.iter_multiple_input_and_param():
        w = helper.create_parameter(shape=ipt.shape[1:] + [size], dtype=ipt.dtype)
        tmp = helper.create_tmp_variable()
        helper.append_op('mul', {ipt, w}, {tmp})
        mul_results.append(tmp)

    pre_bias = helper.add_sum(mul_results)
    pre_activation = helper.add_bias(pre_bias)
    return helper.add_activation(pre_activation)
```
Not only does this take fewer lines of code to write `fc_layer`, but it also makes the code clearer and easier to understand. At the same time, layer developers can figure out which functions they can invoke by typing `helper.` in a Python editor.
### Implementation of layer helper
We simply keep all parameters of a layer function in a dictionary in the layer helper, as a private data member. Every method of the layer helper looks up this dictionary when it is invoked. In this way, we can implement one layer helper for all layer functions, even though some layers do not contain certain operators. For example, `activation` is used by the FullyConnected and convolution layers, but a cross-entropy layer does not use it. The example code of `add_activation` is:
```python
class LayerHelper(object):
    def __init__(self, **kwargs):  # kwargs is short for `keyword arguments`
        self.kwargs = kwargs

    def add_activation(self, input_var):
        act = self.kwargs.get("act", None)  # default value is None
        if act is None:  # do nothing if there is no act
            return input_var
        tmp = self.create_tmp_variable()
        self.append_op(type=act, input=input_var, output=tmp)
        return tmp
```
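Other helper methods can follow the same pattern of looking up their configuration in `self.kwargs`. The sketch below shows what an `add_bias` method might look like; the method body, the parameter shape, and the `"elementwise_add"` op name are assumptions for illustration, not the actual implementation.

```python
class LayerHelper(object):
    # ... __init__ and add_activation as shown above ...

    def add_bias(self, input_var):
        bias_attr = self.kwargs.get("bias_attr", None)  # default value is None
        if bias_attr is None:  # do nothing if the layer has no bias
            return input_var
        # A real implementation would likely also use `bias_attr` to decide how
        # to initialize `b`.
        b = self.create_parameter(shape=[input_var.shape[-1]],
                                  dtype=input_var.dtype)
        tmp = self.create_tmp_variable()
        # The op name "elementwise_add" is only a placeholder for this sketch.
        self.append_op(type="elementwise_add", input=[input_var, b], output=tmp)
        return tmp
```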
## Optimizer
...