`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message.
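A rough sketch of this arrangement (the class shape, the `append_optimize_op` helper, and the `sgd` operator name are hypothetical, for illustration only):

```python
class Parameter(Variable):
    def __init__(self, name, shape, dtype, optimize_op_attrs=None, **kwargs):
        super(Parameter, self).__init__(name=name, shape=shape, dtype=dtype, **kwargs)
        # Kept only on the Python side; never serialized into the VarDesc message.
        self.optimize_op_attrs = optimize_op_attrs or {}

def append_optimize_op(block, param, grad):
    # The attrs are consumed here, when the optimize operator's OpDesc is
    # created, and therefore end up in that OpDesc message.
    return block.append_operator("sgd", Param=param, Grad=grad,
                                 **param.optimize_op_attrs)
```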
## Layer Function
A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers.
Layer functions take `Variable` and configuration parameters as their input and return the output variable(s).

For example, `FullyConnected` takes one or more variables as its input. The input could be input data or another layer's output. There are many configuration options for a `FullyConnected` layer, such as layer size, activation, parameter names, initialization strategies of parameters, and so on. The `FullyConnected` layer will return an output variable.

### Data Layer

The input to the feed operator is a special variable in the global scope, which is the output of [Python readers](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md).
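A minimal sketch of a data layer that creates such a variable in the global scope and prepends a `Feed` operator (`global_block`, `create_global_var`, and `prepend_operator` are assumed names here, not a confirmed API):

```python
def data_layer(name, shape, dtype, column_name):
    # Assumed API: the program owns a global block that holds feed targets.
    block = program.global_block()
    # The special variable in the global scope that the reader's output is fed into.
    var = block.create_global_var(name=name, shape=[None] + shape, dtype=dtype)
    # Prepend the feed operator so the variable is filled before any other operator runs.
    block.prepend_operator(type="Feed",
                           inputs=None,
                           outputs=[var],
                           attrs={"column_name": column_name})
    return var
```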
### Necessity for reusing code between layer functions
There is a lot of code that can be reused, such as:

* Give default values for configuration, e.g., the default initialization strategy for parameters is uniform random with `min = -1.0`, `max = 1.0`, and the default initialization strategy for bias is to fill it with zero.
* Append the activation operator.
* Create a temporary variable.
* Create a parameter.
* Generate a unique name.
* Add a bias.
* ...
A mechanism to reuse code between layer functions is necessary. It would take around [150 lines of code](https://github.com/PaddlePaddle/Paddle/pull/4724/files#diff-823b27e07e93914ada859232ae23f846R12) to write a `FullyConnected` layer without any helper functions.
### Comparison between global functions and helper class

The `FullyConnected` layer would be as follows if we only provided global functions:

```python
def fc_layer(input, size, ...):
    block = program.current_block()
    w = block.create_parameter(...)
    b = block.create_parameter(...)
    out = block.create_var()
    op = block.append_operator("FC", X=input, W=w, b=b, out=out)
    return out
```

We can provide many helper functions for layer developers. However, global helper functions have several disadvantages:

1. We need a namespace for these methods, so that layer developers can quickly figure out which methods they can use.
2. Global functions force layer developers to pass the same parameters over and over again.

So we provide a helper class, `LayerHelper`, to share code between layer functions. The `FullyConnected` layer then becomes:

```python
def fc_layer(input, size, ...):
    helper = LayerHelper(locals())  # pass all parameters to LayerHelper

    mul_results = []
    for ipt, param in helper.iter_multiple_input_and_param():
        w = helper.create_parameter(shape=ipt.shape[1:] + [size], dtype=ipt.dtype)
        tmp = helper.create_tmp_variable()
        helper.append_op('mul', {ipt, w}, {tmp})
        mul_results.append(tmp)

    pre_bias = helper.add_sum(mul_results)
    pre_activation = helper.add_bias(pre_bias)
    return helper.add_activation(pre_activation)
```
We not only use fewer lines of code to write `fc_layer`, but also make the code easier to understand. At the same time, layer developers can figure out which functions they can invoke by typing `helper.` in a Python editor.
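From the application programmer's side, every keyword argument of the layer call ends up in the helper's kwargs dictionary via `locals()`. A hypothetical usage (the variable names and activation strings are illustrative, not part of the design above) might look like:

```python
# act="relu" lands in LayerHelper's kwargs, so add_activation later finds it under "act".
hidden = fc_layer(input=image, size=200, act="relu")
prediction = fc_layer(input=hidden, size=10, act="softmax")
```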
### Implementation of layer helper
We simply keep all parameters of a layer function as a dictionary in the layer helper, as a private data member. Every method of the layer helper looks up this dictionary when it is invoked. In that way, we can implement one layer helper for all layer functions, even though some layers do not contain some operators. For example, the `activation` is used by the FullyConnected and convolution layers, but a cross-entropy layer does not use it. The example code of `add_activation` is:
```python
class LayerHelper(object):
    def __init__(self, **kwargs):  # kwargs is short for `keyword arguments`
        self.kwargs = kwargs

    def add_activation(self, input_var):
        act = self.kwargs.get("act", None)  # default value is None
        if act is None:  # do nothing if no act
            return input_var
        # Otherwise append the activation operator, reusing the helper methods shown above.
        tmp = self.create_tmp_variable()
        self.append_op(act, {input_var}, {tmp})
        return tmp
```
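Other helper methods can follow the same lookup pattern. As a purely hypothetical sketch (the `add_bias` body, the `bias_attr` key, and the `elementwise_add` operator name are assumptions, not taken from the design above), a bias helper would simply do nothing for layers configured without a bias:

```python
    def add_bias(self, input_var):
        bias_attr = self.kwargs.get("bias_attr", None)  # hypothetical configuration key
        if bias_attr is None:  # the layer is configured without a bias
            return input_var
        # Assumed shape: one bias value per output column of input_var.
        b = self.create_parameter(shape=[input_var.shape[-1]], dtype=input_var.dtype)
        tmp = self.create_tmp_variable()
        self.append_op('elementwise_add', {input_var, b}, {tmp})  # op name assumed
        return tmp
```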