`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message.
## Layer Functions
A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers.
A layer function takes `Variable`s and configuration parameters as its input and returns the output variable(s).
For example, `FullyConnected` takes one or more variables as its input. Each input can be input data or another layer's output. A `FullyConnected` layer has many configuration options, such as layer size, activation, parameter names, and the initialization strategies of its parameters. The `FullyConnected` layer returns one output variable.
### Necessity for reusing code between layer functions
A lot of code can be reused across layer functions, for example:
* Give default values to configuration options, e.g., the default initialization strategy for parameters is uniform random with `min = -1.0`, `max = 1.0`, and the default initialization strategy for the bias is to fill it with zeros.
* Append the activation operator.
* Create a temporary variable.
* Create parameter.
* Generate a unique name.
* Add a bias.
* ...
A mechanism to reuse code between layer functions is necessary. It will be around [150 lines of code](https://github.com/PaddlePaddle/Paddle/pull/4724/files#diff-823b27e07e93914ada859232ae23f846R12) if we write a `FullyConnected` layer without any helper functions.
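To make the list above more concrete, a few of these shared pieces might be sketched as follows. This is only an illustration: the function names `unique_name`, `default_param_attr`, and `default_bias_attr`, and the dictionary keys they use, are assumptions made for this sketch, not names confirmed by the framework.

```python
import itertools
from collections import defaultdict

# Hypothetical helpers for illustration only; names and keys are assumptions.
_counters = defaultdict(itertools.count)  # one counter per name prefix


def unique_name(prefix):
    """Return `prefix_0`, `prefix_1`, ... so every layer gets a distinct name."""
    return "%s_%d" % (prefix, next(_counters[prefix]))


def default_param_attr(attr=None):
    """Fill in the default initializer: uniform random in [-1.0, 1.0]."""
    merged = {"init": "uniform", "min": -1.0, "max": 1.0}
    merged.update(attr or {})  # user-supplied settings override the defaults
    return merged


def default_bias_attr(attr=None):
    """Fill in the default bias initializer: fill with zeros."""
    merged = {"init": "constant", "value": 0.0}
    merged.update(attr or {})
    return merged
```

Every layer function would need the same few lines, which is why sharing them matters.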
### Comparison between global functions and helper class
One option is to provide the shared code as global helper functions and write the `FullyConnected` layer in terms of them.
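A rough, self-contained sketch of the global-function approach is given here. The helpers `unique_name`, `create_parameter`, `create_tmp_var`, and `append_op`, the toy `Variable` tuple, and the module-level `ops` list are all stand-ins invented for this illustration; they are not the framework's API.

```python
import itertools
from collections import namedtuple

# Toy stand-ins so the sketch runs on its own; not the real framework classes.
Variable = namedtuple("Variable", ["name", "shape"])
ops = []  # operators recorded by layers, in creation order
_ids = itertools.count()


def unique_name(prefix):
    return "%s_%d" % (prefix, next(_ids))


def create_parameter(name, shape, attr):
    return Variable(name, tuple(shape))


def create_tmp_var(name, shape):
    return Variable(unique_name(name + ".tmp"), tuple(shape))


def append_op(op_type, inputs, output):
    ops.append((op_type, [v.name for v in inputs], output.name))
    return output


def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
    # Note how `name`, `param_attr`, ... must be threaded through every call.
    name = name or unique_name("fc")
    inputs = [input] if isinstance(input, Variable) else list(input)
    param_attr = param_attr or {"init": "uniform", "min": -1.0, "max": 1.0}
    out_shape = (inputs[0].shape[0], size)

    mul_results = []
    for i, ipt in enumerate(inputs):
        w = create_parameter("%s.w_%d" % (name, i), (ipt.shape[1], size), param_attr)
        mul_results.append(append_op("mul", [ipt, w], create_tmp_var(name, out_shape)))

    out = mul_results[0]
    if len(mul_results) > 1:  # sum the per-input products
        out = append_op("sum", mul_results, create_tmp_var(name, out_shape))
    if bias_attr is not False:  # in this sketch, `False` disables the bias
        b = create_parameter(name + ".b", (size,),
                             bias_attr or {"init": "constant", "value": 0.0})
        out = append_op("elementwise_add", [out, b], create_tmp_var(name, out_shape))
    if act is not None:
        out = append_op(act, [out], create_tmp_var(name, out_shape))
    return out
```

Even in this reduced form, most of the body is bookkeeping that every other layer would have to repeat.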
The input to the feed operator is a special variable in the global scope, which is the output of [Python readers](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md).
We can provide many helper functions for layer developers. However, global helper functions have several disadvantages:
1. We need a namespace for these functions so that layer developers can quickly figure out which ones they can use.
2. Global functions force layer developers to pass the same parameters again on every call.
So we provide a helper class, `LayerHelper`, to share code between layer functions, and the `FullyConnected` layer is written on top of it.
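As a minimal sketch of the helper-class approach, assuming a toy `Variable` tuple and a module-level `ops` list that records operators: the method names `inputs`, `create_parameter`, `append_op`, `add_sum`, and `add_bias` are hypothetical and chosen for illustration, and only the `**kwargs` constructor and `add_activation` follow the design described in this document.

```python
import itertools
from collections import namedtuple

# Toy stand-ins so the sketch runs on its own; not the real framework classes.
Variable = namedtuple("Variable", ["name", "shape"])
ops = []  # operators recorded by layers, in creation order


class LayerHelper(object):
    _ids = itertools.count()

    def __init__(self, **kwargs):
        self.kwargs = kwargs  # every layer argument is stored exactly once
        self.name = kwargs.get("name") or "layer_%d" % next(LayerHelper._ids)

    def inputs(self):
        ipt = self.kwargs["input"]
        return [ipt] if isinstance(ipt, Variable) else list(ipt)

    def create_parameter(self, suffix, shape):
        return Variable("%s.%s" % (self.name, suffix), tuple(shape))

    def append_op(self, op_type, inputs, shape):
        out = Variable("%s.tmp_%d" % (self.name, len(ops)), tuple(shape))
        ops.append((op_type, [v.name for v in inputs], out.name))
        return out

    def add_sum(self, variables):
        if len(variables) == 1:
            return variables[0]
        return self.append_op("sum", variables, variables[0].shape)

    def add_bias(self, var):
        if self.kwargs.get("bias_attr") is False:  # `False` disables the bias here
            return var
        b = self.create_parameter("b", var.shape[1:])
        return self.append_op("elementwise_add", [var, b], var.shape)

    def add_activation(self, var):
        act = self.kwargs.get("act")
        if act is None:  # layers without an activation skip this step
            return var
        return self.append_op(act, [var], var.shape)


def fc_layer(input, size, param_attr=None, bias_attr=None, act=None, name=None):
    helper = LayerHelper(**locals())  # hand over all arguments once
    mul_results = []
    for i, ipt in enumerate(helper.inputs()):
        w = helper.create_parameter("w_%d" % i, (ipt.shape[1], size))
        mul_results.append(helper.append_op("mul", [ipt, w], (ipt.shape[0], size)))
    pre_bias = helper.add_sum(mul_results)
    pre_activation = helper.add_bias(pre_bias)
    return helper.add_activation(pre_activation)
```

The layer body no longer passes `name`, `bias_attr`, or `act` around; each helper method reads what it needs from the stored arguments.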
With `LayerHelper`, `fc_layer` takes fewer lines of code and is also easier to understand. In addition, layer developers can discover which functions they can invoke by typing `helper.` in a Python editor.
### Implementation of layer helper
The layer helper keeps all parameters of a layer function in a dictionary, stored as a private data member. Every method of the layer helper looks up this dictionary when it is invoked. In this way, one layer helper can serve all layer functions, even when a given layer does not use some of the operators. For example, `activation` is used by the `FullyConnected` layer and by convolution layers, but a cross-entropy layer does not use it. The example code of `add_activation` is:
```python
class LayerHelper(object):
    def __init__(self, **kwargs):  # kwargs is short for `keyword arguments`
        self.kwargs = kwargs

    def add_activation(self, input_var):
        act = self.kwargs.get("act", None)  # default value is None