The top-level user API in Python should be the same as the API in `paddle.v2` after refactoring Paddle from a layer-based framework to an operator-based framework. There are many new C++ classes at [compile time] for describing neural networks, such as `Variable`, `Operator`, and `Block`. The open question in the current design is how to wrap the C++ API into the `paddle.v2` API and how to write layers in Python.
Due to the refactoring of the PaddlePaddle core, we need Python classes to construct the corresponding protobuf messages that describe a DL program.
The implementation of the Python API involves two steps.
1. Implement the Python API using current C++ runtime concepts.
2. Replace the implementation by using compile-time concepts when they are completed.
The first step is temporary. The Python API concepts should be designed around `compile-time` concepts; `runtime` classes are used to implement them only for now.
## Python Class and compile-time protobuf
Since we design our Python API concepts based on `compile-time`, we try to map our Python classes to every compile-time result, i.e., the protobuf messages. They are:
| Python Class | Compile-time protobuf |
| --- | --- |
| Program | ProgramDesc |
| Block | BlockDesc |
| Operator | OpDesc |
| Variable | VarDesc |
Please be aware that these Python classes need to maintain some construction-time information, which is not part of the protobuf messages.
## Core Concepts
### Program
`Program` is the description of the whole training process. There can only be one `Program` object, which is created automatically by the system at the very beginning. A `Program` consists of a series of `Block`s.
A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), which is composed of an array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks.
Whenever we create a block, we need to set its parent block to the current block, so the Python class `Program` needs to maintain a data member `current_block`.
```python
# `core` refers to the C++ bindings.
class Program(object):
    def __init__(self):
        self.proto = core.NewProgram()  # a C++ ProgramDesc pointer.
        self.blocks = []  # list of Block instances
        self.blocks.append(Block(self, -1))  # the global block
        self.current_block = 0  # initialized to the global block
```
`Program` creates the first block in its constructor. This first block is called the 'global block'; all parameters and their initializer operators are stored there.

`Program` is an accessor to the protobuf message `ProgramDesc`, which is created in C++ space because the `InferShape` function is in C++. `InferShape` manipulates `VarDesc` messages, which are members of `BlockDesc`, which is in turn a member of `ProgramDesc`.

### Block

A block is like a `{}` scope in a programming language: it contains operators and variables. There are two data fields in `Block`: 1) an associative map whose keys are variable names and whose values are the variables themselves; 2) a list of operators.
Blocks are hierarchical because PaddlePaddle supports RNN and IfElse. For example, an RNN is like a `for` loop in a programming language, and there is a new `block` inside the loop. To represent the hierarchy, a `Block` stores the index of its parent `Block`. The index is the block's position in `Program`'s `blocks`. A block without a parent (`parent_idx` is `None`, or `-1` as passed by the `Program` constructor) is the outermost block, i.e., the 'global block'.
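The parent-index scheme can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the PaddlePaddle implementation: `ToyProgram`, `ToyBlock`, and `find_var` are hypothetical names, variables are stored as plain values, and `-1` is assumed as the "no parent" sentinel.

```python
class ToyBlock(object):
    """Hypothetical stand-in for `Block`: holds variables and a parent index."""

    def __init__(self, program, idx, parent_idx):
        self.program = program
        self.idx = idx
        self.parent_idx = parent_idx  # -1 means "no parent" (assumed sentinel)
        self.vars = {}                # variable name -> value

    def find_var(self, name):
        """Look up `name` here, then in ancestor blocks (hypothetical helper)."""
        block = self
        while True:
            if name in block.vars:
                return block.vars[name]
            if block.parent_idx == -1:
                raise KeyError(name)
            block = block.program.blocks[block.parent_idx]


class ToyProgram(object):
    """Hypothetical stand-in for `Program`: an array of blocks."""

    def __init__(self):
        self.blocks = [ToyBlock(self, 0, -1)]  # the global block
        self.current_block_idx = 0

    def create_block(self):
        # The new block's parent is whatever block is current right now.
        idx = len(self.blocks)
        self.blocks.append(ToyBlock(self, idx, self.current_block_idx))
        self.current_block_idx = idx
        return self.blocks[idx]


program = ToyProgram()
program.blocks[0].vars["w"] = "global parameter"
step_block = program.create_block()  # e.g. the step block of an RNN
step_block.vars["h"] = "local state"
```

In this sketch, `step_block.find_var("w")` walks from the step block up to the global block, while looking up `"h"` from the global block fails because parents never see their children's variables.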
A [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md) includes
1. a map from variable names to an instance of the Python `Variable` class, and
1. a list of `Operator` instances.
```python
class Block(object):
    def __init__(self, program, parent_idx):
        self.proto = core.NewBlock(program.proto)
        self.program = program
        self.vars = {}  # maps variable name -> Variable
        self.ops = []   # list of Operator instances
        self.parent_idx = parent_idx

    def create_var(self, *args, **kwargs):
        # create a variable in this block
        return Variable(self, *args, **kwargs)

    def create_global_var(self, *args, **kwargs):
        # global variables always live in the global block
        return self.program.global_block().create_var(*args, **kwargs)

    def create_parameter(self, name, *args, **kwargs):
        # Parameter is a subclass of variable. See Parameter section for details.
        ...
```
Users are able to create a global variable inside any block, since they may create parameters inside an RNN or IfElse. All parameters should be stored in the global block, not in the step block of an RNN.
Users can create local variables for the outputs of operators. Users can also append and prepend operators in the current block. Prepending a `random initialize` operator or a `load` operator is very useful for initializing parameters before training.
`create_parameter` is necessary because parameters are global variables, those defined in the global block, but can be created in some sub-blocks, e.g., an FC layer in the step block of an RNN operator.
`prepend_operator` is necessary because the constructor of `Parameter` needs to create the initialize (or load) operator of the parameter, and would like to put it in the *preamble* of the global block.
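The difference between appending and prepending can be illustrated with a plain list of operator names. This sketch only assumes the intended ordering described above; `ToyGlobalBlock` and the operator names are hypothetical.

```python
class ToyGlobalBlock(object):
    """Hypothetical global block that only records operator names in order."""

    def __init__(self):
        self.ops = []

    def append_operator(self, op_type):
        self.ops.append(op_type)

    def prepend_operator(self, op_type):
        # Insert at the front so the operator comes before the existing ones.
        self.ops.insert(0, op_type)


block = ToyGlobalBlock()
block.append_operator("mul")  # forward computation
block.append_operator("add")
block.prepend_operator("uniform_random")  # initializer of a parameter created later
print(block.ops)  # ['uniform_random', 'mul', 'add']
```

Even though the parameter is created after some forward operators were appended, its initializer ends up at the front of the operator list.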
### Operator
The `Operator` class packs the inputs, outputs, and attributes of the operator into a protobuf `OpDesc` and creates a C++ `OpDesc` instance. `infer_shape` is performed on the C++ objects.
The `Operator` class fills in the `OpDesc` message and calls the C++ function `InferShape` to infer output shape from input shape.
After creating a C++ `OpDesc`, `Operator` in Python can only read attributes from the C++ side.
`Operator` creates the `OpDesc` message in C++ space so that it can call the `InferShape` function, which is in C++.
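The idea of inferring output shapes at construction time can be sketched in pure Python. The real `InferShape` is a C++ function operating on protobuf messages; here a plain dict stands in for the `OpDesc` message, and `ToyOperator` and `infer_mul_shape` are hypothetical names used only for illustration.

```python
def infer_mul_shape(x_shape, y_shape):
    """Output shape of a 2-D matrix multiply (hypothetical helper)."""
    if len(x_shape) != 2 or len(y_shape) != 2:
        raise ValueError("expected two 2-D shapes")
    if x_shape[1] != y_shape[0]:
        raise ValueError("inner dimensions do not match")
    return [x_shape[0], y_shape[1]]


class ToyOperator(object):
    """Hypothetical operator that records its description and infers shapes."""

    def __init__(self, op_type, inputs, outputs, attrs=None):
        # A plain dict stands in for the `OpDesc` message.
        self.desc = {
            "type": op_type,
            "inputs": sorted(inputs),
            "outputs": sorted(outputs),
            "attrs": dict(attrs or {}),
        }
        if op_type == "mul":
            out_shape = infer_mul_shape(inputs["X"], inputs["Y"])
            for name in outputs:
                outputs[name] = out_shape


shapes_in = {"X": [None, 784], "Y": [784, 10]}  # None marks the batch dimension
shapes_out = {"Out": None}
op = ToyOperator("mul", shapes_in, shapes_out)
```

After construction, `shapes_out["Out"]` holds the inferred shape, so shape errors surface while the program is being described rather than when it runs.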
### Variable
Operators' inputs, outputs, and parameters are all variables. In our design, a variable has four key attributes: its name (`name`), the block it belongs to (`block`), a pointer to its C++ Protobuf object (`cpp_var_desc_ptr`), and the operator that creates it (`op`). All of these attributes are initialized in the constructor except `op`, which stays `None` until the variable is used as an operator's output.
Operators take `Variable`s as their inputs and outputs.
The Protobuf object should be created in C++, not Python, because it is needed by `InferShape`, which is implemented in C++. The C++ Protobuf object is accessible from Python through `cpp_var_desc_ptr`, which is how the `shape()` function works.
Please be aware of `self.writer`, which tracks the operator that creates the variable. More than one operator may write to a variable, but in Python space, each write to a variable is represented by a `Variable` instance. This is guaranteed by the fact that **`core.NewVarDesc` must NOT create a new `VarDesc` message if its name already exists in the specified block**.
The user is allowed to build a variable without specifying its name. In that case, it is assigned an automatically generated unique name.
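Both rules, reusing an existing description for a known name and generating a unique name when none is given, can be sketched as follows. This is only an illustration: `ToyVarRegistry`, `unique_name`, `new_var_desc`, and the `tmp_<n>` naming scheme are hypothetical, and a dict stands in for the `VarDesc` message.

```python
import itertools


class ToyVarRegistry(object):
    """Hypothetical per-block registry of variable descriptions."""

    def __init__(self):
        self.descs = {}
        self._counter = itertools.count()

    def unique_name(self, prefix="tmp"):
        # Skip candidates that are already taken by user-chosen names.
        while True:
            name = "%s_%d" % (prefix, next(self._counter))
            if name not in self.descs:
                return name

    def new_var_desc(self, name=None):
        """Return the existing description for `name`, or create one."""
        if name is None:
            name = self.unique_name()
        if name not in self.descs:
            self.descs[name] = {"name": name}
        return self.descs[name]


registry = ToyVarRegistry()
first = registry.new_var_desc("hidden")
second = registry.new_var_desc("hidden")  # same name: no new description
anonymous = registry.new_var_desc()       # gets a generated name
```

Here `first` and `second` refer to the same description object, which is what lets several Python-side objects represent writes to one underlying variable.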
### Parameter
A parameter is a special kind of variable. Parameters need to be initialized at the very beginning and updated after each training batch. So if a variable is a parameter, our compiler adds an initializer op and an optimizer op for it while building the computation graph. Apart from these, there is no difference between a variable and a parameter. In other words, 'parameter' is only a label attached to variables that tells the compiler they require additional processing.
A parameter is a global variable with an initializer (or load) operator.
The class `Parameter` is derived from the class `Variable`. In addition to what variables have, parameters hold their initializing and updating information. A parameter's `self.op` will always be `None` because it can never be an operator's output.
To create a parameter, users can call
```python
program.create_parameter(
    ...,
    init_attr={
        "type": "uniform_random",
        "min": -1.0,
        "max": 1.0,
    })
```
In the above example, `init_attr.type` names an initialize operator. It can also name the load operator:

```python
init_attr = {
    "type": "load",
    "filename": "something.numpy",
}
```

`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message.

## Layer Functions

A layer is a Python function that creates some operators and variables. When it is invoked, it creates a series of operators and variables and inserts them into the block, much like a macro in C++. It is called a 'layer' because the combination of added operators acts just like a neural network layer. Layers simplify the work of application programmers.

Here are examples of how to write a data layer and FC layer:
### Data Layer
```python
def data_layer(name, type, column_name):
    block = the_current_program.global_block()
    # type = dense_vector(size=10) / integer_value(range=10)
    var = block.create_global_var(
        name=name,
        shape=[None] + type.dims(),
        dtype=type.dtype)
    block.prepend_operator(
        block,
        type="Feed",
        inputs=None,
        outputs=[var],
        # the keyword name `attrs` is an assumption; the attribute dict
        # cannot follow keyword arguments as a positional argument
        attrs={"column_name": column_name})
    return var
```
The input to the feed operator is a special variable in the global scope, which is the output of [Python readers](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md).
All new variables and operators are built in the `current block`. In the above `data_layer` code, a variable is created and inserted into the root block to make it global. This variable is used as the input data of the whole network.
In the `fc_layer` code, we create two parameters (`w` and `b`), one variable (`out`), and one operator (the `FC operator`), then insert all of them into the `current block`.