# Design Doc: Python API

The top-level user API in Python should remain the same as the `paddle.v2` API after refactoring Paddle from a layer-based framework to an operator-based framework. There are many new C++ classes for describing neural networks at compile time, such as `Variable`, `Operator`, and `Block`. The issue with the current design is how to properly wrap the C++ API into the `paddle.v2` API and write layers in Python.

This implementation of Python API includes two steps.

1. Implement the Python API using current C++ runtime concepts.
2. Replace the implementation by using compile-time concepts when they are completed.

The implementation of the first step is temporary. We should design our Python API concepts based on `compile-time` concepts; we just use `runtime` classes to implement them for now.


## Python Class and compile-time protobuf

Since we design our Python API concepts based on `compile-time` concepts, we try to map each Python class to a compile-time result, i.e., a protobuf message. They are:


| Python Class | Compile-time protobuf |
| --- | --- |
| Program | ProgramDesc |
| Block | BlockDesc |
| Operator | OpDesc |
| Variable | VarDesc |

### Program

`Program` is the description of the whole training process, and there can only be one `Program` object, which is created automatically by the system at the very beginning. A `Program` is formed by a series of `Block`s.

```python
class Program(object):
    def __init__(self):
        # block 0 is the 'global block'
        self.blocks = [Block(parent_idx=None, idx=0)]
        self.current_block_idx = 0

    def get_block(self, block_idx):
        return self.blocks[block_idx]

    def current_block(self):
        return self.get_block(self.current_block_idx)

    def fallback_current_block(self):
        self.current_block_idx = self.current_block().parent_idx

    def create_block(self):
        new_block_idx = len(self.blocks)
        self.blocks.append(Block(parent_idx=self.current_block_idx,
                                 idx=new_block_idx))
        self.current_block_idx = new_block_idx
```

`Program` will create the first block in its constructor. The first block is called 'global block'. It is where all parameters are stored.
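A minimal, self-contained sketch (not the real Paddle code; `SimpleProgram`/`SimpleBlock` are stand-in names) of how the block stack described above behaves when entering and leaving a nested block:

```python
class SimpleBlock(object):
    def __init__(self, parent_idx, idx):
        self.parent_idx = parent_idx
        self.idx = idx

class SimpleProgram(object):
    def __init__(self):
        # block 0 is the 'global block'
        self.blocks = [SimpleBlock(parent_idx=None, idx=0)]
        self.current_block_idx = 0

    def current_block(self):
        return self.blocks[self.current_block_idx]

    def create_block(self):
        new_idx = len(self.blocks)
        self.blocks.append(SimpleBlock(parent_idx=self.current_block_idx,
                                       idx=new_idx))
        self.current_block_idx = new_idx

    def fallback_current_block(self):
        self.current_block_idx = self.current_block().parent_idx

program = SimpleProgram()
program.create_block()            # enter e.g. an RNN step block
assert program.current_block().idx == 1
program.fallback_current_block()  # leave it; back to the global block
assert program.current_block().idx == 0
```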

### Block

A `Block` is like a pair of braces (`{}`) in programming languages: it contains many operators and variables. There are two data fields in `Block`: 1) an associative map whose key is the variable name and whose value is the variable itself; 2) a list of operators.

Blocks are hierarchical because PaddlePaddle supports RNN and IfElse. For example, RNN is like a `for-loop` in a programming language, and there is a new `block` inside a `for-loop`. To represent the hierarchy, each `Block` stores the index of its parent `Block`. The 'index' means the block's position in `Program`'s `blocks`. If `parent_idx` is `None`, the block itself is the outermost block, i.e., the 'global block'.


```python
class Block(object):
    def __init__(self, parent_idx, idx):
        self.vars = {}   # map: variable name -> Variable
        self.ops = []    # list of Operator
        self.idx = idx
        self.parent_idx = parent_idx

    def create_var(self, ...):
        # create a variable in `self.vars`
        return Variable(...)

    def create_global_var(self, ...):
        if self.parent_idx is not None:
            parent_block = program.get_block(self.parent_idx)
            return parent_block.create_global_var(...)
        else:
            return self.create_var(...)

    def create_parameter(self, ...):
        return self.create_global_var(...)

    def append_operator(self, ...):
        op = Operator(...)
        self.ops.append(op)
        return op

    def prepend_operator(self, ...):
        op = Operator(...)
        self.ops.insert(0, op)  # Python lists have no `prepend`
        return op
```

Users are able to create a global variable inside any block, since they may create parameters inside an RNN or IfElse. All parameters should be stored in the global block, not in the step block of an RNN.

Users can create local variables for the outputs of operators. They can also append and prepend operators in the current block. Prepending a `random initialize` operator or a `load` operator is very useful for initializing parameters before training.
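A toy illustration (with hypothetical op names) of why prepending matters: initialization ops are added after the forward ops already exist, but must run before them:

```python
ops = []

def append_operator(op):
    ops.append(op)

def prepend_operator(op):
    ops.insert(0, op)  # Python lists have no `prepend`; insert at the front

append_operator("fc")                      # forward op added while building the net
prepend_operator("uniform_random_init_w")  # initializer added later, but runs first

print(ops)  # ['uniform_random_init_w', 'fc']
```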


### Operator

The `Operator` class packs the operator's inputs, outputs, and attributes into a protobuf `OpDesc` message and creates a C++ `OpDesc` instance. `infer_shape` is performed on the C++ object.

```python
class Operator(object):
    def __init__(self, type, inputs, outputs, attrs):
        # create OpDesc in Python
        op_desc = ...
        self.cpp_op_desc_ptr = core.OpDesc(op_desc)
        cpp.infer_shape(self.cpp_op_desc_ptr, inputs, outputs)

    def type(self):
        return self.cpp_op_desc_ptr.type()
```

After creating the C++ `OpDesc`, the Python `Operator` can only read attributes from the C++ side.

### Variable

Operators' inputs, outputs, and parameters are all variables. In our design, a variable has four key attributes: its name (`name`), the block it belongs to (`block`), a pointer to its C++ protobuf object (`cpp_var_desc_ptr`), and the operator that creates it (`op`). All of these attributes are initialized in the constructor except `op`, which remains `None` until the variable is taken as an operator's output.

```python
class Variable(object):
    def __init__(self, shape, dtype="float32", name=None, block=None):
        if name is None:
            name = unique_name_generator()
        self.name = name
        self.block = block
        # build C++ Protobuf object
        self.cpp_var_desc_ptr = ...
        self.op = None

    def shape(self):
        cpp_shape = self.cpp_var_desc_ptr.shape()
        return [None if elem < 0 else elem for elem in cpp_shape]
```
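The `shape()` convention above can be checked in isolation: C++ stores unknown dimensions (e.g. the batch size) as `-1`, and the Python side reports them as `None`. A standalone version of that conversion (the function name is made up for illustration):

```python
def to_python_shape(cpp_shape):
    # -1 in the C++ VarDesc means "unknown at compile time"
    return [None if dim < 0 else dim for dim in cpp_shape]

print(to_python_shape([-1, 784]))  # [None, 784]
```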

The protobuf object should be created in C++, not Python, because it is needed by `infer_shape`, and `infer_shape` is implemented in C++. The C++ protobuf object is accessible from Python through `cpp_var_desc_ptr`, as the `shape()` method shows.

The user is allowed to build a variable without specifying its name. If so, it is assigned an automatically generated unique name.

### Parameter

A parameter is a special kind of variable. Parameters need to be initialized at the very beginning and updated after each batch of training. So if a variable is a parameter, our compiler will add an initializer op and an optimizer op for it while building the computation graph. Apart from this, there is no difference between a variable and a parameter. In other words, 'parameter' is only a label attached to variables, telling the compiler that these require additional processing.
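A sketch of this idea with made-up op names: the graph builder scans the variables, and for every one labeled as a parameter it places an initializer op before the forward ops and an optimizer op after them. `build_graph` and its op-name scheme are hypothetical, not the real compiler:

```python
def build_graph(forward_ops, variables):
    # `variables` maps variable name -> is_parameter label
    params = [name for name, is_param in variables.items() if is_param]
    init_ops = ["init_" + p for p in params]      # e.g. random init or load
    optimize_ops = ["sgd_" + p for p in params]   # update after each batch
    return init_ops + forward_ops + optimize_ops

ops = build_graph(["fc", "softmax"], {"w": True, "x": False})
print(ops)  # ['init_w', 'fc', 'softmax', 'sgd_w']
```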

```python
class Parameter(Variable):
    def __init__(self, trainable, initialize_attrs, optimize_attrs):
        pass
```

The class `Parameter` is derived from class `Variable`. In addition to what variables have, parameters hold their initializing and updating information. A parameter's `self.op` will always be `None`, because a parameter can never be an operator's output.


## Layer Functions

A layer is a Python function. When it is invoked, it creates a series of operators and variables and inserts them into the block. It is something like a macro in C++. It is called a 'layer' because the combination of added operators acts just like a neural network layer.

Here are examples of how to write a data layer and an FC layer:

### Data Layer

```python
def data_layer(name, type):
    block = program.current_block()
    # type = dense_vector(size=10) / integer_value(range=10)
    return block.create_global_var(
            name=name, 
            shape=[None] + type.dims(), 
            dtype=type.dtype)

``` 

All new variables and operators are built in the `current block`. In the `data_layer` code above, a variable is created and inserted into the root block to make it global. This variable will be used as the input data of the whole network.

### FC Layer

```python
def fc_layer(input, size, ...):
    block = program.current_block()
    w = block.create_parameter(...)
    b = block.create_parameter(...)
    out = block.create_var()
    op = block.append_operator(Operator("FC", X=input, W=w, b=b, Out=out))
    out.op = op
    return out
```

In the `fc_layer` code, we create two parameters (`w` and `b`), one variable (`out`), and one operator (the `FC` operator), then insert all of them into the `current block`.
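Putting the pieces together, here is a self-contained toy sketch of how a layer function grows the current block. `ToyBlock`/`ToyVar` are deliberately simplified stand-ins, not the real API (e.g. real code would call `create_parameter` and build `Operator("FC", X=input, W=w, b=b, Out=out)`):

```python
class ToyVar(object):
    def __init__(self, name):
        self.name = name
        self.op = None  # set once the var becomes an operator's output

class ToyBlock(object):
    def __init__(self):
        self.vars = {}  # variable name -> ToyVar
        self.ops = []   # operator list

    def create_var(self, name):
        self.vars[name] = ToyVar(name)
        return self.vars[name]

    def append_operator(self, op_type):
        self.ops.append(op_type)
        return op_type

block = ToyBlock()

def fc_layer(input, size):
    w = block.create_var("w")    # would be a parameter in the real design
    b = block.create_var("b")    # likewise
    out = block.create_var("out")
    out.op = block.append_operator("FC")
    return out

x = block.create_var("x")        # stand-in for a data_layer output
y = fc_layer(x, size=100)
assert y.op == "FC" and len(block.ops) == 1
```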