提交 8dded4f2 编写于 作者: Q qingqing01

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into write_new_op

关于PaddlePaddle
================
PaddlePaddle是一个最早由百度科学家和工程师共同研发的并行分布式深度学习平台,兼备易用性、高效性、灵活性和可扩展性,目前已被百度内部多个产品线广泛使用。
PaddlePaddle目前已经开放源码, 但是远未完善,我们希望能在这个基础上不断的改进、扩展和延伸。
同时我们希望广大开发者积极提供反馈和贡献源代码,建立一个活跃的开源社区。
致谢
--------
在此,特别感谢PaddlePaddle的[所有贡献者](https://github.com/PaddlePaddle/Paddle/graphs/contributors)
ABOUT
=======
PaddlPaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
which is originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
PaddlePaddle is now open source but far from complete, which is intended to be built upon, improved, scaled, and extended.
We hope to build an active open source community both by providing feedback and by actively contributing to the source code.
Credits
--------
We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/graphs/contributors>`_ of PaddlePaddle!
......@@ -7,4 +7,3 @@ PaddlePaddle Documentation
getstarted/index_en.rst
howto/index_en.rst
api/index_en.rst
about/index_en.rst
......@@ -124,6 +124,9 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
std::list<Pos> insert_position;
for (auto& dup_output_op : dup_output_ops) {
const std::string& name = dup_output_op.first;
// duplicate @Empty@ don't need to be added
if (name == kEmptyVarName) continue;
auto& dup_op = dup_output_op.second;
// no duplicate output
if (dup_op.size() == 1) continue;
......@@ -209,7 +212,7 @@ std::unique_ptr<OperatorBase> Backward(
const OperatorBase& forwardOp,
const std::unordered_set<std::string>& no_grad_vars) {
std::unordered_set<std::string> no_grad_names;
no_grad_names.reserve(no_grad_vars.size());
no_grad_names.reserve(no_grad_vars.size() + 1);
no_grad_names.insert(std::string(kEmptyVarName) + kGradVarSuffix);
......
## Operator/expression 's Backward
# Operator/expression 's Backward
### Motivation
## Motivation
In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation lineage, the operator/ expression's Backward feature will generate the backward pass respect to forward pass.
In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation graph, the operator/expression's backward pass will be generated respect to forward pass.
## Backward Operator Registry
### Implement : gradient operator registry
A backward network is built up with several backward operators. Backward operators take forward operators' inputs, outputs and output gradients and then calculate its input gradients.
| | forward operator | backward operator |
| ---------------------- | ---------------- | -------------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |
| | forward operator | backward operator
| ---------------------- | ---------------- |------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |
Inputs/Outputs means the input/output of the operator, InputGradients/OutputGradients is the gradient respect to forward opeartor. Forward operator and Backward operator are isomorphic, save their corresponding needs into member attribute.
In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
We use a global hash map record the gradient operators available, follow the philosophy of minimum core, make operator pluggable unit. Each gradient is an operator and it needs to regist itself.
For example, we have got a `mul_op`, and we can register it's information and corresponding backward operator by the following macro:
grad_op_builder(fengjiayi)
```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
```
### Implement : Backward network
`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.
`mul_grad` is the type of backward operator, and `MulOpGrad` is its class name.
## Backward Opeartor Creating
Given a certain forward operator, we can get its corresponding backward opeartor by calling:
```cpp
OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op);
```
The function `BuildGradOp` will sequentially execute following processes:
1. Get the `type_` of given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.
2. Build two maps named `inputs` and `outputs` to temporary storage backward operator's inputs and outputs. Copy forward operator's `inputs_` and `outputs_` to map `inputs`, except these are not necessary for gradient computing.
3. Add forward inputs' gradient variables into map `output`, adding forward outputs' gradient variables into map `input`.
4. Building backward operator with `inputs`, `outputs` and forward operator's attributes.
## Backward Network Building
A backward network is a series of backward operators. The main idea of building a backward network is creating backward operators in the inverted sequence and put them together.
In our design, the network itself is also a kind of operator. So the operators contained by a big network may be some small network.
given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`.
......
......@@ -31,10 +31,13 @@ void NetOp::CompleteAddOp(bool calc) {
for (auto& op : ops_) {
for (auto& ipt : op->Inputs()) {
for (auto& var_name : ipt.second) {
if (!Contains(output_set, var_name)) { // Not other op's output
input_set.insert(var_name);
} else {
// If input variable has been in output set, then it will be
// added into intermediate_outputs_. Otherwise, it will be
// added into input set.
if (Contains(output_set, var_name)) {
intermediate_outputs_.insert(var_name);
} else {
input_set.insert(var_name);
}
}
}
......
......@@ -14,6 +14,7 @@
import numpy as np
from paddle.proto.ParameterConfig_pb2 import ParameterConfig
from collections import OrderedDict
import paddle.trainer.config_parser as cp
import struct
import tarfile
......@@ -42,9 +43,25 @@ def create(layers):
class Parameters(object):
"""
Parameters is a dictionary contains Paddle's parameter. The key of
Parameters is the name of parameter. The value of Parameters is a plain
:code:`numpy.ndarry` .
`Parameters` manages all the learnable parameters in a neural network.
It stores parameters' information in an OrderedDict. The key is
the name of a parameter, and value is a parameter's configuration(in
protobuf format), such as initialization mean and std, its size, whether it
is a static parameter, and so on.
:param __param_conf__: store the configurations of learnable parameters in
the network in an OrderedDict. Parameter is added one by one into the
dict by following their created order in the network: parameters of
the previous layers in a network are careted first. You can visit the
parameters from bottom to top by iterating over this dict.
:type __param_conf__: OrderedDict
:param __gradient_machines__: all of the parameters in a neural network are
appended to a PaddlePaddle gradient machine, which is used internally to
copy parameter values between C++ and Python end.
:type __gradient_machines__: list
:param __tmp_params__: a dict to store dummy parameters if no
__gradient_machines__ is appended to `Parameters`.
:type __tmp_params__: dict
Basically usage is
......@@ -62,7 +79,7 @@ class Parameters(object):
"""
def __init__(self):
self.__param_conf__ = dict()
self.__param_conf__ = OrderedDict()
self.__gradient_machines__ = []
self.__tmp_params__ = dict()
......@@ -231,6 +248,9 @@ class Parameters(object):
:rtype: np.ndarray
"""
import py_paddle.swig_paddle as api
if self.__param_conf__[key].is_static:
return np.zeros(self.__param_conf__[key].size, dtype=np.float32)
return self.__getter_inner(key, api.PARAMETER_GRADIENT)
def set(self, parameter_name, value):
......@@ -250,7 +270,7 @@ class Parameters(object):
append gradient machine to parameters. This method is used internally in
Trainer.train.
:param gradient_machine: Paddle C++ GradientMachine object.
:param gradient_machine: PaddlePaddle C++ GradientMachine object.
:type gradient_machine: api.GradientMachine
:return:
"""
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册