提交 c8d0d37c 编写于 作者: C caoying03

Merge branch 'develop' into crf

...@@ -22,7 +22,7 @@ COPY ./paddle/scripts/docker/root/ /root/ ...@@ -22,7 +22,7 @@ COPY ./paddle/scripts/docker/root/ /root/
RUN apt-get update && \ RUN apt-get update && \
apt-get install -y \ apt-get install -y \
git python-pip python-dev openssh-server bison \ git python-pip python-dev openssh-server bison libnccl-dev \
wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \ wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \
curl sed grep graphviz libjpeg-dev zlib1g-dev \ curl sed grep graphviz libjpeg-dev zlib1g-dev \
python-matplotlib gcc-4.8 g++-4.8 \ python-matplotlib gcc-4.8 g++-4.8 \
......
...@@ -125,3 +125,8 @@ simple_attention ...@@ -125,3 +125,8 @@ simple_attention
:members: simple_attention :members: simple_attention
:noindex: :noindex:
dot_product_attention
---------------------
.. automodule:: paddle.v2.networks
:members: dot_product_attention
:noindex:
...@@ -189,7 +189,7 @@ OpDesc { ...@@ -189,7 +189,7 @@ OpDesc {
inputs = {0} // the index of x in vars of BlockDesc above inputs = {0} // the index of x in vars of BlockDesc above
outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above
attrs { attrs {
"memories" : {1} // the index of h "states" : {1} // the index of h
"step_net" : <above step net> "step_net" : <above step net>
} }
}; };
......
# Prune
## Motivation
We want to support running inference, training and checkpointing in one `ProgramDesc`. We implement
`void Prune(const ProgramDesc* input, ProgramDesc* output)` function, which takes a `ProgramDesc`
and generate a pruned `ProgramDesc`.
## Challenge
Pruning need to support both variables and operators being evaluation targets. Consider the following
different situations.
```python
# Case 1: run foward pass.
cost_np = session.run(target=cost)
# Case 2: run backward passing.
opts_np, _ = session.run(target=[cost, opt])
# Case 3: run checkpointing
_ = session.run(target=checkpoint)
```
## Solution
To support evaluation of operators, we add `is_target` field in the `OpDesc`.
```c++
message OpDesc {
required string type = 3;
repeated Var inputs = 1;
repeated Var outputs = 2;
repeated Attr attrs = 4;
optional bool is_target = 5 [ default = false ];
};
```
To support evaluation of variables, we add [fetch_op](https://github.com/PaddlePaddle/Paddle/pull/4599).
For each variable in the `target`, we insert a `fetch_op` into the `ProgramDesc` with `variable` being
`fetch_op`'s input. Then we also set `fetch_op` is a target.
### Algorithm
If an operator needs to be run, it must fall into one of the following cases:
1. It is the target.
2. It is depended by some other ops, meaning its output is some other op's input.
The first case can be checked by `op_desc.is_traget()` . The second case can be implement as
```c++
bool HasDependentVar(const OpDesc& op_desc, const std::set<string>& dependent_vars) {
for (auto& var : op_desc.outputs()) {
for (auto& argu : var.arguments()) {
if (dependent_vars.count(argu) != 0) {
return true;
}
}
}
return false;
}
```
Then the whole algorithm can be implemented as the following [code](https://github.com/tonyyang-svail/Paddle/blob/prune_impl/paddle/framework/prune.cc).
...@@ -177,9 +177,6 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class) ...@@ -177,9 +177,6 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class)
REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class)
``` ```
### USE Macros
Make sure the registration process is executed and linked.
--- ---
# Registration Process # Registration Process
1. Write an Op class and its gradient Op class, if required. 1. Write an Op class and its gradient Op class, if required.
...@@ -188,8 +185,6 @@ Make sure the registration process is executed and linked. ...@@ -188,8 +185,6 @@ Make sure the registration process is executed and linked.
1. Call maker class to complete `proto` and `checker` 1. Call maker class to complete `proto` and `checker`
2. Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap` 2. Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap`
4. Invoke the `USE` macro in which the Op is used to make sure that it is linked.
--- ---
# Backward Module (1/2) # Backward Module (1/2)
### Create Backward Operator ### Create Backward Operator
......
...@@ -3,17 +3,17 @@ ...@@ -3,17 +3,17 @@
## The Problem Posed ## The Problem Posed
Currently, for each C++ operator class definition, there registers a *gradient operator creator* function, which takes a C++ operator instance and returns the corresponding gradient operator instance. Currently, for each C++ operator class definition, a *gradient operator creator* function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance.
However, we noticed two problems with the current deisgn: However, we noticed two problems with the current design:
1. As we decided to separate the *compilation* and *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and inserts corresponding `OpDesc` messages into the `ProgramDesc` message. 1. As we decided to separate the *compilation* and the *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and inserts corresponding `OpDesc` messages into the `ProgramDesc` message.
1. Some operator's gradient computation requires more than one gradient operators. For example, the gradient of *minus* consists of two operators -- an identity operaotr and a scale operator. So we need to make the registration mechanism to support the mapping from an operator to a set of operators for gradient computation. 1. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of *minus* operator consists of two operators -- an *identity* operator followed by a *scale* operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation.
## The Current Implementation ## The Current Implementation
The C++ class `OpInfos` store in a association map which key is the operator type. The `grad_op_type` indicate associated gradient operator type. Operator can create gradient operator by `OpInfo::creator_` of gradient. The pseudo code is Instances of the C++ class `OpInfo` are stored an associative map whose key is the operator type. The `grad_op_type` indicates the associated gradient operator type. An operator can create the gradient operator by invoking `OpInfo::creator_` of the gradient operator. The pseudo code is as follows
```cpp ```cpp
struct OpInfo { struct OpInfo {
...@@ -31,16 +31,16 @@ OperatorBase* CreateGradientOperator(const OperatorBase& op) { ...@@ -31,16 +31,16 @@ OperatorBase* CreateGradientOperator(const OperatorBase& op) {
## Proposed Solution ## Proposed Solution
The mapping relationship between an operator and its gradient operators is a function. The interface of that function is: The mapping relationship between an operator and its gradient operators is a function. The interface of this function is:
```cpp ```cpp
// (OpDesc) --> vector<OpDesc> // (OpDesc) --> vector<OpDesc>
std::function<std::vector<OpDescBind>(const OpDescBind&)>; std::function<std::vector<OpDescBind>(const OpDescBind&)>;
``` ```
The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for protobuf message `OpDesc` to manipulate `OpDesc` fast. The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for the protobuf message `OpDesc` for rapid manipulation of `OpDesc`.
The `GradOpDescMaker` will be registered in `OpInfo`, to replace `grad_op_type_` field. The `OpInfo` should be The `GradOpDescMaker` will be registered in `OpInfo` and will replace the `grad_op_type_` field. The `OpInfo` should look like
```cpp ```cpp
struct OpInfo { struct OpInfo {
...@@ -49,7 +49,7 @@ struct OpInfo { ...@@ -49,7 +49,7 @@ struct OpInfo {
}; };
``` ```
The `grad_op_maker_ ` is `nullptr` if the operator does not have associated gradient operators. The `grad_op_maker_ ` is a `nullptr` if the operator does not have any associated gradient operators.
We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is
...@@ -74,7 +74,7 @@ func = [] (const OpDescBind& fwd_op) { ...@@ -74,7 +74,7 @@ func = [] (const OpDescBind& fwd_op) {
We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forwarding operator. We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forwarding operator.
We should chagne register macros at the same time. In the current solution, there is no difference between forwarding operators and backward operators. So `REGISTER_OP` just register one operator. If the `REGISTER_OPERATOR ` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro contains `__VA_ARGS__`. We should change register macros at the same time. In the current solution, there is no difference between forwarding operators and backward operators. So `REGISTER_OP` just register one operator. If the `REGISTER_OPERATOR ` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro contains `__VA_ARGS__`.
The user interface should be The user interface should be
......
# Regularization in PaddlePaddle
## Introduction to Regularization
A central problem in machine learning is how to design an algorithm that will perform well not just on the training data, but also on new data. Many strategies are used by machine learning practitioners to reduce the test error, possibly at the expense of increased training error. These strategies are collectively known as **regularization**.
### Parameter Norm Penalties
Most common regularization approaches in deep learning are based on limiting the capacity of the models by adding a parameter norm penalty to the objective function `J`. This is given as follows:
<img src="./images/loss_equation.png" align="center"/><br/>
The parameter `alpha` is a hyperparameter that weights the relative contribution of the norm penalty term, `omega`, relative to the standard objective function `J`.
The most commonly used norm penalties are the L2 norm penalty and the L1 norm penalty. These are given as follows:
##### L2 Regularization:
<img src="./images/l2_regularization.png" align="center"/><br/>
##### L1 Regularization
<img src="./images/l1_regularization.png" align="center"/><br/>
A much more detailed mathematical background of reguilarization can be found [here](http://www.deeplearningbook.org/contents/regularization.html).
## How to do Regularization in PaddlePaddle
On surveying existing frameworks like Tensorflow, PyTorch, Caffe, etc, it can be seen that there are 2 common approaches of doing regularization:
1. Making regularization a part of the optimizer using an attribute like `weight_decay` that is used to control the scale of the L2 Penalty. This approach is used in PyTorch as follows:
```python
opt = torch.optim.SGD(params, lr=0.2, weight_decay=0.2)
```
At every optimization step, this code will add the gradient of the L2 Norm of the params to the gradient of the params with respect to the loss function. This can seen in the following code snippet:
```python
if weight_decay != 0:
d_p.add_(weight_decay, p.data)
```
This is a very restyrictive way of doing regularization and does not give the users enough flexibility.
**Advantages**:
- It is easy to implement for us.
- Faster execution of backward. However, it can be done manually by advanced users too.
**Disadvantages**:
- Not flexible for other regularizations such as L1/L0 regularization.
- Does not allow for different regularization coefficient for different parameters. For example, in most models, ony the weight matrices are regularized and the bias vectors are unregularized.
- Tightly coupled optimizer and regularization implementation.
2. Adding regularization ops to the graph through Python API. This approach is used by Tensorflow and Caffe. Using this approach, we manually add regularization ops to the graph and then add the regularization loss to the final loss function before sending them to the optimizer.
**Advantages**:
- Allows for greater flexibility to the users of Paddle. Using this approach, the users can put different regularization to different parameters and also choose parameters that are not a part of regularization.
- Makes it easy for the users to customize and extend the framework.
**Disadvantages**:
- Implementation requires comprehensive design and time.
## Proposal for Regularization in PaddlePaddle
### Low-Level implementation
In the new design, we propose to create new operations for regularization. For now, we can add 2 ops thgat correspond to the most frequently used regularizations:
- L2_regularization_op
- L1_regularization_op
These ops can be like any other ops with their own CPU/GPU implementations either using Eigen or separate Cpu and GPU kernels. As the initial implementation, we can implement their kernels using Eigen following the abstraction pattern implemented for [Activation Ops](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/accuracy_op.h). This abstraction pattern can make it very easy to implement new regularization schemes. other than L1 and L2 norm penalties.
The idea of building ops for regularization is in sync with the refactored Paddle philosophy of using operators to represent any computation unit. The way these ops will be added to the computation graph, will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) in Python API.
### Computation Graph
Below is an example of a really simple feed forward neural network.
<img src="./images/feed_forward.png" align="center"/><br/>
The Python API will modify this computation graph to add regularization operators. The modified computation graph will look as follows:
<img src="./images/feed_forward_regularized.png" align="center"/><br/>
   
### Python API implementation for Regularization
Using the low level ops, `L2_regularization_op` and `L1_regularization_op`, any user can add regularization to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support regularization. An example of such an API can be seen in [Keras](https://keras.io/regularizers/). As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since regularization is a property of parameters, it makes sense to create these in the layer functions.
#### Creation of Regularization ops
There are two possibilities for creating the regularization ops:
1. We create these ops immediately while building the computation graph.
2. We add these ops in a lazy manner, just before the backward, similar to the way the optimization ops are added.
The proposal is to add these ops in a lazy manner just before the backward pass.
#### Storage of Regularization attributes
Since we want to create the regularization ops in a lazy manner, the regularization attributes (type of regularization and weight of regularization penalty) can be stored as attributes of the [`Parameter`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/framework.py#L421) class. This is because regularization is a property of the parameters and storing regularization properties with Parameters also allows for shared parameters.
#### High-level API
In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we lso need to provide regularization functionality in layer functions. The design of these APIs can be postponed for later right now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at Tensorflow in [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).
# Design Doc: Selected Rows # Design Doc: Selected Rows
`SelectedRows` is a kind of sparse tensor data type, which is designed to support `embedding` operators. The gradient of embedding table is a sparse tensor. Only a few rows are non-zero values in that tensor. It is straightforward to represent the sparse tensor by the following sparse tensor data structure: `SelectedRows` is a type of sparse tensor data type, which is designed to support `embedding` operators. The gradient of embedding table is a sparse tensor. Only a few rows are non-zero values in this tensor. It is straight-forward to represent a sparse tensor by the following sparse tensor data structure:
```cpp ```cpp
class SelectedRows { class SelectedRows {
...@@ -11,7 +11,7 @@ class SelectedRows { ...@@ -11,7 +11,7 @@ class SelectedRows {
}; };
``` ```
The field `height_` shows the first dimension of `SelectedRows`. The `rows` are the indices of which rows of `SelectedRows` are non-zeros. The `value_` field is an N-dim tensor and shape is `[rows.size() /* NUM_ROWS */, ...]`, which supplies values for each row. The dimension of `SelectedRows` satisfies `[height_] + value_.shape[1:]`. The field `height_` is the first dimension of `SelectedRows`. The `rows` are the indices of the non-zero rows of `SelectedRows`. The `value_` field is an N-dim tensor of shape `[rows.size() /* NUM_ROWS */, ...]`, which supplies values for each row. The dimension of `SelectedRows` satisfies `[height_] + value_.shape[1:]`.
Suppose that a SelectedRows-typed variable `x` has many rows, but only two of them have values -- row 73 is `[1, 2]` and row 84 is `[3, 4]`, the `SelectedRows` representation would be: Suppose that a SelectedRows-typed variable `x` has many rows, but only two of them have values -- row 73 is `[1, 2]` and row 84 is `[3, 4]`, the `SelectedRows` representation would be:
...@@ -25,7 +25,7 @@ x = SelectedRow { ...@@ -25,7 +25,7 @@ x = SelectedRow {
## SelectedRows in Protobuf ## SelectedRows in Protobuf
`SelectedRows` is a kind of `Variable`. `VarDesc` in protobuf should describe the `SelectedRows` information. Only the tensor dimension of a `SelectedRows` will be described in compile-time since the `rows_` and `value_` are related to training data. `SelectedRows` is a type of `Variable`. `VarDesc` in protobuf should describe the `SelectedRows` information. Only the tensor dimension of a `SelectedRows` will be described in compile-time because the `rows_` and `value_` are dependent on the training data.
So we use `TensorDesc` to unify `data_type` and `dims`. A LodTensorDesc contains a `TensorDesc` and `lod_level`. The description of `SelectedRows` is a Tensor description. So we use `TensorDesc` to unify `data_type` and `dims`. A LodTensorDesc contains a `TensorDesc` and `lod_level`. The description of `SelectedRows` is a Tensor description.
```proto ```proto
...@@ -54,7 +54,7 @@ message VarDesc { ...@@ -54,7 +54,7 @@ message VarDesc {
## InferShape for Selected Rows ## InferShape for Selected Rows
Just like `LoD` information, `InferShape` method will inference output tensor type as well. The operator should decide whether its output is a `SelectedRows` or `Dense` tensor. Just like `LoD` information, `InferShape` method will infer the output tensor type as well. The operator should decide whether its output is a `SelectedRows` or `Dense` tensor.
For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should be like following For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should be like following
...@@ -68,7 +68,7 @@ void TableLookupGrad::InferShape(context) { ...@@ -68,7 +68,7 @@ void TableLookupGrad::InferShape(context) {
## Sparse Operators ## Sparse Operators
There are several operators should be written to support `SelectedRows`. They are: There are several operators that need to be written to support `SelectedRows`. These are:
1. Operators which generates `SelectedRows` gradient. e.g. Gradient of `TableLookupOp`. 1. Operators which generate `SelectedRows` gradient. e.g. Gradient of `TableLookupOp`.
2. Optimize operators which support `SelectedRows` gradient. e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator. `OpWithKernel::Run` should select a suitable kernel for both `dense` tensor or `SelectedRows`. 2. Optimize operators which support `SelectedRows` gradient. e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator. `OpWithKernel::Run` should select a suitable kernel for both `dense` tensor or `SelectedRows`.
# 构建Android平台上的PaddlePaddle库 # 构建Android平台上的PaddlePaddle库
用户可通过交叉编译的方式,在用户熟悉的开发平台(Linux,Mac OS X和Windows)上编译Android平台上适用的PaddlePaddle库。 用户可通过如下两种方式,交叉编译Android平台上适用的PaddlePaddle库:
- 基于Docker容器的编译方式
- 基于Linux交叉编译环境的编译方式
## 基于Docker容器的编译方式
Docker能在所有主要操作系统(包括Linux,Mac OS X和Windows)上运行,因此,使用基于Docker容器的编译方式,用户可在自己熟悉的开发平台上编译Android平台上适用的PaddlePaddle库。
### 构建PaddlePaddle的Android开发镜像
我们把PaddlePaddle的交叉编译环境打包成一个镜像,称为开发镜像,里面涵盖了交叉编译Android版PaddlePaddle库需要的所有编译工具。
```bash
$ git clone https://github.com/PaddlePaddle/Paddle.git
$ cd Paddle
$ docker build -t username/paddle-android:dev . -f Dockerfile.android
```
### 编译PaddlePaddle C-API库
构建好开发镜像后,即可使用开发镜像来编译Android版PaddlePaddle C-API库。
Android的Docker开发镜像向用户提供两个可配置的参数:
| Argument | Optional Values | Default |
|-----------------|-------------------------|---------|
|`ANDROID_ABI` |`armeabi-v7a, arm64-v8a` | `armeabi-v7a` |
|`ANDROID_API` |`>= 21` | `21` |
- 编译`armeabi-v7a``Android API 21`的PaddlePaddle库
```bash
$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=armeabi-v7a" -e "ANDROID_API=21" username/paddle-android:dev
```
- 编译`arm64-v8a``Android API 21`的PaddlePaddle库
```bash
$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=arm64-v8a" -e "ANDROID_API=21" username/paddle-android:dev
```
执行上述`docker run`命令时,容器默认执行[paddle/scripts/docker/build_android.sh](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build_android.sh)脚本。该脚本中记录了交叉编译Android版PaddlePaddle库常用的CMake配置,并且会根据`ANDROID_ABI``ANDROID_API`自动构建独立工具链、进行编译和安装。由于arm64架构要求Android API不小于21。因此当`ANDROID_ABI=arm64-v8a``ANDROID_API<21`时,Docker容器中将默认使用`Android API 21`的编译工具链。用户可以参考下文**配置交叉编译参数**章节,根据个人的需求修改定制Docker容器所执行的脚本。编译安装结束之后,PaddlePaddle的C-API库将被安装到`$PWD/install_android`目录,所依赖的第三方库同时也被安装到`$PWD/install_android/third_party`目录。
## 基于Linux交叉编译环境的编译方式
本文档将以Linux x86-64平台为例,介绍交叉编译Android平台上适用的PaddlePaddle库的方法和步骤。 本文档将以Linux x86-64平台为例,介绍交叉编译Android平台上适用的PaddlePaddle库的方法和步骤。
## 准备交叉编译环境 ### 准备交叉编译环境
从源码交叉编译PaddlePaddle,用户需要提前准备好交叉编译环境。Android平台上使用的C/C++交叉编译工具链为[Android NDK](https://developer.android.com/ndk/downloads/index.html?hl=zh-cn),用户可自行前往下载预编译好的版本,也可通过以下命令获取: 从源码交叉编译PaddlePaddle,用户需要提前准备好交叉编译环境。Android平台上使用的C/C++交叉编译工具链为[Android NDK](https://developer.android.com/ndk/downloads/index.html?hl=zh-cn),用户可自行前往下载预编译好的版本,也可通过以下命令获取:
...@@ -13,18 +50,27 @@ unzip -q android-ndk-r14b-linux-x86_64.zip ...@@ -13,18 +50,27 @@ unzip -q android-ndk-r14b-linux-x86_64.zip
``` ```
Android NDK中包含了所有Android API级别、所有架构(arm/arm64/x86/mips)需要用到的编译工具和系统库。用户可根据自己的编译目标架构、所需支持的最低Android API级别,构建[独立工具链](https://developer.android.google.cn/ndk/guides/standalone_toolchain.html?hl=zh-cn) Android NDK中包含了所有Android API级别、所有架构(arm/arm64/x86/mips)需要用到的编译工具和系统库。用户可根据自己的编译目标架构、所需支持的最低Android API级别,构建[独立工具链](https://developer.android.google.cn/ndk/guides/standalone_toolchain.html?hl=zh-cn)
比如:
- 构建`armeabi-v7a``Android API 21`的独立工具链:
```bash
your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
--arch=arm --platform=android-21 --install-dir=your/path/to/arm_standalone_toolchain
```
此命令将在`your/path/to/arm_standalone_toolchain`目录生成一套独立编译工具链,面向架构为32位ARM架构,支持的最小的Android API级别为21,支持编译器`arm-linux-androideabi-gcc (GCC) 4.9``clang 3.8`
- 构建`arm64-v8a``Android API 21`的独立工具链:
```bash ```bash
your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \ your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
--arch=arm --platform=android-21 --install-dir=your/path/to/my_standalone_toolchain --arch=arm64 --platform=android-21 --install-dir=your/path/to/arm64_standalone_toolchain
``` ```
此命令将在your/path/to/my_standalone_toolchain目录生成一套编译工具链,面向架构为32位ARM架构,支持的最小的Android API级别为21,使用的编译器为arm-linux-androideabi-gcc (GCC) 4.9 此命令将在`your/path/to/arm64_standalone_toolchain`目录生成一套独立编译工具链,面向架构为64位ARM64架构,支持的最小Android API级别为21,支持编译器`arm-linux-androideabi-gcc (GCC) 4.9``clang 3.8`
注意:**PaddlePaddle要求使用的编译工具链所支持的Andoid API级别不小于21** 注意:**PaddlePaddle要求使用的编译工具链所支持的Android API级别不小于21**
## 配置交叉编译参数 ### 配置交叉编译参数
CMake系统对交叉编译提供了支持[cmake-toolchains](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling)。为了简化cmake配置,PaddlePaddle为交叉编译提供了工具链配置文档[cmake/cross_compiling/android.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/android.cmake),以提供一些默认的编译器和编译参数相关配置。注意,从CMake 3.7版本开始,CMake官方对Android平台的交叉编译提供了通用的支持。PaddlePaddle若检测到用户使用的CMake版本不低于3.7时,将会将用户传进来的配置参数传递CMake系统,交由CMake系统本身来处理。有关参数配置的详细说明见[cmake-toolchains](https://cmake.org/cmake/help/v3.7/manual/cmake-toolchains.7.html#cross-compiling) CMake系统对交叉编译提供了支持[cmake-toolchains](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling)。为了简化cmake配置,PaddlePaddle为交叉编译提供了工具链配置文档[cmake/cross_compiling/android.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/android.cmake),以提供一些默认的编译器和编译参数相关配置。注意,从CMake 3.7版本开始,CMake官方对Android平台的交叉编译提供了通用的支持。PaddlePaddle若检测到用户使用的CMake版本不低于3.7时,将会将用户传进来的配置参数传递CMake系统,交由CMake系统本身来处理。有关参数配置的详细说明见[cmake-toolchains](https://cmake.org/cmake/help/v3.7/manual/cmake-toolchains.7.html#cross-compiling)
...@@ -36,23 +82,43 @@ CMake系统对交叉编译提供了支持[cmake-toolchains](https://cmake.org/cm ...@@ -36,23 +82,43 @@ CMake系统对交叉编译提供了支持[cmake-toolchains](https://cmake.org/cm
Android平台可选配置参数: Android平台可选配置参数:
- `ANDROID_STANDALONE_TOOLCHAIN`,独立工具链所在的绝对路径,或者相对于构建目录的相对路径。PaddlePaddle的CMake系统将根据该值自动推导和设置需要使用的交叉编译器、sysroot、以及Android API级别;否则,用户需要在cmake时手动设置这些值。无默认值。 - `ANDROID_STANDALONE_TOOLCHAIN`,独立工具链所在的绝对路径,或者相对于构建目录的相对路径。PaddlePaddle的CMake系统将根据该值自动推导和设置需要使用的交叉编译器、sysroot、以及Android API级别;否则,用户需要在cmake时手动设置这些值。无默认值。
- `ANDROID_ABI`,目标架构ABI。目前只支持`armeabi-v7a`,默认值为`armeabi-v7a` - `ANDROID_TOOLCHAIN`,目标工具链。可设置`gcc/clang`,默认值为`clang`
- CMake 3.7以上,将会始终使用`clang`工具链;CMake 3.7以下,可设置`ANDROID_TOOLCHAIN=gcc`以使用`gcc`工具链。
- Android官方提供的`clang`编译器要求系统支持`GLIBC 2.15`以上。
- `ANDROID_ABI`,目标架构ABI。目前支持`armeabi-v7a``arm64-v8a`,默认值为`armeabi-v7a`
- `ANDROID_NATIVE_API_LEVEL`,工具链的Android API级别。若没有显式设置,PaddlePaddle将根据`ANDROID_STANDALONE_TOOLCHAIN`的值自动推导得到。 - `ANDROID_NATIVE_API_LEVEL`,工具链的Android API级别。若没有显式设置,PaddlePaddle将根据`ANDROID_STANDALONE_TOOLCHAIN`的值自动推导得到。
- `ANROID_ARM_MODE`,是否使用ARM模式。可设置`ON/OFF`,默认值为`ON` - `ANROID_ARM_MODE`,是否使用ARM模式。
- `ANDROID_ARM_NEON`,是否使用NEON指令。目前必须设置成`ON`,默认值为`ON` - `ANDROID_ABI=armeabi-v7a`时,可设置`ON/OFF`,默认值为`ON`
- `ANDROID_ABI=arm64-v8a`时,不需要设置。
- `ANDROID_ARM_NEON`,是否使用NEON指令。
- `ANDROID_ABI=armeabi-v7a`时,可设置`ON/OFF`,默认值为`ON`
- `ANDROID_ABI=arm64-v8a`时,不需要设置。
其他配置参数: 其他配置参数:
- `USE_EIGEN_FOR_BLAS`,是否使用Eigen库进行矩阵计算。可设置`ON/OFF`,默认值为`OFF`
- `HOST_C/CXX_COMPILER`,宿主机的C/C++编译器。在编译宿主机版protoc可执行文件和目标机版OpenBLAS库时需要用到。默认设置成环境变量`CC`的值;若环境变量`CC`没有设置,则设置成`cc`编译器。 - `HOST_C/CXX_COMPILER`,宿主机的C/C++编译器。在编译宿主机版protoc可执行文件和目标机版OpenBLAS库时需要用到。默认设置成环境变量`CC`的值;若环境变量`CC`没有设置,则设置成`cc`编译器。
一种常用的cmake配置如下: 常用的cmake配置如下:
```bash ```bash
cmake -DCMAKE_SYSTEM_NAME=Android \ cmake -DCMAKE_SYSTEM_NAME=Android \
-DANDROID_STANDALONE_TOOLCHAIN=your/path/to/my_standalone_toolchain \ -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm_standalone_toolchain \
-DANDROID_ABI=armeabi-v7a \ -DANDROID_ABI=armeabi-v7a \
-DANDROID_ARM_NEON=ON \ -DANDROID_ARM_NEON=ON \
-DANDROID_ARM_MODE=ON \ -DANDROID_ARM_MODE=ON \
-DUSE_EIGEN_FOR_BLAS=ON \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \
-DWITH_SWIG_PY=OFF \
..
```
```
cmake -DCMAKE_SYSTEM_NAME=Android \
-DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \
-DANDROID_ABI=arm64-v8a \
-DUSE_EIGEN_FOR_BLAS=OFF \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \ -DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \ -DWITH_C_API=ON \
-DWITH_SWIG_PY=OFF \ -DWITH_SWIG_PY=OFF \
...@@ -61,7 +127,12 @@ cmake -DCMAKE_SYSTEM_NAME=Android \ ...@@ -61,7 +127,12 @@ cmake -DCMAKE_SYSTEM_NAME=Android \
用户还可根据自己的需求设置其他编译参数。比如希望最小化生成的库的大小,可以设置`CMAKE_BUILD_TYPE``MinSizeRel`;若希望最快的执行速度,则可设置`CMAKE_BUILD_TYPE``Release`。亦可以通过手动设置`CMAKE_C/CXX_FLAGS_MINSIZEREL/RELEASE`来影响PaddlePaddle的编译过程。 用户还可根据自己的需求设置其他编译参数。比如希望最小化生成的库的大小,可以设置`CMAKE_BUILD_TYPE``MinSizeRel`;若希望最快的执行速度,则可设置`CMAKE_BUILD_TYPE``Release`。亦可以通过手动设置`CMAKE_C/CXX_FLAGS_MINSIZEREL/RELEASE`来影响PaddlePaddle的编译过程。
## 编译和安装 **性能TIPS**,为了达到最快的计算速度,在CMake参数配置上,有以下建议:
- 设置`CMAKE_BUILD_TYPE``Release`
- 使用`clang`编译工具链
- `armeabi-v7a`时,设置`USE_EIGEN_BLAS=ON`,使用Eigen进行矩阵计算;`arm64-v8a`时,设置`USE_EIGEN_FOR_BLAS=OFF`,使用OpenBLAS进行矩阵计算
### 编译和安装
CMake配置完成后,执行以下命令,PaddlePaddle将自动下载和编译所有第三方依赖库、编译和安装PaddlePaddle预测库。 CMake配置完成后,执行以下命令,PaddlePaddle将自动下载和编译所有第三方依赖库、编译和安装PaddlePaddle预测库。
...@@ -72,4 +143,4 @@ make install ...@@ -72,4 +143,4 @@ make install
注意:如果你曾经在源码目录下编译过其他平台的PaddlePaddle库,请先使用`rm -rf`命令删除`third_party`目录和`build`目录,以确保所有的第三方依赖库和PaddlePaddle代码都是针对新的CMake配置重新编译的。 注意:如果你曾经在源码目录下编译过其他平台的PaddlePaddle库,请先使用`rm -rf`命令删除`third_party`目录和`build`目录,以确保所有的第三方依赖库和PaddlePaddle代码都是针对新的CMake配置重新编译的。
执行完安装命令后,`your/path/to/install`目录中会包含`include``lib`目录,其中`include`中包含C-API的头文件,`lib`中包含一个Android版本的库。自此,PaddlePaddle的已经安装完成,用户可将`your/path/to/install`目录下的生成文件用于深度学习相关Android App中,调用方法见C-API文档。 执行完安装命令后,`your/path/to/install`目录中会包含`include``lib``third_party`目录,其中`include`中包含C-API的头文件,`lib`中包含若干个不同Android ABI的PaddlePaddle库,`third_party`中包含所依赖的所有第三方库。自此,PaddlePaddle的已经安装完成,用户可将`your/path/to/install`目录下的生成文件用于深度学习相关Android App中,调用方法见C-API文档。
```eval_rst # PaddlePaddle分布式训练
.. _cluster_train:
* [概述](#概述)
* [环境准备](#环境准备)
* [启动参数说明](#启动参数说明)
* [启动参数服务器](#启动参数服务器)
* [启动计算节点](#启动计算节点)
* [准备数据集](#准备数据集)
* [准备训练程序](#准备训练程序)
* [使用分布式计算平台或工具](#使用分布式计算平台或工具)
* [使用Fabric启动集群作业](#使用fabric启动集群作业)
* [准备一个Linux集群](#准备一个linux集群)
* [启动集群作业](#启动集群作业)
* [终止集群作业](#终止集群作业)
* [检查集群训练结果](#检查集群训练结果)
* [检查模型输出](#检查模型输出)
* [在OpenMPI集群中提交训练作业](#在openmpi集群中提交训练作业)
* [准备OpenMPI集群](#准备OpenMPI集群)
* [启动集群作业](#启动集群作业-1)
* [在Kubernetes集群中提交训练作业](#在kubernetes集群中提交训练作业)
# 概述
本文将介绍如何使用PaddlePaddle在不同的集群框架下完成分布式训练。分布式训练架构如下图所示:
<img src="https://user-images.githubusercontent.com/13348433/31772175-5f419eca-b511-11e7-9db7-5231fe3d9ccb.png" width="500">
- 数据分片(Data shard): 用于训练神经网络的数据,被切分成多个部分,每个部分分别给每个trainer使用。
- 计算节点(Trainer): 每个trainer启动后读取切分好的一部分数据,开始神经网络的“前馈”和“后馈”计算,并和参数服务器通信。在完成一定量数据的训练后,上传计算得出的梯度(gradients),然后下载优化更新后的神经网络参数(parameters)。
- 参数服务器(Parameter server):每个参数服务器只保存整个神经网络所有参数的一部分。参数服务器接收从计算节点上传的梯度,并完成参数优化更新,再将更新后的参数下发到每个计算节点。
这样,通过计算节点和参数服务器的分布式协作,可以完成神经网络的SGD方法的训练。PaddlePaddle可以同时支持同步随机梯度下降(SGD)和异步随机梯度下降。
在使用同步SGD训练神经网络时,PaddlePaddle使用同步屏障(barrier),使梯度的提交和参数的更新按照顺序方式执行。在异步SGD中,则并不会等待所有trainer提交梯度才更新参数,这样极大地提高了计算的并行性:参数服务器之间不相互依赖,并行地接收梯度和更新参数,参数服务器也不会等待计算节点全部都提交梯度之后才开始下一步,计算节点之间也不会相互依赖,并行地执行模型的训练。可以看出,虽然异步SGD方式会提高参数更新并行度, 但是并不能保证参数同步更新,在任意时间某一台参数服务器上保存的参数可能比另一台要更新,与同步SGD相比,梯度会有噪声。
# 环境准备
1. 准备您的计算集群。计算集群通常由一组(几台到几千台规模)的Linux服务器组成。服务器之间可以通过局域网(LAN)联通,每台服务器具有集群中唯一的IP地址(或者可被DNS解析的主机名)。集群中的每台计算机通常被成为一个“节点”。
1. 我们需要在集群的所有节点上安装 PaddlePaddle。 如果要启用GPU,还需要在节点上安装对应的GPU驱动以及CUDA。PaddlePaddle的安装可以参考[build_and_install](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/getstarted/build_and_install)的多种安装方式。我们推荐使用[Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)安装方式来快速安装PaddlePaddle。
安装完成之后,执行下面的命令可以查看已经安装的版本(docker安装方式可以进入docker容器执行:`docker run -it paddlepaddle/paddle:[tag] /bin/bash`):
```bash
$ paddle version
PaddlePaddle 0.10.0, compiled with
with_avx: ON
with_gpu: OFF
with_double: OFF
with_python: ON
with_rdma: OFF
with_timer: OFF
``` ```
# 运行分布式训练 下面以`doc/howto/usage/cluster/src/word2vec`中的代码作为实例,介绍使用PaddlePaddle v2 API完成分布式训练。
在本文中,我们将阐释如何在集群上运行分布式 Paddle 训练作业。我们将以[推荐系统](https://github.com/baidu/Paddle/tree/develop/demo/recommendation)为例创建分布式的单进程训练。 # 启动参数说明
## 启动参数服务器
执行以下的命令启动一个参数服务器并等待和计算节点的数据交互
```bash
$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1
```
在本文中使用的[脚本](https://github.com/baidu/Paddle/tree/develop/paddle/scripts/cluster_train)通过 SSH 运行分布式作业。 它们还可以供那些运行更复杂的集群管理系统(如 MPI 和 [Kubernetes](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/k8s) )的用户参考。 如果希望可以在后台运行pserver程序,并保存输出到一个日志文件,可以运行:
```bash
$ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 &> pserver.log
```
## 前提条件 | 参数 | 是否必选 | 默认值 | 说明 |
| ------------- | ------------- | ------------- | ------------- |
| port | 必选 | 7164 | pserver监听的起始端口,根据ports_num决定<br>总端口个数,从起始端口监听多个端口用于通信 |
| ports_num | 必选 | 1 | 监听的端口个数 |
| ports_num_for_sparse | 必选 | 1 | 用于稀疏类型参数通信的端口个数 |
| num_gradient_servers | 必选 | 1 | 当前训练任务pserver总数 |
## 启动计算节点
执行以下命令启动使用python编写的trainer程序(文件名为任意文件名,如train.py)
```bash
$ python train.py
```
1. 上述脚本使用 Python 库 [fabric](http://www.fabfile.org/) 来运行 SSH 命令。 我们使用 `pip` 来安装 fabric: trainer需要和pserver保持网络联通以完成训练。trainer启动需要传入端口、pserver地址等参数使trainer可以正确连接到pserver。这些参数可以通过环境变量(https://zh.wikipedia.org/wiki/环境变量 )或编写程序时`paddle.init()`中传入参数。如果同时使用`paddle.init()`参数和环境变量,将会优先使用`paddle.init()`中传入的参数。
```bash 使用环境变量:
pip install fabric
```
2. 我们需要在集群的所有节点上安装 PaddlePaddle。 如果要启用GPU,需要在 `/usr/local/cuda` 中安装 CUDA; 否则 Paddle 将在运行时报错。 ```bash
export PADDLE_INIT_USE_GPU=False
export PADDLE_INIT_TRAINER_COUNT=1
export PADDLE_INIT_PORT=7164
export PADDLE_INIT_PORTS_NUM=1
export PADDLE_INIT_PORTS_NUM_FOR_SPARSE=1
export PADDLE_INIT_NUM_GRADIENT_SERVERS=1
export PADDLE_INIT_TRAINER_ID=0
export PADDLE_INIT_PSERVERS=127.0.0.1
```
3. 在 [`cluster_train/conf.py`] 中设置 `ROOT_DIR`, 该 ROOT_DIR 要在所有节点上存在。为了方便起见,我们通常在所有节点上创建一个 Unix 用户 `paddle`,并设置 `ROOT_DIR=/home/paddle`。这样,我们可以将 SSH 公钥写入 `/home/paddle/.ssh/authorized_keys`,以便用户 `paddle` 可以 SSH 到所有节点而不用密码。 使用参数:
## 准备工作空间 ```python
paddle.init(
use_gpu=False,
trainer_count=1,
port=7164,
ports_num=1,
ports_num_for_sparse=1,
num_gradient_servers=1,
trainer_id=0,
pservers="127.0.0.1")
```
我们将放置依赖库、配置等文件的目录视为 *工作空间(workspace)* | 参数 | 是否必选 | 默认 | 说明 |
| ------------- | ------------- | ------------- | ------------- |
| use_gpu | 可选 | False | 是否启用GPU训练 |
| trainer_count | 必选 | 1 | 当前训练任务trainer总个数 |
| port | 必选 | 7164 | 连接到pserver的端口 |
| ports_num | 必选 | 1 | 连接到pserver的端口个数 |
| ports_num_for_sparse | 必选 | 1 | 和pserver之间用于稀疏类型参数通信的端口个数 |
| num_gradient_servers | 必选 | 1 | 当前训练任务pserver总数 |
| trainer_id | 必选 | 0 | 每个trainer的唯一ID,从0开始的整数 |
| pservers | 必选 | 127.0.0.1 | 当前训练任务启动的pserver的IP列表,多个IP使用“,”隔开 |
这些 `train/test` 数据应该在启动集群作业之前准备好。 为了满足训练/测试数据放置在工作空间中不同目录的要求,PADDLE 根据在模型配置文件中使用的名为 `train.list/test.list` 的索引文件引用训练/测试数据,所以训练/测试数据也包含 train.list/test.list 两个列表文件。所有本地训练 demo 已经提供了脚本来帮助您创建这两个文件,并且集群作业中的所有节点将在正常情况下处理具有相同逻辑代码的文件。
通常,你可以使用本地训练中的相同模型文件进行集群训练。请记住,在模型文件的 `setting`函数中设置的 `batch_size` 表示在集群作业**每个**节点中的 batch 大小,而不是使用同步 SGD 的总 batch 大小。 ## 准备数据集
以下步骤基于 demo 目录中的 [demo/recommendation](https://github.com/PaddlePaddle/Paddle/tree/develop/demo/recommendation) 参考样例数据准备脚本[prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py),准备训练数据和验证数据集,我们使用paddle.dataset.imikolov数据集,并根据分布式训练并发数(trainer节点个数),在`prepare.py`开头部分指定`SPLIT_COUNT`将数据切分成多份
你只需完成 demo/recommendation 教程文档到 `Train` 的部分,之后你会得到训练/测试数据和模型配置文件。最后,只需使用 demo/recommendation 作为集群训练的工作空间。 在线上系统中,通常会使用MapReduce任务的输出结果作为训练结果,这样训练文件的个数会比较多,而且个数并不确定。在trainer中可以使用下面取模的方法为每个trainer分配训练数据文件:
最后,你的工作空间应如下所示: ```python
``` import os
. train_list = []
|-- common_utils.py flist = os.listdir("/train_data/")
|-- data for f in flist:
| |-- config.json suffix = int(f.split("-")[1])
| |-- config_generator.py if suffix % TRAINER_COUNT == TRAINER_ID:
| |-- meta.bin train_list.append(f)
| |-- meta_config.json
| |-- meta_generator.py
| |-- ml-1m
| |-- ml_data.sh
| |-- ratings.dat.test
| |-- ratings.dat.train
| |-- split.py
| |-- test.list
| `-- train.list
|-- dataprovider.py
|-- evaluate.sh
|-- prediction.py
|-- preprocess.sh
|-- requirements.txt
|-- run.sh
`-- trainer_config.py
``` ```
虽然这些文件并非都需要集群训练,但是也没有必要删除无用的文件。
`trainer_config.py` 示例程序`prepare.py`会把训练集和测试集分别分割成多个文件(例子中为3个,后缀为`-00000``-00001``-00002`):
表示模型配置文件。 ```
train.txt
`train.list``test.list` train.txt-00000
文件索引。它存储当前节点所有训练/测试数据的所有相对或绝对文件路径。 train.txt-00001
train.txt-00002
test.txt
test.txt-00000
test.txt-00001
test.txt-00002
```
`dataprovider.py` 在进行分布式训练时,每个trainer进程需要能够读取属于自己的一份数据。在一些分布式系统中,系统会提供一个分布式存储服务,这样保存在分布式存储中的数据可以被集群中的每个节点读取到。如果不使用分布式存储,则需要手动拷贝属于每个trainer节点的训练数据到对应的节点上。
用于读取训练/测试样本。这与本地训练相同。
`data` 对于不同的训练任务,训练数据格式和训练程序的`reader()`会大不相同,所以开发者需要根据自己训练任务的实际场景完成训练数据的分割和`reader()`的编写。
数据目录中的所有文件被 train.list/test.list 引用。
## 准备训练程序
## 准备集群作业配置 我们会对每个训练任务都会在每个节点上创建一个工作空间(workspace),其中包含了用户的训练程序、程序依赖、挂载或下载的训练数据分片。
以下选项必须在 cluster_train/conf.py 中认真设置 最后,工作空间应如下所示:
```
.
|-- my_lib.py
|-- word_dict.pickle
|-- train.py
|-- train_data_dir/
| |-- train.txt-00000
| |-- train.txt-00001
| |-- train.txt-00002
`-- test_data_dir/
|-- test.txt-00000
|-- test.txt-00001
`-- test.txt-00002
```
`HOSTS` 所有节点运行集群作业的主机名或 IP 。你还可以将用户和 ssh 端口附加到主机名上,例如 root@192.168.100.17:9090。 - `my_lib.py`:会被`train.py`调用的一些用户定义的库函数,比如PIL库等。
- `word_dict.pickle`:在`train.py`中会使用到的字典数据文件。
- `train.py`:训练程序,代码参考[api_train_v2_cluster.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py)***注意:*** 对于本样例代码,在使用不同的分布式计算平台时,您可能需要修改`train.py`开头的部分(如下),以便获得训练数据的位置和获取环境变量配置:
`ROOT_DIR` 用于放置 JOB 工作空间目录的工作空间 ROOT 目录 ```python
cluster_train_file = "./train_data_dir/train/train.txt"
cluster_test_file = "./test_data_dir/test/test.txt"
node_id = os.getenv("OMPI_COMM_WORLD_RANK")
if not node_id:
raise EnvironmentError("must provied OMPI_COMM_WORLD_RANK")
```
`PADDLE_NIC` 集群通信通道的 NIC(Network Interface Card, 网络接口卡) 接口名称,例如以太网的 eth0,infiniband 的 ib0。 - `train_data_dir`:包含训练数据的目录,可以是从分布式存储挂载过来的,也可以是在任务启动前下载到本地的。
- `test_data_dir`:包含测试数据集的目录。
`PADDLE_PORT` 集群通信通道的端口号 # 使用分布式计算平台或工具
`PADDLE_PORTS_NUM` 用于集群通信通道的端口数。 如果集群节点数量少(少于5〜6个节点),建议将其设置为较大,如2〜8,以获得更好的网络性能。 PaddlePaddle可以使用多种分布式计算平台构建分布式计算任务,包括:
- [Kubernetes](http://kubernetes.io) Google开源的容器集群的调度框架,支持大规模集群生产环境的完整集群方案。
- [OpenMPI](https://www.open-mpi.org) 成熟的高性能并行计算框架。
- [Fabric](http://www.fabfile.org) 集群管理工具。可以使用`Fabric`编写集群任务提交和管理脚本。
`PADDLE_PORTS_NUM_FOR_SPARSE` 用于 sparse remote updater 集群通信信道的端口数。如果使用 sparse remote update,则可以像 `PADDLE_PORTS_NUM` 一样设置 对于不同的集群平台,会分别介绍集群作业的启动和停止方法。这些例子都可以在[cluster_train_v2](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/scripts/cluster_train_v2)找到
`LD_LIBRARY_PATH` 为集群作业设置额外的 LD_LIBRARY_PATH。你可以使用它来设置 CUDA 库的路径 在使用分布式计算平台进行训练时,任务被调度在集群中时,分布式计算平台通常会通过API或者环境变量提供任务运行需要的参数,比如节点的ID、IP和任务节点个数等
默认配置如下: ## 使用Fabric启动集群作业
```python ### 准备一个Linux集群
HOSTS = [ 可以在`paddle/scripts/cluster_train_v2/fabric/docker_cluster`目录下,执行`kubectl -f ssh_servers.yaml`启动一个测试集群,并使用`kubectl get po -o wide`获得这些节点的IP地址。
"root@192.168.100.17",
"root@192.168.100.18",
]
'''
工作空间配置
'''
#工作空间根目录
ROOT_DIR = "/home/paddle"
'''
网络配置
'''
#pserver NIC
PADDLE_NIC = "eth0"
#pserver 端口
PADDLE_PORT = 7164
#pserver 端口数
PADDLE_PORTS_NUM = 2
#pserver sparse ports num
PADDLE_PORTS_NUM_FOR_SPARSE = 2
#集群作业中所有进程的环境设置
LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/lib64"
```
### 启动集群作业 ### 启动集群作业
`paddle.py` 提供了自动化脚本来启动不同节点中的所有 PaddlePaddle 集群进程。默认情况下,所有命令行选项可以设置为```paddle.py``` 命令选项并且 `paddle.py` 将透明、自动地将这些选项应用到 PaddlePaddle 底层进程。
`paddle.py` 提供了自动化脚本来启动不同节点中的所有 PaddlePaddle 集群进程。默认情况下,所有命令行选项可以设置为 `paddle.py` 命令选项并且 `paddle.py` 将透明、自动地将这些选项应用到 PaddlePaddle 底层进程。
`paddle.py` 为方便作业启动提供了两个独特的命令选项。 `paddle.py` 为方便作业启动提供了两个独特的命令选项。
`job_dispatch_package` 设为本地 `workspace` 目录,它将被分发到 conf.py 中设置的所有节点。 它有助于帮助频繁修改和访问工作区文件的用户减少负担,否则频繁的多节点工作空间部署可能会很麻烦。 - `job_dispatch_package` 设为本地 `workspace` 目录,它将被分发到 `conf.py` 中设置的所有节点。它有助于帮助频繁修改和访问工作区文件的用户减少负担,否则频繁的多节点工作空间部署可能会很麻烦。
`job_workspace` 设为已部署的工作空间目录,`paddle.py` 将跳过分发阶段直接启动所有节点的集群作业。它可以帮助减少分发延迟。 - `job_workspace` 设为已部署的工作空间目录,`paddle.py` 将跳过分发阶段直接启动所有节点的集群作业。它可以帮助减少分发延迟。
`cluster_train/run.sh` 提供了命令样例来运行 `demo/recommendation` 集群工作,只需用你定义的目录修改 `job_dispatch_package``job_workspace`,然后: `cluster_train/run.sh` 提供了命令样例来运行 `doc/howto/usage/cluster/src/word2vec` 集群任务,只需用您定义的目录修改 `job_dispatch_package``job_workspace`,然后:
``` ```
sh run.sh sh run.sh
``` ```
...@@ -149,7 +229,7 @@ sh run.sh ...@@ -149,7 +229,7 @@ sh run.sh
提供 pserver 运行日志,有助于诊断分布式错误。 提供 pserver 运行日志,有助于诊断分布式错误。
`server.log` `server.log`
提供 pserver 进程的 stderr 和 stdout。训练失败时可以检查错误日志。 提供 parameter server 进程的 stderr 和 stdout。训练失败时可以检查错误日志。
`train.log` `train.log`
提供训练过程的 stderr 和 stdout。训练失败时可以检查错误日志。 提供训练过程的 stderr 和 stdout。训练失败时可以检查错误日志。
...@@ -157,3 +237,49 @@ sh run.sh ...@@ -157,3 +237,49 @@ sh run.sh
### 检查模型输出 ### 检查模型输出
运行完成后,模型文件将被写入节点 0 的 `output` 目录中。 运行完成后,模型文件将被写入节点 0 的 `output` 目录中。
工作空间中的 `nodefile` 表示当前集群作业的节点 ID。 工作空间中的 `nodefile` 表示当前集群作业的节点 ID。
## 在OpenMPI集群中提交训练作业
### 准备OpenMPI集群
执行下面的命令以启动3个节点的OpenMPI集群和一个"head"节点:
```bash
paddle/scripts/cluster_train_v2/openmpi/docker_cluster
kubectl create -f head.yaml
kubectl create -f mpi-nodes.yaml
```
然后可以从head节点ssh无密码登录到OpenMPI的每个节点上。
### 启动集群作业
您可以按照下面的步骤在OpenMPI集群中提交paddle训练任务:
```bash
# 获得head和node节点的IP地址
kubectl get po -o wide
# 将node节点的IP地址保存到machines文件中
kubectl get po -o wide | grep nodes | awk '{print $6}' > machines
# 拷贝必要的文件到head节点
scp -i ssh/id_rsa.mpi.pub machines prepare.py train.py start_mpi_train.sh tutorial@[headIP]:~
# ssh 登录到head节点
ssh -i ssh/id_rsa.mpi.pub tutorial@[headIP]
# --------------- 以下操作均在head节点中执行 ---------------
# 准备训练数据
python prepare.py
# 拷贝训练程序和字典文件到每台MPI节点
cat machines | xargs -i scp word_dict.pickle train.py start_mpi_train.sh machines {}:/home/tutorial
# 创建日志目录
mpirun -hostfile machines -n 3 mkdir /home/tutorial/logs
# 拷贝训练数据到各自的节点
scp train.txt-00000 test.txt-00000 [node1IP]:/home/tutorial
scp train.txt-00001 test.txt-00001 [node2IP]:/home/tutorial
scp train.txt-00002 test.txt-00002 [node3IP]:/home/tutorial
# 启动训练任务
mpirun -hostfile machines -n 3 /home/tutorial/start_mpi_train.sh
```
## 在Kubernetes集群中提交训练作业
此部分的使用方法可以参考[here](../k8s/k8s_distributed_cn.md)
# Run Distributed Training # PaddlePaddle Distributed Training
* [Introduction](#introduction)
* [Preparations](#preparations)
* [Command-line arguments](#command-line-arguments)
* [Starting parameter server](#starting-parameter-server)
* [Starting trainer](#starting-trainer)
* [Prepare Training Dataset](#prepare-training-dataset)
* [Prepare Training program](#prepare-training-program)
* [Use cluster platforms or cluster management tools](#use-cluster-platforms-or-cluster-management-tools)
* [Cluster Training Using Fabric](#cluster-training-using-fabric)
* [Prepare a Linux cluster](#prepare-a-linux-cluster)
* [Launching Cluster Job](#launching-cluster-job)
* [Kill Cluster Job](#kill-cluster-job)
* [Check Cluster Training Result](#check-cluster-training-result)
* [Check Model Output](#check-model-output)
* [Cluster Training Using OpenMPI](#cluster-training-using-openmpi)
* [Prepare an OpenMPI cluster](#prepare-an-openmpi-cluster)
* [Launching Cluster Job](#launching-cluster-job-1)
* [Cluster Training Using Kubernetes](#cluster-training-using-kubernetes)
# Introduction
In this article, we'll explain how to run distributed training jobs with PaddlePaddle on different types of clusters. The diagram below shows the main architecture of a distributed trainning job:
<img src="https://user-images.githubusercontent.com/13348433/31772146-41523d84-b511-11e7-8a12-a69fd136c283.png" width="500">
- Data shard: training data will be split into multiple partitions, trainers use the partitions of the whole dataset to do the training job.
- Trainer: each trainer reads the data shard, and train the neural network. Then the trainer will upload calculated "gradients" to parameter servers, and wait for parameters to be optimized on the parameter server side. When that finishes, the trainer download optimized parameters and continues its training.
- Parameter server: every parameter server stores part of the whole neural network model data. They will do optimization calculations when gradients are uploaded from trainers, and then send updated parameters to trainers.
PaddlePaddle can support both synchronize stochastic gradient descent (SGD) and asynchronous SGD.
When training with synchronize SGD, PaddlePaddle uses an internal "synchronize barrier" which makes gradients update and parameter download in strict order. On the other hand, asynchronous SGD won't wait for all trainers to finish upload at a single step, this will increase the parallelism of distributed training: parameter servers do not depend on each other, they'll do parameter optimization concurrently. Parameter servers will not wait for trainers, so trainers will also do their work concurrently. But asynchronous SGD will introduce more randomness and noises in the gradient.
# Preparations
1. Prepare your computer cluster. It's normally a bunch of Linux servers connected by LAN. Each server will be assigned a unique IP address. The computers in the cluster can be called "nodes".
2. Install PaddlePaddle on every node. If you are going to take advantage of GPU cards, you'll also need to install proper driver and CUDA libraries. To install PaddlePaddle please read [this build and install](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/getstarted/build_and_install) document. We strongly recommend using [Docker installation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
After installation, you can check the version by typing the below command (run a docker container if using docker: `docker run -it paddlepaddle/paddle:[tag] /bin/bash`):
```bash
$ paddle version
PaddlePaddle 0.10.0rc, compiled with
with_avx: ON
with_gpu: OFF
with_double: OFF
with_python: ON
with_rdma: OFF
with_timer: OFF
```
In this article, we explain how to run distributed Paddle training jobs on clusters. We will create the distributed version of the single-process training example, [recommendation](https://github.com/baidu/Paddle/tree/develop/demo/recommendation). We'll take `doc/howto/usage/cluster/src/word2vec` as an example to introduce distributed training using PaddlePaddle v2 API.
[Scripts](https://github.com/baidu/Paddle/tree/develop/paddle/scripts/cluster_train) used in this article launch distributed jobs via SSH. They also work as a reference for users running more sophisticated cluster management systems like MPI and [Kubernetes](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/k8s). # Command-line arguments
## Prerequisite ## Starting parameter server
1. Aforementioned scripts use a Python library [fabric](http://www.fabfile.org/) to run SSH commands. We can use `pip` to install fabric: Type the below command to start a parameter server which will wait for trainers to connect:
```bash ```bash
pip install fabric $ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1
``` ```
1. We need to install PaddlePaddle on all nodes in the cluster. To enable GPUs, we need to install CUDA in `/usr/local/cuda`; otherwise Paddle would report errors at runtime. If you wish to run parameter servers in background, and save a log file, you can type:
```bash
$ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 &> pserver.log
```
1. Set the `ROOT_DIR` variable in [`cluster_train/conf.py`] on all nodes. For convenience, we often create a Unix user `paddle` on all nodes and set `ROOT_DIR=/home/paddle`. In this way, we can write public SSH keys into `/home/paddle/.ssh/authorized_keys` so that user `paddle` can SSH to all nodes without password. | param | required | default | description |
| ------------- | ------------- | ------------- | ------------- |
| port | required | 7164 | port which parameter server will listen on. If ports_num greater than 1, parameter server will listen on multiple ports for more network throughput |
| ports_num | required | 1 | total number of ports will listen on |
| ports_num_for_sparse | required | 1 | number of ports which serves sparse parameter update |
| num_gradient_servers | required | 1 | total number of gradient servers |
## Prepare Job Workspace ## Starting trainer
Type the command below to start the trainer(name the file whatever you want, like "train.py")
We refer to the directory where we put dependent libraries, config files, etc., as *workspace*. ```bash
$ python train.py
```
These `train/test` data should be prepared before launching cluster job. To satisfy the requirement that train/test data are placed in different directory from workspace, PADDLE refers train/test data according to index file named as `train.list/test.list` which are used in model config file. So the train/test data also contains train.list/test.list two list file. All local training demo already provides scripts to help you create these two files, and all nodes in cluster job will handle files with same logical code in normal condition. Trainers' network need to be connected with parameter servers' network to finish the job. Trainers need to know port and IPs to locate parameter servers. You can pass arguments to trainers through [environment variables](https://en.wikipedia.org/wiki/Environment_variable) or pass to `paddle.init()` function. Arguments passed to the `paddle.init()` function will overwrite environment variables.
Generally, you can use same model file from local training for cluster training. What you should have in mind that, the `batch_size` set in `setting` function in model file means batch size in `each` node of cluster job instead of total batch size if synchronization SGD was used. Use environment viriables:
Following steps are based on [demo/recommendation](https://github.com/PaddlePaddle/Paddle/tree/develop/demo/recommendation) demo in demo directory. ```bash
export PADDLE_INIT_USE_GPU=False
export PADDLE_INIT_TRAINER_COUNT=1
export PADDLE_INIT_PORT=7164
export PADDLE_INIT_PORTS_NUM=1
export PADDLE_INIT_PORTS_NUM_FOR_SPARSE=1
export PADDLE_INIT_NUM_GRADIENT_SERVERS=1
export PADDLE_INIT_TRAINER_ID=0
export PADDLE_INIT_PSERVERS=127.0.0.1
python train.py
```
You just go through demo/recommendation tutorial doc until `Train` section, and at last you will get train/test data and model configuration file. Finaly, just use demo/recommendation as workspace for cluster training. Pass arguments:
At last your workspace should look like as follow: ```python
paddle.init(
use_gpu=False,
trainer_count=1,
port=7164,
ports_num=1,
ports_num_for_sparse=1,
num_gradient_servers=1,
trainer_id=0,
pservers="127.0.0.1")
``` ```
.
|-- common_utils.py | param | required | default | description |
|-- data | ------------- | ------------- | ------------- | ------------- |
| |-- config.json | use_gpu | optional | False | set to "True" to enable GPU training |
| |-- config_generator.py | trainer_count | required | 1 | total count of trainers in the training job |
| |-- meta.bin | port | required | 7164 | port to connect to parameter server |
| |-- meta_config.json | ports_num | required | 1 | number of ports for communication |
| |-- meta_generator.py | ports_num_for_sparse | required | 1 | number of ports for sparse type caculation |
| |-- ml-1m | num_gradient_servers | required | 1 | total number of gradient server |
| |-- ml_data.sh | trainer_id | required | 0 | ID for every trainer, start from 0 |
| |-- ratings.dat.test | pservers | required | 127.0.0.1 | list of IPs of parameter servers, separated by "," |
| |-- ratings.dat.train
| |-- split.py ## Prepare Training Dataset
| |-- test.list
| `-- train.list Here's some example code [prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py), it will download public `imikolov` dataset and split it into multiple files according to job parallelism(trainers count). Modify `SPLIT_COUNT` at the begining of `prepare.py` to change the count of output files.
|-- dataprovider.py
|-- evaluate.sh In the real world, we often use `MapReduce` job's output as training data, so there will be lots of files. You can use `mod` to assign training file to trainers:
|-- prediction.py
|-- preprocess.sh ```python
|-- requirements.txt import os
|-- run.sh train_list = []
`-- trainer_config.py flist = os.listdir("/train_data/")
for f in flist:
suffix = int(f.split("-")[1])
if suffix % TRAINER_COUNT == TRAINER_ID:
train_list.append(f)
```
Example code `prepare.py` will split training data and testing data into 3 files with digital suffix like `-00000`, `-00001` and`-00002`:
```
train.txt
train.txt-00000
train.txt-00001
train.txt-00002
test.txt
test.txt-00000
test.txt-00001
test.txt-00002
``` ```
Not all of these files are needed for cluster training, but it's not necessary to remove useless files.
`trainer_config.py` When job started, every trainer needs to get it's own part of data. In some distributed systems a storage service will be provided, so the date under that path can be accessed by all the trainer nodes. Without the storage service, you must copy the training data to each trainer node.
Indicates the model config file.
`train.list` and `test.list` Different training jobs may have different data format and `reader()` function, developers may need to write different data prepare scripts and `reader()` functions for their job.
File index. It stores all relative or absolute file paths of all train/test data at current node.
`dataprovider.py` ## Prepare Training program
used to read train/test samples. It's same as local training.
`data` We'll create a *workspace* directory on each node, storing your training program, dependencies, mounted or downloaded dataset directory.
all files in data directory are refered by train.list/test.list which are refered by data provider.
## Prepare Cluster Job Configuration Your workspace may looks like:
```
.
|-- my_lib.py
|-- word_dict.pickle
|-- train.py
|-- train_data_dir/
| |-- train.txt-00000
| |-- train.txt-00001
| |-- train.txt-00002
`-- test_data_dir/
|-- test.txt-00000
|-- test.txt-00001
`-- test.txt-00002
```
The options below must be carefully set in cluster_train/conf.py - `my_lib.py`: user defined libraries, like PIL libs. This is optional.
- `word_dict.pickle`: dict file for training word embeding.
- `train.py`: training program. Sample code: [api_train_v2_cluster.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py). ***NOTE:*** You may need to modify the head part of `train.py` when using different cluster platform to retrive configuration environment variables:
`HOSTS` all nodes hostname or ip that will run cluster job. You can also append user and ssh port with hostname, such as root@192.168.100.17:9090. ```python
cluster_train_file = "./train_data_dir/train/train.txt"
cluster_test_file = "./test_data_dir/test/test.txt"
node_id = os.getenv("OMPI_COMM_WORLD_RANK")
if not node_id:
raise EnvironmentError("must provied OMPI_COMM_WORLD_RANK")
```
`ROOT_DIR` workspace ROOT directory for placing JOB workspace directory - `train_data_dir`: containing training data. Mount from storage service or copy trainning data to here.
- `test_data_dir`: containing testing data.
`PADDLE_NIC` the NIC(Network Interface Card) interface name for cluster communication channel, such as eth0 for ethternet, ib0 for infiniband. # Use cluster platforms or cluster management tools
`PADDLE_PORT` port number for cluster commnunication channel PaddlePaddle supports running jobs on several platforms including:
- [Kubernetes](http://kubernetes.io) open-source system for automating deployment, scaling, and management of containerized applications from Google.
- [OpenMPI](https://www.open-mpi.org) Mature high performance parallel computing framework.
- [Fabric](http://www.fabfile.org) A cluster management tool. Write scripts to submit jobs or manage the cluster.
`PADDLE_PORTS_NUM` the number of port used for cluster communication channle. if the number of cluster nodes is small(less than 5~6nodes), recommend you set it to larger, such as 2 ~ 8, for better network performance. We'll introduce cluster job management on these platforms. The examples can be found under [cluster_train_v2](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/scripts/cluster_train_v2).
`PADDLE_PORTS_NUM_FOR_SPARSE` the number of port used for sparse updater cluster commnunication channel. if sparse remote update is used, set it like `PADDLE_PORTS_NUM` These cluster platforms provide API or environment variables for training processes, when the job is dispatched to different nodes. Like node ID, IP or total number of nodes etc.
`LD_LIBRARY_PATH` set addtional LD_LIBRARY_PATH for cluster job. You can use it to set CUDA libraries path. ## Cluster Training Using Fabric
Default Configuration as follow: ### Prepare a Linux cluster
```python Run `kubectl -f ssh_servers.yaml` under the directory: `paddle/scripts/cluster_train_v2/fabric/docker_cluster` will launch a demo cluster. Run `kubectl get po -o wide` to get IP addresses of these nodes.
HOSTS = [
"root@192.168.100.17",
"root@192.168.100.18",
]
'''
workspace configuration
'''
#root dir for workspace
ROOT_DIR = "/home/paddle"
'''
network configuration
'''
#pserver nics
PADDLE_NIC = "eth0"
#pserver port
PADDLE_PORT = 7164
#pserver ports num
PADDLE_PORTS_NUM = 2
#pserver sparse ports num
PADDLE_PORTS_NUM_FOR_SPARSE = 2
#environments setting for all processes in cluster job
LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/lib64"
```
### Launching Cluster Job ### Launching Cluster Job
`paddle.py` provides automatical scripts to start all PaddlePaddle cluster processes in different nodes. By default, all command line options can set as `paddle.py` command options and `paddle.py` will transparently and automatically set these options to PaddlePaddle lower level processes. `paddle.py` provides automatical scripts to start all PaddlePaddle cluster processes in different nodes. By default, all command line options can be set as `paddle.py` command options and `paddle.py` will transparently and automatically set these options to PaddlePaddle lower level processes.
`paddle.py`provides two distinguished command option for easy job launching. `paddle.py`provides two distinguished command option for easy job launching.
`job_dispatch_package` set it with local `workspace`directory, it will be dispatched to all nodes set in conf.py. It could be helpful for frequent hacking workspace files, otherwise frequent mulit-nodes workspace deployment could make your crazy. - `job_dispatch_package` set it with local `workspace` directory, it will be dispatched to all nodes which is set in `conf.py`. It could be helpful for frequently manipulating workspace files. otherwise, frequent multi-nodes workspace deployment is very annoying.
`job_workspace` set it with already deployed workspace directory, `paddle.py` will skip dispatch stage to directly launch cluster job with all nodes. It could help to reduce heavy - `job_workspace` set it with already deployed workspace directory, `paddle.py` will skip dispatch stage to directly launch cluster job with all nodes. It could help to reduce heavy
dispatch latency. dispatch latency.
`cluster_train/run.sh` provides command line sample to run `demo/recommendation` cluster job, just modify `job_dispatch_package` and `job_workspace` with your defined directory, then: `cluster_train/run.sh` provides command line sample to run `demo/recommendation` cluster job, just modify `job_dispatch_package` and `job_workspace` with your defined directory, then:
...@@ -134,23 +225,69 @@ sh run.sh ...@@ -134,23 +225,69 @@ sh run.sh
The cluster Job will start in several seconds. The cluster Job will start in several seconds.
### Kill Cluster Job ### Kill Cluster Job
`paddle.py` can capture `Ctrl + C` SIGINT signal to automatically kill all processes launched by it. So just stop `paddle.py` to kill cluster job. You should mannally kill job if program crashed. `paddle.py` can capture `Ctrl + C` SIGINT signal to automatically kill all processes launched by it. So just stop `paddle.py` to kill cluster job. You should manually kill the job if the program crashed.
### Check Cluster Training Result ### Check Cluster Training Result
Check log in $workspace/log for details, each node owns same log structure. Check log in $workspace/log for details, each node owns same log structure.
`paddle_trainer.INFO` `paddle_trainer.INFO`
It provides almost all interal output log for training, same as local training. Check runtime model convergence here. It provides almost all internal output log for training, same as local training. Check runtime model convergence here.
`paddle_pserver2.INFO` `paddle_pserver2.INFO`
It provides pserver running log, which could help to diagnose distributed error. It provides parameter server running log, which could help to diagnose distributed error.
`server.log` `server.log`
It provides stderr and stdout of pserver process. Check error log if training crashs. It provides stderr and stdout of parameter server process. Check error log if training crashes.
`train.log` `train.log`
It provides stderr and stdout of trainer process. Check error log if training crashs. It provides stderr and stdout of trainer process. Check error log if training crashes.
### Check Model Output ### Check Model Output
After one pass finished, model files will be writed in `output` directory in node 0. After one pass finished, model files will be written in `output` directory in node 0.
`nodefile` in workspace indicates the node id of current cluster job. `nodefile` in workspace indicates the node id of current cluster job.
## Cluster Training Using OpenMPI
### Prepare an OpenMPI cluster
Run the following command to start a 3-node MPI cluster and one "head" node.
```bash
cd paddle/scripts/cluster_train_v2/openmpi/docker_cluster
kubectl create -f head.yaml
kubectl create -f mpi-nodes.yaml
```
Then you can log in to every OpenMPI node using ssh without input any passwords.
### Launching Cluster Job
Follow the steps to launch a PaddlePaddle training job in OpenMPI cluster:\
```bash
# find out node IP addresses
kubectl get po -o wide
# generate a "machines" file containing node IP addresses
kubectl get po -o wide | grep nodes | awk '{print $6}' > machines
# copy necessary files onto "head" node
scp -i ssh/id_rsa.mpi.pub machines prepare.py train.py start_mpi_train.sh tutorial@[headIP]:~
# login to head node using ssh
ssh -i ssh/id_rsa.mpi.pub tutorial@[headIP]
# --------------- in head node ---------------
# prepare training data
python prepare.py
# copy training data and dict file to MPI nodes
cat machines | xargs -i scp word_dict.pickle train.py start_mpi_train.sh machines {}:/home/tutorial
# creat a directory for storing log files
mpirun -hostfile machines -n 3 mkdir /home/tutorial/logs
# copy training data to every node
scp train.txt-00000 test.txt-00000 [node1IP]:/home/tutorial
scp train.txt-00001 test.txt-00001 [node2IP]:/home/tutorial
scp train.txt-00002 test.txt-00002 [node3IP]:/home/tutorial
# start the job
mpirun -hostfile machines -n 3 /home/tutorial/start_mpi_train.sh
```
## Cluster Training Using Kubernetes
The details can be found [here](../k8s/k8s_cn.md)
import gzip
import math
import paddle.v2 as paddle
embsize = 32
hiddensize = 256
N = 5
def wordemb(inlayer):
wordemb = paddle.layer.embedding(
input=inlayer,
size=embsize,
param_attr=paddle.attr.Param(
name="_proj",
initial_std=0.001,
learning_rate=1,
l2_rate=0,
sparse_update=True))
return wordemb
def main():
# for local training
cluster_train = False
if not cluster_train:
paddle.init(use_gpu=False, trainer_count=1)
else:
paddle.init(
use_gpu=False,
trainer_count=2,
port=7164,
ports_num=1,
ports_num_for_sparse=1,
num_gradient_servers=1)
word_dict = paddle.dataset.imikolov.build_dict()
dict_size = len(word_dict)
firstword = paddle.layer.data(
name="firstw", type=paddle.data_type.integer_value(dict_size))
secondword = paddle.layer.data(
name="secondw", type=paddle.data_type.integer_value(dict_size))
thirdword = paddle.layer.data(
name="thirdw", type=paddle.data_type.integer_value(dict_size))
fourthword = paddle.layer.data(
name="fourthw", type=paddle.data_type.integer_value(dict_size))
nextword = paddle.layer.data(
name="fifthw", type=paddle.data_type.integer_value(dict_size))
Efirst = wordemb(firstword)
Esecond = wordemb(secondword)
Ethird = wordemb(thirdword)
Efourth = wordemb(fourthword)
contextemb = paddle.layer.concat(input=[Efirst, Esecond, Ethird, Efourth])
hidden1 = paddle.layer.fc(input=contextemb,
size=hiddensize,
act=paddle.activation.Sigmoid(),
layer_attr=paddle.attr.Extra(drop_rate=0.5),
bias_attr=paddle.attr.Param(learning_rate=2),
param_attr=paddle.attr.Param(
initial_std=1. / math.sqrt(embsize * 8),
learning_rate=1))
predictword = paddle.layer.fc(input=hidden1,
size=dict_size,
bias_attr=paddle.attr.Param(learning_rate=2),
act=paddle.activation.Softmax())
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 100 == 0:
with gzip.open("batch-" + str(event.batch_id) + ".tar.gz",
'w') as f:
trainer.save_parameter_to_tar(f)
result = trainer.test(
paddle.batch(
paddle.dataset.imikolov.test(word_dict, N), 32))
print "Pass %d, Batch %d, Cost %f, %s, Testing metrics %s" % (
event.pass_id, event.batch_id, event.cost, event.metrics,
result.metrics)
cost = paddle.layer.classification_cost(input=predictword, label=nextword)
parameters = paddle.parameters.create(cost)
adagrad = paddle.optimizer.AdaGrad(
learning_rate=3e-3,
regularization=paddle.optimizer.L2Regularization(8e-4))
trainer = paddle.trainer.SGD(cost,
parameters,
adagrad,
is_local=not cluster_train)
trainer.train(
paddle.batch(paddle.dataset.imikolov.train(word_dict, N), 32),
num_passes=30,
event_handler=event_handler)
if __name__ == '__main__':
main()
import math
import os
import paddle.v2 as paddle
import pickle
embsize = 32
hiddensize = 256
N = 5
cluster_train_file = "./train_data_dir/train/train.txt"
cluster_test_file = "./test_data_dir/test/test.txt"
node_id = os.getenv("OMPI_COMM_WORLD_RANK")
if not node_id:
raise EnvironmentError("must provied OMPI_COMM_WORLD_RANK")
def wordemb(inlayer):
wordemb = paddle.layer.embedding(
input=inlayer,
size=embsize,
param_attr=paddle.attr.Param(
name="_proj",
initial_std=0.001,
learning_rate=1,
l2_rate=0,
sparse_update=True))
return wordemb
def cluster_reader_cluster(filename, node_id):
def cluster_reader():
with open("-".join([filename, "%05d" % int(node_id)]), "r") as f:
for l in f:
csv_data = [int(cell) for cell in l.split(",")]
yield tuple(csv_data)
return cluster_reader
def main():
# get arguments from env
# for local training
TRUTH = ["true", "True", "TRUE", "1", "yes", "Yes", "YES"]
cluster_train = os.getenv('PADDLE_CLUSTER_TRAIN', "False") in TRUTH
use_gpu = os.getenv('PADDLE_INIT_USE_GPU', "False")
if not cluster_train:
paddle.init(
use_gpu=use_gpu,
trainer_count=int(os.getenv("PADDLE_INIT_TRAINER_COUNT", "1")))
else:
paddle.init(
use_gpu=use_gpu,
trainer_count=int(os.getenv("PADDLE_INIT_TRAINER_COUNT", "1")),
port=int(os.getenv("PADDLE_INIT_PORT", "7164")),
ports_num=int(os.getenv("PADDLE_INIT_PORTS_NUM", "1")),
ports_num_for_sparse=int(
os.getenv("PADDLE_INIT_PORTS_NUM_FOR_SPARSE", "1")),
num_gradient_servers=int(
os.getenv("PADDLE_INIT_NUM_GRADIENT_SERVERS", "1")),
trainer_id=int(os.getenv("PADDLE_INIT_TRAINER_ID", "0")),
pservers=os.getenv("PADDLE_INIT_PSERVERS", "127.0.0.1"))
fn = open("thirdparty/wuyi_train_thdpty/word_dict.pickle", "r")
word_dict = pickle.load(fn)
fn.close()
dict_size = len(word_dict)
firstword = paddle.layer.data(
name="firstw", type=paddle.data_type.integer_value(dict_size))
secondword = paddle.layer.data(
name="secondw", type=paddle.data_type.integer_value(dict_size))
thirdword = paddle.layer.data(
name="thirdw", type=paddle.data_type.integer_value(dict_size))
fourthword = paddle.layer.data(
name="fourthw", type=paddle.data_type.integer_value(dict_size))
nextword = paddle.layer.data(
name="fifthw", type=paddle.data_type.integer_value(dict_size))
Efirst = wordemb(firstword)
Esecond = wordemb(secondword)
Ethird = wordemb(thirdword)
Efourth = wordemb(fourthword)
contextemb = paddle.layer.concat(input=[Efirst, Esecond, Ethird, Efourth])
hidden1 = paddle.layer.fc(input=contextemb,
size=hiddensize,
act=paddle.activation.Sigmoid(),
layer_attr=paddle.attr.Extra(drop_rate=0.5),
bias_attr=paddle.attr.Param(learning_rate=2),
param_attr=paddle.attr.Param(
initial_std=1. / math.sqrt(embsize * 8),
learning_rate=1))
predictword = paddle.layer.fc(input=hidden1,
size=dict_size,
bias_attr=paddle.attr.Param(learning_rate=2),
act=paddle.activation.Softmax())
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 100 == 0:
result = trainer.test(
paddle.batch(
cluster_reader_cluster(cluster_test_file, node_id), 32))
print "Pass %d, Batch %d, Cost %f, %s, Testing metrics %s" % (
event.pass_id, event.batch_id, event.cost, event.metrics,
result.metrics)
cost = paddle.layer.classification_cost(input=predictword, label=nextword)
parameters = paddle.parameters.create(cost)
adagrad = paddle.optimizer.AdaGrad(
learning_rate=3e-3,
regularization=paddle.optimizer.L2Regularization(8e-4))
trainer = paddle.trainer.SGD(cost,
parameters,
adagrad,
is_local=not cluster_train)
trainer.train(
paddle.batch(cluster_reader_cluster(cluster_train_file, node_id), 32),
num_passes=30,
event_handler=event_handler)
if __name__ == '__main__':
main()
import paddle.v2 as paddle
import tarfile
import os
import pickle
SPLIT_COUNT = 3
N = 5
def file_len(fd):
for i, l in enumerate(fd):
pass
return i + 1
def split_from_reader_by_line(filename, reader, split_count):
fn = open(filename, "w")
for batch_id, batch_data in enumerate(reader()):
batch_data_str = [str(d) for d in batch_data]
fn.write(",".join(batch_data_str))
fn.write("\n")
fn.close()
fn = open(filename, "r")
total_line_count = file_len(fn)
fn.close()
per_file_lines = total_line_count / split_count + 1
cmd = "split -d -a 5 -l %d %s %s-" % (per_file_lines, filename, filename)
os.system(cmd)
word_dict = paddle.dataset.imikolov.build_dict()
with open("word_dict.pickle", "w") as dict_f:
pickle.dump(word_dict, dict_f)
split_from_reader_by_line("train.txt",
paddle.dataset.imikolov.train(word_dict, N),
SPLIT_COUNT)
split_from_reader_by_line("test.txt",
paddle.dataset.imikolov.test(word_dict, N),
SPLIT_COUNT)
...@@ -137,7 +137,7 @@ func (c *Client) FinishInitParams() error { ...@@ -137,7 +137,7 @@ func (c *Client) FinishInitParams() error {
return err return err
} }
} }
return nil return c.sel.Done()
} }
// SendGrads sends gradients to parameter servers for updating // SendGrads sends gradients to parameter servers for updating
......
...@@ -28,9 +28,9 @@ add_style_check_target(paddle_capi ${CAPI_SOURCES} ${CAPI_HEADER} ...@@ -28,9 +28,9 @@ add_style_check_target(paddle_capi ${CAPI_SOURCES} ${CAPI_HEADER}
add_dependencies(paddle_capi paddle_proto) add_dependencies(paddle_capi paddle_proto)
# combine all paddle static libraries together, into libpaddle_capi_whole.a # TODO: paddle_capi_whole will be removed.
# user should use PaddleCAPI as -lpaddle_capi_whole if(MOBILE_INFERENCE)
set(PADDLE_CAPI_INFER_LIBS set(PADDLE_CAPI_INFER_LIBS
paddle_utils paddle_utils
paddle_parameter paddle_parameter
paddle_math paddle_math
...@@ -38,13 +38,27 @@ set(PADDLE_CAPI_INFER_LIBS ...@@ -38,13 +38,27 @@ set(PADDLE_CAPI_INFER_LIBS
paddle_function paddle_function
paddle_gserver paddle_gserver
paddle_proto) paddle_proto)
else()
set(PADDLE_CAPI_INFER_LIBS
paddle_utils
paddle_parameter
paddle_math
paddle_cuda
paddle_function
paddle_gserver
paddle_proto
paddle_pserver
paddle_network)
endif()
cc_library(paddle_capi_whole DEPS paddle_capi ${PADDLE_CAPI_INFER_LIBS}) cc_library(paddle_capi_whole DEPS paddle_capi ${PADDLE_CAPI_INFER_LIBS})
# No shared library for iOS # Link the static library for inference
cc_library(paddle_capi_engine DEPS paddle_capi paddle_utils paddle_parameter paddle_math paddle_cuda paddle_proto)
cc_library(paddle_capi_layers DEPS paddle_function paddle_gserver)
# Link the shared library for inference
if(NOT IOS) if(NOT IOS)
set(LINK_FLAGS " -Wl,--retain-symbols-file ${CMAKE_CURRENT_SOURCE_DIR}/export.sym -Wl,--version-script ${CMAKE_CURRENT_SOURCE_DIR}/export.map") set(LINK_FLAGS "-Wl,--version-script ${CMAKE_CURRENT_SOURCE_DIR}/paddle_capi.map")
# TODO: merge mkl into paddle_capi_shared
add_library(paddle_capi_shared SHARED ${CAPI_SOURCES}) add_library(paddle_capi_shared SHARED ${CAPI_SOURCES})
set_target_properties(paddle_capi_shared PROPERTIES LINK_FLAGS "${LINK_FLAGS}") set_target_properties(paddle_capi_shared PROPERTIES LINK_FLAGS "${LINK_FLAGS}")
target_include_directories(paddle_capi_shared PUBLIC ${CMAKE_CURRENT_BINARY_DIR}) target_include_directories(paddle_capi_shared PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
...@@ -53,9 +67,10 @@ endif() ...@@ -53,9 +67,10 @@ endif()
# install library & headers. # install library & headers.
install(FILES ${CAPI_HEADERS} DESTINATION include/paddle) install(FILES ${CAPI_HEADERS} DESTINATION include/paddle)
install(FILES paddle_capi.map DESTINATION include/paddle)
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/config.h DESTINATION include/paddle) install(FILES ${CMAKE_CURRENT_BINARY_DIR}/config.h DESTINATION include/paddle)
if(ANDROID) if(ANDROID)
install(TARGETS paddle_capi_whole paddle_capi_shared install(TARGETS paddle_capi_whole paddle_capi_engine paddle_capi_layers paddle_capi_shared
ARCHIVE DESTINATION lib/${ANDROID_ABI} ARCHIVE DESTINATION lib/${ANDROID_ABI}
LIBRARY DESTINATION lib/${ANDROID_ABI}) LIBRARY DESTINATION lib/${ANDROID_ABI})
execute_process( execute_process(
...@@ -80,7 +95,7 @@ if(ANDROID) ...@@ -80,7 +95,7 @@ if(ANDROID)
)" )"
) )
else(ANDROID) else(ANDROID)
install(TARGETS paddle_capi_whole ARCHIVE DESTINATION lib) install(TARGETS paddle_capi_whole paddle_capi_engine paddle_capi_layers ARCHIVE DESTINATION lib)
if(NOT IOS) if(NOT IOS)
install(TARGETS paddle_capi_shared DESTINATION lib) install(TARGETS paddle_capi_shared DESTINATION lib)
endif() endif()
......
...@@ -19,14 +19,15 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope) ...@@ -19,14 +19,15 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope)
proto_library(framework_proto SRCS framework.proto) proto_library(framework_proto SRCS framework.proto)
cc_library(attribute SRCS attribute.cc DEPS framework_proto) cc_library(attribute SRCS attribute.cc DEPS framework_proto)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute ddim op_info) cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc)
cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute) cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute)
cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker) cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto) cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope proto_desc) cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog)
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry) cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute ddim op_info operator)
cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator) cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc)
cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry) cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry)
py_proto_compile(framework_py_proto SRCS framework.proto) py_proto_compile(framework_py_proto SRCS framework.proto)
...@@ -42,7 +43,10 @@ add_custom_command(TARGET framework_py_proto POST_BUILD ...@@ -42,7 +43,10 @@ add_custom_command(TARGET framework_py_proto POST_BUILD
cc_library(backward SRCS backward.cc DEPS net_op) cc_library(backward SRCS backward.cc DEPS net_op)
cc_test(backward_test SRCS backward_test.cc DEPS backward recurrent_op device_context) cc_test(backward_test SRCS backward_test.cc DEPS backward recurrent_op device_context)
cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward) cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward glog)
cc_library(prune SRCS prune.cc DEPS framework_proto)
cc_test(prune_test SRCS prune_test.cc DEPS op_info prune recurrent_op device_context)
cc_library(tensor_array SRCS tensor_array.cc DEPS lod_tensor) cc_library(tensor_array SRCS tensor_array.cc DEPS lod_tensor)
cc_test(tensor_array_test SRCS tensor_array_test.cc DEPS tensor_array place) cc_test(tensor_array_test SRCS tensor_array_test.cc DEPS tensor_array place)
......
...@@ -19,19 +19,7 @@ limitations under the License. */ ...@@ -19,19 +19,7 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
static ProgramDesc* g_program_desc = nullptr; Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* program) {
ProgramDesc& GetProgramDesc() {
if (g_program_desc == nullptr) {
g_program_desc = new ProgramDesc();
auto root_block = g_program_desc->mutable_blocks()->Add();
root_block->set_idx(0);
root_block->set_parent_idx(-1);
}
return *g_program_desc;
}
Attribute GetAttrValue(const OpDesc::Attr& attr_desc) {
switch (attr_desc.type()) { switch (attr_desc.type()) {
case framework::AttrType::BOOLEAN: { case framework::AttrType::BOOLEAN: {
return attr_desc.b(); return attr_desc.b();
...@@ -74,7 +62,9 @@ Attribute GetAttrValue(const OpDesc::Attr& attr_desc) { ...@@ -74,7 +62,9 @@ Attribute GetAttrValue(const OpDesc::Attr& attr_desc) {
return val; return val;
} }
case framework::AttrType::BLOCK: { case framework::AttrType::BLOCK: {
return GetProgramDesc().mutable_blocks(attr_desc.block_idx()); PADDLE_ENFORCE(program != nullptr,
"Need to specify ProgramDesc when get a block attr");
return program->mutable_blocks(attr_desc.block_idx());
} }
} }
PADDLE_ENFORCE(false, "Unknown OpDesc::AttrDesc::type !"); PADDLE_ENFORCE(false, "Unknown OpDesc::AttrDesc::type !");
......
...@@ -26,16 +26,13 @@ limitations under the License. */ ...@@ -26,16 +26,13 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
ProgramDesc& GetProgramDesc();
template <typename T> template <typename T>
inline AttrType AttrTypeID() { inline AttrType AttrTypeID() {
Attribute tmp = T(); Attribute tmp = T();
return static_cast<AttrType>(tmp.which() - 1); return static_cast<AttrType>(tmp.which() - 1);
} }
Attribute GetAttrValue(const OpDesc::Attr& attr_desc); Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* desc);
class AttrReader { class AttrReader {
public: public:
......
...@@ -21,6 +21,7 @@ ...@@ -21,6 +21,7 @@
#include "paddle/framework/block_desc.h" #include "paddle/framework/block_desc.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/dynamic_recurrent_op.h"
#include "paddle/operators/net_op.h" #include "paddle/operators/net_op.h"
#include "paddle/operators/recurrent_op.h" #include "paddle/operators/recurrent_op.h"
...@@ -220,8 +221,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive( ...@@ -220,8 +221,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
// process recurrent gradient op as a special operator. // process recurrent gradient op as a special operator.
if (forwardOp.Type() == "recurrent") { if (forwardOp.Type() == "recurrent") {
// NOTE clean up cycle call somewhere (RNN's stepnet constains itself), // NOTE clean up cycle call somewhere (RNN's stepnet constains itself),
// or // or this will result in infinite loop.
// this will result in infinite loop.
const auto& rnnop = const auto& rnnop =
*static_cast<const operators::RecurrentOp*>(&forwardOp); *static_cast<const operators::RecurrentOp*>(&forwardOp);
auto rnn_grad_op = auto rnn_grad_op =
...@@ -231,6 +231,18 @@ static std::unique_ptr<OperatorBase> BackwardRecursive( ...@@ -231,6 +231,18 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
// create stepnet's gradient op // create stepnet's gradient op
rnn_grad_op->set_stepnet( rnn_grad_op->set_stepnet(
BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id)); BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id));
} else if (forwardOp.Type() == "dynamic_recurrent") {
// NOTE clean up cycle call somewhere (RNN's stepnet constains itself),
// or this will result in infinite loop.
const auto& rnnop =
*static_cast<const operators::DynamicRecurrentOp*>(&forwardOp);
auto rnn_grad_op =
static_cast<operators::DynamicRecurrentGradientOp*>(grad_op.get());
const auto& stepnet_op =
*static_cast<const OperatorBase*>(&rnnop.rnn.GetStepUnit());
// create stepnet's gradient op
rnn_grad_op->rnn.SetStepUnit(
BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id));
} }
if (net->ops_.empty()) { // Current no aux op is added to network if (net->ops_.empty()) { // Current no aux op is added to network
...@@ -281,12 +293,16 @@ static void CreateGradVarInBlock( ...@@ -281,12 +293,16 @@ static void CreateGradVarInBlock(
auto ops = block_desc->AllOps(); auto ops = block_desc->AllOps();
for (size_t op_index = grad_op_start_index; op_index < ops.size(); for (size_t op_index = grad_op_start_index; op_index < ops.size();
++op_index) { ++op_index) {
bool need_infer_shape = false;
ForEachVarName(ops[op_index]->Outputs(), ForEachVarName(ops[op_index]->Outputs(),
[&](const std::string& grad_var_name) { [&](const std::string& grad_var_name) {
if (block_desc->HasVar(grad_var_name)) { if (block_desc->HasVar(grad_var_name)) {
return false; return false;
} }
block_desc->Var(grad_var_name); need_infer_shape = true;
auto var = block_desc->Var(grad_var_name);
// FIXME(qiao) infer the datatype
var->SetDataType(framework::DataType::FP32);
auto it = param_name_map.find(grad_var_name); auto it = param_name_map.find(grad_var_name);
if (it == param_name_map.end()) { if (it == param_name_map.end()) {
return false; return false;
...@@ -298,12 +314,14 @@ static void CreateGradVarInBlock( ...@@ -298,12 +314,14 @@ static void CreateGradVarInBlock(
grad_record.op_idx_ = static_cast<int>(op_index); grad_record.op_idx_ = static_cast<int>(op_index);
return false; /* not break */ return false; /* not break */
}); });
if (need_infer_shape) {
ops[op_index]->InferShape(*block_desc);
}
} }
} }
std::vector<std::unique_ptr<OpDescBind>> MakeOpGrad( std::vector<std::unique_ptr<OpDescBind>> MakeOpGrad(
const std::unique_ptr<OpDescBind>& op_desc, const OpDescBind* op_desc, std::unordered_set<std::string>* no_grad_vars,
std::unordered_set<std::string>* no_grad_vars,
std::unordered_map<std::string, std::string>* grad_to_var) { std::unordered_map<std::string, std::string>* grad_to_var) {
std::vector<std::unique_ptr<OpDescBind>> grad_op_descs; std::vector<std::unique_ptr<OpDescBind>> grad_op_descs;
// All input gradients of forwarding operator do not need to calculate. // All input gradients of forwarding operator do not need to calculate.
...@@ -350,7 +368,7 @@ std::vector<std::unique_ptr<OpDescBind>> MakeBlockBackward( ...@@ -350,7 +368,7 @@ std::vector<std::unique_ptr<OpDescBind>> MakeBlockBackward(
std::unordered_set<std::string>* no_grad_vars, std::unordered_set<std::string>* no_grad_vars,
std::unordered_map<std::string, std::string>* grad_to_var) { std::unordered_map<std::string, std::string>* grad_to_var) {
BlockDescBind* cur_block = program_desc.Block(block_idx); BlockDescBind* cur_block = program_desc.Block(block_idx);
std::deque<std::unique_ptr<OpDescBind>>& op_descs = cur_block->ops_; std::vector<OpDescBind*> op_descs = cur_block->AllOps();
std::unordered_map<std::string, std::vector<size_t>> dup_out_ops; std::unordered_map<std::string, std::vector<size_t>> dup_out_ops;
size_t grad_desc_idx = 0; size_t grad_desc_idx = 0;
std::vector<std::unique_ptr<OpDescBind>> backward_descs; std::vector<std::unique_ptr<OpDescBind>> backward_descs;
...@@ -368,7 +386,7 @@ std::vector<std::unique_ptr<OpDescBind>> MakeBlockBackward( ...@@ -368,7 +386,7 @@ std::vector<std::unique_ptr<OpDescBind>> MakeBlockBackward(
program_desc, step_block_idx, no_grad_vars, grad_to_var); program_desc, step_block_idx, no_grad_vars, grad_to_var);
BlockDescBind* backward_block = program_desc.AppendBlock(*cur_block); BlockDescBind* backward_block = program_desc.AppendBlock(*cur_block);
for (auto& ptr : backward_block_op_descs) { for (auto& ptr : backward_block_op_descs) {
backward_block->ops_.push_back(std::move(ptr)); backward_block->AppendAllocatedOp(std::move(ptr));
} }
op_grads[0]->SetBlockAttr("step_block", *backward_block); op_grads[0]->SetBlockAttr("step_block", *backward_block);
} }
...@@ -425,17 +443,22 @@ ParamGradInfoMap AppendBackward( ...@@ -425,17 +443,22 @@ ParamGradInfoMap AppendBackward(
const int root_block_idx = 0; const int root_block_idx = 0;
auto root_block = program_desc.Block(root_block_idx); auto root_block = program_desc.Block(root_block_idx);
auto& all_ops = root_block->ops_;
// insert fill one op for target // insert fill one op for target
// TODO(qiao) add some check to the target.
std::string fill_one_op_out = GradVarName(target.Name()); std::string fill_one_op_out = GradVarName(target.Name());
std::vector<int64_t> target_shape_desc = target.Shape();
std::vector<int> target_shape;
std::transform(target_shape_desc.begin(), target_shape_desc.end(),
std::back_inserter(target_shape),
[](int64_t dim) { return static_cast<int>(dim); });
std::unique_ptr<OpDescBind> fill_one_op( std::unique_ptr<OpDescBind> fill_one_op(
new OpDescBind("fill_constant", {}, {{"Out", {fill_one_op_out}}}, new OpDescBind("fill_constant", {}, {{"Out", {fill_one_op_out}}},
{{"shape", std::vector<int>{1}}, {{"shape", target_shape},
{"value", static_cast<float>(1.0)}, {"value", static_cast<float>(1.0)},
{"data_type", framework::DataType::FP32}})); {"data_type", framework::DataType::FP32}}));
all_ops.push_back(std::move(fill_one_op)); root_block->AppendAllocatedOp(std::move(fill_one_op));
size_t forward_op_num = all_ops.size(); size_t forward_op_num = root_block->OpSize();
size_t forward_block_num = program_desc.Size(); size_t forward_block_num = program_desc.Size();
// Insert backward operators // Insert backward operators
...@@ -443,13 +466,22 @@ ParamGradInfoMap AppendBackward( ...@@ -443,13 +466,22 @@ ParamGradInfoMap AppendBackward(
auto backward_op_descs = MakeBlockBackward(program_desc, root_block_idx, auto backward_op_descs = MakeBlockBackward(program_desc, root_block_idx,
&no_grad_var_names, &grad_to_var); &no_grad_var_names, &grad_to_var);
std::unordered_map<std::string, GradVarInfo> retv;
// Create Variable
for (auto& ptr : backward_op_descs) { for (auto& ptr : backward_op_descs) {
all_ops.push_back(std::move(ptr)); root_block->AppendAllocatedOp(std::move(ptr));
} }
root_block->Var(fill_one_op_out); // Create Variable
// Create target gradient variable
std::unordered_map<std::string, GradVarInfo> retv;
auto var = root_block->Var(fill_one_op_out);
// FIXME(qiao) infer the data type
var->SetDataType(framework::DataType::FP32);
var->SetShape(target.Shape());
auto& target_grad = retv[target.Name()];
target_grad.name_ = fill_one_op_out;
target_grad.block_idx_ = root_block_idx;
target_grad.op_idx_ = static_cast<int>(forward_op_num);
// create grad_var for all blocks in this program // create grad_var for all blocks in this program
CreateGradVarInBlock(forward_op_num, grad_to_var, root_block, &retv); CreateGradVarInBlock(forward_op_num, grad_to_var, root_block, &retv);
......
...@@ -26,6 +26,20 @@ namespace framework { ...@@ -26,6 +26,20 @@ namespace framework {
using DeviceContext = platform::DeviceContext; using DeviceContext = platform::DeviceContext;
class NoneOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext *ctx) const override {}
};
template <typename Place, typename T>
class NoneKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &context) const override {}
};
class RowWiseAddOpMaker : public OpProtoAndCheckerMaker { class RowWiseAddOpMaker : public OpProtoAndCheckerMaker {
public: public:
RowWiseAddOpMaker(OpProto *proto, OpAttrChecker *op_checker) RowWiseAddOpMaker(OpProto *proto, OpAttrChecker *op_checker)
...@@ -215,19 +229,51 @@ class MinusOpMaker : public OpProtoAndCheckerMaker { ...@@ -215,19 +229,51 @@ class MinusOpMaker : public OpProtoAndCheckerMaker {
namespace f = paddle::framework; namespace f = paddle::framework;
namespace ops = paddle::operators; namespace ops = paddle::operators;
using EnforceNotMet = paddle::platform::EnforceNotMet; using EnforceNotMet = paddle::platform::EnforceNotMet;
REGISTER_OPERATOR(rowwise_add, f::NOP, f::RowWiseAddOpMaker, // rowwise_add
REGISTER_OPERATOR(rowwise_add, f::NoneOp, f::RowWiseAddOpMaker,
f::RowWiseAddGradMaker); f::RowWiseAddGradMaker);
REGISTER_OPERATOR(rowwise_add_grad, f::NOP); REGISTER_OP_CPU_KERNEL(rowwise_add,
REGISTER_OP(mul, f::NOP, f::MulOpMaker, mul_grad, f::NOP); f::NoneKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP(sigmoid, f::NOP, f::SigmoidOpMaker, sigmoid_grad, f::NOP); REGISTER_OPERATOR(rowwise_add_grad, f::NoneOp);
REGISTER_OP_WITHOUT_GRADIENT(nograd, f::NOP, f::NoGradOpMaker); REGISTER_OP_CPU_KERNEL(rowwise_add_grad,
REGISTER_OP_WITHOUT_GRADIENT(fill_zeros_like, f::NOP, f::FillZeroOpMaker); f::NoneKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP(sum, f::NOP, f::SumOpMaker, sum_grad, f::NOP); // mul
REGISTER_OP(mul, f::NoneOp, f::MulOpMaker, mul_grad, f::NoneOp);
REGISTER_OP_CPU_KERNEL(mul, f::NoneKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(mul_grad,
f::NoneKernel<paddle::platform::CPUPlace, float>);
// sigmoid
REGISTER_OP(sigmoid, f::NoneOp, f::SigmoidOpMaker, sigmoid_grad, f::NoneOp);
REGISTER_OP_CPU_KERNEL(sigmoid,
f::NoneKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_WITHOUT_GRADIENT(nograd, f::NoneOp, f::NoGradOpMaker);
// fill_zeros_like
REGISTER_OP_WITHOUT_GRADIENT(fill_zeros_like, f::NoneOp, f::FillZeroOpMaker);
REGISTER_OP_CPU_KERNEL(fill_zeros_like,
f::NoneKernel<paddle::platform::CPUPlace, float>);
// sum
REGISTER_OP(sum, f::NoneOp, f::SumOpMaker, sum_grad, f::NoneOp);
REGISTER_OP_CPU_KERNEL(sum, f::NoneKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(sum_grad,
f::NoneKernel<paddle::platform::CPUPlace, float>);
// fc
REGISTER_OP_WITHOUT_GRADIENT(fc, f::FcOp, f::FcOpMaker); REGISTER_OP_WITHOUT_GRADIENT(fc, f::FcOp, f::FcOpMaker);
REGISTER_OP(many_output_op, f::NOP, f::ManyOutputOpMaker, many_output_op_grad, // many_output_op
f::NOP); REGISTER_OP(many_output_op, f::NoneOp, f::ManyOutputOpMaker,
REGISTER_OP(mult_in_out, f::NOP, f::MultInOutOpMaker, mult_in_out_grad, f::NOP); many_output_op_grad, f::NoneOp);
REGISTER_OPERATOR(minus, f::NOP, f::MinusOpMaker, f::MinusGradOpDescMaker); // mult_in_out
REGISTER_OP(mult_in_out, f::NoneOp, f::MultInOutOpMaker, mult_in_out_grad,
f::NoneOp);
REGISTER_OP_CPU_KERNEL(mult_in_out,
f::NoneKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(mult_in_out_grad,
f::NoneKernel<paddle::platform::CPUPlace, float>);
// minus
REGISTER_OPERATOR(minus, f::NoneOp, f::MinusOpMaker, f::MinusGradOpDescMaker);
REGISTER_OP_CPU_KERNEL(minus, f::NoneKernel<paddle::platform::CPUPlace, float>);
// scale
REGISTER_OPERATOR(scale, f::NoneOp);
REGISTER_OP_CPU_KERNEL(scale, f::NoneKernel<paddle::platform::CPUPlace, float>);
TEST(Backward, simple_op_not_need_grad) { TEST(Backward, simple_op_not_need_grad) {
auto fwd = f::OpRegistry::CreateOp( auto fwd = f::OpRegistry::CreateOp(
...@@ -449,20 +495,10 @@ TEST(Backward, linear_net_intermediate_variable_has_no_grad) { ...@@ -449,20 +495,10 @@ TEST(Backward, linear_net_intermediate_variable_has_no_grad) {
EXPECT_EQ(bwd_net->ops_[2]->Outputs(all).size(), 0UL); EXPECT_EQ(bwd_net->ops_[2]->Outputs(all).size(), 0UL);
} }
// =================================== //
f::ProgramDesc *GetNewProgramDesc() {
auto *program_desc = new f::ProgramDesc();
auto *root_block = program_desc->add_blocks();
root_block->set_idx(0);
root_block->set_parent_idx(-1);
return program_desc;
}
TEST(Backward, simple_single_op) { TEST(Backward, simple_single_op) {
f::ProgramDesc *program_desc = GetNewProgramDesc(); f::ProgramDescBind program;
f::ProgramDescBind &program = f::ProgramDescBind::Instance(program_desc);
f::BlockDescBind *block = program.Block(0); f::BlockDescBind *block = program.Block(0);
f::OpDescBind *op = block->AppendOp(); f::OpDescBind *op = block->AppendOp();
op->SetType("rowwise_add"); op->SetType("rowwise_add");
op->SetInput("X", {"x"}); op->SetInput("X", {"x"});
...@@ -487,7 +523,7 @@ TEST(Backward, simple_single_op) { ...@@ -487,7 +523,7 @@ TEST(Backward, simple_single_op) {
EXPECT_EQ(grad_op->Output(f::GradVarName("b")), EXPECT_EQ(grad_op->Output(f::GradVarName("b")),
std::vector<std::string>({f::GradVarName("b")})); std::vector<std::string>({f::GradVarName("b")}));
EXPECT_EQ(var_to_grad.size(), 2UL); EXPECT_EQ(var_to_grad.size(), 3UL);
EXPECT_EQ(var_to_grad.at("b"), f::GradVarInfo(f::GradVarName("b"), 0, 2)); EXPECT_EQ(var_to_grad.at("b"), f::GradVarInfo(f::GradVarName("b"), 0, 2));
EXPECT_EQ(var_to_grad.at("x"), f::GradVarInfo(f::GradVarName("x"), 0, 2)); EXPECT_EQ(var_to_grad.at("x"), f::GradVarInfo(f::GradVarName("x"), 0, 2));
...@@ -496,8 +532,7 @@ TEST(Backward, simple_single_op) { ...@@ -496,8 +532,7 @@ TEST(Backward, simple_single_op) {
} }
TEST(Backward, default_attribute) { TEST(Backward, default_attribute) {
f::ProgramDesc *program_desc = GetNewProgramDesc(); f::ProgramDescBind program;
f::ProgramDescBind &program = f::ProgramDescBind::Instance(program_desc);
f::BlockDescBind *block = program.Block(0); f::BlockDescBind *block = program.Block(0);
f::OpDescBind *op = block->AppendOp(); f::OpDescBind *op = block->AppendOp();
op->SetType("mul"); op->SetType("mul");
...@@ -523,8 +558,7 @@ TEST(Backward, default_attribute) { ...@@ -523,8 +558,7 @@ TEST(Backward, default_attribute) {
} }
TEST(Backward, simple_mult_op) { TEST(Backward, simple_mult_op) {
f::ProgramDesc *program_desc = GetNewProgramDesc(); f::ProgramDescBind program;
f::ProgramDescBind &program = f::ProgramDescBind::Instance(program_desc);
f::BlockDescBind *block = program.Block(0); f::BlockDescBind *block = program.Block(0);
f::OpDescBind *op1 = block->AppendOp(); f::OpDescBind *op1 = block->AppendOp();
op1->SetType("rowwise_add"); op1->SetType("rowwise_add");
...@@ -588,7 +622,7 @@ TEST(Backward, simple_mult_op) { ...@@ -588,7 +622,7 @@ TEST(Backward, simple_mult_op) {
EXPECT_EQ(grad_op3->Output(f::GradVarName("b")), EXPECT_EQ(grad_op3->Output(f::GradVarName("b")),
std::vector<std::string>({f::GradVarName("b3")})); std::vector<std::string>({f::GradVarName("b3")}));
EXPECT_EQ(var_to_grad.size(), 6UL); EXPECT_EQ(var_to_grad.size(), 7UL);
EXPECT_EQ(var_to_grad.at("x1"), f::GradVarInfo(f::GradVarName("x1"), 0, 6)); EXPECT_EQ(var_to_grad.at("x1"), f::GradVarInfo(f::GradVarName("x1"), 0, 6));
EXPECT_EQ(var_to_grad.at("b1"), f::GradVarInfo(f::GradVarName("b1"), 0, 6)); EXPECT_EQ(var_to_grad.at("b1"), f::GradVarInfo(f::GradVarName("b1"), 0, 6));
EXPECT_EQ(var_to_grad.at("out1"), EXPECT_EQ(var_to_grad.at("out1"),
...@@ -607,8 +641,7 @@ TEST(Backward, simple_mult_op) { ...@@ -607,8 +641,7 @@ TEST(Backward, simple_mult_op) {
} }
TEST(Backward, intermedia_var_no_grad) { TEST(Backward, intermedia_var_no_grad) {
f::ProgramDesc *program_desc = GetNewProgramDesc(); f::ProgramDescBind program;
f::ProgramDescBind &program = f::ProgramDescBind::Instance(program_desc);
f::BlockDescBind *block = program.Block(0); f::BlockDescBind *block = program.Block(0);
f::OpDescBind *op1 = block->AppendOp(); f::OpDescBind *op1 = block->AppendOp();
op1->SetType("rowwise_add"); op1->SetType("rowwise_add");
...@@ -666,7 +699,7 @@ TEST(Backward, intermedia_var_no_grad) { ...@@ -666,7 +699,7 @@ TEST(Backward, intermedia_var_no_grad) {
std::vector<std::string>({f::GradVarName("out1")})); std::vector<std::string>({f::GradVarName("out1")}));
EXPECT_EQ(grad_op4->Output(f::GradVarName("Y")), std::vector<std::string>()); EXPECT_EQ(grad_op4->Output(f::GradVarName("Y")), std::vector<std::string>());
EXPECT_EQ(var_to_grad.size(), 3UL); EXPECT_EQ(var_to_grad.size(), 4UL);
EXPECT_EQ(var_to_grad.at("x1"), f::GradVarInfo(f::GradVarName("x1"), 0, 6)); EXPECT_EQ(var_to_grad.at("x1"), f::GradVarInfo(f::GradVarName("x1"), 0, 6));
EXPECT_EQ(var_to_grad.at("b1"), f::GradVarInfo(f::GradVarName("b1"), 0, 6)); EXPECT_EQ(var_to_grad.at("b1"), f::GradVarInfo(f::GradVarName("b1"), 0, 6));
EXPECT_EQ(var_to_grad.at("out1"), EXPECT_EQ(var_to_grad.at("out1"),
...@@ -678,8 +711,7 @@ TEST(Backward, intermedia_var_no_grad) { ...@@ -678,8 +711,7 @@ TEST(Backward, intermedia_var_no_grad) {
} }
TEST(Backward, var_no_grad) { TEST(Backward, var_no_grad) {
f::ProgramDesc *program_desc = GetNewProgramDesc(); f::ProgramDescBind program;
f::ProgramDescBind &program = f::ProgramDescBind::Instance(program_desc);
f::BlockDescBind *block = program.Block(0); f::BlockDescBind *block = program.Block(0);
f::OpDescBind *op1 = block->AppendOp(); f::OpDescBind *op1 = block->AppendOp();
op1->SetType("mult_in_out"); op1->SetType("mult_in_out");
...@@ -744,7 +776,7 @@ TEST(Backward, var_no_grad) { ...@@ -744,7 +776,7 @@ TEST(Backward, var_no_grad) {
EXPECT_EQ(grad_op1->Output(f::GradVarName("H")), EXPECT_EQ(grad_op1->Output(f::GradVarName("H")),
std::vector<std::string>({f::GradVarName("h1")})); std::vector<std::string>({f::GradVarName("h1")}));
EXPECT_EQ(var_to_grad.size(), 3UL); EXPECT_EQ(var_to_grad.size(), 4UL);
EXPECT_EQ(var_to_grad.at("y1"), f::GradVarInfo(f::GradVarName("y1"), 0, 3)); EXPECT_EQ(var_to_grad.at("y1"), f::GradVarInfo(f::GradVarName("y1"), 0, 3));
EXPECT_EQ(var_to_grad.at("x1"), f::GradVarInfo(f::GradVarName("x1"), 0, 5)); EXPECT_EQ(var_to_grad.at("x1"), f::GradVarInfo(f::GradVarName("x1"), 0, 5));
EXPECT_EQ(var_to_grad.at("h1"), f::GradVarInfo(f::GradVarName("h1"), 0, 5)); EXPECT_EQ(var_to_grad.at("h1"), f::GradVarInfo(f::GradVarName("h1"), 0, 5));
...@@ -755,8 +787,7 @@ TEST(Backward, var_no_grad) { ...@@ -755,8 +787,7 @@ TEST(Backward, var_no_grad) {
} }
TEST(Backward, shared_var) { TEST(Backward, shared_var) {
f::ProgramDesc *program_desc = GetNewProgramDesc(); f::ProgramDescBind program;
f::ProgramDescBind &program = f::ProgramDescBind::Instance(program_desc);
f::BlockDescBind *block = program.Block(0); f::BlockDescBind *block = program.Block(0);
f::OpDescBind *op1 = block->AppendOp(); f::OpDescBind *op1 = block->AppendOp();
op1->SetType("rowwise_add"); op1->SetType("rowwise_add");
...@@ -830,7 +861,7 @@ TEST(Backward, shared_var) { ...@@ -830,7 +861,7 @@ TEST(Backward, shared_var) {
EXPECT_EQ(grad_op1->Output(f::GradVarName("b")), EXPECT_EQ(grad_op1->Output(f::GradVarName("b")),
std::vector<std::string>({f::GradVarName("b1")})); std::vector<std::string>({f::GradVarName("b1")}));
EXPECT_EQ(var_to_grad.size(), 5UL); EXPECT_EQ(var_to_grad.size(), 6UL);
EXPECT_EQ(var_to_grad.at("b3"), f::GradVarInfo(f::GradVarName("b3"), 0, 4)); EXPECT_EQ(var_to_grad.at("b3"), f::GradVarInfo(f::GradVarName("b3"), 0, 4));
EXPECT_EQ(var_to_grad.at("y2"), f::GradVarInfo(f::GradVarName("y2"), 0, 5)); EXPECT_EQ(var_to_grad.at("y2"), f::GradVarInfo(f::GradVarName("y2"), 0, 5));
EXPECT_EQ(var_to_grad.at("out1"), EXPECT_EQ(var_to_grad.at("out1"),
...@@ -846,8 +877,7 @@ TEST(Backward, shared_var) { ...@@ -846,8 +877,7 @@ TEST(Backward, shared_var) {
} }
TEST(Backward, half_backward) { TEST(Backward, half_backward) {
f::ProgramDesc *program_desc = GetNewProgramDesc(); f::ProgramDescBind program;
f::ProgramDescBind &program = f::ProgramDescBind::Instance(program_desc);
f::BlockDescBind *block = program.Block(0); f::BlockDescBind *block = program.Block(0);
auto *op1 = block->AppendOp(); auto *op1 = block->AppendOp();
op1->SetType("minus"); op1->SetType("minus");
...@@ -863,7 +893,7 @@ TEST(Backward, half_backward) { ...@@ -863,7 +893,7 @@ TEST(Backward, half_backward) {
auto ops = block->AllOps(); auto ops = block->AllOps();
ASSERT_EQ(3UL, ops.size()); ASSERT_EQ(3UL, ops.size());
EXPECT_EQ(var_to_grad.size(), 1UL); EXPECT_EQ(var_to_grad.size(), 2UL);
EXPECT_EQ(var_to_grad.at("a"), EXPECT_EQ(var_to_grad.at("a"),
f::GradVarInfo(f::GradVarName("a"), 0, forward_len + 1)); f::GradVarInfo(f::GradVarName("a"), 0, forward_len + 1));
} }
...@@ -19,11 +19,11 @@ namespace paddle { ...@@ -19,11 +19,11 @@ namespace paddle {
namespace framework { namespace framework {
VarDescBind *BlockDescBind::Var(const std::string &name) { VarDescBind *BlockDescBind::Var(const std::string &name) {
need_update_ = true;
auto it = vars_.find(name); auto it = vars_.find(name);
if (it != vars_.end()) { if (it != vars_.end()) {
return it->second.get(); return it->second.get();
} }
need_update_ = true;
auto *var = new VarDescBind(name); auto *var = new VarDescBind(name);
vars_[name].reset(var); vars_[name].reset(var);
return var; return var;
...@@ -41,6 +41,19 @@ bool BlockDescBind::HasVar(const std::string &name) const { ...@@ -41,6 +41,19 @@ bool BlockDescBind::HasVar(const std::string &name) const {
return vars_.find(name) != vars_.end(); return vars_.find(name) != vars_.end();
} }
VarDescBind *BlockDescBind::FindVarRecursive(const std::string &name) const {
auto it = vars_.find(name);
if (it == vars_.end()) {
return Parent() == kNoneBlockIndex ? nullptr
: ParentBlock()->FindVarRecursive(name);
}
return it->second.get();
}
bool BlockDescBind::HasVarRecursive(const std::string &name) const {
return FindVarRecursive(name) != nullptr;
}
std::vector<VarDescBind *> BlockDescBind::AllVars() const { std::vector<VarDescBind *> BlockDescBind::AllVars() const {
std::vector<VarDescBind *> res; std::vector<VarDescBind *> res;
for (const auto &p : vars_) { for (const auto &p : vars_) {
...@@ -55,6 +68,11 @@ OpDescBind *BlockDescBind::AppendOp() { ...@@ -55,6 +68,11 @@ OpDescBind *BlockDescBind::AppendOp() {
return ops_.back().get(); return ops_.back().get();
} }
void BlockDescBind::AppendAllocatedOp(std::unique_ptr<OpDescBind> &&op_desc) {
need_update_ = true;
ops_.emplace_back(std::move(op_desc));
}
OpDescBind *BlockDescBind::PrependOp() { OpDescBind *BlockDescBind::PrependOp() {
need_update_ = true; need_update_ = true;
ops_.emplace_front(new OpDescBind()); ops_.emplace_front(new OpDescBind());
...@@ -70,15 +88,19 @@ std::vector<OpDescBind *> BlockDescBind::AllOps() const { ...@@ -70,15 +88,19 @@ std::vector<OpDescBind *> BlockDescBind::AllOps() const {
} }
void BlockDescBind::Flush() { void BlockDescBind::Flush() {
for (auto &op_desc : ops_) {
op_desc->Flush();
}
if (need_update_) { if (need_update_) {
auto &op_field = *this->desc_->mutable_ops(); auto &op_field = *this->desc_->mutable_ops();
op_field.Clear(); this->ClearPBOps();
op_field.Reserve(static_cast<int>(ops_.size())); op_field.Reserve(static_cast<int>(ops_.size()));
for (auto &op_desc : ops_) { for (auto &op_desc : ops_) {
op_field.AddAllocated(op_desc->Proto()); op_field.AddAllocated(op_desc->Proto());
} }
auto &var_field = *this->desc_->mutable_vars(); auto &var_field = *this->desc_->mutable_vars();
var_field.Clear(); this->ClearPBVars();
var_field.Reserve(static_cast<int>(vars_.size())); var_field.Reserve(static_cast<int>(vars_.size()));
for (auto &var_desc : vars_) { for (auto &var_desc : vars_) {
var_field.AddAllocated(var_desc.second->Proto()); var_field.AddAllocated(var_desc.second->Proto());
...@@ -88,7 +110,7 @@ void BlockDescBind::Flush() { ...@@ -88,7 +110,7 @@ void BlockDescBind::Flush() {
} }
BlockDescBind *BlockDescBind::ParentBlock() const { BlockDescBind *BlockDescBind::ParentBlock() const {
if (this->desc_->parent_idx() == -1) { if (this->desc_->parent_idx() == kNoneBlockIndex) {
return nullptr; return nullptr;
} }
return prog_->Block(static_cast<size_t>(this->desc_->parent_idx())); return prog_->Block(static_cast<size_t>(this->desc_->parent_idx()));
...@@ -98,6 +120,35 @@ BlockDesc *BlockDescBind::Proto() { ...@@ -98,6 +120,35 @@ BlockDesc *BlockDescBind::Proto() {
Flush(); Flush();
return desc_; return desc_;
} }
BlockDescBind::BlockDescBind(const BlockDescBind &other, BlockDesc *desc,
ProgramDescBind *prog)
: prog_(prog), desc_(desc) {
need_update_ = true;
for (auto &op : other.ops_) {
ops_.emplace_back(new OpDescBind(*op));
}
for (auto &it : other.vars_) {
auto *var = new VarDescBind(*it.second);
vars_[it.first].reset(var);
}
}
void BlockDescBind::ClearPBOps() {
auto ops = this->desc_->mutable_ops();
while (!ops->empty()) {
// we do not own the OpDesc, so release the ownership.
ops->ReleaseLast();
}
}
void BlockDescBind::ClearPBVars() {
auto vars = this->desc_->mutable_vars();
while (!vars->empty()) {
// we do not own the VarDesc, so release the ownership.
vars->ReleaseLast();
}
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -16,9 +16,12 @@ limitations under the License. */ ...@@ -16,9 +16,12 @@ limitations under the License. */
#include <deque> #include <deque>
#include <memory> #include <memory>
#include <set>
#include <unordered_map> #include <unordered_map>
#include <vector> #include <vector>
#include "paddle/framework/op_desc.h" #include "paddle/framework/op_desc.h"
#include "paddle/framework/proto_desc.h"
#include "paddle/framework/var_desc.h" #include "paddle/framework/var_desc.h"
#include "paddle/platform/macros.h" #include "paddle/platform/macros.h"
...@@ -36,6 +39,14 @@ class BlockDescBind { ...@@ -36,6 +39,14 @@ class BlockDescBind {
BlockDescBind(ProgramDescBind *prog, BlockDesc *desc) BlockDescBind(ProgramDescBind *prog, BlockDesc *desc)
: prog_(prog), desc_(desc), need_update_(false) {} : prog_(prog), desc_(desc), need_update_(false) {}
BlockDescBind(const BlockDescBind &other, BlockDesc *desc,
ProgramDescBind *prog);
~BlockDescBind() {
this->ClearPBVars();
this->ClearPBOps();
}
int32_t ID() const { return desc_->idx(); } int32_t ID() const { return desc_->idx(); }
int32_t Parent() const { return desc_->parent_idx(); } int32_t Parent() const { return desc_->parent_idx(); }
...@@ -46,23 +57,43 @@ class BlockDescBind { ...@@ -46,23 +57,43 @@ class BlockDescBind {
bool HasVar(const std::string &var_name) const; bool HasVar(const std::string &var_name) const;
VarDescBind *FindVarRecursive(const std::string &name_bytes) const;
bool HasVarRecursive(const std::string &var_name) const;
std::set<std::string> LocalVarNames() const {
std::set<std::string> var_names;
for (auto &var : vars_) {
var_names.insert(var.first);
}
return var_names;
}
std::vector<VarDescBind *> AllVars() const; std::vector<VarDescBind *> AllVars() const;
BlockDescBind *ParentBlock() const; BlockDescBind *ParentBlock() const;
OpDescBind *AppendOp(); OpDescBind *AppendOp();
void AppendAllocatedOp(std::unique_ptr<OpDescBind> &&op_desc);
OpDescBind *PrependOp(); OpDescBind *PrependOp();
std::vector<OpDescBind *> AllOps() const; std::vector<OpDescBind *> AllOps() const;
size_t OpSize() const { return ops_.size(); }
OpDescBind *Op(int idx) { return ops_.at(idx).get(); }
void Flush(); void Flush();
BlockDesc *Proto(); BlockDesc *Proto();
// FIXME(yuyang18): backward will access private data of BlockDesc. private:
// Mark it public temporary. We can fix it later. void ClearPBOps();
public: void ClearPBVars();
private:
ProgramDescBind *prog_; // not_own ProgramDescBind *prog_; // not_own
BlockDesc *desc_; // not_own BlockDesc *desc_; // not_own
bool need_update_; bool need_update_;
......
...@@ -26,6 +26,8 @@ inline DataType ToDataType(std::type_index type) { ...@@ -26,6 +26,8 @@ inline DataType ToDataType(std::type_index type) {
return DataType::FP64; return DataType::FP64;
} else if (typeid(int).hash_code() == type.hash_code()) { } else if (typeid(int).hash_code() == type.hash_code()) {
return DataType::INT32; return DataType::INT32;
} else if (typeid(int64_t).hash_code() == type.hash_code()) {
return DataType::INT64;
} else { } else {
PADDLE_THROW("Not supported"); PADDLE_THROW("Not supported");
} }
......
...@@ -64,99 +64,27 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) { ...@@ -64,99 +64,27 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
auto& block = pdesc.blocks(block_id); auto& block = pdesc.blocks(block_id);
auto& device = device_contexts_[0]; auto& device = device_contexts_[0];
// Instantiate all the vars in the global scope
for (auto& var : block.vars()) {
scope->Var(var.name());
}
Scope& local_scope = scope->NewScope(); Scope& local_scope = scope->NewScope();
std::vector<bool> should_run = Prune(pdesc, block_id); for (auto& var : block.vars()) {
PADDLE_ENFORCE_EQ(should_run.size(), static_cast<size_t>(block.ops_size())); if (var.persistable()) {
for (size_t i = 0; i < should_run.size(); ++i) { auto* ptr = scope->Var(var.name());
if (should_run[i]) { VLOG(3) << "Create Variable " << var.name()
for (auto& var : block.ops(i).outputs()) { << " global, which pointer is " << ptr;
for (auto& argu : var.arguments()) {
if (local_scope.FindVar(argu) == nullptr) {
local_scope.Var(argu);
}
}
}
auto op = paddle::framework::OpRegistry::CreateOp(block.ops(i));
op->Run(local_scope, *device);
}
}
// TODO(tonyyang-svail):
// - Destroy local_scope
}
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id) {
// TODO(tonyyang-svail):
// - will change to use multiple blocks for RNN op and Cond Op
auto& block = pdesc.blocks(block_id);
auto& ops = block.ops();
bool expect_feed = true;
for (auto& op_desc : ops) {
PADDLE_ENFORCE(op_desc.type() != kFeedOpType || expect_feed,
"All FeedOps are at the beginning of the ProgramDesc");
expect_feed = (op_desc.type() == kFeedOpType);
}
bool expect_fetch = true;
for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
auto& op_desc = *op_iter;
PADDLE_ENFORCE(op_desc.type() != kFetchOpType || expect_fetch,
"All FetchOps must at the end of the ProgramDesc");
expect_fetch = (op_desc.type() == kFetchOpType);
}
std::set<std::string> dependent_vars;
std::vector<bool> should_run;
for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
auto& op_desc = *op_iter;
bool found_dependent_vars = false;
for (auto& var : op_desc.outputs()) {
for (auto& argu : var.arguments()) {
if (dependent_vars.count(argu) != 0) {
found_dependent_vars = true;
}
}
}
if (op_desc.type() == kFetchOpType || found_dependent_vars) {
// erase its output to the dependency graph
for (auto& var : op_desc.outputs()) {
for (auto& argu : var.arguments()) {
dependent_vars.erase(argu);
}
}
// insert its input to the dependency graph
for (auto& var : op_desc.inputs()) {
for (auto& argu : var.arguments()) {
dependent_vars.insert(argu);
}
}
should_run.push_back(true);
} else { } else {
should_run.push_back(false); auto* ptr = local_scope.Var(var.name());
VLOG(3) << "Create Variable " << var.name()
<< " locally, which pointer is " << ptr;
} }
} }
// TODO(tonyyang-svail): for (auto& op_desc : block.ops()) {
// - check this after integration of Init auto op = paddle::framework::OpRegistry::CreateOp(
// PADDLE_ENFORCE(dependent_vars.empty()); op_desc, const_cast<ProgramDesc*>(&pdesc));
op->Run(local_scope, *device);
// since we are traversing the ProgramDesc in reverse order }
// we reverse the should_run vector
std::reverse(should_run.begin(), should_run.end());
return should_run; scope->DeleteScope(&local_scope);
} }
} // namespace framework } // namespace framework
......
...@@ -40,16 +40,5 @@ class Executor { ...@@ -40,16 +40,5 @@ class Executor {
std::vector<platform::DeviceContext*> device_contexts_; std::vector<platform::DeviceContext*> device_contexts_;
}; };
/* @Brief
* Pruning the graph
*
* @param
* ProgramDesc
*
* @return
* vector<bool> Same size as ops. Indicates whether an op should be run.
*/
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id);
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -13,37 +13,45 @@ See the License for the specific language governing permissions and ...@@ -13,37 +13,45 @@ See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#pragma once #pragma once
#include "glog/logging.h"
#include "paddle/framework/feed_fetch_type.h"
#include "paddle/framework/scope.h" #include "paddle/framework/scope.h"
#include "paddle/framework/variable.h" #include "paddle/framework/variable.h"
namespace paddle { namespace paddle {
namespace framework { namespace framework {
template <typename T> void SetFeedVariable(Scope* scope, const LoDTensor& input,
void SetFeedVariable(const LoDTensor& input, const std::string& var_name, const std::string& var_name, size_t index) {
size_t index) {
// If var_name Variable is not found in GlobalScope, a new variable will // If var_name Variable is not found in GlobalScope, a new variable will
// be created. // be created.
Variable* g_feed_value = GetGlobalScope().Var(var_name); VLOG(3) << "SetFeedVariable name=" << var_name << " index=" << index;
Variable* g_feed_value = scope->Var(var_name);
auto& feed_inputs = auto& feed_inputs =
*(g_feed_value->GetMutable<std::vector<paddle::framework::LoDTensor>>()); *(g_feed_value->GetMutable<std::vector<paddle::framework::LoDTensor>>());
if (index >= feed_inputs.size()) { if (index >= feed_inputs.size()) {
feed_inputs.resize(index + 1); feed_inputs.resize(index + 1);
} }
// shared data with input tensor // shared data with input tensor
feed_inputs[index].ShareDataWith<T>(input); feed_inputs[index].ShareDataWith(input);
// set lod // set lod
feed_inputs[index].set_lod(input.lod()); feed_inputs[index].set_lod(input.lod());
} }
LoDTensor& GetFetchVariable(const std::string& var_name, size_t index) { LoDTensor& GetFetchVariable(const Scope& scope, const std::string& var_name,
size_t index) {
// Since we want to fetch LodTensor from a variable, the variable must // Since we want to fetch LodTensor from a variable, the variable must
// be created alreadly. // be created alreadly.
Variable* g_fetch_value = GetGlobalScope().FindVar(var_name); Variable* g_fetch_value = scope.FindVar(var_name);
auto& fetch_outputs = PADDLE_ENFORCE(g_fetch_value->IsType<FeedFetchList>(),
*(g_fetch_value->GetMutable<std::vector<paddle::framework::LoDTensor>>()); "Only %s can be invoked by GetFetchVariable",
typeid(FeedFetchList).name());
auto& fetch_outputs = *g_fetch_value->GetMutable<FeedFetchList>();
auto& tensor = fetch_outputs[index];
VLOG(3) << "Fetch " << var_name << " with index " << index
<< " shape= " << tensor.dims();
PADDLE_ENFORCE_LT(index, fetch_outputs.size()); PADDLE_ENFORCE_LT(index, fetch_outputs.size());
return fetch_outputs[index]; return tensor;
} }
} // namespace framework } // namespace framework
......
...@@ -55,6 +55,7 @@ message OpDesc { ...@@ -55,6 +55,7 @@ message OpDesc {
repeated Var inputs = 1; repeated Var inputs = 1;
repeated Var outputs = 2; repeated Var outputs = 2;
repeated Attr attrs = 4; repeated Attr attrs = 4;
optional bool is_target = 5 [ default = false ];
}; };
// OpProto describes a C++ framework::OperatorBase derived class. // OpProto describes a C++ framework::OperatorBase derived class.
...@@ -67,6 +68,7 @@ message OpProto { ...@@ -67,6 +68,7 @@ message OpProto {
optional bool duplicable = 3 [ default = false ]; optional bool duplicable = 3 [ default = false ];
optional bool intermediate = 4 [ default = false ]; optional bool intermediate = 4 [ default = false ];
optional bool dispensable = 5 [ default = false ];
} }
// AttrProto describes the C++ type Attribute. // AttrProto describes the C++ type Attribute.
...@@ -111,6 +113,8 @@ message VarDesc { ...@@ -111,6 +113,8 @@ message VarDesc {
enum VarType { enum VarType {
LOD_TENSOR = 1; LOD_TENSOR = 1;
SELECTED_ROWS = 2; SELECTED_ROWS = 2;
FEED_MINIBATCH = 3;
FETCH_LIST = 4;
} }
required string name = 1; required string name = 1;
required VarType type = 2; required VarType type = 2;
......
...@@ -25,31 +25,50 @@ LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end) { ...@@ -25,31 +25,50 @@ LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end) {
for (size_t i = level_begin; i < level_end; i++) { for (size_t i = level_begin; i < level_end; i++) {
new_lod.emplace_back(in.at(i)); new_lod.emplace_back(in.at(i));
} }
// transform the lowest level to absolute offset.
LoD abs_offset_lod = ToAbsOffset(in);
new_lod.back() = abs_offset_lod[level_end - 1];
return new_lod; return new_lod;
} }
LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin, LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
size_t elem_end) { size_t elem_end) {
// slice the lod. PADDLE_ENFORCE_LT(level, in.size());
LoD new_lod; PADDLE_ENFORCE_LT(elem_end, in[level].size());
new_lod.reserve(in.size() - level);
auto start = in.at(level)[elem_begin]; LoD res;
auto end = in.at(level)[elem_end]; res.resize(in.size() - level);
// copy the first level
for (auto it = in.begin() + level; it != in.end(); it++) { res[0].assign(in[level].begin() + elem_begin,
auto it_begin = std::find(it->begin(), it->end(), start); in[level].begin() + elem_end + 1);
auto it_end = std::find(it_begin, it->end(), end); for (size_t lvl = 1; lvl < res.size(); lvl++) {
PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info"); const auto& in_level = in[level + lvl];
PADDLE_ENFORCE(it_end != it->end(), "error in parsing lod info"); const auto& above_level = res[lvl - 1];
new_lod.emplace_back(it_begin, it_end + 1); auto& out_level = res[lvl];
// reset offset if tensor is copyed and sliced. out_level.assign(in_level.begin() + above_level.front(),
std::transform(new_lod.back().begin(), new_lod.back().end(), in_level.begin() + above_level.back() + 1);
new_lod.back().begin(),
[start](int v) { return v - start; });
PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LoD");
} }
PADDLE_ENFORCE_LE(new_lod.size(), in.size()); for (size_t lvl = 0; lvl < res.size(); lvl++) {
return new_lod; // to make the first offset equals 0, all the elements minus the first
// element
size_t front = res[lvl].front();
for (auto& ele : res[lvl]) {
ele -= front;
}
}
return res;
}
LoD ToAbsOffset(const LoD& in) {
// the lowest level stores relative offsets
if (in.empty() || in.size() == 1) return in;
LoD result = in;
for (int level = result.size() - 2; level >= 0; level--) {
for (auto& ele : result[level]) {
ele = result[level + 1][ele];
}
}
return result;
} }
bool operator==(const LoD& a, const LoD& b) { bool operator==(const LoD& a, const LoD& b) {
...@@ -75,17 +94,7 @@ bool operator==(const LoD& a, const LoD& b) { ...@@ -75,17 +94,7 @@ bool operator==(const LoD& a, const LoD& b) {
size_t LoDTensor::NumElements(size_t level, size_t idx) const { size_t LoDTensor::NumElements(size_t level, size_t idx) const {
PADDLE_ENFORCE_LT(level, NumLevels()); PADDLE_ENFORCE_LT(level, NumLevels());
PADDLE_ENFORCE_LT(idx, NumElements(level)); PADDLE_ENFORCE_LT(idx, NumElements(level));
// the last level of LoD, just return number of records in Tensor
if (level == NumLevels() - 1) {
return lod_[level][idx + 1] - lod_[level][idx]; return lod_[level][idx + 1] - lod_[level][idx];
}
// high level of LoD, and there is another lower level, return number of
// lower-level elements
auto tmp = SliceInLevel(lod_, level, idx, idx + 1);
PADDLE_ENFORCE_GE(tmp.size(), 2);
// there is a 0 as a placeholder stored in LoD, so the number of elements
// equals lod.size() - 1
return tmp[1].size() - 1;
} }
void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) { void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) {
......
...@@ -39,23 +39,36 @@ using Vector = thrust::host_vector< ...@@ -39,23 +39,36 @@ using Vector = thrust::host_vector<
#endif #endif
/* /*
* 3-level LoD stores * LoD is short for Level of Details.
*
* 0 10 20
* 0 5 10 15 20
* 0 2 5 7 10 12 15 20
* *
* - in a level, each element indicates offset in the underlying Tensor * - in a level, each element indicates relative offset of the lower level
* - the first element should be 0 and that indicates that this sequence start * - the first element should be 0 and that indicates that this sequence start
* from 0 * from 0
* - each sequence's begin and end(no-inclusive) is level[id, id+1] * - each sequence's begin and end(no-inclusive) is level[id, id+1]
*
* For example:
* 3-level LoD stores
*
* 0 2 3
* 0 2 4 7
* 0 2 5 7 10 12 15 20
*/ */
using LoD = std::vector<Vector<size_t>>; using LoD = std::vector<Vector<size_t>>;
/*
* Slice levels from a LoD.
* NOTE the lowest level should always be the absolute offsets of the underlying
* tensor instances. So if higher layers are sliced without the lowest level,
* the lower level of the sliced LoD will be transformed to the absolute offset.
*/
LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end); LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end);
LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin, LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
size_t elem_end); size_t elem_end);
/*
* Transform an LoD from relative offsets to absolute offsets.
*/
LoD ToAbsOffset(const LoD& in);
bool operator==(const LoD& a, const LoD& b); bool operator==(const LoD& a, const LoD& b);
...@@ -74,12 +87,12 @@ class LoDTensor : public Tensor { ...@@ -74,12 +87,12 @@ class LoDTensor : public Tensor {
LoD lod() const { return lod_; } LoD lod() const { return lod_; }
/* /*
* Get a element from LoD. * Get the start offset and end offset of an element from LoD.
*/ */
size_t lod_element(size_t level, size_t elem) const { std::pair<size_t, size_t> lod_element(size_t level, size_t elem) const {
PADDLE_ENFORCE_LT(level, NumLevels()); PADDLE_ENFORCE_LT(level, NumLevels());
PADDLE_ENFORCE_LT(elem, NumElements(level)); PADDLE_ENFORCE_LT(elem, NumElements(level));
return (lod_)[level][elem]; return std::make_pair((lod_)[level][elem], (lod_)[level][elem + 1]);
} }
/* /*
......
...@@ -30,8 +30,8 @@ class LoDTensorTester : public ::testing::Test { ...@@ -30,8 +30,8 @@ class LoDTensorTester : public ::testing::Test {
// 0 5 10 15 20 // 0 5 10 15 20
// 0 2 5 7 10 12 15 20 // 0 2 5 7 10 12 15 20
LoD lod; LoD lod;
lod.push_back(std::vector<size_t>{0, 10, 20}); lod.push_back(std::vector<size_t>{0, 2, 3});
lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20}); lod.push_back(std::vector<size_t>{0, 2, 5, 8});
lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20}); lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20});
ASSERT_EQ(lod.size(), 3UL); ASSERT_EQ(lod.size(), 3UL);
...@@ -52,14 +52,14 @@ TEST_F(LoDTensorTester, NumLevels) { ASSERT_EQ(lod_tensor_.NumLevels(), 3UL); } ...@@ -52,14 +52,14 @@ TEST_F(LoDTensorTester, NumLevels) { ASSERT_EQ(lod_tensor_.NumLevels(), 3UL); }
TEST_F(LoDTensorTester, NumElements) { TEST_F(LoDTensorTester, NumElements) {
ASSERT_EQ(lod_tensor_.NumElements(0), 2UL); ASSERT_EQ(lod_tensor_.NumElements(0), 2UL);
ASSERT_EQ(lod_tensor_.NumElements(1), 4UL); ASSERT_EQ(lod_tensor_.NumElements(1), 3UL);
ASSERT_EQ(lod_tensor_.NumElements(2), 8UL); ASSERT_EQ(lod_tensor_.NumElements(2), 8UL);
} }
TEST_F(LoDTensorTester, NumElements2) { TEST_F(LoDTensorTester, NumElements2) {
ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL); ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL);
ASSERT_EQ(lod_tensor_.NumElements(0, 1), 2UL); ASSERT_EQ(lod_tensor_.NumElements(0, 1), 1UL);
ASSERT_EQ(lod_tensor_.NumElements(1, 1), 2UL); ASSERT_EQ(lod_tensor_.NumElements(1, 1), 3UL);
} }
TEST_F(LoDTensorTester, ShrinkLevels) { TEST_F(LoDTensorTester, ShrinkLevels) {
...@@ -68,17 +68,16 @@ TEST_F(LoDTensorTester, ShrinkLevels) { ...@@ -68,17 +68,16 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
LoDTensor new_lod_tensor = lod_tensor_; LoDTensor new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkLevels(level, level + 1); new_lod_tensor.ShrinkLevels(level, level + 1);
ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
} }
// shrink 2 level // shrink 2 level
for (size_t level = 0; level < 2UL; ++level) { for (size_t level = 0; level < 2UL; ++level) {
LoDTensor new_lod_tensor = lod_tensor_; LoDTensor new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkLevels(level, level + 2); new_lod_tensor.ShrinkLevels(level, level + 2);
// the lowest level's last element should be the tensor's batch_size.
ASSERT_EQ(new_lod_tensor.lod().back().back(),
lod_tensor_.lod().back().back());
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
ASSERT_EQ(new_lod_tensor.NumElements(1),
lod_tensor_.NumElements(level + 1));
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
} }
} }
...@@ -86,19 +85,19 @@ TEST_F(LoDTensorTester, ShrinkLevels) { ...@@ -86,19 +85,19 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
TEST_F(LoDTensorTester, ShrinkInLevel) { TEST_F(LoDTensorTester, ShrinkInLevel) {
size_t level = 0; size_t level = 0;
LoDTensor new_lod_tensor = lod_tensor_; LoDTensor new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkInLevel(level, 0, 2); new_lod_tensor.ShrinkInLevel(level, 0, 1);
EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL); EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL);
EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL); EXPECT_EQ(new_lod_tensor.NumElements(0), 1UL);
EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL); EXPECT_EQ(new_lod_tensor.NumElements(1), 2UL);
EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL); EXPECT_EQ(new_lod_tensor.NumElements(2), 5UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
level = 1; level = 1;
new_lod_tensor = lod_tensor_; new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkInLevel(level, 0, 2); new_lod_tensor.ShrinkInLevel(level, 1, 2);
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL); ASSERT_EQ(new_lod_tensor.NumElements(0), 1UL);
ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL); ASSERT_EQ(new_lod_tensor.NumElements(1), 3UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
} }
......
...@@ -36,8 +36,8 @@ TEST(LoDTensor, LoDInGPU) { ...@@ -36,8 +36,8 @@ TEST(LoDTensor, LoDInGPU) {
lod_tensor.mutable_data<float>(place); lod_tensor.mutable_data<float>(place);
lod_tensor.set_lod(src_lod); lod_tensor.set_lod(src_lod);
CHECK_EQ(lod_tensor.lod_element(0, 2), 4UL); CHECK_EQ(lod_tensor.lod_element(0, 2).first, 4UL);
CHECK_EQ(lod_tensor.lod_element(0, 4), 8UL); CHECK_EQ(lod_tensor.lod_element(0, 4).first, 8UL);
auto lod = lod_tensor.lod(); auto lod = lod_tensor.lod();
......
...@@ -44,6 +44,11 @@ class OpProtoAndCheckerMaker { ...@@ -44,6 +44,11 @@ class OpProtoAndCheckerMaker {
var_->set_intermediate(true); var_->set_intermediate(true);
return *this; return *this;
} }
VariableBuilder& AsDispensable() {
var_->set_dispensable(true);
return *this;
}
}; };
VariableBuilder AddInput(const std::string& name, const std::string& comment); VariableBuilder AddInput(const std::string& name, const std::string& comment);
......
...@@ -43,12 +43,13 @@ static VariableNameMap ConvertOpDescVarsToVarNameMap( ...@@ -43,12 +43,13 @@ static VariableNameMap ConvertOpDescVarsToVarNameMap(
return ret_val; return ret_val;
} }
std::unique_ptr<OperatorBase> OpRegistry::CreateOp(const OpDesc& op_desc) { std::unique_ptr<OperatorBase> OpRegistry::CreateOp(const OpDesc& op_desc,
ProgramDesc* program) {
VariableNameMap inputs = ConvertOpDescVarsToVarNameMap(op_desc.inputs()); VariableNameMap inputs = ConvertOpDescVarsToVarNameMap(op_desc.inputs());
VariableNameMap outputs = ConvertOpDescVarsToVarNameMap(op_desc.outputs()); VariableNameMap outputs = ConvertOpDescVarsToVarNameMap(op_desc.outputs());
AttributeMap attrs; AttributeMap attrs;
for (auto& attr : op_desc.attrs()) { for (auto& attr : op_desc.attrs()) {
attrs[attr.name()] = GetAttrValue(attr); attrs[attr.name()] = GetAttrValue(attr, program);
} }
return CreateOp(op_desc.type(), inputs, outputs, attrs); return CreateOp(op_desc.type(), inputs, outputs, attrs);
......
...@@ -20,6 +20,8 @@ limitations under the License. */ ...@@ -20,6 +20,8 @@ limitations under the License. */
#include <typeinfo> #include <typeinfo>
#include <unordered_map> #include <unordered_map>
#include <unordered_set> #include <unordered_set>
#include "glog/logging.h" // For VLOG()
#include "paddle/framework/attribute.h" #include "paddle/framework/attribute.h"
#include "paddle/framework/details/op_registry.h" #include "paddle/framework/details/op_registry.h"
#include "paddle/framework/framework.pb.h" #include "paddle/framework/framework.pb.h"
...@@ -45,18 +47,15 @@ class Registrar { ...@@ -45,18 +47,15 @@ class Registrar {
template <typename... ARGS> template <typename... ARGS>
struct OperatorRegistrar : public Registrar { struct OperatorRegistrar : public Registrar {
explicit OperatorRegistrar(const char* op_type) : op_type(op_type) { explicit OperatorRegistrar(const char* op_type) {
PADDLE_ENFORCE(!OpInfoMap::Instance().Has(op_type), PADDLE_ENFORCE(!OpInfoMap::Instance().Has(op_type),
"'%s' is registered more than once.", op_type); "'%s' is registered more than once.", op_type);
static_assert(sizeof...(ARGS) != 0, static_assert(sizeof...(ARGS) != 0,
"OperatorRegistrar should be invoked at least by OpClass"); "OperatorRegistrar should be invoked at least by OpClass");
OpInfo info;
details::OperatorRegistrarRecursive<0, false, ARGS...>(op_type, &info); details::OperatorRegistrarRecursive<0, false, ARGS...>(op_type, &info);
OpInfoMap::Instance().Insert(op_type, info); OpInfoMap::Instance().Insert(op_type, info);
} }
const char* op_type;
OpInfo info;
}; };
class OpRegistry { class OpRegistry {
...@@ -77,7 +76,8 @@ class OpRegistry { ...@@ -77,7 +76,8 @@ class OpRegistry {
const VariableNameMap& outputs, const VariableNameMap& outputs,
AttributeMap attrs); AttributeMap attrs);
static std::unique_ptr<OperatorBase> CreateOp(const OpDesc& op_desc); static std::unique_ptr<OperatorBase> CreateOp(const OpDesc& op_desc,
ProgramDesc* program);
static std::unique_ptr<OperatorBase> CreateOp(const OpDescBind& op_desc); static std::unique_ptr<OperatorBase> CreateOp(const OpDescBind& op_desc);
}; };
......
...@@ -74,7 +74,7 @@ TEST(OpRegistry, CreateOp) { ...@@ -74,7 +74,7 @@ TEST(OpRegistry, CreateOp) {
attr->set_type(paddle::framework::AttrType::FLOAT); attr->set_type(paddle::framework::AttrType::FLOAT);
attr->set_f(scale); attr->set_f(scale);
auto op = paddle::framework::OpRegistry::CreateOp(op_desc); auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
paddle::framework::Scope scope; paddle::framework::Scope scope;
paddle::platform::CPUDeviceContext dev_ctx; paddle::platform::CPUDeviceContext dev_ctx;
op->Run(scope, dev_ctx); op->Run(scope, dev_ctx);
...@@ -95,7 +95,7 @@ TEST(OpRegistry, IllegalAttr) { ...@@ -95,7 +95,7 @@ TEST(OpRegistry, IllegalAttr) {
bool caught = false; bool caught = false;
try { try {
paddle::framework::OpRegistry::CreateOp(op_desc); paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
} catch (paddle::platform::EnforceNotMet err) { } catch (paddle::platform::EnforceNotMet err) {
caught = true; caught = true;
std::string msg = "larger_than check fail"; std::string msg = "larger_than check fail";
...@@ -115,7 +115,7 @@ TEST(OpRegistry, DefaultValue) { ...@@ -115,7 +115,7 @@ TEST(OpRegistry, DefaultValue) {
ASSERT_TRUE(op_desc.IsInitialized()); ASSERT_TRUE(op_desc.IsInitialized());
auto op = paddle::framework::OpRegistry::CreateOp(op_desc); auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
paddle::framework::Scope scope; paddle::framework::Scope scope;
paddle::platform::CPUDeviceContext dev_ctx; paddle::platform::CPUDeviceContext dev_ctx;
op->Run(scope, dev_ctx); op->Run(scope, dev_ctx);
...@@ -131,7 +131,7 @@ TEST(OpRegistry, CustomChecker) { ...@@ -131,7 +131,7 @@ TEST(OpRegistry, CustomChecker) {
// attr 'test_attr' is not set // attr 'test_attr' is not set
bool caught = false; bool caught = false;
try { try {
paddle::framework::OpRegistry::CreateOp(op_desc); paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
} catch (paddle::platform::EnforceNotMet err) { } catch (paddle::platform::EnforceNotMet err) {
caught = true; caught = true;
std::string msg = "Attribute 'test_attr' is required!"; std::string msg = "Attribute 'test_attr' is required!";
...@@ -149,7 +149,7 @@ TEST(OpRegistry, CustomChecker) { ...@@ -149,7 +149,7 @@ TEST(OpRegistry, CustomChecker) {
attr->set_i(3); attr->set_i(3);
caught = false; caught = false;
try { try {
paddle::framework::OpRegistry::CreateOp(op_desc); paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
} catch (paddle::platform::EnforceNotMet err) { } catch (paddle::platform::EnforceNotMet err) {
caught = true; caught = true;
std::string msg = "'test_attr' must be even!"; std::string msg = "'test_attr' must be even!";
...@@ -166,7 +166,7 @@ TEST(OpRegistry, CustomChecker) { ...@@ -166,7 +166,7 @@ TEST(OpRegistry, CustomChecker) {
attr->set_name("test_attr"); attr->set_name("test_attr");
attr->set_type(paddle::framework::AttrType::INT); attr->set_type(paddle::framework::AttrType::INT);
attr->set_i(4); attr->set_i(4);
auto op = paddle::framework::OpRegistry::CreateOp(op_desc); auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
paddle::platform::CPUDeviceContext dev_ctx; paddle::platform::CPUDeviceContext dev_ctx;
paddle::framework::Scope scope; paddle::framework::Scope scope;
op->Run(scope, dev_ctx); op->Run(scope, dev_ctx);
......
...@@ -252,5 +252,20 @@ std::ostream& operator<<(std::ostream& os, ...@@ -252,5 +252,20 @@ std::ostream& operator<<(std::ostream& os,
return os; return os;
} }
bool OpSupportGPU(const std::string& op_type) {
auto& all_kernels = OperatorWithKernel::AllOpKernels();
auto it = all_kernels.find(op_type);
if (it == all_kernels.end()) {
// All control operator must support GPU
return true;
}
for (auto& kern_pair : it->second) {
if (platform::is_gpu_place(kern_pair.first.place_)) {
return true;
}
}
return false;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -20,12 +20,13 @@ limitations under the License. */ ...@@ -20,12 +20,13 @@ limitations under the License. */
#include <unordered_map> #include <unordered_map>
#include <vector> #include <vector>
#include "op_info.h" #include "glog/logging.h" // For VLOG
#include "paddle/framework/attribute.h" #include "paddle/framework/attribute.h"
#include "paddle/framework/block_desc.h" #include "paddle/framework/block_desc.h"
#include "paddle/framework/data_type.h" #include "paddle/framework/data_type.h"
#include "paddle/framework/framework.pb.h" #include "paddle/framework/framework.pb.h"
#include "paddle/framework/lod_tensor.h" #include "paddle/framework/lod_tensor.h"
#include "paddle/framework/op_info.h"
#include "paddle/framework/scope.h" #include "paddle/framework/scope.h"
#include "paddle/framework/shape_inference.h" #include "paddle/framework/shape_inference.h"
#include "paddle/framework/tensor.h" #include "paddle/framework/tensor.h"
...@@ -326,37 +327,47 @@ class CompileTimeInferShapeContext : public InferShapeContext { ...@@ -326,37 +327,47 @@ class CompileTimeInferShapeContext : public InferShapeContext {
bool HasInput(const std::string& name) const override { bool HasInput(const std::string& name) const override {
const std::vector<std::string>& input_names = op_.Input(name); const std::vector<std::string>& input_names = op_.Input(name);
auto length = input_names.size(); auto length = input_names.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, PADDLE_ENFORCE_EQ(length, 1UL,
"Input(%s) should have only one value, " "Input(%s) should have only one value, "
"but it have %d now", "but it have %d now",
name, length); name, length);
return block_.HasVar(input_names[0]); return block_.HasVarRecursive(input_names[0]);
} }
bool HasOutput(const std::string& name) const override { bool HasOutput(const std::string& name) const override {
const std::vector<std::string>& output_names = op_.Output(name); const std::vector<std::string>& output_names = op_.Output(name);
auto length = output_names.size(); auto length = output_names.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, PADDLE_ENFORCE_EQ(length, 1UL,
"Output(%s) should have only one value, " "Output(%s) should have only one value, "
"but it have %d now", "but it have %d now",
name, length); name, length);
return block_.HasVar(output_names[0]); return block_.HasVarRecursive(output_names[0]);
} }
bool HasInputs(const std::string& name) const override { bool HasInputs(const std::string& name) const override {
const std::vector<std::string>& input_names = op_.Input(name); const std::vector<std::string>& input_names = op_.Input(name);
PADDLE_ENFORCE(!input_names.empty(), "Inputs(%s) length is 0", name); if (input_names.empty()) {
return false;
}
for (auto& input : input_names) { for (auto& input : input_names) {
if (!block_.HasVar(input)) return false; if (!block_.HasVarRecursive(input)) return false;
} }
return true; return true;
} }
bool HasOutputs(const std::string& name) const override { bool HasOutputs(const std::string& name) const override {
const std::vector<std::string>& output_names = op_.Output(name); const std::vector<std::string>& output_names = op_.Output(name);
PADDLE_ENFORCE(!output_names.empty(), "Inputs(%s) length is 0", name); if (output_names.empty()) {
return false;
}
for (auto& output : output_names) { for (auto& output : output_names) {
if (!block_.HasVar(output)) return false; if (!block_.HasVarRecursive(output)) return false;
} }
return true; return true;
} }
...@@ -403,11 +414,11 @@ class CompileTimeInferShapeContext : public InferShapeContext { ...@@ -403,11 +414,11 @@ class CompileTimeInferShapeContext : public InferShapeContext {
private: private:
DDim GetDim(const std::string& name) const override { DDim GetDim(const std::string& name) const override {
return framework::make_ddim(block_.FindVar(name)->Shape()); return framework::make_ddim(block_.FindVarRecursive(name)->Shape());
} }
void SetDim(const std::string& name, const DDim& dim) override { void SetDim(const std::string& name, const DDim& dim) override {
block_.FindVar(name)->SetShape(framework::vectorize(dim)); block_.FindVarRecursive(name)->SetShape(framework::vectorize(dim));
} }
const OpDescBind& op_; const OpDescBind& op_;
...@@ -420,13 +431,27 @@ class RuntimeInferShapeContext : public InferShapeContext { ...@@ -420,13 +431,27 @@ class RuntimeInferShapeContext : public InferShapeContext {
: op_(op), scope_(scope) {} : op_(op), scope_(scope) {}
bool HasInput(const std::string& name) const override { bool HasInput(const std::string& name) const override {
auto ipt = op_.Input(name); auto& ins = Inputs(name);
size_t length = ins.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, "Input %s should have more than one inputs",
name);
auto ipt = ins[0];
auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt); auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
return var != nullptr; return var != nullptr;
} }
bool HasOutput(const std::string& name) const override { bool HasOutput(const std::string& name) const override {
auto ipt = op_.Output(name); auto& outs = Outputs(name);
size_t length = outs.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, "Output %s should have more than one inputs",
name);
auto ipt = outs[0];
auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt); auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
return var != nullptr; return var != nullptr;
} }
...@@ -573,6 +598,7 @@ class OperatorWithKernel : public OperatorBase { ...@@ -573,6 +598,7 @@ class OperatorWithKernel : public OperatorBase {
void Run(const Scope& scope, void Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const final { const platform::DeviceContext& dev_ctx) const final {
VLOG(3) << "Running operator " << this->Type();
RuntimeInferShapeContext infer_shape_ctx(*this, scope); RuntimeInferShapeContext infer_shape_ctx(*this, scope);
this->InferShape(&infer_shape_ctx); this->InferShape(&infer_shape_ctx);
...@@ -647,5 +673,7 @@ class OperatorWithKernel : public OperatorBase { ...@@ -647,5 +673,7 @@ class OperatorWithKernel : public OperatorBase {
std::ostream& operator<<(std::ostream& os, std::ostream& operator<<(std::ostream& os,
const OperatorWithKernel::OpKernelKey& kernel_key); const OperatorWithKernel::OpKernelKey& kernel_key);
extern bool OpSupportGPU(const std::string& op_type);
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -83,7 +83,7 @@ TEST(OperatorBase, all) { ...@@ -83,7 +83,7 @@ TEST(OperatorBase, all) {
paddle::platform::CPUDeviceContext device_context; paddle::platform::CPUDeviceContext device_context;
paddle::framework::Scope scope; paddle::framework::Scope scope;
auto op = paddle::framework::OpRegistry::CreateOp(op_desc); auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
scope.Var("OUT1"); scope.Var("OUT1");
ASSERT_EQ(paddle::framework::op_run_num, 0); ASSERT_EQ(paddle::framework::op_run_num, 0);
op->Run(scope, device_context); op->Run(scope, device_context);
...@@ -208,7 +208,7 @@ TEST(OpKernel, all) { ...@@ -208,7 +208,7 @@ TEST(OpKernel, all) {
paddle::platform::CPUDeviceContext cpu_device_context; paddle::platform::CPUDeviceContext cpu_device_context;
paddle::framework::Scope scope; paddle::framework::Scope scope;
auto op = paddle::framework::OpRegistry::CreateOp(op_desc); auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 0); ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 0);
op->Run(scope, cpu_device_context); op->Run(scope, cpu_device_context);
ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 1); ASSERT_EQ(paddle::framework::cpu_kernel_run_num, 1);
...@@ -244,7 +244,7 @@ TEST(OpKernel, multi_inputs) { ...@@ -244,7 +244,7 @@ TEST(OpKernel, multi_inputs) {
scope.Var("y0")->GetMutable<Tensor>(); scope.Var("y0")->GetMutable<Tensor>();
scope.Var("y1")->GetMutable<Tensor>(); scope.Var("y1")->GetMutable<Tensor>();
auto op = paddle::framework::OpRegistry::CreateOp(op_desc); auto op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
op->Run(scope, cpu_device_context); op->Run(scope, cpu_device_context);
} }
......
...@@ -18,27 +18,10 @@ limitations under the License. */ ...@@ -18,27 +18,10 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
using ProgDescMap =
std::unordered_map<ProgramDesc *, std::unique_ptr<ProgramDescBind>>;
static ProgDescMap *g_bind_map = nullptr;
ProgramDescBind &ProgramDescBind::Instance(ProgramDesc *prog) {
if (g_bind_map == nullptr) {
g_bind_map = new ProgDescMap();
}
auto &map = *g_bind_map;
auto &ptr = map[prog];
if (ptr == nullptr) {
ptr.reset(new ProgramDescBind(prog));
}
return *ptr;
}
BlockDescBind *ProgramDescBind::AppendBlock(const BlockDescBind &parent) { BlockDescBind *ProgramDescBind::AppendBlock(const BlockDescBind &parent) {
auto *b = prog_->add_blocks(); auto *b = prog_.add_blocks();
b->set_parent_idx(parent.ID()); b->set_parent_idx(parent.ID());
b->set_idx(prog_->blocks_size() - 1); b->set_idx(prog_.blocks_size() - 1);
blocks_.emplace_back(new BlockDescBind(this, b)); blocks_.emplace_back(new BlockDescBind(this, b));
return blocks_.back().get(); return blocks_.back().get();
} }
...@@ -47,13 +30,22 @@ ProgramDesc *ProgramDescBind::Proto() { ...@@ -47,13 +30,22 @@ ProgramDesc *ProgramDescBind::Proto() {
for (auto &block : blocks_) { for (auto &block : blocks_) {
block->Flush(); block->Flush();
} }
return prog_; return &prog_;
}
ProgramDescBind::ProgramDescBind() {
auto *block = prog_.mutable_blocks()->Add();
block->set_idx(kRootBlockIndex);
block->set_parent_idx(kNoneBlockIndex);
blocks_.emplace_back(new BlockDescBind(this, block));
} }
ProgramDescBind::ProgramDescBind(ProgramDesc *prog) { ProgramDescBind::ProgramDescBind(const ProgramDescBind &o) {
prog_ = prog; prog_ = o.prog_;
for (auto &block : *prog->mutable_blocks()) {
blocks_.emplace_back(new BlockDescBind(this, &block)); for (int i = 0; i < prog_.blocks_size(); ++i) {
auto *block = prog_.mutable_blocks(i);
blocks_.emplace_back(new BlockDescBind(*o.blocks_[i], block, this));
} }
} }
} // namespace framework } // namespace framework
......
...@@ -17,6 +17,7 @@ limitations under the License. */ ...@@ -17,6 +17,7 @@ limitations under the License. */
#include <memory> #include <memory>
#include <vector> #include <vector>
#include "paddle/framework/framework.pb.h" #include "paddle/framework/framework.pb.h"
#include "paddle/framework/proto_desc.h"
#include "paddle/platform/macros.h" #include "paddle/platform/macros.h"
namespace paddle { namespace paddle {
...@@ -26,7 +27,9 @@ class BlockDescBind; ...@@ -26,7 +27,9 @@ class BlockDescBind;
class ProgramDescBind { class ProgramDescBind {
public: public:
static ProgramDescBind &Instance(ProgramDesc *prog); ProgramDescBind();
ProgramDescBind(const ProgramDescBind &o);
BlockDescBind *AppendBlock(const BlockDescBind &parent); BlockDescBind *AppendBlock(const BlockDescBind &parent);
...@@ -37,14 +40,9 @@ class ProgramDescBind { ...@@ -37,14 +40,9 @@ class ProgramDescBind {
ProgramDesc *Proto(); ProgramDesc *Proto();
private: private:
explicit ProgramDescBind(ProgramDesc *prog); ProgramDesc prog_;
// Not owned
ProgramDesc *prog_;
std::vector<std::unique_ptr<BlockDescBind>> blocks_; std::vector<std::unique_ptr<BlockDescBind>> blocks_;
DISABLE_COPY_AND_ASSIGN(ProgramDescBind);
}; };
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/program_desc.h"
#include "gtest/gtest.h"
#include "paddle/framework/block_desc.h"
namespace paddle {
namespace framework {
TEST(ProgramDesc, copy_ctor) {
ProgramDescBind program;
auto* global_block = program.Block(0);
auto* x = global_block->Var("X");
x->SetType(VarDesc_VarType_LOD_TENSOR);
x->SetLoDLevel(0);
x->SetDataType(FP32);
x->SetShape({1000, 784});
auto* y = global_block->Var("Y");
y->SetType(VarDesc_VarType_LOD_TENSOR);
y->SetLoDLevel(0);
y->SetDataType(FP32);
y->SetShape({784, 100});
auto* op = global_block->AppendOp();
op->SetType("mul");
op->SetInput("X", {x->Name()});
op->SetInput("Y", {y->Name()});
auto* out = global_block->Var("Out");
out->SetType(VarDesc_VarType_LOD_TENSOR);
op->SetOutput("Y", {out->Name()});
ProgramDescBind program_copy(program);
auto* global_block_copy = program_copy.Block(0);
ASSERT_NE(global_block, global_block_copy);
auto assert_same_var = [&](const std::string& name, VarDescBind* var_before) {
ASSERT_TRUE(global_block_copy->HasVar(name));
auto* copy = global_block_copy->Var(name);
ASSERT_NE(copy, var_before);
ASSERT_EQ(copy->Name(), var_before->Name());
ASSERT_EQ(copy->GetType(), var_before->GetType());
ASSERT_EQ(copy->Shape(), var_before->Shape());
ASSERT_EQ(copy->Proto()->SerializeAsString(),
var_before->Proto()->SerializeAsString());
};
ASSERT_EQ(global_block->LocalVarNames(), global_block_copy->LocalVarNames());
ASSERT_EQ(3, global_block_copy->LocalVarNames().size());
assert_same_var("X", x);
assert_same_var("Y", y);
assert_same_var("Out", out);
for (size_t i = 0; i < global_block->OpSize(); ++i) {
auto op_origin = global_block->Op(i);
auto op_copy = global_block->Op(i);
ASSERT_EQ(op_origin->Type(), op_copy->Type());
ASSERT_EQ(op_origin->Inputs(), op_copy->Inputs());
ASSERT_EQ(op_origin->Outputs(), op_copy->Outputs());
ASSERT_EQ(op_copy->Proto()->SerializeAsString(),
op_origin->Proto()->SerializeAsString());
}
// Not check block's protostr are same it because the order of vars could be
// different and it is correct.
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
namespace paddle {
namespace framework {
// The Index of first Block in Program. also called root block.
constexpr int kRootBlockIndex = 0;
// The Parent Index of root Block, this block does not exist.
constexpr int kNoneBlockIndex = -1;
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/prune.h"
#include <algorithm>
#include <set>
#include <string>
#include <vector>
#include <glog/logging.h>
namespace paddle {
namespace framework {
const std::string kFeedOpType = "feed";
const std::string kFetchOpType = "fetch";
bool HasDependentVar(const OpDesc& op_desc,
const std::set<std::string>& dependent_vars) {
for (auto& var : op_desc.outputs()) {
for (auto& argu : var.arguments()) {
if (dependent_vars.count(argu) != 0) {
return true;
}
}
}
return false;
}
bool IsTarget(const OpDesc& op_desc) {
if (op_desc.has_is_target()) {
return op_desc.is_target();
}
return false;
}
void prune_impl(const ProgramDesc& input, ProgramDesc& output, int block_id) {
// TODO(tonyyang-svail):
// - will change to use multiple blocks for RNN op and Cond Op
auto& block = input.blocks(block_id);
auto& ops = block.ops();
bool expect_feed = true;
for (auto& op_desc : ops) {
PADDLE_ENFORCE(op_desc.type() != kFeedOpType || expect_feed,
"All FeedOps are at the beginning of the ProgramDesc");
expect_feed = (op_desc.type() == kFeedOpType);
}
bool expect_fetch = true;
for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
auto& op_desc = *op_iter;
PADDLE_ENFORCE(op_desc.type() != kFetchOpType || expect_fetch,
"All FetchOps must at the end of the ProgramDesc");
expect_fetch = (op_desc.type() == kFetchOpType);
}
std::set<std::string> dependent_vars;
std::vector<bool> should_run;
for (auto op_iter = ops.rbegin(); op_iter != ops.rend(); ++op_iter) {
auto& op_desc = *op_iter;
if (IsTarget(op_desc) || HasDependentVar(op_desc, dependent_vars)) {
// insert its input to the dependency graph
for (auto& var : op_desc.inputs()) {
for (auto& argu : var.arguments()) {
dependent_vars.insert(argu);
}
}
should_run.push_back(true);
} else {
should_run.push_back(false);
}
}
// since we are traversing the ProgramDesc in reverse order
// we reverse the should_run vector
std::reverse(should_run.begin(), should_run.end());
output = input;
auto* op_field = output.mutable_blocks(block_id)->mutable_ops();
op_field->Clear();
for (size_t i = 0; i < should_run.size(); ++i) {
if (should_run[i]) {
*op_field->Add() = input.blocks(block_id).ops(i);
}
}
}
void Prune(const ProgramDesc& input, ProgramDesc& output) {
prune_impl(input, output, 0);
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/framework.pb.h"
#include "paddle/platform/enforce.h"
namespace paddle {
namespace framework {
void Prune(const ProgramDesc& input, ProgramDesc& output);
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/prune.h"
#include "paddle/framework/attribute.h"
#include "paddle/framework/operator.h"
#include "paddle/operators/net_op.h"
#include "paddle/framework/block_desc.h"
#include "paddle/framework/op_desc.h"
#include "paddle/framework/program_desc.h"
#include <gtest/gtest.h>
namespace f = paddle::framework;
namespace ops = paddle::operators;
void AddOp(const std::string &type, const f::VariableNameMap &inputs,
const f::VariableNameMap &outputs, f::AttributeMap attrs,
paddle::framework::BlockDescBind *block) {
// insert output
for (auto kv : outputs) {
for (auto v : kv.second) {
auto var = block->Var(v);
var->SetDataType(paddle::framework::DataType::FP32);
}
}
// insert op
auto op = block->AppendOp();
op->SetType(type);
for (auto &kv : inputs) {
op->SetInput(kv.first, kv.second);
}
for (auto &kv : outputs) {
op->SetOutput(kv.first, kv.second);
}
op->SetAttrMap(attrs);
}
TEST(Prune, one_operator) {
f::ProgramDescBind program;
f::BlockDescBind *block = program.Block(0);
AddOp("one_one", {{"input", {"a"}}}, {{"output", {"b"}}}, {}, block);
f::ProgramDesc *pdesc = program.Proto();
f::ProgramDesc pruned;
Prune(*pdesc, pruned);
PADDLE_ENFORCE_EQ(pruned.blocks(0).ops_size(), 0);
pdesc->mutable_blocks(0)->mutable_ops(0)->set_is_target(true);
Prune(*pdesc, pruned);
PADDLE_ENFORCE_EQ(pruned.blocks(0).ops_size(), 1);
}
TEST(Prune, forward) {
f::ProgramDescBind program;
f::BlockDescBind *block = program.Block(0);
AddOp("one_one", {{"input", {"a"}}}, {{"output", {"b"}}}, {}, block);
AddOp("one_one", {{"input", {"b"}}}, {{"output", {"c"}}}, {}, block);
AddOp("one_one", {{"input", {"c"}}}, {{"output", {"d"}}}, {}, block);
AddOp("one_one", {{"input", {"d"}}}, {{"output", {"e"}}}, {}, block);
f::ProgramDesc *pdesc = program.Proto();
for (int i = 0; i < pdesc->blocks(0).ops_size(); ++i) {
f::ProgramDesc pruned;
pdesc->mutable_blocks(0)->mutable_ops(i)->set_is_target(true);
Prune(*pdesc, pruned);
PADDLE_ENFORCE_EQ(pruned.blocks(0).ops_size(), i + 1);
}
}
TEST(Prune, multi_input_op) {
f::ProgramDescBind program;
f::BlockDescBind *block = program.Block(0);
AddOp("one_one", {{"input", {"a0"}}}, {{"output", {"b0"}}}, {}, block);
AddOp("one_one", {{"input", {"a1"}}}, {{"output", {"b1"}}}, {}, block);
AddOp("one_one", {{"input", {"a2"}}}, {{"output", {"b2"}}}, {}, block);
AddOp("three_one", {{"input", {"b0", "b1", "b2"}}}, {{"output", {"c"}}}, {},
block);
f::ProgramDesc *pdesc = program.Proto();
pdesc->mutable_blocks(0)->mutable_ops(3)->set_is_target(true);
f::ProgramDesc pruned;
Prune(*pdesc, pruned);
PADDLE_ENFORCE_EQ(pruned.blocks(0).ops_size(), 4);
}
TEST(Prune, multi_output_op) {
f::ProgramDescBind program;
f::BlockDescBind *block = program.Block(0);
AddOp("one_two", {{"input", {"a"}}}, {{"output", {"b", "c"}}}, {}, block);
AddOp("one_one", {{"input", {"b"}}}, {{"output", {"b1"}}}, {}, block);
AddOp("one_one", {{"input", {"c"}}}, {{"output", {"c1"}}}, {}, block);
f::ProgramDesc *pdesc = program.Proto();
pdesc->mutable_blocks(0)->mutable_ops(2)->set_is_target(true);
f::ProgramDesc pruned;
Prune(*pdesc, pruned);
PADDLE_ENFORCE_EQ(pruned.blocks(0).ops_size(), 2);
}
TEST(Prune, multi_target) {
f::ProgramDescBind program;
f::BlockDescBind *block = program.Block(0);
AddOp("one_two", {{"input", {"a"}}}, {{"output", {"b", "c"}}}, {}, block);
AddOp("one_one", {{"input", {"b"}}}, {{"output", {"b1"}}}, {}, block);
AddOp("one_one", {{"input", {"c"}}}, {{"output", {"c1"}}}, {}, block);
f::ProgramDesc *pdesc = program.Proto();
pdesc->mutable_blocks(0)->mutable_ops(1)->set_is_target(true);
pdesc->mutable_blocks(0)->mutable_ops(2)->set_is_target(true);
f::ProgramDesc pruned;
Prune(*pdesc, pruned);
PADDLE_ENFORCE_EQ(pruned.blocks(0).ops_size(), 3);
}
...@@ -65,12 +65,11 @@ void Scope::DropKids() { ...@@ -65,12 +65,11 @@ void Scope::DropKids() {
kids_.clear(); kids_.clear();
} }
framework::Scope& GetGlobalScope() { void Scope::DeleteScope(Scope* scope) {
static framework::Scope* g_scope = nullptr; auto it = std::find(this->kids_.begin(), this->kids_.end(), scope);
if (g_scope == nullptr) { PADDLE_ENFORCE(it != this->kids_.end(), "Cannot find %p as kid scope", scope);
g_scope = new framework::Scope(); this->kids_.erase(it);
} delete scope;
return *g_scope;
} }
} // namespace framework } // namespace framework
......
...@@ -59,6 +59,8 @@ class Scope { ...@@ -59,6 +59,8 @@ class Scope {
/// Find the scope or an ancestor scope that contains the given variable. /// Find the scope or an ancestor scope that contains the given variable.
const Scope* FindScope(const Variable* var) const; const Scope* FindScope(const Variable* var) const;
void DeleteScope(Scope* scope);
/// Drop all kids scopes belonged to this scope. /// Drop all kids scopes belonged to this scope.
void DropKids(); void DropKids();
...@@ -72,8 +74,5 @@ class Scope { ...@@ -72,8 +74,5 @@ class Scope {
DISABLE_COPY_AND_ASSIGN(Scope); DISABLE_COPY_AND_ASSIGN(Scope);
}; };
framework::Scope& GetGlobalScope();
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -60,6 +60,10 @@ class Tensor { ...@@ -60,6 +60,10 @@ class Tensor {
template <typename T> template <typename T>
inline T* mutable_data(platform::Place place); inline T* mutable_data(platform::Place place);
inline void* mutable_data(platform::Place place, std::type_index type);
inline void* mutable_data(platform::Place place);
/** /**
* @brief Return a pointer to mutable memory block. * @brief Return a pointer to mutable memory block.
* *
...@@ -81,7 +85,6 @@ class Tensor { ...@@ -81,7 +85,6 @@ class Tensor {
inline Tensor& Resize(const DDim& dims); inline Tensor& Resize(const DDim& dims);
/*! The internal of two tensors share the same memory block. */ /*! The internal of two tensors share the same memory block. */
template <typename T>
inline Tensor& ShareDataWith(const Tensor& src); inline Tensor& ShareDataWith(const Tensor& src);
/** /**
...@@ -96,26 +99,9 @@ class Tensor { ...@@ -96,26 +99,9 @@ class Tensor {
// TODO(qijun): https://github.com/PaddlePaddle/Paddle/issues/4647 // TODO(qijun): https://github.com/PaddlePaddle/Paddle/issues/4647
// Remove `CopyFrom` and `CopyFromVector` from Tensor interface // Remove `CopyFrom` and `CopyFromVector` from Tensor interface
// and make them global functions // and make them global functions
template <typename T>
inline void CopyFrom(const Tensor& src, const platform::Place& dst_place, inline void CopyFrom(const Tensor& src, const platform::Place& dst_place,
const platform::DeviceContext& ctx); const platform::DeviceContext& ctx);
// FIXME(yuyang18): CopyFrom should without template T, use the replace
// `CopyFrom` with `CopyFromTensor`
inline void CopyFromTensor(const Tensor& src,
const platform::Place& dst_place,
const platform::DeviceContext& ctx) {
// NOLINTNEXTLINES_8 cpplint.py will recognize below lines as functions.
// That is a bug of cpplint.py. Just ignore lint these lines.
if (src.type() == std::type_index(typeid(double))) {
CopyFrom<double>(src, dst_place, ctx);
} else if (src.type() == std::type_index(typeid(float))) {
CopyFrom<float>(src, dst_place, ctx);
} else if (src.type() == std::type_index(typeid(int))) {
CopyFrom<int>(src, dst_place, ctx);
}
}
/** /**
* @brief Copy the content of an external vector to a tensor. * @brief Copy the content of an external vector to a tensor.
* *
...@@ -137,7 +123,6 @@ class Tensor { ...@@ -137,7 +123,6 @@ class Tensor {
* @param[in] end_idx The index of the end row(exclusive) to slice. * @param[in] end_idx The index of the end row(exclusive) to slice.
* The index number begins from 0. * The index number begins from 0.
*/ */
template <typename T>
inline Tensor Slice(const int& begin_idx, const int& end_idx) const; inline Tensor Slice(const int& begin_idx, const int& end_idx) const;
platform::Place place() const { platform::Place place() const {
...@@ -149,7 +134,6 @@ class Tensor { ...@@ -149,7 +134,6 @@ class Tensor {
std::type_index type() const { return holder_->type(); } std::type_index type() const { return holder_->type(); }
private: private:
template <typename T>
inline void check_memory_size() const; inline void check_memory_size() const;
private: private:
...@@ -158,20 +142,22 @@ class Tensor { ...@@ -158,20 +142,22 @@ class Tensor {
* parameter of Variable. * parameter of Variable.
*/ */
struct Placeholder { struct Placeholder {
virtual ~Placeholder() {} virtual ~Placeholder() = default;
virtual void* ptr() const = 0; virtual void* ptr() const = 0;
virtual size_t size() const = 0; virtual size_t size() const = 0;
virtual std::type_index type() const = 0; virtual std::type_index type() const = 0;
virtual platform::Place place() const = 0; virtual platform::Place place() const = 0;
virtual void set_type(std::type_index type) = 0;
}; };
template <typename T, typename Place> template <typename Place>
struct PlaceholderImpl : public Placeholder { struct PlaceholderImpl : public Placeholder {
PlaceholderImpl(Place place, size_t size) PlaceholderImpl(Place place, size_t size, std::type_index type)
: ptr_(static_cast<T*>(memory::Alloc(place, size)), : ptr_(static_cast<uint8_t*>(memory::Alloc(place, size)),
memory::PODDeleter<T, Place>(place)), memory::PODDeleter<uint8_t, Place>(place)),
place_(place), place_(place),
size_(size) { size_(size),
type_(type) {
PADDLE_ENFORCE_NOT_NULL(ptr_, "Insufficient %s memory to allocation.", PADDLE_ENFORCE_NOT_NULL(ptr_, "Insufficient %s memory to allocation.",
(is_cpu_place(place_) ? "CPU" : "GPU")); (is_cpu_place(place_) ? "CPU" : "GPU"));
} }
...@@ -179,16 +165,20 @@ class Tensor { ...@@ -179,16 +165,20 @@ class Tensor {
virtual size_t size() const { return size_; } virtual size_t size() const { return size_; }
virtual platform::Place place() const { return place_; } virtual platform::Place place() const { return place_; }
virtual void* ptr() const { return static_cast<void*>(ptr_.get()); } virtual void* ptr() const { return static_cast<void*>(ptr_.get()); }
virtual std::type_index type() const { return std::type_index(typeid(T)); } virtual std::type_index type() const { return type_; }
virtual void set_type(std::type_index type) { type_ = type; }
/*! the pointer of memory block. */ /*! the pointer of memory block. */
std::unique_ptr<T, memory::PODDeleter<T, Place>> ptr_; std::unique_ptr<uint8_t, memory::PODDeleter<uint8_t, Place>> ptr_;
/*! the place of memory block. */ /*! the place of memory block. */
platform::Place place_; platform::Place place_;
/*! the size of memory block. */ /*! the size of memory block. */
size_t size_; size_t size_;
/* the current type of memory */
std::type_index type_;
}; };
/*! holds the memory block if allocated. */ /*! holds the memory block if allocated. */
......
...@@ -106,7 +106,7 @@ void TensorArray::Write(size_t index, const LoDTensor& value) { ...@@ -106,7 +106,7 @@ void TensorArray::Write(size_t index, const LoDTensor& value) {
values_[index].Resize(value.dims()); values_[index].Resize(value.dims());
values_[index].mutable_data<value_type>(platform::CPUPlace()); values_[index].mutable_data<value_type>(platform::CPUPlace());
values_[index].CopyFrom<value_type>(value, platform::CPUPlace(), values_[index].CopyFrom(value, platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
...@@ -116,7 +116,7 @@ void TensorArray::WriteShared(size_t index, const LoDTensor& value) { ...@@ -116,7 +116,7 @@ void TensorArray::WriteShared(size_t index, const LoDTensor& value) {
values_.resize(index + 1); values_.resize(index + 1);
} }
values_[index].ShareDataWith<value_type>(value); values_[index].ShareDataWith(value);
} }
LoDTensor TensorArray::Pack(size_t level, const std::vector<DySeqMeta>& meta, LoDTensor TensorArray::Pack(size_t level, const std::vector<DySeqMeta>& meta,
...@@ -163,8 +163,8 @@ LoDTensor TensorArray::Stack() const { ...@@ -163,8 +163,8 @@ LoDTensor TensorArray::Stack() const {
result.mutable_data<value_type>(platform::CPUPlace()); result.mutable_data<value_type>(platform::CPUPlace());
for (size_t idx = 0; idx < size(); idx++) { for (size_t idx = 0; idx < size(); idx++) {
result.Slice<value_type>(idx, idx + 1) result.Slice(idx, idx + 1)
.CopyFrom<value_type>(Read(idx), platform::CPUPlace(), .CopyFrom(Read(idx), platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
return result; return result;
...@@ -191,12 +191,11 @@ void TensorArray::Unstack(const LoDTensor& source, bool data_shared) const { ...@@ -191,12 +191,11 @@ void TensorArray::Unstack(const LoDTensor& source, bool data_shared) const {
auto& value = values_[elem]; auto& value = values_[elem];
if (data_shared) { if (data_shared) {
// share memory // share memory
value.ShareDataWith<value_type>(source.Slice<value_type>(elem, elem + 1)); value.ShareDataWith(source.Slice(elem, elem + 1));
} else { } else {
// copy // copy
value.Resize(value_dims); value.Resize(value_dims);
value.CopyFrom<value_type>(source.Slice<value_type>(elem, elem + 1), value.CopyFrom(source.Slice(elem, elem + 1), platform::CPUPlace(),
platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
} }
...@@ -242,11 +241,10 @@ LoDTensor DynamicBatchUnpacker::GetBatch(size_t index) { ...@@ -242,11 +241,10 @@ LoDTensor DynamicBatchUnpacker::GetBatch(size_t index) {
for (size_t i = 0; i < indice.size(); i++) { for (size_t i = 0; i < indice.size(); i++) {
auto index = indice[i]; auto index = indice[i];
auto target = result.Slice<value_type>(i, i + 1); auto target = result.Slice(i, i + 1);
auto slice = source->Slice<value_type>(index, index + 1); auto slice = source->Slice(index, index + 1);
target.CopyFrom<value_type>(slice, platform::CPUPlace(), target.CopyFrom(slice, platform::CPUPlace(), platform::CPUDeviceContext());
platform::CPUDeviceContext());
} }
return result; return result;
...@@ -277,9 +275,9 @@ LoDTensor PackDynamicBatch(const std::vector<LoDTensor>& source, ...@@ -277,9 +275,9 @@ LoDTensor PackDynamicBatch(const std::vector<LoDTensor>& source,
// target is result[index] // target is result[index]
auto index = seq_meta.begin + batch_id; auto index = seq_meta.begin + batch_id;
if (index >= seq_meta.end) break; if (index >= seq_meta.end) break;
auto source_ = source[batch_id].Slice<float>(seq_id, seq_id + 1); auto source_ = source[batch_id].Slice(seq_id, seq_id + 1);
auto target = result.Slice<float>(index, index + 1); auto target = result.Slice(index, index + 1);
target.CopyFrom<float>(source_, platform::CPUPlace(), target.CopyFrom(source_, platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
} }
......
...@@ -91,7 +91,7 @@ class TensorArrayPackTester : public ::testing::Test { ...@@ -91,7 +91,7 @@ class TensorArrayPackTester : public ::testing::Test {
size_t begin = level[i]; size_t begin = level[i];
size_t end = level[i + 1]; size_t end = level[i + 1];
for (size_t j = begin; j < end; j++) { for (size_t j = begin; j < end; j++) {
auto record = source.Slice<int>(j, j + 1); auto record = source.Slice(j, j + 1);
for (int dim = 0; dim < 128; dim++) { for (int dim = 0; dim < 128; dim++) {
record.mutable_data<int>(platform::CPUPlace())[dim] = j - begin; record.mutable_data<int>(platform::CPUPlace())[dim] = j - begin;
} }
......
...@@ -19,12 +19,50 @@ limitations under the License. */ ...@@ -19,12 +19,50 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
template <typename... T>
struct SizeOfTypeFunctor;
template <typename T> template <typename T>
struct SizeOfTypeFunctor<T> {
size_t operator()(std::type_index type) const {
if (typeid(T).hash_code() == type.hash_code()) {
return sizeof(T);
} else {
return 0UL;
}
}
};
template <>
struct SizeOfTypeFunctor<> {
size_t operator()(std::type_index type) const { return 0UL; }
};
template <typename HEAD, typename... TAIL>
struct SizeOfTypeFunctor<HEAD, TAIL...> {
size_t operator()(std::type_index type) const {
SizeOfTypeFunctor<HEAD> head;
size_t head_size = head(type);
if (head_size != 0) {
return head_size;
}
SizeOfTypeFunctor<TAIL...> tail;
return tail(type);
}
};
static inline size_t SizeOfType(std::type_index type) {
SizeOfTypeFunctor<int, float, double, int16_t, int64_t> functor;
size_t size = functor(type);
PADDLE_ENFORCE(size != 0UL, "Cannot get size of type %s", type.name());
return size;
}
inline void Tensor::check_memory_size() const { inline void Tensor::check_memory_size() const {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE_NOT_NULL(
holder_, "Tensor holds no memory. Call Tensor::mutable_data first."); holder_, "Tensor holds no memory. Call Tensor::mutable_data first.");
PADDLE_ENFORCE_GE( PADDLE_ENFORCE_GE(
holder_->size(), numel() * sizeof(T) + offset_, holder_->size(), numel() * SizeOfType(type()) + offset_,
"Tensor's dims_ is out of bound. Call Tensor::mutable_data " "Tensor's dims_ is out of bound. Call Tensor::mutable_data "
"first to re-allocate memory.\n" "first to re-allocate memory.\n"
"or maybe the required data-type mismatches the data already stored."); "or maybe the required data-type mismatches the data already stored.");
...@@ -32,14 +70,23 @@ inline void Tensor::check_memory_size() const { ...@@ -32,14 +70,23 @@ inline void Tensor::check_memory_size() const {
template <typename T> template <typename T>
inline const T* Tensor::data() const { inline const T* Tensor::data() const {
check_memory_size<T>(); check_memory_size();
PADDLE_ENFORCE(std::is_same<T, void>::value ||
holder_->type().hash_code() == typeid(T).hash_code(),
"Tensor holds the wrong type, it holds %s",
this->holder_->type().name());
return reinterpret_cast<const T*>( return reinterpret_cast<const T*>(
reinterpret_cast<uintptr_t>(holder_->ptr()) + offset_); reinterpret_cast<uintptr_t>(holder_->ptr()) + offset_);
} }
template <typename T> template <typename T>
inline T* Tensor::data() { inline T* Tensor::data() {
check_memory_size<T>(); check_memory_size();
PADDLE_ENFORCE(std::is_same<T, void>::value ||
holder_->type().hash_code() == typeid(T).hash_code(),
"Tensor holds the wrong type, it holds %s",
this->holder_->type().name());
return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(holder_->ptr()) + return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(holder_->ptr()) +
offset_); offset_);
} }
...@@ -54,51 +101,62 @@ inline T* Tensor::mutable_data(DDim dims, platform::Place place) { ...@@ -54,51 +101,62 @@ inline T* Tensor::mutable_data(DDim dims, platform::Place place) {
template <typename T> template <typename T>
inline T* Tensor::mutable_data(platform::Place place) { inline T* Tensor::mutable_data(platform::Place place) {
static_assert(std::is_pod<T>::value, "T must be POD"); static_assert(std::is_pod<T>::value, "T must be POD");
return reinterpret_cast<T*>(mutable_data(place, typeid(T)));
}
inline void* Tensor::mutable_data(platform::Place place, std::type_index type) {
if (holder_ != nullptr) {
holder_->set_type(type);
}
PADDLE_ENFORCE_GT(numel(), 0, PADDLE_ENFORCE_GT(numel(), 0,
"Tensor's numel must be larger than zero to call " "Tensor's numel must be larger than zero to call "
"Tensor::mutable_data. Call Tensor::set_dim first."); "Tensor::mutable_data. Call Tensor::set_dim first.");
int64_t size = numel() * SizeOfType(type);
/* some versions of boost::variant don't have operator!= */ /* some versions of boost::variant don't have operator!= */
int64_t size = numel() * sizeof(T);
if (holder_ == nullptr || !(holder_->place() == place) || if (holder_ == nullptr || !(holder_->place() == place) ||
holder_->size() < size + offset_) { holder_->size() < size + offset_) {
if (platform::is_cpu_place(place)) { if (platform::is_cpu_place(place)) {
holder_.reset(new PlaceholderImpl<T, platform::CPUPlace>( holder_.reset(new PlaceholderImpl<platform::CPUPlace>(
boost::get<platform::CPUPlace>(place), size)); boost::get<platform::CPUPlace>(place), size, type));
} else if (platform::is_gpu_place(place)) { } else if (platform::is_gpu_place(place)) {
#ifndef PADDLE_WITH_CUDA #ifndef PADDLE_WITH_CUDA
PADDLE_THROW("'GPUPlace' is not supported in CPU only device."); PADDLE_THROW("'GPUPlace' is not supported in CPU only device.");
} }
#else #else
holder_.reset(new PlaceholderImpl<T, platform::GPUPlace>( holder_.reset(new PlaceholderImpl<platform::GPUPlace>(
boost::get<platform::GPUPlace>(place), size)); boost::get<platform::GPUPlace>(place), size, type));
} }
#endif #endif
offset_ = 0; offset_ = 0;
} }
return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(holder_->ptr()) + return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(holder_->ptr()) +
offset_); offset_);
} }
template <typename T> inline void* Tensor::mutable_data(platform::Place place) {
PADDLE_ENFORCE(this->holder_ != nullptr,
"Cannot invoke mutable data if current hold nothing");
return mutable_data(place, holder_->type());
}
inline Tensor& Tensor::ShareDataWith(const Tensor& src) { inline Tensor& Tensor::ShareDataWith(const Tensor& src) {
src.check_memory_size<T>(); src.check_memory_size();
*this = src; *this = src;
return *this; return *this;
} }
template <typename T>
inline void Tensor::CopyFrom(const Tensor& src, inline void Tensor::CopyFrom(const Tensor& src,
const platform::Place& dst_place, const platform::Place& dst_place,
const platform::DeviceContext& ctx) { const platform::DeviceContext& ctx) {
src.check_memory_size<T>(); src.check_memory_size();
Resize(src.dims()); Resize(src.dims());
auto src_place = src.holder_->place(); auto src_place = src.holder_->place();
auto src_ptr = static_cast<const void*>(src.data<T>()); auto src_ptr = src.data<void>();
auto dst_ptr = static_cast<void*>(mutable_data<T>(dst_place)); auto dst_ptr = mutable_data(dst_place, src.type());
auto size = src.numel() * sizeof(T); auto size = src.numel() * SizeOfType(src.type());
if (platform::is_cpu_place(src_place) && platform::is_cpu_place(dst_place)) { if (platform::is_cpu_place(src_place) && platform::is_cpu_place(dst_place)) {
memory::Copy(boost::get<platform::CPUPlace>(dst_place), dst_ptr, memory::Copy(boost::get<platform::CPUPlace>(dst_place), dst_ptr,
...@@ -165,9 +223,8 @@ inline void Tensor::CopyFromVector(const std::vector<T>& src, ...@@ -165,9 +223,8 @@ inline void Tensor::CopyFromVector(const std::vector<T>& src,
#endif #endif
} }
template <typename T>
inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const {
check_memory_size<T>(); check_memory_size();
PADDLE_ENFORCE_GE(begin_idx, 0, PADDLE_ENFORCE_GE(begin_idx, 0,
"The start row index must be greater than 0."); "The start row index must be greater than 0.");
PADDLE_ENFORCE_LE(end_idx, dims_[0], "The end row index is out of bound."); PADDLE_ENFORCE_LE(end_idx, dims_[0], "The end row index is out of bound.");
...@@ -183,7 +240,7 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { ...@@ -183,7 +240,7 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const {
DDim dst_dims = dims_; DDim dst_dims = dims_;
dst_dims[0] = end_idx - begin_idx; dst_dims[0] = end_idx - begin_idx;
dst.Resize(dst_dims); dst.Resize(dst_dims);
dst.offset_ = offset_ + begin_idx * base * sizeof(T); dst.offset_ = offset_ + begin_idx * base * SizeOfType(type());
return dst; return dst;
} }
} }
...@@ -197,10 +254,9 @@ inline const DDim& Tensor::dims() const { return dims_; } ...@@ -197,10 +254,9 @@ inline const DDim& Tensor::dims() const { return dims_; }
inline int64_t Tensor::numel() const { return product(dims_); } inline int64_t Tensor::numel() const { return product(dims_); }
template <typename T>
inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) { inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) {
Tensor res; Tensor res;
res.ShareDataWith<T>(src); res.ShareDataWith(src);
res.Resize(flatten_to_2d(src.dims(), num_col_dims)); res.Resize(flatten_to_2d(src.dims(), num_col_dims));
return res; return res;
} }
......
...@@ -108,7 +108,7 @@ TEST(Tensor, ShareDataWith) { ...@@ -108,7 +108,7 @@ TEST(Tensor, ShareDataWith) {
// Try to share data form uninitialized tensor // Try to share data form uninitialized tensor
bool caught = false; bool caught = false;
try { try {
dst_tensor.ShareDataWith<float>(src_tensor); dst_tensor.ShareDataWith(src_tensor);
} catch (paddle::platform::EnforceNotMet err) { } catch (paddle::platform::EnforceNotMet err) {
caught = true; caught = true;
std::string msg = std::string msg =
...@@ -122,7 +122,7 @@ TEST(Tensor, ShareDataWith) { ...@@ -122,7 +122,7 @@ TEST(Tensor, ShareDataWith) {
ASSERT_TRUE(caught); ASSERT_TRUE(caught);
src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), CPUPlace()); src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), CPUPlace());
dst_tensor.ShareDataWith<int>(src_tensor); dst_tensor.ShareDataWith(src_tensor);
ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>()); ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>());
} }
...@@ -131,7 +131,7 @@ TEST(Tensor, ShareDataWith) { ...@@ -131,7 +131,7 @@ TEST(Tensor, ShareDataWith) {
Tensor src_tensor; Tensor src_tensor;
Tensor dst_tensor; Tensor dst_tensor;
src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), GPUPlace()); src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), GPUPlace());
dst_tensor.ShareDataWith<int>(src_tensor); dst_tensor.ShareDataWith(src_tensor);
ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>()); ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>());
} }
#endif #endif
...@@ -143,7 +143,7 @@ TEST(Tensor, Slice) { ...@@ -143,7 +143,7 @@ TEST(Tensor, Slice) {
{ {
Tensor src_tensor; Tensor src_tensor;
src_tensor.mutable_data<int>(make_ddim({5, 3, 4}), CPUPlace()); src_tensor.mutable_data<int>(make_ddim({5, 3, 4}), CPUPlace());
Tensor slice_tensor = src_tensor.Slice<int>(1, 3); Tensor slice_tensor = src_tensor.Slice(1, 3);
DDim slice_dims = slice_tensor.dims(); DDim slice_dims = slice_tensor.dims();
ASSERT_EQ(arity(slice_dims), 3); ASSERT_EQ(arity(slice_dims), 3);
EXPECT_EQ(slice_dims[0], 2); EXPECT_EQ(slice_dims[0], 2);
...@@ -167,7 +167,7 @@ TEST(Tensor, Slice) { ...@@ -167,7 +167,7 @@ TEST(Tensor, Slice) {
{ {
Tensor src_tensor; Tensor src_tensor;
src_tensor.mutable_data<double>(make_ddim({6, 9}), GPUPlace()); src_tensor.mutable_data<double>(make_ddim({6, 9}), GPUPlace());
Tensor slice_tensor = src_tensor.Slice<double>(2, 6); Tensor slice_tensor = src_tensor.Slice(2, 6);
DDim slice_dims = slice_tensor.dims(); DDim slice_dims = slice_tensor.dims();
ASSERT_EQ(arity(slice_dims), 2); ASSERT_EQ(arity(slice_dims), 2);
EXPECT_EQ(slice_dims[0], 4); EXPECT_EQ(slice_dims[0], 4);
...@@ -202,7 +202,7 @@ TEST(Tensor, CopyFrom) { ...@@ -202,7 +202,7 @@ TEST(Tensor, CopyFrom) {
memcpy(src_ptr, arr, 9 * sizeof(int)); memcpy(src_ptr, arr, 9 * sizeof(int));
auto cpu_place = new paddle::platform::CPUPlace(); auto cpu_place = new paddle::platform::CPUPlace();
dst_tensor.CopyFrom<int>(src_tensor, *cpu_place, cpu_ctx); dst_tensor.CopyFrom(src_tensor, *cpu_place, cpu_ctx);
const int* dst_ptr = dst_tensor.data<int>(); const int* dst_ptr = dst_tensor.data<int>();
ASSERT_NE(src_ptr, dst_ptr); ASSERT_NE(src_ptr, dst_ptr);
...@@ -210,8 +210,8 @@ TEST(Tensor, CopyFrom) { ...@@ -210,8 +210,8 @@ TEST(Tensor, CopyFrom) {
EXPECT_EQ(src_ptr[i], dst_ptr[i]); EXPECT_EQ(src_ptr[i], dst_ptr[i]);
} }
Tensor slice_tensor = src_tensor.Slice<int>(1, 2); Tensor slice_tensor = src_tensor.Slice(1, 2);
dst_tensor.CopyFrom<int>(slice_tensor, *cpu_place, cpu_ctx); dst_tensor.CopyFrom(slice_tensor, *cpu_place, cpu_ctx);
const int* slice_ptr = slice_tensor.data<int>(); const int* slice_ptr = slice_tensor.data<int>();
dst_ptr = dst_tensor.data<int>(); dst_ptr = dst_tensor.data<int>();
ASSERT_NE(dst_ptr, slice_ptr); ASSERT_NE(dst_ptr, slice_ptr);
...@@ -233,11 +233,11 @@ TEST(Tensor, CopyFrom) { ...@@ -233,11 +233,11 @@ TEST(Tensor, CopyFrom) {
// CPU Tensor to GPU Tensor // CPU Tensor to GPU Tensor
auto gpu_place = new paddle::platform::GPUPlace(0); auto gpu_place = new paddle::platform::GPUPlace(0);
CUDADeviceContext gpu_ctx(*gpu_place); CUDADeviceContext gpu_ctx(*gpu_place);
gpu_tensor.CopyFrom<int>(src_tensor, *gpu_place, gpu_ctx); gpu_tensor.CopyFrom(src_tensor, *gpu_place, gpu_ctx);
// GPU Tensor to CPU Tensor // GPU Tensor to CPU Tensor
auto cpu_place = new paddle::platform::CPUPlace(); auto cpu_place = new paddle::platform::CPUPlace();
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Tensors // Sync before Compare Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -247,13 +247,13 @@ TEST(Tensor, CopyFrom) { ...@@ -247,13 +247,13 @@ TEST(Tensor, CopyFrom) {
EXPECT_EQ(src_ptr[i], dst_ptr[i]); EXPECT_EQ(src_ptr[i], dst_ptr[i]);
} }
Tensor slice_tensor = src_tensor.Slice<int>(1, 2); Tensor slice_tensor = src_tensor.Slice(1, 2);
// CPU Slice Tensor to GPU Tensor // CPU Slice Tensor to GPU Tensor
gpu_tensor.CopyFrom<int>(slice_tensor, *gpu_place, gpu_ctx); gpu_tensor.CopyFrom(slice_tensor, *gpu_place, gpu_ctx);
// GPU Tensor to CPU Tensor // GPU Tensor to CPU Tensor
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Slice Tensors // Sync before Compare Slice Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -320,7 +320,7 @@ TEST(Tensor, CopyFromVector) { ...@@ -320,7 +320,7 @@ TEST(Tensor, CopyFromVector) {
CUDADeviceContext gpu_ctx(*gpu_place); CUDADeviceContext gpu_ctx(*gpu_place);
gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx); gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx);
// Copy from GPU to CPU tensor for comparison // Copy from GPU to CPU tensor for comparison
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Tensors // Sync before Compare Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -340,7 +340,7 @@ TEST(Tensor, CopyFromVector) { ...@@ -340,7 +340,7 @@ TEST(Tensor, CopyFromVector) {
cpu_tensor.CopyFromVector<int>(src_vec, cpu_ctx); cpu_tensor.CopyFromVector<int>(src_vec, cpu_ctx);
gpu_tensor.Resize(make_ddim({2, 2})); gpu_tensor.Resize(make_ddim({2, 2}));
gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx); gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx);
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Tensors // Sync before Compare Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -368,7 +368,7 @@ TEST(Tensor, ReshapeToMatrix) { ...@@ -368,7 +368,7 @@ TEST(Tensor, ReshapeToMatrix) {
for (int i = 0; i < 2 * 3 * 4 * 9; ++i) { for (int i = 0; i < 2 * 3 * 4 * 9; ++i) {
src_ptr[i] = i; src_ptr[i] = i;
} }
Tensor res = ReshapeToMatrix<int>(src, 2); Tensor res = ReshapeToMatrix(src, 2);
ASSERT_EQ(res.dims()[0], 2 * 3); ASSERT_EQ(res.dims()[0], 2 * 3);
ASSERT_EQ(res.dims()[1], 4 * 9); ASSERT_EQ(res.dims()[1], 4 * 9);
} }
...@@ -79,6 +79,10 @@ class VarDescBind { ...@@ -79,6 +79,10 @@ class VarDescBind {
void SetType(VarDesc::VarType type) { desc_.set_type(type); } void SetType(VarDesc::VarType type) { desc_.set_type(type); }
bool Persistable() const { return desc_.persistable(); }
void SetPersistable(bool persistable) { desc_.set_persistable(persistable); }
private: private:
const TensorDesc &tensor_desc() const; const TensorDesc &tensor_desc() const;
TensorDesc *mutable_tensor_desc(); TensorDesc *mutable_tensor_desc();
......
...@@ -62,7 +62,7 @@ namespace paddle { ...@@ -62,7 +62,7 @@ namespace paddle {
namespace framework { namespace framework {
TEST(InferVarType, sum_op) { TEST(InferVarType, sum_op) {
auto &prog = ProgramDescBind::Instance(&GetProgramDesc()); ProgramDescBind prog;
auto *op = prog.Block(0)->AppendOp(); auto *op = prog.Block(0)->AppendOp();
op->SetType("sum"); op->SetType("sum");
op->SetInput("X", {"test_a", "test_b", "test_c"}); op->SetInput("X", {"test_a", "test_b", "test_c"});
...@@ -83,7 +83,7 @@ TEST(InferVarType, sum_op) { ...@@ -83,7 +83,7 @@ TEST(InferVarType, sum_op) {
} }
TEST(InferVarType, sum_op_without_infer_var_type) { TEST(InferVarType, sum_op_without_infer_var_type) {
auto &prog = ProgramDescBind::Instance(&GetProgramDesc()); ProgramDescBind prog;
auto *op = prog.Block(0)->AppendOp(); auto *op = prog.Block(0)->AppendOp();
op->SetType("sum_without_infer_var_type"); op->SetType("sum_without_infer_var_type");
op->SetInput("X", {"test2_a", "test2_b", "test2_c"}); op->SetInput("X", {"test2_a", "test2_b", "test2_c"});
......
...@@ -25,7 +25,10 @@ class Variable { ...@@ -25,7 +25,10 @@ class Variable {
public: public:
template <typename T> template <typename T>
const T& Get() const { const T& Get() const {
PADDLE_ENFORCE(IsType<T>(), "Variable must be type %s", typeid(T).name()); PADDLE_ENFORCE(holder_ != nullptr, "Variable must hold some thing");
PADDLE_ENFORCE(IsType<T>(),
"Variable must be type %s, the holding type is %s",
typeid(T).name(), holder_->Type().name());
return *static_cast<const T*>(holder_->Ptr()); return *static_cast<const T*>(holder_->Ptr());
} }
......
...@@ -21,6 +21,10 @@ limitations under the License. */ ...@@ -21,6 +21,10 @@ limitations under the License. */
#include "paddle/utils/Logging.h" #include "paddle/utils/Logging.h"
#include "paddle/utils/Stat.h" #include "paddle/utils/Stat.h"
#ifdef PADDLE_USE_MKLDNN
#include "paddle/gserver/layers/MKLDNNLayer.h"
#endif
#ifndef PADDLE_MOBILE_INFERENCE #ifndef PADDLE_MOBILE_INFERENCE
#include "MultiNetwork.h" #include "MultiNetwork.h"
#include "RecurrentGradientMachine.h" #include "RecurrentGradientMachine.h"
...@@ -300,6 +304,17 @@ void NeuralNetwork::backward(const UpdateCallback& callback) { ...@@ -300,6 +304,17 @@ void NeuralNetwork::backward(const UpdateCallback& callback) {
} }
} }
void NeuralNetwork::finish() {
#ifdef PADDLE_USE_MKLDNN
FOR_EACH_R(layer, layers_) {
MKLDNNLayerPtr dnnLayer = std::dynamic_pointer_cast<MKLDNNLayer>(*layer);
if (dnnLayer) {
dnnLayer->convertWeightsToPaddle();
}
}
#endif
}
Argument NeuralNetwork::getLayerOutput(const std::string& layerName) { Argument NeuralNetwork::getLayerOutput(const std::string& layerName) {
return getLayer(layerName)->getOutput(); return getLayer(layerName)->getOutput();
} }
......
...@@ -134,6 +134,9 @@ public: ...@@ -134,6 +134,9 @@ public:
const std::string& getName() const { return subModelName_; } const std::string& getName() const { return subModelName_; }
/// some finish work, like convert the weight format of MKLDNNLayers
void finish();
protected: protected:
/** /**
* The constructor of NeuralNetwork. * The constructor of NeuralNetwork.
......
...@@ -313,6 +313,7 @@ void MKLDNNConvLayer::resetOutValue( ...@@ -313,6 +313,7 @@ void MKLDNNConvLayer::resetOutValue(
cvtOutVal_ = MKLDNNMatrix::createReorder(out, cpuOutVal_); cvtOutVal_ = MKLDNNMatrix::createReorder(out, cpuOutVal_);
CHECK(cvtOutVal_) << "should not be empty"; CHECK(cvtOutVal_) << "should not be empty";
} else { } else {
cpuOut->setData(output_.value->getData());
cpuOutVal_ = out; cpuOutVal_ = out;
} }
// when output is cpu device, change the mkldnn output value and make them // when output is cpu device, change the mkldnn output value and make them
...@@ -456,17 +457,18 @@ void MKLDNNConvLayer::resetOutGrad( ...@@ -456,17 +457,18 @@ void MKLDNNConvLayer::resetOutGrad(
MKLDNNLayer::resetOutGrad(out, outVal_->getPrimitiveDesc()); MKLDNNLayer::resetOutGrad(out, outVal_->getPrimitiveDesc());
} else { } else {
const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).grad; const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).grad;
// always share the same grad data of CPU output
// then the activation can get the right grad from output_.grad
output_.grad->setData(cpuOut->getData());
// same PrimitiveDesc with cpuInVal_ // same PrimitiveDesc with cpuInVal_
CHECK(cpuOutVal_); CHECK(cpuOutVal_);
cpuOutGrad_ = MKLDNNMatrix::create(cpuOut, cpuOutVal_->getPrimitiveDesc()); cpuOutGrad_ = MKLDNNMatrix::create(cpuOut, cpuOutVal_->getPrimitiveDesc());
// create reorder if primitive desc does not match // create reorder if primitive desc does not match
if (cpuOutGrad_->getPrimitiveDesc() != outVal_->getPrimitiveDesc()) { if (cpuOutGrad_->getPrimitiveDesc() != outVal_->getPrimitiveDesc()) {
out = MKLDNNMatrix::create(output_.grad, outVal_->getPrimitiveDesc()); out = MKLDNNMatrix::create(nullptr, outVal_->getPrimitiveDesc());
cvtOutGrad_ = MKLDNNMatrix::createReorder(cpuOutGrad_, out); cvtOutGrad_ = MKLDNNMatrix::createReorder(cpuOutGrad_, out);
CHECK(cvtOutGrad_); CHECK(cvtOutGrad_);
} else { } else {
// share the same data of CPU output
output_.grad->setData(cpuOut->getData());
out = cpuOutGrad_; out = cpuOutGrad_;
} }
} }
......
...@@ -46,6 +46,9 @@ protected: ...@@ -46,6 +46,9 @@ protected:
// backward also need reset after reset forward handle // backward also need reset after reset forward handle
bool needResetBwd_; bool needResetBwd_;
// is output only mkldnn
bool outputOnlyMKLDNN_;
// mkldnn engine, stream and primivtives // mkldnn engine, stream and primivtives
mkldnn::engine engine_; mkldnn::engine engine_;
std::shared_ptr<MKLDNNStream> stream_; std::shared_ptr<MKLDNNStream> stream_;
...@@ -141,6 +144,9 @@ public: ...@@ -141,6 +144,9 @@ public:
updateInputData(); updateInputData();
} }
if (!outputOnlyMKLDNN_) {
clearGrads();
}
stream_->submit(pipelineFwd_); stream_->submit(pipelineFwd_);
} }
...@@ -389,7 +395,8 @@ protected: ...@@ -389,7 +395,8 @@ protected:
CHECK_EQ(outputOtherDevice_[i].deviceId, CPU_DEVICE) CHECK_EQ(outputOtherDevice_[i].deviceId, CPU_DEVICE)
<< "Only support other device is CPU yet"; << "Only support other device is CPU yet";
} }
return outputOtherDevice_.size() == 0; outputOnlyMKLDNN_ = outputOtherDevice_.size() == 0;
return outputOnlyMKLDNN_;
} }
/** /**
...@@ -398,6 +405,16 @@ protected: ...@@ -398,6 +405,16 @@ protected:
void setDevice(int id) { deviceId_ = id; } void setDevice(int id) { deviceId_ = id; }
private: private:
/**
* clear all grad
*/
void clearGrads() {
output_.grad->zeroMem();
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].grad->zeroMem();
}
}
/** /**
* Set deviceId of the params used in this layer. * Set deviceId of the params used in this layer.
*/ */
......
...@@ -146,6 +146,7 @@ void MKLDNNPoolLayer::resetOutValue(MKLDNNMatrixPtr& out) { ...@@ -146,6 +146,7 @@ void MKLDNNPoolLayer::resetOutValue(MKLDNNMatrixPtr& out) {
cvtOutVal_ = MKLDNNMatrix::createReorder(out, cpuOutVal_); cvtOutVal_ = MKLDNNMatrix::createReorder(out, cpuOutVal_);
CHECK(cvtOutVal_) << "should not be emptry"; CHECK(cvtOutVal_) << "should not be emptry";
} else { } else {
cpuOut->setData(output_.value->getData());
cpuOutVal_ = out; cpuOutVal_ = out;
} }
output_.value = std::dynamic_pointer_cast<Matrix>(cpuOutVal_); output_.value = std::dynamic_pointer_cast<Matrix>(cpuOutVal_);
...@@ -213,15 +214,16 @@ void MKLDNNPoolLayer::resetOutGrad(MKLDNNMatrixPtr& out) { ...@@ -213,15 +214,16 @@ void MKLDNNPoolLayer::resetOutGrad(MKLDNNMatrixPtr& out) {
MKLDNNLayer::resetOutGrad(out, outVal_->getPrimitiveDesc()); MKLDNNLayer::resetOutGrad(out, outVal_->getPrimitiveDesc());
} else { } else {
const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).grad; const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).grad;
// always share the same grad data of CPU output
// then the activation can get the right grad from output_.grad
output_.grad->setData(cpuOut->getData());
cpuOutGrad_ = MKLDNNMatrix::create( cpuOutGrad_ = MKLDNNMatrix::create(
cpuOut, memory::dims{bs_, oc_, oh_, ow_}, format::nchw, engine_); cpuOut, memory::dims{bs_, oc_, oh_, ow_}, format::nchw, engine_);
if (cpuOutGrad_->getPrimitiveDesc() != outVal_->getPrimitiveDesc()) { if (cpuOutGrad_->getPrimitiveDesc() != outVal_->getPrimitiveDesc()) {
out = MKLDNNMatrix::create(output_.grad, outVal_->getPrimitiveDesc()); out = MKLDNNMatrix::create(nullptr, outVal_->getPrimitiveDesc());
cvtOutGrad_ = MKLDNNMatrix::createReorder(cpuOutGrad_, out); cvtOutGrad_ = MKLDNNMatrix::createReorder(cpuOutGrad_, out);
CHECK(cvtOutGrad_) << "should not be emptry"; CHECK(cvtOutGrad_) << "should not be emptry";
} else { } else {
// share the same data of CPU output
output_.grad->setData(cpuOut->getData());
out = cpuOutGrad_; out = cpuOutGrad_;
} }
} }
......
...@@ -26,7 +26,10 @@ if(WITH_MKLDNN) ...@@ -26,7 +26,10 @@ if(WITH_MKLDNN)
test_MKLDNN.cpp test_MKLDNN.cpp
MKLDNNTester.cpp MKLDNNTester.cpp
LayerGradUtil.cpp) LayerGradUtil.cpp)
add_test(NAME test_MKLDNN COMMAND test_MKLDNN) add_test(NAME test_MKLDNN
COMMAND .set_python_path.sh -d ${PADDLE_SOURCE_DIR}/python
${CMAKE_CURRENT_BINARY_DIR}/test_MKLDNN
WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle)
endif() endif()
################ test_CRFLayerGrad #################### ################ test_CRFLayerGrad ####################
......
...@@ -15,6 +15,7 @@ limitations under the License. */ ...@@ -15,6 +15,7 @@ limitations under the License. */
#include "MKLDNNTester.h" #include "MKLDNNTester.h"
#include "paddle/gserver/layers/MKLDNNBase.h" #include "paddle/gserver/layers/MKLDNNBase.h"
#include "paddle/gserver/layers/MKLDNNLayer.h" #include "paddle/gserver/layers/MKLDNNLayer.h"
#include "paddle/trainer/Trainer.h"
namespace paddle { namespace paddle {
...@@ -315,6 +316,7 @@ void MKLDNNTester::runOnce() { ...@@ -315,6 +316,7 @@ void MKLDNNTester::runOnce() {
auto& value = para->getBuf(PARAMETER_VALUE); auto& value = para->getBuf(PARAMETER_VALUE);
real lr = 1e-3; real lr = 1e-3;
value->add(*grad, lr); value->add(*grad, lr);
grad->zeroMem();
}; };
randomTopDiffs(); randomTopDiffs();
dnnLayer_->backward(updateCallback); dnnLayer_->backward(updateCallback);
...@@ -411,4 +413,143 @@ void MKLDNNTester::run(const TestConfig& dnn, ...@@ -411,4 +413,143 @@ void MKLDNNTester::run(const TestConfig& dnn,
} }
} }
void MKLDNNTester::initArgument(DataIn& data,
const std::string& configPath,
const size_t iter) {
TrainerConfigHelper config(configPath);
size_t batchSize = config.getOptConfig().batch_size();
data.inArgs.resize(iter);
data.outGrads.resize(iter);
data.paraValues.clear();
for (const auto& layer_name : config.getModelConfig().input_layer_names()) {
auto layer_config = std::find_if(config.getModelConfig().layers().begin(),
config.getModelConfig().layers().end(),
[=](const LayerConfig& layer_config) {
return layer_config.name() == layer_name;
});
CHECK(layer_config != config.getModelConfig().layers().end());
size_t layerSize = layer_config->size();
for (size_t i = 0; i < iter; ++i) {
Argument arg;
arg.value = Matrix::create(batchSize, layerSize, false, false);
arg.grad = Matrix::create(batchSize, layerSize, false, false);
arg.value->randomizeUniform();
arg.value->add(-0.5);
arg.value->sigmoid(*arg.value);
arg.grad->zeroMem();
arg.ids = VectorT<int>::create(batchSize, false);
arg.ids->rand(layerSize);
generateSequenceStartPositions(batchSize, arg.sequenceStartPositions);
data.inArgs[i].push_back(arg);
}
}
for (const auto& layer_name : config.getModelConfig().output_layer_names()) {
auto layer_config = std::find_if(config.getModelConfig().layers().begin(),
config.getModelConfig().layers().end(),
[=](const LayerConfig& layer_config) {
return layer_config.name() == layer_name;
});
CHECK(layer_config != config.getModelConfig().layers().end());
size_t layerSize = layer_config->size();
for (size_t i = 0; i < iter; ++i) {
MatrixPtr grad = Matrix::create(batchSize, layerSize, false, false);
grad->randomizeUniform();
data.outGrads[i].push_back(grad);
}
}
for (const auto& para_config : config.getModelConfig().parameters()) {
VectorPtr value = Vector::create(para_config.size(), false);
value->randnorm(0, 2);
data.paraValues.push_back(value);
}
}
void MKLDNNTester::getOutResult(const std::string& configPath,
DataIn& in,
DataOut& out,
bool use_mkldnn,
size_t iter) {
FLAGS_use_gpu = false;
FLAGS_use_mkldnn = use_mkldnn;
*ThreadLocalRand::getSeed() = 1;
srand(1);
Trainer trainer;
auto config = std::make_shared<TrainerConfigHelper>(configPath);
trainer.init(config, false);
auto gradientMachine = trainer.getGradientMachine();
std::vector<ParameterPtr> parameters = gradientMachine->getParameters();
for (size_t i = 0; i < in.paraValues.size(); i++) {
parameters[i]->getBuf(PARAMETER_VALUE)->copyFrom(*in.paraValues[i]);
}
UpdateCallback simpleUpdate = [](Parameter* para) {
auto& grad = para->getBuf(PARAMETER_GRADIENT);
auto& value = para->getBuf(PARAMETER_VALUE);
real lr = 1e-2;
value->add(*grad, lr);
grad->zeroMem();
};
vector<Argument> outArgs;
gradientMachine->start();
out.outValues.clear();
out.paraValues.clear();
for (size_t i = 0; i < iter; ++i) {
VLOG(MKLDNN_TESTS) << "runing iteration " << i;
gradientMachine->forward(in.inArgs[i], &outArgs, PASS_TRAIN);
// save forward result
for (size_t k = 0; k < outArgs.size(); k++) {
MatrixPtr value = Matrix::create(outArgs[k].value->getHeight(),
outArgs[k].value->getWidth(),
false,
false);
value->copyFrom(*outArgs[k].value);
out.outValues.push_back(value);
}
// random backward input
for (size_t k = 0; k < outArgs.size(); k++) {
outArgs[k].grad->copyFrom(*in.outGrads[i][k]);
}
gradientMachine->backward(simpleUpdate);
}
gradientMachine->finish();
// save param value
for (size_t i = 0; i < in.paraValues.size(); i++) {
VectorPtr val = Vector::create(
parameters[i]->getBuf(PARAMETER_VALUE)->getSize(), false);
val->copyFrom(*parameters[i]->getBuf(PARAMETER_VALUE));
out.paraValues.push_back(val);
}
}
void MKLDNNTester::compareResult(DataOut& ref, DataOut& dnn, float eps) {
CHECK_EQ(ref.outValues.size(), dnn.outValues.size());
CHECK_EQ(ref.paraValues.size(), dnn.paraValues.size());
for (size_t i = 0; i < ref.outValues.size(); i++) {
EXPECT_LE(fabs(compareMatrix(ref.outValues[i], dnn.outValues[i])), eps);
}
for (size_t i = 0; i < ref.paraValues.size(); i++) {
EXPECT_LE(fabs(compareVector(ref.paraValues[i], dnn.paraValues[i])), eps);
}
}
void MKLDNNTester::runBranchesTest(const std::string& configPath,
size_t iter,
float eps) {
DataIn in;
initArgument(in, configPath, iter);
DataOut outCpu, outDnn;
getOutResult(configPath, in, outCpu, false, iter);
getOutResult(configPath, in, outDnn, true, iter);
compareResult(outCpu, outDnn, eps);
}
} // namespace paddle } // namespace paddle
...@@ -33,6 +33,17 @@ class MKLDNNTester { ...@@ -33,6 +33,17 @@ class MKLDNNTester {
NUM = 2, // Number of total NUM = 2, // Number of total
}; };
struct DataIn {
std::vector<std::vector<Argument>> inArgs;
std::vector<std::vector<MatrixPtr>> outGrads;
std::vector<VectorPtr> paraValues;
};
struct DataOut {
std::vector<MatrixPtr> outValues;
std::vector<VectorPtr> paraValues;
};
protected: protected:
std::vector<TestConfig> configs_; std::vector<TestConfig> configs_;
vector<string> layerNames_; vector<string> layerNames_;
...@@ -74,7 +85,17 @@ public: ...@@ -74,7 +85,17 @@ public:
float epsilon = 1e-4, float epsilon = 1e-4,
bool log = false, bool log = false,
int level = MKLDNN_ALL); int level = MKLDNN_ALL);
void setLogLevel(int lvl) { lvl_ = lvl; } static void runBranchesTest(const std::string& configPath,
size_t iter = 3,
float eps = 1e-4);
static void initArgument(DataIn& data,
const std::string& configPath,
size_t iter = 3);
static void getOutResult(const std::string& configPath,
DataIn& in,
DataOut& out,
bool use_mkldnn,
size_t iter = 3);
private: private:
void reset(const TestConfig& dnn, const TestConfig& ref, size_t batchSize); void reset(const TestConfig& dnn, const TestConfig& ref, size_t batchSize);
...@@ -101,8 +122,9 @@ private: ...@@ -101,8 +122,9 @@ private:
void saveWgt(const vector<ParameterPtr>& from, vector<VectorPtr>& to); void saveWgt(const vector<ParameterPtr>& from, vector<VectorPtr>& to);
void restoreWgt(const vector<VectorPtr>& from, vector<ParameterPtr>& to); void restoreWgt(const vector<VectorPtr>& from, vector<ParameterPtr>& to);
double compareMatrix(const MatrixPtr& m1, const MatrixPtr& m2); static double compareMatrix(const MatrixPtr& m1, const MatrixPtr& m2);
double compareVector(const VectorPtr& v1, const VectorPtr& v2); static double compareVector(const VectorPtr& v1, const VectorPtr& v2);
static void compareResult(DataOut& ref, DataOut& dnn, float eps = 1e-4);
/** /**
* Get delta percent * Get delta percent
...@@ -111,7 +133,7 @@ private: ...@@ -111,7 +133,7 @@ private:
* else return sum(abs(a-b)) / sum(abs(b)) * else return sum(abs(a-b)) / sum(abs(b))
* The return value should be smaller than eps when passing. * The return value should be smaller than eps when passing.
*/ */
double getDelta(const real* d1, static double getDelta(const real* d1,
const real* d2, const real* d2,
size_t len, size_t len,
const float failRate = 1e-3, const float failRate = 1e-3,
......
# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.trainer_config_helpers import *
settings(batch_size=16)
channels = get_config_arg("channels", int, 2)
def two_conv(input, group_name):
out1 = img_conv_layer(input=input,
name=group_name+'_conv1',
filter_size=1,
num_filters=channels,
padding=0,
shared_biases=True,
act=ReluActivation())
out2 = img_conv_layer(input=input,
name=group_name+'_conv2',
filter_size=3,
num_filters=channels,
padding=1,
shared_biases=True,
act=ReluActivation())
return out1, out2
data = data_layer(name ="input", size=channels*16*16)
conv = img_conv_layer(input=data,
num_channels=channels,
filter_size=3,
num_filters=channels,
padding=1,
shared_biases=True,
act=ReluActivation())
a1, a2 = two_conv(input=conv, group_name='a')
concat = concat_layer(input=[a1, a2])
b1, b2 = two_conv(input=conv, group_name='b')
addto = addto_layer(input=[b1, b2])
outputs([concat, addto])
# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.trainer_config_helpers import *
settings(batch_size=16)
channels = get_config_arg("channels", int, 2)
def two_fc(input, group_name):
out1 = fc_layer(input=input,
name=group_name+'_fc1',
size=channels,
bias_attr=False,
act=LinearActivation())
out2 = fc_layer(input=input,
name=group_name+'_fc2',
size=channels,
bias_attr=False,
act=LinearActivation())
return out1, out2
data = data_layer(name ="input", size=channels*16*16)
conv = img_conv_layer(input=data,
num_channels=channels,
filter_size=3,
num_filters=channels,
padding=1,
shared_biases=True,
act=LinearActivation())
pool = img_pool_layer(input=conv,
pool_size=3,
stride=2,
padding=1,
pool_type=AvgPooling())
a1, a2 = two_fc(input=pool, group_name='a')
concat = concat_layer(input=[a1, a2])
b1, b2 = two_fc(input=pool, group_name='b')
addto = addto_layer(input=[b1, b2])
outputs([concat, addto])
# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.trainer_config_helpers import *
settings(batch_size=16)
channels = get_config_arg("channels", int, 2)
def two_pool(input, group_name):
out1 = img_pool_layer(input=input,
name=group_name+'_pool1',
pool_size=3,
stride=2,
padding=0,
pool_type=MaxPooling())
out2 = img_pool_layer(input=input,
name=group_name+'_pool2',
pool_size=5,
stride=2,
padding=1,
pool_type=MaxPooling())
return out1, out2
data = data_layer(name ="input", size=channels*16*16)
conv = img_conv_layer(input=data,
num_channels=channels,
filter_size=3,
num_filters=channels,
padding=1,
shared_biases=True,
act=LinearActivation())
pool = img_pool_layer(input=conv,
pool_size=3,
stride=1,
padding=1,
pool_type=AvgPooling())
a1, a2 = two_pool(input=pool, group_name='a')
concat = concat_layer(input=[a1, a2])
b1, b2 = two_pool(input=pool, group_name='b')
addto = addto_layer(input=[b1, b2])
outputs([concat, addto])
...@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and ...@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include <gtest/gtest.h> #include <gtest/gtest.h>
#include <paddle/utils/PythonUtil.h>
#include <string> #include <string>
#include <vector> #include <vector>
#include "MKLDNNTester.h" #include "MKLDNNTester.h"
...@@ -40,12 +41,13 @@ DECLARE_bool(use_mkldnn); ...@@ -40,12 +41,13 @@ DECLARE_bool(use_mkldnn);
struct testFcDesc { struct testFcDesc {
int bs; int bs;
int ic; int ic;
int oc;
int ih, iw; // oh == ow == 1 int ih, iw; // oh == ow == 1
int oc;
}; };
static void getMKLDNNFcConfig(TestConfig& cfg, const testFcDesc& pm) { static void getMKLDNNFcConfig(TestConfig& cfg, const testFcDesc& pm) {
cfg.layerConfig.set_type("mkldnn_fc"); cfg.layerConfig.set_type("mkldnn_fc");
cfg.layerConfig.set_active_type("relu");
cfg.layerConfig.set_size(pm.oc); cfg.layerConfig.set_size(pm.oc);
cfg.inputDefs.push_back( cfg.inputDefs.push_back(
{INPUT_DATA, {INPUT_DATA,
...@@ -86,6 +88,7 @@ struct testConvDesc { ...@@ -86,6 +88,7 @@ struct testConvDesc {
static void getMKLDNNConvConfig(TestConfig& cfg, const testConvDesc& pm) { static void getMKLDNNConvConfig(TestConfig& cfg, const testConvDesc& pm) {
cfg.layerConfig.set_type("mkldnn_conv"); cfg.layerConfig.set_type("mkldnn_conv");
cfg.layerConfig.set_active_type("relu");
cfg.layerConfig.set_num_filters(pm.oc); cfg.layerConfig.set_num_filters(pm.oc);
cfg.layerConfig.set_size(pm.oc * pm.oh * pm.ow); cfg.layerConfig.set_size(pm.oc * pm.oh * pm.ow);
cfg.layerConfig.set_shared_biases(true); cfg.layerConfig.set_shared_biases(true);
...@@ -158,6 +161,7 @@ struct testPoolDesc { ...@@ -158,6 +161,7 @@ struct testPoolDesc {
static void getMKLDNNPoolConfig(TestConfig& cfg, const testPoolDesc& pm) { static void getMKLDNNPoolConfig(TestConfig& cfg, const testPoolDesc& pm) {
cfg.layerConfig.set_type("mkldnn_pool"); cfg.layerConfig.set_type("mkldnn_pool");
cfg.layerConfig.set_active_type("relu");
cfg.layerConfig.set_size(pm.ic * pm.oh * pm.ow); cfg.layerConfig.set_size(pm.ic * pm.oh * pm.ow);
cfg.inputDefs.push_back( cfg.inputDefs.push_back(
{INPUT_DATA, {INPUT_DATA,
...@@ -244,13 +248,26 @@ TEST(MKLDNNActivation, Activations) { ...@@ -244,13 +248,26 @@ TEST(MKLDNNActivation, Activations) {
} }
} }
// TODO(TJ): add branch test DECLARE_string(config_args);
TEST(MKLDNNLayer, branches) {
std::vector<std::string> cases = {"conv", "pool", "fc"};
for (auto name : cases) {
std::string config = "./gserver/tests/mkldnn_branches_" + name + ".conf";
for (auto channels : {2, 32}) {
std::ostringstream oss;
oss << "channels=" << channels;
FLAGS_config_args = oss.str();
MKLDNNTester::runBranchesTest(config);
}
}
}
int main(int argc, char** argv) { int main(int argc, char** argv) {
testing::InitGoogleTest(&argc, argv); testing::InitGoogleTest(&argc, argv);
FLAGS_use_gpu = false; FLAGS_use_gpu = false;
FLAGS_use_mkldnn = true; FLAGS_use_mkldnn = true;
initMain(argc, argv); initMain(argc, argv);
initPython(argc, argv);
FLAGS_thread_local_rand_use_global_seed = true; FLAGS_thread_local_rand_use_global_seed = true;
srand(1); srand(1);
return RUN_ALL_TESTS(); return RUN_ALL_TESTS();
......
...@@ -69,5 +69,8 @@ information, or not. But the output only shares the LoD with input `Inference`. ...@@ -69,5 +69,8 @@ information, or not. But the output only shares the LoD with input `Inference`.
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker); REGISTER_OP_WITHOUT_GRADIENT(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker);
REGISTER_OP_CPU_KERNEL(accuracy, REGISTER_OP_CPU_KERNEL(
ops::AccuracyKernel<paddle::platform::CPUPlace, float>); accuracy, ops::AccuracyKernel<paddle::platform::CPUPlace, float>,
ops::AccuracyKernel<paddle::platform::CPUPlace, int>,
ops::AccuracyKernel<paddle::platform::CPUPlace, double>,
ops::AccuracyKernel<paddle::platform::CPUPlace, int64_t>);
...@@ -21,9 +21,9 @@ namespace paddle { ...@@ -21,9 +21,9 @@ namespace paddle {
namespace operators { namespace operators {
using platform::PADDLE_CUDA_NUM_THREADS; using platform::PADDLE_CUDA_NUM_THREADS;
template <int BlockSize> template <typename T, int BlockSize>
__global__ void AccuracyCudaKernel(const int N, const int D, const int* Xdata, __global__ void AccuracyCudaKernel(const int N, const int D, const T* Xdata,
const int* labeldata, float* accuracy) { const T* labeldata, float* accuracy) {
int count = 0; int count = 0;
__shared__ int total[BlockSize]; __shared__ int total[BlockSize];
...@@ -57,8 +57,8 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -57,8 +57,8 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
auto* accuracy = ctx.Output<Tensor>("Accuracy"); auto* accuracy = ctx.Output<Tensor>("Accuracy");
// FIXME(typhoonzero): only support indices currently // FIXME(typhoonzero): only support indices currently
// if add support for output values, how to detect the data type? // if add support for output values, how to detect the data type?
const int* inference_data = inference->data<int>(); const T* inference_data = inference->data<T>();
const int* label_data = label->data<int>(); const T* label_data = label->data<T>();
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace()); float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
size_t num_samples = inference->dims()[0]; size_t num_samples = inference->dims()[0];
...@@ -69,7 +69,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -69,7 +69,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
return; return;
} }
AccuracyCudaKernel<PADDLE_CUDA_NUM_THREADS><<< AccuracyCudaKernel<T, PADDLE_CUDA_NUM_THREADS><<<
1, PADDLE_CUDA_NUM_THREADS, 0, 1, PADDLE_CUDA_NUM_THREADS, 0,
reinterpret_cast<const platform::CUDADeviceContext&>( reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
...@@ -81,5 +81,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -81,5 +81,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
REGISTER_OP_GPU_KERNEL(accuracy, REGISTER_OP_GPU_KERNEL(accuracy, paddle::operators::AccuracyOpCUDAKernel<float>,
paddle::operators::AccuracyOpCUDAKernel<float>); paddle::operators::AccuracyOpCUDAKernel<double>,
paddle::operators::AccuracyOpCUDAKernel<int>,
paddle::operators::AccuracyOpCUDAKernel<int64_t>);
...@@ -43,10 +43,6 @@ class AdamOp : public framework::OperatorWithKernel { ...@@ -43,10 +43,6 @@ class AdamOp : public framework::OperatorWithKernel {
"Output(Moment1Out) of AdamOp should not be null."); "Output(Moment1Out) of AdamOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Moment2Out"), PADDLE_ENFORCE(ctx->HasOutput("Moment2Out"),
"Output(Moment2Out) of AdamOp should not be null."); "Output(Moment2Out) of AdamOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Beta1PowOut"),
"Output(Beta1PowOut) of AdamOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Beta2PowOut"),
"Output(Beta2PowOut) of AdamOp should not be null.");
auto lr_dims = ctx->GetInputDim("LearningRate"); auto lr_dims = ctx->GetInputDim("LearningRate");
PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1, PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1,
...@@ -72,8 +68,6 @@ class AdamOp : public framework::OperatorWithKernel { ...@@ -72,8 +68,6 @@ class AdamOp : public framework::OperatorWithKernel {
ctx->SetOutputDim("ParamOut", param_dims); ctx->SetOutputDim("ParamOut", param_dims);
ctx->SetOutputDim("Moment1Out", param_dims); ctx->SetOutputDim("Moment1Out", param_dims);
ctx->SetOutputDim("Moment2Out", param_dims); ctx->SetOutputDim("Moment2Out", param_dims);
ctx->SetOutputDim("Beta1PowOut", beta1_pow_dims);
ctx->SetOutputDim("Beta2PowOut", beta2_pow_dims);
} }
}; };
...@@ -92,8 +86,6 @@ class AdamOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -92,8 +86,6 @@ class AdamOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("ParamOut", "(Tensor) Output parameter"); AddOutput("ParamOut", "(Tensor) Output parameter");
AddOutput("Moment1Out", "(Tensor) Output first moment"); AddOutput("Moment1Out", "(Tensor) Output first moment");
AddOutput("Moment2Out", "(Tensor) Output second moment"); AddOutput("Moment2Out", "(Tensor) Output second moment");
AddOutput("Beta1PowOut", "(Tensor) Output beta1 power accumulator");
AddOutput("Beta2PowOut", "(Tensor) Output beta2 power accumulator");
AddAttr<float>("beta1", AddAttr<float>("beta1",
"(float, default 0.9) " "(float, default 0.9) "
...@@ -121,10 +113,8 @@ Adam updates: ...@@ -121,10 +113,8 @@ Adam updates:
moment1_out = beta1 * moment1 + (1 − beta1) * grad moment1_out = beta1 * moment1 + (1 − beta1) * grad
moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad
beta1_pow_out = beta1_pow * beta1
beta2_pow_out = beta2_pow * beta2
learning_rate_t = learning_rate_t * learning_rate_t = learning_rate_t *
sqrt(1 - beta2_pow_out) / (1 - beta1_pow_out) sqrt(1 - beta2_pow) / (1 - beta1_pow)
param_out = param - learning_rate_t * moment1/ (sqrt(moment2) + epsilon) param_out = param - learning_rate_t * moment1/ (sqrt(moment2) + epsilon)
References: References:
......
...@@ -26,14 +26,10 @@ class AdamOpKernel : public framework::OpKernel<T> { ...@@ -26,14 +26,10 @@ class AdamOpKernel : public framework::OpKernel<T> {
auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut"); auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut");
auto moment1_out_tensor = ctx.Output<framework::Tensor>("Moment1Out"); auto moment1_out_tensor = ctx.Output<framework::Tensor>("Moment1Out");
auto moment2_out_tensor = ctx.Output<framework::Tensor>("Moment2Out"); auto moment2_out_tensor = ctx.Output<framework::Tensor>("Moment2Out");
auto beta1_pow_out_tensor = ctx.Output<framework::Tensor>("Beta1PowOut");
auto beta2_pow_out_tensor = ctx.Output<framework::Tensor>("Beta2PowOut");
param_out_tensor->mutable_data<T>(ctx.GetPlace()); param_out_tensor->mutable_data<T>(ctx.GetPlace());
moment1_out_tensor->mutable_data<T>(ctx.GetPlace()); moment1_out_tensor->mutable_data<T>(ctx.GetPlace());
moment2_out_tensor->mutable_data<T>(ctx.GetPlace()); moment2_out_tensor->mutable_data<T>(ctx.GetPlace());
beta1_pow_out_tensor->mutable_data<T>(ctx.GetPlace());
beta2_pow_out_tensor->mutable_data<T>(ctx.GetPlace());
float beta1 = ctx.Attr<float>("beta1"); float beta1 = ctx.Attr<float>("beta1");
float beta2 = ctx.Attr<float>("beta2"); float beta2 = ctx.Attr<float>("beta2");
...@@ -56,18 +52,13 @@ class AdamOpKernel : public framework::OpKernel<T> { ...@@ -56,18 +52,13 @@ class AdamOpKernel : public framework::OpKernel<T> {
auto param_out = framework::EigenVector<T>::Flatten(*param_out_tensor); auto param_out = framework::EigenVector<T>::Flatten(*param_out_tensor);
auto moment1_out = framework::EigenVector<T>::Flatten(*moment1_out_tensor); auto moment1_out = framework::EigenVector<T>::Flatten(*moment1_out_tensor);
auto moment2_out = framework::EigenVector<T>::Flatten(*moment2_out_tensor); auto moment2_out = framework::EigenVector<T>::Flatten(*moment2_out_tensor);
auto beta1_pow_out =
framework::EigenVector<T>::Flatten(*beta1_pow_out_tensor);
auto beta2_pow_out =
framework::EigenVector<T>::Flatten(*beta2_pow_out_tensor);
auto place = ctx.GetEigenDevice<Place>(); auto place = ctx.GetEigenDevice<Place>();
moment1_out.device(place) = beta1 * moment1 + (1 - beta1) * grad; moment1_out.device(place) = beta1 * moment1 + (1 - beta1) * grad;
moment2_out.device(place) = beta2 * moment2 + (1 - beta2) * grad.square(); moment2_out.device(place) = beta2 * moment2 + (1 - beta2) * grad.square();
beta1_pow_out.device(place) = beta1_pow * beta1;
beta2_pow_out.device(place) = beta2_pow * beta2;
// All of these are tensors of 1 element // All of these are tensors of 1 element
auto lr_t = lr * (1 - beta2_pow_out).sqrt() / (1 - beta1_pow_out); auto lr_t = lr * (1 - beta2_pow).sqrt() / (1 - beta1_pow);
// Eigen does not support automatic broadcast // Eigen does not support automatic broadcast
// Get dimensions of moment vector to broadcast lr_t // Get dimensions of moment vector to broadcast lr_t
Eigen::DSizes<int, 1> m_dsize(moment1_out_tensor->numel()); Eigen::DSizes<int, 1> m_dsize(moment1_out_tensor->numel());
......
...@@ -41,8 +41,6 @@ class AdamaxOp : public framework::OperatorWithKernel { ...@@ -41,8 +41,6 @@ class AdamaxOp : public framework::OperatorWithKernel {
"Output(MomentOut) of AdamaxOp should not be null."); "Output(MomentOut) of AdamaxOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("InfNormOut"), PADDLE_ENFORCE(ctx->HasOutput("InfNormOut"),
"Output(InfNormOut) of AdamaxOp should not be null."); "Output(InfNormOut) of AdamaxOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Beta1PowOut"),
"Output(Beta1PowOut) of AdamaxOp should not be null.");
auto lr_dims = ctx->GetInputDim("LearningRate"); auto lr_dims = ctx->GetInputDim("LearningRate");
PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1, PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1,
...@@ -64,7 +62,6 @@ class AdamaxOp : public framework::OperatorWithKernel { ...@@ -64,7 +62,6 @@ class AdamaxOp : public framework::OperatorWithKernel {
ctx->SetOutputDim("ParamOut", param_dims); ctx->SetOutputDim("ParamOut", param_dims);
ctx->SetOutputDim("MomentOut", param_dims); ctx->SetOutputDim("MomentOut", param_dims);
ctx->SetOutputDim("InfNormOut", param_dims); ctx->SetOutputDim("InfNormOut", param_dims);
ctx->SetOutputDim("Beta1PowOut", beta1_pow_dims);
} }
}; };
...@@ -86,7 +83,6 @@ class AdamaxOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -86,7 +83,6 @@ class AdamaxOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("InfNormOut", AddOutput("InfNormOut",
"(Tensor) " "(Tensor) "
"Output exponentially weighted infinity norm"); "Output exponentially weighted infinity norm");
AddOutput("Beta1PowOut", "(Tensor) Output beta1 power accumulator");
AddAttr<float>("beta1", AddAttr<float>("beta1",
"(float, default 0.9) " "(float, default 0.9) "
...@@ -113,8 +109,7 @@ Adamax updates: ...@@ -113,8 +109,7 @@ Adamax updates:
moment_out = beta1 * moment + (1 - beta1) * grad moment_out = beta1 * moment + (1 - beta1) * grad
inf_norm_out = max(beta2 * inf_norm + epsilon, abs(grad)) inf_norm_out = max(beta2 * inf_norm + epsilon, abs(grad))
beta1_pow_out = beta1_pow * beta1 learning_rate_t = learning_rate/(1 - beta1_pow)
learning_rate_t = learning_rate/(1 - beta1_pow_out)
param_out = param - learning_rate_t * moment_out/inf_norm_out param_out = param - learning_rate_t * moment_out/inf_norm_out
The original paper does not have an epsilon attribute. The original paper does not have an epsilon attribute.
......
...@@ -26,12 +26,10 @@ class AdamaxOpKernel : public framework::OpKernel<T> { ...@@ -26,12 +26,10 @@ class AdamaxOpKernel : public framework::OpKernel<T> {
auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut"); auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut");
auto moment_out_tensor = ctx.Output<framework::Tensor>("MomentOut"); auto moment_out_tensor = ctx.Output<framework::Tensor>("MomentOut");
auto inf_norm_out_tensor = ctx.Output<framework::Tensor>("InfNormOut"); auto inf_norm_out_tensor = ctx.Output<framework::Tensor>("InfNormOut");
auto beta1_pow_out_tensor = ctx.Output<framework::Tensor>("Beta1PowOut");
param_out_tensor->mutable_data<T>(ctx.GetPlace()); param_out_tensor->mutable_data<T>(ctx.GetPlace());
moment_out_tensor->mutable_data<T>(ctx.GetPlace()); moment_out_tensor->mutable_data<T>(ctx.GetPlace());
inf_norm_out_tensor->mutable_data<T>(ctx.GetPlace()); inf_norm_out_tensor->mutable_data<T>(ctx.GetPlace());
beta1_pow_out_tensor->mutable_data<T>(ctx.GetPlace());
float beta1 = ctx.Attr<float>("beta1"); float beta1 = ctx.Attr<float>("beta1");
float beta2 = ctx.Attr<float>("beta2"); float beta2 = ctx.Attr<float>("beta2");
...@@ -53,15 +51,12 @@ class AdamaxOpKernel : public framework::OpKernel<T> { ...@@ -53,15 +51,12 @@ class AdamaxOpKernel : public framework::OpKernel<T> {
auto moment_out = framework::EigenVector<T>::Flatten(*moment_out_tensor); auto moment_out = framework::EigenVector<T>::Flatten(*moment_out_tensor);
auto inf_norm_out = auto inf_norm_out =
framework::EigenVector<T>::Flatten(*inf_norm_out_tensor); framework::EigenVector<T>::Flatten(*inf_norm_out_tensor);
auto beta1_pow_out =
framework::EigenVector<T>::Flatten(*beta1_pow_out_tensor);
auto place = ctx.GetEigenDevice<Place>(); auto place = ctx.GetEigenDevice<Place>();
moment_out.device(place) = beta1 * moment + (1 - beta1) * grad; moment_out.device(place) = beta1 * moment + (1 - beta1) * grad;
inf_norm_out.device(place) = inf_norm_out.device(place) =
grad.abs().cwiseMax((beta2 * inf_norm) + epsilon); grad.abs().cwiseMax((beta2 * inf_norm) + epsilon);
beta1_pow_out.device(place) = beta1_pow * beta1; auto lr_t = lr / (1 - beta1_pow);
auto lr_t = lr / (1 - beta1_pow_out);
Eigen::DSizes<int, 1> m_dsize(moment_out_tensor->numel()); Eigen::DSizes<int, 1> m_dsize(moment_out_tensor->numel());
param_out.device(place) = param_out.device(place) =
param - lr_t.broadcast(m_dsize) * (moment_out / inf_norm_out); param - lr_t.broadcast(m_dsize) * (moment_out / inf_norm_out);
......
# Batch Normalization
## What is batch normalization
Batch normalization is a frequently-used method in deep network training. It adjusts the mean and variance of a layer's output, and make the data distribution easier for next layer's training.
The principle of batch normalization can be summarized into a simple function:
```
y = (x - E[x]) / STD[x]) * scale + bias
```
`x` is a batch of output data of a certain layer. `E[x]` and `STD[x]` is the mean and standard deviation of `x`, respectively。 `scale` and `bias` are two trainable parameters. The training of batch normalization layer equals to the learning of best values of `scale` and `bias`.
In our design, we use a single operator(`batch_norm_op`) to implement the whole batch normalization in C++, and wrap it as a layer in Python.
## Differences with normal operators
`batch_norm_op` is a single operator. However, there are a few differences between `BatchNormOp` and normal operators, which we shall take into consideration in our design.
1. `batch_norm_op` shall behave differently in training and inferencing. For example, during inferencing, there is no batch data and it's impossible to compute `E[x]` and `STD[x]`, so we have to use an `estimated_mean` and an `estimated_variance` instead of them. These require our framework to be able to inform operators current running type (training/inferencing), then operators can switch their behaviors.
2. `batch_norm_op` shall have the ability to maintain `estimated_mean` and `estimated_variance` across mini-batch. In each mini-batch, `estimated_mean` is iterated by the following equations:
```
if batch_id == 0
estimated_mean = E[x]
else
estimated_mean = estimated_mean * momentum + (1.0 - momentum_) * E[x]
```
The iterating of `estimated_variance` is similar. `momentum` is an attribute, which controls estimated_mean updating speed.
## Implementation
Batch normalization is designed as a single operator is C++, and then wrapped as a layer in Python.
### C++
As most C++ operators do, `batch_norm_op` is defined by inputs, outputs, attributes and compute kernels.
#### Inputs
- `x`: The inputs data, which is generated by the previous layer.
- `estimated_mean`: The estimated mean of all previous data batches. It is updated in each forward propagation and will be used in inferencing to take the role of `E[x]`.
- `estimated_var`: The estimated standard deviation of all previous data batches. It is updated in each forward propagation and will be used in inferencing to take the role of `STD[x]`.
- `scale`: trainable parameter 'scale'
- `bias`: trainable parameter 'bias'
#### Outputs
- `y`: The output data.
- `batch_mean`: The mean value of batch data.
- `batch_var`: The standard deviation value of batch data.
- `saved_mean`: Updated `estimated_mean` with current batch data. It's supposed to share the memory with input `estimated_mean`.
- `saved_var`: Updated `estimated_var` with current batch data. It's supposed to share the memory with input `estimated_var`.
#### Attributes
- `is_infer`: *bool*. If true, run `batch_norm_op` in inferencing mode.
- `use_global_est`: *bool*. If true, use `saved_mean` and `saved_var` instead of `E[x]` and `STD[x]` in trainning.
- `epsilon`: *float*. The epsilon value to avoid division by zero.
- `momentum`: *float*. Factor used in `estimated_mean` and `estimated_var` updating. The usage is shown above.
#### Kernels
The following graph showes the training computational process of `batch_norm_op`:
<img src="./images/batch_norm_op_kernel.png" width="800"/>
cudnn provides APIs to finish the whole series of computation, we can use them in our GPU kernel.
### Python
`batch_norm_op` is warpped as a layer in Python:
```python
def batch_norm_layer(net,
input,
output,
scale,
bias,
use_global_est = False,
epsilon = 1e-6,
momentum = 0.99):
mean_cache = scope.new_var(name = 'estimated_mean', trainable = False)
var_cache = scop.new_var(name = 'estimated_var', trainable = False)
batch_mean = scope.new_var(name = 'batch_mean')
batch_var = scope.new_var(name = 'batch_var')
batch_norm_op = Operator('batch_norm_op',
x = input,
estimated_mean = mean_cache,
estimated_mean = var_cache,
scale = scale,
bias = bias,
y = output,
batch_mean = batch_mean,
batch_var = batch_var,
saved_mean = mean_cache,
saved_var = var_cache,
is_infer = False,
use_global_est = use_global_est,
epsilon = epsilon,
momentum = momentum)
net.append_op(batch_norm_op)
return output
```
Because Python API has not been finally decided, the code above can be regarded as pseudo code. There are a few key points we shall note:
1. `estimated_mean` and `estimated_var` are assigned the same variables with `saved_mean` and `saved_var` respectively. So they share same the memories. The output mean and variance values(`saved_mean` and `saved_var`) of a certain batch will be the inputs(`estimated_mean` and `estimated_var`) of the next batch.
2. `is_infer` decided whether `batch_norm_op` will run in training mode or inferencing mode. However, a network may contains both training and inferencing parts. And user may switch `batch_norm_op`'s running mode in Python `for` loop like this:
```python
for pass_id in range(PASS_NUM):
# ...
net.train() # run training model
if pass_id % 100 == 0:
net.infer(test_image) # run inferencing model
# ...
```
`is_infer` is an attribute. Once an operator is created, its attributes can not be changed. It suggests us that we shall maintain two `batch_norm_op` in the model, one's `is_infer` is `True`(we call it `infer_batch_norm_op`) and the other one's is `False`(we call it `train_batch_norm_op`). They share all parameters and variables, but be placed in two different branches. That is to say, if a network contains a `batch_norm_op`, it will fork into two branches, one go through `train_batch_norm_op` and the other one go through `infer_batch_norm_op`:
<div align=center>
<img src="./images/batch_norm_fork.png" width="500"/>
</div>
Just like what is shown in the above graph, the net forks before `batch_norm_op` and will never merge again. All the operators after `batch_norm_op` will duplicate.
When the net runs in training mode, the end of the left branch will be set as the running target, so the dependency tracking process will ignore right branch automatically. When the net runs in inferencing mode, the process is reversed.
How to set a target is related to Python API design, so I will leave it here waiting for more discussions.
...@@ -27,8 +27,8 @@ class ClipOp : public framework::OperatorWithKernel { ...@@ -27,8 +27,8 @@ class ClipOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE(ctx->HasOutput("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of ClipOp should not be null."); "Output(Out) of ClipOp should not be null.");
auto x_dims = ctx->GetInputDim("X"); auto x_dims = ctx->GetInputDim("X");
auto max = Attr<float>("max"); auto max = ctx->Attrs().Get<float>("max");
auto min = Attr<float>("min"); auto min = ctx->Attrs().Get<float>("min");
PADDLE_ENFORCE_LT(min, max, "max should be greater than min."); PADDLE_ENFORCE_LT(min, max, "max should be greater than min.");
ctx->SetOutputDim("Out", x_dims); ctx->SetOutputDim("Out", x_dims);
ctx->ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
......
...@@ -108,17 +108,17 @@ class GemmConv2DKernel : public framework::OpKernel<T> { ...@@ -108,17 +108,17 @@ class GemmConv2DKernel : public framework::OpKernel<T> {
int in_step = input_channels / groups; int in_step = input_channels / groups;
int out_step = output_channels / groups; int out_step = output_channels / groups;
for (int i = 0; i < batch_size; i++) { for (int i = 0; i < batch_size; i++) {
Tensor in_batch = input->Slice<T>(i, i + 1).Resize(input_shape); Tensor in_batch = input->Slice(i, i + 1).Resize(input_shape);
Tensor out_batch = output->Slice<T>(i, i + 1).Resize(output_matrix_shape); Tensor out_batch = output->Slice(i, i + 1).Resize(output_matrix_shape);
for (int g = 0; g < groups; g++) { for (int g = 0; g < groups; g++) {
// im2col // im2col
Tensor in_slice = in_batch.Slice<T>(g * in_step, (g + 1) * in_step); Tensor in_slice = in_batch.Slice(g * in_step, (g + 1) * in_step);
im2col(context.device_context(), in_slice, col, strides[0], strides[1], im2col(context.device_context(), in_slice, col, strides[0], strides[1],
paddings[0], paddings[1]); paddings[0], paddings[1]);
// gemm // gemm
Tensor out_slice = out_batch.Slice<T>(g * out_step, (g + 1) * out_step); Tensor out_slice = out_batch.Slice(g * out_step, (g + 1) * out_step);
Tensor filter_slice = filter.Slice<T>(g * out_step, (g + 1) * out_step); Tensor filter_slice = filter.Slice(g * out_step, (g + 1) * out_step);
math::matmul<Place, T>(context.device_context(), filter_slice, false, math::matmul<Place, T>(context.device_context(), filter_slice, false,
col_matrix, false, T(1.0), &out_slice, T(0.0)); col_matrix, false, T(1.0), &out_slice, T(0.0));
} }
...@@ -198,22 +198,20 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> { ...@@ -198,22 +198,20 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> {
for (int i = 0; i < batch_size; i++) { for (int i = 0; i < batch_size; i++) {
Tensor out_grad_batch = Tensor out_grad_batch =
output_grad->Slice<T>(i, i + 1).Resize(output_matrix_shape); output_grad->Slice(i, i + 1).Resize(output_matrix_shape);
Tensor in_grad_batch = Tensor in_grad_batch = input_grad->Slice(i, i + 1).Resize(input_shape);
input_grad->Slice<T>(i, i + 1).Resize(input_shape);
for (int g = 0; g < groups; g++) { for (int g = 0; g < groups; g++) {
// gemm // gemm
Tensor out_grad_slice = Tensor out_grad_slice =
out_grad_batch.Slice<T>(g * out_step, (g + 1) * out_step); out_grad_batch.Slice(g * out_step, (g + 1) * out_step);
Tensor filter_slice = Tensor filter_slice = filter.Slice(g * out_step, (g + 1) * out_step);
filter.Slice<T>(g * out_step, (g + 1) * out_step);
math::matmul<Place, T>(context.device_context(), filter_slice, true, math::matmul<Place, T>(context.device_context(), filter_slice, true,
out_grad_slice, false, T(1.0), &col_matrix, out_grad_slice, false, T(1.0), &col_matrix,
T(0.0)); T(0.0));
// col2im // col2im
Tensor in_grad_slice = Tensor in_grad_slice =
in_grad_batch.Slice<T>(g * in_step, (g + 1) * in_step); in_grad_batch.Slice(g * in_step, (g + 1) * in_step);
col2im(context.device_context(), in_grad_slice, col, strides[0], col2im(context.device_context(), in_grad_slice, col, strides[0],
strides[1], paddings[0], paddings[1]); strides[1], paddings[0], paddings[1]);
} }
...@@ -229,19 +227,19 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> { ...@@ -229,19 +227,19 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> {
for (int i = 0; i < batch_size; i++) { for (int i = 0; i < batch_size; i++) {
Tensor out_grad_batch = Tensor out_grad_batch =
output_grad->Slice<T>(i, i + 1).Resize(output_matrix_shape); output_grad->Slice(i, i + 1).Resize(output_matrix_shape);
Tensor in_batch = input->Slice<T>(i, i + 1).Resize(input_shape); Tensor in_batch = input->Slice(i, i + 1).Resize(input_shape);
for (int g = 0; g < groups; g++) { for (int g = 0; g < groups; g++) {
// im2col // im2col
Tensor out_grad_slice = Tensor out_grad_slice =
out_grad_batch.Slice<T>(g * out_step, (g + 1) * out_step); out_grad_batch.Slice(g * out_step, (g + 1) * out_step);
Tensor in_slice = in_batch.Slice<T>(g * in_step, (g + 1) * in_step); Tensor in_slice = in_batch.Slice(g * in_step, (g + 1) * in_step);
im2col(context.device_context(), in_slice, col, strides[0], im2col(context.device_context(), in_slice, col, strides[0],
strides[1], paddings[0], paddings[1]); strides[1], paddings[0], paddings[1]);
// gemm // gemm
Tensor filter_grad_slice = Tensor filter_grad_slice =
filter_grad_.Slice<T>(g * out_step, (g + 1) * out_step); filter_grad_.Slice(g * out_step, (g + 1) * out_step);
math::matmul<Place, T>(context.device_context(), out_grad_slice, math::matmul<Place, T>(context.device_context(), out_grad_slice,
false, col_matrix, true, T(1.0), false, col_matrix, true, T(1.0),
&filter_grad_slice, T(1.0)); &filter_grad_slice, T(1.0));
......
...@@ -23,6 +23,7 @@ using framework::Scope; ...@@ -23,6 +23,7 @@ using framework::Scope;
using framework::TensorArray; using framework::TensorArray;
using framework::LoDTensor; using framework::LoDTensor;
using framework::Variable; using framework::Variable;
using framework::OperatorBase;
using framework::DySeqMetaBatch; using framework::DySeqMetaBatch;
namespace detail { namespace detail {
...@@ -43,72 +44,72 @@ inline void CreateVariables(Scope& scope, ...@@ -43,72 +44,72 @@ inline void CreateVariables(Scope& scope,
* be reordered, but the RNN op should not change the `boot_state` as an input * be reordered, but the RNN op should not change the `boot_state` as an input
* variable's content. * variable's content.
*/ */
template <typename T> inline void ReorderInitialState(const DySeqMetaBatch& metas,
inline void ReorderBootState(const DySeqMetaBatch& metas,
const LoDTensor& boot_state, LoDTensor* tensor, const LoDTensor& boot_state, LoDTensor* tensor,
const platform::Place& dst_place) { const platform::Place& dst_place) {
for (size_t seq_id = 0; seq_id < metas.size(); seq_id++) { for (size_t seq_id = 0; seq_id < metas.size(); seq_id++) {
auto slice = tensor->Slice<T>(seq_id, seq_id + 1); auto slice = tensor->Slice(seq_id, seq_id + 1);
auto boot_slice = auto boot_slice =
boot_state.Slice<T>(metas[seq_id].ori_idx, metas[seq_id].ori_idx + 1); boot_state.Slice(metas[seq_id].ori_idx, metas[seq_id].ori_idx + 1);
// TODO(superjom) pass in device context as an argument // TODO(superjom) pass in device context as an argument
slice.template CopyFrom<T>(boot_slice, dst_place, slice.CopyFrom(boot_slice, dst_place, platform::CPUDeviceContext());
platform::CPUDeviceContext());
} }
} }
} // namespace detail inline void RestoreInitialState(const DySeqMetaBatch& metas,
const LoDTensor& tensor, LoDTensor* boot_state,
class DynamicRecurrentOpProtoAndCheckerMaker const platform::Place& dst_place) {
: public framework::OpProtoAndCheckerMaker { for (size_t seq_id = 0; seq_id < metas.size(); seq_id++) {
public: auto slice = tensor.Slice(seq_id, seq_id + 1);
DynamicRecurrentOpProtoAndCheckerMaker(framework::OpProto* proto, auto boot_slice =
framework::OpAttrChecker* op_checker) boot_state->Slice(metas[seq_id].ori_idx, metas[seq_id].ori_idx + 1);
: OpProtoAndCheckerMaker(proto, op_checker) { boot_slice.CopyFrom(slice, dst_place, platform::CPUDeviceContext());
const auto& name = DynamicRecurrentOp::kArgName;
// inputs and outputs stored in proto
AddInput(name.inlinks,
"the inputs that need to be segmented for each step.")
.AsDuplicable();
AddInput(name.boot_memories, "variables to initialize memories.")
.AsDuplicable();
AddOutput(name.outlinks, "the outputs that need to concated for all steps.")
.AsDuplicable();
AddOutput(name.step_scopes, "step scopes");
// Attributes stored in AttributeMap
AddAttr<std::vector<std::string>>(name.pre_memories,
"names of pre-memories");
AddAttr<std::vector<std::string>>(name.memories, "names of memories");
AddComment("This is a RNN operator for varience-length sequences.");
} }
}; }
void DynamicRecurrentOp::Run(const Scope& scope, } // namespace detail
const platform::DeviceContext& dev_ctx) const {
cache_.Init(kArgName, *this, scope, &arg_); // Implementation for forward propagation.
template <>
void RNNAlgorithm::Run<RNNAlgorithm::ComputeMode::kForward>(
const framework::Scope& scope, const framework::OperatorBase& op,
const platform::DeviceContext& dev_ctx) {
SetComputeMode(ComputeMode::kForward);
cache_.Init(kArgNames[mode_], op, scope, &dev_ctx, &arg_);
SplitInputs(); SplitInputs();
CreateScopes(); CreateScopes();
WriteStepInputs(); WriteStepInputs();
InitStates(); InitStates();
WriteStepOutputs(); WriteStepOutputs();
RunSteps();
ConcatOutputs();
}
// call stepnet in all the time steps // Implementation for backward propagation.
for (size_t step = 0; step < cache_.num_steps; step++) { template <>
auto& step_scope = cache_.GetScope(step); void RNNAlgorithm::Run<RNNAlgorithm::ComputeMode::kBackward>(
stepnet_->Run(step_scope, dev_ctx); const framework::Scope& scope, const framework::OperatorBase& op,
const platform::DeviceContext& dev_ctx) {
SetComputeMode(ComputeMode::kBackward);
cache_.Init(kArgNames[mode_], op, scope, &dev_ctx, &arg_);
SplitInputs();
WriteStepInputs();
InitStates();
WriteStepOutputs();
RunSteps();
// copy boot-states' gradients back.
for (const auto& state : arg_.states) {
ExportInitialStateGradient(state);
} }
ConcatOutputs(); ConcatOutputs();
} }
void DynamicRecurrentOp::SplitInputs() const { void RNNAlgorithm::SplitInputs() {
// TODO(superjom) make level a config // TODO(superjom) make level a config
// TODO(superjom) check all the inputs has the same LoD // TODO(superjom) check all the inputs has the same LoD
int level = 0; int level = 0;
for (const auto& item : cache_.inlinks) { for (const auto& item : cache_.inputs) {
const auto& var = item.second; const auto& var = item.second;
const auto& tensor = var->Get<LoDTensor>(); const auto& tensor = var->Get<LoDTensor>();
TensorArray& ta = step_inputs_[item.first]; TensorArray& ta = step_inputs_[item.first];
...@@ -125,8 +126,8 @@ void DynamicRecurrentOp::SplitInputs() const { ...@@ -125,8 +126,8 @@ void DynamicRecurrentOp::SplitInputs() const {
} }
} }
void DynamicRecurrentOp::WriteStepInputs() const { void RNNAlgorithm::WriteStepInputs() {
for (const auto& item : cache_.inlinks) { for (const auto& item : cache_.inputs) {
auto ta_it = step_inputs_.find(item.first); auto ta_it = step_inputs_.find(item.first);
PADDLE_ENFORCE(ta_it != step_inputs_.end(), PADDLE_ENFORCE(ta_it != step_inputs_.end(),
"step_inputs_ not compatible with memory set"); "step_inputs_ not compatible with memory set");
...@@ -138,20 +139,20 @@ void DynamicRecurrentOp::WriteStepInputs() const { ...@@ -138,20 +139,20 @@ void DynamicRecurrentOp::WriteStepInputs() const {
if (var == nullptr) { if (var == nullptr) {
var = step_scope.Var(item.first); var = step_scope.Var(item.first);
} }
var->GetMutable<LoDTensor>()->ShareDataWith<value_type>(tensor); var->GetMutable<LoDTensor>()->ShareDataWith(tensor);
} }
} }
} }
void DynamicRecurrentOp::WriteStepOutputs() const { void RNNAlgorithm::WriteStepOutputs() {
// initialize step outputs // initialize step outputs
for (const auto& item : cache_.outlinks) { for (const auto& item : cache_.outputs) {
step_outputs_.emplace(item.first, TensorArray()); step_outputs_.emplace(item.first, TensorArray());
} }
PADDLE_ENFORCE_GT(step_outputs_.size(), 0UL); PADDLE_ENFORCE_GT(step_outputs_.size(), 0UL);
} }
void DynamicRecurrentOp::CreateScopes() const { void RNNAlgorithm::CreateScopes() {
PADDLE_ENFORCE_GT(cache_.num_steps, 0); PADDLE_ENFORCE_GT(cache_.num_steps, 0);
// resize scopes // resize scopes
size_t num_scopes_need_create = cache_.num_steps - cache_.scopes->size(); size_t num_scopes_need_create = cache_.num_steps - cache_.scopes->size();
...@@ -160,19 +161,19 @@ void DynamicRecurrentOp::CreateScopes() const { ...@@ -160,19 +161,19 @@ void DynamicRecurrentOp::CreateScopes() const {
} }
// init temporary inputs // init temporary inputs
PADDLE_ENFORCE_NOT_NULL(stepnet_, "stepnet should be set first"); PADDLE_ENFORCE_NOT_NULL(step_unit_, "stepnet should be set first");
std::vector<std::string> memories; std::vector<std::string> states;
std::vector<std::string> pre_memories; std::vector<std::string> ex_states;
std::vector<std::string> stepnet_outputs; std::vector<std::string> step_unit_outputs;
std::transform(arg_.memories.begin(), arg_.memories.end(), std::transform(arg_.states.begin(), arg_.states.end(),
std::back_inserter(memories), std::back_inserter(states),
[](const rnn::MemoryAttr& m) { return m.var; }); [](const rnn::StateAttr& m) { return m.var; });
std::transform(arg_.memories.begin(), arg_.memories.end(), std::transform(arg_.states.begin(), arg_.states.end(),
std::back_inserter(pre_memories), std::back_inserter(ex_states),
[](const rnn::MemoryAttr& m) { return m.pre_var; }); [](const rnn::StateAttr& m) { return m.pre_var; });
for (const auto& item : stepnet_->Outputs()) { for (const auto& item : step_unit_->Outputs()) {
for (const auto& var : item.second) { for (const auto& var : item.second) {
stepnet_outputs.push_back(var); step_unit_outputs.push_back(var);
} }
} }
...@@ -180,13 +181,13 @@ void DynamicRecurrentOp::CreateScopes() const { ...@@ -180,13 +181,13 @@ void DynamicRecurrentOp::CreateScopes() const {
auto& scope = cache_.GetScope(step); auto& scope = cache_.GetScope(step);
detail::CreateVariables(scope, arg_.inlinks); detail::CreateVariables(scope, arg_.inlinks);
detail::CreateVariables(scope, arg_.outlinks); detail::CreateVariables(scope, arg_.outlinks);
detail::CreateVariables(scope, memories); detail::CreateVariables(scope, states);
detail::CreateVariables(scope, pre_memories); detail::CreateVariables(scope, ex_states);
detail::CreateVariables(scope, stepnet_outputs); detail::CreateVariables(scope, step_unit_outputs);
} }
} }
void DynamicRecurrentOp::ConcatOutputs() const { void RNNAlgorithm::ConcatOutputs() {
// TODO(superjom) transform this to a config // TODO(superjom) transform this to a config
int level = 0; int level = 0;
for (size_t step = 0; step < cache_.num_steps; step++) { for (size_t step = 0; step < cache_.num_steps; step++) {
...@@ -199,31 +200,45 @@ void DynamicRecurrentOp::ConcatOutputs() const { ...@@ -199,31 +200,45 @@ void DynamicRecurrentOp::ConcatOutputs() const {
item.second.WriteShared(step, *tensor); item.second.WriteShared(step, *tensor);
} }
} }
// the inlinks' lods should be the same, so randomly get one lod. // the inputs' lods should be the same, so randomly get one lod.
const auto& some_lod = const auto& some_lod =
cache_.scope->FindVar(arg_.inlinks.front())->Get<LoDTensor>().lod(); cache_.scope->FindVar(arg_.inlinks.front())->Get<LoDTensor>().lod();
const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()]; const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()];
for (auto& item : step_outputs_) { for (auto& item : step_outputs_) {
auto tensor = item.second.Pack(level, some_meta, some_lod); auto tensor = item.second.Pack(level, some_meta, some_lod);
auto* output = cache_.outlinks[item.first]->GetMutable<LoDTensor>(); auto* output = cache_.outputs[item.first]->GetMutable<LoDTensor>();
const_cast<LoDTensor*>(output)->ShareDataWith<value_type>(tensor); const_cast<LoDTensor*>(output)->ShareDataWith(tensor);
}
}
void RNNAlgorithm::RunSteps() {
if (IsBackward()) {
// call stepnet in all the time steps reversely
for (int step = cache_.num_steps - 1; step >= 0; step--) {
auto& step_scope = cache_.GetScope(step);
step_unit_->Run(step_scope, *cache_.dev_ctx);
}
} else {
for (size_t step = 0; step < cache_.num_steps; step++) {
auto& step_scope = cache_.GetScope(step);
step_unit_->Run(step_scope, *cache_.dev_ctx);
}
} }
} }
void DynamicRecurrentOp::InitStates() const { void RNNAlgorithm::InitStates() {
for (size_t step = 0; step < cache_.num_steps; step++) { for (size_t step = 0; step < cache_.num_steps; step++) {
for (const auto& memory : arg_.memories) { for (const auto& state : arg_.states) {
CreateState(memory, step); CreateState(state, step);
LinkState(memory, step); LinkState(state, step);
} }
} }
} }
void DynamicRecurrentOp::CreateState(const rnn::MemoryAttr& memory, void RNNAlgorithm::CreateState(const rnn::StateAttr& state_attr, size_t step) {
size_t step) const {
auto& scope = cache_.GetScope(step); auto& scope = cache_.GetScope(step);
auto& state = *cache_.GetTensor(scope, memory.var); auto& state = *cache_.GetTensor(scope, state_attr.var);
auto& boot_state = *cache_.GetTensor(*cache_.scope, memory.boot_var); auto& boot_state = *cache_.GetTensor(*cache_.scope, state_attr.boot_var);
size_t num_instances = size_t num_instances =
step_inputs_[arg_.inlinks.front()].Read(step).dims()[0]; step_inputs_[arg_.inlinks.front()].Read(step).dims()[0];
...@@ -232,55 +247,78 @@ void DynamicRecurrentOp::CreateState(const rnn::MemoryAttr& memory, ...@@ -232,55 +247,78 @@ void DynamicRecurrentOp::CreateState(const rnn::MemoryAttr& memory,
state.Resize(dims); state.Resize(dims);
state.mutable_data<value_type>(platform::CPUPlace()); state.mutable_data<value_type>(platform::CPUPlace());
states_[memory.var].WriteShared(step, state); states_[state_attr.var].WriteShared(step, state);
} }
void DynamicRecurrentOp::LinkState(const rnn::MemoryAttr& memory, void RNNAlgorithm::LinkState(const rnn::StateAttr& state, size_t step) {
size_t step) const {
auto& scope = cache_.GetScope(step); auto& scope = cache_.GetScope(step);
auto& state_pre = *cache_.GetTensor(scope, memory.pre_var); auto& state_pre = *cache_.GetTensor(scope, state.pre_var);
// process the first state's boot-state(the 0-step in forward mode or the
// last step in backward mode)
// Only forward mode need to link the boot-state to the `pre-state` in first
// time step. In backward mode, need to copy the gradient of `pre-state` in
// first time step to the gradient of `boot-state`.
if (step == 0 && IsForward()) {
LinkInitialState(state);
} else {
size_t num_instances =
step_inputs_[arg_.inlinks.front()].Read(step).dims()[0];
auto* pre_state = cache_.GetTensor(cache_.GetScope(step - 1), state.var);
// shink and share from previous state
auto shrinked_pre_state = pre_state->Slice(0, num_instances);
state_pre.ShareDataWith(shrinked_pre_state);
}
}
void RNNAlgorithm::LinkInitialState(const rnn::StateAttr& state) {
// all the step_inputs' metas should be the same, just randomly select one // all the step_inputs' metas should be the same, just randomly select one
// and get the dyseq meta. // and get the dyseq meta.
const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()]; const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()];
size_t num_instances = auto& scope = cache_.GetScope(0);
step_inputs_[arg_.inlinks.front()].Read(step).dims()[0]; auto& state_pre = *cache_.GetTensor(scope, state.pre_var);
auto* pre_state = cache_.GetTensor(*cache_.scope, state.boot_var);
LoDTensor* pre_state{nullptr};
if (step == 0) {
pre_state = cache_.GetTensor(*cache_.scope, memory.boot_var);
pre_state->mutable_data<float>(platform::CPUPlace()); pre_state->mutable_data<float>(platform::CPUPlace());
// allocate memory // allocate state
state_pre.Resize(pre_state->dims()); state_pre.Resize(pre_state->dims());
state_pre.mutable_data<value_type>(platform::CPUPlace()); state_pre.mutable_data<value_type>(platform::CPUPlace());
detail::ReorderBootState<value_type>(some_meta, *pre_state, &state_pre, detail::ReorderInitialState(some_meta, *pre_state, &state_pre,
pre_state->place()); pre_state->place());
} else { }
pre_state = cache_.GetTensor(cache_.GetScope(step - 1), memory.var);
}
// shink and share from previous state void RNNAlgorithm::ExportInitialStateGradient(const rnn::StateAttr& state) {
auto shrinked_pre_state = pre_state->Slice<value_type>(0, num_instances); // all the step_inputs' metas should be the same, just randomly select one
state_pre.ShareDataWith<value_type>(shrinked_pre_state); // and get the dyseq meta.
const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()];
auto& scope = cache_.GetScope(0);
auto& state_pre = *cache_.GetTensor(scope, state.pre_var);
auto& pre_state = *cache_.GetTensor(*cache_.scope, state.boot_var);
pre_state.Resize(state_pre.dims());
detail::RestoreInitialState(some_meta, state_pre, &pre_state,
pre_state.place());
} }
void DynamicRecurrentOp::ArgCache::Init( void RNNAlgorithm::ArgCache::Init(const rnn::ArgumentName& name,
const rnn::ArgumentName& name, const paddle::framework::OperatorBase& op, const paddle::framework::OperatorBase& op,
const paddle::framework::Scope& scope, rnn::Argument* arg) { const paddle::framework::Scope& scope,
platform::DeviceContext const* dev_ctx,
rnn::Argument* arg) {
this->scope = &scope; this->scope = &scope;
InitArgument(name, op, arg); InitArgument(name, op, arg);
CacheScopes(scope, *arg); CacheScopes(scope, *arg);
CacheInlinks(scope, arg->inlinks); CacheInlinks(scope, arg->inlinks);
CacheOutlinks(scope, arg->outlinks); CacheOutlinks(scope, arg->outlinks);
this->dev_ctx = dev_ctx;
} }
void DynamicRecurrentOp::ArgCache::InitArgument(const rnn::ArgumentName& name, void RNNAlgorithm::ArgCache::InitArgument(const rnn::ArgumentName& name,
const OperatorBase& op, const OperatorBase& op,
rnn::Argument* arg) { rnn::Argument* arg) {
rnn::InitArgument(name, arg, op, false /*is_grad*/); rnn::InitArgument(name, arg, op, false /*is_grad*/);
} }
void DynamicRecurrentOp::ArgCache::CacheScopes(const Scope& scope, void RNNAlgorithm::ArgCache::CacheScopes(const Scope& scope,
const rnn::Argument& arg) { const rnn::Argument& arg) {
auto scopes_var = scope.FindVar(arg.step_scopes); auto scopes_var = scope.FindVar(arg.step_scopes);
PADDLE_ENFORCE(scopes_var != nullptr, PADDLE_ENFORCE(scopes_var != nullptr,
...@@ -290,45 +328,85 @@ void DynamicRecurrentOp::ArgCache::CacheScopes(const Scope& scope, ...@@ -290,45 +328,85 @@ void DynamicRecurrentOp::ArgCache::CacheScopes(const Scope& scope,
this->scopes = scopes_var->GetMutable<std::vector<Scope*>>(); this->scopes = scopes_var->GetMutable<std::vector<Scope*>>();
} }
void DynamicRecurrentOp::ArgCache::CacheInlinks( void RNNAlgorithm::ArgCache::CacheInlinks(
const Scope& scope, const std::vector<std::string>& names) { const Scope& scope, const std::vector<std::string>& names) {
for (auto name : names) { for (auto name : names) {
auto* var = GetVariable(scope, name); auto* var = GetVariable(scope, name);
inlinks[name] = var; inputs[name] = var;
} }
} }
void DynamicRecurrentOp::ArgCache::CacheOutlinks( void RNNAlgorithm::ArgCache::CacheOutlinks(
const Scope& scope, const std::vector<std::string>& names) { const Scope& scope, const std::vector<std::string>& names) {
for (auto name : names) { for (auto name : names) {
auto* var = GetVariable(scope, name); auto* var = GetVariable(scope, name);
outlinks[name] = var; outputs[name] = var;
} }
} }
Variable* DynamicRecurrentOp::ArgCache::GetVariable(const Scope& scope, Variable* RNNAlgorithm::ArgCache::GetVariable(const Scope& scope,
const std::string& name) { const std::string& name) {
auto* var = scope.FindVar(name); auto* var = scope.FindVar(name);
PADDLE_ENFORCE_NOT_NULL(var, "variable [%s] not exist in scope", name); PADDLE_ENFORCE_NOT_NULL(var, "variable [%s] not exist in scope", name);
return var; return var;
} }
LoDTensor* DynamicRecurrentOp::ArgCache::GetTensor( LoDTensor* RNNAlgorithm::ArgCache::GetTensor(const framework::Scope& scope,
const framework::Scope& scope, const std::string& name) { const std::string& name) {
auto* var = GetVariable(scope, name); auto* var = GetVariable(scope, name);
return var->GetMutable<LoDTensor>(); return var->GetMutable<LoDTensor>();
} }
const rnn::ArgumentName DynamicRecurrentOp::kArgName{ const std::array<rnn::ArgumentName, 2> RNNAlgorithm::kArgNames{
"step_net", "step_scopes", "inlinks", "outlinks", {rnn::ArgumentName{"step_unit", "step_scopes", "inputs", "outputs",
"memories", "pre_memories", "boot_memories"}; "states", "ex_states", "initial_states"},
rnn::ArgumentName{"step_unit", "step_scopes@GRAD", "outputs@GRAD",
"inputs@GRAD", "states", "ex_states",
"initial_states@GRAD"}}};
void DynamicRecurrentOp::Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const {
rnn.Run<RNNAlgorithm::ComputeMode::kForward>(
scope, *dynamic_cast<const OperatorBase*>(this), dev_ctx);
}
void DynamicRecurrentGradientOp::Run( void DynamicRecurrentGradientOp::Run(
const Scope& scope, const platform::DeviceContext& dev_ctx) const {} const Scope& scope, const platform::DeviceContext& dev_ctx) const {
rnn.Run<RNNAlgorithm::ComputeMode::kBackward>(
scope, *dynamic_cast<const OperatorBase*>(this), dev_ctx);
}
class DynamicRecurrentOpProtoAndCheckerMaker
: public framework::OpProtoAndCheckerMaker {
public:
DynamicRecurrentOpProtoAndCheckerMaker(framework::OpProto* proto,
framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
const auto& name =
RNNAlgorithm::kArgNames[RNNAlgorithm::ComputeMode::kForward];
// inputs and outputs stored in proto
AddInput(name.inlinks,
"the inputs that need to be segmented for each step.")
.AsDuplicable();
AddInput(name.initial_states, "variables to initialize states.")
.AsDuplicable();
AddOutput(name.outlinks, "the outputs that need to concated for all steps.")
.AsDuplicable();
AddOutput(name.step_scopes, "step scopes");
// Attributes stored in AttributeMap
AddAttr<std::vector<std::string>>(name.ex_states, "names of ex_states");
AddAttr<std::vector<std::string>>(name.states, "names of states");
AddComment("This is a RNN operator for varience-length sequences.");
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
REGISTER_OP_WITHOUT_GRADIENT( REGISTER_OP(dynamic_recurrent, paddle::operators::DynamicRecurrentOp,
dynamic_recurrent, paddle::operators::DynamicRecurrentOp, paddle::operators::DynamicRecurrentOpProtoAndCheckerMaker,
paddle::operators::DynamicRecurrentOpProtoAndCheckerMaker); dynamic_recurrent_grad,
paddle::operators::DynamicRecurrentGradientOp);
...@@ -27,47 +27,39 @@ ...@@ -27,47 +27,39 @@
namespace paddle { namespace paddle {
namespace operators { namespace operators {
class DynamicRecurrentOp : public framework::OperatorBase { class RNNAlgorithm {
public: public:
static const rnn::ArgumentName kArgName; enum ComputeMode { kForward = 0, kBackward = 1 };
static const std::array<rnn::ArgumentName, 2> kArgNames;
using value_type = float; using value_type = float;
DynamicRecurrentOp(const std::string& type, /*
const framework::VariableNameMap& inputs, * Different `Run` method for forward and backward, `_` is just for template
const framework::VariableNameMap& outputs, * specifialization.
const framework::AttributeMap& attrs) */
: OperatorBase(type, inputs, outputs, attrs) {} template <ComputeMode _>
void Run(const framework::Scope& scope, const framework::OperatorBase& op,
DynamicRecurrentOp(const DynamicRecurrentOp& o) const platform::DeviceContext& dev_ctx);
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
// TODO(yuyang18): Implement copy ctor well.
PADDLE_THROW("Not implemented");
}
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override;
/* /*
* Split the inputs(LoDTensors) to segments for each time step. * Split the inputs(LoDTensors) to segments for each time step.
*/ */
void SplitInputs() const; void SplitInputs();
/* /*
* Create step-scopes to store temporary outputs in each time steps. * Create step-scopes to store temporary outputs in each time steps.
*/ */
void CreateScopes() const; void CreateScopes();
/* /*
* Link TensorArray steps to the corresponding variables located in * Link TensorArray steps to the corresponding variables located in
* step-scopes. * step-scopes.
*/ */
void WriteStepInputs() const; void WriteStepInputs();
/* /*
* Write output of each step to the corresponding TensorArray. * Write output of each step to the corresponding TensorArray.
*/ */
void WriteStepOutputs() const; void WriteStepOutputs();
/* /*
* Initialize the states, each state will have a corresponding pre-state, * Initialize the states, each state will have a corresponding pre-state,
...@@ -75,54 +67,83 @@ class DynamicRecurrentOp : public framework::OperatorBase { ...@@ -75,54 +67,83 @@ class DynamicRecurrentOp : public framework::OperatorBase {
* pre-state in the first time step will be initialized with an zero tensor or * pre-state in the first time step will be initialized with an zero tensor or
* a tensor in parent scope if is provided. * a tensor in parent scope if is provided.
*/ */
void InitStates() const; void InitStates();
/* /*
* Create state variables for each time step. * Create state variables for each time step.
*/ */
void CreateState(const rnn::MemoryAttr& memory, size_t step) const; void CreateState(const rnn::StateAttr& state, size_t step);
/* /*
* Link pre-state variable in current scope to the state variable in the * Link pre-state variable in current scope to the state variable in the
* previous time step (scope). * previous time step (scope) by reference.
*/
void LinkState(const rnn::StateAttr& state, size_t step);
/*
* Link the pre-state of the first time step to the `boot-state` in parent's
* scope.
*/
void LinkInitialState(const rnn::StateAttr& state);
/*
* Copy the gradient from `pre-state` in the first step-scope to the
* `boot-state` in parent's scope.
*/
void ExportInitialStateGradient(const rnn::StateAttr& state);
/*
* Calculate time steps.
*/ */
void LinkState(const rnn::MemoryAttr& memory, size_t step) const; void RunSteps();
/* /*
* Concatenate outputs in each time step and generate a LoDTensor. * Concatenate outputs in each time step and generate a LoDTensor.
*/ */
void ConcatOutputs() const; void ConcatOutputs();
void SetComputeMode(ComputeMode mode) { mode_ = mode; }
bool IsForward() const { return mode_ == ComputeMode::kForward; }
bool IsBackward() const { return mode_ == ComputeMode::kBackward; }
/* /*
* set a stepnet that is created according to a RecurrentOp's stepnet. * set a step unit that is created according to a RecurrentOp's step unit.
*/ */
void SetStepNet(std::unique_ptr<OperatorBase> net) { void SetStepUnit(std::unique_ptr<framework::OperatorBase> step_unit) {
PADDLE_ENFORCE_NOT_NULL(net); PADDLE_ENFORCE_NOT_NULL(step_unit);
stepnet_ = std::move(net); step_unit_ = std::move(step_unit);
} }
const OperatorBase& GetStepNet() const { return *stepnet_; } const framework::OperatorBase& GetStepUnit() const { return *step_unit_; }
const framework::TensorArray& state(const std::string& name) const { const framework::TensorArray& state(const std::string& name) const {
return states_[name]; auto it = states_.find(name);
PADDLE_ENFORCE(it != states_.end());
return it->second;
} }
const framework::TensorArray& step_input(const std::string& name) const { const framework::TensorArray& step_input(const std::string& name) const {
return step_inputs_[name]; auto it = step_inputs_.find(name);
PADDLE_ENFORCE(it != step_inputs_.end());
return it->second;
} }
const framework::TensorArray& step_output(const std::string& name) const { const framework::TensorArray& step_output(const std::string& name) const {
return step_outputs_[name]; auto it = step_outputs_.find(name);
PADDLE_ENFORCE(it != step_outputs_.end());
return it->second;
} }
protected: protected:
struct ArgCache { struct ArgCache {
framework::Scope const* scope; framework::Scope const* scope;
std::vector<framework::Scope*>* scopes; std::vector<framework::Scope*>* scopes;
std::map<std::string, framework::Variable*> inlinks; std::map<std::string, framework::Variable*> inputs;
std::map<std::string, framework::Variable*> outlinks; std::map<std::string, framework::Variable*> outputs;
platform::DeviceContext const* dev_ctx;
size_t num_steps{0}; size_t num_steps{0};
void Init(const rnn::ArgumentName& name, const OperatorBase& op, void Init(const rnn::ArgumentName& name, const framework::OperatorBase& op,
const framework::Scope& scope, rnn::Argument* arg); const framework::Scope& scope,
platform::DeviceContext const* dev_ctx, rnn::Argument* arg);
framework::Scope& GetScope(size_t index) { framework::Scope& GetScope(size_t index) {
PADDLE_ENFORCE_LT(index, num_steps); PADDLE_ENFORCE_LT(index, num_steps);
...@@ -133,8 +154,8 @@ class DynamicRecurrentOp : public framework::OperatorBase { ...@@ -133,8 +154,8 @@ class DynamicRecurrentOp : public framework::OperatorBase {
const std::string& name); const std::string& name);
private: private:
void InitArgument(const rnn::ArgumentName& name, const OperatorBase& op, void InitArgument(const rnn::ArgumentName& name,
rnn::Argument* arg); const framework::OperatorBase& op, rnn::Argument* arg);
void CacheScopes(const framework::Scope& scope, const rnn::Argument& arg); void CacheScopes(const framework::Scope& scope, const rnn::Argument& arg);
void CacheInlinks(const framework::Scope& scope, void CacheInlinks(const framework::Scope& scope,
const std::vector<std::string>& names); const std::vector<std::string>& names);
...@@ -145,27 +166,49 @@ class DynamicRecurrentOp : public framework::OperatorBase { ...@@ -145,27 +166,49 @@ class DynamicRecurrentOp : public framework::OperatorBase {
}; };
private: private:
std::unique_ptr<OperatorBase> stepnet_; std::unique_ptr<framework::OperatorBase> step_unit_;
mutable std::map<std::string, framework::TensorArray> states_; std::map<std::string, framework::TensorArray> states_;
mutable std::map<std::string, framework::TensorArray> step_inputs_; std::map<std::string, framework::TensorArray> step_inputs_;
mutable std::map<std::string, framework::TensorArray> step_outputs_; std::map<std::string, framework::TensorArray> step_outputs_;
mutable std::map<std::string, std::vector<framework::DySeqMeta>> std::map<std::string, std::vector<framework::DySeqMeta>> dy_seq_metas_;
dy_seq_metas_; rnn::Argument arg_;
mutable rnn::Argument arg_; ArgCache cache_;
mutable ArgCache cache_; ComputeMode mode_{ComputeMode::kForward};
#ifdef PADDLE_WITH_TESTING #ifdef PADDLE_WITH_TESTING
friend class DynamicRecurrentOpTestHelper; // test forward
FRIEND_TEST(DynamicRecurrentOpTestHelper, SplitInputs); friend class RNNAlgorithmTestHelper;
FRIEND_TEST(DynamicRecurrentOpTestHelper, CreateCache); FRIEND_TEST(RNNAlgorithmTestHelper, SplitInputs);
FRIEND_TEST(DynamicRecurrentOpTestHelper, CreateScopes); FRIEND_TEST(RNNAlgorithmTestHelper, CreateCache);
FRIEND_TEST(DynamicRecurrentOpTestHelper, WriteStepInputs); FRIEND_TEST(RNNAlgorithmTestHelper, CreateScopes);
FRIEND_TEST(DynamicRecurrentOpTestHelper, WriteStepOutputs); FRIEND_TEST(RNNAlgorithmTestHelper, WriteStepInputs);
FRIEND_TEST(DynamicRecurrentOpTestHelper, InitStates); FRIEND_TEST(RNNAlgorithmTestHelper, WriteStepOutputs);
FRIEND_TEST(DynamicRecurrentOpTestHelper, ConcatOutputs); FRIEND_TEST(RNNAlgorithmTestHelper, InitStates);
FRIEND_TEST(RNNAlgorithmTestHelper, ConcatOutputs);
// TODO(superjom) test backward
#endif #endif
}; };
class DynamicRecurrentOp : public framework::OperatorBase {
public:
DynamicRecurrentOp(const std::string& type,
const framework::VariableNameMap& inputs,
const framework::VariableNameMap& outputs,
const framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {}
DynamicRecurrentOp(const DynamicRecurrentOp& o)
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
PADDLE_THROW("Not implemented");
}
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override;
mutable RNNAlgorithm rnn;
};
class DynamicRecurrentGradientOp : public framework::OperatorBase { class DynamicRecurrentGradientOp : public framework::OperatorBase {
public: public:
DynamicRecurrentGradientOp(const std::string& type, DynamicRecurrentGradientOp(const std::string& type,
...@@ -174,8 +217,16 @@ class DynamicRecurrentGradientOp : public framework::OperatorBase { ...@@ -174,8 +217,16 @@ class DynamicRecurrentGradientOp : public framework::OperatorBase {
const framework::AttributeMap& attrs) const framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {} : OperatorBase(type, inputs, outputs, attrs) {}
DynamicRecurrentGradientOp(const DynamicRecurrentGradientOp& o)
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
PADDLE_THROW("Not implemented");
}
void Run(const framework::Scope& scope, void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override; const platform::DeviceContext& dev_ctx) const override;
mutable RNNAlgorithm rnn;
}; };
} // namespace operators } // namespace operators
......
...@@ -43,16 +43,16 @@ LoDTensor* CreateVar(Scope& scope, std::string name, framework::DDim dims, ...@@ -43,16 +43,16 @@ LoDTensor* CreateVar(Scope& scope, std::string name, framework::DDim dims,
return tensor; return tensor;
} }
class DynamicRecurrentOpTestHelper : public ::testing::Test { class RNNAlgorithmTestHelper : public ::testing::Test {
protected: protected:
const rnn::ArgumentName argname = DynamicRecurrentOp::kArgName; const rnn::ArgumentName argname = RNNAlgorithm::kArgNames[0];
virtual void SetUp() override { virtual void SetUp() override {
CreateGlobalVariables(); CreateGlobalVariables();
auto op_desc = CreateOpDesc(); auto op_desc = CreateOpDesc();
op = paddle::framework::OpRegistry::CreateOp(op_desc); op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
dop = dynamic_cast<DynamicRecurrentOp*>(op.get()); dop = &(dynamic_cast<DynamicRecurrentOp*>(op.get())->rnn);
InitCacheManually(); InitCacheManually();
InitStepNet(); InitStepNet();
} }
...@@ -63,20 +63,20 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test { ...@@ -63,20 +63,20 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test {
op_desc.set_type("dynamic_recurrent"); op_desc.set_type("dynamic_recurrent");
OpDescNewVar(argname.inlinks, {"in0"}, op_desc.add_inputs()); OpDescNewVar(argname.inlinks, {"in0"}, op_desc.add_inputs());
OpDescNewVar(argname.boot_memories, {"boot_mem"}, op_desc.add_inputs()); OpDescNewVar(argname.initial_states, {"boot_mem"}, op_desc.add_inputs());
OpDescNewVar(argname.step_scopes, {"step_scopes"}, op_desc.add_outputs()); OpDescNewVar(argname.step_scopes, {"step_scopes"}, op_desc.add_outputs());
OpDescNewVar(argname.outlinks, {"out0"}, op_desc.add_outputs()); OpDescNewVar(argname.outlinks, {"out0"}, op_desc.add_outputs());
// set pre-memories // set pre-states
auto pre_memories = op_desc.mutable_attrs()->Add(); auto pre_memories = op_desc.mutable_attrs()->Add();
pre_memories->set_name(argname.pre_memories); pre_memories->set_name(argname.ex_states);
pre_memories->set_type(paddle::framework::AttrType::STRINGS); pre_memories->set_type(paddle::framework::AttrType::STRINGS);
auto pre_memories_item = pre_memories->add_strings(); auto pre_memories_item = pre_memories->add_strings();
*pre_memories_item = "mem@pre"; *pre_memories_item = "mem@pre";
// set memories // set states
auto memories = op_desc.mutable_attrs()->Add(); auto memories = op_desc.mutable_attrs()->Add();
memories->set_name(argname.memories); memories->set_name(argname.states);
memories->set_type(paddle::framework::AttrType::STRINGS); memories->set_type(paddle::framework::AttrType::STRINGS);
auto memories_item = memories->add_strings(); auto memories_item = memories->add_strings();
*memories_item = "mem"; *memories_item = "mem";
...@@ -113,32 +113,33 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test { ...@@ -113,32 +113,33 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test {
} }
void InitCacheManually() { void InitCacheManually() {
dop->cache_.Init(DynamicRecurrentOp::kArgName, *dop, scope, &dop->arg_); dop->cache_.Init(RNNAlgorithm::kArgNames[0], *op, scope, &device_context,
&dop->arg_);
} }
void InitStepNet() { void InitStepNet() {
std::unique_ptr<framework::OperatorBase> stepnet{new NetOp}; std::unique_ptr<framework::OperatorBase> stepnet{new NetOp};
dynamic_cast<NetOp*>(stepnet.get()) dynamic_cast<NetOp*>(stepnet.get())
->AppendOp(std::unique_ptr<TestOp>(new TestOp( ->AppendOp(std::unique_ptr<TestOp>(new TestOp(
"test", {{"inlinks", {"in0"}}, {"boot_memories", {"boot_mem"}}}, "test", {{"inputs", {"in0"}}, {"initial_states", {"boot_mem"}}},
{{"outlinks", {"out0"}}, {"step_scopes", {"step_scopes"}}}, {}))); {{"outputs", {"out0"}}, {"step_scopes", {"step_scopes"}}}, {})));
dop->SetStepNet(std::move(stepnet)); dop->SetStepUnit(std::move(stepnet));
} }
protected: protected:
DynamicRecurrentOp* dop; RNNAlgorithm* dop;
std::unique_ptr<framework::OperatorBase> op; std::unique_ptr<framework::OperatorBase> op;
paddle::platform::CPUDeviceContext device_context; paddle::platform::CPUDeviceContext device_context;
paddle::framework::Scope scope; paddle::framework::Scope scope;
}; };
TEST_F(DynamicRecurrentOpTestHelper, CreateCache) { TEST_F(RNNAlgorithmTestHelper, CreateCache) {
const rnn::Argument& arg = dop->arg_; const rnn::Argument& arg = dop->arg_;
ASSERT_EQ(arg.inlinks.size(), 1UL); ASSERT_EQ(arg.inlinks.size(), 1UL);
ASSERT_EQ(arg.outlinks.size(), 1UL); ASSERT_EQ(arg.outlinks.size(), 1UL);
} }
TEST_F(DynamicRecurrentOpTestHelper, SplitInputs) { TEST_F(RNNAlgorithmTestHelper, SplitInputs) {
dop->SplitInputs(); dop->SplitInputs();
auto& in0_ta = dop->step_inputs_["in0"]; auto& in0_ta = dop->step_inputs_["in0"];
ASSERT_EQ(in0_ta.size(), 4UL); ASSERT_EQ(in0_ta.size(), 4UL);
...@@ -153,14 +154,14 @@ TEST_F(DynamicRecurrentOpTestHelper, SplitInputs) { ...@@ -153,14 +154,14 @@ TEST_F(DynamicRecurrentOpTestHelper, SplitInputs) {
EXPECT_EQ(batch3.dims()[0], 1); EXPECT_EQ(batch3.dims()[0], 1);
} }
TEST_F(DynamicRecurrentOpTestHelper, CreateScopes) { TEST_F(RNNAlgorithmTestHelper, CreateScopes) {
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
ASSERT_EQ(dop->cache_.num_steps, 4UL); ASSERT_EQ(dop->cache_.num_steps, 4UL);
ASSERT_EQ(dop->cache_.scopes->size(), 4UL); ASSERT_EQ(dop->cache_.scopes->size(), 4UL);
} }
TEST_F(DynamicRecurrentOpTestHelper, WriteStepInputs) { TEST_F(RNNAlgorithmTestHelper, WriteStepInputs) {
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
dop->WriteStepInputs(); dop->WriteStepInputs();
...@@ -173,7 +174,7 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepInputs) { ...@@ -173,7 +174,7 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepInputs) {
} }
} }
TEST_F(DynamicRecurrentOpTestHelper, WriteStepOutputs) { TEST_F(RNNAlgorithmTestHelper, WriteStepOutputs) {
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
dop->WriteStepInputs(); dop->WriteStepInputs();
...@@ -187,11 +188,12 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepOutputs) { ...@@ -187,11 +188,12 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepOutputs) {
} }
} }
TEST_F(DynamicRecurrentOpTestHelper, ConcatOutputs) { TEST_F(RNNAlgorithmTestHelper, ConcatOutputs) {
// Let's leave this test to python unittest. // Let's leave this test to python unittest.
} }
TEST_F(DynamicRecurrentOpTestHelper, InitStates) { TEST_F(RNNAlgorithmTestHelper, InitStates) {
dop->SetComputeMode(RNNAlgorithm::ComputeMode::kForward);
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
dop->WriteStepInputs(); dop->WriteStepInputs();
...@@ -208,12 +210,6 @@ TEST_F(DynamicRecurrentOpTestHelper, InitStates) { ...@@ -208,12 +210,6 @@ TEST_F(DynamicRecurrentOpTestHelper, InitStates) {
auto* boot_state = scope.FindVar("boot_mem"); auto* boot_state = scope.FindVar("boot_mem");
ASSERT_TRUE(boot_state != nullptr); ASSERT_TRUE(boot_state != nullptr);
if (step == 0) {
// check pre_state is a reference of boot_state
ASSERT_EQ(boot_state->Get<LoDTensor>().data<float>(),
pre_state->Get<LoDTensor>().data<float>());
}
} }
} }
......
...@@ -108,7 +108,7 @@ void ElementwiseCompute(const framework::ExecutionContext& ctx) { ...@@ -108,7 +108,7 @@ void ElementwiseCompute(const framework::ExecutionContext& ctx) {
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(), PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Rank of first input must >= rank of second input.") "Rank of first input must >= rank of second input.")
if (x_dims == y_dims || product(y_dims) == 1) { if (x_dims == y_dims) {
functor f; functor f;
f.template Run<Place, T>(x, y, z, ctx); f.template Run<Place, T>(x, y, z, ctx);
return; return;
...@@ -174,12 +174,6 @@ void ElementwiseGradCompute(const framework::ExecutionContext& ctx) { ...@@ -174,12 +174,6 @@ void ElementwiseGradCompute(const framework::ExecutionContext& ctx) {
return; return;
} }
if (product(y_dims) == 1) {
functor1 f;
f(place, x, y, out, dx, dy, dout);
return;
}
int axis = ctx.Attr<int>("axis"); int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis); axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
......
...@@ -26,8 +26,9 @@ class FeedOp : public framework::OperatorBase { ...@@ -26,8 +26,9 @@ class FeedOp : public framework::OperatorBase {
: OperatorBase(type, inputs, outputs, attrs) {} : OperatorBase(type, inputs, outputs, attrs) {}
void Run(const framework::Scope &scope, void Run(const framework::Scope &scope,
const platform::DeviceContext &dev_ctx) const override { const platform::DeviceContext &dev_ctx) const override {
auto feed_var_name = Input("Input"); auto feed_var_name = Input("X");
auto *feed_var = scope.FindVar(feed_var_name); auto *feed_var = scope.FindVar(feed_var_name);
PADDLE_ENFORCE(feed_var != nullptr, PADDLE_ENFORCE(feed_var != nullptr,
"Cannot find feed_var in scope, feed_var_name is %s", "Cannot find feed_var in scope, feed_var_name is %s",
feed_var_name); feed_var_name);
...@@ -40,18 +41,32 @@ class FeedOp : public framework::OperatorBase { ...@@ -40,18 +41,32 @@ class FeedOp : public framework::OperatorBase {
auto col = Attr<int>("col"); auto col = Attr<int>("col");
VLOG(3) << "Feed Var " << feed_var_name << "'s " << col << " column to var"
<< out_name;
auto &feed_list = feed_var->Get<framework::FeedFetchList>(); auto &feed_list = feed_var->Get<framework::FeedFetchList>();
auto &feed_item = feed_list.at(static_cast<size_t>(col)); auto &feed_item = feed_list.at(static_cast<size_t>(col));
auto *out_item = out_var->GetMutable<framework::FeedFetchType>(); auto *out_item = out_var->GetMutable<framework::FeedFetchType>();
out_item->CopyFromTensor(feed_item, dev_ctx.GetPlace(), dev_ctx); out_item->CopyFrom(feed_item, dev_ctx.GetPlace(), dev_ctx);
out_item->set_lod(feed_item.lod()); out_item->set_lod(feed_item.lod());
} }
}; };
class FeedOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public:
FeedOpInfoMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of feed op");
AddOutput("Out", "The output of feed op");
AddComment("feed op, it should not be configured by users directly");
AddAttr<int>("col", "column of feed");
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
// We do not need to register OpInfoMaker,
// since feed operator will not be used by end users directly
REGISTER_OPERATOR(feed, paddle::operators::FeedOp, REGISTER_OPERATOR(feed, paddle::operators::FeedOp,
paddle::framework::EmptyGradOpMaker); paddle::framework::EmptyGradOpMaker,
paddle::operators::FeedOpInfoMaker);
...@@ -27,7 +27,7 @@ class FetchOp : public framework::OperatorBase { ...@@ -27,7 +27,7 @@ class FetchOp : public framework::OperatorBase {
void Run(const framework::Scope &scope, void Run(const framework::Scope &scope,
const platform::DeviceContext &dev_ctx) const override { const platform::DeviceContext &dev_ctx) const override {
auto fetch_var_name = Input("Input"); auto fetch_var_name = Input("X");
auto *fetch_var = scope.FindVar(fetch_var_name); auto *fetch_var = scope.FindVar(fetch_var_name);
PADDLE_ENFORCE(fetch_var != nullptr, PADDLE_ENFORCE(fetch_var != nullptr,
"Cannot find fetch variable in scope, fetch_var_name is %s", "Cannot find fetch variable in scope, fetch_var_name is %s",
...@@ -51,14 +51,26 @@ class FetchOp : public framework::OperatorBase { ...@@ -51,14 +51,26 @@ class FetchOp : public framework::OperatorBase {
// FIXME(yuyang18): Should we assume the fetch operator always generate // FIXME(yuyang18): Should we assume the fetch operator always generate
// CPU outputs? // CPU outputs?
dst_item.CopyFromTensor(src_item, platform::CPUPlace(), dev_ctx); dst_item.CopyFrom(src_item, platform::CPUPlace(), dev_ctx);
VLOG(3) << "Fetch variable " << fetch_var_name << " to " << out_name;
} }
}; };
class FetchOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public:
FetchOpInfoMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of fetch op");
AddOutput("Out", "The output of fetch op");
AddComment("fetch op, it should not be configured by users directly");
AddAttr<int>("col", "column of fetch");
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
// We do not need to register OpInfoMaker,
// since fetch operator will not be used by end users directly
REGISTER_OPERATOR(fetch, paddle::operators::FetchOp, REGISTER_OPERATOR(fetch, paddle::operators::FetchOp,
paddle::framework::EmptyGradOpMaker); paddle::framework::EmptyGradOpMaker,
paddle::operators::FetchOpInfoMaker);
...@@ -59,7 +59,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel { ...@@ -59,7 +59,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
protected: protected:
framework::DataType IndicateDataType( framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override { const framework::ExecutionContext& ctx) const override {
return static_cast<framework::DataType>(Attr<int>("data_type")); return static_cast<framework::DataType>(ctx.Attr<int>("data_type"));
} }
}; };
......
digraph ImageBatchNormForkGragh {
subgraph cluster_before {
Prev [label="...", shape=plaintext];
Rnn [label="rnn_op", shape=box];
BatchNorm [label="batch_norm_op", shape=box];
Fc [label="fc_op", shape=box];
After [label="...", shape=plaintext];
Prev -> Rnn -> BatchNorm -> Fc -> After;
label="original";
}
subgraph cluster_after {
Prev2 [label="...", shape=plaintext];
Rnn2 [label="rnn_op", shape=box];
BatchNorm2_1 [label="train_batch_norm_op", shape=box];
BatchNorm2_2 [label="infer_batch_norm_op", shape=box];
Fc2_1 [label="fc_op", shape=box];
Fc2_2 [label="fc_op", shape=box];
After2_1 [label="...", shape=plaintext];
After2_2 [label="...", shape=plaintext];
Prev2 -> Rnn2 -> BatchNorm2_1 -> Fc2_1 -> After2_1;
Rnn2 -> BatchNorm2_2 ->Fc2_2 ->After2_2
label="forked";
}
}
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/increment_op.h"
namespace paddle {
namespace operators {
class IncrementOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of IncrementOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of IncrementOp should not be null.");
ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
ctx->ShareLoD("X", /*->*/ "Out");
}
};
template <typename AttrType>
class IncrementOpMaker : public framework::OpProtoAndCheckerMaker {
public:
IncrementOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor) The input tensor of increment operator");
AddOutput("Out", "(Tensor) The output tensor of increment operator.");
AddComment(R"DOC(Increment operator
The equation is: Out = X + step
)DOC");
AddAttr<AttrType>("step",
"The step size by which the "
"input tensor will be incremented.")
.SetDefault(1.0);
}
};
class IncrementGradOpMaker : public framework::SingleGradOpDescMaker {
public:
using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
std::unique_ptr<framework::OpDescBind> Apply() const override {
auto *grad_op = new framework::OpDescBind();
grad_op->SetType("scale");
grad_op->SetInput("X", OutputGrad("Out"));
grad_op->SetOutput("Out", InputGrad("X"));
grad_op->SetAttr("scale", 1.0f);
return std::unique_ptr<framework::OpDescBind>(grad_op);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(increment, ops::IncrementOp, ops::IncrementOpMaker<float>,
ops::IncrementGradOpMaker);
REGISTER_OP_CPU_KERNEL(increment,
ops::IncrementKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/increment_op.h"
REGISTER_OP_GPU_KERNEL(
increment,
paddle::operators::IncrementKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
template <typename Place, typename T, typename AttrType = T>
class IncrementKernel : public framework::OpKernel<T> {
public:
virtual void Compute(const framework::ExecutionContext& context) const {
auto* tensor = context.Output<framework::Tensor>("Out");
auto* in = context.Input<framework::Tensor>("X");
tensor->mutable_data<T>(in->place());
auto step = static_cast<T>(context.Attr<AttrType>("step"));
auto eigen_out = framework::EigenVector<T>::Flatten(*tensor);
auto eigen_in = framework::EigenVector<T>::Flatten(*in);
auto& place = context.GetEigenDevice<Place>();
eigen_out.device(place) = eigen_in + step;
}
};
} // namespace operators
} // namespace paddle
...@@ -460,14 +460,13 @@ class LinearChainCrfGradOpKernel<platform::CPUPlace, T> ...@@ -460,14 +460,13 @@ class LinearChainCrfGradOpKernel<platform::CPUPlace, T>
for (int i = 0; i < tag_num; ++i) for (int i = 0; i < tag_num; ++i)
beta_value[(seq_length - 1) * tag_num + i] = w_exps[tag_num + i]; beta_value[(seq_length - 1) * tag_num + i] = w_exps[tag_num + i];
NormalizeL1<T>(beta_value + (seq_length - 1) * tag_num, tag_num); NormalizeL1<T>(beta_value + (seq_length - 1) * tag_num, tag_num);
for (int k = seq_length - 2; k >= 0; --k) { for (int k = seq_length - 2; k >= 0; --k) {
for (int i = 0; i < tag_num; ++i) { for (int i = 0; i < tag_num; ++i) {
T sum = 0.; T sum = 0.;
for (int j = 0; j < tag_num; ++j) { for (int j = 0; j < tag_num; ++j) {
sum += x_exps[(i + state_trans_base_idx) * tag_num + j] * sum += w_exps[(i + state_trans_base_idx) * tag_num + j] *
beta_value[(k + 1) * tag_num + j] * x_exps[(k + 1) * tag_num + j] *
x_exps[(k + 1) * tag_num + j]; beta_value[(k + 1) * tag_num + j];
} }
beta_value[k * tag_num + i] = sum; beta_value[k * tag_num + i] = sum;
} }
......
...@@ -64,7 +64,7 @@ void testIm2col() { ...@@ -64,7 +64,7 @@ void testIm2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
input = input_tmp; input = input_tmp;
} else { } else {
input.CopyFrom<float>(input_tmp, *place, *context); input.CopyFrom(input_tmp, *place, *context);
} }
output_cfo.mutable_data<float>( output_cfo.mutable_data<float>(
{1, filter_size, filter_size, output_height, output_width}, *place); {1, filter_size, filter_size, output_height, output_width}, *place);
...@@ -85,8 +85,7 @@ void testIm2col() { ...@@ -85,8 +85,7 @@ void testIm2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
out_cfo_ptr = output_cfo.data<float>(); out_cfo_ptr = output_cfo.data<float>();
} else { } else {
output_tmp.CopyFrom<float>(output_cfo, paddle::platform::CPUPlace(), output_tmp.CopyFrom(output_cfo, paddle::platform::CPUPlace(), *context);
*context);
out_cfo_ptr = output_tmp.data<float>(); out_cfo_ptr = output_tmp.data<float>();
} }
EXPECT_EQ(out_cfo_ptr[0], 0); EXPECT_EQ(out_cfo_ptr[0], 0);
...@@ -102,8 +101,7 @@ void testIm2col() { ...@@ -102,8 +101,7 @@ void testIm2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
out_ocf_ptr = output_ocf.data<float>(); out_ocf_ptr = output_ocf.data<float>();
} else { } else {
output_tmp.CopyFrom<float>(output_ocf, paddle::platform::CPUPlace(), output_tmp.CopyFrom(output_ocf, paddle::platform::CPUPlace(), *context);
*context);
out_ocf_ptr = output_tmp.data<float>(); out_ocf_ptr = output_tmp.data<float>();
} }
EXPECT_EQ(out_ocf_ptr[0], 0); EXPECT_EQ(out_ocf_ptr[0], 0);
......
...@@ -130,6 +130,87 @@ void matmul<platform::CPUPlace, double>( ...@@ -130,6 +130,87 @@ void matmul<platform::CPUPlace, double>(
matrix_b.data<double>(), beta, matrix_out->data<double>()); matrix_b.data<double>(), beta, matrix_out->data<double>());
} }
#ifdef PADDLE_USE_MKLML
// Use cblas_{s,d}gemm_batched if available: Run with 1 group of size batchSize.
template <>
void batched_gemm<platform::CPUPlace, float>(
const platform::DeviceContext& context, const CBLAS_TRANSPOSE transA,
const CBLAS_TRANSPOSE transB, const int M, const int N, const int K,
const float alpha, const float* A, const float* B, const float beta,
float* C, const int batchCount, const int strideA, const int strideB) {
int lda = (transA == CblasNoTrans) ? K : M;
int ldb = (transB == CblasNoTrans) ? N : K;
int ldc = N;
auto a_array = std::vector<const float*>(batchCount);
auto b_array = std::vector<const float*>(batchCount);
auto c_array = std::vector<float*>(batchCount);
for (int k = 0; k < batchCount; ++k) {
a_array[k] = &A[k * strideA];
b_array[k] = &B[k * strideB];
c_array[k] = &C[k * M * N];
}
cblas_sgemm_batch(CblasRowMajor, &transA, &transB, &M, &N, &K, &alpha,
a_array.data(), &lda, b_array.data(), &ldb, &beta,
c_array.data(), &ldc, 1 /* group_count */, &batchCount);
}
template <>
void batched_gemm<platform::CPUPlace, double>(
const platform::DeviceContext& context, const CBLAS_TRANSPOSE transA,
const CBLAS_TRANSPOSE transB, const int M, const int N, const int K,
const double alpha, const double* A, const double* B, const double beta,
double* C, const int batchCount, const int strideA, const int strideB) {
int lda = (transA == CblasNoTrans) ? K : M;
int ldb = (transB == CblasNoTrans) ? N : K;
int ldc = N;
auto a_array = std::vector<const double*>(batchCount);
auto b_array = std::vector<const double*>(batchCount);
auto c_array = std::vector<double*>(batchCount);
for (int k = 0; k < batchCount; ++k) {
a_array[k] = &A[k * strideA];
b_array[k] = &B[k * strideB];
c_array[k] = &C[k * M * N];
}
cblas_dgemm_batch(CblasRowMajor, &transA, &transB, &M, &N, &K, &alpha,
a_array.data(), &lda, b_array.data(), &ldb, &beta,
c_array.data(), &ldc, 1 /* group_count */, &batchCount);
}
#else
// The below is a naive but correct serial implementation that just loops
// over the batch dimension. This is a fallback for when the batched gemm
// functions of Intel MKL are not available. In the future, this computation
// should be parallelized.
template <>
void batched_gemm<platform::CPUPlace, float>(
const platform::DeviceContext& context, const CBLAS_TRANSPOSE transA,
const CBLAS_TRANSPOSE transB, const int M, const int N, const int K,
const float alpha, const float* A, const float* B, const float beta,
float* C, const int batchCount, const int strideA, const int strideB) {
for (int k = 0; k < batchCount; ++k) {
const float* Ak = &A[k * strideA];
const float* Bk = &B[k * strideB];
float* Ck = &C[k * M * N];
gemm<platform::CPUPlace, float>(context, transA, transB, M, N, K, alpha, Ak,
Bk, beta, Ck);
}
}
template <>
void batched_gemm<platform::CPUPlace, double>(
const platform::DeviceContext& context, const CBLAS_TRANSPOSE transA,
const CBLAS_TRANSPOSE transB, const int M, const int N, const int K,
const double alpha, const double* A, const double* B, const double beta,
double* C, const int batchCount, const int strideA, const int strideB) {
for (int k = 0; k < batchCount; ++k) {
const double* Ak = &A[k * strideA];
const double* Bk = &B[k * strideB];
double* Ck = &C[k * M * N];
gemm<platform::CPUPlace, double>(context, transA, transB, M, N, K, alpha,
Ak, Bk, beta, Ck);
}
}
#endif
template struct SetConstant<platform::CPUPlace, float>; template struct SetConstant<platform::CPUPlace, float>;
} // namespace math } // namespace math
......
...@@ -155,6 +155,54 @@ void matmul<platform::GPUPlace, double>( ...@@ -155,6 +155,54 @@ void matmul<platform::GPUPlace, double>(
matrix_b.data<double>(), beta, matrix_out->data<double>()); matrix_b.data<double>(), beta, matrix_out->data<double>());
} }
template <>
void batched_gemm<platform::GPUPlace, float>(
const platform::DeviceContext& context, const CBLAS_TRANSPOSE transA,
const CBLAS_TRANSPOSE transB, const int M, const int N, const int K,
const float alpha, const float* A, const float* B, const float beta,
float* C, const int batchCount, const int strideA, const int strideB) {
// Note that cublas follows fortran order, so the order is different from
// the cblas convention.
int lda = (transA == CblasNoTrans) ? K : M;
int ldb = (transB == CblasNoTrans) ? N : K;
int ldc = N;
cublasOperation_t cuTransA =
(transA == CblasNoTrans) ? CUBLAS_OP_N : CUBLAS_OP_T;
cublasOperation_t cuTransB =
(transB == CblasNoTrans) ? CUBLAS_OP_N : CUBLAS_OP_T;
const int strideC = M * N;
PADDLE_ENFORCE(platform::dynload::cublasSgemmStridedBatched(
reinterpret_cast<const platform::CUDADeviceContext&>(context)
.cublas_handle(),
cuTransB, cuTransA, N, M, K, &alpha, B, ldb, strideB, A, lda, strideA,
&beta, C, ldc, strideC, batchCount));
}
template <>
void batched_gemm<platform::GPUPlace, double>(
const platform::DeviceContext& context, const CBLAS_TRANSPOSE transA,
const CBLAS_TRANSPOSE transB, const int M, const int N, const int K,
const double alpha, const double* A, const double* B, const double beta,
double* C, const int batchCount, const int strideA, const int strideB) {
// Note that cublas follows fortran order, so the order is different from
// the cblas convention.
int lda = (transA == CblasNoTrans) ? K : M;
int ldb = (transB == CblasNoTrans) ? N : K;
int ldc = N;
cublasOperation_t cuTransA =
(transA == CblasNoTrans) ? CUBLAS_OP_N : CUBLAS_OP_T;
cublasOperation_t cuTransB =
(transB == CblasNoTrans) ? CUBLAS_OP_N : CUBLAS_OP_T;
const int strideC = M * N;
PADDLE_ENFORCE(platform::dynload::cublasDgemmStridedBatched(
reinterpret_cast<const platform::CUDADeviceContext&>(context)
.cublas_handle(),
cuTransB, cuTransA, N, M, K, &alpha, B, ldb, strideB, A, lda, strideA,
&beta, C, ldc, strideC, batchCount));
}
template struct SetConstant<platform::GPUPlace, float>; template struct SetConstant<platform::GPUPlace, float>;
} // namespace math } // namespace math
......
...@@ -63,7 +63,7 @@ namespace math { ...@@ -63,7 +63,7 @@ namespace math {
// Support continuous memory now // Support continuous memory now
// If transA = N, and transB = N // If transA = N, and transB = N
// Then matrixA: M * K, matrixB: K * N matrixC : M * N // Then matrixA: M * K, matrixB: K * N, matrixC : M * N
// For more detailed info, please refer to // For more detailed info, please refer to
// http://www.netlib.org/lapack/explore-html/d4/de2/sgemm_8f.html // http://www.netlib.org/lapack/explore-html/d4/de2/sgemm_8f.html
template <typename Place, typename T> template <typename Place, typename T>
...@@ -85,6 +85,14 @@ void matmul(const platform::DeviceContext& context, ...@@ -85,6 +85,14 @@ void matmul(const platform::DeviceContext& context,
const framework::Tensor& matrix_b, bool trans_b, T alpha, const framework::Tensor& matrix_b, bool trans_b, T alpha,
framework::Tensor* matrix_out, T beta); framework::Tensor* matrix_out, T beta);
// Batched gemm
template <typename Place, typename T>
void batched_gemm(const platform::DeviceContext& context,
const CBLAS_TRANSPOSE transA, const CBLAS_TRANSPOSE transB,
const int M, const int N, const int K, const T alpha,
const T* A, const T* B, const T beta, T* C,
const int batchCount, const int strideA, const int strideB);
template <typename Place, typename T> template <typename Place, typename T>
struct SetConstant { struct SetConstant {
void operator()(const platform::DeviceContext& context, void operator()(const platform::DeviceContext& context,
......
...@@ -16,15 +16,15 @@ TEST(math_function, notrans_mul_trans) { ...@@ -16,15 +16,15 @@ TEST(math_function, notrans_mul_trans) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input1, *gpu_place, context); input2_gpu.CopyFrom(input1, *gpu_place, context);
out_gpu.mutable_data<float>({2, 2}, *gpu_place); out_gpu.mutable_data<float>({2, 2}, *gpu_place);
paddle::operators::math::matmul<paddle::platform::GPUPlace, float>( paddle::operators::math::matmul<paddle::platform::GPUPlace, float>(
context, input1_gpu, false, input2_gpu, true, 1, &out_gpu, 0); context, input1_gpu, false, input2_gpu, true, 1, &out_gpu, 0);
out.CopyFrom<float>(out_gpu, *cpu_place, context); out.CopyFrom(out_gpu, *cpu_place, context);
float* out_ptr = out.data<float>(); float* out_ptr = out.data<float>();
context.Wait(); context.Wait();
...@@ -50,15 +50,15 @@ TEST(math_function, trans_mul_notrans) { ...@@ -50,15 +50,15 @@ TEST(math_function, trans_mul_notrans) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input1, *gpu_place, context); input2_gpu.CopyFrom(input1, *gpu_place, context);
out_gpu.mutable_data<float>({3, 3}, *gpu_place); out_gpu.mutable_data<float>({3, 3}, *gpu_place);
paddle::operators::math::matmul<paddle::platform::GPUPlace, float>( paddle::operators::math::matmul<paddle::platform::GPUPlace, float>(
context, input1_gpu, true, input2_gpu, false, 1, &out_gpu, 0); context, input1_gpu, true, input2_gpu, false, 1, &out_gpu, 0);
out.CopyFrom<float>(out_gpu, *cpu_place, context); out.CopyFrom(out_gpu, *cpu_place, context);
float* out_ptr = out.data<float>(); float* out_ptr = out.data<float>();
context.Wait(); context.Wait();
...@@ -99,9 +99,9 @@ TEST(math_function, gemm_notrans_cublas) { ...@@ -99,9 +99,9 @@ TEST(math_function, gemm_notrans_cublas) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input2, *gpu_place, context); input2_gpu.CopyFrom(input2, *gpu_place, context);
input3_gpu.CopyFrom<float>(input3, *gpu_place, context); input3_gpu.CopyFrom(input3, *gpu_place, context);
float* a = input1_gpu.data<float>(); float* a = input1_gpu.data<float>();
float* b = input2_gpu.data<float>(); float* b = input2_gpu.data<float>();
float* c = input3_gpu.mutable_data<float>(*gpu_place); float* c = input3_gpu.mutable_data<float>(*gpu_place);
...@@ -109,7 +109,7 @@ TEST(math_function, gemm_notrans_cublas) { ...@@ -109,7 +109,7 @@ TEST(math_function, gemm_notrans_cublas) {
paddle::operators::math::gemm<paddle::platform::GPUPlace, float>( paddle::operators::math::gemm<paddle::platform::GPUPlace, float>(
context, false, false, m, n, k, 1, a, 3, b + 1, 4, 1, c + 1, 4); context, false, false, m, n, k, 1, a, 3, b + 1, 4, 1, c + 1, 4);
input3.CopyFrom<float>(input3_gpu, *cpu_place, context); input3.CopyFrom(input3_gpu, *cpu_place, context);
// numpy code: // numpy code:
// a = np.arange(6).reshape(2, 3) // a = np.arange(6).reshape(2, 3)
...@@ -154,9 +154,9 @@ TEST(math_function, gemm_trans_cublas) { ...@@ -154,9 +154,9 @@ TEST(math_function, gemm_trans_cublas) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input2, *gpu_place, context); input2_gpu.CopyFrom(input2, *gpu_place, context);
input3_gpu.CopyFrom<float>(input3, *gpu_place, context); input3_gpu.CopyFrom(input3, *gpu_place, context);
float* a = input1_gpu.data<float>(); float* a = input1_gpu.data<float>();
float* b = input2_gpu.data<float>(); float* b = input2_gpu.data<float>();
float* c = input3_gpu.mutable_data<float>(*gpu_place); float* c = input3_gpu.mutable_data<float>(*gpu_place);
...@@ -164,7 +164,7 @@ TEST(math_function, gemm_trans_cublas) { ...@@ -164,7 +164,7 @@ TEST(math_function, gemm_trans_cublas) {
paddle::operators::math::gemm<paddle::platform::GPUPlace, float>( paddle::operators::math::gemm<paddle::platform::GPUPlace, float>(
context, false, true, m, n, k, 1, a, 3, b + 3, 3, 1, c + 1, 4); context, false, true, m, n, k, 1, a, 3, b + 3, 3, 1, c + 1, 4);
input3.CopyFrom<float>(input3_gpu, *cpu_place, context); input3.CopyFrom(input3_gpu, *cpu_place, context);
context.Wait(); context.Wait();
EXPECT_EQ(input3_ptr[0], 0); EXPECT_EQ(input3_ptr[0], 0);
......
/* Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/operators/math/math_function.h"
namespace paddle {
namespace operators {
namespace math {
// Implements the logic of numpy matmul:
// https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html
//
// but allowing also for a, b to be transposed
//
// Both a & b can be 1- to 3-dimensional. Higher rank tensors are not supported
// yet.
template <typename Place, typename T>
class MatMulFunctor {
public:
void operator()(const platform::DeviceContext& context,
const framework::Tensor& a, bool trans_a,
const framework::Tensor& b, bool trans_b, T alpha,
framework::Tensor* out, T beta) {
auto dim_a = a.dims();
auto dim_b = b.dims();
PADDLE_ENFORCE(a.place() == b.place() && b.place() == out->place(),
"Tensors must all be in the same place.");
PADDLE_ENFORCE_GE(dim_a.size(), 1,
"Input tensor a must be at least 1-dimensional.");
PADDLE_ENFORCE_GE(dim_b.size(), 1,
"Input tensor b must be at least 1-dimensional.");
PADDLE_ENFORCE_LE(dim_a.size(), 3,
"Input tensor a must be at most 3-dimensional.");
PADDLE_ENFORCE_LE(dim_b.size(), 3,
"Input tensor b must be at most 3-dimensional.");
int M = 0, N = 0, kA = 0, kB = 0, batchCountA = 0, batchCountB = 0,
strideA = 0, strideB = 0;
switch (dim_a.size()) {
case 1:
// similar to np.matmul:
// prepend dimension 1 (no transpose) or append dimension 1 (transpose)
M = trans_a ? dim_a[0] : 1;
kA = trans_a ? 1 : dim_a[0];
break;
case 2:
M = trans_a ? dim_a[1] : dim_a[0];
kA = trans_a ? dim_a[0] : dim_a[1];
break;
case 3:
batchCountA = dim_a[0];
M = trans_a ? dim_a[2] : dim_a[1];
kA = trans_a ? dim_a[1] : dim_a[2];
strideA = M * kA;
break;
default:
assert(false);
}
switch (dim_b.size()) {
case 1:
// similar to np.matmul:
// append dimension 1 (no transpose) or prepend dimension 1 (transpose)
kB = trans_b ? 1 : dim_b[0];
N = trans_b ? dim_b[0] : 1;
break;
case 2:
kB = trans_b ? dim_b[1] : dim_b[0];
N = trans_b ? dim_b[0] : dim_b[1];
break;
case 3:
batchCountB = dim_b[0];
kB = trans_b ? dim_b[2] : dim_b[1];
N = trans_b ? dim_b[1] : dim_b[2];
strideB = kB * N;
break;
default:
assert(false);
}
PADDLE_ENFORCE_EQ(
kA, kB,
"First matrix's width must be equal with second matrix's height.");
if (batchCountA && batchCountB) {
PADDLE_ENFORCE_EQ(
batchCountA, batchCountB,
"When input tensors a and b are both batched, they must have the "
"same batch dimension.");
}
int batchCount = std::max(batchCountA, batchCountB);
CBLAS_TRANSPOSE transA = (trans_a == false) ? CblasNoTrans : CblasTrans;
CBLAS_TRANSPOSE transB = (trans_b == false) ? CblasNoTrans : CblasTrans;
if (!batchCount) {
// regular matrix multiplication
gemm<Place, T>(context, transA, transB, M, N, kA, alpha, a.data<T>(),
b.data<T>(), beta, out->data<T>());
} else {
// batched matrix multiplication
batched_gemm<Place, T>(context, transA, transB, M, N, kA, alpha,
a.data<T>(), b.data<T>(), beta, out->data<T>(),
batchCount, strideA, strideB);
}
}
};
} // namespace math
} // namespace operators
} // namespace paddle
...@@ -67,7 +67,7 @@ TEST(selected_rows_functor, gpu_add) { ...@@ -67,7 +67,7 @@ TEST(selected_rows_functor, gpu_add) {
EXPECT_EQ(out_rows[6], 9); EXPECT_EQ(out_rows[6], 9);
Tensor out_cpu; Tensor out_cpu;
out_cpu.CopyFrom<float>(*out_value, cpu_place, ctx); out_cpu.CopyFrom(*out_value, cpu_place, ctx);
ctx.Wait(); ctx.Wait();
auto* out_cpu_data = out_cpu.data<float>(); auto* out_cpu_data = out_cpu.data<float>();
...@@ -94,7 +94,7 @@ TEST(selected_rows_functor, gpu_add) { ...@@ -94,7 +94,7 @@ TEST(selected_rows_functor, gpu_add) {
add_tensor_functor(ctx, *output, *tensor1, tensor2.get()); add_tensor_functor(ctx, *output, *tensor1, tensor2.get());
Tensor tensor2_cpu; Tensor tensor2_cpu;
tensor2_cpu.CopyFrom<float>(*tensor2, cpu_place, ctx); tensor2_cpu.CopyFrom(*tensor2, cpu_place, ctx);
ctx.Wait(); ctx.Wait();
auto* tensor2_cpu_data = tensor2_cpu.data<float>(); auto* tensor2_cpu_data = tensor2_cpu.data<float>();
......
...@@ -78,7 +78,7 @@ void testVol2col() { ...@@ -78,7 +78,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
input = input_tmp; input = input_tmp;
} else { } else {
input.CopyFrom<float>(input_tmp, *place, *context); input.CopyFrom(input_tmp, *place, *context);
} }
output.mutable_data<float>({1, filter_size, filter_size, filter_size, output.mutable_data<float>({1, filter_size, filter_size, filter_size,
output_depth, output_height, output_width}, output_depth, output_height, output_width},
...@@ -93,7 +93,7 @@ void testVol2col() { ...@@ -93,7 +93,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
out_cfo_ptr = output.data<float>(); out_cfo_ptr = output.data<float>();
} else { } else {
output_tmp.CopyFrom<float>(output, paddle::platform::CPUPlace(), *context); output_tmp.CopyFrom(output, paddle::platform::CPUPlace(), *context);
out_cfo_ptr = output_tmp.data<float>(); out_cfo_ptr = output_tmp.data<float>();
} }
...@@ -107,7 +107,7 @@ void testVol2col() { ...@@ -107,7 +107,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
input = input_tmp; input = input_tmp;
} else { } else {
input.CopyFrom<float>(input_tmp, *place, *context); input.CopyFrom(input_tmp, *place, *context);
} }
paddle::operators::math::Col2VolFunctor<Place, float> col2vol; paddle::operators::math::Col2VolFunctor<Place, float> col2vol;
...@@ -118,7 +118,7 @@ void testVol2col() { ...@@ -118,7 +118,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
in_ptr = input.data<float>(); in_ptr = input.data<float>();
} else { } else {
input_tmp.CopyFrom<float>(input, paddle::platform::CPUPlace(), *context); input_tmp.CopyFrom(input, paddle::platform::CPUPlace(), *context);
in_ptr = input_tmp.data<float>(); in_ptr = input_tmp.data<float>();
} }
......
/* Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/matmul_op.h"
namespace paddle {
namespace operators {
using framework::Tensor;
class MatMulOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* context) const override {
PADDLE_ENFORCE(context->HasInput("X"),
"Input(X) of MatMulOp should not be null.");
PADDLE_ENFORCE(context->HasInput("Y"),
"Input(Y) of MatMulOp should not be null.");
PADDLE_ENFORCE(context->HasOutput("Out"),
"Output(Out) of MatMulOp should not be null.");
auto dim_x = context->GetInputDim("X");
auto dim_y = context->GetInputDim("Y");
bool transpose_x = context->Attrs().Get<bool>("transpose_X");
bool transpose_y = context->Attrs().Get<bool>("transpose_Y");
PADDLE_ENFORCE_GE(dim_x.size(), 1,
"Input tensor X must be at least 1-dimensional.");
PADDLE_ENFORCE_GE(dim_y.size(), 1,
"Input tensor Y must be at least 1-dimensional.");
PADDLE_ENFORCE_LE(dim_x.size(), 3,
"Input tensor X must be at most 3-dimensional.");
PADDLE_ENFORCE_LE(dim_y.size(), 3,
"Input tensor Y must be at most 3-dimensional.");
int M = 0, N = 0, KX = 0, KY = 0, batchCountX = 0, batchCountY = 0;
bool remove_initial_dim = false, remove_final_dim = false;
switch (dim_x.size()) {
case 1:
if (transpose_x) {
M = dim_x[0];
KX = 1;
} else {
M = 1;
KX = dim_x[0];
remove_initial_dim = true;
}
break;
case 2:
M = transpose_x ? dim_x[1] : dim_x[0];
KX = transpose_x ? dim_x[0] : dim_x[1];
break;
case 3:
batchCountX = dim_x[0];
M = transpose_x ? dim_x[2] : dim_x[1];
KX = transpose_x ? dim_x[1] : dim_x[2];
break;
default:
assert(false);
}
switch (dim_y.size()) {
case 1:
if (transpose_y) {
N = dim_y[0];
KY = 1;
} else {
N = 1;
KY = dim_y[0];
remove_final_dim = true;
}
break;
case 2:
KY = transpose_y ? dim_y[1] : dim_y[0];
N = transpose_y ? dim_y[0] : dim_y[1];
break;
case 3:
batchCountY = dim_y[0];
KY = transpose_y ? dim_y[2] : dim_y[1];
N = transpose_y ? dim_y[1] : dim_y[2];
break;
default:
assert(false);
}
PADDLE_ENFORCE_EQ(
KX, KY,
"First matrix's width must be equal with second matrix's height.");
if (batchCountX && batchCountY) {
PADDLE_ENFORCE_EQ(
batchCountX, batchCountY,
"When Input(X) and Input(Y) are both three dimensional, they "
"must have the same batch dimension.");
}
int batchCount = std::max(batchCountX, batchCountY);
std::vector<int64_t> dim_out;
if (batchCount) {
dim_out.push_back(batchCount);
}
if (!remove_initial_dim) {
dim_out.push_back(M);
}
if (!remove_final_dim) {
dim_out.push_back(N);
}
if (dim_out.size() == 0) {
// We don't support 0-dimensional Tensors (scalars), so instead
// treat the output as a Tensor of shape (1, ) in this case.
dim_out.push_back(1);
}
context->SetOutputDim("Out", framework::make_ddim(dim_out));
context->ShareLoD("X", /*->*/ "Out");
}
};
class MatMulOpMaker : public framework::OpProtoAndCheckerMaker {
public:
MatMulOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The first input of MatMul op");
AddInput("Y", "The second input of MatMul op");
AddOutput("Out", "The output of MatMul op");
AddAttr<bool>("transpose_X",
R"DOC(If true, use the transpose of `X`.
)DOC")
.SetDefault(false);
AddAttr<bool>("transpose_Y",
R"DOC(If true, use the transpose of `Y`.
)DOC")
.SetDefault(false);
AddComment(R"DOC(
The MatMul operator is used to perform (batched) matrix multiplication
over the last two dimensions of the input tensors `X` and `Y`.
If a transpose flag is specified, the last two dimensions of the
tensor are transposed. If the tensor is rank-1 of shape [D], then
for `X` it is treated as [1, D] in nontransposed form and as [D, 1]
in transposed form, whereas for `Y` it is the opposite: It is treated
as [D, 1] in nontransposed form and as [1, D] in transposed form.
Examples without transpose:
- X: [K], Y: [K] => Out: [1]
- X: [K], Y: [K, N] => Out: [N]
- X: [B, M, K], Y: [K] => Out: [B, M]
- X: [M, K], Y: [B, K, N] => Out: [B, M, N]
- X: [B, M, K], Y: [B, K, N] => Out: [B, M, N]
The behavior is designed to be similar to the `numpy.matmul` function.
The differences are:
- Currently only rank 1 to rank 3 input tensors are supported.
- We add `transpose_X` and `transpose_Y` flags.
Both the input `X` and `Y` can carry the LoD (Level of Details) information,
or not. But the output only shares the LoD with input `X`.
)DOC");
}
};
class MatMulOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext* context) const override {
PADDLE_ENFORCE(context->HasInput("X"), "Input(X) should not be null");
PADDLE_ENFORCE(context->HasInput("Y"), "Input(Y) should not be null");
PADDLE_ENFORCE(context->HasInput(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null");
auto x_dims = context->GetInputDim("X");
auto y_dims = context->GetInputDim("Y");
auto x_grad_name = framework::GradVarName("X");
auto y_grad_name = framework::GradVarName("Y");
if (context->HasOutput(x_grad_name)) {
context->SetOutputDim(x_grad_name, x_dims);
}
if (context->HasOutput(y_grad_name)) {
context->SetOutputDim(y_grad_name, y_dims);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(matmul, ops::MatMulOp, ops::MatMulOpMaker, matmul_grad,
ops::MatMulOpGrad);
REGISTER_OP_CPU_KERNEL(matmul,
ops::MatMulKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
matmul_grad, ops::MatMulGradKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/matmul_op.h"
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(matmul,
ops::MatMulKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(
matmul_grad, ops::MatMulGradKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
You may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/matmul.h"
#include "paddle/operators/transpose_op.h"
namespace paddle {
namespace operators {
namespace matmul_detail {
using Tensor = framework::Tensor;
using DDim = framework::DDim;
using framework::make_ddim;
using framework::vectorize;
template <typename Place, typename T>
class MatMulKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
const Tensor& x = *context.Input<Tensor>("X");
const Tensor& y = *context.Input<Tensor>("Y");
Tensor* out = context.Output<Tensor>("Out");
out->mutable_data<T>(context.GetPlace());
bool transpose_x = context.Attr<bool>("transpose_X");
bool transpose_y = context.Attr<bool>("transpose_Y");
math::MatMulFunctor<Place, T>()(context.device_context(), x, transpose_x, y,
transpose_y, T(1), out, T(0));
}
};
template <typename T>
inline Tensor Reshape(const Tensor& input, const DDim& dims) {
Tensor output;
output.ShareDataWith(input);
output.Resize(dims);
return output;
}
// Reshape a rank-3 tensor from P x M x N to (P * M) x N.
// Identity op if the tensor is not of rank 3.
template <typename T>
Tensor CombineBatchAndM(const Tensor& input) {
Tensor output;
output.ShareDataWith(input);
auto in_dims = input.dims();
if (in_dims.size() == 3) {
std::vector<int64_t> out_dims = {in_dims[0] * in_dims[1], in_dims[2]};
output.Resize(make_ddim(out_dims));
}
return output;
}
// Reshape a rank-3 tensor from P x M x N to M x (P * N).
// (Warning: This requires transposing data and writes into new memory.)
// Identity op if the tensor is not of rank 3.
template <typename Place, typename T>
Tensor CombineBatchAndN(const framework::ExecutionContext& context,
const Tensor& input) {
Tensor output;
auto in_dims = input.dims();
if (in_dims.size() == 3) {
output.Resize(in_dims);
output.mutable_data<T>(context.GetPlace());
EigenTranspose<Place, T, 3>(context, input, output, {1, 0, 2});
std::vector<int64_t> out_dims = {in_dims[1], in_dims[0] * in_dims[2]};
output.Resize(make_ddim(out_dims));
} else {
output.ShareDataWith(input);
}
return output;
}
// Using dimensional constraints on matrix multiplication, it is
// straight-forward to check the following table for when X and Y
// are both matrices.
//
// transpose_X | False | True | False | True
// transpose_Y | False | False | True | True
// -----------+----------+----------+----------+-----------
// dX = | dOut Y^T | Y dOut^T | dOut Y | Y^T dOut^T
// dY = | X^T dOut | X dOut | dOut^T X | dOut^T X^T
//
// When X is a vector of size K, we treat it instead as a matrix of shape
// (1, K). Similarly, when Y is a vector of size K, we treat it instead as
// a matrix of shape (K, 1).
//
// When X and Y are both 3-dimensional tensors, then the first dimension
// the batch dimension can be ignored and the exact same formulas apply
// as for two matrices.
//
// Finally, when, e.g., X is a 3-dimensional tensor but Y is a matrix, we end
// up with formulas like
//
// dY_{ij} = \sum_{p, m} X_{pmi} dOut_{pmj}
//
// To handle this sort of scenario, we reshape X : P x M x K, dOut: P x M x N
// to X: (P * M) x K, dOut: (P * M) x N.
template <typename Place, typename T>
class MatMulGradKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
const Tensor& x = *context.Input<Tensor>("X");
const Tensor& y = *context.Input<Tensor>("Y");
const Tensor& dout = *context.Input<Tensor>(framework::GradVarName("Out"));
Tensor* dx = context.Output<Tensor>(framework::GradVarName("X"));
Tensor* dy = context.Output<Tensor>(framework::GradVarName("Y"));
bool transpose_x = context.Attr<bool>("transpose_X");
bool transpose_y = context.Attr<bool>("transpose_Y");
std::vector<int64_t> x_dims = vectorize(x.dims());
std::vector<int64_t> y_dims = vectorize(y.dims());
// If X is a vector, reshape it to a matrix.
if (x_dims.size() == 1) {
x_dims.insert(x_dims.begin(), 1);
}
// If Y is a vector, reshape it to a matrix.
if (y_dims.size() == 1) {
y_dims.push_back(1);
}
// Fix the dOut dimensions.
int M = 0, N = 0, batchCountX = 0, batchCountY = 0;
switch (x_dims.size()) {
case 2:
M = transpose_x ? x_dims[1] : x_dims[0];
break;
case 3:
batchCountX = x_dims[0];
M = transpose_x ? x_dims[2] : x_dims[1];
break;
default:
assert(false);
}
switch (y_dims.size()) {
case 2:
N = transpose_y ? y_dims[0] : y_dims[1];
break;
case 3:
batchCountY = y_dims[0];
N = transpose_y ? y_dims[1] : y_dims[2];
break;
default:
assert(false);
}
if (batchCountX && batchCountY) {
PADDLE_ENFORCE_EQ(
batchCountX, batchCountY,
"When Input(X) and Input(Y) are both three dimensional, they "
"must have the same batch dimension.");
}
int batchCount = std::max(batchCountX, batchCountY);
std::vector<int64_t> dout_dims = {M, N};
if (batchCount) {
dout_dims.insert(dout_dims.begin(), batchCount);
}
Tensor X = Reshape<T>(x, make_ddim(x_dims));
Tensor Y = Reshape<T>(y, make_ddim(y_dims));
Tensor dOut = Reshape<T>(dout, make_ddim(dout_dims));
if (dx) {
dx->mutable_data<T>(context.GetPlace());
const Tensor& dOut_for_dX =
(x_dims.size() == 2 && y_dims.size() == 3)
? CombineBatchAndN<Place, T>(context, dOut)
: dOut;
if (x_dims.size() == 2 && y_dims.size() == 3) {
Y = transpose_y ? CombineBatchAndM<T>(Y)
: CombineBatchAndN<Place, T>(context, Y);
}
if (transpose_x) {
math::MatMulFunctor<Place, T>()(context.device_context(), Y,
transpose_y, dOut_for_dX, transpose_x,
T(1), dx, T(0));
} else {
math::MatMulFunctor<Place, T>()(context.device_context(), dOut_for_dX,
transpose_x, Y, !transpose_y, T(1), dx,
T(0));
}
}
if (dy) {
dy->mutable_data<T>(context.GetPlace());
const Tensor& dOut_for_dY = (y_dims.size() == 2 && x_dims.size() == 3)
? CombineBatchAndM<T>(dOut)
: dOut;
if (y_dims.size() == 2 && x_dims.size() == 3) {
X = transpose_x ? CombineBatchAndN<Place, T>(context, X)
: CombineBatchAndM<T>(X);
dOut = CombineBatchAndM<T>(dOut);
}
if (transpose_y) {
math::MatMulFunctor<Place, T>()(context.device_context(), dOut_for_dY,
transpose_y, X, transpose_x, T(1), dy,
T(0));
} else {
math::MatMulFunctor<Place, T>()(context.device_context(), X,
!transpose_x, dOut_for_dY, transpose_y,
T(1), dy, T(0));
}
}
}
};
} // namespace matmul_detail
using matmul_detail::MatMulKernel;
using matmul_detail::MatMulGradKernel;
} // namespace operators
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/momentum_op.h"
namespace paddle {
namespace operators {
class MomentumOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Param"),
"Input(param) of Momentum should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Grad"),
"Input(grad) of Momentum should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Velocity"),
"Input(velocity) of Momentum should not be null.");
PADDLE_ENFORCE(ctx->HasInput("LearningRate"),
"Input(LearningRate) of Momentum should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("ParamOut"),
"Output(ParamOut) of Momentum should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("VelocityOut"),
"Output(VelocityOut) of Momentum should not be null.");
auto param_dim = ctx->GetInputDim("Param");
PADDLE_ENFORCE_EQ(
param_dim, ctx->GetInputDim("Grad"),
"Param and Grad input of MomentumOp should have the same dimension.");
PADDLE_ENFORCE_EQ(
param_dim, ctx->GetInputDim("Velocity"),
"Param and Velocity of MomentumOp should have the same dimension.");
PADDLE_ENFORCE_EQ(framework::product(ctx->GetInputDim("LearningRate")), 1,
"Learning_rate should be a scalar");
ctx->SetOutputDim("ParamOut", param_dim);
ctx->SetOutputDim("VelocityOut", param_dim);
}
};
class MomentumOpMaker : public framework::OpProtoAndCheckerMaker {
public:
MomentumOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param",
"(Tensor, default Tensor<float>) "
"Input parameter that has to be updated");
AddInput("Grad",
"(Tensor, default Tensor<float>) "
"Input gradient of the parameter");
AddInput("Velocity",
"(Tensor, default Tensor<float>) "
"Input velocity (corresponding to the parameter) "
"that has to be updated");
AddInput("LearningRate",
"(Tensor, default Tensor<float>) "
"Input learning rate");
AddOutput("ParamOut", "(Tensor) Output updated parameter");
AddOutput("VelocityOut", "(Tensor) Output updated velocity");
AddAttr<float>("mu", "(float) Momentum coefficient");
AddAttr<bool>("useNesterov", "(bool) Use Nesterov Momentum")
.SetDefault(false);
AddComment(R"DOC(
Momentum Algorithm with a flag for Nestrov Moemntum (momentum).
velocity = mu * velocity + gradient
if (use_nesterov):
param = param - gradient * learning_rate + mu * velocity * learning_rate
else:
param = param - learning_rate * velocity
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(momentum, ops::MomentumOp, ops::MomentumOpMaker);
REGISTER_OP_CPU_KERNEL(
momentum, ops::MomentumOpKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/momentum_op.h"
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(
momentum, ops::MomentumOpKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
template <typename Place, typename T>
class MomentumOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto param_out = ctx.Output<framework::Tensor>("ParamOut");
auto velocity_out = ctx.Output<framework::Tensor>("VelocityOut");
auto param = ctx.Input<framework::Tensor>("Param");
auto velocity = ctx.Input<framework::Tensor>("Velocity");
auto grad = ctx.Input<framework::Tensor>("Grad");
auto learning_rate = ctx.Input<framework::Tensor>("LearningRate");
param_out->mutable_data<T>(ctx.GetPlace());
velocity_out->mutable_data<T>(ctx.GetPlace());
float mu = ctx.Attr<float>("mu");
bool use_nesterov = ctx.Attr<bool>("useNesterov");
auto p_out = framework::EigenVector<T>::Flatten(*param_out);
auto v_out = framework::EigenVector<T>::Flatten(*velocity_out);
auto p = framework::EigenVector<T>::Flatten(*param);
auto v = framework::EigenVector<T>::Flatten(*velocity);
auto g = framework::EigenVector<T>::Flatten(*grad);
auto lr = framework::EigenVector<T>::Flatten(*learning_rate);
auto place = ctx.GetEigenDevice<Place>();
Eigen::DSizes<int, 1> grad_dsize(grad->numel());
v_out.device(place) = v * mu + g;
if (use_nesterov) {
p_out.device(place) = p - g * lr.broadcast(grad_dsize) +
v_out * mu * lr.broadcast(grad_dsize);
} else {
p_out.device(place) = p - lr.broadcast(grad_dsize) * v_out;
}
}
};
} // namespace operators
} // namespace paddle
...@@ -104,10 +104,10 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -104,10 +104,10 @@ class MulOpGrad : public framework::OperatorWithKernel {
auto y_dims = ctx->GetInputDim("Y"); auto y_dims = ctx->GetInputDim("Y");
auto out_dims = ctx->GetInputDim(framework::GradVarName("Out")); auto out_dims = ctx->GetInputDim(framework::GradVarName("Out"));
auto x_mat_dims = auto x_mat_dims = framework::flatten_to_2d(
framework::flatten_to_2d(x_dims, Attr<int>("x_num_col_dims")); x_dims, ctx->Attrs().Get<int>("x_num_col_dims"));
auto y_mat_dims = auto y_mat_dims = framework::flatten_to_2d(
framework::flatten_to_2d(y_dims, Attr<int>("y_num_col_dims")); y_dims, ctx->Attrs().Get<int>("y_num_col_dims"));
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
x_mat_dims[0], out_dims[0], x_mat_dims[0], out_dims[0],
......
...@@ -36,12 +36,12 @@ class MulKernel : public framework::OpKernel<T> { ...@@ -36,12 +36,12 @@ class MulKernel : public framework::OpKernel<T> {
Tensor* z = context.Output<Tensor>("Out"); Tensor* z = context.Output<Tensor>("Out");
const Tensor x_matrix = const Tensor x_matrix =
x->dims().size() > 2 x->dims().size() > 2
? framework::ReshapeToMatrix<T>( ? framework::ReshapeToMatrix(
*x, context.template Attr<int>("x_num_col_dims")) *x, context.template Attr<int>("x_num_col_dims"))
: *x; : *x;
const Tensor y_matrix = const Tensor y_matrix =
y->dims().size() > 2 y->dims().size() > 2
? framework::ReshapeToMatrix<T>( ? framework::ReshapeToMatrix(
*y, context.template Attr<int>("y_num_col_dims")) *y, context.template Attr<int>("y_num_col_dims"))
: *y; : *y;
...@@ -59,11 +59,11 @@ class MulGradKernel : public framework::OpKernel<T> { ...@@ -59,11 +59,11 @@ class MulGradKernel : public framework::OpKernel<T> {
int y_num_col_dims = ctx.template Attr<int>("y_num_col_dims"); int y_num_col_dims = ctx.template Attr<int>("y_num_col_dims");
const Tensor* x = ctx.Input<Tensor>("X"); const Tensor* x = ctx.Input<Tensor>("X");
const Tensor* y = ctx.Input<Tensor>("Y"); const Tensor* y = ctx.Input<Tensor>("Y");
const Tensor x_matrix = const Tensor x_matrix = x->dims().size() > 2
x->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*x, x_num_col_dims) ? framework::ReshapeToMatrix(*x, x_num_col_dims)
: *x; : *x;
const Tensor y_matrix = const Tensor y_matrix = y->dims().size() > 2
y->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*y, y_num_col_dims) ? framework::ReshapeToMatrix(*y, y_num_col_dims)
: *y; : *y;
const Tensor* dout = ctx.Input<Tensor>(framework::GradVarName("Out")); const Tensor* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
...@@ -71,8 +71,8 @@ class MulGradKernel : public framework::OpKernel<T> { ...@@ -71,8 +71,8 @@ class MulGradKernel : public framework::OpKernel<T> {
Tensor* dy = ctx.Output<Tensor>(framework::GradVarName("Y")); Tensor* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
if (dx) { if (dx) {
dx->mutable_data<T>(ctx.GetPlace()); dx->mutable_data<T>(ctx.GetPlace());
Tensor dx_matrix = dx->dims().size() > 2 ? framework::ReshapeToMatrix<T>( Tensor dx_matrix = dx->dims().size() > 2
*dx, x_num_col_dims) ? framework::ReshapeToMatrix(*dx, x_num_col_dims)
: *dx; : *dx;
// dx = dout * y'. dx: M x K, dout : M x N, y : K x N // dx = dout * y'. dx: M x K, dout : M x N, y : K x N
math::matmul<Place, T>(ctx.device_context(), *dout, false, y_matrix, true, math::matmul<Place, T>(ctx.device_context(), *dout, false, y_matrix, true,
...@@ -80,8 +80,8 @@ class MulGradKernel : public framework::OpKernel<T> { ...@@ -80,8 +80,8 @@ class MulGradKernel : public framework::OpKernel<T> {
} }
if (dy) { if (dy) {
dy->mutable_data<T>(ctx.GetPlace()); dy->mutable_data<T>(ctx.GetPlace());
Tensor dy_matrix = dy->dims().size() > 2 ? framework::ReshapeToMatrix<T>( Tensor dy_matrix = dy->dims().size() > 2
*dy, y_num_col_dims) ? framework::ReshapeToMatrix(*dy, y_num_col_dims)
: *dy; : *dy;
// dy = x' * dout. dy K x N, dout : M x N, x : M x K // dy = x' * dout. dy K x N, dout : M x N, x : M x K
math::matmul<Place, T>(ctx.device_context(), x_matrix, true, *dout, false, math::matmul<Place, T>(ctx.device_context(), x_matrix, true, *dout, false,
......
...@@ -33,8 +33,7 @@ class MultiplexGPUKernel : public framework::OpKernel<T> { ...@@ -33,8 +33,7 @@ class MultiplexGPUKernel : public framework::OpKernel<T> {
auto cols = ins[0]->numel() / rows; auto cols = ins[0]->numel() / rows;
// copy index to cpu // copy index to cpu
Tensor index_t_cpu; Tensor index_t_cpu;
index_t_cpu.CopyFrom<int32_t>(*ids, platform::CPUPlace(), index_t_cpu.CopyFrom(*ids, platform::CPUPlace(), ctx.device_context());
ctx.device_context());
auto* index = index_t_cpu.data<int32_t>(); auto* index = index_t_cpu.data<int32_t>();
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
...@@ -71,8 +70,7 @@ class MultiplexGradGPUKernel : public framework::OpKernel<T> { ...@@ -71,8 +70,7 @@ class MultiplexGradGPUKernel : public framework::OpKernel<T> {
auto cols = ins[0]->numel() / rows; auto cols = ins[0]->numel() / rows;
// copy index to cpu // copy index to cpu
Tensor index_t_cpu; Tensor index_t_cpu;
index_t_cpu.CopyFrom<int32_t>(*ids, platform::CPUPlace(), index_t_cpu.CopyFrom(*ids, platform::CPUPlace(), ctx.device_context());
ctx.device_context());
auto* index = index_t_cpu.data<int32_t>(); auto* index = index_t_cpu.data<int32_t>();
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/proximal_gd_op.h"
namespace paddle {
namespace operators {
class ProximalGDOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Param"),
"Input(Param) of ProximalGDOp should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Grad"),
"Input(Grad) of ProximalGDOp should not be null.");
PADDLE_ENFORCE(ctx->HasInput("LearningRate"),
"Input(LearningRate) of ProximalGDOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("ParamOut"),
"Output(ParamOut) of ProximalGDOp should not be null.");
auto param_dim = ctx->GetInputDim("Param");
PADDLE_ENFORCE_EQ(param_dim, ctx->GetInputDim("Grad"),
"Two input of ProximalGD Op's dimension must be same.");
auto lr_dim = ctx->GetInputDim("LearningRate");
PADDLE_ENFORCE_EQ(framework::product(lr_dim), 1,
"Learning Rate should be a scalar.");
ctx->SetOutputDim("ParamOut", param_dim);
}
};
class ProximalGDOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ProximalGDOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param",
"(Tensor, default Tensor<float>) "
"Input parameter value that has to be updated.");
AddInput("Grad",
"(Tensor, default Tensor<float>) "
"Input gradient of the parameter.");
AddInput("LearningRate",
"(Tensor, default Tensor<float>) "
"The learning rate should be a tensor of size 1.");
AddOutput("ParamOut", "(Tensor) Output updated parameter value.");
AddAttr<float>("l1",
"(float, default 0.0) "
"L1 regularization strength.")
.SetDefault(0.0f);
AddAttr<float>("l2",
"(float, default 0.0)"
"L2 regularization strength.")
.SetDefault(0.0f);
AddComment(R"DOC(
Optimizer that implements the proximal gradient descent algorithm.
prox_param = param - learning_rate * grad
param = sign(prox_param) / (1 + learning_rate * l2) *
max { |prox_param| - learning_rate * l1 , 0 }
The paper that proposed Proximal Gradient Descent:
(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(proximal_gd, ops::ProximalGDOp,
ops::ProximalGDOpMaker);
REGISTER_OP_CPU_KERNEL(
proximal_gd, ops::ProximalGDOpKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
You may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/proximal_gd_op.h"
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(
proximal_gd, ops::ProximalGDOpKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename Place, typename T>
class ProximalGDOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* param_out = ctx.Output<Tensor>("ParamOut");
param_out->mutable_data<T>(ctx.GetPlace());
auto grad = ctx.Input<Tensor>("Grad");
auto l1 = static_cast<T>(ctx.Attr<float>("l1"));
auto l2 = static_cast<T>(ctx.Attr<float>("l2"));
auto p = EigenVector<T>::Flatten(*ctx.Input<Tensor>("Param"));
auto g = EigenVector<T>::Flatten(*grad);
auto lr = EigenVector<T>::Flatten(*ctx.Input<Tensor>("LearningRate"));
auto p_out = EigenVector<T>::Flatten(*param_out);
auto place = ctx.GetEigenDevice<Place>();
Eigen::DSizes<int, 1> grad_dsize(grad->numel());
auto prox_param = p - lr.broadcast(grad_dsize) * g;
if (l1 > 0) {
p_out.device(place) =
prox_param.sign() *
(((prox_param.abs() - (lr * l1).broadcast(grad_dsize))
.cwiseMax(T(0.0))) /
(1.0 + (lr * l2).broadcast(grad_dsize)));
} else {
p_out.device(place) =
prox_param / (1.0 + (lr * l2).broadcast(grad_dsize));
}
}
};
} // namespace operators
} // namespace paddle
...@@ -42,7 +42,7 @@ void RecurrentAlgorithm::Run(const Scope& scope, ...@@ -42,7 +42,7 @@ void RecurrentAlgorithm::Run(const Scope& scope,
for (size_t step_id = 0; step_id < seq_len; step_id++) { for (size_t step_id = 0; step_id < seq_len; step_id++) {
if (step_id > 0) { if (step_id > 0) {
rnn::LinkMemories(step_scopes, arg_->memories, step_id, -1); rnn::LinkMemories(step_scopes, arg_->states, step_id, -1);
} }
(*stepnet_)->Run(*step_scopes[step_id], dev_ctx); (*stepnet_)->Run(*step_scopes[step_id], dev_ctx);
} }
...@@ -59,7 +59,8 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope, ...@@ -59,7 +59,8 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope,
// Now all variables in scope must be created outside of op. // Now all variables in scope must be created outside of op.
PADDLE_ENFORCE_NOT_NULL(stepnet_); PADDLE_ENFORCE_NOT_NULL(stepnet_);
PADDLE_ENFORCE(!(*stepnet_)->Outputs().empty(), "stepnet_ op has no outputs"); PADDLE_ENFORCE(!(*stepnet_)->Outputs().empty(),
"step_unit_ op has no outputs");
if (seq_len > step_scopes->size()) { if (seq_len > step_scopes->size()) {
for (size_t i = step_scopes->size(); i < seq_len; ++i) { for (size_t i = step_scopes->size(); i < seq_len; ++i) {
...@@ -86,7 +87,7 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope, ...@@ -86,7 +87,7 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope,
} }
void RecurrentAlgorithm::InitMemories(Scope* step_scope) const { void RecurrentAlgorithm::InitMemories(Scope* step_scope) const {
for (auto& attr : arg_->memories) { for (auto& attr : arg_->states) {
auto* pre_mem = step_scope->Var(attr.pre_var)->GetMutable<LoDTensor>(); auto* pre_mem = step_scope->Var(attr.pre_var)->GetMutable<LoDTensor>();
PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr, PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr,
"memory [%s]'s boot variable [%s] not exists", attr.var, "memory [%s]'s boot variable [%s] not exists", attr.var,
...@@ -95,17 +96,17 @@ void RecurrentAlgorithm::InitMemories(Scope* step_scope) const { ...@@ -95,17 +96,17 @@ void RecurrentAlgorithm::InitMemories(Scope* step_scope) const {
step_scope->FindVar(attr.boot_var)->GetMutable<LoDTensor>(); step_scope->FindVar(attr.boot_var)->GetMutable<LoDTensor>();
pre_mem->Resize(boot_mem->dims()); pre_mem->Resize(boot_mem->dims());
PADDLE_ENFORCE_EQ(pre_mem->dims().size(), 2); PADDLE_ENFORCE_EQ(pre_mem->dims().size(), 2);
pre_mem->ShareDataWith<float>(*boot_mem); pre_mem->ShareDataWith(*boot_mem);
} }
} }
const rnn::ArgumentName RecurrentOp::kArgName{ const rnn::ArgumentName RecurrentOp::kArgName{
"step_net", "step_scopes", "inlinks", "outlinks", "step_net", "step_scopes", "inputs", "outputs",
"memories", "pre_memories", "boot_memories"}; "states", "ex_states", "initial_states"};
const rnn::ArgumentName RecurrentGradientOp::kArgName{ const rnn::ArgumentName RecurrentGradientOp::kArgName{
"step_net", "step_scopes@GRAD", "outlinks@GRAD", "inlinks@GRAD", "step_net", "step_scopes@GRAD", "outputs@GRAD", "inputs@GRAD",
"memories", "pre_memories", "boot_memories@GRAD"}; "states", "ex_states", "initial_states@GRAD"};
RecurrentOp::RecurrentOp(const std::string& type, RecurrentOp::RecurrentOp(const std::string& type,
const framework::VariableNameMap& inputs, const framework::VariableNameMap& inputs,
...@@ -127,7 +128,7 @@ class RecurrentAlgorithmProtoAndCheckerMaker ...@@ -127,7 +128,7 @@ class RecurrentAlgorithmProtoAndCheckerMaker
AddInput(name.inlinks, AddInput(name.inlinks,
"the inputs that need to be segmented for each step.") "the inputs that need to be segmented for each step.")
.AsDuplicable(); .AsDuplicable();
AddInput(name.boot_memories, "variables to initialize memories.") AddInput(name.initial_states, "variables to initialize states.")
.AsDuplicable(); .AsDuplicable();
AddOutput(name.outlinks, "the outputs that need to concated for all steps.") AddOutput(name.outlinks, "the outputs that need to concated for all steps.")
...@@ -135,9 +136,8 @@ class RecurrentAlgorithmProtoAndCheckerMaker ...@@ -135,9 +136,8 @@ class RecurrentAlgorithmProtoAndCheckerMaker
AddOutput(name.step_scopes, "step scopes"); AddOutput(name.step_scopes, "step scopes");
// Attributes stored in AttributeMap // Attributes stored in AttributeMap
AddAttr<std::vector<std::string>>(name.pre_memories, AddAttr<std::vector<std::string>>(name.ex_states, "names of pre-states");
"names of pre-memories"); AddAttr<std::vector<std::string>>(name.states, "names of states");
AddAttr<std::vector<std::string>>(name.memories, "names of memories");
AddComment("This is a recurrent group operator."); AddComment("This is a recurrent group operator.");
} }
...@@ -152,7 +152,7 @@ void RecurrentGradientAlgorithm::Run( ...@@ -152,7 +152,7 @@ void RecurrentGradientAlgorithm::Run(
rnn::SegmentInputs(step_scopes, arg_->inlinks, seq_len); rnn::SegmentInputs(step_scopes, arg_->inlinks, seq_len);
for (int step_id = seq_len - 1; step_id >= 0; --step_id) { for (int step_id = seq_len - 1; step_id >= 0; --step_id) {
if (static_cast<size_t>(step_id) != seq_len - 1) { if (static_cast<size_t>(step_id) != seq_len - 1) {
rnn::LinkMemories(step_scopes, arg_->memories, step_id, 1); rnn::LinkMemories(step_scopes, arg_->states, step_id, 1);
} }
(*stepnet_)->Run(*step_scopes[step_id], dev_ctx); (*stepnet_)->Run(*step_scopes[step_id], dev_ctx);
} }
...@@ -162,7 +162,7 @@ void RecurrentGradientAlgorithm::Run( ...@@ -162,7 +162,7 @@ void RecurrentGradientAlgorithm::Run(
void RecurrentGradientAlgorithm::LinkBootMemoryGradients( void RecurrentGradientAlgorithm::LinkBootMemoryGradients(
Scope* step_scope) const { Scope* step_scope) const {
for (auto& attr : arg_->memories) { for (auto& attr : arg_->states) {
PADDLE_ENFORCE(step_scope->FindVar(attr.var) != nullptr, PADDLE_ENFORCE(step_scope->FindVar(attr.var) != nullptr,
"memory variable [%s] does not exists", attr.var); "memory variable [%s] does not exists", attr.var);
PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr, PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr,
...@@ -171,7 +171,7 @@ void RecurrentGradientAlgorithm::LinkBootMemoryGradients( ...@@ -171,7 +171,7 @@ void RecurrentGradientAlgorithm::LinkBootMemoryGradients(
auto* boot_mem_grad = auto* boot_mem_grad =
step_scope->Var(attr.boot_var)->GetMutable<LoDTensor>(); step_scope->Var(attr.boot_var)->GetMutable<LoDTensor>();
boot_mem_grad->Resize(mem_grad->dims()); boot_mem_grad->Resize(mem_grad->dims());
boot_mem_grad->ShareDataWith<float>(*mem_grad); boot_mem_grad->ShareDataWith(*mem_grad);
} }
} }
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
limitations under the License. */ limitations under the License. */
#include "paddle/operators/reduce_op.h" #include "paddle/operators/reduce_op.h"
#include "paddle/operators/net_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
...@@ -159,6 +160,66 @@ class ReduceMinOpMaker : public ReduceOpMaker { ...@@ -159,6 +160,66 @@ class ReduceMinOpMaker : public ReduceOpMaker {
} }
}; };
class NormOp : public NetOp {
public:
NormOp(const std::string &type, const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: NetOp(type, inputs, outputs, attrs) {
PADDLE_ENFORCE_NE(Input("X"), framework::kEmptyVarName,
"Input(X) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("AbsOut"), framework::kEmptyVarName,
"Output(AbsOut) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("PowOut"), framework::kEmptyVarName,
"Output(PowOut) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("SumOut"), framework::kEmptyVarName,
"Output(SumOut) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("Out"), framework::kEmptyVarName,
"Output(Out) of NormOp should not be null.");
auto dim = Attr<int>("dim");
auto keep_dim = Attr<bool>("keep_dim");
auto p = Attr<float>("p");
PADDLE_ENFORCE_GT(p, 0, "Order of the norm should be positive.");
AppendOp(framework::OpRegistry::CreateOp("abs", {{"X", {Input("X")}}},
{{"Y", {Output("AbsOut")}}}, {}));
AppendOp(framework::OpRegistry::CreateOp("pow", {{"X", {Output("AbsOut")}}},
{{"Y", {Output("PowOut")}}},
{{"factor", p}}));
framework::AttributeMap sum_attr;
sum_attr["dim"] = dim;
sum_attr["keep_dim"] = keep_dim;
AppendOp(framework::OpRegistry::CreateOp(
"reduce_sum", {{"X", {Output("PowOut")}}},
{{"Out", {Output("SumOut")}}}, sum_attr));
AppendOp(framework::OpRegistry::CreateOp(
"pow", {{"X", {Output("SumOut")}}}, {{"Y", {Output("Out")}}},
{{"factor", static_cast<float>(1. / p)}}));
CompleteAddOp(false);
}
};
class NormOpMaker : public ReduceOpMaker {
public:
NormOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: ReduceOpMaker(proto, op_checker) {
AddOutput("AbsOut",
"(Tensor) The intermediate output of Norm operator, "
"saving the absolute value of the input tensor X.")
.AsIntermediate();
AddOutput("PowOut",
"(Tensor) The intermediate output of Norm operator, "
"saving the p-th power of the output tensor AbsOut.")
.AsIntermediate();
AddOutput("SumOut",
"(Tensor) the intermediate output of Norm operator, "
"saving the sum of PowOut reduced on the given dimension.")
.AsIntermediate();
AddAttr<float>("p", "(float, default 2) The order of Norm.").SetDefault(2);
SetComment("Norm", "vector p-norm");
AddComment(comment_);
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -176,6 +237,8 @@ REGISTER_OP(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker, reduce_max_grad, ...@@ -176,6 +237,8 @@ REGISTER_OP(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker, reduce_max_grad,
REGISTER_OP(reduce_min, ops::ReduceOp, ops::ReduceMinOpMaker, reduce_min_grad, REGISTER_OP(reduce_min, ops::ReduceOp, ops::ReduceMinOpMaker, reduce_min_grad,
ops::ReduceGradOp); ops::ReduceGradOp);
REGISTER_OP_WITHOUT_GRADIENT(norm, ops::NormOp, ops::NormOpMaker);
#define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \ #define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \
REGISTER_OP_CPU_KERNEL( \ REGISTER_OP_CPU_KERNEL( \
reduce_type, \ reduce_type, \
......
...@@ -33,7 +33,7 @@ class ReshapeKernel : public framework::OpKernel<T> { ...@@ -33,7 +33,7 @@ class ReshapeKernel : public framework::OpKernel<T> {
std::transform(shape.begin(), shape.end(), shape_int64.begin(), std::transform(shape.begin(), shape.end(), shape_int64.begin(),
[](int a) { return static_cast<int64_t>(a); }); [](int a) { return static_cast<int64_t>(a); });
auto out_dims = framework::make_ddim(shape_int64); auto out_dims = framework::make_ddim(shape_int64);
out->CopyFrom<T>(*in, ctx.GetPlace(), ctx.device_context()); out->CopyFrom(*in, ctx.GetPlace(), ctx.device_context());
out->Resize(out_dims); out->Resize(out_dims);
} }
}; };
...@@ -47,7 +47,7 @@ class ReshapeGradKernel : public framework::OpKernel<T> { ...@@ -47,7 +47,7 @@ class ReshapeGradKernel : public framework::OpKernel<T> {
d_x->mutable_data<T>(ctx.GetPlace()); d_x->mutable_data<T>(ctx.GetPlace());
auto in_dims = d_x->dims(); auto in_dims = d_x->dims();
d_x->CopyFrom<T>(*d_out, ctx.GetPlace(), ctx.device_context()); d_x->CopyFrom(*d_out, ctx.GetPlace(), ctx.device_context());
d_x->Resize(in_dims); d_x->Resize(in_dims);
} }
}; };
......
...@@ -36,14 +36,14 @@ void SegmentInputs(const std::vector<Scope*>& step_scopes, ...@@ -36,14 +36,14 @@ void SegmentInputs(const std::vector<Scope*>& step_scopes,
LoDTensor* input = input_var->GetMutable<LoDTensor>(); LoDTensor* input = input_var->GetMutable<LoDTensor>();
f::DDim dims = input->dims(); f::DDim dims = input->dims();
PADDLE_ENFORCE_EQ(static_cast<size_t>(dims[0]), seq_len, PADDLE_ENFORCE_EQ(static_cast<size_t>(dims[0]), seq_len,
"all the inlinks be the same length"); "all the inputs be the same length");
f::DDim step_dims = slice_ddim(dims, 1, dims.size()); f::DDim step_dims = slice_ddim(dims, 1, dims.size());
for (size_t j = 0; j < seq_len; j++) { for (size_t j = 0; j < seq_len; j++) {
Tensor* step_input = Tensor* step_input =
step_scopes[j]->Var(inlinks[i])->GetMutable<Tensor>(); step_scopes[j]->Var(inlinks[i])->GetMutable<Tensor>();
// The input of operators of each step is Tensor here. // The input of operators of each step is Tensor here.
// Maybe need to modify Slice function. // Maybe need to modify Slice function.
*step_input = input->Slice<float>(j, j + 1); *step_input = input->Slice(j, j + 1);
step_input->Resize(step_dims); step_input->Resize(step_dims);
} }
} }
...@@ -71,14 +71,14 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes, ...@@ -71,14 +71,14 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes,
step_scopes[j]->FindVar(outlinks[i])->GetMutable<LoDTensor>(); step_scopes[j]->FindVar(outlinks[i])->GetMutable<LoDTensor>();
// TODO(luotao02) data type and platform::DeviceContext() should set // TODO(luotao02) data type and platform::DeviceContext() should set
// correctly // correctly
(output->Slice<float>(j, j + 1)) (output->Slice(j, j + 1))
.CopyFrom<float>(*step_output, platform::CPUPlace(), ctx); .CopyFrom(*step_output, platform::CPUPlace(), ctx);
} }
} }
} }
void LinkMemories(const std::vector<Scope*>& scopes, void LinkMemories(const std::vector<Scope*>& scopes,
const std::vector<rnn::MemoryAttr>& memories, const std::vector<rnn::StateAttr>& memories,
const size_t step_id, const int offset) { const size_t step_id, const int offset) {
PADDLE_ENFORCE_LT(step_id, scopes.size(), PADDLE_ENFORCE_LT(step_id, scopes.size(),
"step [%d] is out of range of step scopes' size [%d]", "step [%d] is out of range of step scopes' size [%d]",
...@@ -95,7 +95,7 @@ void LinkMemories(const std::vector<Scope*>& scopes, ...@@ -95,7 +95,7 @@ void LinkMemories(const std::vector<Scope*>& scopes,
auto* mem = scope->FindVar(attr.pre_var)->GetMutable<LoDTensor>(); auto* mem = scope->FindVar(attr.pre_var)->GetMutable<LoDTensor>();
auto* linked_mem = linked_scope->FindVar(attr.var)->GetMutable<LoDTensor>(); auto* linked_mem = linked_scope->FindVar(attr.var)->GetMutable<LoDTensor>();
mem->Resize(linked_mem->dims()); mem->Resize(linked_mem->dims());
mem->ShareDataWith<float>(*linked_mem); mem->ShareDataWith(*linked_mem);
} }
} }
...@@ -106,26 +106,26 @@ void InitArgument(const ArgumentName& name, Argument* arg, ...@@ -106,26 +106,26 @@ void InitArgument(const ArgumentName& name, Argument* arg,
arg->inlinks = op.Inputs(name.inlinks); arg->inlinks = op.Inputs(name.inlinks);
arg->outlinks = op.Outputs(name.outlinks); arg->outlinks = op.Outputs(name.outlinks);
auto& boot_memories = auto& boot_memories = is_grad ? op.Outputs(name.initial_states)
is_grad ? op.Outputs(name.boot_memories) : op.Inputs(name.boot_memories); : op.Inputs(name.initial_states);
// attributes // attributes
auto& memories = op.Attr<std::vector<std::string>>(name.memories); auto& memories = op.Attr<std::vector<std::string>>(name.states);
auto& pre_memories = op.Attr<std::vector<std::string>>(name.pre_memories); auto& pre_memories = op.Attr<std::vector<std::string>>(name.ex_states);
PADDLE_ENFORCE(memories.size() == boot_memories.size(), PADDLE_ENFORCE(memories.size() == boot_memories.size(),
"the size of memories, boot_memories don't match:%d,%d", "the size of states, initial_states don't match:%d,%d",
memories.size(), boot_memories.size()); memories.size(), boot_memories.size());
PADDLE_ENFORCE(pre_memories.size() == boot_memories.size(), PADDLE_ENFORCE(pre_memories.size() == boot_memories.size(),
"the size of pre_memories, boot_memories don't match:%d,%d", "the size of ex_states, initial_states don't match:%d,%d",
pre_memories.size(), boot_memories.size()); pre_memories.size(), boot_memories.size());
PADDLE_ENFORCE(memories.size() > 0, "more than 1 memories should be set"); PADDLE_ENFORCE(memories.size() > 0, "more than 1 states should be set");
for (size_t i = 0; i < memories.size(); ++i) { for (size_t i = 0; i < memories.size(); ++i) {
rnn::MemoryAttr mem_attr; rnn::StateAttr mem_attr;
mem_attr.var = memories[i]; mem_attr.var = memories[i];
mem_attr.pre_var = pre_memories[i]; mem_attr.pre_var = pre_memories[i];
mem_attr.boot_var = boot_memories[i]; mem_attr.boot_var = boot_memories[i];
(arg->memories).push_back(mem_attr); (arg->states).push_back(mem_attr);
} }
} }
......
...@@ -31,7 +31,7 @@ using Scope = framework::Scope; ...@@ -31,7 +31,7 @@ using Scope = framework::Scope;
* boot memories in father scope. Other attributes are copied from Op's proto * boot memories in father scope. Other attributes are copied from Op's proto
* attributes. * attributes.
*/ */
struct MemoryAttr { struct StateAttr {
// name of current state variable // name of current state variable
std::string var; std::string var;
// name of previous step's state variable // name of previous step's state variable
...@@ -46,7 +46,7 @@ struct Argument { ...@@ -46,7 +46,7 @@ struct Argument {
std::string step_scopes; std::string step_scopes;
std::vector<std::string> inlinks; std::vector<std::string> inlinks;
std::vector<std::string> outlinks; std::vector<std::string> outlinks;
std::vector<rnn::MemoryAttr> memories; std::vector<rnn::StateAttr> states;
}; };
struct ArgumentName { struct ArgumentName {
...@@ -54,9 +54,9 @@ struct ArgumentName { ...@@ -54,9 +54,9 @@ struct ArgumentName {
std::string step_scopes; std::string step_scopes;
std::string inlinks; std::string inlinks;
std::string outlinks; std::string outlinks;
std::string memories; // the memory name std::string states; // the memory name
std::string pre_memories; // the previous memory name std::string ex_states; // the previous memory name
std::string boot_memories; // the boot memory name std::string initial_states; // the boot memory name
}; };
/** /**
...@@ -74,7 +74,7 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes, ...@@ -74,7 +74,7 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes,
const size_t seq_len, const platform::DeviceContext& ctx); const size_t seq_len, const platform::DeviceContext& ctx);
void LinkMemories(const std::vector<Scope*>& step_scopes, void LinkMemories(const std::vector<Scope*>& step_scopes,
const std::vector<MemoryAttr>& memories, const size_t step_id, const std::vector<StateAttr>& memories, const size_t step_id,
const int offset); const int offset);
void InitArgument(const ArgumentName& name, Argument* arg, void InitArgument(const ArgumentName& name, Argument* arg,
......
...@@ -30,7 +30,7 @@ class ScatterOpCUDAKernel : public framework::OpKernel<T> { ...@@ -30,7 +30,7 @@ class ScatterOpCUDAKernel : public framework::OpKernel<T> {
auto *Updates = ctx.Input<Tensor>("Updates"); auto *Updates = ctx.Input<Tensor>("Updates");
auto *Out = ctx.Output<Tensor>("Out"); auto *Out = ctx.Output<Tensor>("Out");
Out->ShareDataWith<T>(*Ref); Out->ShareDataWith(*Ref);
GPUScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out); GPUScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out);
} }
...@@ -48,7 +48,7 @@ class ScatterGradOpCUDAKernel : public framework::OpKernel<T> { ...@@ -48,7 +48,7 @@ class ScatterGradOpCUDAKernel : public framework::OpKernel<T> {
auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out")); auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out"));
// In place gradient: dRef = dO // In place gradient: dRef = dO
dRef->ShareDataWith<T>(*dOut); dRef->ShareDataWith(*dOut);
dUpdates->mutable_data<T>(ctx.GetPlace()); dUpdates->mutable_data<T>(ctx.GetPlace());
// Gradient by Gather: dUpdates = dO[Index] // Gradient by Gather: dUpdates = dO[Index]
GPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates); GPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates);
......
...@@ -35,7 +35,7 @@ class ScatterOpKernel : public framework::OpKernel<T> { ...@@ -35,7 +35,7 @@ class ScatterOpKernel : public framework::OpKernel<T> {
auto *Out = ctx.Output<Tensor>("Out"); auto *Out = ctx.Output<Tensor>("Out");
// In place output: Out = Ref, Out[Index] += Updates // In place output: Out = Ref, Out[Index] += Updates
Out->ShareDataWith<T>(*Ref); Out->ShareDataWith(*Ref);
// Apply ScatterUpdate: Out[index] += Updates[:] // Apply ScatterUpdate: Out[index] += Updates[:]
ScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out); ScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out);
} }
...@@ -53,7 +53,7 @@ class ScatterGradientOpKernel : public framework::OpKernel<T> { ...@@ -53,7 +53,7 @@ class ScatterGradientOpKernel : public framework::OpKernel<T> {
auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out")); auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out"));
// In place gradient: dRef = dO // In place gradient: dRef = dO
dRef->ShareDataWith<T>(*dOut); dRef->ShareDataWith(*dOut);
dUpdates->mutable_data<T>(ctx.GetPlace()); dUpdates->mutable_data<T>(ctx.GetPlace());
// Gradient by Gather: dUpdates += dO[Index] // Gradient by Gather: dUpdates += dO[Index]
CPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates); CPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates);
......
...@@ -87,7 +87,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> { ...@@ -87,7 +87,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> {
auto out_lod_level = out_lod[level]; auto out_lod_level = out_lod[level];
for (size_t i = 0; i < out_lod_level.size() - 1; ++i) { for (size_t i = 0; i < out_lod_level.size() - 1; ++i) {
Tensor out_t = out->Slice<T>(static_cast<int>(out_lod_level[i]), Tensor out_t = out->Slice(static_cast<int>(out_lod_level[i]),
static_cast<int>(out_lod_level[i + 1])); static_cast<int>(out_lod_level[i + 1]));
auto out_stride = framework::stride(out_t.dims()); auto out_stride = framework::stride(out_t.dims());
size_t offset = 0; size_t offset = 0;
...@@ -95,7 +95,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> { ...@@ -95,7 +95,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> {
for (size_t j = 0; j < n; ++j) { for (size_t j = 0; j < n; ++j) {
auto in_lod_level = ins[j]->lod()[level]; auto in_lod_level = ins[j]->lod()[level];
auto in_stride = framework::stride(ins[j]->dims()); auto in_stride = framework::stride(ins[j]->dims());
Tensor in_t = ins[j]->Slice<T>(static_cast<int>(in_lod_level[i]), Tensor in_t = ins[j]->Slice(static_cast<int>(in_lod_level[i]),
static_cast<int>(in_lod_level[i + 1])); static_cast<int>(in_lod_level[i + 1]));
size_t axis_dim = in_t.dims()[axis]; size_t axis_dim = in_t.dims()[axis];
StridedMemcpy<T>(ctx.device_context(), in_t.data<T>(), in_stride, StridedMemcpy<T>(ctx.device_context(), in_t.data<T>(), in_stride,
...@@ -130,7 +130,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> { ...@@ -130,7 +130,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> {
for (size_t i = 0; i < out_lod_level.size() - 1; ++i) { for (size_t i = 0; i < out_lod_level.size() - 1; ++i) {
Tensor out_grad_t = Tensor out_grad_t =
out_grad->Slice<T>(static_cast<int>(out_lod_level[i]), out_grad->Slice(static_cast<int>(out_lod_level[i]),
static_cast<int>(out_lod_level[i + 1])); static_cast<int>(out_lod_level[i + 1]));
auto out_grad_stride = framework::stride(out_grad_t.dims()); auto out_grad_stride = framework::stride(out_grad_t.dims());
size_t offset = 0; size_t offset = 0;
...@@ -139,7 +139,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> { ...@@ -139,7 +139,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> {
auto x_grad_lod_level = x_grads[j]->lod()[level]; auto x_grad_lod_level = x_grads[j]->lod()[level];
auto x_grad_stride = framework::stride(x_grads[j]->dims()); auto x_grad_stride = framework::stride(x_grads[j]->dims());
Tensor x_grad_t = Tensor x_grad_t =
x_grads[j]->Slice<T>(static_cast<int>(x_grad_lod_level[i]), x_grads[j]->Slice(static_cast<int>(x_grad_lod_level[i]),
static_cast<int>(x_grad_lod_level[i + 1])); static_cast<int>(x_grad_lod_level[i + 1]));
size_t axis_dim = x_grad_t.dims()[axis]; size_t axis_dim = x_grad_t.dims()[axis];
StridedMemcpy<T>(ctx.device_context(), out_grad_t.data<T>() + offset, StridedMemcpy<T>(ctx.device_context(), out_grad_t.data<T>() + offset,
......
...@@ -64,9 +64,9 @@ class SequencePoolKernel : public framework::OpKernel<T> { ...@@ -64,9 +64,9 @@ class SequencePoolKernel : public framework::OpKernel<T> {
out->mutable_data<T>(context.GetPlace()); out->mutable_data<T>(context.GetPlace());
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) {
Tensor in_t = in->Slice<T>(static_cast<int>(lod_level_0[i]), Tensor in_t = in->Slice(static_cast<int>(lod_level_0[i]),
static_cast<int>(lod_level_0[i + 1])); static_cast<int>(lod_level_0[i + 1]));
Tensor out_t = out->Slice<T>(i, i + 1); Tensor out_t = out->Slice(i, i + 1);
int64_t h = static_cast<int64_t>(lod_level_0[i + 1] - lod_level_0[i]); int64_t h = static_cast<int64_t>(lod_level_0[i + 1] - lod_level_0[i]);
auto in_e = EigenMatrix<T>::From(in_t, framework::make_ddim({h, w})); auto in_e = EigenMatrix<T>::From(in_t, framework::make_ddim({h, w}));
auto out_e = EigenVector<T>::Flatten(out_t); auto out_e = EigenVector<T>::Flatten(out_t);
...@@ -116,9 +116,9 @@ class SequencePoolGradKernel : public framework::OpKernel<T> { ...@@ -116,9 +116,9 @@ class SequencePoolGradKernel : public framework::OpKernel<T> {
} }
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
for (int i = 0; i < static_cast<int>(lod.size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod.size()) - 1; ++i) {
auto in_g_t = in_g->Slice<T>(static_cast<int>(lod[i]), auto in_g_t =
static_cast<int>(lod[i + 1])); in_g->Slice(static_cast<int>(lod[i]), static_cast<int>(lod[i + 1]));
auto out_g_t = out_g->Slice<T>(i, i + 1); auto out_g_t = out_g->Slice(i, i + 1);
int64_t h = static_cast<int64_t>(lod[i + 1] - lod[i]); int64_t h = static_cast<int64_t>(lod[i + 1] - lod[i]);
auto in_g_e = EigenMatrix<T>::From(in_g_t, {h, w}); auto in_g_e = EigenMatrix<T>::From(in_g_t, {h, w});
auto out_g_e = EigenMatrix<T>::From(out_g_t, {1, w}); auto out_g_e = EigenMatrix<T>::From(out_g_t, {1, w});
......
...@@ -46,8 +46,8 @@ class SequenceSoftmaxKernel : public framework::OpKernel<T> { ...@@ -46,8 +46,8 @@ class SequenceSoftmaxKernel : public framework::OpKernel<T> {
for (int i = 0; i < static_cast<int>(lod[level].size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod[level].size()) - 1; ++i) {
int start_pos = static_cast<int>(lod[level][i]); int start_pos = static_cast<int>(lod[level][i]);
int end_pos = static_cast<int>(lod[level][i + 1]); int end_pos = static_cast<int>(lod[level][i + 1]);
Tensor x_i = x->Slice<T>(start_pos, end_pos); Tensor x_i = x->Slice(start_pos, end_pos);
Tensor out_i = out->Slice<T>(start_pos, end_pos); Tensor out_i = out->Slice(start_pos, end_pos);
// Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos) // Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos)
framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos}); framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos});
...@@ -75,9 +75,9 @@ class SequenceSoftmaxGradKernel : public framework::OpKernel<T> { ...@@ -75,9 +75,9 @@ class SequenceSoftmaxGradKernel : public framework::OpKernel<T> {
int start_pos = static_cast<int>(lod[level][i]); int start_pos = static_cast<int>(lod[level][i]);
int end_pos = static_cast<int>(lod[level][i + 1]); int end_pos = static_cast<int>(lod[level][i + 1]);
Tensor out_i = out->Slice<T>(start_pos, end_pos); Tensor out_i = out->Slice(start_pos, end_pos);
Tensor out_grad_i = out_grad->Slice<T>(start_pos, end_pos); Tensor out_grad_i = out_grad->Slice(start_pos, end_pos);
Tensor x_grad_i = x_grad->Slice<T>(start_pos, end_pos); Tensor x_grad_i = x_grad->Slice(start_pos, end_pos);
// Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos) // Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos)
framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos}); framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos});
......
...@@ -21,7 +21,7 @@ class SGDOp : public framework::OperatorWithKernel { ...@@ -21,7 +21,7 @@ class SGDOp : public framework::OperatorWithKernel {
public: public:
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext *ctx) const override { void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Param"), PADDLE_ENFORCE(ctx->HasInput("Param"),
"Input(Param) of SGDOp should not be null."); "Input(Param) of SGDOp should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Grad"), PADDLE_ENFORCE(ctx->HasInput("Grad"),
...@@ -35,15 +35,15 @@ class SGDOp : public framework::OperatorWithKernel { ...@@ -35,15 +35,15 @@ class SGDOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1, PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1,
"Learning rate should have 1 element"); "Learning rate should have 1 element");
auto param_dim = ctx->GetInputDim("Param"); auto param_dim = ctx->GetInputDim("Param");
PADDLE_ENFORCE_EQ(param_dim, ctx->GetInputDim("Grad"), // TODO(qijun): check dimensions of Param and Grad at complie
"Two input of SGD Op's dimension must be same."); // and run time.
ctx->SetOutputDim("ParamOut", param_dim); ctx->SetOutputDim("ParamOut", param_dim);
} }
}; };
class SGDOpMaker : public framework::OpProtoAndCheckerMaker { class SGDOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SGDOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) SGDOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", "Input parameter"); AddInput("Param", "Input parameter");
AddInput("LearningRate", "Learning rate of SGD"); AddInput("LearningRate", "Learning rate of SGD");
...@@ -58,6 +58,38 @@ param_out = param - learning_rate * grad; ...@@ -58,6 +58,38 @@ param_out = param - learning_rate * grad;
)DOC"); )DOC");
} }
}; };
template <typename T>
struct SparseSGDFunctor<platform::CPUPlace, T> {
void operator()(const platform::DeviceContext& context,
const framework::SelectedRows& input,
const framework::Tensor& learning_rate,
framework::Tensor* output) {
auto in_height = input.height();
auto out_dims = output->dims();
PADDLE_ENFORCE_EQ(in_height, out_dims[0]);
auto& in_value = input.value();
auto& in_rows = input.rows();
int64_t in_row_numel = in_value.numel() / in_rows.size();
PADDLE_ENFORCE_EQ(in_row_numel, output->numel() / in_height);
auto* in_data = in_value.data<T>();
auto* out_data = output->data<T>();
auto* lr = learning_rate.data<T>();
for (size_t i = 0; i < in_rows.size(); i++) {
for (int64_t j = 0; j < in_row_numel; j++) {
out_data[in_rows[i] * in_row_numel + j] -=
lr[0] * in_data[i * in_row_numel + j];
}
}
}
};
template struct SparseSGDFunctor<platform::CPUPlace, float>;
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
......
...@@ -14,6 +14,66 @@ ...@@ -14,6 +14,66 @@
#define EIGEN_USE_GPU #define EIGEN_USE_GPU
#include "paddle/operators/sgd_op.h" #include "paddle/operators/sgd_op.h"
#include "paddle/platform/cuda_helper.h"
namespace paddle {
namespace operators {
namespace {
template <typename T>
__global__ void SparseSGDFunctorKernel(const T* selected_rows,
const int64_t* rows,
const T* learning_rate, T* tensor_out,
int64_t row_numel, int block_size) {
const int ty = blockIdx.y;
int tid = threadIdx.x;
selected_rows += ty * row_numel;
tensor_out += rows[ty] * row_numel;
for (int index = tid; index < row_numel; index += block_size) {
// Since index in rows of SelectedRows can be duplicate, we have to use
// Atomic Operation to avoid concurrent write error.
paddle::platform::CudaAtomicAdd(
tensor_out + index, -1.0 * learning_rate[0] * selected_rows[index]);
}
}
} // namespace
template <typename T>
struct SparseSGDFunctor<platform::GPUPlace, T> {
void operator()(const platform::DeviceContext& context,
const framework::SelectedRows& input,
const framework::Tensor& learning_rate,
framework::Tensor* output) {
auto in_height = input.height();
auto out_dims = output->dims();
PADDLE_ENFORCE_EQ(in_height, out_dims[0]);
auto& in_value = input.value();
auto& in_rows = input.rows();
int64_t in_row_numel = in_value.numel() / in_rows.size();
PADDLE_ENFORCE_EQ(in_row_numel, output->numel() / in_height);
auto* in_data = in_value.data<T>();
auto* out_data = output->data<T>();
int block_size = 256;
dim3 threads(block_size, 1);
dim3 grid(1, in_rows.size());
SparseSGDFunctorKernel<
T><<<grid, threads, 0,
reinterpret_cast<const platform::CUDADeviceContext&>(context)
.stream()>>>(in_data, in_rows.data(), learning_rate.data<T>(),
out_data, in_row_numel, block_size);
}
};
template struct SparseSGDFunctor<platform::GPUPlace, float>;
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(sgd, REGISTER_OP_GPU_KERNEL(sgd,
......
...@@ -15,20 +15,32 @@ limitations under the License. */ ...@@ -15,20 +15,32 @@ limitations under the License. */
#pragma once #pragma once
#include "paddle/framework/eigen.h" #include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/framework/selected_rows.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T>
struct SparseSGDFunctor {
void operator()(const platform::DeviceContext& context,
const framework::SelectedRows& input,
const framework::Tensor& learning_rate,
framework::Tensor* output);
};
template <typename Place, typename T> template <typename Place, typename T>
class SGDOpKernel : public framework::OpKernel<T> { class SGDOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto param = ctx.Input<framework::Tensor>("Param"); auto* param = ctx.Input<framework::Tensor>("Param");
auto grad = ctx.Input<framework::Tensor>("Grad"); auto* param_out = ctx.Output<framework::Tensor>("ParamOut");
auto param_out = ctx.Output<framework::Tensor>("ParamOut"); auto* learning_rate = ctx.Input<framework::Tensor>("LearningRate");
auto learning_rate = ctx.Input<framework::Tensor>("LearningRate");
auto* grad_var = ctx.InputVar("Grad");
// Actually, all tensors are LoDTensor except SelectedRows.
if (grad_var->IsType<framework::LoDTensor>()) {
param_out->mutable_data<T>(ctx.GetPlace()); param_out->mutable_data<T>(ctx.GetPlace());
auto* grad = ctx.Input<framework::Tensor>("Grad");
auto p = framework::EigenVector<T>::Flatten(*param); auto p = framework::EigenVector<T>::Flatten(*param);
auto g = framework::EigenVector<T>::Flatten(*grad); auto g = framework::EigenVector<T>::Flatten(*grad);
...@@ -38,8 +50,18 @@ class SGDOpKernel : public framework::OpKernel<T> { ...@@ -38,8 +50,18 @@ class SGDOpKernel : public framework::OpKernel<T> {
Eigen::DSizes<int, 1> grad_dsize(grad->numel()); Eigen::DSizes<int, 1> grad_dsize(grad->numel());
o.device(place) = p - lr.broadcast(grad_dsize) * g; o.device(place) = p - lr.broadcast(grad_dsize) * g;
} else if (grad_var->IsType<framework::SelectedRows>()) {
// TODO(qijun): In Sparse SGD operator, in-place update is enforced.
// This manual optimization brings difficulty to track data dependency.
// It's better to find a more elegant solution.
PADDLE_ENFORCE_EQ(param, param_out);
auto* grad = ctx.Input<framework::SelectedRows>("Grad");
SparseSGDFunctor<Place, T> functor;
functor(ctx.device_context(), *grad, *learning_rate, param_out);
} else {
PADDLE_THROW("Unsupported Variable Type of Grad");
}
} }
}; };
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -85,7 +85,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> { ...@@ -85,7 +85,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> {
context.Input<Tensor>(framework::GradVarName("Loss"))->data<T>(); context.Input<Tensor>(framework::GradVarName("Loss"))->data<T>();
Tensor* logit_grad = Tensor* logit_grad =
context.Output<Tensor>(framework::GradVarName("Logits")); context.Output<Tensor>(framework::GradVarName("Logits"));
logit_grad->ShareDataWith<T>(*context.Input<Tensor>("Softmax")); logit_grad->ShareDataWith(*context.Input<Tensor>("Softmax"));
T* logit_grad_data = logit_grad->data<T>(); T* logit_grad_data = logit_grad->data<T>();
const int batch_size = logit_grad->dims()[0]; const int batch_size = logit_grad->dims()[0];
......
...@@ -57,7 +57,7 @@ class SoftmaxWithCrossEntropyGradKernel : public framework::OpKernel<T> { ...@@ -57,7 +57,7 @@ class SoftmaxWithCrossEntropyGradKernel : public framework::OpKernel<T> {
const Tensor* labels = context.Input<Tensor>("Label"); const Tensor* labels = context.Input<Tensor>("Label");
Tensor* logit_grad = Tensor* logit_grad =
context.Output<Tensor>(framework::GradVarName("Logits")); context.Output<Tensor>(framework::GradVarName("Logits"));
logit_grad->ShareDataWith<T>(*context.Input<Tensor>("Softmax")); logit_grad->ShareDataWith(*context.Input<Tensor>("Softmax"));
const int class_num = logit_grad->dims()[1]; const int class_num = logit_grad->dims()[1];
if (context.Attr<bool>("soft_label")) { if (context.Attr<bool>("soft_label")) {
......
...@@ -53,10 +53,10 @@ class UniformRandomOp : public framework::OperatorWithKernel { ...@@ -53,10 +53,10 @@ class UniformRandomOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE( PADDLE_ENFORCE(
ctx->Attrs().Get<float>("min") < ctx->Attrs().Get<float>("max"), ctx->Attrs().Get<float>("min") < ctx->Attrs().Get<float>("max"),
"uniform_random's min must less then max"); "uniform_random's min must less then max");
auto& dims = ctx->Attrs().Get<std::vector<int>>("dims"); auto& shape = ctx->Attrs().Get<std::vector<int>>("shape");
std::vector<int64_t> temp; std::vector<int64_t> temp;
temp.reserve(dims.size()); temp.reserve(shape.size());
for (auto dim : dims) { for (auto dim : shape) {
temp.push_back(static_cast<int64_t>(dim)); temp.push_back(static_cast<int64_t>(dim));
} }
ctx->SetOutputDim("Out", framework::make_ddim(temp)); ctx->SetOutputDim("Out", framework::make_ddim(temp));
...@@ -65,7 +65,7 @@ class UniformRandomOp : public framework::OperatorWithKernel { ...@@ -65,7 +65,7 @@ class UniformRandomOp : public framework::OperatorWithKernel {
protected: protected:
framework::DataType IndicateDataType( framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override { const framework::ExecutionContext& ctx) const override {
return static_cast<framework::DataType>(Attr<int>("data_type")); return static_cast<framework::DataType>(ctx.Attr<int>("data_type"));
} }
}; };
...@@ -78,7 +78,7 @@ class UniformRandomOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -78,7 +78,7 @@ class UniformRandomOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(Uniform random operator. AddComment(R"DOC(Uniform random operator.
Used to initialize tensor with uniform random generator. Used to initialize tensor with uniform random generator.
)DOC"); )DOC");
AddAttr<std::vector<int>>("dims", "the dimension of random tensor"); AddAttr<std::vector<int>>("shape", "the dimension of random tensor");
AddAttr<float>("min", "Minimum value of uniform random").SetDefault(-1.0f); AddAttr<float>("min", "Minimum value of uniform random").SetDefault(-1.0f);
AddAttr<float>("max", "Maximun value of uniform random").SetDefault(1.0f); AddAttr<float>("max", "Maximun value of uniform random").SetDefault(1.0f);
AddAttr<int>("seed", AddAttr<int>("seed",
......
...@@ -265,6 +265,10 @@ public: ...@@ -265,6 +265,10 @@ public:
addParameterType(PARAMETER_SECOND_MOMENTUM); addParameterType(PARAMETER_SECOND_MOMENTUM);
} }
virtual void startBatch(int64_t numSamplesProcessed) {
learningRate_ = calcLearningRate(numSamplesProcessed, pass_);
}
virtual void finishBatch() { ++step_; } virtual void finishBatch() { ++step_; }
virtual void update(const VectorPtr vecs[], virtual void update(const VectorPtr vecs[],
......
...@@ -77,6 +77,10 @@ extern void *cublas_dso_handle; ...@@ -77,6 +77,10 @@ extern void *cublas_dso_handle;
__macro(cublasDgemmBatched); \ __macro(cublasDgemmBatched); \
__macro(cublasCgemmBatched); \ __macro(cublasCgemmBatched); \
__macro(cublasZgemmBatched); \ __macro(cublasZgemmBatched); \
__macro(cublasSgemmStridedBatched); \
__macro(cublasDgemmStridedBatched); \
__macro(cublasCgemmStridedBatched); \
__macro(cublasZgemmStridedBatched); \
__macro(cublasSgetrfBatched); \ __macro(cublasSgetrfBatched); \
__macro(cublasSgetriBatched); \ __macro(cublasSgetriBatched); \
__macro(cublasDgetrfBatched); \ __macro(cublasDgetrfBatched); \
......
...@@ -100,21 +100,11 @@ using namespace paddle::framework; // NOLINT ...@@ -100,21 +100,11 @@ using namespace paddle::framework; // NOLINT
// Bind Methods // Bind Methods
void BindProgramDesc(py::module &m) { void BindProgramDesc(py::module &m) {
py::class_<ProgramDescBind>(m, "ProgramDesc", "") py::class_<ProgramDescBind>(m, "ProgramDesc", "")
.def_static("instance", .def(py::init<>())
[]() -> ProgramDescBind * { .def("__init__",
return &ProgramDescBind::Instance(&GetProgramDesc()); [](ProgramDescBind &self, const ProgramDescBind &other) {
}, new (&self) ProgramDescBind(other);
py::return_value_policy::reference) })
.def_static("__create_program_desc__",
[]() -> ProgramDescBind * {
// Only used for unit-test
auto *prog_desc = new ProgramDesc;
auto *block = prog_desc->mutable_blocks()->Add();
block->set_idx(0);
block->set_parent_idx(-1);
return &ProgramDescBind::Instance(prog_desc);
},
py::return_value_policy::reference)
.def("append_block", &ProgramDescBind::AppendBlock, .def("append_block", &ProgramDescBind::AppendBlock,
py::return_value_policy::reference) py::return_value_policy::reference)
.def("append_backward", .def("append_backward",
...@@ -163,6 +153,11 @@ void BindBlockDesc(py::module &m) { ...@@ -163,6 +153,11 @@ void BindBlockDesc(py::module &m) {
return self.Var(name); return self.Var(name);
}, },
py::return_value_policy::reference) py::return_value_policy::reference)
.def("has_var",
[](BlockDescBind &self, py::bytes byte_name) {
std::string name = byte_name;
return self.HasVar(name);
})
.def("find_var", .def("find_var",
[](BlockDescBind &self, py::bytes byte_name) { [](BlockDescBind &self, py::bytes byte_name) {
std::string name = byte_name; std::string name = byte_name;
...@@ -171,8 +166,8 @@ void BindBlockDesc(py::module &m) { ...@@ -171,8 +166,8 @@ void BindBlockDesc(py::module &m) {
py::return_value_policy::reference) py::return_value_policy::reference)
.def("all_vars", &BlockDescBind::AllVars, .def("all_vars", &BlockDescBind::AllVars,
py::return_value_policy::reference) py::return_value_policy::reference)
.def("all_ops", &BlockDescBind::AllOps, .def("op_size", &BlockDescBind::OpSize)
py::return_value_policy::reference) .def("op", &BlockDescBind::Op, py::return_value_policy::reference)
.def("serialize_to_string", [](BlockDescBind &block_desc) -> py::bytes { .def("serialize_to_string", [](BlockDescBind &block_desc) -> py::bytes {
const BlockDesc *desc = block_desc.Proto(); const BlockDesc *desc = block_desc.Proto();
PADDLE_ENFORCE(desc->IsInitialized(), PADDLE_ENFORCE(desc->IsInitialized(),
...@@ -211,7 +206,8 @@ void BindVarDsec(py::module &m) { ...@@ -211,7 +206,8 @@ void BindVarDsec(py::module &m) {
.def("set_lod_level", &VarDescBind::SetLoDLevel) .def("set_lod_level", &VarDescBind::SetLoDLevel)
.def("type", &VarDescBind::GetType) .def("type", &VarDescBind::GetType)
.def("set_type", &VarDescBind::SetType) .def("set_type", &VarDescBind::SetType)
.def("serialize_to_string", [](VarDescBind &var_desc) -> py::bytes { .def("serialize_to_string",
[](VarDescBind &var_desc) -> py::bytes {
const VarDesc *desc = var_desc.Proto(); const VarDesc *desc = var_desc.Proto();
PADDLE_ENFORCE(desc->IsInitialized(), PADDLE_ENFORCE(desc->IsInitialized(),
"VarDesc has not been initialized."); "VarDesc has not been initialized.");
...@@ -220,11 +216,15 @@ void BindVarDsec(py::module &m) { ...@@ -220,11 +216,15 @@ void BindVarDsec(py::module &m) {
desc->SerializeToString(&res), desc->SerializeToString(&res),
"Serialize VarDesc Error. This could be a bug of Paddle."); "Serialize VarDesc Error. This could be a bug of Paddle.");
return res; return res;
}); })
.def("persistable", &VarDescBind::Persistable)
.def("set_persistable", &VarDescBind::SetPersistable);
py::enum_<VarDesc::VarType>(var_desc, "VarType", "") py::enum_<VarDesc::VarType>(var_desc, "VarType", "")
.value("LOD_TENSOR", VarDesc::LOD_TENSOR) .value("LOD_TENSOR", VarDesc::LOD_TENSOR)
.value("SELECTED_ROWS", VarDesc::SELECTED_ROWS); .value("SELECTED_ROWS", VarDesc::SELECTED_ROWS)
.value("FEED_MINIBATCH", VarDesc::FEED_MINIBATCH)
.value("FETCH_LIST", VarDesc::FETCH_LIST);
} }
void BindOpDesc(py::module &m) { void BindOpDesc(py::module &m) {
......
...@@ -17,6 +17,7 @@ limitations under the License. */ ...@@ -17,6 +17,7 @@ limitations under the License. */
#include "paddle/framework/backward.h" #include "paddle/framework/backward.h"
#include "paddle/framework/executor.h" #include "paddle/framework/executor.h"
#include "paddle/framework/feed_fetch_method.h" #include "paddle/framework/feed_fetch_method.h"
#include "paddle/framework/framework.pb.h"
#include "paddle/framework/lod_tensor.h" #include "paddle/framework/lod_tensor.h"
#include "paddle/framework/selected_rows.h" #include "paddle/framework/selected_rows.h"
#include "paddle/framework/tensor_array.h" #include "paddle/framework/tensor_array.h"
...@@ -83,10 +84,12 @@ PYBIND11_PLUGIN(core) { ...@@ -83,10 +84,12 @@ PYBIND11_PLUGIN(core) {
.def("set", PyCPUTensorSetFromArray<float>) .def("set", PyCPUTensorSetFromArray<float>)
.def("set", PyCPUTensorSetFromArray<int>) .def("set", PyCPUTensorSetFromArray<int>)
.def("set", PyCPUTensorSetFromArray<double>) .def("set", PyCPUTensorSetFromArray<double>)
.def("set", PyCPUTensorSetFromArray<int64_t>)
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
.def("set", PyCUDATensorSetFromArray<float>) .def("set", PyCUDATensorSetFromArray<float>)
.def("set", PyCUDATensorSetFromArray<int>) .def("set", PyCUDATensorSetFromArray<int>)
.def("set", PyCUDATensorSetFromArray<double>) .def("set", PyCUDATensorSetFromArray<double>)
.def("set", PyCUDATensorSetFromArray<int64_t>)
#endif #endif
.def("shape", [](Tensor &self) { return vectorize(self.dims()); }) .def("shape", [](Tensor &self) { return vectorize(self.dims()); })
.def("set_float_element", TensorSetElement<float>) .def("set_float_element", TensorSetElement<float>)
...@@ -110,6 +113,7 @@ PYBIND11_PLUGIN(core) { ...@@ -110,6 +113,7 @@ PYBIND11_PLUGIN(core) {
new (&instance) LoDTensor(new_lod); new (&instance) LoDTensor(new_lod);
#endif #endif
}) })
.def("__init__", [](LoDTensor &instance) { new (&instance) LoDTensor(); })
.def("set_lod", .def("set_lod",
[](LoDTensor &self, const std::vector<std::vector<size_t>> &lod) { [](LoDTensor &self, const std::vector<std::vector<size_t>> &lod) {
#ifndef PADDLE_WITH_CUDA #ifndef PADDLE_WITH_CUDA
...@@ -153,7 +157,15 @@ PYBIND11_PLUGIN(core) { ...@@ -153,7 +157,15 @@ PYBIND11_PLUGIN(core) {
py::return_value_policy::reference) py::return_value_policy::reference)
.def("set_height", &SelectedRows::set_height) .def("set_height", &SelectedRows::set_height)
.def("height", &SelectedRows::height) .def("height", &SelectedRows::height)
.def("set_rows", &SelectedRows::set_rows) .def("set_rows",
[](SelectedRows &self, std::vector<int64_t> rows) {
#ifndef PADDLE_WITH_CUDA
self.set_rows(rows);
#else
Vector<int64_t> new_rows(rows);
self.set_rows(new_rows);
#endif
})
.def("rows", [](SelectedRows &self) { .def("rows", [](SelectedRows &self) {
#ifndef PADDLE_WITH_CUDA #ifndef PADDLE_WITH_CUDA
return self.rows(); return self.rows();
...@@ -186,6 +198,11 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -186,6 +198,11 @@ All parameter, weight, gradient are variables in Paddle.
return self.GetMutable<LoDTensor>(); return self.GetMutable<LoDTensor>();
}, },
py::return_value_policy::reference) py::return_value_policy::reference)
.def("get_selected_rows",
[](Variable &self) -> SelectedRows * {
return self.GetMutable<SelectedRows>();
},
py::return_value_policy::reference)
.def("get_net", .def("get_net",
[](Variable &self) -> operators::NetOp * { [](Variable &self) -> operators::NetOp * {
return self.GetMutable<operators::NetOp>(); return self.GetMutable<operators::NetOp>();
...@@ -250,6 +267,17 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -250,6 +267,17 @@ All parameter, weight, gradient are variables in Paddle.
.def(py::init<>()) .def(py::init<>())
.def("__str__", string::to_string<const platform::CPUPlace &>); .def("__str__", string::to_string<const platform::CPUPlace &>);
py::class_<platform::Place>(m, "Place")
.def(py::init<>())
.def("set_place",
[](platform::Place &self, const platform::CPUPlace &cpu_place) {
self = cpu_place;
})
.def("set_place",
[](platform::Place &self, const platform::GPUPlace &gpu_place) {
self = gpu_place;
});
py::class_<OperatorBase>(m, "Operator") py::class_<OperatorBase>(m, "Operator")
.def_static("create", .def_static("create",
[](py::bytes protobin) { [](py::bytes protobin) {
...@@ -259,7 +287,7 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -259,7 +287,7 @@ All parameter, weight, gradient are variables in Paddle.
PADDLE_ENFORCE(desc.IsInitialized(), PADDLE_ENFORCE(desc.IsInitialized(),
"User OpDesc is not initialized, reason %s", "User OpDesc is not initialized, reason %s",
desc.InitializationErrorString()); desc.InitializationErrorString());
return OpRegistry::CreateOp(desc); return OpRegistry::CreateOp(desc, nullptr);
}) })
.def("backward", .def("backward",
[](const OperatorBase &forwardOp, [](const OperatorBase &forwardOp,
...@@ -363,7 +391,7 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -363,7 +391,7 @@ All parameter, weight, gradient are variables in Paddle.
PADDLE_ENFORCE(desc.IsInitialized(), PADDLE_ENFORCE(desc.IsInitialized(),
"User OpDesc is not initialized, reason %s", "User OpDesc is not initialized, reason %s",
desc.InitializationErrorString()); desc.InitializationErrorString());
auto rnn_op = OpRegistry::CreateOp(desc); auto rnn_op = OpRegistry::CreateOp(desc, nullptr);
return static_cast<operators::RecurrentOp *>(rnn_op.release()); return static_cast<operators::RecurrentOp *>(rnn_op.release());
}) })
.def("set_stepnet", [](operators::RecurrentOp &self, .def("set_stepnet", [](operators::RecurrentOp &self,
...@@ -381,22 +409,22 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -381,22 +409,22 @@ All parameter, weight, gradient are variables in Paddle.
PADDLE_ENFORCE(desc.IsInitialized(), PADDLE_ENFORCE(desc.IsInitialized(),
"User OpDesc is not initialized, reason %s", "User OpDesc is not initialized, reason %s",
desc.InitializationErrorString()); desc.InitializationErrorString());
auto rnn_op = OpRegistry::CreateOp(desc); auto rnn_op = OpRegistry::CreateOp(desc, nullptr);
return static_cast<operators::DynamicRecurrentOp *>( return static_cast<operators::DynamicRecurrentOp *>(
rnn_op.release()); rnn_op.release());
}) })
.def("set_stepnet", .def("set_step_unit",
[](operators::DynamicRecurrentOp &self, const operators::NetOp &net) [](operators::DynamicRecurrentOp &self, const operators::NetOp &net)
-> void { self.SetStepNet(net.Clone()); }) -> void { self.rnn.SetStepUnit(net.Clone()); })
.def("get_state", .def("get_state",
[](operators::DynamicRecurrentOp &self, const std::string &name) [](operators::DynamicRecurrentOp &self, const std::string &name)
-> const TensorArray & { return self.state(name); }) -> const TensorArray & { return self.rnn.state(name); })
.def("get_step_input", .def("get_step_input",
[](operators::DynamicRecurrentOp &self, const std::string &name) [](operators::DynamicRecurrentOp &self, const std::string &name)
-> const TensorArray & { return self.step_input(name); }) -> const TensorArray & { return self.rnn.step_input(name); })
.def("get_step_output", .def("get_step_output",
[](operators::DynamicRecurrentOp &self, const std::string &name) [](operators::DynamicRecurrentOp &self, const std::string &name)
-> const TensorArray & { return self.step_output(name); }); -> const TensorArray & { return self.rnn.step_output(name); });
// cond_op // cond_op
py::class_<operators::CondOp, OperatorBase>(m, "CondOp") py::class_<operators::CondOp, OperatorBase>(m, "CondOp")
...@@ -408,7 +436,7 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -408,7 +436,7 @@ All parameter, weight, gradient are variables in Paddle.
PADDLE_ENFORCE(desc.IsInitialized(), PADDLE_ENFORCE(desc.IsInitialized(),
"User OpDesc is not initialized, reason %s", "User OpDesc is not initialized, reason %s",
desc.InitializationErrorString()); desc.InitializationErrorString());
auto cond_op = OpRegistry::CreateOp(desc); auto cond_op = OpRegistry::CreateOp(desc, nullptr);
return static_cast<operators::CondOp *>(cond_op.release()); return static_cast<operators::CondOp *>(cond_op.release());
}) })
.def("set_truenet", .def("set_truenet",
...@@ -422,18 +450,15 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -422,18 +450,15 @@ All parameter, weight, gradient are variables in Paddle.
py::class_<framework::Executor>(m, "Executor") py::class_<framework::Executor>(m, "Executor")
.def(py::init<std::vector<platform::Place> &>()) .def(py::init<std::vector<platform::Place> &>())
.def("run", .def("run", [](Executor &self, ProgramDescBind *program_bind,
[](Executor &self, const ProgramDesc &program_desc, int block_id) { Scope *scope, int block_id) {
framework::Scope &global_scope = GetGlobalScope(); self.Run(*program_bind->Proto(), scope, block_id);
self.Run(program_desc, &global_scope, block_id);
}); });
m.def("unique_integer", UniqueIntegerGenerator); m.def("unique_integer", UniqueIntegerGenerator);
m.def("is_compile_gpu", IsCompileGPU); m.def("is_compile_gpu", IsCompileGPU);
m.def("set_feed_variable_float", framework::SetFeedVariable<float>); m.def("set_feed_variable", framework::SetFeedVariable);
m.def("set_feed_variable_double", framework::SetFeedVariable<double>);
m.def("set_feed_variable_int", framework::SetFeedVariable<int>);
m.def("get_fetch_variable", framework::GetFetchVariable); m.def("get_fetch_variable", framework::GetFetchVariable);
BindProgramDesc(m); BindProgramDesc(m);
...@@ -441,6 +466,8 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -441,6 +466,8 @@ All parameter, weight, gradient are variables in Paddle.
BindVarDsec(m); BindVarDsec(m);
BindOpDesc(m); BindOpDesc(m);
m.def("op_support_gpu", OpSupportGPU);
return m.ptr(); return m.ptr();
} }
} // namespace pybind } // namespace pybind
......
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
HOSTS = [
"root@10.1.9.7",
"root@10.1.18.7",
"root@10.1.32.9",
]
'''
workspace configuration
'''
#root dir for workspace, can be set as any director with real user account
ROOT_DIR = "/root"
'''
network configuration
'''
#pserver nics
PADDLE_NIC = "eth0"
#pserver port
PADDLE_PORT = 7164
#pserver ports num
PADDLE_PORTS_NUM = 1
#pserver sparse ports num
PADDLE_PORTS_NUM_FOR_SPARSE = 1
#trainer whether use gpu
PADDLE_USE_GPU = "False"
#environments setting for all processes in cluster job
LD_LIBRARY_PATH = "/usr/local/cuda/lib64:/usr/lib64"
FROM docker.paddlepaddlehub.com/paddle:0.10.0rc2
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN echo 'root:root' |chpasswd
RUN sed -ri 's/^PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: ssh-servers
spec:
replicas: 3
template:
metadata:
labels:
app: ssh-servers
spec:
containers:
- name: ssh-servers
image: docker.paddlepaddlehub.com/paddlessh
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
ports:
- containerPort: 22
#!/bin/bash
python paddle.py \
--job_dispatch_package="/root/wuyi/fabric_submit/workspace" \
--dot_period=10 \
--ports_num_for_sparse=1 \
--log_period=50 \
--num_passes=5 \
--trainer_count=2 \
--saving_period=1 \
--local=0 \
--config=./trainer_config.py \
--save_dir=./output \
--use_gpu=0
# Build this image: docker build -t mpi .
#
FROM paddledev/paddle:0.10.0rc3
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y && \
apt-get upgrade -y && \
apt-get install -y openssh-server zip unzip vim sudo \
gcc gfortran openmpi-checkpoint binutils wget curl git openmpi-bin openmpi-common libopenmpi-dev && \
pip install mpi4py numpy virtualenv scipy matplotlib lxml sqlalchemy suds ipython obspy && \
mkdir /var/run/sshd && \
echo 'root:tutorial' | chpasswd && \
sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
# SSH login fix. Otherwise user is kicked off after login
sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd && \
echo "export VISIBLE=now" >> /etc/profile && \
adduser --disabled-password --gecos "" tutorial && \
echo "tutorial ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \
mkdir /home/tutorial/.ssh/
ENV HOME /home/tutorial
ENV NOTVISIBLE "in users profile"
# ------------------------------------------------------------
# Set-Up SSH with our Github deploy key
# ------------------------------------------------------------
ADD ssh/config /home/tutorial/.ssh/config
ADD ssh/id_rsa.mpi /home/tutorial/.ssh/id_rsa
ADD ssh/id_rsa.mpi.pub /home/tutorial/.ssh/id_rsa.pub
ADD ssh/id_rsa.mpi.pub /home/tutorial/.ssh/authorized_keys
#---------------------------------------------------------------
#LD_LIBRARY_PATH
#---------------------------------------------------------------
RUN export LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
WORKDIR /home/tutorial
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: mpi-header
labels:
app: mpi-header
spec:
replicas: 1
template:
metadata:
labels:
app: mpi-header
spec:
containers:
- image: typhoon1986/paddle-openmpi
name : mpi-header
resources:
limits:
cpu: 500m
memory: 2Gi
requests:
cpu: 500m
memory: 2Gi
ports:
- containerPort: 22
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: mpi-nodes
labels:
app: mpi-nodes
spec:
replicas: 3
template:
metadata:
labels:
app: mpi-nodes
spec:
containers:
- image: typhoon1986/paddle-openmpi
name : mpi-nodes
resources:
limits:
cpu: 500m
memory: 2Gi
requests:
cpu: 500m
memory: 2Gi
ports:
- containerPort: 22
imagePullPolicy: Always
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEA7PWLZmgdJ508dD15T6+xqGDvL9Ehzo9SgsnN6xJ+qpUvvOi4
1axW0AqR4MnPTg/uuvk+x4tUpuufOW4w22UTGjsdvmIVWa9ujLtcRiN3YPY+SU+Y
O5FfqKg7r/hBn+/GMcSoffwSs7vVgmhBBnp/mJh2O1cOAFZEe98/47mbg3/kHBAk
36NOQktaU3l48B38EhBTnjWfcEGm1HcTRPFxXV5Wiko6ZhKFEuHcTVKng4ROtUqE
mgHyI0aB7TAxg4na0ejItsYWEPWGeDOw6ms/4MwylxNosWzHFPW9p4zgLCLNr+b6
bDDfYKjXZflAuTQtQhLmJUwD9uuYLAijpSE2fQIDAQABAoIBADgcgRET8Gt0CV/B
OtvKz/f+VEVvcWD3gWNlJDTZIVOFllNWjIZUlA4ZoqenQkbK8Q4nfV1FOht4yjCQ
TlN1oMtiWk297i5Zo4UBzPzy4w774I39oh/g8dT/WXr2/5s+7SDV38xNh6Q2A34o
79T35wUcfUrZ93/O7dKjb/6d8hx2FMha0wVKqY4lmG1lQE3bbx3kakec0PdvU5kO
YHKlpqj3pMR7CpMa+4yL/iXFwWYmnK+uu+zw7JR7PwvH1CzrnvW438wjQ1QmYbSx
mHHOE89X67Lsl5hn81qYWBhpwAlBwi1qscsE0cV9GcFyKqWFqZsj5coM9u3CRfvy
lrWe1OUCgYEA+LBUFEd3Hxs4sFiYElJ8R9SAs1udaqPvAl01hTEijJLfYlMMVs/y
rgNN7j22zjDak2f8QdyMJZX7EZdRmdYcHO0csYOwbYvalzcnwk+U3mxmdD3r4xSo
DSvkJ70fogAqUlcVIg2re6fCmZVJQTvMQYTVEM8zQomJRt/Lb2esSfsCgYEA8+zv
44aToe8uqiDs4w8guRW7LCDkTw4z4IVo9JUibIaPjaAs5bZEBXSB43EEywXCR75H
fML0rU1PVvKh1rqcvZdVzm+XMWVr3asPk0sapaiHaTcmyZvJRDxxqbLFp0zRP1T6
cCtXNFdHWU4KiuKrUi6cDyOKchpfkSZa4seiT+cCgYB+n4FgBfdQPlMB70oW4irn
g/q32CjxuGCk6oKqu5bkzo+xB6obtavSEFqouIGQwO056tNVUY+GP7Rjg5GH663K
yKw4cl3tmS0Gm43B8TVSfw03mKO3rrfWZQe5eCFYIg9qd26KNT2gK435FzsCXQkm
PxUhhu6JrW/ZR2/U3Iur6wKBgADrWLAb1ryagSuE+j+U1AO+kDkHWrTtkcZ72jxp
v3p3O11GSEUJXdJDcSXhTCpTuDq6/dv7hB6PFwh126RKicKxKlKf2wsFndV1Cpb8
hnovW2tLGOtTmfuW2rrQAKyzvmolsNfxYd/BoHQ2thV16z1hDZeFA8WQUeHjKh6G
sBbrAoGATdtQlaUxx4izua6k02ihkxx/cRYwDl2N8UDvDBHokS7vJFMX8b8NpsGg
zMElnqSpu/pe/0UG7N2MtPF6uyMcX8AZzzcsRkiMkDvWJzYt8Jpf+Eyd/uryF+Yv
yrXaOEY83tm6x/fny5ZaZmk8lNth7bfWywuTMkZLX3fYpWtIeE4=
-----END RSA PRIVATE KEY-----
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDs9YtmaB0nnTx0PXlPr7GoYO8v0SHOj1KCyc3rEn6qlS+86LjVrFbQCpHgyc9OD+66+T7Hi1Sm6585bjDbZRMaOx2+YhVZr26Mu1xGI3dg9j5JT5g7kV+oqDuv+EGf78YxxKh9/BKzu9WCaEEGen+YmHY7Vw4AVkR73z/juZuDf+QcECTfo05CS1pTeXjwHfwSEFOeNZ9wQabUdxNE8XFdXlaKSjpmEoUS4dxNUqeDhE61SoSaAfIjRoHtMDGDidrR6Mi2xhYQ9YZ4M7Dqaz/gzDKXE2ixbMcU9b2njOAsIs2v5vpsMN9gqNdl+UC5NC1CEuYlTAP265gsCKOlITZ9 oweidner@peahi
#!/bin/bash
# General trainning configurations
NICS=eth0
PADDLE_INIT_PORT=7164
PADDLE_INIT_PORTS_NUM=1
PADDLE_INIT_PORTS_NUM_FOR_SPARSE=1
PADDLE_INIT_PSERVERS=$(cat machines | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')
PADDLE_INIT_USE_GPU=False
PADDLE_INIT_NUM_GRADIENT_SERVERS=${OMPI_COMM_WORLD_SIZE}
PADDLE_INIT_TRAINER_ID=${OMPI_COMM_WORLD_RANK}
PADDLE_CLUSTER_TRAIN=True
env
# start pserver
stdbuf -oL nohup paddle pserver --port=$PADDLE_INIT_PORT --ports_num=$PADDLE_INIT_PORTS_NUM \
--ports_num_for_sparse=$PADDLE_INIT_PORTS_NUM_FOR_SPARSE --nics=$NICS \
--comment=paddle_cluster_pserver \
--num_gradient_servers=$PADDLE_INIT_NUM_GRADIENT_SERVERS &> logs/pserver.log &
# start trainer
# NOTE: train.py will use the above environment variables as configuration
python train.py &> logs/train.log
# kill background pservers when train finishes
ps -ef | grep pserver | awk '{print $2}' | xargs kill
...@@ -141,10 +141,17 @@ RUN sed -i '${APT_MIRROR}' /etc/apt/sources.list ...@@ -141,10 +141,17 @@ RUN sed -i '${APT_MIRROR}' /etc/apt/sources.list
EOF EOF
fi fi
if [[ ${WITH_GPU} == "ON" ]]; then
NCCL_DEPS="apt-get install -y libnccl-dev &&"
else
NCCL_DEPS=""
fi
cat >> /paddle/build/Dockerfile <<EOF cat >> /paddle/build/Dockerfile <<EOF
ADD python/dist/*.whl / ADD python/dist/*.whl /
# run paddle version to install python packages first # run paddle version to install python packages first
RUN apt-get update &&\ RUN apt-get update &&\
${NCCL_DEPS}\
apt-get install -y wget python-pip && pip install -U pip && \ apt-get install -y wget python-pip && pip install -U pip && \
pip install /*.whl; apt-get install -f -y && \ pip install /*.whl; apt-get install -f -y && \
apt-get clean -y && \ apt-get clean -y && \
......
...@@ -26,8 +26,9 @@ __all__ = [ ...@@ -26,8 +26,9 @@ __all__ = [
'sequence_conv_pool', 'simple_lstm', "simple_img_conv_pool", 'sequence_conv_pool', 'simple_lstm', "simple_img_conv_pool",
"img_conv_bn_pool", 'lstmemory_group', 'lstmemory_unit', 'small_vgg', "img_conv_bn_pool", 'lstmemory_group', 'lstmemory_unit', 'small_vgg',
'img_conv_group', 'vgg_16_network', 'gru_unit', 'gru_group', 'simple_gru', 'img_conv_group', 'vgg_16_network', 'gru_unit', 'gru_group', 'simple_gru',
'simple_attention', 'simple_gru2', 'bidirectional_gru', 'text_conv_pool', 'simple_attention', 'dot_product_attention', 'simple_gru2',
'bidirectional_lstm', 'inputs', 'outputs' 'bidirectional_gru', 'text_conv_pool', 'bidirectional_lstm', 'inputs',
'outputs'
] ]
###################################################### ######################################################
...@@ -1361,6 +1362,7 @@ def simple_attention(encoded_sequence, ...@@ -1361,6 +1362,7 @@ def simple_attention(encoded_sequence,
compute attention weight. compute attention weight.
:type transform_param_attr: ParameterAttribute :type transform_param_attr: ParameterAttribute
:return: a context vector :return: a context vector
:rtype: LayerOutput
""" """
assert encoded_proj.size == decoder_state.size assert encoded_proj.size == decoder_state.size
proj_size = encoded_proj.size proj_size = encoded_proj.size
...@@ -1396,6 +1398,88 @@ def simple_attention(encoded_sequence, ...@@ -1396,6 +1398,88 @@ def simple_attention(encoded_sequence,
input=scaled, pooling_type=SumPooling(), name="%s_pooling" % name) input=scaled, pooling_type=SumPooling(), name="%s_pooling" % name)
@wrap_name_default()
def dot_product_attention(encoded_sequence,
attended_sequence,
transformed_state,
softmax_param_attr=None,
name=None):
"""
Calculate and return a context vector with dot-product attention mechanism.
The dimension of the context vector equals to that of the attended_sequence.
.. math::
a(s_{i-1},h_{j}) & = s_{i-1}^\mathrm{T} h_{j}
e_{i,j} & = a(s_{i-1}, h_{j})
a_{i,j} & = \\frac{exp(e_{i,j})}{\\sum_{k=1}^{T_x}{exp(e_{i,k})}}
c_{i} & = \\sum_{j=1}^{T_{x}}a_{i,j}z_{j}
where :math:`h_{j}` is the jth element of encoded_sequence,
:math:`z_{j}` is the jth element of attended_sequence,
:math:`s_{i-1}` is transformed_state.
The example usage is:
.. code-block:: python
context = dot_product_attention(encoded_sequence=enc_seq,
attended_sequence=att_seq,
transformed_state=state,)
:param name: A prefix attached to the name of each layer that defined inside
the dot_product_attention.
:type name: basestring
:param softmax_param_attr: The parameter attribute of sequence softmax
that is used to produce attention weight.
:type softmax_param_attr: ParameterAttribute
:param encoded_sequence: The output hidden vectors of the encoder.
:type encoded_sequence: LayerOutput
:param attended_sequence: The attention weight is computed by a feed forward neural
network which has two inputs : decoder's transformed hidden
state of previous time step and encoder's output.
attended_sequence is the sequence to be attended.
:type attended_sequence: LayerOutput
:param transformed_state: The transformed hidden state of decoder in previous time step.
Since the dot-product operation will be performed on it and the
encoded_sequence, their dimensions must be equal. For flexibility,
we suppose transformations of the decoder's hidden state have been
done outside dot_product_attention and no more will be performed
inside. Then users can use either the original or transformed one.
:type transformed_state: LayerOutput
:return: The context vector.
:rtype: LayerOutput
"""
assert transformed_state.size == encoded_sequence.size
expanded = expand_layer(
input=transformed_state,
expanded_as=encoded_sequence,
name='%s_expand' % name)
m = linear_comb_layer(
weights=expanded, vectors=encoded_sequence, name='%s_dot-product')
attention_weight = fc_layer(
input=m,
size=1,
act=SequenceSoftmaxActivation(),
param_attr=softmax_param_attr,
name="%s_softmax" % name,
bias_attr=False)
scaled = scaling_layer(
weight=attention_weight,
input=attended_sequence,
name='%s_scaling' % name)
return pooling_layer(
input=scaled, pooling_type=SumPooling(), name="%s_pooling" % name)
def inputs(layers, *args): def inputs(layers, *args):
""" """
Declare the inputs of network. The order of input should be as same as Declare the inputs of network. The order of input should be as same as
......
import paddle.v2.framework.core as core
from paddle.v2.framework.framework import Block, Program
g_scope = core.Scope()
class Executor(object):
def __init__(self, places):
if not isinstance(places, list) and not isinstance(places, tuple):
places = [places]
act_places = []
for each in places:
p = core.Place()
p.set_place(each)
act_places.append(p)
self.executor = core.Executor(act_places)
def run(self,
program,
feed,
fetch_list,
feed_var_name='feed',
fetch_var_name='fetch',
scope=None):
if not isinstance(program, Program):
raise TypeError()
if scope is None:
scope = g_scope
program = program.clone()
global_block = program.global_block()
feed_var = global_block.create_var(
name=feed_var_name,
type=core.VarDesc.VarType.FEED_MINIBATCH,
persistable=True)
for i, name in enumerate(feed):
out = global_block.var(name)
global_block.prepend_op(
'feed',
inputs={'X': [feed_var]},
outputs={'Out': [out]},
attrs={'col': i})
core.set_feed_variable(scope, feed[name], feed_var.name, i)
fetch_var = global_block.create_var(
name=fetch_var_name,
type=core.VarDesc.VarType.FETCH_LIST,
persistable=True)
for i, var in enumerate(fetch_list):
global_block.append_op(
type='fetch',
inputs={'X': [var]},
outputs={'Out': [fetch_var]},
attrs={'col': i})
self.executor.run(program.desc, scope, 0)
return [
core.get_fetch_variable(scope, fetch_var_name, i)
for i in xrange(len(fetch_list))
]
...@@ -15,6 +15,7 @@ class Variable(object): ...@@ -15,6 +15,7 @@ class Variable(object):
shape=None, shape=None,
dtype=None, dtype=None,
lod_level=None, lod_level=None,
persistable=None,
**kwargs): **kwargs):
self.block = block self.block = block
...@@ -70,6 +71,17 @@ class Variable(object): ...@@ -70,6 +71,17 @@ class Variable(object):
"lod_level is {2}. They are not " "lod_level is {2}. They are not "
"matched".format(self.name, self.lod_level, "matched".format(self.name, self.lod_level,
lod_level)) lod_level))
if persistable is not None:
if is_new_var:
self.desc.set_persistable(persistable)
else:
if persistable != self.persistable:
raise ValueError(
"Variable {0} has been created before."
"The previous persistable is {1}; the new "
"persistable is {2}. They are not matched".format(
self.name, self.persistable, persistable))
self.block.vars[name] = self self.block.vars[name] = self
self.op = None self.op = None
...@@ -80,6 +92,10 @@ class Variable(object): ...@@ -80,6 +92,10 @@ class Variable(object):
__repr__ = __str__ __repr__ = __str__
@property
def persistable(self):
return self.desc.persistable()
@property @property
def name(self): def name(self):
return self.desc.name() return self.desc.name()
...@@ -232,7 +248,7 @@ class Operator(object): ...@@ -232,7 +248,7 @@ class Operator(object):
if attrs is not None: if attrs is not None:
for attr in proto.attrs: for attr in proto.attrs:
attr_name = attr.name attr_name = attr.name
if not attr_name in attrs: if (not attr_name in attrs) or (attrs[attr_name] is None):
continue continue
if not isinstance(attrs[attr_name], Block): if not isinstance(attrs[attr_name], Block):
self.desc.set_attr(attr_name, attrs[attr_name]) self.desc.set_attr(attr_name, attrs[attr_name])
...@@ -240,6 +256,7 @@ class Operator(object): ...@@ -240,6 +256,7 @@ class Operator(object):
self.desc.set_block_attr(attr_name, attrs[attr_name].desc) self.desc.set_block_attr(attr_name, attrs[attr_name].desc)
self.desc.check_attrs() self.desc.check_attrs()
if type not in {'feed', 'fetch'}:
self.desc.infer_shape(self.block.desc) self.desc.infer_shape(self.block.desc)
def __str__(self): def __str__(self):
...@@ -306,6 +323,17 @@ class Block(object): ...@@ -306,6 +323,17 @@ class Block(object):
def idx(self): def idx(self):
return self.desc.id return self.desc.id
def var(self, name):
if not isinstance(name, basestring):
raise TypeError()
v = self.vars.get(name, None)
if v is None:
raise ValueError("var %s not in this block" % name)
return v
def all_parameters(self):
return {v for k, v in self.vars.iteritems() if isinstance(v, Parameter)}
def create_var(self, *args, **kwargs): def create_var(self, *args, **kwargs):
return Variable(self, *args, **kwargs) return Variable(self, *args, **kwargs)
...@@ -314,7 +342,10 @@ class Block(object): ...@@ -314,7 +342,10 @@ class Block(object):
def create_parameter(self, *args, **kwargs): def create_parameter(self, *args, **kwargs):
global_block = self.program.global_block() global_block = self.program.global_block()
return Parameter(global_block, *args, **kwargs) param = Parameter(global_block, *args, **kwargs)
if 'init_attr' in kwargs:
self._prepend_initialize_ops_(param, kwargs['init_attr'])
return param
def append_op(self, *args, **kwargs): def append_op(self, *args, **kwargs):
op_desc = self.desc.append_op() op_desc = self.desc.append_op()
...@@ -335,7 +366,11 @@ class Block(object): ...@@ -335,7 +366,11 @@ class Block(object):
self.create_var(name=var.name(), desc=var, type=var.type()) self.create_var(name=var.name(), desc=var, type=var.type())
# sync operators from cpp # sync operators from cpp
ops_in_cpp = self.desc.all_ops() ops_in_cpp = []
for op_idx in range(0, self.desc.op_size()):
ops_in_cpp.append(self.desc.op(op_idx))
if len(self.ops) != 0:
first_op_in_python = self.ops[0].desc first_op_in_python = self.ops[0].desc
last_op_in_python = self.ops[len(self.ops) - 1].desc last_op_in_python = self.ops[len(self.ops) - 1].desc
start_index = None start_index = None
...@@ -348,6 +383,9 @@ class Block(object): ...@@ -348,6 +383,9 @@ class Block(object):
assert start_index is not None assert start_index is not None
assert end_index is not None assert end_index is not None
assert start_index <= end_index assert start_index <= end_index
else:
start_index = 0
end_index = -1
# sync ops append to the head of cpp_ops # sync ops append to the head of cpp_ops
for index in range((start_index - 1 - 1), -1, -1): for index in range((start_index - 1 - 1), -1, -1):
...@@ -365,20 +403,21 @@ class Block(object): ...@@ -365,20 +403,21 @@ class Block(object):
for index in range(len(self.ops)): for index in range(len(self.ops)):
assert self.ops[index].desc == ops_in_cpp[index] assert self.ops[index].desc == ops_in_cpp[index]
def _prepend_initialize_ops_(self, param, init_attr):
op_type = init_attr['type']
init_attr['shape'] = param.shape
init_attr['data_type'] = int(param.data_type)
op = self.prepend_op(
type=op_type,
inputs=None,
outputs={'Out': [param]},
attrs=init_attr)
param.op = op
class Program(object):
@classmethod
def instance(cls):
# From https://stackoverflow.com/questions/8212053
# Making Program as a Singleton class.
if not hasattr(cls, '_instance'):
cls._instance = cls()
return cls._instance
def __init__(self, desc=None): class Program(object):
if desc is None: def __init__(self):
desc = core.ProgramDesc.instance() self.desc = core.ProgramDesc()
self.desc = desc
self.blocks = [Block(self, 0)] self.blocks = [Block(self, 0)]
self.current_block_idx = 0 self.current_block_idx = 0
...@@ -387,16 +426,32 @@ class Program(object): ...@@ -387,16 +426,32 @@ class Program(object):
proto = framework_pb2.ProgramDesc.FromString(str(protostr)) proto = framework_pb2.ProgramDesc.FromString(str(protostr))
return proto.__str__() return proto.__str__()
__repr__ = __str__ def clone(self):
p = Program()
p.desc = core.ProgramDesc(self.desc)
p.blocks = [Block(p, i) for i in xrange(self.desc.num_blocks())]
p.sync_with_cpp()
return p
def __repr__(self):
return str(self)
def global_block(self): def global_block(self):
return self.blocks[0] return self.blocks[0]
def block(self, index):
return self.blocks[index]
def current_block(self): def current_block(self):
return self.blocks[self.current_block_idx] return self.blocks[self.current_block_idx]
def append_backward(self, target, no_grad_set): def append_backward(self, target, no_grad_set=None):
"""
return map(param_name -> (grad_name, block_index, op_index))
"""
assert isinstance(target, Variable) assert isinstance(target, Variable)
if no_grad_set is None:
no_grad_set = set()
param_to_grad_info = self.desc.append_backward(target.desc, no_grad_set) param_to_grad_info = self.desc.append_backward(target.desc, no_grad_set)
self.sync_with_cpp() self.sync_with_cpp()
return param_to_grad_info return param_to_grad_info
...@@ -429,29 +484,14 @@ class Parameter(Variable): ...@@ -429,29 +484,14 @@ class Parameter(Variable):
if each < 0: if each < 0:
raise ValueError("Parameter shape should not be related with " raise ValueError("Parameter shape should not be related with "
"batch-size") "batch-size")
Variable.__init__(self, block, shape=shape, dtype=dtype, **kwargs)
Variable.__init__(
self, block, persistable=True, shape=shape, dtype=dtype, **kwargs)
self.trainable = kwargs.get('trainable', True) self.trainable = kwargs.get('trainable', True)
self.init_attr = kwargs.get('initialize_attr', {
'type': 'uniform_random',
'min': -1.0,
'max': 1.0
})
self.optimize_attr = kwargs.get('optimize_attr', {'learning_rate': 1.0}) self.optimize_attr = kwargs.get('optimize_attr', {'learning_rate': 1.0})
self._append_initialize_ops_()
def _append_initialize_ops_(self):
attr = self.init_attr
op_type = attr.pop('type', None)
block = self.block
assert isinstance(block, Block)
shape = self.shape
attr['dims'] = shape
attr['data_type'] = int(self.data_type)
op = block.prepend_op(
type=op_type, inputs=None, outputs={'Out': [self]}, attrs=attr)
self.op = op
# program is a global instance. # program is a global instance.
g_program = Program.instance() g_program = Program()
g_init_program = Program()
from paddle.v2.framework.framework import Variable, OpProtoHolder, g_program from paddle.v2.framework.framework import Variable, OpProtoHolder, g_program, g_init_program
import paddle.v2.framework.core as core import paddle.v2.framework.core as core
import copy import copy
import itertools import itertools
...@@ -29,6 +29,14 @@ class LayerHelper(object): ...@@ -29,6 +29,14 @@ class LayerHelper(object):
else: else:
return prog return prog
@property
def init_program(self):
prog = self.kwargs.get('init_program', None)
if prog is None:
return g_init_program
else:
return prog
def append_op(self, *args, **kwargs): def append_op(self, *args, **kwargs):
return self.program.current_block().append_op(*args, **kwargs) return self.program.current_block().append_op(*args, **kwargs)
...@@ -66,16 +74,14 @@ class LayerHelper(object): ...@@ -66,16 +74,14 @@ class LayerHelper(object):
actual = self.kwargs.get('param_attr', None) actual = self.kwargs.get('param_attr', None)
return actual if actual is not None else default return actual if actual is not None else default
def bias_attr(self, size, dtype): def bias_attr(self):
bias_attr = self.kwargs.get('bias_attr', False) bias_attr = self.kwargs.get('bias_attr', None)
if bias_attr is None or bias_attr: if bias_attr is True:
bias_attr = { bias_attr = {
'name': None, 'name': None,
'init_attr': { 'init_attr': {
'type': 'fill_constant', 'type': 'fill_constant',
'value': 0.0, 'value': 0.0
'shape': [size],
'dataType': dtype
} }
} }
return bias_attr return bias_attr
...@@ -113,29 +119,32 @@ class LayerHelper(object): ...@@ -113,29 +119,32 @@ class LayerHelper(object):
def create_parameter(self, attr, shape, dtype, suffix='w'): def create_parameter(self, attr, shape, dtype, suffix='w'):
if attr['name'] is None: if attr['name'] is None:
attr['name'] = unique_name(".".join([self.name, suffix])) attr['name'] = unique_name(".".join([self.name, suffix]))
return self.program.global_block().create_parameter( self.init_program.global_block().create_parameter(
name=attr['name'], name=attr['name'],
dtype=dtype, dtype=dtype,
shape=shape, shape=shape,
initialize_attr=attr['init_attr']) init_attr=attr['init_attr'])
return self.program.global_block().create_parameter(
name=attr['name'], dtype=dtype, shape=shape)
def create_tmp_variable(self, dtype): def create_tmp_variable(self, dtype):
return self.program.current_block().create_var( return self.program.current_block().create_var(
name=unique_name(".".join([self.name, 'tmp'])), dtype=dtype) name=unique_name(".".join([self.name, 'tmp'])),
dtype=dtype,
persistable=False)
def create_global_variable(self, *args, **kwargs): def create_global_variable(self, *args, **kwargs):
return self.program.global_block().create_var(*args, **kwargs) return self.program.global_block().create_var(
*args, persistable=False, **kwargs)
def append_bias_op(self, input_var): def append_bias_op(self, input_var):
bias_attr = self.bias_attr( size = list(input_var.shape[1:])
self.kwargs['size'], dtype=input_var.data_type) bias_attr = self.bias_attr()
if not bias_attr: if not bias_attr:
return input_var return input_var
b = self.create_parameter( b = self.create_parameter(
attr=bias_attr, attr=bias_attr, shape=size, dtype=input_var.data_type, suffix='b')
shape=[self.kwargs['size']],
dtype=input_var.data_type,
suffix='b')
tmp = self.create_tmp_variable(dtype=input_var.data_type) tmp = self.create_tmp_variable(dtype=input_var.data_type)
self.append_op( self.append_op(
type='elementwise_add', type='elementwise_add',
......
...@@ -3,17 +3,18 @@ import paddle.v2.framework.core as core ...@@ -3,17 +3,18 @@ import paddle.v2.framework.core as core
from paddle.v2.framework.framework import OpProtoHolder, Variable from paddle.v2.framework.framework import OpProtoHolder, Variable
import re import re
__all__ = ['fc_layer', 'data_layer', 'cross_entropy'] __all__ = ['fc', 'data', 'cross_entropy', 'conv2d', 'pool2d']
def fc_layer(input, def fc(input,
size, size,
param_attr=None, param_attr=None,
bias_attr=True, bias_attr=True,
name=None, name=None,
act=None, act=None,
num_flatten_dims=1, num_flatten_dims=1,
program=None): program=None,
init_program=None):
# create helper # create helper
helper = LayerHelper('fc', **locals()) helper = LayerHelper('fc', **locals())
...@@ -24,6 +25,7 @@ def fc_layer(input, ...@@ -24,6 +25,7 @@ def fc_layer(input,
for input_var, param_attr in helper.iter_inputs_and_params(): for input_var, param_attr in helper.iter_inputs_and_params():
input_shape = input_var.shape input_shape = input_var.shape
param_shape = list(input_shape[num_flatten_dims:]) + [size] param_shape = list(input_shape[num_flatten_dims:]) + [size]
w = helper.create_parameter( w = helper.create_parameter(
attr=param_attr, shape=param_shape, dtype=dtype) attr=param_attr, shape=param_shape, dtype=dtype)
tmp = helper.create_tmp_variable(dtype) tmp = helper.create_tmp_variable(dtype)
...@@ -34,7 +36,10 @@ def fc_layer(input, ...@@ -34,7 +36,10 @@ def fc_layer(input,
"Y": w, "Y": w,
}, },
outputs={"Out": tmp}, outputs={"Out": tmp},
attrs={'x_num_col_dims': num_flatten_dims}) attrs={
'x_num_col_dims': num_flatten_dims,
'y_num_col_dims': len(input_shape) - num_flatten_dims
})
mul_results.append(tmp) mul_results.append(tmp)
# sum # sum
...@@ -50,12 +55,15 @@ def fc_layer(input, ...@@ -50,12 +55,15 @@ def fc_layer(input,
return helper.append_activation(pre_activation) return helper.append_activation(pre_activation)
def data_layer(name, def data(name,
shape, shape,
data_type='float32', data_type='float32',
type=core.VarDesc.VarType.LOD_TENSOR, type=core.VarDesc.VarType.LOD_TENSOR,
program=None): append_batch_size=True,
program=None,
init_program=None):
helper = LayerHelper('data', **locals()) helper = LayerHelper('data', **locals())
if append_batch_size:
shape = [-1] + shape # append batch size as -1 shape = [-1] + shape # append batch size as -1
return helper.create_global_variable( return helper.create_global_variable(
name=name, shape=shape, dtype=data_type, type=type) name=name, shape=shape, dtype=data_type, type=type)
...@@ -111,6 +119,7 @@ def _create_op_func_(op_type): ...@@ -111,6 +119,7 @@ def _create_op_func_(op_type):
_create_op_func_('mean') _create_op_func_('mean')
_create_op_func_('mul')
def cross_entropy(input, label, **kwargs): def cross_entropy(input, label, **kwargs):
...@@ -141,3 +150,93 @@ def square_error_cost(input, label, **kwargs): ...@@ -141,3 +150,93 @@ def square_error_cost(input, label, **kwargs):
outputs={'Y': [square_out]}, outputs={'Y': [square_out]},
attrs={'factor': 2.0}) attrs={'factor': 2.0})
return square_out return square_out
def conv2d(input,
num_filters,
name=None,
filter_size=[1, 1],
act=None,
groups=None,
stride=[1, 1],
padding=None,
bias_attr=None,
param_attr=None,
program=None,
init_program=None):
helper = LayerHelper('conv2d', **locals())
dtype = helper.input_dtype()
num_channels = input.shape[1]
if groups is None:
num_filter_channels = num_channels
else:
if num_channels % groups is not 0:
raise ValueError("num_channels must be divisible by groups.")
num_filter_channels = num_channels / groups
if isinstance(filter_size, int):
filter_size = [filter_size, filter_size]
if isinstance(stride, int):
stride = [stride, stride]
if isinstance(padding, int):
padding = [padding, padding]
input_shape = input.shape
filter_shape = [num_filters, num_filter_channels] + filter_size
filter = helper.create_parameter(
attr=helper.param_attr, shape=filter_shape, dtype=dtype)
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type='conv2d',
inputs={
'Input': input,
'Filter': filter,
},
outputs={"Output": pre_bias},
attrs={'strides': stride,
'paddings': padding,
'groups': groups})
pre_act = helper.append_bias_op(pre_bias)
return helper.append_activation(pre_act)
def pool2d(input,
pool_size,
pool_type,
pool_stride=[1, 1],
pool_padding=[0, 0],
global_pooling=False,
program=None,
init_program=None):
if pool_type not in ["max", "avg"]:
raise ValueError(
"Unknown pool_type: '%s'. It can only be 'max' or 'avg'.",
str(pool_type))
if isinstance(pool_size, int):
pool_size = [pool_size, pool_size]
if isinstance(pool_stride, int):
pool_stride = [pool_stride, pool_stride]
if isinstance(pool_padding, int):
pool_padding = [pool_padding, pool_padding]
helper = LayerHelper('conv2d', **locals())
dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype)
helper.append_op(
type="pool2d",
inputs={"X": input},
outputs={"Out": pool_out},
attrs={
"pooling_type": pool_type,
"ksize": pool_size,
"global_pooling": global_pooling,
"strides": pool_stride,
"paddings": pool_padding
})
return pool_out
import paddle.v2.framework.layers as layers
def simple_img_conv_pool(input,
filter_size,
num_filters,
pool_size,
pool_stride,
act,
program=None,
init_program=None):
conv_out = layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
act=act,
program=program,
init_program=init_program)
pool_out = layers.pool2d(
input=conv_out,
pool_size=pool_size,
pool_type='max',
pool_stride=pool_stride,
program=program,
init_program=init_program)
return pool_out
import paddle.v2.framework.framework as framework
from collections import defaultdict
__all__ = ['SGDOptimizer', 'MomentumOptimizer', 'AdagradOptimizer']
class Optimizer(object):
"""Optimizer Base class.
Define the common interface of an optimizer.
User should not use this class directly,
but need to use one of it's implementation.
"""
def __init__(self):
# Dictionary of accumulators. Some optimizer subclasses need to
# allocate and manage extra variables associated with the parameters
# to train. These variables are called accumulators.
# {accum_name : { paramter_name : accumulator_for_parameter, ...}, ...}
self._accumulators = defaultdict(lambda: dict())
def _append_optimize_op(self, block, param_and_grad):
""" append optimize operator to block and return all the added optimize_op
"""
raise NotImplementedError()
def _initialize_tensors(self, block):
"""Create all necessary tensors, that will be shared for all parameter updates.
Tensors like learning rate should be initialized here.
Args:
block: the block in which the loss variable is present
"""
pass
def _create_accumulators(self, block, parameters):
"""Create all accumulators needed by the parameters
Args:
block: the block in which the loss variable is present
parameters: list of parameter variables for the optimizer
"""
pass
def _add_accumulator(self, block, name, param, dtype=None, fill_value=0.0):
"""Utility function to add an accumulator for a parameter
Args:
block: the block in which the loss variable is present
name: name of the accumulator
param: parameter variable for which accumulator is to be added
dtype: data type of the accumulator variable
fill_value: value to initialize the accumulator variable
"""
if (name in self._accumulators and
param.name in self._accumulators[name]):
raise Exception("Accumulator {} already exists for parmeter {}".
format(name, param.name))
global_block = block.program.global_block()
param_shape = list(param.shape)
param_acc = global_block.create_var(
dtype=dtype, shape=param_shape, lod_level=0)
# Initialize the accumulator with fill_value
# FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
global_block.append_op(
type="fill_constant",
outputs={"Out": param_acc},
attrs={"shape": param_shape,
"value": fill_value})
# Add to accumulators dict
self._accumulators[name][param.name] = param_acc
def _get_accumulator(self, name, param):
"""Utility function to fetch an accumulator for a parameter
Args:
name: name of the accumulator
param: parameter variable for which accumulator is to be fetched
Returns:
accumulator variable for the parameter
"""
if (name not in self._accumulators or
param.name not in self._accumulators[name]):
raise Exception("Accumulator {} does not exist for parameter {}".
format(name, param.name))
return self._accumulators[name][param.name]
def create_backward_pass(self, loss, parameter_list=None, no_grad_set=None):
"""Create and add gradient Operators in BlockDesc to compute
gradients of `loss` for parameters in parameter_list
Args:
loss: an variable generated by cost function.
no_grad_set: variable that should not create gradient
parameter_list: parameters that need to compute gradient and
update to optimize the lost.
Returns:
list of (parameters, gradients) pair.
"""
assert isinstance(loss, framework.Variable)
param_grad_map = loss.block.program.append_backward(loss, no_grad_set or
set())
if parameter_list is not None:
parameters = parameter_list
else:
params = loss.block.program.global_block().all_parameters()
parameters = [param.name for param in params]
params_and_grads = []
for param in parameters:
if param not in param_grad_map:
raise Exception("param %s is not in map" % param)
grad_info = param_grad_map[param]
grad_block = loss.block.program.block(grad_info[1])
if not grad_block.has_var(grad_info[0]):
raise Exception("grad block[%d] did not have grad var %s" %
grad_info[1], grad_info[0])
# Get the param var from the global block
param_var = loss.block.program.global_block().var(param)
grad_var = grad_block.var(grad_info[0])
if loss.block.has_var(grad_info[0]):
params_and_grads.append((param_var, grad_var))
else:
params_and_grads.append((param_var, None))
return params_and_grads
def create_optimization_pass(self, parameters_and_grads, loss):
"""Add optimization operators to update gradients to variables.
Args:
loss: the target that this optimization is for.
parameters_and_grads: a list of (variable, gradient) pair to update.
Returns:
optmization_op_list: a list of optimization operator that will update
parameter using gradient.
"""
# This is a default implementation of create_optimization_pass that
# can be shared by most optimizers. This implementation assumes that
# the subclass will implement the _append_optimize_op method and the
# _initialize_tensors method. The subclass can extend the
# _create_accumulators method if it needs to create accumulators
# for parameters.
# Create any accumulators
self._create_accumulators(loss.block,
[p[0] for p in parameters_and_grads])
# Create any necessary tensors
self._initialize_tensors(loss.block)
optimize_ops = []
for param_and_grad in parameters_and_grads:
if param_and_grad[1] is not None:
optimize_op = self._append_optimize_op(loss.block,
param_and_grad)
optimize_ops.append(optimize_op)
return optimize_ops
def minimize(self, loss, parameter_list=None, no_grad_set=None):
"""Add operations to minimize `loss` by updating `parameter_list`.
This method combines interface `create_backward_pass()` and
`create_optimization_pass()` into one.
"""
params_grads = self.create_backward_pass(loss, parameter_list,
no_grad_set or set())
optimize_ops = self.create_optimization_pass(params_grads, loss)
return optimize_ops
class SGDOptimizer(Optimizer):
""" Simple SGD optimizer without any state.
"""
def __init__(self, learning_rate):
assert learning_rate is not None
super(SGDOptimizer, self).__init__()
self.type = "sgd"
self._learning_rate = learning_rate
def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1]
# create a variable for learning_rate
self._lr = block.create_var(
dtype="float32", shape=lr_shape, lod_level=0)
# create an op to init the learning_rate
# FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
block.append_op(
type="fill_constant",
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block)
# create the optimize op
sgd_op = block.append_op(
type=self.type,
inputs={
"Param": param_and_grad[0],
"Grad": param_and_grad[1],
"LearningRate": self._lr
},
outputs={"ParamOut": param_and_grad[0]})
return sgd_op
class MomentumOptimizer(Optimizer):
"""Simple Momentum optimizer with velocity state
"""
_velocity_acc_str = "velocity"
def __init__(self, learning_rate, momentum):
assert learning_rate is not None
assert momentum is not None
super(MomentumOptimizer, self).__init__()
self.type = "momentum"
self._learning_rate = learning_rate
self._momentum = momentum
def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1]
# create a variable for learning_rate
self._lr = block.create_var(
dtype="float32", shape=lr_shape, lod_level=0)
# create an op to init the learning_rate
# FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
block.append_op(
type="fill_constant",
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block)
for p in parameters:
self._add_accumulator(block, self._velocity_acc_str, p, 'float32')
def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block)
velocity_acc = self._get_accumulator(self._velocity_acc_str,
param_and_grad[0])
# create the momentum optimize op
momentum_op = block.append_op(
type=self.type,
inputs={
"Param": param_and_grad[0],
"Grad": param_and_grad[1],
"Velocity": velocity_acc,
"LearningRate": self._lr
},
outputs={
"ParamOut": param_and_grad[0],
"VelocityOut": velocity_acc
},
attrs={"mu": self._momentum})
return momentum_op
class AdagradOptimizer(Optimizer):
"""Simple Adagrad optimizer with moment state
"""
_moment_acc_str = "moment"
def __init__(self, learning_rate, epsilon=1.0e-6):
assert learning_rate is not None
assert epsilon is not None
super(AdagradOptimizer, self).__init__()
self.type = "adagrad"
self._learning_rate = learning_rate
self._epsilon = epsilon
def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1]
# create a variable for learning_rate
self._lr = block.create_var(
dtype="float32", shape=lr_shape, lod_level=0)
# create an op to init the learning_rate
# FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
block.append_op(
type="fill_constant",
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block)
for p in parameters:
self._add_accumulator(block, self._moment_acc_str, p, 'float32')
def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block)
moment_acc = self._get_accumulator(self._moment_acc_str,
param_and_grad[0])
# create the adagrad optimizer op
adagrad_op = block.append_op(
type=self.type,
inputs={
"Param": param_and_grad[0],
"Grad": param_and_grad[1],
"Moment": moment_acc,
"LearningRate": self._lr
},
outputs={"ParamOut": param_and_grad[0],
"MomentOut": moment_acc},
attrs={"epsilon": self._epsilon})
return adagrad_op
...@@ -33,14 +33,12 @@ class TestAdamOp1(OpTest): ...@@ -33,14 +33,12 @@ class TestAdamOp1(OpTest):
self.attrs = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2} self.attrs = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2}
param_out, moment1_out, moment2_out, beta1_pow_out, \ param_out, moment1_out, \
beta2_pow_out = adam_step(self.inputs, self.attrs) moment2_out = adam_step(self.inputs, self.attrs)
self.outputs = { self.outputs = {
'Moment1Out': moment1_out, 'Moment1Out': moment1_out,
'Moment2Out': moment2_out, 'Moment2Out': moment2_out,
'Beta1PowOut': beta1_pow_out,
'Beta2PowOut': beta2_pow_out,
'ParamOut': param_out 'ParamOut': param_out
} }
...@@ -78,14 +76,12 @@ class TestAdamOp2(OpTest): ...@@ -78,14 +76,12 @@ class TestAdamOp2(OpTest):
attributes = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2} attributes = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2}
param_out, moment1_out, moment2_out, beta1_pow_out, \ param_out, moment1_out, \
beta2_pow_out = adam_step(self.inputs, attributes) moment2_out = adam_step(self.inputs, attributes)
self.outputs = { self.outputs = {
'Moment1Out': moment1_out, 'Moment1Out': moment1_out,
'Moment2Out': moment2_out, 'Moment2Out': moment2_out,
'Beta1PowOut': beta1_pow_out,
'Beta2PowOut': beta2_pow_out,
'ParamOut': param_out 'ParamOut': param_out
} }
...@@ -127,14 +123,12 @@ class TestAdamOpMultipleSteps(OpTest): ...@@ -127,14 +123,12 @@ class TestAdamOpMultipleSteps(OpTest):
def test_check_output(self): def test_check_output(self):
for _ in range(self.num_steps): for _ in range(self.num_steps):
param_out, moment1_out, moment2_out, beta1_pow_out, \ param_out, moment1_out, \
beta2_pow_out = adam_step(self.inputs, self.attrs) moment2_out = adam_step(self.inputs, self.attrs)
self.outputs = { self.outputs = {
'Moment1Out': moment1_out, 'Moment1Out': moment1_out,
'Moment2Out': moment2_out, 'Moment2Out': moment2_out,
'Beta1PowOut': beta1_pow_out,
'Beta2PowOut': beta2_pow_out,
'ParamOut': param_out 'ParamOut': param_out
} }
...@@ -145,8 +139,10 @@ class TestAdamOpMultipleSteps(OpTest): ...@@ -145,8 +139,10 @@ class TestAdamOpMultipleSteps(OpTest):
self.inputs['Param'] = param_out self.inputs['Param'] = param_out
self.inputs['Moment1'] = moment1_out self.inputs['Moment1'] = moment1_out
self.inputs['Moment2'] = moment2_out self.inputs['Moment2'] = moment2_out
self.inputs['Beta1Pow'] = beta1_pow_out
self.inputs['Beta2Pow'] = beta2_pow_out # Update powers of Beta1 and Beta2 for next time step
self.inputs['Beta1Pow'] *= self.attrs['beta1']
self.inputs['Beta2Pow'] *= self.attrs['beta1']
# Randomize gradient for next step # Randomize gradient for next step
self.inputs['Grad'] = np.random.uniform( self.inputs['Grad'] = np.random.uniform(
...@@ -175,11 +171,9 @@ def adam_step(inputs, attributes): ...@@ -175,11 +171,9 @@ def adam_step(inputs, attributes):
moment1_out = beta1 * moment1 + (1 - beta1) * grad moment1_out = beta1 * moment1 + (1 - beta1) * grad
moment2_out = beta2 * moment2 + (1 - beta2) * np.square(grad) moment2_out = beta2 * moment2 + (1 - beta2) * np.square(grad)
beta1_pow_out = beta1_pow * beta1 lr_t = lr * np.sqrt(1 - beta2_pow) / (1 - beta1_pow)
beta2_pow_out = beta2_pow * beta2
lr_t = lr * np.sqrt(1 - beta2_pow_out) / (1 - beta1_pow_out)
param_out = param - lr_t * (moment1_out / (np.sqrt(moment2_out) + epsilon)) param_out = param - lr_t * (moment1_out / (np.sqrt(moment2_out) + epsilon))
return param_out, moment1_out, moment2_out, beta1_pow_out, beta2_pow_out return param_out, moment1_out, moment2_out
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -31,14 +31,13 @@ class TestAdamaxOp1(OpTest): ...@@ -31,14 +31,13 @@ class TestAdamaxOp1(OpTest):
self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon} self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon}
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step( param_out, moment_out, inf_norm_out = adamax_step(self.inputs,
self.inputs, self.attrs) self.attrs)
self.outputs = { self.outputs = {
'ParamOut': param_out, 'ParamOut': param_out,
'MomentOut': moment_out, 'MomentOut': moment_out,
'InfNormOut': inf_norm_out, 'InfNormOut': inf_norm_out
'Beta1PowOut': beta1_pow_out
} }
def test_check_output(self): def test_check_output(self):
...@@ -73,14 +72,12 @@ class TestAdamaxOp2(OpTest): ...@@ -73,14 +72,12 @@ class TestAdamaxOp2(OpTest):
} }
attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon} attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon}
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step( param_out, moment_out, inf_norm_out = adamax_step(self.inputs, attrs)
self.inputs, attrs)
self.outputs = { self.outputs = {
'ParamOut': param_out, 'ParamOut': param_out,
'MomentOut': moment_out, 'MomentOut': moment_out,
'InfNormOut': inf_norm_out, 'InfNormOut': inf_norm_out
'Beta1PowOut': beta1_pow_out
} }
def test_check_output(self): def test_check_output(self):
...@@ -117,19 +114,15 @@ class TestAdamaxOpMultipleSteps(OpTest): ...@@ -117,19 +114,15 @@ class TestAdamaxOpMultipleSteps(OpTest):
self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon} self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon}
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step(
self.inputs, self.attrs)
def test_check_output(self): def test_check_output(self):
for _ in range(self.num_steps): for _ in range(self.num_steps):
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step( param_out, moment_out, inf_norm_out = adamax_step(self.inputs,
self.inputs, self.attrs) self.attrs)
self.outputs = { self.outputs = {
'ParamOut': param_out, 'ParamOut': param_out,
'MomentOut': moment_out, 'MomentOut': moment_out,
'InfNormOut': inf_norm_out, 'InfNormOut': inf_norm_out
'Beta1PowOut': beta1_pow_out
} }
# Verify output for this step # Verify output for this step
...@@ -139,7 +132,9 @@ class TestAdamaxOpMultipleSteps(OpTest): ...@@ -139,7 +132,9 @@ class TestAdamaxOpMultipleSteps(OpTest):
self.inputs['Param'] = param_out self.inputs['Param'] = param_out
self.inputs['Moment'] = moment_out self.inputs['Moment'] = moment_out
self.inputs['InfNorm'] = inf_norm_out self.inputs['InfNorm'] = inf_norm_out
self.inputs['Beta1Pow'] = beta1_pow_out
# Update Beta1 Power accumulator for next step
self.inputs['Beta1Pow'] *= self.attrs['beta1']
# Randomize gradient for next step # Randomize gradient for next step
self.inputs['Grad'] = np.random.uniform( self.inputs['Grad'] = np.random.uniform(
...@@ -167,11 +162,10 @@ def adamax_step(inputs, attributes): ...@@ -167,11 +162,10 @@ def adamax_step(inputs, attributes):
moment_out = beta1 * moment + (1 - beta1) * grad moment_out = beta1 * moment + (1 - beta1) * grad
inf_norm_out = np.maximum(beta2 * inf_norm + epsilon, np.abs(grad)) inf_norm_out = np.maximum(beta2 * inf_norm + epsilon, np.abs(grad))
beta1_pow_out = beta1_pow * beta1 lr_t = (lr / (1 - beta1_pow))
lr_t = (lr / (1 - beta1_pow_out))
param_out = param - lr_t * np.divide(moment_out, inf_norm_out) param_out = param - lr_t * np.divide(moment_out, inf_norm_out)
return param_out, moment_out, inf_norm_out, beta1_pow_out return param_out, moment_out, inf_norm_out
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -21,7 +21,7 @@ class TestCrossEntropyOp1(OpTest): ...@@ -21,7 +21,7 @@ class TestCrossEntropyOp1(OpTest):
self.inputs = {"X": X, "Label": label} self.inputs = {"X": X, "Label": label}
self.outputs = {"Y": cross_entropy} self.outputs = {"Y": cross_entropy}
self.attrs = {"softLabel": False} self.attrs = {"soft_label": False}
def test_check_output(self): def test_check_output(self):
self.check_output() self.check_output()
......
...@@ -4,6 +4,12 @@ import unittest ...@@ -4,6 +4,12 @@ import unittest
from paddle.v2.framework.op import Operator, DynamicRecurrentOp from paddle.v2.framework.op import Operator, DynamicRecurrentOp
import numpy as np import numpy as np
# for siplicity, just one level LoD
lod_py = [[0, 4, 7, 9, 10]]
input_dim = 30
num_sents = len(lod_py[0]) - 1
weight_dim = 15
def create_tensor(scope, name, shape, np_data): def create_tensor(scope, name, shape, np_data):
tensor = scope.var(name).get_tensor() tensor = scope.var(name).get_tensor()
...@@ -12,6 +18,17 @@ def create_tensor(scope, name, shape, np_data): ...@@ -12,6 +18,17 @@ def create_tensor(scope, name, shape, np_data):
return tensor return tensor
class PyRNNStep(object):
def __init__(self):
self.x = np.random.normal(size=(lod_py[0][-1],
input_dim)).astype("float32")
self.W = np.random.normal(size=(input_dim, input_dim)).astype("float32")
self.U = np.random.normal(size=(input_dim, input_dim)).astype("float32")
self.h_boot = np.random.normal(size=(num_sents,
input_dim)).astype("float32")
class DynamicRecurrentOpTest(unittest.TestCase): class DynamicRecurrentOpTest(unittest.TestCase):
''' '''
Test RNNOp Test RNNOp
...@@ -23,17 +40,13 @@ class DynamicRecurrentOpTest(unittest.TestCase): ...@@ -23,17 +40,13 @@ class DynamicRecurrentOpTest(unittest.TestCase):
- U - U
vars: vars:
- x - x
memories: states:
- h - h
outputs: outputs:
- h - h
''' '''
# for siplicity, just one level LoD py = PyRNNStep()
lod_py = [[0, 4, 7, 9, 10]]
input_dim = 30
num_sents = len(lod_py[0]) - 1
weight_dim = 15
def forward(self): def forward(self):
self.scope = core.Scope() self.scope = core.Scope()
...@@ -42,64 +55,55 @@ class DynamicRecurrentOpTest(unittest.TestCase): ...@@ -42,64 +55,55 @@ class DynamicRecurrentOpTest(unittest.TestCase):
self.create_step_net() self.create_step_net()
ctx = core.DeviceContext.create(core.CPUPlace()) ctx = core.DeviceContext.create(core.CPUPlace())
self.rnnop.run(self.scope, ctx) self.rnnop.run(self.scope, ctx)
state = self.rnnop.get_state("h@mem") state = self.rnnop.get_state("h@state")
print 'state size: ', state.size() print 'state size: ', state.size()
step_inputs = self.rnnop.get_step_input("x") step_inputs = self.rnnop.get_step_input("x")
print "x size ", step_inputs.size() print "x size ", step_inputs.size()
for i in range(step_inputs.size()): for i in range(step_inputs.size()):
print "x %d" % i, np.array(step_inputs.read(i).get_dims()) print "x %d" % i, np.array(step_inputs.read(i).get_dims())
step_outputs = self.rnnop.get_step_output('h@mem') step_outputs = self.rnnop.get_step_output('h@state')
print 'step_outputs.size ', step_outputs.size() print 'step_outputs.size ', step_outputs.size()
output = self.scope.find_var("h@mem").get_tensor() output = self.scope.find_var("h@state").get_tensor()
print 'output', np.array(output).shape print 'output', np.array(output).shape
def create_global_variables(self): def create_global_variables(self):
x = np.random.normal(size=(self.lod_py[0][-1],
self.input_dim)).astype("float32")
W = np.random.normal(size=(self.input_dim,
self.input_dim)).astype("float32")
U = np.random.normal(size=(self.input_dim,
self.input_dim)).astype("float32")
h_boot = np.random.normal(size=(self.num_sents,
self.input_dim)).astype("float32")
# create inlink # create inlink
x_tensor = create_tensor(self.scope, "x", x_tensor = create_tensor(self.scope, "x", [num_sents, input_dim],
[self.num_sents, self.input_dim], x) self.py.x)
x_tensor.set_lod(self.lod_py) x_tensor.set_lod(lod_py)
create_tensor(self.scope, "W", [self.input_dim, self.input_dim], W) create_tensor(self.scope, "W", [input_dim, input_dim], self.py.W)
create_tensor(self.scope, "U", [self.input_dim, self.input_dim], U) create_tensor(self.scope, "U", [input_dim, input_dim], self.py.U)
create_tensor(self.scope, "h_boot", [self.num_sents, self.input_dim], create_tensor(self.scope, "h_boot", [num_sents, input_dim],
h_boot) self.py.h_boot)
self.scope.var("step_scopes") self.scope.var("step_scopes")
self.scope.var("h@mem") self.scope.var("h@state")
def create_rnn_op(self): def create_rnn_op(self):
# create RNNOp # create RNNOp
self.rnnop = DynamicRecurrentOp( self.rnnop = DynamicRecurrentOp(
# inputs # inputs
inlinks=["x"], inputs=["x"],
boot_memories=["h_boot"], initial_states=["h_boot"],
step_net="stepnet", step_net="step_unit",
# outputs # outputs
outlinks=["h@mem"], outputs=["h@state"],
step_scopes="step_scopes", step_scopes="step_scopes",
# attributes # attributes
pre_memories=["h@pre"], ex_states=["h@pre"],
memories=["h@mem"]) states=["h@state"])
def create_step_net(self): def create_step_net(self):
stepnet = core.Net.create() step_unit = core.Net.create()
x_fc_op = Operator("mul", X="x", Y="W", Out="Wx") x_fc_op = Operator("mul", X="x", Y="W", Out="Wx")
h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh") h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh")
sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum") sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum")
sig_op = Operator("sigmoid", X="sum", Y="h@mem") sig_op = Operator("sigmoid", X="sum", Y="h@state")
for op in [x_fc_op, h_fc_op, sum_op, sig_op]: for op in [x_fc_op, h_fc_op, sum_op, sig_op]:
stepnet.append_op(op) step_unit.append_op(op)
stepnet.complete_add_op(True) step_unit.complete_add_op(True)
self.rnnop.set_stepnet(stepnet) self.rnnop.set_step_unit(step_unit)
def test_forward(self): def test_forward(self):
print 'test recurrent op forward' print 'test recurrent op forward'
...@@ -107,5 +111,58 @@ class DynamicRecurrentOpTest(unittest.TestCase): ...@@ -107,5 +111,58 @@ class DynamicRecurrentOpTest(unittest.TestCase):
print 'pd_output', pd_output print 'pd_output', pd_output
class RecurrentGradientOpTest(unittest.TestCase):
py = PyRNNStep()
def create_forward_op(self):
# create RNNOp
self.forward_op = DynamicRecurrentOp(
# inputs
inputs=["x"],
initial_states=["h_boot"],
step_net="step_unit",
# outputs
outputs=["h@state"],
step_scopes="step_scopes",
# attributes
ex_states=["h@pre"],
states=["h@state"])
def create_gradient_op(self):
a = set()
backward_op = core.DynamicRecurrentOp.backward(self.forward_op, a)
def create_step_net(self):
step_unit = core.Net.create()
x_fc_op = Operator("mul", X="x", Y="W", Out="Wx")
h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh")
sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum")
sig_op = Operator("sigmoid", X="sum", Y="h@state")
for op in [x_fc_op, h_fc_op, sum_op, sig_op]:
step_unit.append_op(op)
step_unit.complete_add_op(True)
self.forward_op.set_step_unit(step_unit)
def create_global_variables(self):
# create inlink
x_tensor = create_tensor(self.scope, "x", [num_sents, input_dim],
self.py.x)
x_tensor.set_lod(lod_py)
create_tensor(self.scope, "W", [input_dim, input_dim], self.py.W)
create_tensor(self.scope, "U", [input_dim, input_dim], self.py.U)
create_tensor(self.scope, "h_boot", [num_sents, input_dim],
self.py.h_boot)
self.scope.var("step_scopes")
self.scope.var("h@state")
def test_grad(self):
self.scope = core.Scope()
self.create_forward_op()
self.create_global_variables()
self.create_step_net()
self.create_gradient_op()
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -92,5 +92,33 @@ class TestElementwiseAddOp_broadcast_3(TestElementwiseOp): ...@@ -92,5 +92,33 @@ class TestElementwiseAddOp_broadcast_3(TestElementwiseOp):
} }
class TestElementwiseAddOp_rowwise_add_0(TestElementwiseOp):
def setUp(self):
self.op_type = "elementwise_add"
self.inputs = {
'X': np.random.rand(2, 3, 4).astype(np.float32),
'Y': np.random.rand(3, 4).astype(np.float32)
}
self.attrs = {'axis': 1}
self.outputs = {
'Out': self.inputs['X'] + self.inputs['Y'].reshape(1, 3, 4)
}
class TestElementwiseAddOp_rowwise_add_1(TestElementwiseOp):
def setUp(self):
self.op_type = "elementwise_add"
self.inputs = {
'X': np.random.rand(2, 1).astype(np.float32),
'Y': np.random.rand(1).astype(np.float32)
}
self.attrs = {'axis': 1}
self.outputs = {
'Out': self.inputs['X'] + self.inputs['Y'].reshape(1, 1)
}
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
import unittest
from paddle.v2.framework.layers import mul, data
import paddle.v2.framework.core as core
from paddle.v2.framework.executor import Executor
from paddle.v2.framework.framework import g_program
import numpy
class TestExecutor(unittest.TestCase):
def test_mul(self):
a = data(name='a', shape=[784], data_type='float32')
b = data(
name='b',
shape=[784, 100],
data_type='float32',
append_batch_size=False)
out = mul(x=a, y=b)
place = core.CPUPlace()
a_np = numpy.random.random((100, 784)).astype('float32')
tensor_a = core.LoDTensor()
tensor_a.set(a_np, place)
b_np = numpy.random.random((784, 100)).astype('float32')
tensor_b = core.LoDTensor()
tensor_b.set(b_np, place)
exe = Executor(place)
outs = exe.run(g_program,
feed={'a': tensor_a,
'b': tensor_b},
fetch_list=[out])
out = numpy.array(outs[0])
self.assertEqual((100, 100), out.shape)
self.assertTrue(numpy.allclose(out, numpy.dot(a_np, b_np)))
if __name__ == '__main__':
unittest.main()
...@@ -5,6 +5,7 @@ import numpy as np ...@@ -5,6 +5,7 @@ import numpy as np
class TestFeedFetch(unittest.TestCase): class TestFeedFetch(unittest.TestCase):
def test_feed_fetch(self): def test_feed_fetch(self):
scope = core.Scope()
place = core.CPUPlace() place = core.CPUPlace()
input_array = np.ones((4, 4, 6)).astype("float32") input_array = np.ones((4, 4, 6)).astype("float32")
input_array[0, 0, 0] = 3 input_array[0, 0, 0] = 3
...@@ -12,9 +13,9 @@ class TestFeedFetch(unittest.TestCase): ...@@ -12,9 +13,9 @@ class TestFeedFetch(unittest.TestCase):
input_tensor = core.LoDTensor([[0, 2, 4]]) input_tensor = core.LoDTensor([[0, 2, 4]])
input_tensor.set(input_array, place) input_tensor.set(input_array, place)
core.set_feed_variable_float(input_tensor, "feed", 0) core.set_feed_variable(scope, input_tensor, "feed", 0)
output_tensor = core.get_fetch_variable("feed", 0) output_tensor = core.get_fetch_variable(scope, "feed", 0)
output_lod = output_tensor.lod() output_lod = output_tensor.lod()
self.assertEqual(0, output_lod[0][0]) self.assertEqual(0, output_lod[0][0])
......
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program
from paddle.v2.framework.executor import Executor
import numpy as np
init_program = Program()
program = Program()
x = layers.data(
name='x',
shape=[13],
data_type='float32',
program=program,
init_program=init_program)
y_predict = layers.fc(input=x,
size=1,
act=None,
program=program,
init_program=init_program)
y = layers.data(
name='y',
shape=[1],
data_type='float32',
program=program,
init_program=init_program)
cost = layers.square_error_cost(
input=y_predict, label=y, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 20
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(init_program, feed={}, fetch_list=[])
PASS_NUM = 100
for pass_id in range(PASS_NUM):
for data in train_reader():
x_data = np.array(map(lambda x: x[0], data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("float32")
tensor_x = core.LoDTensor()
tensor_x.set(x_data, place)
# print tensor_x.get_dims()
tensor_y = core.LoDTensor()
tensor_y.set(y_data, place)
# print tensor_y.get_dims()
outs = exe.run(program,
feed={'x': tensor_x,
'y': tensor_y},
fetch_list=[avg_cost])
out = np.array(outs[0])
if out[0] < 10.0:
exit(0) # if avg cost less than 10.0, we think our code is good.
exit(1)
import unittest
import numpy as np
from op_test import OpTest
class TestIncrementOpPositiveStep(OpTest):
"""Test increment op with positive step
"""
def setUp(self):
self.op_type = "increment"
self.inputs = {'X': np.random.random((10, 10)).astype("float32")}
self.attrs = {'step': 14.8}
self.outputs = {'Out': self.inputs['X'] + self.attrs['step']}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['X'], 'Out')
class TestIncrementOpNegativeStep(OpTest):
"""Test increment op with negative step
"""
def setUp(self):
self.op_type = "increment"
self.inputs = {'X': np.random.random((10, 10)).astype("float32")}
self.attrs = {'step': -3.8}
self.outputs = {'Out': self.inputs['X'] + self.attrs['step']}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['X'], 'Out')
if __name__ == "__main__":
unittest.main()
...@@ -5,7 +5,7 @@ import paddle.v2.framework.core as core ...@@ -5,7 +5,7 @@ import paddle.v2.framework.core as core
class TestInferShape(unittest.TestCase): class TestInferShape(unittest.TestCase):
def test_sum_op(self): def test_sum_op(self):
prog = core.ProgramDesc.__create_program_desc__() prog = core.ProgramDesc()
self.assertIsNotNone(prog) self.assertIsNotNone(prog)
block = prog.block(0) block = prog.block(0)
self.assertIsNotNone(block) self.assertIsNotNone(block)
...@@ -33,7 +33,7 @@ class TestInferShape(unittest.TestCase): ...@@ -33,7 +33,7 @@ class TestInferShape(unittest.TestCase):
self.assertEqual(out.shape(), shape) self.assertEqual(out.shape(), shape)
def test_mul_op(self): def test_mul_op(self):
prog = core.ProgramDesc.__create_program_desc__() prog = core.ProgramDesc()
self.assertIsNotNone(prog) self.assertIsNotNone(prog)
block = prog.block(0) block = prog.block(0)
self.assertIsNotNone(block) self.assertIsNotNone(block)
......
from paddle.v2.framework.layers import fc_layer, data_layer, cross_entropy, mean, square_error_cost import paddle.v2.framework.layers as layers
import paddle.v2.framework.nets as nets
from paddle.v2.framework.framework import Program, g_program from paddle.v2.framework.framework import Program, g_program
import paddle.v2.framework.core as core import paddle.v2.framework.core as core
import unittest import unittest
...@@ -6,38 +7,87 @@ import unittest ...@@ -6,38 +7,87 @@ import unittest
class TestBook(unittest.TestCase): class TestBook(unittest.TestCase):
def test_fit_a_line(self): def test_fit_a_line(self):
pd = core.ProgramDesc.__create_program_desc__() program = Program()
program = Program(desc=pd) x = layers.data(
x = data_layer(
name='x', shape=[13], data_type='float32', program=program) name='x', shape=[13], data_type='float32', program=program)
y_predict = fc_layer(input=x, size=1, act=None, program=program) y_predict = layers.fc(input=x, size=1, act=None, program=program)
y = data_layer( y = layers.data(
name='y', shape=[1], data_type='float32', program=program) name='y', shape=[1], data_type='float32', program=program)
cost = square_error_cost(input=y_predict, label=y, program=program) cost = layers.square_error_cost(
input=y_predict, label=y, program=program)
avg_cost = mean(x=cost, program=program) avg_cost = layers.mean(x=cost, program=program)
self.assertIsNotNone(avg_cost) self.assertIsNotNone(avg_cost)
program.append_backward(avg_cost)
print str(program) print str(program)
def test_recognize_digits_mlp(self): def test_recognize_digits_mlp(self):
pd = core.ProgramDesc.__create_program_desc__() program = Program()
program = Program(desc=pd)
# Change g_program, so the rest layers use `g_program` # Change g_program, so the rest layers use `g_program`
images = data_layer( images = layers.data(
name='pixel', shape=[784], data_type='float32', program=program) name='pixel', shape=[784], data_type='float32', program=program)
label = data_layer( label = layers.data(
name='label', shape=[1], data_type='int32', program=program) name='label', shape=[1], data_type='int32', program=program)
hidden1 = fc_layer(input=images, size=128, act='relu', program=program) hidden1 = layers.fc(input=images, size=128, act='relu', program=program)
hidden2 = fc_layer(input=hidden1, size=64, act='relu', program=program) hidden2 = layers.fc(input=hidden1, size=64, act='relu', program=program)
predict = fc_layer( predict = layers.fc(input=hidden2,
input=hidden2, size=10, act='softmax', program=program) size=10,
cost = cross_entropy(input=predict, label=label, program=program) act='softmax',
avg_cost = mean(x=cost, program=program) program=program)
cost = layers.cross_entropy(input=predict, label=label, program=program)
avg_cost = layers.mean(x=cost, program=program)
self.assertIsNotNone(avg_cost) self.assertIsNotNone(avg_cost)
print str(program) print str(program)
def test_simple_conv2d(self):
program = Program()
images = layers.data(
name='pixel', shape=[3, 48, 48], data_type='int32', program=program)
layers.conv2d(
input=images, num_filters=3, filter_size=[4, 4], program=program)
print str(program)
def test_recognize_digits_conv(self):
program = Program()
images = layers.data(
name='pixel',
shape=[1, 28, 28],
data_type='float32',
program=program)
label = layers.data(
name='label', shape=[1], data_type='int32', program=program)
conv_pool_1 = nets.simple_img_conv_pool(
input=images,
filter_size=5,
num_filters=2,
pool_size=2,
pool_stride=2,
act="relu",
program=program)
conv_pool_2 = nets.simple_img_conv_pool(
input=conv_pool_1,
filter_size=5,
num_filters=4,
pool_size=2,
pool_stride=2,
act="relu",
program=program)
predict = layers.fc(input=conv_pool_2,
size=10,
act="softmax",
program=program)
cost = layers.cross_entropy(input=predict, label=label, program=program)
avg_cost = layers.mean(x=cost, program=program)
program.append_backward(avg_cost)
print str(program)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
import unittest
import numpy as np
from op_test import OpTest
def generate_compatible_shapes(dim_X, dim_Y, transpose_X, transpose_Y):
BATCH_SIZE = 2
M = 3
N = 4
K = 5
if (dim_X == 1 and transpose_X) or (dim_Y == 1 and transpose_Y):
K = 1
if dim_X == 1:
if transpose_X:
shape_X = [M]
else:
shape_X = [K]
if dim_Y == 1:
if transpose_Y:
shape_Y = [N]
else:
shape_Y = [K]
if dim_X >= 2:
if transpose_X:
shape_X = [K, M]
else:
shape_X = [M, K]
if dim_X == 3:
shape_X = [BATCH_SIZE] + shape_X
if dim_Y >= 2:
if transpose_Y:
shape_Y = [N, K]
else:
shape_Y = [K, N]
if dim_Y == 3:
shape_Y = [BATCH_SIZE] + shape_Y
return shape_X, shape_Y
def reference_matmul(X, Y, transpose_X=False, transpose_Y=False):
"""Reference forward implementation using np.matmul."""
# np.matmul does not support the transpose flags, so we manually
# transpose X and Y appropriately.
if transpose_X:
if X.ndim == 1:
X = X.reshape((X.size, 1))
elif X.ndim == 2:
X = X.T
elif X.ndim == 3:
X = np.transpose(X, (0, 2, 1))
else:
raise ValueError('X must have between 1 and 3 dimensions')
if transpose_Y:
if Y.ndim == 1:
Y = Y.reshape((1, Y.size))
elif Y.ndim == 2:
Y = Y.T
elif Y.ndim == 3:
Y = np.transpose(Y, (0, 2, 1))
else:
raise ValueError('Y must have between 1 and 3 dimensions')
Out = np.matmul(X, Y)
if not Out.shape:
# We do not support 0-dimensional Tensors (scalars). So where
# np.matmul outputs a scalar, we must convert to a Tensor of
# shape (1, ) instead.
# Everywhere else, we are compatible with np.matmul.
Out = np.array([Out], dtype="float32")
return Out
class Generator(object):
def setUp(self):
self.op_type = "matmul"
X = np.random.random(self.shape_X).astype("float32")
Y = np.random.random(self.shape_Y).astype("float32")
Out = reference_matmul(X, Y, self.transpose_X, self.transpose_Y)
self.inputs = {'X': X, 'Y': Y}
self.attrs = {
'transpose_X': self.transpose_X,
'transpose_Y': self.transpose_Y
}
self.outputs = {'Out': Out}
def test_check_output(self):
self.check_output(atol=1e-2)
def test_check_grad_normal(self):
self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
def test_check_grad_ignore_x(self):
self.check_grad(
['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
def test_check_grad_ignore_y(self):
self.check_grad(
['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
# Generate test cases for all possibilities
for dim_X in [1, 2, 3]:
for dim_Y in [1, 2, 3]:
for transpose_X in [False, True]:
for transpose_Y in [False, True]:
test_name = (
'TestMatMulOp_dimX_{}_dim_Y_{}_transX_{}_transY_{}'.format(
dim_X, dim_Y, transpose_X, transpose_Y))
shape_X, shape_Y = generate_compatible_shapes(
dim_X, dim_Y, transpose_X, transpose_Y)
test_class = type(test_name, (Generator, OpTest), {
'shape_X': shape_X,
'shape_Y': shape_Y,
'transpose_X': transpose_X,
'transpose_Y': transpose_Y,
})
globals()[test_name] = test_class
if __name__ == "__main__":
unittest.main()
import unittest
import numpy as np
from op_test import OpTest
class TestMomentumOp1(OpTest):
def setUp(self):
self.op_type = "momentum"
param = np.random.random((123, 321)).astype("float32")
grad = np.random.random((123, 321)).astype("float32")
velocity = np.zeros((123, 321)).astype("float32")
learning_rate = np.array([0.001]).astype("float32")
mu = 0.0001
use_nesterov = False
self.inputs = {
'Param': param,
'Grad': grad,
'Velocity': velocity,
'LearningRate': learning_rate
}
self.attrs = {'mu': mu}
velocity_out = mu * velocity + grad
if use_nesterov:
param_out = param - grad * learning_rate + \
velocity_out * mu * learning_rate
else:
param_out = param - learning_rate * velocity_out
self.outputs = {'ParamOut': param_out, 'VelocityOut': velocity_out}
def test_check_output(self):
self.check_output()
class TestMomentumOp2(OpTest):
'''Test Momentum with defaukt values for attributes
'''
def setUp(self):
self.op_type = "momentum"
param = np.random.random((123, 321)).astype("float32")
grad = np.random.random((123, 321)).astype("float32")
velocity = np.zeros((123, 321)).astype("float32")
learning_rate = np.array([0.001]).astype("float32")
mu = 0.0001
use_nesterov = True
self.inputs = {
'Param': param,
'Grad': grad,
'Velocity': velocity,
'LearningRate': learning_rate
}
self.attrs = {'mu': mu, 'useNesterov': use_nesterov}
velocity_out = mu * velocity + grad
if use_nesterov:
param_out = param - grad * learning_rate + \
velocity_out * mu * learning_rate
else:
param_out = param - learning_rate * velocity_out
self.outputs = {'ParamOut': param_out, 'VelocityOut': velocity_out}
def test_check_output(self):
self.check_output()
if __name__ == "__main__":
unittest.main()
import unittest
import paddle.v2.framework.core as core
class TestOpSupportGPU(unittest.TestCase):
def test_case(self):
self.assertEqual(core.is_compile_gpu(), core.op_support_gpu("sum"))
if __name__ == '__main__':
unittest.main()
import unittest
import paddle.v2.framework.framework as framework
import paddle.v2.framework.optimizer as optimizer
class TestOptimizer(unittest.TestCase):
def test_sgd_optimizer(self):
program = framework.Program()
block = program.global_block()
mul_x = block.create_parameter(
dtype="float32", shape=[5, 10], lod_level=0, name="mul.x")
mul_y = block.create_var(
dtype="float32", shape=[10, 8], lod_level=0, name="mul.y")
mul_out = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="mul.out")
block.append_op(
type="mul",
inputs={"X": mul_x,
"Y": mul_y},
outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1})
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.01)
opts = sgd_optimizer.minimize(mul_out)
self.assertEqual(len(opts), 1)
sgd_op = opts[0]
self.assertEqual(sgd_op.type, "sgd")
class TestMomentumOptimizer(unittest.TestCase):
class MockMomentum(optimizer.MomentumOptimizer):
def get_accumulators(self):
return self._accumulators
def get_velocity_str(self):
return self._velocity_acc_str
def test_momentum_optimizer(self):
program = framework.Program()
block = program.global_block()
mul_x = block.create_parameter(
dtype="float32", shape=[5, 10], lod_level=0, name="mul.x")
mul_y = block.create_var(
dtype="float32", shape=[10, 8], lod_level=0, name="mul.y")
mul_out = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="mul.out")
block.append_op(
type="mul",
inputs={"X": mul_x,
"Y": mul_y},
outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1})
momentum_optimizer = self.MockMomentum(learning_rate=0.01, momentum=0.2)
params_grads = momentum_optimizer.create_backward_pass(mul_out)
self.assertEqual(len(params_grads), 1)
self.assertEqual(len(momentum_optimizer.get_accumulators()), 0)
opts = momentum_optimizer.create_optimization_pass(params_grads,
mul_out)
self.assertEqual(len(opts), 1)
sgd_op = opts[0]
self.assertEqual(sgd_op.type, "momentum")
# Check accumulators
accumulators = momentum_optimizer.get_accumulators()
self.assertEqual(len(accumulators), 1)
self.assertTrue(momentum_optimizer.get_velocity_str() in accumulators)
velocity_acc = accumulators[momentum_optimizer.get_velocity_str()]
self.assertEqual(len(velocity_acc), 1)
self.assertTrue(mul_x.name in velocity_acc)
class TestAdagradOptimizer(unittest.TestCase):
class MockAdagrad(optimizer.AdagradOptimizer):
def get_accumulators(self):
return self._accumulators
def get_moment_str(self):
return self._moment_acc_str
def test_adagrad_optimizer(self):
program = framework.Program()
block = program.global_block()
mul_x = block.create_parameter(
dtype="float32", shape=[5, 10], lod_level=0, name="mul.x")
mul_y = block.create_var(
dtype="float32", shape=[10, 8], lod_level=0, name="mul.y")
mul_out = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="mul.out")
block.append_op(
type="mul",
inputs={"X": mul_x,
"Y": mul_y},
outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1})
adagrad_optimizer = self.MockAdagrad(learning_rate=0.01, epsilon=1.0e-6)
params_grads = adagrad_optimizer.create_backward_pass(mul_out)
self.assertEqual(len(params_grads), 1)
self.assertEqual(len(adagrad_optimizer.get_accumulators()), 0)
opts = adagrad_optimizer.create_optimization_pass(params_grads, mul_out)
self.assertEqual(len(opts), 1)
adagrad_op = opts[0]
self.assertEqual(adagrad_op.type, "adagrad")
# check accumulators
accumulators = adagrad_optimizer.get_accumulators()
self.assertEqual(len(accumulators), 1)
self.assertTrue(adagrad_optimizer.get_moment_str() in accumulators)
moment_acc = accumulators[adagrad_optimizer.get_moment_str()]
self.assertEqual(len(moment_acc), 1)
self.assertTrue(mul_x.name in moment_acc)
if __name__ == '__main__':
unittest.main()
...@@ -34,49 +34,29 @@ class TestProgram(unittest.TestCase): ...@@ -34,49 +34,29 @@ class TestProgram(unittest.TestCase):
self.assertEqual(1, b.idx) self.assertEqual(1, b.idx)
self.assertEqual(0, b.parent_idx) self.assertEqual(0, b.parent_idx)
def test_desc_append_backward(self): def test_program_clone(self):
prog = core.ProgramDesc.__create_program_desc__() prog = Program()
self.assertIsNotNone(prog)
block = prog.block(0)
self.assertIsNotNone(block)
mul_op_desc = block.append_op()
mul_op_desc.set_type("mul")
mul_op_desc.set_input("X", ["x1"])
mul_op_desc.set_input("Y", ["y1"])
mul_op_desc.set_output("Out", ["out1"])
sum_op_desc = block.append_op()
sum_op_desc.set_type("elementwise_add")
sum_op_desc.set_input("X", ["out1"])
sum_op_desc.set_input("Y", ["b1"])
sum_op_desc.set_output("Out", ["out2"])
target = block.var("out2")
expect_ops = [ x = prog.global_block().create_var(
"mul", "elementwise_add", "fill_constant", "elementwise_add_grad", name='X', shape=[1000, 784], dtype='float32')
"mul_grad"
]
def grad_name(name):
return name + "@GRAD"
actual_ops = [] y = prog.global_block().create_var(
param_to_grad = prog.append_backward(target, set()) name='Y', shape=[784, 100], dtype='float32')
for var_name in ("x1", "y1", "out1", "b1"): out = prog.global_block().create_var(name='Out', dtype='float32')
self.assertEqual(param_to_grad[var_name][0], grad_name(var_name)) prog.global_block().append_op(
self.assertEqual(param_to_grad[var_name][1], 0) type="mul", inputs={'X': [x],
'Y': [y]}, outputs={'Out': [out]})
for op in block.all_ops(): # FIXME(yuyang18): We manual compare the output string, since the order
actual_ops.append(op.type()) # of variable could be changed.
self.assertEqual(actual_ops, expect_ops) print prog
print prog.clone()
def test_append_backward(self): def test_append_backward(self):
prog = Program.instance() prog = Program()
block = prog.global_block() block = prog.global_block()
mul_x = block.create_parameter( mul_x = block.create_var(
dtype="float32", shape=[5, 10], lod_level=0, name="mul.x") dtype="float32", shape=[5, 10], lod_level=0, name="mul.x")
mul_y = block.create_var( mul_y = block.create_var(
dtype="float32", shape=[10, 8], lod_level=0, name="mul.y") dtype="float32", shape=[10, 8], lod_level=0, name="mul.y")
...@@ -88,7 +68,35 @@ class TestProgram(unittest.TestCase): ...@@ -88,7 +68,35 @@ class TestProgram(unittest.TestCase):
"Y": mul_y}, "Y": mul_y},
outputs={"Out": [mul_out]}, outputs={"Out": [mul_out]},
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
param_to_grad = prog.append_backward(mul_out, set())
add_y = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="add.y")
add_out = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="add.out")
add_op = block.append_op(
type="elementwise_add",
inputs={"X": mul_out,
"Y": add_y},
outputs={"Out": add_out},
attrs={"x_num_col_dims": 1})
param_to_grad = prog.append_backward(add_out, set())
def grad_name(name):
return name + "@GRAD"
for var_name in ("mul.x", "mul.y", "mul.out", "add.y", "add.out"):
self.assertEqual(param_to_grad[var_name][0], grad_name(var_name))
self.assertEqual(param_to_grad[var_name][1], 0)
expect_ops = [
"mul", "elementwise_add", "fill_constant", "elementwise_add_grad",
"mul_grad"
]
actual_ops = []
for op in block.ops:
actual_ops.append(op.type)
self.assertEqual(actual_ops, expect_ops)
if __name__ == '__main__': if __name__ == '__main__':
......
...@@ -4,7 +4,7 @@ import paddle.v2.framework.core as core ...@@ -4,7 +4,7 @@ import paddle.v2.framework.core as core
class TestOpDesc(unittest.TestCase): class TestOpDesc(unittest.TestCase):
def test_op_desc(self): def test_op_desc(self):
prog = core.ProgramDesc.__create_program_desc__() prog = core.ProgramDesc()
self.assertIsNotNone(prog) self.assertIsNotNone(prog)
block = prog.block(0) block = prog.block(0)
self.assertIsNotNone(block) self.assertIsNotNone(block)
...@@ -64,16 +64,16 @@ class TestOpDesc(unittest.TestCase): ...@@ -64,16 +64,16 @@ class TestOpDesc(unittest.TestCase):
class TestProgramDesc(unittest.TestCase): class TestProgramDesc(unittest.TestCase):
def test_instance(self): def test_instance(self):
program_desc = core.ProgramDesc.__create_program_desc__() program_desc = core.ProgramDesc()
self.assertIsNotNone(program_desc) self.assertIsNotNone(program_desc)
del program_desc del program_desc
program_desc = core.ProgramDesc.instance() program_desc = core.ProgramDesc()
self.assertIsNotNone(program_desc) self.assertIsNotNone(program_desc)
self.assertIsNotNone(program_desc.block(0)) self.assertIsNotNone(program_desc.block(0))
del program_desc del program_desc
def test_append_block(self): def test_append_block(self):
prog_desc = core.ProgramDesc.__create_program_desc__() prog_desc = core.ProgramDesc()
self.assertIsNotNone(prog_desc) self.assertIsNotNone(prog_desc)
block_root = prog_desc.block(0) block_root = prog_desc.block(0)
self.assertIsNotNone(block_root) self.assertIsNotNone(block_root)
...@@ -91,7 +91,7 @@ class TestProgramDesc(unittest.TestCase): ...@@ -91,7 +91,7 @@ class TestProgramDesc(unittest.TestCase):
class TestVarDesc(unittest.TestCase): class TestVarDesc(unittest.TestCase):
def test_shape(self): def test_shape(self):
program_desc = core.ProgramDesc.__create_program_desc__() program_desc = core.ProgramDesc()
block = program_desc.block(0) block = program_desc.block(0)
var = block.var('my_var') var = block.var('my_var')
var.set_type(core.VarDesc.VarType.SELECTED_ROWS) var.set_type(core.VarDesc.VarType.SELECTED_ROWS)
...@@ -102,7 +102,7 @@ class TestVarDesc(unittest.TestCase): ...@@ -102,7 +102,7 @@ class TestVarDesc(unittest.TestCase):
self.assertEqual(core.VarDesc.VarType.SELECTED_ROWS, var.type()) self.assertEqual(core.VarDesc.VarType.SELECTED_ROWS, var.type())
def test_data_type(self): def test_data_type(self):
program_desc = core.ProgramDesc.__create_program_desc__() program_desc = core.ProgramDesc()
block = program_desc.block(0) block = program_desc.block(0)
var = block.var('my_var') var = block.var('my_var')
var.set_type(core.VarDesc.VarType.LOD_TENSOR) var.set_type(core.VarDesc.VarType.LOD_TENSOR)
...@@ -113,7 +113,7 @@ class TestVarDesc(unittest.TestCase): ...@@ -113,7 +113,7 @@ class TestVarDesc(unittest.TestCase):
class TestBlockDesc(unittest.TestCase): class TestBlockDesc(unittest.TestCase):
def test_add_var(self): def test_add_var(self):
prog = core.ProgramDesc.__create_program_desc__() prog = core.ProgramDesc()
self.assertIsNotNone(prog) self.assertIsNotNone(prog)
block = prog.block(0) block = prog.block(0)
self.assertIsNotNone(block) self.assertIsNotNone(block)
...@@ -121,19 +121,21 @@ class TestBlockDesc(unittest.TestCase): ...@@ -121,19 +121,21 @@ class TestBlockDesc(unittest.TestCase):
var2 = block.var("var2") var2 = block.var("var2")
var3 = block.var("var3") var3 = block.var("var3")
all_vars = block.all_vars() all_vars = block.all_vars()
self.assertEqual(set(all_vars), set([var1, var2, var3])) self.assertEqual(set(all_vars), {var1, var2, var3})
var2_re = block.find_var("var2") var2_re = block.find_var("var2")
self.assertEqual(var2_re, var2) self.assertEqual(var2_re, var2)
def test_add_op(self): def test_add_op(self):
prog = core.ProgramDesc.__create_program_desc__() prog = core.ProgramDesc()
self.assertIsNotNone(prog) self.assertIsNotNone(prog)
block = prog.block(0) block = prog.block(0)
self.assertIsNotNone(block) self.assertIsNotNone(block)
op1 = block.append_op() op1 = block.append_op()
op2 = block.append_op() op2 = block.append_op()
op0 = block.prepend_op() op0 = block.prepend_op()
all_ops = block.all_ops() all_ops = []
for idx in xrange(0, block.op_size()):
all_ops.append(block.op(idx))
self.assertEqual(all_ops, [op0, op1, op2]) self.assertEqual(all_ops, [op0, op1, op2])
......
import unittest
import numpy as np
from op_test import OpTest
class TestProximalGDOp(OpTest):
def setUp(self):
self.op_type = "proximal_gd"
w = np.random.random((102, 105)).astype("float32")
g = np.random.random((102, 105)).astype("float32")
lr = np.array([0.1]).astype("float32")
l1 = 0.1
l2 = 0.2
self.inputs = {'Param': w, 'Grad': g, 'LearningRate': lr}
self.attrs = {'l1': l1, 'l2': l2}
prox_param = w - lr * g
param_out = 0.0
if l1 > 0.0:
x = np.abs(prox_param) - lr * l1
x[x < 0] = 0
param_out = np.sign(prox_param) * (x / (1.0 + lr * l2))
else:
param_out = prox_param / (1.0 + lr * l2)
self.outputs = {'ParamOut': param_out}
def test_check_output(self):
self.check_output()
if __name__ == "__main__":
unittest.main()
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.nets as nets
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program
from paddle.v2.framework.executor import Executor
import numpy as np
init_program = Program()
program = Program()
images = layers.data(
name='pixel',
shape=[1, 28, 28],
data_type='float32',
program=program,
init_program=init_program)
label = layers.data(
name='label',
shape=[1],
data_type='int32',
program=program,
init_program=init_program)
conv_pool_1 = nets.simple_img_conv_pool(
input=images,
filter_size=5,
num_filters=20,
pool_size=2,
pool_stride=2,
act="relu",
program=program,
init_program=init_program)
conv_pool_2 = nets.simple_img_conv_pool(
input=conv_pool_1,
filter_size=5,
num_filters=50,
pool_size=2,
pool_stride=2,
act="relu",
program=program,
init_program=init_program)
predict = layers.fc(input=conv_pool_2,
size=10,
act="softmax",
program=program,
init_program=init_program)
cost = layers.cross_entropy(
input=predict, label=label, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 50
PASS_NUM = 1
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=500),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(init_program, feed={}, fetch_list=[])
for pass_id in range(PASS_NUM):
count = 0
for data in train_reader():
img_data = np.array(map(lambda x: x[0].reshape([1, 28, 28]),
data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("int32")
y_data = y_data.reshape([BATCH_SIZE, 1])
tensor_img = core.LoDTensor()
tensor_y = core.LoDTensor()
tensor_img.set(img_data, place)
tensor_y.set(y_data, place)
outs = exe.run(program,
feed={"pixel": tensor_img,
"label": tensor_y},
fetch_list=[avg_cost])
loss = np.array(outs[0])
if loss < 10.0:
exit(0) # if avg cost less than 10.0, we think our code is good.
exit(1)
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program
from paddle.v2.framework.executor import Executor
import numpy as np
init_program = Program()
program = Program()
image = layers.data(
name='x',
shape=[784],
data_type='float32',
program=program,
init_program=init_program)
hidden1 = layers.fc(input=image,
size=128,
act='relu',
program=program,
init_program=init_program)
hidden2 = layers.fc(input=hidden1,
size=64,
act='relu',
program=program,
init_program=init_program)
predict = layers.fc(input=hidden2,
size=10,
act='softmax',
program=program,
init_program=init_program)
label = layers.data(
name='y',
shape=[1],
data_type='int32',
program=program,
init_program=init_program)
cost = layers.cross_entropy(
input=predict, label=label, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 128
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=8192),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(init_program, feed={}, fetch_list=[])
PASS_NUM = 100
for pass_id in range(PASS_NUM):
for data in train_reader():
x_data = np.array(map(lambda x: x[0], data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("int32")
y_data = np.expand_dims(y_data, axis=1)
tensor_x = core.LoDTensor()
tensor_x.set(x_data, place)
tensor_y = core.LoDTensor()
tensor_y.set(y_data, place)
outs = exe.run(program,
feed={'x': tensor_x,
'y': tensor_y},
fetch_list=[avg_cost])
out = np.array(outs[0])
if out[0] < 5.0:
exit(0) # if avg cost less than 5.0, we think our code is good.
exit(1)
...@@ -132,15 +132,15 @@ class RecurrentOpTest(unittest.TestCase): ...@@ -132,15 +132,15 @@ class RecurrentOpTest(unittest.TestCase):
# create RNNOp # create RNNOp
self.rnnop = RecurrentOp( self.rnnop = RecurrentOp(
# inputs # inputs
inlinks=["x"], inputs=["x"],
boot_memories=["h_boot"], initial_states=["h_boot"],
step_net="stepnet", step_net="stepnet",
# outputs # outputs
outlinks=["h@mem"], outputs=["h@mem"],
step_scopes="step_scopes", step_scopes="step_scopes",
# attributes # attributes
pre_memories=["h@pre"], ex_states=["h@pre"],
memories=["h@mem"]) states=["h@mem"])
def create_step_net(self): def create_step_net(self):
stepnet = core.Net.create() stepnet = core.Net.create()
...@@ -169,15 +169,15 @@ class RecurrentGradientOpTest(unittest.TestCase): ...@@ -169,15 +169,15 @@ class RecurrentGradientOpTest(unittest.TestCase):
def create_forward_op(self): def create_forward_op(self):
self.forward_op = RecurrentOp( self.forward_op = RecurrentOp(
# inputs # inputs
inlinks=["x"], inputs=["x"],
boot_memories=["h_boot"], initial_states=["h_boot"],
step_net="stepnet", step_net="stepnet",
# outputs # outputs
outlinks=["h"], outputs=["h"],
step_scopes="step_scopes", step_scopes="step_scopes",
# attributes # attributes
pre_memories=["h@pre"], ex_states=["h@pre"],
memories=["h@alias"]) states=["h@alias"])
# create a stepnet for RNN # create a stepnet for RNN
stepnet = core.Net.create() stepnet = core.Net.create()
......
...@@ -85,5 +85,33 @@ class Test1DReduce(OpTest): ...@@ -85,5 +85,33 @@ class Test1DReduce(OpTest):
self.check_grad(['X'], 'Out') self.check_grad(['X'], 'Out')
class TestNorm(OpTest):
def setUp(self):
# use x away from 0 to avoid errors of numerical gradient when gradient near 0
x = np.random.random((5, 6, 10)).astype("float32") + 0.2
p = 2
dim = 1
keep_dim = False
abs_out = np.absolute(x)
pow_out = np.power(x, p)
sum_out = np.sum(pow_out, axis=dim, keepdims=keep_dim)
out = np.power(sum_out, 1. / p)
self.op_type = "norm"
self.inputs = {'X': x}
self.attrs = {"p": p, "dim": dim, "keep_dim": keep_dim}
self.outputs = {
"AbsOut": abs_out,
"PowOut": pow_out,
"SumOut": sum_out,
"Out": out
}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['X'], 'Out', max_relative_error=0.01)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -46,7 +46,7 @@ class TestRmspropOp1(OpTest): ...@@ -46,7 +46,7 @@ class TestRmspropOp1(OpTest):
class TestRmspropOp2(OpTest): class TestRmspropOp2(OpTest):
'''Test RMSProp with defaukt values for attributes '''Test RMSProp with default values for attributes
''' '''
def setUp(self): def setUp(self):
......
...@@ -8,29 +8,30 @@ class TestSelectedRows(unittest.TestCase): ...@@ -8,29 +8,30 @@ class TestSelectedRows(unittest.TestCase):
place = core.CPUPlace() place = core.CPUPlace()
height = 10 height = 10
rows = [0, 4, 7] rows = [0, 4, 7]
row_numel = 10 row_numel = 12
selcted_rows = core.SelectedRows(rows, row_numel) selected_rows = core.SelectedRows(rows, height)
np_array = np.ones((len(rows), height)).astype("float32") np_array = np.ones((len(rows), row_numel)).astype("float32")
np_array[0, 0] = 2.0 np_array[0, 0] = 2.0
np_array[2, 8] = 4.0 np_array[2, 8] = 4.0
tensor = selcted_rows.get_tensor() tensor = selected_rows.get_tensor()
tensor.set(np_array, place) tensor.set(np_array, place)
# compare rows # compare rows
self.assertEqual(0, selcted_rows.rows()[0]) self.assertEqual(0, selected_rows.rows()[0])
self.assertEqual(4, selcted_rows.rows()[1]) self.assertEqual(4, selected_rows.rows()[1])
self.assertEqual(7, selcted_rows.rows()[2]) self.assertEqual(7, selected_rows.rows()[2])
# compare height # compare height
self.assertEqual(10, selcted_rows.height()) self.assertEqual(10, selected_rows.height())
# compare tensor # compare tensor
self.assertAlmostEqual(2.0, self.assertAlmostEqual(2.0,
selcted_rows.get_tensor().get_float_element(0)) selected_rows.get_tensor().get_float_element(0))
self.assertAlmostEqual(1.0, self.assertAlmostEqual(1.0,
selcted_rows.get_tensor().get_float_element(1)) selected_rows.get_tensor().get_float_element(1))
self.assertAlmostEqual( self.assertAlmostEqual(
4.0, selcted_rows.get_tensor().get_float_element(2 * row_numel + 8)) 4.0,
selected_rows.get_tensor().get_float_element(2 * row_numel + 8))
if __name__ == "__main__": if __name__ == "__main__":
......
import unittest import unittest
import numpy as np import numpy as np
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator
from op_test import OpTest from op_test import OpTest
...@@ -17,5 +19,70 @@ class TestSGDOp(OpTest): ...@@ -17,5 +19,70 @@ class TestSGDOp(OpTest):
self.check_output() self.check_output()
class TestSparseSGDOp(unittest.TestCase):
def check_with_place(self, place):
scope = core.Scope()
# create and initialize Grad Variable
height = 10
rows = [0, 4, 7]
row_numel = 12
grad_selected_rows = scope.var('Grad').get_selected_rows()
grad_selected_rows.set_height(height)
grad_selected_rows.set_rows(rows)
np_array = np.ones((len(rows), row_numel)).astype("float32")
np_array[0, 0] = 2.0
np_array[2, 8] = 4.0
grad_tensor = grad_selected_rows.get_tensor()
grad_tensor.set(np_array, place)
# create and initialize Param Variable
param = scope.var('Param').get_tensor()
param_array = np.full((height, row_numel), 5.0).astype("float32")
param.set(param_array, place)
# create and initialize LeraningRate Variable
lr = scope.var('LearningRate').get_tensor()
lr_array = np.full((1), 2.0).astype("float32")
lr.set(lr_array, place)
# create and run sgd operator
sgd_op = Operator(
"sgd",
Param='Param',
Grad='Grad',
ParamOut='Param',
LearningRate='LearningRate')
ctx = core.DeviceContext.create(place)
sgd_op.run(scope, ctx)
# get and compare result
result_array = np.array(param)
# rows[0] = 0, 5.0 - 2.0 * 2.0
self.assertAlmostEqual(1.0, result_array[rows[0], 0])
# rows[0] = 0, 5.0 - 2.0 * 1.0
self.assertAlmostEqual(3.0, result_array[rows[0], 2])
# 5.0 - 2.0 * 0.0
self.assertAlmostEqual(5.0, result_array[1, 0])
# rows[1] = 4, 5.0 - 2.0 * 1.0
self.assertAlmostEqual(3.0, result_array[rows[1], 10])
# 5.0 - 2.0 * 0.0
self.assertAlmostEqual(5.0, result_array[5, 8])
# rows[2] = 7, 5.0 - 2.0 * 1.0
self.assertAlmostEqual(3.0, result_array[rows[2], 1])
# rows[2] = 7, 5.0 - 2.0 * 4.0
self.assertAlmostEqual(-3.0, result_array[rows[2], 8])
def test_sparse_sgd(self):
places = [core.CPUPlace()]
if core.is_compile_gpu():
places.append(core.GPUPlace(0))
for place in places:
self.check_with_place(place)
if __name__ == "__main__": if __name__ == "__main__":
unittest.main() unittest.main()
...@@ -19,7 +19,7 @@ class TestUniformRandomOp(unittest.TestCase): ...@@ -19,7 +19,7 @@ class TestUniformRandomOp(unittest.TestCase):
op = Operator( op = Operator(
"uniform_random", "uniform_random",
Out='X', Out='X',
dims=[1000, 784], shape=[1000, 784],
min=-5.0, min=-5.0,
max=10.0, max=10.0,
seed=10) seed=10)
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册