Commit 83e65005, authored by Dong Zhihong

Merge remote-tracking branch 'origin/develop' into feature/evaluator

...@@ -30,6 +30,7 @@ addons: ...@@ -30,6 +30,7 @@ addons:
- automake - automake
- libtool - libtool
- ccache - ccache
ssh_known_hosts: 52.76.173.135
before_install: before_install:
- if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi - if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi
# Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python # Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python
...@@ -42,6 +43,14 @@ script: ...@@ -42,6 +43,14 @@ script:
- | - |
timeout 2580 paddle/scripts/travis/${JOB}.sh # 43min timeout timeout 2580 paddle/scripts/travis/${JOB}.sh # 43min timeout
RESULT=$?; if [ $RESULT -eq 0 ] || [ $RESULT -eq 142 ]; then true; else false; fi; RESULT=$?; if [ $RESULT -eq 0 ] || [ $RESULT -eq 142 ]; then true; else false; fi;
- |
if [[ "$JOB" != "build_doc" ]]; then exit 0; fi;
if [[ "$TRAVIS_PULL_REQUEST" != "false" ]]; then exit 0; fi;
if [[ "$TRAVIS_BRANCH" != "develop" && ! "$TRAVIS_BRANCH" =~ ^v[[:digit:]]+\.[[:digit:]]+(\.[[:digit:]]+)?(-\S*)?$ ]]; then exit 0; fi;
export DEPLOY_DOCS_SH=https://raw.githubusercontent.com/PaddlePaddle/PaddlePaddle.org/master/scripts/deploy/deploy_docs.sh
export DOCS_DIR=`pwd`
cd ..
curl $DEPLOY_DOCS_SH | bash -s $CONTENT_DEC_PASSWD $TRAVIS_BRANCH $DOCS_DIR $DOCS_DIR/build/doc
notifications: notifications:
email: email:
on_success: change on_success: change
......
...@@ -23,7 +23,7 @@ On each machine, we will test and compare the performance of training on single ...@@ -23,7 +23,7 @@ On each machine, we will test and compare the performance of training on single
## Benchmark Model ## Benchmark Model
### Server ### Server
Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148M CPU @ 2.40GHz Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Input image size - 3 * 224 * 224, Time: images/second Input image size - 3 * 224 * 224, Time: images/second
......
# Averaging Parameter in PaddlePaddle
## Why Averaging
In a large-scale machine learning setup where the training data is huge, it can take a large number of iterations over the training data before the model parameters reach their optimal values. It is therefore desirable to obtain the optimal parameter values in as few passes over the data as possible.
Polyak and Juditsky (1992) showed that the test performance of a simple average of the parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values obtained by training the model over and over again on the training dataset.
Hence, to accelerate Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of the parameters obtained by SGD is used as the estimator for <img src="./images/theta_star.gif"/><br/> . The averaging is done as follows:
<img src="./images/asgd.gif" align="center"/><br/>
We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above.
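
As a minimal sketch of the running average that ASGD maintains, assuming it is the plain arithmetic mean of the SGD iterates (the formula image is not reproduced here, and the function name `running_average` is hypothetical), the mean can be updated incrementally without storing earlier iterates:

```python
def running_average(param_values):
    """Yield the running mean of a sequence of scalar parameter values."""
    avg = 0.0
    for t, theta in enumerate(param_values, start=1):
        # Incremental form of avg_t = (1 / t) * (theta_1 + ... + theta_t).
        avg += (theta - avg) / t
        yield avg


# Hypothetical SGD iterates of a single scalar parameter.
sgd_iterates = [4.0, 2.0, 3.0, 1.0]
averages = list(running_average(sgd_iterates))
```

Each yielded value is the mean of all iterates seen so far, so only the current average and the step counter need to be kept in memory.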
### How to perform Parameter Averaging in PaddlePaddle
Parameter Averaging in PaddlePaddle works in the following way during training:
1. It will take in an instance of a normal optimizer as an input, e.g. RMSPropOptimizer
2. The optimizer itself is responsible for updating the parameters.
3. The ParameterAverageOptimizer maintains a separate copy of the parameters for itself:
    1. In concept, the values of this copy are the average of the values of the parameters in the most recent N batches.
    2. However, saving all the N instances of the parameters in memory is not feasible.
    3. Therefore, an approximation algorithm is used.

Hence, overall we have two copies of the parameters: one for the optimizer itself, and one for the ParameterAverageOptimizer. The former should be used in back propagation, while the latter should be used during testing and should be saved.
During testing or when saving the model, we perform the following steps:
1. Perform the delayed operations.
2. Save current values of the parameters to a temporary variable.
3. Replace the values of the parameters with the averaged values.
4. Perform testing and/or save the parameters.
5. Restore the values of the parameters once done.
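
Steps 2 to 5 above can be sketched as a Python context manager. This is only an illustration under stated assumptions: parameters are modeled as a plain dictionary, the name `averaged_parameters` is hypothetical, and step 1 (the delayed operations) is left out because the design does not specify it.

```python
from contextlib import contextmanager


@contextmanager
def averaged_parameters(params, averaged):
    """Temporarily replace the values in `params` with those in `averaged`."""
    backup = dict(params)      # step 2: save the current values
    params.update(averaged)    # step 3: swap in the averaged values
    try:
        yield params           # step 4: the caller tests and/or saves here
    finally:
        params.clear()
        params.update(backup)  # step 5: restore the training values


# Hypothetical parameter values, for illustration only.
params = {"w": 1.0, "b": -2.0}
averaged = {"w": 0.5, "b": -1.5}
with averaged_parameters(params, averaged) as p:
    seen_during_test = dict(p)
```

Restoring in a `finally` clause means the training values come back even if testing or saving raises an exception.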
### How to implement Averaging of Parameter in PaddlePaddle
We can add the ParameterAverageOptimizer op to the graph through Python API. Using this approach, we manually add this op to the graph and direct the output of the optimizer op to this op during training.
**Advantages**:
- Allows for greater flexibility to the users of PaddlePaddle. Using this approach, the users can plug different optimizers into ParameterAverageOptimizer by passing in the optimizer to the op.
- Makes it easy for the users to customize and extend the framework.
**Disadvantages**:
- Implementation requires re-writing the averaging methodology in Python.
### Low-Level implementation
In the new design, we propose to create a new operation for averaging parameter updates (ParameterAverageOptimizer). For now, we can add an op that takes in the following as input:
- the optimizer
- the window_size to keep the updates
The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for [Operators](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.h). We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU.
The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) in Python API.
### Python API implementation for ParameterAverageOptimizer
Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following:
- Any optimizer (RMSProp, AdaGrad, etc.)
- A window size. The op keeps accumulating updated parameter values over a window of N batches and takes an average. When the window is full, the averaged value is moved to a buffer to avoid loss of precision.
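
One possible reading of the window-and-buffer scheme is sketched below for a single scalar parameter. The class `WindowedAverage` and its fields are hypothetical, and the policy of keeping exactly one completed window in the buffer is an assumption; the design above does not fix these details.

```python
class WindowedAverage:
    """Approximate the mean of the most recent parameter values (scalar case)."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.current_sum = 0.0   # accumulator for the window in progress
        self.current_count = 0
        self.buffer_sum = 0.0    # sum carried over from the last full window
        self.buffer_count = 0

    def update(self, value):
        self.current_sum += value
        self.current_count += 1
        if self.current_count == self.window_size:
            # Window is full: move it to the buffer and restart the accumulator,
            # so the running sum never spans more than one window.
            self.buffer_sum, self.buffer_count = self.current_sum, self.current_count
            self.current_sum, self.current_count = 0.0, 0

    def average(self):
        total = self.buffer_count + self.current_count
        if total == 0:
            raise ValueError("no values accumulated yet")
        return (self.buffer_sum + self.current_sum) / total


avg = WindowedAverage(window_size=2)
for value in [1.0, 3.0, 5.0]:
    avg.update(value)
result = avg.average()
```

With this policy the average always covers between N and 2N - 1 of the latest values once the first window has filled, while memory use stays constant.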
Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions.
We will have a Python wrapper that supports this functionality, with the core computation implemented in C++, as we have done for other [Optimizers](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.cc).
#### Creation of the ParameterAverageOptimizer operator
There are two ways for creating the ParameterAverageOptimizer op:
1. We create the op immediately while building the computation graph.
2. We add the op in a lazy manner, just before the backward pass, similar to the way the optimization ops are added.
The proposal is to add the op immediately while building the computation graph.
#### High-level API
In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions.
# Build PaddlePaddle for Android
There are two approaches to build PaddlePaddle for Android: using Docker and on Linux without Docker.
## Cross-Compiling Using Docker
Docker-based cross-compiling is the recommended approach because Docker runs on all major operating systems, including Linux, Mac OS X, and Windows.
### Build the Docker Image
The following steps pack all the tools that we need to build PaddlePaddle into a Docker image.
```bash
$ git clone https://github.com/PaddlePaddle/Paddle.git
$ cd Paddle
$ docker build -t paddle:dev-android . -f Dockerfile.android
```
### Build the Inference Library
We can run the Docker image we just created to build the inference library of PaddlePaddle for Android using the command below:
```bash
$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=armeabi-v7a" -e "ANDROID_API=21" paddle:dev-android
```
The Docker image accepts two arguments `ANDROID_ABI` and `ANDROID_API`:
| Argument | Optional Values | Default |
|-----------------|-------------------------|---------|
|`ANDROID_ABI` |`armeabi-v7a, arm64-v8a` | `armeabi-v7a` |
|`ANDROID_API` |`>= 21` | `21` |
The ARM-64 architecture (`arm64-v8a`) requires at least level 21 of Android API.
The default entry-point of the Docker image, [`paddle/scripts/docker/build_android.sh`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build_android.sh), generates the [Android cross-compiling standalone toolchain](https://developer.android.com/ndk/guides/standalone_toolchain.html) based on the arguments `ANDROID_ABI` and `ANDROID_API`. For information about other configuration arguments, please continue reading.
The above command generates and outputs the inference library in `$PWD/install_android` and puts third-party libraries in `$PWD/install_android/third_party`.
## Cross-Compiling on Linux
The Linux-based approach to cross-compiling is to run the steps in `Dockerfile.android` manually on a Linux x64 computer.
### Setup the Environment
To build for Android, we need the [Android NDK](https://developer.android.com/ndk/downloads/index.html):
```bash
wget -q https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip
unzip -q android-ndk-r14b-linux-x86_64.zip
```
Android NDK includes everything we need to build the [*standalone toolchain*](https://developer.android.com/ndk/guides/standalone_toolchain.html), which is then used to build PaddlePaddle for Android. (We plan to remove the intermediate stage of building the standalone toolchain in the near future.)
- To build the standalone toolchain for `armeabi-v7a` and Android API level 21:
```bash
your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
--arch=arm --platform=android-21 --install-dir=your/path/to/arm_standalone_toolchain
```
The generated standalone toolchain will be in `your/path/to/arm_standalone_toolchain`.
- To build the standalone toolchain for `arm64-v8a` and Android API level 21:
```bash
your/path/to/android-ndk-r14b-linux-x86_64/build/tools/make-standalone-toolchain.sh \
--arch=arm64 --platform=android-21 --install-dir=your/path/to/arm64_standalone_toolchain
```
The generated standalone toolchain will be in `your/path/to/arm64_standalone_toolchain`.
**Please be aware that the minimum level of Android API required by PaddlePaddle is 21.**
### Cross-Compiling Arguments
CMake supports [choosing the toolchain](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling). PaddlePaddle provides [`android.cmake`](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/android.cmake), which configures the Android cross-compiling toolchain for CMake. `android.cmake` is not required for CMake >= 3.7, which supports Android cross-compiling. PaddlePaddle detects the CMake version and, for 3.7 or newer, uses [the official version](https://cmake.org/cmake/help/v3.7/manual/cmake-toolchains.7.html#cross-compiling).
Some other CMake arguments you need to know:
- `CMAKE_SYSTEM_NAME` must be `Android`. This tells PaddlePaddle's CMake system to cross-compile third-party dependencies. This also changes some other CMake arguments like `WITH_GPU=OFF`, `WITH_AVX=OFF`, `WITH_PYTHON=OFF`, and `WITH_RDMA=OFF`.
- `WITH_C_API` must be `ON`, to build the C-based inference library for Android.
- `WITH_SWIG_PY` must be `OFF` because the Android platform doesn't support SWIG-based API.
Some Android-specific arguments:
- `ANDROID_STANDALONE_TOOLCHAIN`: the absolute path of the Android standalone toolchain, or the path relative to the CMake build directory. PaddlePaddle's CMake extensions would derive the cross-compiler, sysroot and Android API level from this argument.
- `ANDROID_TOOLCHAIN`: could be `gcc` or `clang`. The default value is `clang`.
    - For CMake >= 3.7, it is always `clang`. For older versions, it could be `gcc`.
    - Android's official `clang` requires `glibc` >= 2.15.
- `ANDROID_ABI`: could be `armeabi-v7a` or `arm64-v8a`. The default value is `armeabi-v7a`.
- `ANDROID_NATIVE_API_LEVEL`: could be derived from the value of `ANDROID_STANDALONE_TOOLCHAIN`.
- `ANDROID_ARM_MODE`: indicates whether to use ARM mode.
    - could be `ON` or `OFF`, and defaults to `ON`, when `ANDROID_ABI=armeabi-v7a`;
    - no need to specify when `ANDROID_ABI=arm64-v8a`.
- `ANDROID_ARM_NEON`: indicates whether to use NEON instructions.
    - could be `ON` or `OFF`, and defaults to `ON`, when `ANDROID_ABI=armeabi-v7a`;
    - no need to specify when `ANDROID_ABI=arm64-v8a`.
Other useful arguments:
- `USE_EIGEN_FOR_BLAS`: indicates whether to use Eigen for matrix computation. Could be `ON` or `OFF`, defaults to `OFF`.
- `HOST_C/CXX_COMPILER`: specifies the host compiler, which is used to build the host-specific protoc and target-specific OpenBLAS. It defaults to the value of the environment variable `CC`, or `cc`.
Some frequent configurations for your reference:
```bash
cmake -DCMAKE_SYSTEM_NAME=Android \
-DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm_standalone_toolchain \
-DANDROID_ABI=armeabi-v7a \
-DANDROID_ARM_NEON=ON \
-DANDROID_ARM_MODE=ON \
-DUSE_EIGEN_FOR_BLAS=ON \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \
-DWITH_SWIG_PY=OFF \
..
```
```bash
cmake -DCMAKE_SYSTEM_NAME=Android \
-DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \
-DANDROID_ABI=arm64-v8a \
-DUSE_EIGEN_FOR_BLAS=OFF \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \
-DWITH_SWIG_PY=OFF \
..
```
There are some other arguments you might want to configure.
- `CMAKE_BUILD_TYPE=MinSizeRel` minimizes the size of library.
- `CMAKE_BUILD_TYPE=Release` optimizes the runtime performance.
Our tips for the best performance are to use clang together with Eigen or OpenBLAS:
- `CMAKE_BUILD_TYPE=Release`
- `ANDROID_TOOLCHAIN=clang`
- `USE_EIGEN_FOR_BLAS=ON` for `armeabi-v7a`, or `USE_EIGEN_FOR_BLAS=OFF` for `arm64-v8a`.
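
The performance tips can be captured in a small helper that assembles the `cmake` command line for a given ABI. This is a sketch only: the function `android_cmake_args` is hypothetical and not part of PaddlePaddle, and it uses only the CMake arguments already listed in this document.

```python
def android_cmake_args(abi, toolchain_dir, install_prefix):
    """Return a cmake argument list that follows the performance tips."""
    if abi not in ("armeabi-v7a", "arm64-v8a"):
        raise ValueError("unsupported ANDROID_ABI: " + abi)
    return [
        "cmake",
        "-DCMAKE_SYSTEM_NAME=Android",
        "-DANDROID_STANDALONE_TOOLCHAIN=" + toolchain_dir,
        "-DANDROID_ABI=" + abi,
        "-DANDROID_TOOLCHAIN=clang",
        "-DCMAKE_BUILD_TYPE=Release",
        # Eigen for armeabi-v7a, OpenBLAS (Eigen off) for arm64-v8a.
        "-DUSE_EIGEN_FOR_BLAS=" + ("ON" if abi == "armeabi-v7a" else "OFF"),
        "-DCMAKE_INSTALL_PREFIX=" + install_prefix,
        "-DWITH_C_API=ON",
        "-DWITH_SWIG_PY=OFF",
        "..",
    ]


# Placeholder paths, as in the configurations above.
args = android_cmake_args(
    "arm64-v8a", "your/path/to/arm64_standalone_toolchain", "your/path/to/install"
)
command_line = " ".join(args)
```

The resulting list can be passed to `subprocess.run` from inside the build directory, or joined into a single string for a shell script.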
### Build and Install
After running `cmake`, we can run `make; make install` to build and install.
Before building, you might want to remove the `third_party` and `build` directories, which may contain pre-built libraries for other architectures.
After building, in the directory `CMAKE_INSTALL_PREFIX`, you will find three sub-directories:
- `include`: the header files of the inference library,
- `lib`: the inference library built for various Android ABIs,
- `third_party`: dependent third-party libraries built for Android.
# Build the PaddlePaddle Library for Android
Users can cross-compile a PaddlePaddle library for the Android platform in either of two ways:
- Building in a Docker container
- Building in a Linux cross-compiling environment
## Building in a Docker Container
...@@ -26,14 +26,14 @@ The Android Docker development image provides two configurable arguments:
|`ANDROID_API` |`>= 21` | `21` |
- Build the PaddlePaddle library for `armeabi-v7a` and `Android API 21`
```bash
$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=armeabi-v7a" -e "ANDROID_API=21" username/paddle-android:dev
```
- Build the PaddlePaddle library for `arm64-v8a` and `Android API 21`
```bash
$ docker run -it --rm -v $PWD:/paddle -e "ANDROID_ABI=arm64-v8a" -e "ANDROID_API=21" username/paddle-android:dev
```
When the `docker run` command above is executed, the container runs the [paddle/scripts/docker/build_android.sh](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build_android.sh) script by default. The script records the CMake configuration commonly used to cross-compile the PaddlePaddle library for Android, and it automatically builds the standalone toolchain, compiles, and installs according to `ANDROID_ABI` and `ANDROID_API`. Because the arm64 architecture requires an Android API level of at least 21, the Docker container uses the `Android API 21` toolchain by default when `ANDROID_ABI=arm64-v8a` and `ANDROID_API<21`. Users can refer to the **Configuring Cross-Compiling Arguments** section below to customize the script that the Docker container runs. After the build and installation finish, the PaddlePaddle C-API library is installed to `$PWD/install_android`, and the third-party libraries it depends on are installed to `$PWD/install_android/third_party`.
...@@ -82,16 +82,16 @@ CMake supports cross-compiling through [cmake-toolchains](https://cmake.org/cm
Optional arguments for the Android platform:
- `ANDROID_STANDALONE_TOOLCHAIN`: the absolute path of the standalone toolchain, or its path relative to the build directory. PaddlePaddle's CMake system derives the cross-compiler, sysroot, and Android API level from this value; otherwise, users must set these values manually when running cmake. No default value.
- `ANDROID_TOOLCHAIN`: the target toolchain. Can be `gcc` or `clang`; the default value is `clang`.
    - With CMake 3.7 or later, the `clang` toolchain is always used; with CMake older than 3.7, set `ANDROID_TOOLCHAIN=gcc` to use the `gcc` toolchain.
    - The official Android `clang` compiler requires the system to support `GLIBC 2.15` or later.
- `ANDROID_ABI`: the target architecture ABI. `armeabi-v7a` and `arm64-v8a` are currently supported; the default value is `armeabi-v7a`.
- `ANDROID_NATIVE_API_LEVEL`: the Android API level of the toolchain. If it is not set explicitly, PaddlePaddle derives it from the value of `ANDROID_STANDALONE_TOOLCHAIN`.
- `ANDROID_ARM_MODE`: whether to use ARM mode.
    - When `ANDROID_ABI=armeabi-v7a`, can be `ON` or `OFF`; the default value is `ON`.
    - When `ANDROID_ABI=arm64-v8a`, it does not need to be set.
- `ANDROID_ARM_NEON`: whether to use NEON instructions.
    - When `ANDROID_ABI=armeabi-v7a`, can be `ON` or `OFF`; the default value is `ON`.
    - When `ANDROID_ABI=arm64-v8a`, it does not need to be set.
Other arguments:
...@@ -119,7 +119,7 @@ cmake -DCMAKE_SYSTEM_NAME=Android \ ...@@ -119,7 +119,7 @@ cmake -DCMAKE_SYSTEM_NAME=Android \
-DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \ -DANDROID_STANDALONE_TOOLCHAIN=your/path/to/arm64_standalone_toolchain \
-DANDROID_ABI=arm64-v8a \ -DANDROID_ABI=arm64-v8a \
-DUSE_EIGEN_FOR_BLAS=OFF \ -DUSE_EIGEN_FOR_BLAS=OFF \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \ -DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \ -DWITH_C_API=ON \
-DWITH_SWIG_PY=OFF \ -DWITH_SWIG_PY=OFF \
.. ..
...@@ -128,8 +128,8 @@ cmake -DCMAKE_SYSTEM_NAME=Android \ ...@@ -128,8 +128,8 @@ cmake -DCMAKE_SYSTEM_NAME=Android \
Users can also set other build arguments as needed. For example, to minimize the size of the generated library, set `CMAKE_BUILD_TYPE` to `MinSizeRel`; for the fastest execution, set `CMAKE_BUILD_TYPE` to `Release`. The PaddlePaddle build can also be influenced by manually setting `CMAKE_C/CXX_FLAGS_MINSIZEREL/RELEASE`.
**Performance tips**: for the fastest computation, we suggest the following CMake configuration:
- Set `CMAKE_BUILD_TYPE` to `Release`
- Use the `clang` toolchain
- For `armeabi-v7a`, set `USE_EIGEN_FOR_BLAS=ON` to use Eigen for matrix computation; for `arm64-v8a`, set `USE_EIGEN_FOR_BLAS=OFF` to use OpenBLAS for matrix computation
### Build and Install
......
...@@ -300,4 +300,12 @@ extern void hl_matrix_col2Vol(real* dataDst, ...@@ -300,4 +300,12 @@ extern void hl_matrix_col2Vol(real* dataDst,
real alpha, real alpha,
real beta); real beta);
/**
 * @brief Vector cast: convert each element of a real vector to int.
* @param[out] out output int vector.
* @param[in] vec input float vector.
* @param[in] size size of the vector.
*/
extern void hl_vector_cast2int(int* out, real* vec, int size);
#endif /* HL_MATRIX_H_ */ #endif /* HL_MATRIX_H_ */
...@@ -133,4 +133,6 @@ inline void hl_matrix_col2Vol(real* dataDst, ...@@ -133,4 +133,6 @@ inline void hl_matrix_col2Vol(real* dataDst,
real alpha, real alpha,
real beta) {} real beta) {}
inline void hl_vector_cast2int(int* out, real* vec, int size) {}
#endif // HL_MATRIX_STUB_H_ #endif // HL_MATRIX_STUB_H_
...@@ -793,3 +793,14 @@ void hl_matrix_col2Vol(real* dataDst, ...@@ -793,3 +793,14 @@ void hl_matrix_col2Vol(real* dataDst,
CHECK_SYNC("hl_matrix_col2Vol failed"); CHECK_SYNC("hl_matrix_col2Vol failed");
} }
__global__ void keVectorCast2Int(int* out, real* vec, int size) {
for (int i = threadIdx.x; i < (size); i += blockDim.x) {
out[i] = int(vec[i]);
}
}
void hl_vector_cast2int(int* out, real* vec, int size) {
keVectorCast2Int<<<1, 512, 0, STREAM_DEFAULT>>>(out, vec, size);
CHECK_SYNC("hl_vector_cast2int failed");
}
...@@ -24,7 +24,6 @@ ...@@ -24,7 +24,6 @@
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/dynamic_recurrent_op.h" #include "paddle/operators/dynamic_recurrent_op.h"
#include "paddle/operators/net_op.h" #include "paddle/operators/net_op.h"
#include "paddle/operators/recurrent_op.h"
namespace paddle { namespace paddle {
namespace framework { namespace framework {
...@@ -38,7 +37,7 @@ static inline std::unique_ptr<OperatorBase> CreateGradOp( ...@@ -38,7 +37,7 @@ static inline std::unique_ptr<OperatorBase> CreateGradOp(
op_desc.SetType(op.Type()); op_desc.SetType(op.Type());
op_desc.SetAttrMap(op.Attrs()); op_desc.SetAttrMap(op.Attrs());
auto& info = OpInfoMap::Instance().Get(op.Type()); auto& info = OpInfoMap::Instance().Get(op.Type());
auto grad_descs = info.GradOpMaker()(op_desc, no_grad_set, grad_to_var); auto grad_descs = info.GradOpMaker()(op_desc, no_grad_set, grad_to_var, {});
std::vector<std::unique_ptr<OperatorBase>> grad_ops; std::vector<std::unique_ptr<OperatorBase>> grad_ops;
grad_ops.reserve(grad_descs.size()); grad_ops.reserve(grad_descs.size());
std::transform(grad_descs.begin(), grad_descs.end(), std::transform(grad_descs.begin(), grad_descs.end(),
...@@ -220,19 +219,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive( ...@@ -220,19 +219,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
}); });
// process recurrent gradient op as a special operator. // process recurrent gradient op as a special operator.
if (forwardOp.Type() == "recurrent") { if (forwardOp.Type() == "dynamic_recurrent") {
// NOTE clean up cycle call somewhere (RNN's stepnet contains itself),
// or this will result in infinite loop.
const auto& rnnop =
*static_cast<const operators::RecurrentOp*>(&forwardOp);
auto rnn_grad_op =
static_cast<operators::RecurrentGradientOp*>(grad_op.get());
const auto& stepnet_op =
*static_cast<const OperatorBase*>(&rnnop.stepnet());
// create stepnet's gradient op
rnn_grad_op->set_stepnet(
BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id));
} else if (forwardOp.Type() == "dynamic_recurrent") {
// NOTE clean up cycle call somewhere (RNN's stepnet contains itself), // NOTE clean up cycle call somewhere (RNN's stepnet contains itself),
// or this will result in infinite loop. // or this will result in infinite loop.
const auto& rnnop = const auto& rnnop =
...@@ -331,7 +318,7 @@ static void CreateGradVarInBlock( ...@@ -331,7 +318,7 @@ static void CreateGradVarInBlock(
continue; continue;
} }
auto pname = FwdName(arg); auto pname = FwdName(arg);
auto* param = block_desc->FindVar(pname); auto* param = block_desc->FindVarRecursive(pname);
auto* grad = block_desc->FindVar(arg); auto* grad = block_desc->FindVar(arg);
if (param == nullptr) { if (param == nullptr) {
LOG(WARNING) << "Cannot find forward variable of " << arg LOG(WARNING) << "Cannot find forward variable of " << arg
...@@ -348,7 +335,9 @@ static void CreateGradVarInBlock( ...@@ -348,7 +335,9 @@ static void CreateGradVarInBlock(
std::vector<std::unique_ptr<OpDescBind>> MakeOpGrad( std::vector<std::unique_ptr<OpDescBind>> MakeOpGrad(
const OpDescBind* op_desc, std::unordered_set<std::string>* no_grad_vars, const OpDescBind* op_desc, std::unordered_set<std::string>* no_grad_vars,
std::unordered_map<std::string, std::string>* grad_to_var) { std::unordered_map<std::string, std::string>* grad_to_var,
const std::vector<BlockDescBind*>& grad_block =
std::vector<BlockDescBind*>()) {
std::vector<std::unique_ptr<OpDescBind>> grad_op_descs; std::vector<std::unique_ptr<OpDescBind>> grad_op_descs;
// All input gradients of forwarding operator do not need to calculate. // All input gradients of forwarding operator do not need to calculate.
const std::vector<std::string>& inputs = op_desc->InputArgumentNames(); const std::vector<std::string>& inputs = op_desc->InputArgumentNames();
...@@ -364,9 +353,10 @@ std::vector<std::unique_ptr<OpDescBind>> MakeOpGrad( ...@@ -364,9 +353,10 @@ std::vector<std::unique_ptr<OpDescBind>> MakeOpGrad(
return grad_op_descs; // empty vector return grad_op_descs; // empty vector
} }
grad_op_descs = OpInfoMap::Instance() grad_op_descs =
.Get(op_desc->Type()) OpInfoMap::Instance()
.GradOpMaker()(*op_desc, *no_grad_vars, grad_to_var); .Get(op_desc->Type())
.GradOpMaker()(*op_desc, *no_grad_vars, grad_to_var, grad_block);
std::list<std::unique_ptr<OpDescBind>> pending_fill_zeros_ops; std::list<std::unique_ptr<OpDescBind>> pending_fill_zeros_ops;
for (auto& desc : grad_op_descs) { for (auto& desc : grad_op_descs) {
...@@ -400,21 +390,20 @@ std::vector<std::unique_ptr<OpDescBind>> MakeBlockBackward( ...@@ -400,21 +390,20 @@ std::vector<std::unique_ptr<OpDescBind>> MakeBlockBackward(
std::vector<std::unique_ptr<OpDescBind>> backward_descs; std::vector<std::unique_ptr<OpDescBind>> backward_descs;
for (auto it = op_descs.rbegin(); it != op_descs.rend(); ++it) { for (auto it = op_descs.rbegin(); it != op_descs.rend(); ++it) {
std::vector<std::unique_ptr<OpDescBind>> op_grads = std::vector<std::unique_ptr<OpDescBind>> op_grads;
MakeOpGrad(*it, no_grad_vars, grad_to_var);
if ((*it)->Type() == "recurrent") { if ((*it)->Type() == "recurrent") {
PADDLE_ENFORCE_EQ(
op_grads.size(), static_cast<size_t>(1),
"rnn_op's gradient process should contain only one op.");
int step_block_idx = (*it)->GetBlockAttr("step_block"); int step_block_idx = (*it)->GetBlockAttr("step_block");
auto backward_block_op_descs = MakeBlockBackward( auto backward_block_op_descs = MakeBlockBackward(
program_desc, step_block_idx, no_grad_vars, grad_to_var); program_desc, step_block_idx, no_grad_vars, grad_to_var);
BlockDescBind* backward_block = program_desc.AppendBlock(*cur_block); BlockDescBind* backward_block =
program_desc.AppendBlock(*program_desc.MutableBlock(step_block_idx));
for (auto& ptr : backward_block_op_descs) { for (auto& ptr : backward_block_op_descs) {
backward_block->AppendAllocatedOp(std::move(ptr)); backward_block->AppendAllocatedOp(std::move(ptr));
} }
op_grads[0]->SetBlockAttr("step_block", *backward_block); op_grads = MakeOpGrad(*it, no_grad_vars, grad_to_var, {backward_block});
} else {
op_grads = MakeOpGrad(*it, no_grad_vars, grad_to_var);
} }
for (const auto& desc : op_grads) { for (const auto& desc : op_grads) {
......
...@@ -88,6 +88,8 @@ class BlockDescBind { ...@@ -88,6 +88,8 @@ class BlockDescBind {
BlockDesc *Proto(); BlockDesc *Proto();
ProgramDescBind *Program() { return this->prog_; }
private: private:
void ClearPBOps(); void ClearPBOps();
void ClearPBVars(); void ClearPBVars();
......
...@@ -108,8 +108,9 @@ struct OpInfoFiller<T, kGradOpDescMaker> { ...@@ -108,8 +108,9 @@ struct OpInfoFiller<T, kGradOpDescMaker> {
info->grad_op_maker_ = []( info->grad_op_maker_ = [](
const OpDescBind& fwd_op, const OpDescBind& fwd_op,
const std::unordered_set<std::string>& no_grad_set, const std::unordered_set<std::string>& no_grad_set,
std::unordered_map<std::string, std::string>* grad_to_var) { std::unordered_map<std::string, std::string>* grad_to_var,
T maker(fwd_op, no_grad_set, grad_to_var); const std::vector<BlockDescBind*>& grad_block) {
T maker(fwd_op, no_grad_set, grad_to_var, grad_block);
return maker(); return maker();
}; };
} }
......
...@@ -31,7 +31,7 @@ namespace framework { ...@@ -31,7 +31,7 @@ namespace framework {
const std::string kFeedOpType = "feed"; const std::string kFeedOpType = "feed";
const std::string kFetchOpType = "fetch"; const std::string kFetchOpType = "fetch";
Executor::Executor(const std::vector<platform::Place>& places) { Executor::Executor(const std::vector<platform::Place>& places) : own_(true) {
PADDLE_ENFORCE_GT(places.size(), 0); PADDLE_ENFORCE_GT(places.size(), 0);
device_contexts_.resize(places.size()); device_contexts_.resize(places.size());
for (size_t i = 0; i < places.size(); i++) { for (size_t i = 0; i < places.size(); i++) {
...@@ -52,8 +52,10 @@ Executor::Executor(const std::vector<platform::Place>& places) { ...@@ -52,8 +52,10 @@ Executor::Executor(const std::vector<platform::Place>& places) {
} }
Executor::~Executor() { Executor::~Executor() {
for (auto& device_context : device_contexts_) { if (own_) {
delete device_context; for (auto& device_context : device_contexts_) {
delete device_context;
}
} }
} }
...@@ -66,14 +68,18 @@ static void CreateTensor(Variable* var, VarDesc::VarType var_type) { ...@@ -66,14 +68,18 @@ static void CreateTensor(Variable* var, VarDesc::VarType var_type) {
var->GetMutable<FeedFetchList>(); var->GetMutable<FeedFetchList>();
} else if (var_type == VarDesc::FETCH_LIST) { } else if (var_type == VarDesc::FETCH_LIST) {
var->GetMutable<FeedFetchList>(); var->GetMutable<FeedFetchList>();
} else if (var_type == VarDesc::STEP_SCOPES) {
var->GetMutable<std::vector<framework::Scope>>();
} else { } else {
PADDLE_THROW( PADDLE_THROW(
"Variable type must be " "Variable type %d is not in "
"LoDTensor/SelectedRows/FEED_MINIBATCH/FETCH_LIST."); "[LoDTensor, SelectedRows, FEED_MINIBATCH, FETCH_LIST]",
var_type);
} }
} }
void Executor::Run(const ProgramDescBind& pdesc, Scope* scope, int block_id) { void Executor::Run(const ProgramDescBind& pdesc, Scope* scope, int block_id,
bool create_local_scope) {
// TODO(tonyyang-svail): // TODO(tonyyang-svail):
// - only runs on the first device (i.e. no interdevice communication) // - only runs on the first device (i.e. no interdevice communication)
// - will change to use multiple blocks for RNN op and Cond Op // - will change to use multiple blocks for RNN op and Cond Op
...@@ -81,29 +87,42 @@ void Executor::Run(const ProgramDescBind& pdesc, Scope* scope, int block_id) { ...@@ -81,29 +87,42 @@ void Executor::Run(const ProgramDescBind& pdesc, Scope* scope, int block_id) {
auto& block = pdesc.Block(block_id); auto& block = pdesc.Block(block_id);
auto& device = device_contexts_[0]; auto& device = device_contexts_[0];
Scope& local_scope = scope->NewScope(); Scope* local_scope = scope;
if (create_local_scope) {
for (auto& var : block.AllVars()) { local_scope = &scope->NewScope();
if (var->Persistable()) { for (auto& var : block.AllVars()) {
auto* ptr = scope->Var(var->Name()); if (var->Persistable()) {
CreateTensor(ptr, var->GetType()); auto* ptr = scope->Var(var->Name());
VLOG(3) << "Create Variable " << var->Name() CreateTensor(ptr, var->GetType());
<< " global, which pointer is " << ptr; VLOG(3) << "Create Variable " << var->Name()
} else { << " global, which pointer is " << ptr;
auto* ptr = local_scope.Var(var->Name()); } else {
auto* ptr = local_scope->Var(var->Name());
CreateTensor(ptr, var->GetType());
VLOG(3) << "Create Variable " << var->Name()
<< " locally, which pointer is " << ptr;
}
}
} else {
for (auto& var : block.AllVars()) {
auto* ptr = local_scope->Var(var->Name());
CreateTensor(ptr, var->GetType()); CreateTensor(ptr, var->GetType());
VLOG(3) << "Create Variable " << var->Name() VLOG(3) << "Create variable " << var->Name() << ", which pointer is "
<< " locally, which pointer is " << ptr; << ptr;
} }
} }
for (auto& op_desc : block.AllOps()) { for (auto& op_desc : block.AllOps()) {
auto op = paddle::framework::OpRegistry::CreateOp(*op_desc); auto op = paddle::framework::OpRegistry::CreateOp(*op_desc);
op->Run(local_scope, *device); op->Run(*local_scope, *device);
}
if (create_local_scope) {
scope->DeleteScope(local_scope);
} }
scope->DeleteScope(&local_scope);
} }
Executor::Executor(const platform::DeviceContext& device)
: device_contexts_({&device}), own_(false) {}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -25,6 +25,7 @@ namespace framework { ...@@ -25,6 +25,7 @@ namespace framework {
class Executor { class Executor {
public: public:
explicit Executor(const std::vector<platform::Place>& places); explicit Executor(const std::vector<platform::Place>& places);
explicit Executor(const platform::DeviceContext& devices);
~Executor(); ~Executor();
/* @Brief /* @Brief
...@@ -34,10 +35,11 @@ class Executor { ...@@ -34,10 +35,11 @@ class Executor {
* ProgramDesc * ProgramDesc
* Scope * Scope
*/ */
void Run(const ProgramDescBind&, Scope*, int); void Run(const ProgramDescBind&, Scope*, int, bool create_local_scope = true);
private: private:
std::vector<platform::DeviceContext*> device_contexts_; std::vector<const platform::DeviceContext*> device_contexts_;
bool own_;
}; };
} // namespace framework } // namespace framework
......
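The `own_` flag introduced in this hunk decides whether the destructor frees the device contexts: the `Place`-based constructor owns what it creates, while the new `DeviceContext`-based constructor only borrows. A minimal standalone sketch of that pattern follows. `Context`, `Runner`, and `live_contexts` are hypothetical stand-ins invented for illustration, not Paddle types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for platform::DeviceContext; counts live instances.
static int live_contexts = 0;
struct Context {
  Context() { ++live_contexts; }
  ~Context() { --live_contexts; }
};

class Runner {
 public:
  // Owning constructor: allocates n contexts and frees them on destruction.
  explicit Runner(int n) : own_(true) {
    for (int i = 0; i < n; ++i) contexts_.push_back(new Context());
  }
  // Borrowing constructor: the caller keeps ownership of ctx.
  explicit Runner(const Context& ctx) : contexts_{&ctx}, own_(false) {}

  ~Runner() {
    if (own_) {
      for (const Context* c : contexts_) delete c;
    }
  }

  // Copying would duplicate raw pointers and risk a double delete.
  Runner(const Runner&) = delete;
  Runner& operator=(const Runner&) = delete;

 private:
  std::vector<const Context*> contexts_;
  bool own_;
};
```

The borrowing path is what lets an operator that already holds a device context build a nested executor without the context being deleted from under it.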
...@@ -15,6 +15,7 @@ ...@@ -15,6 +15,7 @@
#pragma once #pragma once
#include <string> #include <string>
#include <unordered_set> #include <unordered_set>
#include <vector>
#include "paddle/framework/op_desc.h" #include "paddle/framework/op_desc.h"
#include "paddle/framework/operator.h" #include "paddle/framework/operator.h"
...@@ -26,8 +27,13 @@ class GradOpDescMakerBase { ...@@ -26,8 +27,13 @@ class GradOpDescMakerBase {
explicit GradOpDescMakerBase( explicit GradOpDescMakerBase(
const OpDescBind& fwd_op, const OpDescBind& fwd_op,
const std::unordered_set<std::string>& no_grad_set, const std::unordered_set<std::string>& no_grad_set,
std::unordered_map<std::string, std::string>* grad_to_var) std::unordered_map<std::string, std::string>* grad_to_var,
: fwd_op_(fwd_op), no_grad_set_(no_grad_set), grad_to_var_(grad_to_var) {} const std::vector<BlockDescBind*>& grad_block =
std::vector<BlockDescBind*>())
: fwd_op_(fwd_op),
no_grad_set_(no_grad_set),
grad_to_var_(grad_to_var),
grad_block_(grad_block) {}
virtual ~GradOpDescMakerBase() = default; virtual ~GradOpDescMakerBase() = default;
virtual std::vector<std::unique_ptr<OpDescBind>> operator()() const = 0; virtual std::vector<std::unique_ptr<OpDescBind>> operator()() const = 0;
...@@ -102,6 +108,9 @@ class GradOpDescMakerBase { ...@@ -102,6 +108,9 @@ class GradOpDescMakerBase {
const OpDescBind& fwd_op_; const OpDescBind& fwd_op_;
const std::unordered_set<std::string>& no_grad_set_; const std::unordered_set<std::string>& no_grad_set_;
std::unordered_map<std::string, std::string>* grad_to_var_; std::unordered_map<std::string, std::string>* grad_to_var_;
protected:
std::vector<BlockDescBind*> grad_block_;
}; };
class SingleGradOpDescMaker : public GradOpDescMakerBase { class SingleGradOpDescMaker : public GradOpDescMakerBase {
......
...@@ -327,6 +327,19 @@ void OpDescBind::InferShape(const BlockDescBind &block) const { ...@@ -327,6 +327,19 @@ void OpDescBind::InferShape(const BlockDescBind &block) const {
PADDLE_ENFORCE(static_cast<bool>(infer_shape), PADDLE_ENFORCE(static_cast<bool>(infer_shape),
"%s's infer_shape has not been registered", this->Type()); "%s's infer_shape has not been registered", this->Type());
CompileTimeInferShapeContext ctx(*this, block); CompileTimeInferShapeContext ctx(*this, block);
if (VLOG_IS_ON(10)) {
std::ostringstream sout;
auto inames = this->InputArgumentNames();
sout << " From [";
std::copy(inames.begin(), inames.end(),
std::ostream_iterator<std::string>(sout, ", "));
sout << "] to [";
auto onames = this->OutputArgumentNames();
std::copy(onames.begin(), onames.end(),
std::ostream_iterator<std::string>(sout, ", "));
sout << "]";
VLOG(10) << sout.str();
}
infer_shape(&ctx); infer_shape(&ctx);
} }
......
...@@ -126,7 +126,7 @@ OperatorBase::OperatorBase(const std::string& type, ...@@ -126,7 +126,7 @@ OperatorBase::OperatorBase(const std::string& type,
std::vector<std::string> OperatorBase::InputVars() const { std::vector<std::string> OperatorBase::InputVars() const {
std::vector<std::string> ret_val; std::vector<std::string> ret_val;
for (auto& o : outputs_) { for (auto& o : inputs_) {
ret_val.reserve(ret_val.size() + o.second.size()); ret_val.reserve(ret_val.size() + o.second.size());
ret_val.insert(ret_val.end(), o.second.begin(), o.second.end()); ret_val.insert(ret_val.end(), o.second.begin(), o.second.end());
} }
...@@ -394,7 +394,19 @@ class RuntimeInferShapeContext : public InferShapeContext { ...@@ -394,7 +394,19 @@ class RuntimeInferShapeContext : public InferShapeContext {
void OperatorWithKernel::Run(const Scope& scope, void OperatorWithKernel::Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const { const platform::DeviceContext& dev_ctx) const {
VLOG(3) << "Running operator " << this->Type(); if (VLOG_IS_ON(1)) {
auto inputs = this->InputVars();
auto outputs = this->OutputVars(true);
std::ostringstream sout;
sout << "Run operator " << this->Type() << " From [";
std::ostream_iterator<std::string> out_it(sout, ",");
std::copy(inputs.begin(), inputs.end(), out_it);
sout << "] to [";
std::copy(outputs.begin(), outputs.end(), out_it);
sout << "]";
VLOG(1) << sout.str();
}
RuntimeInferShapeContext infer_shape_ctx(*this, scope); RuntimeInferShapeContext infer_shape_ctx(*this, scope);
this->InferShape(&infer_shape_ctx); this->InferShape(&infer_shape_ctx);
......
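The new logging block joins variable names with `std::ostream_iterator`, which writes the delimiter after every element, including the last one. The sketch below reproduces that formatting in isolation; `DescribeOp` is a hypothetical helper name, and the strings stand in for the results of `InputVars()` and `OutputVars(true)`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds the same message shape as the VLOG(1) block.
std::string DescribeOp(const std::string& type,
                       const std::vector<std::string>& inputs,
                       const std::vector<std::string>& outputs) {
  std::ostringstream sout;
  sout << "Run operator " << type << " From [";
  // The iterator appends "," after each element it writes.
  std::ostream_iterator<std::string> out_it(sout, ",");
  std::copy(inputs.begin(), inputs.end(), out_it);
  sout << "] to [";
  std::copy(outputs.begin(), outputs.end(), out_it);
  sout << "]";
  return sout.str();
}
```

For inputs `X`, `Y` and output `Out` of a `mul` operator this yields `Run operator mul From [X,Y,] to [Out,]`, so each list ends with a trailing comma.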
...@@ -47,8 +47,12 @@ Variable* Scope::Var(const std::string& name) { ...@@ -47,8 +47,12 @@ Variable* Scope::Var(const std::string& name) {
return v; return v;
} }
Variable* Scope::Var() { Variable* Scope::Var(std::string* name) {
return Var(string::Sprintf("%p.%d", this, vars_.size())); auto var_name = string::Sprintf("%p.%d", this, vars_.size());
if (name != nullptr) {
*name = var_name;
}
return Var(var_name);
} }
Variable* Scope::FindVar(const std::string& name) const { Variable* Scope::FindVar(const std::string& name) const {
......
...@@ -49,7 +49,7 @@ class Scope { ...@@ -49,7 +49,7 @@ class Scope {
Variable* Var(const std::string& name); Variable* Var(const std::string& name);
/// Create a variable with a scope-unique name. /// Create a variable with a scope-unique name.
Variable* Var(); Variable* Var(std::string* name = nullptr);
/// Find a variable in the scope or any of its ancestors. Returns /// Find a variable in the scope or any of its ancestors. Returns
/// nullptr if cannot find. /// nullptr if cannot find.
......
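`Scope::Var` now accepts an optional output pointer so that callers can learn the generated name without changing existing call sites. The following sketch shows the same defaulted out-parameter idiom on a toy class. `MiniScope` and the `tmp.N` naming scheme are assumptions made for illustration; the real code formats the name from the scope address and the variable count.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical miniature scope: maps variable names to integer slots.
class MiniScope {
 public:
  // Creates (or returns) the variable with the given name.
  int* Var(const std::string& name) { return &vars_[name]; }

  // Creates a variable with a generated name. If `name` is non-null,
  // the generated name is written to it.
  int* Var(std::string* name = nullptr) {
    std::string var_name = "tmp." + std::to_string(vars_.size());
    if (name != nullptr) {
      *name = var_name;
    }
    return Var(var_name);
  }

  bool Has(const std::string& name) const { return vars_.count(name) > 0; }

 private:
  std::unordered_map<std::string, int> vars_;
};
```

Calling `scope.Var()` keeps working as before, while `scope.Var(&generated)` also returns the name through `generated`.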
...@@ -125,7 +125,7 @@ class Tensor { ...@@ -125,7 +125,7 @@ class Tensor {
* @param[in] end_idx The index of the end row(exclusive) to slice. * @param[in] end_idx The index of the end row(exclusive) to slice.
* The index number begins from 0. * The index number begins from 0.
*/ */
inline Tensor Slice(const int& begin_idx, const int& end_idx) const; inline Tensor Slice(int begin_idx, int end_idx) const;
platform::Place place() const { platform::Place place() const {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE_NOT_NULL(
......
...@@ -228,7 +228,7 @@ inline void Tensor::CopyFromVector(const std::vector<T>& src, ...@@ -228,7 +228,7 @@ inline void Tensor::CopyFromVector(const std::vector<T>& src,
#endif #endif
} }
inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { inline Tensor Tensor::Slice(int begin_idx, int end_idx) const {
check_memory_size(); check_memory_size();
PADDLE_ENFORCE_GE(begin_idx, 0, PADDLE_ENFORCE_GE(begin_idx, 0,
"The start row index must be greater than or equal to 0."); "The start row index must be greater than or equal to 0.");
......
...@@ -29,6 +29,7 @@ class OpDescBind; ...@@ -29,6 +29,7 @@ class OpDescBind;
class BlockDescBind; class BlockDescBind;
class BlockDesc; class BlockDesc;
class InferShapeContext; class InferShapeContext;
class BlockDescBind;
using VariableNameMap = std::map<std::string, std::vector<std::string>>; using VariableNameMap = std::map<std::string, std::vector<std::string>>;
...@@ -46,7 +47,8 @@ using OpCreator = std::function<OperatorBase*( ...@@ -46,7 +47,8 @@ using OpCreator = std::function<OperatorBase*(
using GradOpMakerFN = std::function<std::vector<std::unique_ptr<OpDescBind>>( using GradOpMakerFN = std::function<std::vector<std::unique_ptr<OpDescBind>>(
const OpDescBind&, const std::unordered_set<std::string>& /*no_grad_set*/, const OpDescBind&, const std::unordered_set<std::string>& /*no_grad_set*/,
std::unordered_map<std::string, std::string>* /*grad_to_var*/)>; std::unordered_map<std::string, std::string>* /*grad_to_var*/,
const std::vector<BlockDescBind*>& grad_block)>;
using InferVarTypeFN = std::function<void(const OpDescBind& /*op_desc*/, using InferVarTypeFN = std::function<void(const OpDescBind& /*op_desc*/,
BlockDescBind* /*block*/)>; BlockDescBind* /*block*/)>;
......
...@@ -395,14 +395,24 @@ real AucEvaluator::evalImp(std::vector<Argument>& arguments) { ...@@ -395,14 +395,24 @@ real AucEvaluator::evalImp(std::vector<Argument>& arguments) {
CHECK_LE(arguments.size(), (size_t)3); CHECK_LE(arguments.size(), (size_t)3);
MatrixPtr output = arguments[0].value; MatrixPtr output = arguments[0].value;
IVectorPtr label = arguments[1].ids; IVectorPtr label = arguments[1].ids;
MatrixPtr labelval = arguments[1].value;
bool supportWeight = (3 == arguments.size()) ? true : false; bool supportWeight = (3 == arguments.size()) ? true : false;
MatrixPtr weight = supportWeight ? arguments[2].value : nullptr; MatrixPtr weight = supportWeight ? arguments[2].value : nullptr;
if (nullptr == output || nullptr == label ||
(supportWeight && nullptr == weight)) { if (nullptr == output || (supportWeight && nullptr == weight)) {
return 0; return 0;
} }
size_t insNum = output->getHeight(); size_t insNum = output->getHeight();
size_t outputDim = output->getWidth(); size_t outputDim = output->getWidth();
// Copy label from value to a vector.
if (nullptr == label && nullptr != labelval) {
// label width is 1
CHECK_EQ(1, labelval->getWidth());
VectorPtr vec =
Vector::create(labelval->getData(), insNum, output->useGpu());
label = vec->castToInt();
}
CHECK_EQ(insNum, label->getSize()); CHECK_EQ(insNum, label->getSize());
if (supportWeight) { if (supportWeight) {
CHECK_EQ(insNum, weight->getHeight()); CHECK_EQ(insNum, weight->getHeight());
...@@ -443,6 +453,7 @@ real AucEvaluator::evalImp(std::vector<Argument>& arguments) { ...@@ -443,6 +453,7 @@ real AucEvaluator::evalImp(std::vector<Argument>& arguments) {
int* labelD = label->getData(); int* labelD = label->getData();
real* weightD = supportWeight ? weight->getData() : nullptr; real* weightD = supportWeight ? weight->getData() : nullptr;
size_t pos = realColumnIdx_; size_t pos = realColumnIdx_;
for (size_t i = 0; i < insNum; ++i) { for (size_t i = 0; i < insNum; ++i) {
real value = outputD[pos]; real value = outputD[pos];
uint32_t binIdx = static_cast<uint32_t>(value * kBinNum_); uint32_t binIdx = static_cast<uint32_t>(value * kBinNum_);
......
...@@ -18,6 +18,7 @@ limitations under the License. */ ...@@ -18,6 +18,7 @@ limitations under the License. */
#include <memory> #include <memory>
#include "Matrix.h" #include "Matrix.h"
#include "hl_gpu.h" #include "hl_gpu.h"
#include "hl_matrix.h"
#include "hl_table_apply.h" #include "hl_table_apply.h"
#include "paddle/utils/Flags.h" #include "paddle/utils/Flags.h"
#include "paddle/utils/Logging.h" #include "paddle/utils/Logging.h"
...@@ -99,6 +100,19 @@ MatrixPtr VectorT<int>::toOneHotSparseMatrix(size_t idRange, bool useGpu) { ...@@ -99,6 +100,19 @@ MatrixPtr VectorT<int>::toOneHotSparseMatrix(size_t idRange, bool useGpu) {
return mat; return mat;
} }
template <>
std::shared_ptr<VectorT<int>> VectorT<real>::castToInt() {
std::shared_ptr<VectorT<int>> ret = IVector::create(this->getSize(), useGpu_);
if (useGpu_) {
hl_vector_cast2int(ret->getData(), this->getData(), this->getSize());
} else {
for (size_t i = 0; i < getSize(); ++i) {
ret->getData()[i] = int(this->getData()[i]);
}
}
return ret;
}
template <class T> template <class T>
GpuVectorT<T>::GpuVectorT(size_t size) GpuVectorT<T>::GpuVectorT(size_t size)
: VectorT<T>(size, : VectorT<T>(size,
......
...@@ -162,6 +162,13 @@ public: ...@@ -162,6 +162,13 @@ public:
*/ */
std::shared_ptr<Matrix> toOneHotSparseMatrix(size_t idRange, bool useGpu); std::shared_ptr<Matrix> toOneHotSparseMatrix(size_t idRange, bool useGpu);
/**
* @brief cast vector of "real" elements to "int" elements.
*
* @note: float -> int must be cast explicitly, or you'll get wrong data.
*/
std::shared_ptr<VectorT<int>> castToInt();
/** /**
* This function will crash if the size of src and dest is different. * This function will crash if the size of src and dest is different.
*/ */
......
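The note on `castToInt` warns that float data has to be converted element by element instead of being viewed as integers. The sketch below contrasts the two, assuming `real` is a 32-bit IEEE-754 `float`. `CastToInt` and `BitsOf` are hypothetical free functions written for this illustration, not part of `VectorT`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Value conversion: each float is truncated toward zero, as int(x) does.
std::vector<int> CastToInt(const std::vector<float>& src) {
  std::vector<int> out(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    out[i] = static_cast<int>(src[i]);
  }
  return out;
}

// Bit reinterpretation: what a label would look like if the float buffer
// were simply read as 32-bit integers without conversion.
std::int32_t BitsOf(float x) {
  static_assert(sizeof(float) == sizeof(std::int32_t),
                "sketch assumes a 32-bit float");
  std::int32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  return bits;
}
```

A label stored as `1.0f` converts to `1` with `CastToInt`, while its raw bit pattern is `0x3F800000`, which is why the evaluator converts the label matrix before using it as class ids.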
...@@ -131,9 +131,10 @@ add_subdirectory(math) ...@@ -131,9 +131,10 @@ add_subdirectory(math)
add_subdirectory(nccl) add_subdirectory(nccl)
set(DEPS_OPS set(DEPS_OPS
recurrent_op
cond_op cond_op
cross_entropy_op cross_entropy_op
recurrent_op
dynamic_recurrent_op
softmax_with_cross_entropy_op softmax_with_cross_entropy_op
sum_op sum_op
pool_op pool_op
...@@ -142,9 +143,6 @@ set(DEPS_OPS ...@@ -142,9 +143,6 @@ set(DEPS_OPS
sequence_conv_op sequence_conv_op
lstm_op) lstm_op)
op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc
DEPS framework_proto tensor net_op)
op_library(cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op) op_library(cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op)
op_library(cross_entropy_op DEPS cross_entropy) op_library(cross_entropy_op DEPS cross_entropy)
op_library(softmax_with_cross_entropy_op DEPS cross_entropy softmax) op_library(softmax_with_cross_entropy_op DEPS cross_entropy softmax)
...@@ -156,7 +154,9 @@ op_library(nccl_op DEPS nccl_common) ...@@ -156,7 +154,9 @@ op_library(nccl_op DEPS nccl_common)
endif() endif()
op_library(sequence_conv_op DEPS context_project) op_library(sequence_conv_op DEPS context_project)
op_library(lstm_op DEPS sequence2batch lstm_compute) op_library(lstm_op DEPS sequence2batch lstm_compute)
op_library(dynamic_recurrent_op SRCS dynamic_recurrent_op.cc rnn/recurrent_op_utils.cc
DEPS net_op tensor_array)
op_library(recurrent_op SRCS recurrent_op.cc DEPS executor)
list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS}) list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS})
foreach(src ${GENERAL_OPS}) foreach(src ${GENERAL_OPS})
op_library(${src}) op_library(${src})
...@@ -168,8 +168,9 @@ cc_test(gather_test SRCS gather_test.cc DEPS tensor) ...@@ -168,8 +168,9 @@ cc_test(gather_test SRCS gather_test.cc DEPS tensor)
cc_test(net_op_test SRCS net_op_test.cc DEPS net_op) cc_test(net_op_test SRCS net_op_test.cc DEPS net_op)
cc_test(scatter_test SRCS scatter_test.cc DEPS tensor) cc_test(scatter_test SRCS scatter_test.cc DEPS tensor)
cc_test(strided_memcpy_test SRCS strided_memcpy_test.cc DEPS tensor paddle_memory) cc_test(strided_memcpy_test SRCS strided_memcpy_test.cc DEPS tensor paddle_memory)
cc_test(dynamic_recurrent_op_test SRCS dynamic_recurrent_op_test.cc DEPS dynamic_recurrent_op recurrent_op tensor_array) cc_test(dynamic_recurrent_op_test SRCS dynamic_recurrent_op_test.cc
rnn/recurrent_op_utils.cc
DEPS dynamic_recurrent_op)
if(WITH_GPU) if(WITH_GPU)
nv_test(nccl_op_test SRCS nccl_op_test.cu DEPS nccl_op gpu_info device_context) nv_test(nccl_op_test SRCS nccl_op_test.cu DEPS nccl_op gpu_info device_context)
endif() endif()
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/conv2d_transpose_op.h"
namespace paddle {
namespace operators {
class CudnnConv2DTransposeOpMaker : public Conv2DTransposeOpMaker {
public:
CudnnConv2DTransposeOpMaker(framework::OpProto* proto,
framework::OpAttrChecker* op_checker)
: Conv2DTransposeOpMaker(proto, op_checker) {
AddAttr<std::vector<int>>("dilations", "dilations of convolution operator.")
.SetDefault(std::vector<int>{1, 1});
AddAttr<int>("workspace_size_MB",
"Workspace size for cuDNN, in MB. "
"The workspace is a section of GPU memory that is "
"allocated and freed each time the operator runs. A larger "
"workspace can increase performance but also requires "
"better hardware. Set this size carefully.")
.SetDefault(4096);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(conv2d_transpose_cudnn, ops::Conv2DTransposeOp,
ops::CudnnConv2DTransposeOpMaker, conv2d_transpose_cudnn_grad,
ops::Conv2DTransposeOpGrad);
REGISTER_OP_CPU_KERNEL(
conv2d_transpose_cudnn,
ops::GemmConv2DTransposeKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
conv2d_transpose_cudnn_grad,
ops::GemmConv2DTransposeGradKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
#include "paddle/memory/memory.h"
#include "paddle/operators/conv2d_transpose_op.h"
#include "paddle/platform/assert.h"
#include "paddle/platform/cudnn_helper.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using ScopedTensorDescriptor = platform::ScopedTensorDescriptor;
using ScopedFilterDescriptor = platform::ScopedFilterDescriptor;
using ScopedConvolutionDescriptor = platform::ScopedConvolutionDescriptor;
using DataLayout = platform::DataLayout;
using CUDADeviceContext = platform::CUDADeviceContext;
static constexpr size_t kConvCudnnWorkspaceLimitBytes = 1024 * 1024 * 1024;
template <typename T>
class CudnnConvTransposeOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace.");
auto* input = ctx.Input<Tensor>("Input");
auto* filter = ctx.Input<Tensor>("Filter");
auto* output = ctx.Output<Tensor>("Output");
std::vector<int> strides = ctx.Attr<std::vector<int>>("strides");
std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings");
// cudnn v5 does not support dilations
std::vector<int> dilations = ctx.Attr<std::vector<int>>("dilations");
int user_workspace_size = ctx.Attr<int>("workspace_size_MB");
const T* input_data = input->data<T>();
const T* filter_data = filter->data<T>();
T* output_data = output->mutable_data<T>(ctx.GetPlace());
// ------------------- cudnn descriptors ---------------------
ScopedTensorDescriptor input_desc;
ScopedTensorDescriptor output_desc;
ScopedFilterDescriptor filter_desc;
ScopedConvolutionDescriptor conv_desc;
DataLayout layout = DataLayout::kNCHW;
// N, M, H, W
cudnnTensorDescriptor_t cudnn_input_desc = input_desc.descriptor<T>(
layout, framework::vectorize2int(input->dims()));
// N, C, O_h, O_w
cudnnTensorDescriptor_t cudnn_output_desc = output_desc.descriptor<T>(
layout, framework::vectorize2int(output->dims()));
// M, C, K_h, K_w
cudnnFilterDescriptor_t cudnn_filter_desc = filter_desc.descriptor<T>(
layout, framework::vectorize2int(filter->dims()));
cudnnConvolutionDescriptor_t cudnn_conv_desc =
conv_desc.descriptor<T>(paddings, strides, dilations);
// ------------------- cudnn conv workspace ---------------------
void* cudnn_workspace = nullptr;
size_t workspace_size_in_bytes; // final workspace to allocate.
size_t workspace_size_limit = kConvCudnnWorkspaceLimitBytes;
if (user_workspace_size > 0) {
workspace_size_limit = static_cast<size_t>(user_workspace_size) * 1024 * 1024;
}
// ------------------- cudnn conv algorithm ---------------------
cudnnConvolutionBwdDataAlgo_t algo;
auto handle = ctx.cuda_device_context().cudnn_handle();
// Get the algorithm
PADDLE_ENFORCE(platform::dynload::cudnnGetConvolutionBackwardDataAlgorithm(
handle, cudnn_filter_desc, cudnn_input_desc, cudnn_conv_desc,
// dxDesc: Handle to the previously initialized output tensor
// descriptor.
cudnn_output_desc, CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT,
workspace_size_limit, &algo));
// get workspace size able to allocate
PADDLE_ENFORCE(
platform::dynload::cudnnGetConvolutionBackwardDataWorkspaceSize(
handle, cudnn_filter_desc, cudnn_input_desc, cudnn_conv_desc,
cudnn_output_desc, algo, &workspace_size_in_bytes));
// Allocate on GPU memory
platform::GPUPlace gpu = boost::get<platform::GPUPlace>(ctx.GetPlace());
cudnn_workspace = paddle::memory::Alloc(gpu, workspace_size_in_bytes);
// ------------------- cudnn conv transpose forward ---------------------
T alpha = 1.0f, beta = 0.0f;
PADDLE_ENFORCE(platform::dynload::cudnnConvolutionBackwardData(
handle, &alpha, cudnn_filter_desc, filter_data, cudnn_input_desc,
input_data, cudnn_conv_desc, algo, cudnn_workspace,
workspace_size_in_bytes, &beta, cudnn_output_desc, output_data));
// Release the cudnn workspace
paddle::memory::Free(gpu, cudnn_workspace);
}
};
template <typename T>
class CudnnConvTransposeGradOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace.");
auto input = ctx.Input<Tensor>("Input");
auto filter = ctx.Input<Tensor>("Filter");
auto output_grad = ctx.Input<Tensor>(framework::GradVarName("Output"));
auto input_grad = ctx.Output<Tensor>(framework::GradVarName("Input"));
auto filter_grad = ctx.Output<Tensor>(framework::GradVarName("Filter"));
const T* input_data = input->data<T>();
const T* output_grad_data = output_grad->data<T>();
const T* filter_data = filter->data<T>();
std::vector<int> strides = ctx.Attr<std::vector<int>>("strides");
std::vector<int> paddings = ctx.Attr<std::vector<int>>("paddings");
// cudnn v5 does not support dilations
std::vector<int> dilations = ctx.Attr<std::vector<int>>("dilations");
int user_workspace_size = ctx.Attr<int>("workspace_size_MB");
// ------------------- cudnn descriptors ---------------------
ScopedTensorDescriptor input_desc;
ScopedTensorDescriptor output_desc;
ScopedFilterDescriptor filter_desc;
ScopedConvolutionDescriptor conv_desc;
DataLayout layout = DataLayout::kNCHW;
// Input: (N, M, H, W)
cudnnTensorDescriptor_t cudnn_input_desc = input_desc.descriptor<T>(
layout, framework::vectorize2int(input->dims()));
// Output: (N, C, O_H, O_W)
cudnnTensorDescriptor_t cudnn_output_desc = output_desc.descriptor<T>(
layout, framework::vectorize2int(output_grad->dims()));
// Filter (M, C, K_H, K_W)
cudnnFilterDescriptor_t cudnn_filter_desc = filter_desc.descriptor<T>(
layout, framework::vectorize2int(filter->dims()));
cudnnConvolutionDescriptor_t cudnn_conv_desc =
conv_desc.descriptor<T>(paddings, strides, dilations);
// ------------------- cudnn backward algorithm ---------------------
cudnnConvolutionFwdAlgo_t data_algo;
cudnnConvolutionBwdFilterAlgo_t filter_algo;
size_t bwd_filter_ws_size, fwd_ws_size;
size_t workspace_size_in_bytes = 0;
size_t workspace_size_limit = kConvCudnnWorkspaceLimitBytes;
if (user_workspace_size > 0) {
workspace_size_limit = static_cast<size_t>(user_workspace_size) * 1024 * 1024;
}
auto handle = ctx.cuda_device_context().cudnn_handle();
if (input_grad) {
// choose backward algorithm for data
PADDLE_ENFORCE(platform::dynload::cudnnGetConvolutionForwardAlgorithm(
handle, cudnn_output_desc, cudnn_filter_desc, cudnn_conv_desc,
cudnn_input_desc, CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
workspace_size_limit, &data_algo));
PADDLE_ENFORCE(platform::dynload::cudnnGetConvolutionForwardWorkspaceSize(
handle, cudnn_output_desc, cudnn_filter_desc, cudnn_conv_desc,
cudnn_input_desc, data_algo, &fwd_ws_size));
workspace_size_in_bytes = std::max(workspace_size_in_bytes, fwd_ws_size);
}
if (filter_grad) {
// choose backward algorithm for filter
PADDLE_ENFORCE(
platform::dynload::cudnnGetConvolutionBackwardFilterAlgorithm(
handle, cudnn_output_desc, cudnn_input_desc, cudnn_conv_desc,
cudnn_filter_desc,
CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT,
workspace_size_limit, &filter_algo));
// get workspace for backwards filter algorithm
PADDLE_ENFORCE(
platform::dynload::cudnnGetConvolutionBackwardFilterWorkspaceSize(
handle, cudnn_output_desc, cudnn_input_desc, cudnn_conv_desc,
cudnn_filter_desc, filter_algo, &bwd_filter_ws_size));
workspace_size_in_bytes =
std::max(workspace_size_in_bytes, bwd_filter_ws_size);
}
// ------------------- cudnn conv workspace ---------------------
// Already on GPU
void* cudnn_workspace = nullptr;
platform::GPUPlace gpu = boost::get<platform::GPUPlace>(ctx.GetPlace());
cudnn_workspace = paddle::memory::Alloc(gpu, workspace_size_in_bytes);
// ------------------- cudnn conv backward data ---------------------
// FIXME(typhoonzero): template type T may not be the same as cudnn call.
T alpha = 1.0f, beta = 0.0f;
if (input_grad) {
T* input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
auto t = framework::EigenVector<T>::Flatten(*input_grad);
t.device(ctx.GetEigenDevice<platform::GPUPlace>()) =
t.constant(static_cast<T>(0));
PADDLE_ENFORCE(platform::dynload::cudnnConvolutionForward(
handle, &alpha, cudnn_output_desc, output_grad_data,
cudnn_filter_desc, filter_data, cudnn_conv_desc, data_algo,
cudnn_workspace, workspace_size_in_bytes, &beta, cudnn_input_desc,
input_grad_data));
}
// ------------------- cudnn conv backward filter ---------------------
if (filter_grad) {
T* filter_grad_data = filter_grad->mutable_data<T>(ctx.GetPlace());
auto t = framework::EigenVector<T>::Flatten(*filter_grad);
t.device(ctx.GetEigenDevice<platform::GPUPlace>()) =
t.constant(static_cast<T>(0));
// Gradient with respect to the filter
PADDLE_ENFORCE(platform::dynload::cudnnConvolutionBackwardFilter(
handle, &alpha, cudnn_output_desc, output_grad_data, cudnn_input_desc,
input_data, cudnn_conv_desc, filter_algo, cudnn_workspace,
workspace_size_in_bytes, &beta, cudnn_filter_desc, filter_grad_data));
}
// Release the cudnn workspace
paddle::memory::Free(gpu, cudnn_workspace);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(conv2d_transpose_cudnn,
ops::CudnnConvTransposeOpKernel<float>);
REGISTER_OP_GPU_KERNEL(conv2d_transpose_cudnn_grad,
ops::CudnnConvTransposeGradOpKernel<float>);
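The `workspace_size_MB` attribute is an `int` whose default is 4096, and 4096 × 1024 × 1024 = 2^32 does not fit in a 32-bit `int`, so the conversion to bytes has to be carried out in a wider type. The sketch below shows one way to do that. `WorkspaceLimitBytes` is a hypothetical helper, not a function from this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts a user-supplied size in MB to bytes.
// Non-positive values fall back to default_limit_bytes, mirroring the
// `if (user_workspace_size > 0)` guard in the kernels.
std::uint64_t WorkspaceLimitBytes(int workspace_size_mb,
                                  std::uint64_t default_limit_bytes) {
  if (workspace_size_mb <= 0) {
    return default_limit_bytes;
  }
  // Widen before multiplying so the product is computed in 64 bits.
  return static_cast<std::uint64_t>(workspace_size_mb) * 1024 * 1024;
}
```

With the default of 4096 MB the result is 4294967296 bytes, which exceeds `std::numeric_limits<int>::max()`.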
...@@ -12,7 +12,7 @@ ...@@ -12,7 +12,7 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/operators/conv2dtranspose_op.h" #include "paddle/operators/conv2d_transpose_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
...@@ -95,13 +95,13 @@ void Conv2DTransposeOpGrad::InferShape( ...@@ -95,13 +95,13 @@ void Conv2DTransposeOpGrad::InferShape(
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(conv2dtranspose, ops::Conv2DTransposeOp, REGISTER_OP(conv2d_transpose, ops::Conv2DTransposeOp,
ops::Conv2DTransposeOpMaker, conv2dtranspose_grad, ops::Conv2DTransposeOpMaker, conv2d_transpose_grad,
ops::Conv2DTransposeOpGrad); ops::Conv2DTransposeOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
conv2dtranspose, conv2d_transpose,
ops::GemmConv2DTransposeKernel<paddle::platform::CPUPlace, float>); ops::GemmConv2DTransposeKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
conv2dtranspose_grad, conv2d_transpose_grad,
ops::GemmConv2DTransposeGradKernel<paddle::platform::CPUPlace, float>); ops::GemmConv2DTransposeGradKernel<paddle::platform::CPUPlace, float>);
...@@ -12,13 +12,13 @@ ...@@ -12,13 +12,13 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/operators/conv2dtranspose_op.h" #include "paddle/operators/conv2d_transpose_op.h"
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(
conv2dtranspose, conv2d_transpose,
ops::GemmConv2DTransposeKernel<paddle::platform::GPUPlace, float>); ops::GemmConv2DTransposeKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(
conv2dtranspose_grad, conv2d_transpose_grad,
ops::GemmConv2DTransposeGradKernel<paddle::platform::GPUPlace, float>); ops::GemmConv2DTransposeGradKernel<paddle::platform::GPUPlace, float>);
...@@ -62,7 +62,7 @@ class GemmConv2DTransposeKernel : public framework::OpKernel<T> { ...@@ -62,7 +62,7 @@ class GemmConv2DTransposeKernel : public framework::OpKernel<T> {
std::vector<int> strides = context.Attr<std::vector<int>>("strides"); std::vector<int> strides = context.Attr<std::vector<int>>("strides");
// TODO(Zhuoyuan): Paddings can be added in future. // TODO(Zhuoyuan): Paddings can be added in future.
// groups will always be disabled in conv2dtranspose. // groups will always be disabled in conv2d_transpose.
const int batch_size = input->dims()[0]; const int batch_size = input->dims()[0];
const int m = input->dims()[1]; const int m = input->dims()[1];
......
...@@ -36,7 +36,12 @@ class FillConstantBatchSizeLikeOp : public framework::OperatorWithKernel { ...@@ -36,7 +36,12 @@ class FillConstantBatchSizeLikeOp : public framework::OperatorWithKernel {
[](int a) { return static_cast<int64_t>(a); }); [](int a) { return static_cast<int64_t>(a); });
auto dims = framework::make_ddim(shape_int64); auto dims = framework::make_ddim(shape_int64);
dims[0] = ctx->GetInputDim("Input")[0]; int dim_idx = ctx->Attrs().Get<int>("dim_idx");
PADDLE_ENFORCE_GE(dim_idx, 0);
PADDLE_ENFORCE_GT(static_cast<int>(shape.size()), dim_idx);
PADDLE_ENFORCE_GT(ctx->GetInputDim("Input").size(), dim_idx);
dims[dim_idx] = ctx->GetInputDim("Input")[dim_idx];
ctx->SetOutputDim("Out", dims); ctx->SetOutputDim("Out", dims);
} }
...@@ -57,15 +62,18 @@ class FillConstantBatchSizeLikeOpMaker ...@@ -57,15 +62,18 @@ class FillConstantBatchSizeLikeOpMaker
"(int, default 5 (FP32)) " "(int, default 5 (FP32)) "
"Output data type") "Output data type")
.SetDefault(framework::DataType::FP32); .SetDefault(framework::DataType::FP32);
AddAttr<std::vector<int>>("shape", "(vector<int>) The shape of the output");
AddAttr<float>("value", "(float, default 0) The value to be filled")
.SetDefault(0.0f);
AddInput("Input", AddInput("Input",
"(Tensor) Tensor " "(Tensor) Tensor "
"whose first dimension is used to specify the batch_size"); "whose dim_idx-th dimension is used to specify the batch_size");
AddOutput("Out", AddOutput("Out",
"(Tensor) Tensor of specified shape will be filled " "(Tensor) Tensor of specified shape will be filled "
"with the specified value"); "with the specified value");
AddAttr<std::vector<int>>("shape", "(vector<int>) The shape of the output");
AddAttr<int>("dim_idx",
"(int, default 0) the index of batch size dimension")
.SetDefault(0);
AddAttr<float>("value", "(float, default 0) The value to be filled")
.SetDefault(0.0f);
AddComment(R"DOC(Fill up a variable with specified constant value.)DOC"); AddComment(R"DOC(Fill up a variable with specified constant value.)DOC");
} }
}; };
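The shape inference change above can be illustrated outside the framework. The following is a minimal sketch, assuming plain `std::vector` shapes instead of `framework::DDim`; `BatchSizeLikeShape` is a hypothetical helper name, not a Paddle function, and `assert` stands in for the `PADDLE_ENFORCE_*` checks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Paddle): computes the output shape the
// updated InferShape would produce, using plain vectors instead of DDim.
std::vector<int64_t> BatchSizeLikeShape(const std::vector<int>& shape,
                                        const std::vector<int64_t>& input_dims,
                                        int dim_idx) {
  // Mirror the three range checks added in the diff.
  assert(dim_idx >= 0);
  assert(static_cast<int>(shape.size()) > dim_idx);
  assert(static_cast<int>(input_dims.size()) > dim_idx);

  // Start from the "shape" attribute, then overwrite only the batch dimension
  // with the matching dimension of the input.
  std::vector<int64_t> dims(shape.begin(), shape.end());
  dims[dim_idx] = input_dims[dim_idx];
  return dims;
}
```

With `dim_idx = 0` this reproduces the old behaviour (`dims[0]` taken from the input); any other value takes the batch size from that dimension instead.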
......
...@@ -90,11 +90,13 @@ class LookupTableGradKernel : public framework::OpKernel<T> { ...@@ -90,11 +90,13 @@ class LookupTableGradKernel : public framework::OpKernel<T> {
auto* d_output_data = d_output->data<T>(); auto* d_output_data = d_output->data<T>();
auto* d_table_data = d_table->mutable_data<T>(context.GetPlace()); auto* d_table_data = d_table->mutable_data<T>(context.GetPlace());
memset(d_table_data, 0, d_table->numel() * sizeof(T));
for (int64_t i = 0; i < ids->numel(); ++i) { for (int64_t i = 0; i < ids->numel(); ++i) {
PADDLE_ENFORCE_LT(ids_data[i], N); PADDLE_ENFORCE_LT(ids_data[i], N);
PADDLE_ENFORCE_GE(ids_data[i], 0); PADDLE_ENFORCE_GE(ids_data[i], 0);
for (int j = 0; j < D; ++j) { for (int j = 0; j < D; ++j) {
d_table_data[ids_data[i] * D + j] = d_output_data[i * D + j]; d_table_data[ids_data[i] * D + j] += d_output_data[i * D + j];
} }
} }
} }
......
...@@ -12,6 +12,10 @@ ...@@ -12,6 +12,10 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
/* Acknowledgement: the following code is strongly inspired by
https://github.com/caffe2/caffe2/blob/master/caffe2/operators/lstm_unit_op_gpu.cu
*/
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/cross_entropy_op.h" #include "paddle/operators/cross_entropy_op.h"
#include "paddle/platform/assert.h" #include "paddle/platform/assert.h"
......
...@@ -12,6 +12,10 @@ ...@@ -12,6 +12,10 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
/* Acknowledgement: the following code is strongly inspired by
https://github.com/caffe2/caffe2/blob/master/caffe2/operators/lstm_unit_op.h
*/
#pragma once #pragma once
#include "glog/logging.h" #include "glog/logging.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
......
...@@ -29,9 +29,14 @@ class MulOpShapeInference : public framework::InferShapeBase { ...@@ -29,9 +29,14 @@ class MulOpShapeInference : public framework::InferShapeBase {
auto x_dims = ctx->GetInputDim("X"); auto x_dims = ctx->GetInputDim("X");
auto y_dims = ctx->GetInputDim("Y"); auto y_dims = ctx->GetInputDim("Y");
int x_num_col_dims = ctx->Attrs().Get<int>("x_num_col_dims"); int x_num_col_dims = ctx->Attrs().Get<int>("x_num_col_dims");
int y_num_col_dims = ctx->Attrs().Get<int>("y_num_col_dims"); int y_num_col_dims = ctx->Attrs().Get<int>("y_num_col_dims");
VLOG(3) << "mul operator x.shape=" << x_dims << " y.shape=" << y_dims
<< " x_num_col_dims=" << x_num_col_dims
<< " y_num_col_dims=" << y_num_col_dims;
PADDLE_ENFORCE_GT( PADDLE_ENFORCE_GT(
x_dims.size(), x_num_col_dims, x_dims.size(), x_num_col_dims,
"The input tensor X's rank of MulOp should be larger than " "The input tensor X's rank of MulOp should be larger than "
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/precision_recall_op.h"
namespace paddle {
namespace operators {
class PrecisionRecallOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("MaxProbs"),
"Input(MaxProbs) should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Indices"),
"Input(Indices) should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Labels"),
"Input(Labels) should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("BatchMetrics"),
"Output(BatchMetrics) should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("AccumMetrics"),
"Output(AccumMetrics) should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("AccumStatesInfo"),
"Output(AccumStatesInfo) should not be null.");
int64_t cls_num =
static_cast<int64_t>(ctx->Attrs().Get<int>("class_number"));
auto max_probs_dims = ctx->GetInputDim("MaxProbs");
auto labels_dims = ctx->GetInputDim("Labels");
PADDLE_ENFORCE_EQ(max_probs_dims[1], 1,
"Each instance contains one max probability, so the "
"shape of Input(MaxProbs) should be [batch_size, 1].");
PADDLE_ENFORCE_EQ(ctx->GetInputDim("Indices"), max_probs_dims,
"The shape of Input(Indices) should be [batch_size, 1].");
PADDLE_ENFORCE_EQ(max_probs_dims[0], labels_dims[0],
"The 1st dimension of Input(MaxProbs) and "
"Input(Labels) both are batch_size and the shape should "
"be the same.");
PADDLE_ENFORCE_EQ(labels_dims[1], 1,
"The 2nd dimension of Input(Labels) contains instance "
"label and the shape should be equal to 1.");
if (ctx->HasInput("Weights")) {
auto weights_dims = ctx->GetInputDim("Weights");
PADDLE_ENFORCE_EQ(weights_dims,
framework::make_ddim({max_probs_dims[0], 1}),
"The shape of Input(Weights) should be "
"[batch_size, 1].");
}
if (ctx->HasInput("StatesInfo")) {
auto states_dims = ctx->GetInputDim("StatesInfo");
PADDLE_ENFORCE_EQ(states_dims, framework::make_ddim({cls_num, 4}),
"The shape of Input(StatesInfo) should be "
"[class_number, 4].");
}
// Layouts of BatchMetrics and AccumMetrics both are:
// [
// macro average precision, macro average recall, macro average F1 score,
// micro average precision, micro average recall, micro average F1 score
// ]
ctx->SetOutputDim("BatchMetrics", {6});
ctx->SetOutputDim("AccumMetrics", {6});
// Shape of AccumStatesInfo is [class_number, 4]
// The layout of each row is:
// [ TP, FP, TN, FN ]
ctx->SetOutputDim("AccumStatesInfo", {cls_num, 4});
}
protected:
framework::DataType IndicateDataType(
const framework::ExecutionContext &ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("MaxProbs")->type());
}
};
class PrecisionRecallOpMaker : public framework::OpProtoAndCheckerMaker {
public:
PrecisionRecallOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("MaxProbs",
"(Tensor, default Tensor<float>), a 2-D tensor with shape N x 1, "
"where N is the batch size. Each row contains the max probability "
"of an instance, which is computed by the previous top_k (k=1) "
"operator.");
AddInput("Indices",
"(Tensor, default Tensor<int>), a 2-D tensor with shape N x 1, "
"where N is the batch size. Each row contains the corresponding "
"index, which is computed by the previous top_k (k=1) operator.");
AddInput("Labels",
"(Tensor, default Tensor<int>), a 2-D tensor with shape N x 1, "
"where N is the batch size. Each element is a label and the "
"value should be in [0, class_number - 1].");
AddInput("Weights",
"(Tensor, default Tensor<float>), a 2-D tensor with shape N x 1, "
"where N is the batch size. This input is optional. If provided, "
"the weight of each instance is considered when computing metrics.")
.AsDispensable();
AddInput("StatesInfo",
"(Tensor, default Tensor<int>), a 2-D tensor with shape D x 4, "
"where D is the number of classes. This input is optional. If "
"provided, the current state will be accumulated into this state and "
"the accumulated state will be the output state.")
.AsDispensable();
AddOutput("BatchMetrics",
"(Tensor, default Tensor<float>), a 1-D tensor with shape {6}. "
"This output tensor contains metrics for current batch data. "
"The layout is [macro average precision, macro average recall, "
"macro f1 score, micro average precision, micro average recall, "
"micro f1 score]");
AddOutput("AccumMetrics",
"(Tensor, default Tensor<float>), a 1-D tensor with shape {6}. "
"This output tensor contains metrics for accumulated data. "
"The layout is [macro average precision, macro average recall, "
"macro f1 score, micro average precision, micro average recall, "
"micro f1 score]");
AddOutput("AccumStatesInfo",
"(Tensor, default Tensor<float>), a 2-D tensor with shape D x 4, "
"where D is equal to class number. This output tensor contains "
"accumulated state variables used to compute metrics. The layout "
"for each class is [true positives, false positives, "
"true negatives, false negatives].");
AddAttr<int>("class_number", "Number of classes to be evaluated.");
AddComment(R"DOC(
When given 'Input(Indices)' and 'Input(Labels)', this operator can be used
to compute various metrics including:
- macro average precision
- macro average recall
- macro f1 score
- micro average precision
- micro average recall
- micro f1 score
To compute the above metrics, we need to count true positives,
false positives and false negatives. The count of true negatives is not
needed for these metrics, but it may be useful elsewhere and is cheap to
compute, so the operator also provides it.
We define state as a 2-D tensor with shape [class_number, 4]. Each row of a
state contains the statistics for the corresponding class. The layout of each
row is: TP(true positives), FP(false positives), TN(true negatives),
FN(false negatives). If 'Input(Weights)' is provided, TP, FP, TN and FN will be
calculated from the given weights instead of instance counts.
This operator also supports computing metrics across batches. To
achieve this, 'Input(StatesInfo)' should be provided. The state of the current
batch will be accumulated into 'Input(StatesInfo)', and
'Output(AccumStatesInfo)' is the accumulated state.
'Output(BatchMetrics)' holds the metrics of the current batch, while
'Output(AccumMetrics)' holds the metrics of the accumulated data.
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(precision_recall, ops::PrecisionRecallOp,
ops::PrecisionRecallOpMaker);
REGISTER_OP_CPU_KERNEL(
precision_recall,
ops::PrecisionRecallKernel<paddle::platform::CPUPlace, float>,
ops::PrecisionRecallKernel<paddle::platform::CPUPlace, double>);
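The per-class state described in the operator comment can be sketched without the framework. This is a minimal, unweighted illustration, assuming one predicted index and one label per instance; `CountStates` and `ClassStates` are hypothetical names, and the per-class loop is a direct restatement of the definition, not the add-then-subtract trick the kernel uses for TN.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

enum StateVariable { TP = 0, FP, TN, FN };  // same column order as the operator

using ClassStates = std::vector<std::array<double, 4>>;

// Hypothetical helper: builds the [class_number, 4] state for one batch.
ClassStates CountStates(const std::vector<int>& predicted,
                        const std::vector<int>& labels,
                        std::size_t class_number) {
  assert(predicted.size() == labels.size());
  ClassStates states(class_number, std::array<double, 4>{{0, 0, 0, 0}});
  for (std::size_t i = 0; i < predicted.size(); ++i) {
    assert(predicted[i] >= 0 && labels[i] >= 0);
    const std::size_t idx = static_cast<std::size_t>(predicted[i]);
    const std::size_t label = static_cast<std::size_t>(labels[i]);
    assert(idx < class_number && label < class_number);
    // Every instance contributes exactly one count to every class.
    for (std::size_t c = 0; c < class_number; ++c) {
      if (c == idx && c == label) {
        states[c][TP] += 1.0;  // predicted c, and it is c
      } else if (c == idx) {
        states[c][FP] += 1.0;  // predicted c, but it is something else
      } else if (c == label) {
        states[c][FN] += 1.0;  // it is c, but something else was predicted
      } else {
        states[c][TN] += 1.0;  // c is not involved at all
      }
    }
  }
  return states;
}
```

A weighted variant would add the instance weight instead of `1.0`, which is what the operator does when 'Input(Weights)' is supplied.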
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
enum StateVariable { TP = 0, FP, TN, FN };
template <typename Place, typename T>
class PrecisionRecallKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* in0 = ctx.Input<Tensor>("Indices");
auto* in1 = ctx.Input<Tensor>("Labels");
auto* in2 = ctx.Input<Tensor>("Weights");
auto* in3 = ctx.Input<Tensor>("StatesInfo");
auto* out0 = ctx.Output<Tensor>("BatchMetrics");
auto* out1 = ctx.Output<Tensor>("AccumMetrics");
auto* out2 = ctx.Output<Tensor>("AccumStatesInfo");
const int* ids_data = in0->data<int>();
const int* labels_data = in1->data<int>();
size_t cls_num = static_cast<size_t>(ctx.Attr<int>("class_number"));
const T* weights_data = in2 ? in2->data<T>() : nullptr;
const T* states_data = in3 ? in3->data<T>() : nullptr;
double* batch_metrics_data = out0->mutable_data<double>(ctx.GetPlace());
double* accum_metrics_data = out1->mutable_data<double>(ctx.GetPlace());
out2->mutable_data<T>(ctx.GetPlace());
auto accum_states = EigenMatrix<T>::From(*out2);
accum_states.setZero();
T* accum_states_data = out2->data<T>();
size_t sample_num = in0->dims()[0];
size_t state_var_num = 4; // TP FP TN FN
// get states info for current batch
for (size_t i = 0; i < sample_num; ++i) {
size_t idx = ids_data[i];
size_t label = labels_data[i];
PADDLE_ENFORCE(idx >= 0 && idx < cls_num,
"Class index of each instance should be in "
"[0, class_number).");
PADDLE_ENFORCE(label >= 0 && label < cls_num,
"Label of each instance should be in [0, class_number).");
T w = weights_data ? weights_data[i] : 1.0;
if (idx == label) {
accum_states_data[idx * state_var_num + TP] += w;
for (size_t j = 0; j < cls_num; ++j) {
accum_states_data[j * state_var_num + TN] += w;
}
accum_states_data[idx * state_var_num + TN] -= w;
} else {
accum_states_data[label * state_var_num + FN] += w;
accum_states_data[idx * state_var_num + FP] += w;
for (size_t j = 0; j < cls_num; ++j) {
accum_states_data[j * state_var_num + TN] += w;
}
accum_states_data[idx * state_var_num + TN] -= w;
accum_states_data[label * state_var_num + TN] -= w;
}
}
ComputeMetrics(accum_states_data, batch_metrics_data, state_var_num,
cls_num);
if (states_data) {
for (size_t i = 0; i < cls_num; ++i) {
for (size_t j = 0; j < state_var_num; ++j) {
size_t idx = i * state_var_num + j;
accum_states_data[idx] += states_data[idx];
}
}
}
ComputeMetrics(accum_states_data, accum_metrics_data, state_var_num,
cls_num);
}
// Exposed as public static helpers so they can be reused.
static inline T CalcPrecision(T tp_count, T fp_count) {
if (tp_count > 0.0 || fp_count > 0.0) {
return tp_count / (tp_count + fp_count);
}
return 1.0;
}
static inline T CalcRecall(T tp_count, T fn_count) {
if (tp_count > 0.0 || fn_count > 0.0) {
return tp_count / (tp_count + fn_count);
}
return 1.0;
}
static inline T CalcF1Score(T precision, T recall) {
if (precision > 0.0 || recall > 0.0) {
return 2 * precision * recall / (precision + recall);
}
return 0.0;
}
protected:
void ComputeMetrics(const T* states_data, double* metrics_data,
size_t state_var_num, size_t cls_num) const {
T total_tp_count = 0;
T total_fp_count = 0;
T total_fn_count = 0;
T macro_avg_precision = 0.0;
T macro_avg_recall = 0.0;
for (size_t i = 0; i < cls_num; ++i) {
T tp_count = states_data[i * state_var_num + TP];
T fp_count = states_data[i * state_var_num + FP];
T fn_count = states_data[i * state_var_num + FN];
total_tp_count += tp_count;
total_fp_count += fp_count;
total_fn_count += fn_count;
macro_avg_precision += CalcPrecision(tp_count, fp_count);
macro_avg_recall += CalcRecall(tp_count, fn_count);
}
macro_avg_precision /= cls_num;
macro_avg_recall /= cls_num;
T macro_f1_score = CalcF1Score(macro_avg_precision, macro_avg_recall);
T micro_avg_precision = CalcPrecision(total_tp_count, total_fp_count);
T micro_avg_recall = CalcRecall(total_tp_count, total_fn_count);
T micro_f1_score = CalcF1Score(micro_avg_precision, micro_avg_recall);
// fill metrics data
metrics_data[0] = macro_avg_precision;
metrics_data[1] = macro_avg_recall;
metrics_data[2] = macro_f1_score;
metrics_data[3] = micro_avg_precision;
metrics_data[4] = micro_avg_recall;
metrics_data[5] = micro_f1_score;
}
};
} // namespace operators
} // namespace paddle
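The difference between the macro and micro averages computed by `ComputeMetrics` can be shown with a small standalone sketch. It assumes `double` throughout and rows laid out as [TP, FP, TN, FN]; the function names `Precision`, `Recall`, `F1` and `Metrics` are illustrative, while the fallback values (1 for undefined precision or recall, 0 for undefined F1) follow the kernel above.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Fallbacks follow the kernel: precision and recall default to 1 when the
// denominator is zero; F1 defaults to 0.
double Precision(double tp, double fp) {
  return (tp > 0.0 || fp > 0.0) ? tp / (tp + fp) : 1.0;
}
double Recall(double tp, double fn) {
  return (tp > 0.0 || fn > 0.0) ? tp / (tp + fn) : 1.0;
}
double F1(double p, double r) {
  return (p > 0.0 || r > 0.0) ? 2.0 * p * r / (p + r) : 0.0;
}

// Each row of states is [TP, FP, TN, FN] for one class. Returns
// [macro P, macro R, macro F1, micro P, micro R, micro F1].
std::array<double, 6> Metrics(
    const std::vector<std::array<double, 4>>& states) {
  assert(!states.empty());
  double tp = 0.0, fp = 0.0, fn = 0.0;
  double macro_p = 0.0, macro_r = 0.0;
  for (const auto& s : states) {
    tp += s[0];
    fp += s[1];
    fn += s[3];
    // Macro: average the per-class ratios.
    macro_p += Precision(s[0], s[1]);
    macro_r += Recall(s[0], s[3]);
  }
  macro_p /= static_cast<double>(states.size());
  macro_r /= static_cast<double>(states.size());
  // Micro: take the ratio of the pooled counts.
  const double micro_p = Precision(tp, fp);
  const double micro_r = Recall(tp, fn);
  return {{macro_p, macro_r, F1(macro_p, macro_r),
           micro_p, micro_r, F1(micro_p, micro_r)}};
}
```

Note that a class with no predictions and no labels contributes a precision and recall of 1 to the macro averages, so macro values can exceed the micro values when some classes are absent from the batch.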
...@@ -12,181 +12,618 @@ ...@@ -12,181 +12,618 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/operators/recurrent_op.h" #include <vector>
#include "paddle/framework/executor.h"
#include <cstring>
#include <sstream>
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/net_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
constexpr char kInputs[] = "inputs";
constexpr char kInitialStates[] = "initial_states";
constexpr char kParameters[] = "parameters";
constexpr char kOutputs[] = "outputs";
constexpr char kStepScopes[] = "step_scopes";
constexpr char kExStates[] = "ex_states";
constexpr char kStates[] = "states";
constexpr char kStepBlock[] = "step_block";
constexpr char kReverse[] = "reverse";
constexpr char kIsTrain[] = "is_train";
#define GRAD_SUFFIX "@GRAD"
constexpr char kInputGrads[] = "inputs" GRAD_SUFFIX;
constexpr char kOutputGrads[] = "outputs" GRAD_SUFFIX;
constexpr char kParamGrads[] = "parameters" GRAD_SUFFIX;
constexpr char kInitStateGrads[] = "initial_states" GRAD_SUFFIX;
using Scope = framework::Scope; using StepScopeVar = std::vector<framework::Scope *>;
using Variable = framework::Variable;
using Tensor = framework::Tensor; // StepScopes manages scopes inside RNN.
using LoDTensor = framework::LoDTensor; // StepScopes::CurScope() gets the current scope.
// StepScopes::ExScope() gets the ex-scope, i.e. the scope of the previous time step.
void RecurrentAlgorithm::Run(const Scope& scope, // StepScopes::Next() moves to the next time step.
const platform::DeviceContext& dev_ctx) const { //
auto* input0 = scope.FindVar(arg_->inlinks[0]); // if is_train = False, then
PADDLE_ENFORCE_NOT_NULL(input0); // there are two scopes for the RNN and only forward is supported.
size_t seq_len = input0->GetMutable<LoDTensor>()->dims()[0]; // else
PADDLE_ENFORCE_GT(seq_len, 0); // the len(scopes) == seq_len
//
CreateScopes(scope, seq_len); // if is_backward = True, then
auto& step_scopes = GetStepScopes(scope); // reversely access scopes
rnn::SegmentInputs(step_scopes, arg_->inlinks, seq_len); // else
InitMemories(step_scopes[0]); // access scopes from begin to end.
class StepScopes {
for (size_t step_id = 0; step_id < seq_len; step_id++) { public:
if (step_id > 0) { StepScopes(const framework::Scope &parent, StepScopeVar *scopes,
rnn::LinkMemories(step_scopes, arg_->states, step_id, -1); bool is_train, size_t seq_len, bool is_backward = false)
: counter_(is_backward ? seq_len - 1 : 0UL),
scopes_(scopes),
is_train_(is_train),
is_backward_(is_backward) {
size_t num_step_scopes = is_train ? seq_len : 2;
PADDLE_ENFORCE(is_train || !is_backward,
"Cannot run backward when not training");
if (!is_backward_) {
PADDLE_ENFORCE(scopes->empty());
scopes->reserve(static_cast<size_t>(num_step_scopes));
for (size_t i = 0; i < num_step_scopes; ++i) {
scopes->emplace_back(&parent.NewScope());
}
} }
(*stepnet_)->Run(*step_scopes[step_id], dev_ctx); }
}
rnn::ConcatOutputs(step_scopes, arg_->outlinks, seq_len, dev_ctx); framework::Scope &CurScope() { return GetScope(counter_); }
}
framework::Scope &ExScope() {
void RecurrentAlgorithm::CreateScopes(const Scope& scope, auto &scope = GetScope(is_backward_ ? counter_ + 1 : counter_ - 1);
size_t seq_len) const { return scope;
// TODO(superjom) Only two scopes are needed for inference, this case will be }
// supported later.
auto* step_scopes_var = scope.FindVar(arg_->step_scopes); void Next() {
PADDLE_ENFORCE(step_scopes_var != nullptr, ""); if (is_backward_) {
auto* step_scopes = step_scopes_var->GetMutable<std::vector<Scope*>>(); --counter_;
} else {
// Now all variables in scope must be created outside of op. ++counter_;
PADDLE_ENFORCE_NOT_NULL(stepnet_); }
PADDLE_ENFORCE(!(*stepnet_)->Outputs().empty(), }
"step_unit_ op has no outputs");
private:
if (seq_len > step_scopes->size()) { framework::Scope &GetScope(size_t scope_id) const {
for (size_t i = step_scopes->size(); i < seq_len; ++i) { if (!is_train_) {
auto& step_scope = scope.NewScope(); scope_id %= 2;
}
// create step net's temp inputs PADDLE_ENFORCE_LT(scope_id, scopes_->size());
for (auto& input : (*stepnet_)->Inputs()) { return *(*scopes_)[scope_id];
// the weight are located in parent scope }
for (auto& var_name : input.second) {
if (!step_scope.FindVar(var_name)) { size_t counter_;
step_scope.Var(var_name)->GetMutable<LoDTensor>(); StepScopeVar *scopes_;
} bool is_train_;
bool is_backward_;
};
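The index arithmetic behind `StepScopes` can be followed in isolation. The sketch below is a hypothetical model that tracks only scope indices, not real `framework::Scope` objects; `StepIndexWalker` and its method names are illustrative, and `assert` stands in for `PADDLE_ENFORCE`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of StepScopes that tracks scope indices only.
class StepIndexWalker {
 public:
  StepIndexWalker(bool is_train, std::size_t seq_len, bool is_backward)
      : counter_(is_backward ? seq_len - 1 : 0),
        num_scopes_(is_train ? seq_len : 2),
        is_train_(is_train),
        is_backward_(is_backward) {
    // Backward is only possible when every step scope was kept.
    assert(is_train || !is_backward);
  }

  // Index of the scope for the current time step.
  std::size_t Cur() const { return Wrap(counter_); }

  // Index of the scope visited in the previous iteration. Only meaningful
  // after at least one call to Next().
  std::size_t Ex() const {
    return Wrap(is_backward_ ? counter_ + 1 : counter_ - 1);
  }

  void Next() {
    if (is_backward_) {
      --counter_;
    } else {
      ++counter_;
    }
  }

 private:
  std::size_t Wrap(std::size_t id) const {
    if (!is_train_) {
      id %= 2;  // inference ping-pongs between two scopes
    }
    assert(id < num_scopes_);
    return id;
  }

  std::size_t counter_;
  std::size_t num_scopes_;
  bool is_train_;
  bool is_backward_;
};
```

In inference mode the walker alternates between indices 0 and 1 regardless of sequence length, which is why only two scopes are allocated; in training mode each step gets its own index so the backward pass can revisit them in reverse.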
// Base class for RecurrentOp/RecurrentGradOp
// Some common protected functions for RecurrentOp/RecurrentGradOp
class RecurrentBase : public framework::OperatorBase {
public:
RecurrentBase(const std::string &type,
const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: OperatorBase(type, inputs, outputs, attrs) {}
protected:
// Get the sequence length from the scope.
// The sequence length is read from the input tensors. Each input tensor's
// dimensions should be [SEQ_LEN, ..., ...]. The first dimension is SEQ_LEN;
// the second dimension could be the batch size or a nested sequence
// length.
int64_t GetSequenceLength(const framework::Scope &scope) const {
// Dim format SEQ_LEN, BATCH_SIZE, ...
int64_t seq_len = -1;
auto &all_inputs = Inputs(kInputs);
PADDLE_ENFORCE(!all_inputs.empty());
for (auto &iname : all_inputs) {
auto *var = scope.FindVar(iname);
PADDLE_ENFORCE(var != nullptr);
PADDLE_ENFORCE(var->IsType<framework::LoDTensor>());
auto &dim = var->Get<framework::LoDTensor>().dims();
if (seq_len == -1) {
seq_len = dim[0];
} else {
PADDLE_ENFORCE_EQ(seq_len, dim[0]);
}
}
return seq_len;
}
// for src_tensor, dst_tensor in zip(map(src_scope.FindVar, src_vars),
// map(dst_scope.Var, dst_vars)):
// dst_tensor.ShareDataWith(src_tensor)
static void LinkTensor(const framework::Scope &src_scope,
const std::vector<std::string> &src_vars,
framework::Scope *dst_scope,
const std::vector<std::string> &dst_vars) {
LinkTensorWithCallback(
src_scope, src_vars, dst_scope, dst_vars,
[&](const framework::Tensor &src, framework::Tensor *dst) {
dst->ShareDataWith(src);
});
}
// for src_tensor, dst_tensor in zip(map(src_scope.FindVar, src_vars),
// map(dst_scope.Var, dst_vars)):
// callback(src_tensor, &dst_tensor)
template <typename Callback>
static void LinkTensorWithCallback(const framework::Scope &src_scope,
const std::vector<std::string> &src_vars,
framework::Scope *dst_scope,
const std::vector<std::string> &dst_vars,
Callback callback) {
PADDLE_ENFORCE_EQ(src_vars.size(), dst_vars.size());
for (size_t i = 0; i < dst_vars.size(); ++i) {
VLOG(10) << "Link " << src_vars[i] << " to " << dst_vars[i];
AccessTensor(src_scope, src_vars[i], dst_scope, dst_vars[i], callback);
}
}
// for src_tensor, dst_tensor in zip(map(src_scope.FindVar, src_vars),
// map(dst_scope.FindVar, dst_vars)):
// callback(src_tensor, &dst_tensor)
template <typename Callback>
static void LinkTensorWithCallback(const framework::Scope &src_scope,
const std::vector<std::string> &src_vars,
const framework::Scope &dst_scope,
const std::vector<std::string> &dst_vars,
Callback callback) {
PADDLE_ENFORCE_EQ(src_vars.size(), dst_vars.size());
for (size_t i = 0; i < dst_vars.size(); ++i) {
VLOG(10) << "Link " << src_vars[i] << " to " << dst_vars[i];
AccessTensor(src_scope, src_vars[i], dst_scope, dst_vars[i], callback);
}
}
// (seq_len, shape) -> return [seq_len] + list(shape)
static framework::DDim PrependDims(size_t seq_len,
const framework::DDim &src) {
auto dims = framework::vectorize(src);
dims.insert(dims.begin(), static_cast<int64_t>(seq_len));
return framework::make_ddim(dims);
}
private:
template <typename Callback>
static void AccessTensor(const framework::Scope &src_scope,
const std::string &src_var_name,
framework::Scope *dst_scope,
const std::string &dst_var_name, Callback callback) {
auto *src_var = src_scope.FindVar(src_var_name);
PADDLE_ENFORCE(src_var != nullptr);
auto &src_tensor = src_var->Get<framework::LoDTensor>();
auto *dst_var = dst_scope->Var(dst_var_name);
auto *dst_tensor = dst_var->GetMutable<framework::LoDTensor>();
callback(src_tensor, dst_tensor);
}
template <typename Callback>
static void AccessTensor(const framework::Scope &src_scope,
const std::string &src_var_name,
const framework::Scope &dst_scope,
const std::string &dst_var_name, Callback callback) {
auto *src_var = src_scope.FindVar(src_var_name);
PADDLE_ENFORCE(src_var != nullptr);
auto &src_tensor = src_var->Get<framework::LoDTensor>();
auto *dst_var = dst_scope.FindVar(dst_var_name);
PADDLE_ENFORCE(dst_var != nullptr);
auto *dst_tensor = dst_var->GetMutable<framework::LoDTensor>();
callback(src_tensor, dst_tensor);
}
};
class RecurrentOp : public RecurrentBase {
public:
RecurrentOp(const std::string &type, const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: RecurrentBase(type, inputs, outputs, attrs) {}
void Run(const framework::Scope &scope,
const platform::DeviceContext &dev_ctx) const override {
auto seq_len = static_cast<size_t>(this->GetSequenceLength(scope));
VLOG(3) << "Static RNN input sequence length = " << seq_len;
StepScopes scopes = CreateStepScopes(scope, seq_len);
auto reverse = Attr<bool>(kReverse);
framework::Executor executor(dev_ctx);
auto *block = Attr<framework::BlockDescBind *>(kStepBlock);
auto *program = block->Program();
for (size_t i = 0; i < seq_len; ++i) {
size_t seq_offset = reverse ? seq_len - i - 1 : i;
VLOG(3) << "Recurrent operate at the time step " << seq_offset;
auto &cur_scope = scopes.CurScope();
// Link outside::input --> inside::input
// inside::input = outside::input[seq_offset: seq_offset+1]
LinkTensorWithCallback(
scope, Inputs(kInputs), &cur_scope, Inputs(kInputs),
[&seq_offset](const framework::Tensor &outside,
framework::Tensor *inside) {
inside->ShareDataWith(outside.Slice(seq_offset, seq_offset + 1));
auto dims = framework::vectorize(inside->dims());
dims.erase(dims.begin());
inside->Resize(framework::make_ddim(dims));
});
if (i == 0) {
// Link initial states --> ex_states
LinkTensor(scope, Inputs(kInitialStates), &cur_scope,
Attr<std::vector<std::string>>(kExStates));
} else {
auto &ex_scope = scopes.ExScope();
// Link ex_scope::state --> cur_scope::ex_state
LinkTensor(ex_scope, Attr<std::vector<std::string>>(kStates),
&cur_scope, Attr<std::vector<std::string>>(kExStates));
}
// All inputs are linked now; execute!
executor.Run(*program, &cur_scope, block->ID(),
false /*create_local_scope*/);
// Copy inside::output -> outside::output
// outside::output[seq_offset: seq_offset + 1] = inside::output
this->LinkTensorWithCallback(
cur_scope, Outputs(kOutputs), scope, Outputs(kOutputs),
[&](const framework::LoDTensor &src_tensor,
framework::LoDTensor *dst_tensor) {
if (i == 0) { // create the output tensor at the first step
dst_tensor->Resize(PrependDims(seq_len, src_tensor.dims()));
dst_tensor->mutable_data(dev_ctx.GetPlace(), src_tensor.type());
}
auto dst_out = dst_tensor->Slice(seq_offset, seq_offset + 1);
// Explicitly copy the output, since the local RNN scope can be
// destroyed early.
dst_out.CopyFrom(src_tensor, dev_ctx.GetPlace(), dev_ctx);
});
scopes.Next();
}
}
private:
StepScopes CreateStepScopes(const framework::Scope &scope,
size_t seq_len) const {
auto *var = scope.FindVar(Output(kStepScopes));
PADDLE_ENFORCE(var != nullptr);
return StepScopes(scope, var->GetMutable<StepScopeVar>(),
Attr<bool>(kIsTrain), seq_len);
}
};
class RecurrentGradOp : public RecurrentBase {
public:
RecurrentGradOp(const std::string &type,
const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: RecurrentBase(type, inputs, outputs, attrs) {}
void Run(const framework::Scope &scope,
const platform::DeviceContext &dev_ctx) const override {
auto seq_len = static_cast<size_t>(GetSequenceLength(scope));
StepScopes scopes = CreateStepScopes(scope, seq_len);
auto reverse = Attr<bool>(kReverse);
framework::Executor executor(dev_ctx);
auto *block = Attr<framework::BlockDescBind *>(kStepBlock);
auto *program = block->Program();
for (size_t step_id = 0; step_id < seq_len; ++step_id) {
size_t seq_offset = reverse ? step_id : seq_len - step_id - 1;
VLOG(3) << "Recurrent backward operate at the time step " << seq_offset;
auto &cur_scope = scopes.CurScope();
// Link outside::output_grads --> inside::output_grads
// inside::output_grad = outside::output_grad[seq_offset:seq_offset+1]
LinkTensorWithCallback(
scope, Inputs(kOutputGrads), &cur_scope, Inputs(kOutputGrads),
[&](const framework::Tensor &outside, framework::Tensor *inside) {
inside->ShareDataWith(outside.Slice(seq_offset, seq_offset + 1));
auto dims = framework::vectorize(inside->dims());
dims.erase(dims.begin());
inside->Resize(framework::make_ddim(dims));
});
auto og_set = List2Set(Inputs(kOutputGrads));
if (VLOG_IS_ON(10)) {
std::ostringstream sout;
std::copy(og_set.begin(), og_set.end(),
std::ostream_iterator<std::string>(sout, ","));
VLOG(10) << " RNN output gradients = [" << sout.str() << "]";
}
// Link states
// if cur_scope::cur_state_grad in out_grads:
// cur_scope::cur_state_grad += ex_scope::ex_state_grad
// else:
// ex_scope::ex_state_grad --> cur_scope::cur_state_grad
if (step_id != 0) { // not at beginning
auto &ex_scope = scopes.ExScope();
auto ex_state_grads =
GradVarLists(Attr<std::vector<std::string>>(kExStates));
auto cur_state_grads =
GradVarLists(Attr<std::vector<std::string>>(kStates));
PADDLE_ENFORCE_EQ(ex_state_grads.size(), cur_state_grads.size());
for (size_t i = 0; i < ex_state_grads.size(); ++i) {
auto &cur_grad = cur_state_grads[i];
auto &ex_grad = ex_state_grads[i];
auto &ex_tensor =
ex_scope.FindVar(ex_grad)->Get<framework::LoDTensor>();
VLOG(10) << " RNN link " << cur_grad << " from " << ex_grad;
auto *cur_grad_var = cur_scope.Var(cur_grad);
auto cur_grad_tensor =
cur_grad_var->GetMutable<framework::LoDTensor>();
cur_grad_tensor->CopyFrom(ex_tensor, dev_ctx.GetPlace(), dev_ctx);
} }
} }
// create stepnet's outputs
for (const auto& output : (*stepnet_)->Outputs()) { VLOG(5) << "Recurrent memory linking finished ";
for (auto& var_name : output.second) { // Run step block with cur_scope
step_scope.Var(var_name); executor.Run(*program, &cur_scope, block->ID(),
false /*create_local_scope*/);
VLOG(5) << "executor.Run finished ";
auto local_var_names = LocalVarNames(cur_scope);
// Accumulate params
// if (step == 0):
// outside::param_grad = 0.0
// outside::param_grad += inside::param_grad
{
auto &pg_names = Outputs(kParamGrads);
auto &p_names = Inputs(kParameters);
PADDLE_ENFORCE_EQ(pg_names.size(), p_names.size());
for (size_t prog_id = 0; prog_id < pg_names.size(); ++prog_id) {
auto inside_grad_name = framework::GradVarName(p_names[prog_id]);
// If the gradient of that variable is not computed inside the RNN,
// just continue.
if (local_var_names.find(inside_grad_name) == local_var_names.end()) {
continue;
}
// zero gradient variable in step 0
if (step_id == 0) {
auto &inside_tensor = cur_scope.FindVar(inside_grad_name)
->Get<framework::LoDTensor>();
framework::AttributeMap attrs;
attrs["data_type"] = framework::ToDataType(inside_tensor.type());
attrs["shape"] = framework::vectorize2int(inside_tensor.dims());
attrs["value"] = 0.0f;
auto zero_op = framework::OpRegistry::CreateOp(
"fill_constant", {}, {{"Out", {pg_names[prog_id]}}}, attrs);
zero_op->Run(scope, dev_ctx);
}
// sum gradient
auto *outside_var = scope.FindVar(pg_names[prog_id]);
PADDLE_ENFORCE(outside_var != nullptr);
auto &outside_tensor =
*outside_var->GetMutable<framework::LoDTensor>();
std::string result_var_name;
auto *local_result_var = cur_scope.Var(&result_var_name);
auto &local_result_tensor =
*local_result_var->GetMutable<framework::LoDTensor>();
local_result_tensor.ShareDataWith(outside_tensor);
auto sum_op = framework::OpRegistry::CreateOp(
"sum", {{"X", {result_var_name, inside_grad_name}}},
{{"Out", {result_var_name}}}, {});
sum_op->Run(cur_scope, dev_ctx);
} }
} }
step_scopes->emplace_back(&step_scope); VLOG(5) << "Accumulate Parameter finished ";
// Copy input gradient from inside to outside
// outside::input_grad[seq_offset: seq_offset + 1] = inside::input_grad
LinkTensorWithCallback(
cur_scope, GradVarLists(Inputs(kInputs)), scope, Outputs(kInputGrads),
[&](const framework::LoDTensor &inside,
framework::LoDTensor *outside) {
if (inside.memory_size() == 0) { // IG is not created.
return;
}
if (step_id == 0) { // alloc memory
outside->Resize(PrependDims(seq_len, inside.dims()));
outside->mutable_data(dev_ctx.GetPlace(), inside.type());
}
auto dst = outside->Slice(seq_offset, seq_offset + 1);
dst.CopyFrom(inside, dev_ctx.GetPlace(), dev_ctx);
});
VLOG(5) << "Link outside gradient finished ";
if (step_id + 1 == seq_len) { // at_end
// copy the initial states' gradients from inside to outside
LinkTensorWithCallback(
cur_scope, GradVarLists(Attr<std::vector<std::string>>(kExStates)),
scope, Outputs(kInitStateGrads),
[&](const framework::LoDTensor &inside,
framework::LoDTensor *outside) {
outside->Resize(inside.dims());
outside->mutable_data(dev_ctx.GetPlace(), inside.type());
outside->CopyFrom(inside, dev_ctx.GetPlace(), dev_ctx);
});
VLOG(5) << "Link initialize state gradient finished ";
}
scopes.Next();
} }
} }
}
private:
void RecurrentAlgorithm::InitMemories(Scope* step_scope) const { StepScopes CreateStepScopes(const framework::Scope &scope,
for (auto& attr : arg_->states) { size_t seq_len) const {
auto* pre_mem = step_scope->Var(attr.pre_var)->GetMutable<LoDTensor>(); auto *var = scope.FindVar(Input(kStepScopes));
PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr, PADDLE_ENFORCE(var != nullptr);
"memory [%s]'s boot variable [%s] not exists", attr.var, return StepScopes(scope, var->GetMutable<StepScopeVar>(),
attr.boot_var); Attr<bool>(kIsTrain), seq_len, true /*is_backward*/);
auto* boot_mem = }
step_scope->FindVar(attr.boot_var)->GetMutable<LoDTensor>();
pre_mem->Resize(boot_mem->dims()); std::unordered_set<std::string> List2Set(
PADDLE_ENFORCE_EQ(pre_mem->dims().size(), 2); const std::vector<std::string> &list) const {
pre_mem->ShareDataWith(*boot_mem); std::unordered_set<std::string> local_var_name_set;
} local_var_name_set.reserve(list.size());
} for (auto &each : list) {
local_var_name_set.insert(each);
const rnn::ArgumentName RecurrentOp::kArgName{ }
"step_net", "step_scopes", "inputs", "outputs", return local_var_name_set;
"states", "ex_states", "initial_states"}; }
const rnn::ArgumentName RecurrentGradientOp::kArgName{ std::unordered_set<std::string> LocalVarNames(
"step_net", "step_scopes@GRAD", "outputs@GRAD", "inputs@GRAD", const framework::Scope &scope) const {
"states", "ex_states", "initial_states@GRAD"}; return this->List2Set(scope.GetAllNames(false));
}
RecurrentOp::RecurrentOp(const std::string& type, static std::vector<std::string> GradVarLists(
const framework::VariableNameMap& inputs, const std::vector<std::string> &var_names) {
const framework::VariableNameMap& outputs, std::vector<std::string> retv;
const framework::AttributeMap& attrs) retv.reserve(var_names.size());
: OperatorBase(type, inputs, outputs, attrs) { std::transform(var_names.begin(), var_names.end(), std::back_inserter(retv),
rnn::InitArgument(kArgName, &arg_, *this); framework::GradVarName);
alg_.Init(&arg_, &stepnet_); return retv;
} }
};
class RecurrentAlgorithmProtoAndCheckerMaker
: public framework::OpProtoAndCheckerMaker { class RecurrentOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RecurrentAlgorithmProtoAndCheckerMaker(framework::OpProto* proto, RecurrentOpProtoMaker(framework::OpProto *proto,
framework::OpAttrChecker* op_checker) framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
const auto& name = RecurrentOp::kArgName; AddInput(kInputs, "rnn inputs").AsDuplicable();
// inputs and outputs stored in proto AddInput(kInitialStates, "rnn initial states").AsDuplicable();
AddInput(name.inlinks, AddInput(kParameters,
"the inputs that need to be segmented for each step.") "Parameters are used by step block as its input. However, the "
"inputs is not a sequence tensor. Every time step, each operator "
"in step block just use the parameter directly")
.AsDuplicable(); .AsDuplicable();
AddInput(name.initial_states, "variables to initialize states.") AddOutput(kOutputs,
"The output sequence of RNN. The sequence length must be same")
.AsDuplicable(); .AsDuplicable();
AddOutput(kStepScopes,
"StepScopes contains all local variables in each time step.");
AddAttr<std::vector<std::string>>(kExStates,
string::Sprintf(
R"DOC(The ex-state variable names.
The ex-state means the state value in the ex-timestep or the previous time step
[%s, %s, %s] must be the same order)DOC",
kExStates, kStates, kInitStateGrads));
AddAttr<std::vector<std::string>>(
kStates,
string::Sprintf(
"The state variable names. [%s, %s, %s] must be the same order",
kExStates, kStates, kInitStateGrads));
AddAttr<framework::BlockDescBind *>(kStepBlock,
"The step block inside RNN");
AddAttr<bool>(kReverse, R"DOC(Calculate RNN reversely or not.
By default reverse=False
AddOutput(name.outlinks, "the outputs that need to concated for all steps.") Assume the input data is [A, B, C, D]
.AsDuplicable();
AddOutput(name.step_scopes, "step scopes"); if reverse is False:
the computation of RNN is like
A B C D
| | | |
v v v v
rnn -----> rnn -----> rnn ----> rnn
| | | |
v v v v
o o o o
if reverse is True
the computation of RNN is like
A B C D
| | | |
v v v v
rnn <----- rnn <----- rnn <---- rnn
| | | |
v v v v
o o o o
)DOC").SetDefault(false);
AddAttr<bool>(kIsTrain, "").SetDefault(true);
AddComment(R"DOC(Static Length Recurrent Operator
The static length recurrent operator can only operate on fixed-size sequence
data, i.e. in each mini-batch, the sequence lengths of all inputs are the same.
)DOC");
}
};
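The `reverse` attribute documented in `RecurrentOpProtoMaker` can be illustrated with a small Python sketch. This is not Paddle code: `run_static_rnn`, the `step` callback and the scalar state are hypothetical stand-ins for the step block and its state variables, and the sketch assumes each output is stored at the position of the input that produced it, as the diagram suggests.

```python
def run_static_rnn(inputs, step, init_state, reverse=False):
    """Run a toy fixed-length RNN over `inputs`.

    `step(x, state)` is a hypothetical stand-in for the step block: it takes
    one input and the previous state, and returns the new state, which this
    sketch also uses as the output for that time step.
    """
    seq_len = len(inputs)
    # reverse=False visits A, B, C, D; reverse=True visits D, C, B, A.
    order = range(seq_len - 1, -1, -1) if reverse else range(seq_len)
    outputs = [None] * seq_len
    state = init_state
    for t in order:
        state = step(inputs[t], state)
        outputs[t] = state  # keep the output aligned with its input
    return outputs


def running_sum(x, state):
    # Toy step function: the state is the running sum of the inputs seen so far.
    return state + x
```

With `running_sum` as the step function, the forward run accumulates from the first element and the reverse run accumulates from the last one, while the output list keeps the input order in both cases.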
class RecurrentGradOpDescMaker : public framework::SingleGradOpDescMaker {
public:
using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
// Attributes stored in AttributeMap protected:
AddAttr<std::vector<std::string>>(name.ex_states, "names of pre-states"); virtual std::unique_ptr<framework::OpDescBind> Apply() const {
AddAttr<std::vector<std::string>>(name.states, "names of states"); auto *grad = new framework::OpDescBind();
grad->SetType("recurrent_grad");
for (auto &input_param : this->InputNames()) {
grad->SetInput(input_param, this->Input(input_param));
grad->SetOutput(framework::GradVarName(input_param),
this->InputGrad(input_param));
}
for (auto &output_param : this->OutputNames()) {
if (output_param == kStepScopes) {
grad->SetInput(output_param, this->Output(output_param));
grad->SetInput(framework::GradVarName(output_param),
this->Output(output_param));
} else {
grad->SetInput(output_param, this->Output(output_param));
grad->SetInput(framework::GradVarName(output_param),
this->OutputGrad(output_param));
}
}
grad->SetAttrMap(this->Attrs());
grad->SetBlockAttr(kStepBlock, *grad_block_[0]);
AddComment("This is a recurrent group operator."); return std::unique_ptr<framework::OpDescBind>(grad);
} }
}; };
void RecurrentGradientAlgorithm::Run( class RecurrentGradOpShapeInference : public framework::InferShapeBase {
const Scope& scope, const platform::DeviceContext& dev_ctx) const { public:
auto* input0 = scope.FindVar(arg_->inlinks[0]); void operator()(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE_NOT_NULL(input0); std::vector<std::string> input{kInputs, kInitialStates};
size_t seq_len = input0->GetMutable<LoDTensor>()->dims()[0]; std::vector<std::string> output{kOutputs};
auto& step_scopes = GetStepScopes(scope); for (auto &s : input) {
rnn::SegmentInputs(step_scopes, arg_->inlinks, seq_len); PADDLE_ENFORCE(ctx->HasInputs(s));
for (int step_id = seq_len - 1; step_id >= 0; --step_id) { PADDLE_ENFORCE(ctx->HasOutputs(framework::GradVarName(s)));
if (static_cast<size_t>(step_id) != seq_len - 1) { }
rnn::LinkMemories(step_scopes, arg_->states, step_id, 1); for (auto &s : output) {
PADDLE_ENFORCE(ctx->HasInputs(s));
}
for (auto &s : input) {
ctx->SetOutputsDim(framework::GradVarName(s), ctx->GetInputsDim(s));
} }
(*stepnet_)->Run(*step_scopes[step_id], dev_ctx); if (ctx->HasInputs(kParameters)) {
} PADDLE_ENFORCE(ctx->HasOutputs(framework::GradVarName(kParameters)));
rnn::ConcatOutputs(step_scopes, arg_->outlinks, seq_len, dev_ctx); ctx->SetOutputsDim(framework::GradVarName(kParameters),
LinkBootMemoryGradients(step_scopes[0]); ctx->GetInputsDim(kParameters));
} }
}
void RecurrentGradientAlgorithm::LinkBootMemoryGradients( };
Scope* step_scope) const {
for (auto& attr : arg_->states) {
PADDLE_ENFORCE(step_scope->FindVar(attr.var) != nullptr,
"memory variable [%s] does not exists", attr.var);
PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr,
"boot variable [%s] does not exists", attr.boot_var);
auto* mem_grad = step_scope->Var(attr.var)->GetMutable<LoDTensor>();
auto* boot_mem_grad =
step_scope->Var(attr.boot_var)->GetMutable<LoDTensor>();
boot_mem_grad->Resize(mem_grad->dims());
boot_mem_grad->ShareDataWith(*mem_grad);
}
}
RecurrentGradientOp::RecurrentGradientOp(
const std::string& type, const framework::VariableNameMap& inputs,
const framework::VariableNameMap& outputs,
const framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {
rnn::InitArgument(kArgName, &arg_, *this, true /*is grad*/);
alg_.Init(&arg_, &stepnet_);
}
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
REGISTER_OP(recurrent, paddle::operators::RecurrentOp, REGISTER_OPERATOR(recurrent, paddle::operators::RecurrentOp,
paddle::operators::RecurrentAlgorithmProtoAndCheckerMaker, paddle::operators::RecurrentOpProtoMaker,
recurrent_grad, paddle::operators::RecurrentGradientOp); paddle::operators::RecurrentGradOpDescMaker);
REGISTER_OPERATOR(recurrent_grad, paddle::operators::RecurrentGradOp,
paddle::operators::RecurrentGradOpShapeInference);
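The `// Accumulate params` comment in the new `RecurrentGradOp` (zero the outside gradient at step 0, then add each step's inside gradient) can be sketched in plain Python. This is a minimal illustration, not the operator's implementation: `accumulate_param_grads` and the list-of-floats encoding are assumptions, and the sketch zeroes the accumulator lazily on the first available gradient instead of checking `step_id == 0`.

```python
def accumulate_param_grads(step_grads):
    """Sum per-step parameter gradients into one outside gradient.

    step_grads: one entry per time step, in the order the steps are run.
    Each entry is a list of floats, or None when that step did not compute
    a gradient for the parameter (hypothetical encoding).
    """
    outside = None
    for inside in step_grads:
        if inside is None:
            # Mirrors the `continue` taken when the gradient variable is
            # absent from the step scope.
            continue
        if outside is None:
            # Mirrors the fill_constant op that zeroes the outside gradient.
            outside = [0.0] * len(inside)
        # Mirrors the sum op: outside::param_grad += inside::param_grad.
        outside = [o + g for o, g in zip(outside, inside)]
    return outside
```

If no step produces a gradient, the sketch returns `None`, which corresponds to the outside gradient variable never being written.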
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/operator.h"
#include "paddle/operators/net_op.h"
#include "paddle/operators/rnn/recurrent_op_utils.h"
namespace paddle {
namespace operators {
// The sequence format in RecurrentOp is Tensor<seq_len, batch_size, dim> now.
// TODO(Superjom)
// 1. No-padding computing for sequences with indefinite length in one batch.
// 2. Hierarchical RNN for sequence with sub-sequence.
// 3. Internal Memory.
// 4. More Complex RNN architecture, such as Gated Feedback RNN.
// Refer to: https://arxiv.org/pdf/1502.02367.pdf
class RecurrentAlgorithm {
public:
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const;
void Init(rnn::Argument* arg,
std::unique_ptr<framework::OperatorBase>* stepnet) {
PADDLE_ENFORCE_NOT_NULL(stepnet, "stepnet should be set before.");
arg_ = arg;
stepnet_ = stepnet;
}
protected:
/*
* The step scopes will be stored in the father scope as a variable.
*
* NOTE: the scopes are reused in both the forward and backward passes, so they
* are created once and expanded if more steps are needed.
*/
void CreateScopes(const framework::Scope& scope, size_t seq_len) const;
const std::vector<framework::Scope*>& GetStepScopes(
const framework::Scope& scope) const {
return *scope.FindVar(arg_->step_scopes)
->GetMutable<std::vector<framework::Scope*>>();
}
void InitMemories(framework::Scope* step_scopes) const;
private:
std::unique_ptr<framework::OperatorBase>* stepnet_;
rnn::Argument* arg_;
};
class RecurrentGradientAlgorithm {
/**
* RNN's backward algorithm.
*
* To accelerate the development of RecurrentGradientOp, we decouple RNN's
* algorithm from `OperatorBase`'s implementation. The former contains the core
* implementation of an RNN and will stay stable even if the framework changes
* a lot; the latter is a wrapper that acts like an adapter, making the RNN an
* operator.
*/
public:
void Init(rnn::Argument* arg,
std::unique_ptr<framework::OperatorBase>* stepnet) {
PADDLE_ENFORCE_NOT_NULL(stepnet, "stepnet should be set before.");
arg_ = std::move(arg);
stepnet_ = stepnet;
}
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const;
void LinkBootMemoryGradients(framework::Scope* step_scopes) const;
protected:
inline const std::vector<framework::Scope*>& GetStepScopes(
const framework::Scope& scope) const {
return *scope.FindVar(arg_->step_scopes)
->GetMutable<std::vector<framework::Scope*>>();
}
private:
rnn::Argument* arg_;
std::unique_ptr<framework::OperatorBase>* stepnet_;
};
class RecurrentOp : public framework::OperatorBase {
public:
RecurrentOp(const std::string& type, const framework::VariableNameMap& inputs,
const framework::VariableNameMap& outputs,
const framework::AttributeMap& attrs);
RecurrentOp(const RecurrentOp& o)
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
// TODO(yuyang18): Implement copy ctor well.
PADDLE_THROW("Not implemented");
}
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override {
alg_.Run(scope, dev_ctx);
}
void set_stepnet(std::unique_ptr<OperatorBase> net) {
stepnet_ = std::move(net);
}
const OperatorBase& stepnet() const { return *stepnet_; }
static const rnn::ArgumentName kArgName;
private:
RecurrentAlgorithm alg_;
rnn::Argument arg_;
std::unique_ptr<OperatorBase> stepnet_;
};
class RecurrentGradientOp : public framework::OperatorBase {
public:
RecurrentGradientOp(const std::string& type,
const framework::VariableNameMap& inputs,
const framework::VariableNameMap& outputs,
const framework::AttributeMap& attrs);
RecurrentGradientOp(const RecurrentGradientOp& o)
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
// TODO(yuyang18): Implement Copy ctor.
PADDLE_THROW("Not Implemented");
}
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override {
alg_.Run(scope, dev_ctx);
}
static const rnn::ArgumentName kArgName;
/*
* set a stepnet that is created according to a RecurrentOp's stepnet.
*/
void set_stepnet(std::unique_ptr<OperatorBase> net) {
stepnet_ = std::move(net);
}
const OperatorBase& stepnet() const { return *stepnet_; }
private:
RecurrentGradientAlgorithm alg_;
std::unique_ptr<OperatorBase> stepnet_;
rnn::Argument arg_;
};
} // namespace operators
} // namespace paddle
...@@ -133,11 +133,10 @@ class RNNMemoryHelperGradOpShapeInference : public framework::InferShapeBase { ...@@ -133,11 +133,10 @@ class RNNMemoryHelperGradOpShapeInference : public framework::InferShapeBase {
public: public:
void operator()(framework::InferShapeContext *ctx) const override { void operator()(framework::InferShapeContext *ctx) const override {
auto x_grad_name = framework::GradVarName("X"); auto x_grad_name = framework::GradVarName("X");
auto out_grad_name = framework::GradVarName("Out");
PADDLE_ENFORCE(ctx->HasInput(out_grad_name), "");
PADDLE_ENFORCE(ctx->HasOutput(x_grad_name), ""); PADDLE_ENFORCE(ctx->HasOutput(x_grad_name), "");
ctx->SetOutputDim(x_grad_name, ctx->GetInputDim(out_grad_name)); PADDLE_ENFORCE(ctx->HasInput("X"), "");
ctx->ShareLoD(out_grad_name, /*->*/ x_grad_name); ctx->SetOutputDim(x_grad_name, ctx->GetInputDim("X"));
ctx->ShareLoD("X", /*->*/ x_grad_name);
} }
}; };
......
...@@ -42,7 +42,8 @@ class SequencePoolOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -42,7 +42,8 @@ class SequencePoolOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr<std::string>( AddAttr<std::string>(
"pooltype", "pooltype",
"(int, default AVERAGE) the pooling pooltype of SequencePoolOp.") "(int, default AVERAGE) the pooling pooltype of SequencePoolOp.")
.SetDefault("AVERAGE"); .SetDefault("AVERAGE")
.InEnum({"AVERAGE", "SUM", "SQRT", "LAST", "FIRST", "MAX"});
AddComment(R"DOC( AddComment(R"DOC(
SequencePoolOp pools features of all time-steps of each instance. SequencePoolOp pools features of all time-steps of each instance.
......
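The `InEnum` check added to `pooltype` restricts the attribute to six names. A rough Python sketch of the same validation, applied to a single sequence of scalars, is shown here. `pool_sequence` is a hypothetical helper, and the `SQRT` rule (sum divided by the square root of the sequence length) is an assumption about the operator's semantics that the diff itself does not state.

```python
import math

# The names accepted by the InEnum check in the diff.
POOL_TYPES = ("AVERAGE", "SUM", "SQRT", "LAST", "FIRST", "MAX")


def pool_sequence(values, pooltype="AVERAGE"):
    """Pool one non-empty sequence of floats into a single float."""
    if pooltype not in POOL_TYPES:
        raise ValueError("Unknown pooltype %r, expected one of %s" %
                         (pooltype, ", ".join(POOL_TYPES)))
    if not values:
        raise ValueError("cannot pool an empty sequence")
    total = sum(values)
    if pooltype == "AVERAGE":
        return total / len(values)
    if pooltype == "SUM":
        return total
    if pooltype == "SQRT":
        # Assumed semantics: sum scaled by 1 / sqrt(sequence length).
        return total / math.sqrt(len(values))
    if pooltype == "LAST":
        return values[-1]
    if pooltype == "FIRST":
        return values[0]
    return max(values)  # pooltype == "MAX"
```

Note that a name such as `"AVG"` is rejected by this check, since only `"AVERAGE"` appears in the enum.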
...@@ -29,22 +29,27 @@ template <typename Place, typename T> ...@@ -29,22 +29,27 @@ template <typename Place, typename T>
class SumKernel : public framework::OpKernel<T> { class SumKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto& in_vars = context.MultiInputVar("X"); auto in_vars = context.MultiInputVar("X");
int N = in_vars.size(); int N = in_vars.size();
auto out_var = context.OutputVar("Out"); auto out_var = context.OutputVar("Out");
bool in_place = out_var == in_vars[0];
if (out_var->IsType<framework::LoDTensor>()) { if (out_var->IsType<framework::LoDTensor>()) {
auto* out = context.Output<Tensor>("Out"); auto* out = context.Output<Tensor>("Out");
out->mutable_data<T>(context.GetPlace()); out->mutable_data<T>(context.GetPlace());
auto result = EigenVector<T>::Flatten(*out); auto result = EigenVector<T>::Flatten(*out);
math::SetConstant<Place, T> constant_functor; if (!in_place) {
constant_functor(context.device_context(), out, 0.0); math::SetConstant<Place, T> constant_functor;
constant_functor(context.device_context(), out, 0.0);
}
math::SelectedRowsAddToTensor<Place, T> functor; math::SelectedRowsAddToTensor<Place, T> functor;
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
for (int i = 0; i < N; i++) { // If in_place, just skip the first tensor
for (int i = in_place ? 1 : 0; i < N; i++) {
if (in_vars[i]->IsType<framework::LoDTensor>()) { if (in_vars[i]->IsType<framework::LoDTensor>()) {
auto& in_t = in_vars[i]->Get<framework::LoDTensor>(); auto& in_t = in_vars[i]->Get<framework::LoDTensor>();
auto in = EigenVector<T>::Flatten(in_t); auto in = EigenVector<T>::Flatten(in_t);
...@@ -57,6 +62,7 @@ class SumKernel : public framework::OpKernel<T> { ...@@ -57,6 +62,7 @@ class SumKernel : public framework::OpKernel<T> {
} }
} }
} else if (out_var->IsType<framework::SelectedRows>()) { } else if (out_var->IsType<framework::SelectedRows>()) {
PADDLE_ENFORCE(!in_place, "SelectedRows not support inplace sum now");
auto* out = context.Output<SelectedRows>("Out"); auto* out = context.Output<SelectedRows>("Out");
auto* out_value = out->mutable_value(); auto* out_value = out->mutable_value();
......
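The in-place branch added to `SumKernel` can be summarised with a small Python sketch: when the output is the same object as the first input, the output is not zeroed and the first input is skipped. `sum_tensors` and the use of plain lists in place of tensors are illustrative assumptions, not the kernel's API.

```python
def sum_tensors(inputs, out):
    """Accumulate every list in `inputs` into `out`, element by element.

    When `out` is the very same object as `inputs[0]`, the sum is done in
    place: `out` already holds the first operand, so it must not be zeroed
    and the first input must not be added a second time.
    """
    in_place = bool(inputs) and out is inputs[0]
    if not in_place:
        for i in range(len(out)):
            out[i] = 0.0
    start = 1 if in_place else 0
    for tensor in inputs[start:]:
        for i, value in enumerate(tensor):
            out[i] += value
    return out
```

This is the pattern the recurrent gradient code appears to rely on when it runs `sum` with the same variable as both the first input and the output.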
...@@ -28,7 +28,6 @@ limitations under the License. */ ...@@ -28,7 +28,6 @@ limitations under the License. */
#include "paddle/operators/cond_op.h" #include "paddle/operators/cond_op.h"
#include "paddle/operators/dynamic_recurrent_op.h" #include "paddle/operators/dynamic_recurrent_op.h"
#include "paddle/operators/net_op.h" #include "paddle/operators/net_op.h"
#include "paddle/operators/recurrent_op.h"
#include "paddle/platform/enforce.h" #include "paddle/platform/enforce.h"
#include "paddle/platform/place.h" #include "paddle/platform/place.h"
#include "paddle/pybind/exception.h" #include "paddle/pybind/exception.h"
...@@ -428,25 +427,6 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -428,25 +427,6 @@ All parameter, weight, gradient are variables in Paddle.
return self.UnstackShared(source); return self.UnstackShared(source);
}); });
// recurrent_op
py::class_<operators::RecurrentOp, OperatorBase>(m, "RecurrentOp")
.def_static(
"create",
[](py::bytes protobin) -> operators::RecurrentOp * {
OpDesc desc;
PADDLE_ENFORCE(desc.ParsePartialFromString(protobin),
"Cannot parse user input to OpDesc");
PADDLE_ENFORCE(desc.IsInitialized(),
"User OpDesc is not initialized, reason %s",
desc.InitializationErrorString());
auto rnn_op = OpRegistry::CreateOp(desc);
return static_cast<operators::RecurrentOp *>(rnn_op.release());
})
.def("set_stepnet", [](operators::RecurrentOp &self,
const operators::NetOp &net) -> void {
self.set_stepnet(net.Clone());
});
py::class_<operators::DynamicRecurrentOp, OperatorBase>(m, py::class_<operators::DynamicRecurrentOp, OperatorBase>(m,
"DynamicRecurrentOp") "DynamicRecurrentOp")
.def_static("create", .def_static("create",
......
...@@ -53,8 +53,8 @@ function deploy_docs() { ...@@ -53,8 +53,8 @@ function deploy_docs() {
set +e set +e
rm -rf ${DIR}/doc ${DIR}/doc_cn rm -rf ${DIR}/doc ${DIR}/doc_cn
set -e set -e
mv ../doc/cn/html ${DIR}/doc_cn cp -r ../doc/cn/html ${DIR}/doc_cn
mv ../doc/en/html ${DIR}/doc cp -r ../doc/en/html ${DIR}/doc
git add . git add .
} }
......
...@@ -62,7 +62,7 @@ class Executor(object): ...@@ -62,7 +62,7 @@ class Executor(object):
outputs={'Out': [fetch_var]}, outputs={'Out': [fetch_var]},
attrs={'col': i}) attrs={'col': i})
self.executor.run(program.desc, scope, 0) self.executor.run(program.desc, scope, 0, True)
return [ return [
core.get_fetch_variable(scope, fetch_var_name, i) core.get_fetch_variable(scope, fetch_var_name, i)
for i in xrange(len(fetch_list)) for i in xrange(len(fetch_list))
......
...@@ -7,6 +7,11 @@ import copy ...@@ -7,6 +7,11 @@ import copy
__all__ = ['Block', 'Variable', 'Program', 'Operator'] __all__ = ['Block', 'Variable', 'Program', 'Operator']
def unique_name(prefix):
uid = core.unique_integer(prefix) # unique during whole process.
return "_".join([prefix, str(uid)])
class Variable(object): class Variable(object):
def __init__(self, def __init__(self,
block, block,
...@@ -265,7 +270,8 @@ class Operator(object): ...@@ -265,7 +270,8 @@ class Operator(object):
self.desc.check_attrs() self.desc.check_attrs()
no_kernel_op_set = { no_kernel_op_set = {
'feed', 'fetch', 'save', 'load', 'rnn_memory_helper_grad' 'feed', 'fetch', 'save', 'load', 'recurrent',
'rnn_memory_helper_grad'
} }
if type not in no_kernel_op_set: if type not in no_kernel_op_set:
self.desc.infer_var_type(self.block.desc) self.desc.infer_var_type(self.block.desc)
......
import paddle.v2.framework.framework as framework import paddle.v2.framework.framework as framework
import numpy as np
__all__ = ['ConstantInitializer', 'UniformInitializer'] __all__ = [
'ConstantInitializer', 'UniformInitializer', 'NormalInitializer',
'XavierInitializer'
]
class Initializer(object): class Initializer(object):
...@@ -20,6 +24,41 @@ class Initializer(object): ...@@ -20,6 +24,41 @@ class Initializer(object):
""" """
raise NotImplementedError() raise NotImplementedError()
def _compute_fans(self, var):
"""Compute the fan_in and the fan_out for layers
This method computes the fan_in and the fan_out
for neural network layers, if not specified. It is
not possible to perfectly estimate fan_in and fan_out.
This method will estimate it correctly for matrix multiply and
convolutions.
Args:
var: variable for which fan_in and fan_out have to be computed
Returns:
tuple of two integers (fan_in, fan_out)
"""
shape = var.shape
if not shape or len(shape) == 0:
fan_in = fan_out = 1
elif len(shape) == 1:
fan_in = fan_out = shape[0]
elif len(shape) == 2:
# This is the case for simple matrix multiply
fan_in = shape[0]
fan_out = shape[1]
else:
# Assume this to be a convolutional kernel
# In PaddlePaddle, the shape of the kernel is like:
# [num_filters, num_filter_channels, ...] where the remaining
# dimensions are the filter_size
receptive_field_size = np.prod(shape[2:])
fan_in = shape[1] * receptive_field_size
fan_out = shape[0] * receptive_field_size
return (fan_in, fan_out)
class ConstantInitializer(Initializer): class ConstantInitializer(Initializer):
"""Implements the constant initializer """Implements the constant initializer
...@@ -156,3 +195,93 @@ class NormalInitializer(Initializer): ...@@ -156,3 +195,93 @@ class NormalInitializer(Initializer):
}) })
var.op = op var.op = op
return op return op
class XavierInitializer(Initializer):
"""Implements the Xavier initializer
This class implements the Xavier weight initializer from the paper
Understanding the difficulty of training deep feedforward neural
networks[1] by Xavier Glorot and Yoshua Bengio.
This initializer is designed to keep the scale of the gradients
approximately the same in all the layers. In case of Uniform distribution,
the range is [-x, x], where x = sqrt(6 / (fan_in + fan_out)).
In case of Normal distribution, the mean is 0 and the standard deviation
is sqrt(2 / (fan_in + fan_out)).
References:
[1] Understanding the difficulty of training deep feedforward neural
networks. International conference on artificial intelligence and
statistics.
(http://proceedings.mlr.press/v9/glorot10a.html)
"""
def __init__(self, uniform=True, fan_in=None, fan_out=None, seed=0):
"""Constructor for XavierInitializer
Args:
uniform: whether to use uniform or normal distribution
fan_in: fan_in for Xavier initialization. If None, it is
inferred from the variable.
fan_out: fan_out for Xavier initialization. If None, it is
inferred from the variable.
seed: random seed
Note: It is recommended to set fan_in and fan_out to None for
most cases.
"""
assert uniform is not None
assert seed is not None
super(XavierInitializer, self).__init__()
self._uniform = uniform
self._fan_in = fan_in
self._fan_out = fan_out
self._seed = seed
def __call__(self, var, block):
"""Add xavier initialization ops for a variable
Args:
var: Variable that needs to be initialized
block: The block in which initialization ops
should be added
Returns:
the initialization op
"""
assert isinstance(var, framework.Variable)
assert isinstance(block, framework.Block)
f_in, f_out = self._compute_fans(var)
# If fan_in and fan_out are passed, use them
fan_in = f_in if self._fan_in is None else self._fan_in
fan_out = f_out if self._fan_out is None else self._fan_out
if self._uniform:
limit = np.sqrt(6.0 / float(fan_in + fan_out))
op = block.prepend_op(
type="uniform_random",
outputs={"Out": var},
attrs={
"shape": var.shape,
"data_type": int(var.data_type),
"min": -limit,
"max": limit,
"seed": self._seed
})
else:
std = np.sqrt(2.0 / float(fan_in + fan_out))
op = block.prepend_op(
type="gaussian_random",
outputs={"Out": var},
attrs={
"shape": var.shape,
"data_type": int(var.data_type),
"mean": 0.0,
"std": std,
"seed": self._seed
})
var.op = op
return op
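The fan computation and the two Xavier scale formulas from the docstrings above can be checked with a standalone sketch that needs neither Paddle nor NumPy. The function names here are hypothetical; only the formulas and the shape conventions are taken from the diff.

```python
import math


def compute_fans(shape):
    """Return (fan_in, fan_out) for a parameter shape, as in _compute_fans."""
    if not shape:
        return 1, 1
    if len(shape) == 1:
        return shape[0], shape[0]
    if len(shape) == 2:
        # Plain matrix multiply: [fan_in, fan_out].
        return shape[0], shape[1]
    # Convolution kernel: [num_filters, num_filter_channels, *filter_size].
    receptive_field_size = 1
    for dim in shape[2:]:
        receptive_field_size *= dim
    return shape[1] * receptive_field_size, shape[0] * receptive_field_size


def xavier_uniform_limit(fan_in, fan_out):
    """Bound x of the uniform range [-x, x]."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_normal_std(fan_in, fan_out):
    """Standard deviation of the zero-mean normal distribution."""
    return math.sqrt(2.0 / (fan_in + fan_out))
```

For a `[16, 3, 3, 3]` convolution kernel the receptive field is 9, so the sketch yields `fan_in = 27` and `fan_out = 144`.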
import copy import copy
import itertools import itertools
import paddle.v2.framework.core as core
from paddle.v2.framework.framework import Variable, g_program, \ from paddle.v2.framework.framework import Variable, g_program, \
g_init_program g_init_program, unique_name, Program
from paddle.v2.framework.initializer import ConstantInitializer, \ from paddle.v2.framework.initializer import ConstantInitializer, \
UniformInitializer UniformInitializer
def unique_name(prefix):
uid = core.unique_integer(prefix) # unique during whole process.
return "_".join([prefix, str(uid)])
class LayerHelper(object): class LayerHelper(object):
def __init__(self, layer_type, **kwargs): def __init__(self, layer_type, **kwargs):
self.kwargs = kwargs self.kwargs = kwargs
...@@ -138,9 +131,19 @@ class LayerHelper(object): ...@@ -138,9 +131,19 @@ class LayerHelper(object):
def create_variable(self, *args, **kwargs): def create_variable(self, *args, **kwargs):
return self.program.current_block().create_var(*args, **kwargs) return self.program.current_block().create_var(*args, **kwargs)
def create_global_variable(self, *args, **kwargs): def create_global_variable(self, persistable=False, *args, **kwargs):
return self.program.global_block().create_var( return self.program.global_block().create_var(
*args, persistable=False, **kwargs) *args, persistable=persistable, **kwargs)
def set_variable_initializer(self, var, initializer):
assert isinstance(var, Variable)
self.init_program.global_block().create_var(
name=var.name,
type=var.type,
dtype=var.data_type,
shape=var.shape,
persistable=True,
initializer=initializer)
def append_bias_op(self, input_var, num_flatten_dims=None): def append_bias_op(self, input_var, num_flatten_dims=None):
""" """
......
from paddle.v2.framework.layer_helper import LayerHelper, unique_name from paddle.v2.framework.layer_helper import LayerHelper, unique_name
import paddle.v2.framework.core as core import paddle.v2.framework.core as core
from paddle.v2.framework.framework import OpProtoHolder, Variable, Program from paddle.v2.framework.framework import OpProtoHolder, Variable, Program, \
Operator
from paddle.v2.framework.initializer import ConstantInitializer from paddle.v2.framework.initializer import ConstantInitializer
import re import re
...@@ -32,7 +33,6 @@ def fc(input, ...@@ -32,7 +33,6 @@ def fc(input,
param_shape = [ param_shape = [
reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1) reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1)
] + [size] ] + [size]
w = helper.create_parameter( w = helper.create_parameter(
attr=param_attr, shape=param_shape, dtype=dtype) attr=param_attr, shape=param_shape, dtype=dtype)
tmp = helper.create_tmp_variable(dtype) tmp = helper.create_tmp_variable(dtype)
...@@ -88,8 +88,17 @@ def data(name, ...@@ -88,8 +88,17 @@ def data(name,
program=None, program=None,
init_program=None): init_program=None):
helper = LayerHelper('data', **locals()) helper = LayerHelper('data', **locals())
shape = list(shape)
for i in xrange(len(shape)):
if shape[i] is None:
shape[i] = -1
append_batch_size = False
elif shape[i] < 0:
append_batch_size = False
if append_batch_size: if append_batch_size:
shape = [-1] + shape # append batch size as -1 shape = [-1] + shape # append batch size as -1
return helper.create_global_variable( return helper.create_global_variable(
name=name, shape=shape, dtype=data_type, type=type) name=name, shape=shape, dtype=data_type, type=type)
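The shape handling added to `data` can be exercised in isolation. The sketch below copies the loop from the diff into a hypothetical helper, `normalize_data_shape`, and uses `range` instead of `xrange` so that it also runs on Python 3.

```python
def normalize_data_shape(shape, append_batch_size=True):
    """Return the shape that `data` would pass to create_global_variable."""
    shape = list(shape)
    for i in range(len(shape)):
        if shape[i] is None:
            # None marks an unknown dimension; store it as -1.
            shape[i] = -1
            append_batch_size = False
        elif shape[i] < 0:
            # The caller already supplied a negative (unknown) dimension.
            append_batch_size = False
    if append_batch_size:
        shape = [-1] + shape  # prepend the batch dimension as -1
    return shape
```

The effect is that a batch dimension is only prepended when the caller gave a fully specified shape; any `None` or negative entry is taken to mean the caller already described the unknown dimension.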
...@@ -165,6 +174,9 @@ _create_op_func_('mul') ...@@ -165,6 +174,9 @@ _create_op_func_('mul')
_create_op_func_('elementwise_add') _create_op_func_('elementwise_add')
_create_op_func_('dropout') _create_op_func_('dropout')
_create_op_func_('reshape') _create_op_func_('reshape')
_create_op_func_('elementwise_add')
_create_op_func_('sigmoid')
_create_op_func_('scale')
def cast(x, data_type, program=None): def cast(x, data_type, program=None):
...@@ -193,15 +205,15 @@ def concat(input, axis, program=None, init_program=None): ...@@ -193,15 +205,15 @@ def concat(input, axis, program=None, init_program=None):
def sums(input, program=None, init_program=None): def sums(input, program=None, init_program=None):
helper = LayerHelper('sum', **locals()) helper = LayerHelper('sum', **locals())
out = helper.create_tmp_variable(dtype=helper.input_dtype()) out = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(type='sum', inputs={'X': [input]}, outputs={'Out': out}) helper.append_op(type='sum', inputs={'X': input}, outputs={'Out': out})
return out return out
def cos_sim(X, Y, program=None, init_program=None): def cos_sim(X, Y, **kwargs):
helper = LayerHelper('cos_sim', **locals()) helper = LayerHelper('cos_sim', **kwargs)
out = helper.create_tmp_variable(dtype=helper.input_dtype("X")) out = helper.create_tmp_variable(dtype=X.data_type)
xnorm = helper.create_tmp_variable(dtype=helper.input_dtype("X")) xnorm = helper.create_tmp_variable(dtype=X.data_type)
ynorm = helper.create_tmp_variable(dtype=helper.input_dtype("X")) ynorm = helper.create_tmp_variable(dtype=X.data_type)
helper.append_op( helper.append_op(
type='cos_sim', type='cos_sim',
inputs={'X': [X], inputs={'X': [X],
...@@ -209,7 +221,7 @@ def cos_sim(X, Y, program=None, init_program=None): ...@@ -209,7 +221,7 @@ def cos_sim(X, Y, program=None, init_program=None):
outputs={'Out': [out], outputs={'Out': [out],
'XNorm': [xnorm], 'XNorm': [xnorm],
'YNorm': [ynorm]}) 'YNorm': [ynorm]})
return out, xnorm, ynorm return out
def cross_entropy(input, label, **kwargs): def cross_entropy(input, label, **kwargs):
...@@ -265,7 +277,8 @@ def accuracy(input, label, k=1, **kwargs): ...@@ -265,7 +277,8 @@ def accuracy(input, label, k=1, **kwargs):
def sequence_conv(input, def sequence_conv(input,
num_filters, num_filters,
filter_size=3, filter_size=3,
stride=1, filter_stride=1,
act=None,
padding=None, padding=None,
bias_attr=None, bias_attr=None,
param_attr=None, param_attr=None,
...@@ -291,9 +304,9 @@ def sequence_conv(input, ...@@ -291,9 +304,9 @@ def sequence_conv(input,
}, },
outputs={"Out": pre_bias}, outputs={"Out": pre_bias},
attrs={ attrs={
'context_stride': stride, 'contextStride': filter_stride,
'context_start': 0, 'contextStart': -int(filter_size / 2),
'context_length': filter_size 'contextLength': filter_size
}) })
pre_act = helper.append_bias_op(pre_bias) pre_act = helper.append_bias_op(pre_bias)
return helper.append_activation(pre_act) return helper.append_activation(pre_act)
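The new `contextStart` attribute above centres the convolution window on the current time step. A minimal sketch of the offsets this yields, assuming the window spans `contextLength` consecutive positions beginning at `contextStart`; the helper name `context_offsets` is hypothetical and not part of Paddle:

```python
def context_offsets(filter_size):
    """Relative time-step offsets covered by one window (illustrative only)."""
    # Mirrors the attribute computed in the diff: -int(filter_size / 2).
    context_start = -int(filter_size / 2)
    return [context_start + i for i in range(filter_size)]


print(context_offsets(3))  # [-1, 0, 1]
print(context_offsets(4))  # [-2, -1, 0, 1]
```

With the previous `context_start` of 0, the same window would have covered only the current and following steps.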
...@@ -346,17 +359,12 @@ def conv2d(input, ...@@ -346,17 +359,12 @@ def conv2d(input,
'paddings': padding, 'paddings': padding,
'groups': groups}) 'groups': groups})
pre_act = helper.append_bias_op(pre_bias) pre_act = helper.append_bias_op(pre_bias, 1)
return helper.append_activation(pre_act) return helper.append_activation(pre_act)
def sequence_pool(input, pool_type, **kwargs): def sequence_pool(input, pool_type, **kwargs):
ENUM_POOL_TYPE = set(["MAX", "AVG", "SQRT", "LAST", "FIRST"])
if pool_type.upper() not in ENUM_POOL_TYPE:
raise ValueError("Unknown pool_type: '%s'. It can only be %s.",
str(pool_type), " ".join(ENUM_POOL_TYPE))
helper = LayerHelper('sequence_pool', input=input, **kwargs) helper = LayerHelper('sequence_pool', input=input, **kwargs)
dtype = helper.input_dtype() dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype) pool_out = helper.create_tmp_variable(dtype)
...@@ -518,6 +526,8 @@ class StaticRNNGuard(BlockGuard): ...@@ -518,6 +526,8 @@ class StaticRNNGuard(BlockGuard):
return super(StaticRNNGuard, self).__enter__() return super(StaticRNNGuard, self).__enter__()
def __exit__(self, exc_type, exc_val, exc_tb): def __exit__(self, exc_type, exc_val, exc_tb):
if exc_type is not None:
return False
self.rnn.status = StaticRNN.AFTER_RNN_BLOCK self.rnn.status = StaticRNN.AFTER_RNN_BLOCK
self.rnn.complete_rnn_op() self.rnn.complete_rnn_op()
return super(StaticRNNGuard, self).__exit__(exc_type, exc_val, exc_tb) return super(StaticRNNGuard, self).__exit__(exc_type, exc_val, exc_tb)
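The early `return False` added to `__exit__` above keeps the guard from finalizing the RNN op when the block body raised, and lets the original exception propagate. A framework-free sketch of that pattern follows; `GuardSketch` is a hypothetical stand-in, not a Paddle class:

```python
class GuardSketch(object):
    """Hypothetical guard that finalizes work only on a clean exit."""

    def __init__(self):
        self.completed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:
            # Skip finalization; returning False re-raises the exception.
            return False
        self.completed = True
        return False


guard = GuardSketch()
try:
    with guard:
        raise ValueError("error inside the block")
except ValueError:
    pass
print(guard.completed)  # False
```

Without the early return, finalization would run on a half-built block and could mask the original error with a second one.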
...@@ -577,7 +587,7 @@ class StaticRNN(object): ...@@ -577,7 +587,7 @@ class StaticRNN(object):
outputs={'Out': [boot_var]}, outputs={'Out': [boot_var]},
attrs={ attrs={
'value': init_value, 'value': init_value,
'shape': boot_var.shape, 'shape': [40] + list(boot_var.shape[1:]),
'data_type': boot_var.data_type 'data_type': boot_var.data_type
}) })
...@@ -596,14 +606,14 @@ class StaticRNN(object): ...@@ -596,14 +606,14 @@ class StaticRNN(object):
if not isinstance(x, Variable): if not isinstance(x, Variable):
raise TypeError("step input takes a Variable") raise TypeError("step input takes a Variable")
if self.seq_len is None: if self.seq_len is None:
self.seq_len = x.shape[1] self.seq_len = x.shape[0]
elif self.seq_len != x.shape[1]: elif self.seq_len != x.shape[0]:
raise ValueError("Static RNN only takes fixed seq_len input") raise ValueError("Static RNN only takes fixed seq_len input")
ipt = self.helper.create_variable( ipt = self.helper.create_variable(
name=x.name, name=x.name,
dtype=x.data_type, dtype=x.data_type,
shape=[-1] + list(x.shape[2:]), shape=list(x.shape[1:]),
type=x.type) type=x.type)
self.inputs.append(ipt) self.inputs.append(ipt)
return ipt return ipt
...@@ -613,10 +623,17 @@ class StaticRNN(object): ...@@ -613,10 +623,17 @@ class StaticRNN(object):
if not isinstance(o, Variable): if not isinstance(o, Variable):
raise TypeError("step output takes a Variable") raise TypeError("step output takes a Variable")
tmp_o = self.helper.create_tmp_variable(dtype=o.data_type)
self.helper.append_op(
type='rnn_memory_helper',
inputs={'X': [o]},
outputs={'Out': tmp_o},
attrs={'data_type': o.data_type})
out_var = self.parent_block().create_var( out_var = self.parent_block().create_var(
name=o.name, name=tmp_o.name,
shape=[-1, self.seq_len] + list(o.shape[1:]), shape=[self.seq_len] + list(tmp_o.shape),
dtype=o.data_type) dtype=tmp_o.data_type)
self.outputs.append(out_var) self.outputs.append(out_var)
...@@ -647,6 +664,68 @@ class StaticRNN(object): ...@@ -647,6 +664,68 @@ class StaticRNN(object):
return self.outputs return self.outputs
def complete_rnn_op(self): def complete_rnn_op(self):
# TODO(yuyang18): Create RNN Op here. program = self.helper.program
# Implement this method after RNN op complete. rnn_block = program.current_block()
pass parent_block = self.parent_block()
local_inputs = set()
for op in rnn_block.ops:
assert isinstance(op, Operator)
for oname in op.output_names:
for out_var_name in op.output(oname):
local_inputs.add(out_var_name)
for var in self.inputs:
local_inputs.add(var.name)
for m in self.memories:
local_inputs.add(m)
params = list()
for op in rnn_block.ops:
assert isinstance(op, Operator)
for iname in op.input_names:
for in_var_name in op.input(iname):
if in_var_name not in local_inputs:
params.append(in_var_name)
parameters = [parent_block.var(name) for name in params]
step_scope = parent_block.create_var(
type=core.VarDesc.VarType.STEP_SCOPES)
inlinks = [parent_block.var(i.name) for i in self.inputs]
outlinks = self.outputs
boot_memories = []
pre_memories = []
memories = []
for _, mem in self.memories.iteritems():
boot_memories.append(mem.init)
pre_memories.append(mem.pre_mem.name)
mem_var = rnn_block.var(mem.mem.name)
assert isinstance(mem_var, Variable)
new_mem = self.helper.create_tmp_variable(dtype=mem_var.data_type)
rnn_block.append_op(
type='rnn_memory_helper',
inputs={'X': [mem_var]},
outputs={'Out': [new_mem]},
attrs={'data_type': mem_var.data_type})
memories.append(new_mem.name)
parent_block.append_op(
type='recurrent',
inputs={
'inputs': inlinks,
'initial_states': boot_memories,
'parameters': parameters
},
outputs={'outputs': outlinks,
'step_scopes': [step_scope]},
attrs={
'ex_states': pre_memories,
'states': memories,
'step_block': rnn_block
})
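The parameter discovery in `complete_rnn_op` above boils down to this: any name an op reads that is not produced inside the step block, and is neither a step input nor a memory, must come from the parent block. A minimal sketch on plain data, assuming each op is a `(input_names, output_names)` tuple; that representation and the function name are illustrative only:

```python
def find_external_inputs(ops, step_inputs, memories):
    """Return names read inside a block but not produced inside it."""
    local_names = set(step_inputs) | set(memories)
    for _, outputs in ops:
        local_names.update(outputs)

    params = []
    for inputs, _ in ops:
        for name in inputs:
            # Unlike the diff, skip names already recorded to avoid duplicates.
            if name not in local_names and name not in params:
                params.append(name)
    return params


ops = [
    (["x_t", "W"], ["xw"]),
    (["h_prev", "U"], ["hu"]),
    (["xw", "hu"], ["h"]),
]
print(find_external_inputs(ops, step_inputs=["x_t"], memories=["h_prev"]))
# ['W', 'U']
```

The de-duplication is a choice made for this sketch; the code in the diff appends a name once per use.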
import argparse
import json
import logging
from collections import defaultdict

import paddle.v2.framework.core as core
import paddle.v2.framework.proto.framework_pb2 as framework_pb2

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

try:
    from graphviz import Digraph
except ImportError:
    logger.info(
        'Cannot import graphviz, which is required for drawing a network. This '
        'can usually be installed in python with "pip install graphviz". Also, '
        'pydot requires graphviz to convert dot files to pdf: in ubuntu, this '
        'can usually be installed with "sudo apt-get install graphviz".')
    print('net_drawer will not run correctly. Please install the correct '
          'dependencies.')
    exit(0)

OP_STYLE = {
    'shape': 'oval',
    'color': '#0F9D58',
    'style': 'filled',
    'fontcolor': '#FFFFFF'
}

VAR_STYLE = {}

GRAPH_STYLE = {"rankdir": "TB", }

GRAPH_ID = 0


def unique_id():
    # GRAPH_ID is module-level state, so it must be declared global before
    # it can be incremented here.
    global GRAPH_ID
    GRAPH_ID += 1
    return GRAPH_ID


def draw_node(op):
    # Copy the shared style so per-node keys do not leak into OP_STYLE.
    node = dict(OP_STYLE)
    node["name"] = op.type
    node["label"] = op.type
    return node


def draw_edge(var_parent, op, var, arg):
    # Copy the shared style so per-edge keys do not leak into VAR_STYLE.
    edge = dict(VAR_STYLE)
    edge["label"] = "%s(%s)" % (var.parameter, arg)
    edge["head_name"] = op.type
    edge["tail_name"] = var_parent[arg]
    return edge


def parse_graph(program, graph, var_dict, **kwargs):
    # fill the known variables
    for block in program.blocks:
        for var in block.vars:
            if var not in var_dict:
                var_dict[var] = "Feed"

    proto = framework_pb2.ProgramDesc.FromString(
        program.desc.serialize_to_string())
    for block in proto.blocks:
        for op in block.ops:
            graph.node(**draw_node(op))
            # record this op as the producer of each of its outputs
            for o in op.outputs:
                for arg in o.arguments:
                    var_dict[arg] = op.type
            # draw an edge from the producer of each input to this op
            for e in op.inputs:
                for arg in e.arguments:
                    if arg in var_dict:
                        graph.edge(**draw_edge(var_dict, op, e, arg))


def draw_graph(init_program, program, **kwargs):
    # Pop the style overrides so they are not passed to Digraph twice.
    if "graph_attr" in kwargs:
        GRAPH_STYLE.update(kwargs.pop("graph_attr"))
    if "node_attr" in kwargs:
        OP_STYLE.update(kwargs.pop("node_attr"))
    if "edge_attr" in kwargs:
        VAR_STYLE.update(kwargs.pop("edge_attr"))

    graph_id = unique_id()
    filename = kwargs.pop("filename", None)
    if filename is None:
        filename = str(graph_id) + ".gv"
    g = Digraph(
        name=str(graph_id),
        filename=filename,
        graph_attr=GRAPH_STYLE,
        node_attr=OP_STYLE,
        edge_attr=VAR_STYLE,
        **kwargs)

    var_dict = {}
    parse_graph(init_program, g, var_dict)
    parse_graph(program, g, var_dict)

    if filename is not None:
        g.save()
    return g
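The graph construction in `parse_graph` above works by remembering, for each variable, which op last wrote it, and then drawing an edge from that producer to every op that reads the variable. A dependency-free sketch of the same idea that emits DOT text directly; the `(op_type, inputs, outputs)` tuple layout and the name `ops_to_dot` are assumptions made for illustration:

```python
def ops_to_dot(ops):
    """Render (op_type, inputs, outputs) tuples as DOT text."""
    producer = {}  # variable name -> op type that last wrote it
    lines = ["digraph G {"]
    for op_type, inputs, outputs in ops:
        lines.append('  "%s";' % op_type)
        # Same order as the diff: register outputs first, then draw inputs.
        for name in outputs:
            producer[name] = op_type
        for name in inputs:
            # In this sketch, variables nobody produced are fed from outside.
            src = producer.get(name, "Feed")
            lines.append('  "%s" -> "%s" [label="%s"];' % (src, op_type, name))
    lines.append("}")
    return "\n".join(lines)


dot_text = ops_to_dot([("mul", ["x", "w"], ["xw"]),
                       ("sigmoid", ["xw"], ["y"])])
print(dot_text)
```

Because nodes are keyed by op type, two ops of the same type collapse into one node, which is also true of the code in the diff.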
...@@ -47,7 +47,7 @@ def img_conv_group(input, ...@@ -47,7 +47,7 @@ def img_conv_group(input,
""" """
tmp = input tmp = input
assert isinstance(conv_num_filter, list) or \ assert isinstance(conv_num_filter, list) or \
isinstance(conv_num_filter, tuple) isinstance(conv_num_filter, tuple)
def __extend_list__(obj): def __extend_list__(obj):
if not hasattr(obj, '__len__'): if not hasattr(obj, '__len__'):
...@@ -101,6 +101,7 @@ def img_conv_group(input, ...@@ -101,6 +101,7 @@ def img_conv_group(input,
def sequence_conv_pool(input, def sequence_conv_pool(input,
num_filters, num_filters,
filter_size, filter_size,
act="sigmoid",
pool_type="max", pool_type="max",
program=None, program=None,
init_program=None): init_program=None):
...@@ -108,6 +109,7 @@ def sequence_conv_pool(input, ...@@ -108,6 +109,7 @@ def sequence_conv_pool(input,
input=input, input=input,
num_filters=num_filters, num_filters=num_filters,
filter_size=filter_size, filter_size=filter_size,
act=act,
program=program, program=program,
init_program=init_program) init_program=init_program)
......
from collections import defaultdict from collections import defaultdict
import paddle.v2.framework.framework as framework import paddle.v2.framework.framework as framework
from paddle.v2.framework.framework import unique_name, Program
from paddle.v2.framework.backward import append_backward_ops from paddle.v2.framework.backward import append_backward_ops
from paddle.v2.framework.initializer import ConstantInitializer
from paddle.v2.framework.regularizer import append_regularization_ops from paddle.v2.framework.regularizer import append_regularization_ops
from paddle.v2.framework.layer_helper import LayerHelper
__all__ = [ __all__ = [
'SGDOptimizer', 'MomentumOptimizer', 'AdagradOptimizer', 'AdamOptimizer', 'SGDOptimizer', 'MomentumOptimizer', 'AdagradOptimizer', 'AdamOptimizer',
...@@ -25,6 +28,7 @@ class Optimizer(object): ...@@ -25,6 +28,7 @@ class Optimizer(object):
# to train. These variables are called accumulators. # to train. These variables are called accumulators.
# {accum_name : { parameter_name : accumulator_for_parameter, ...}, ...} # {accum_name : { parameter_name : accumulator_for_parameter, ...}, ...}
self._accumulators = defaultdict(lambda: dict()) self._accumulators = defaultdict(lambda: dict())
self.helper = None
def _append_optimize_op(self, block, param_and_grad): def _append_optimize_op(self, block, param_and_grad):
""" append optimize operator to block and return all the added optimize_op """ append optimize operator to block and return all the added optimize_op
...@@ -63,7 +67,7 @@ class Optimizer(object): ...@@ -63,7 +67,7 @@ class Optimizer(object):
""" """
pass pass
def _add_accumulator(self, block, name, param, dtype=None, fill_value=0.0): def _add_accumulator(self, name, param, dtype=None, fill_value=0.0):
"""Utility function to add an accumulator for a parameter """Utility function to add an accumulator for a parameter
Args: Args:
...@@ -77,22 +81,17 @@ class Optimizer(object): ...@@ -77,22 +81,17 @@ class Optimizer(object):
param.name in self._accumulators[name]): param.name in self._accumulators[name]):
raise Exception("Accumulator {} already exists for parameter {}". raise Exception("Accumulator {} already exists for parameter {}".
format(name, param.name)) format(name, param.name))
global_block = block.program.global_block()
param_shape = list(param.shape) assert isinstance(self.helper, LayerHelper)
param_acc = global_block.create_var( var = self.helper.create_global_variable(
dtype=dtype, shape=param_shape, lod_level=0) name=unique_name(name),
persistable=True,
# Initialize the accumulator with fill_value dtype=dtype or param.data_type,
# FIXME: Fix when Initialization design has been implemented type=param.type,
# https://github.com/PaddlePaddle/Paddle/pull/4852 shape=param.shape)
global_block.append_op( self.helper.set_variable_initializer(
type="fill_constant", var, initializer=ConstantInitializer(value=float(fill_value)))
outputs={"Out": param_acc}, self._accumulators[name][param.name] = var
attrs={"shape": param_shape,
"value": fill_value})
# Add to accumulators dict
self._accumulators[name][param.name] = param_acc
def _get_accumulator(self, name, param): def _get_accumulator(self, name, param):
"""Utility function to fetch an accumulator for a parameter """Utility function to fetch an accumulator for a parameter
...@@ -130,7 +129,10 @@ class Optimizer(object): ...@@ -130,7 +129,10 @@ class Optimizer(object):
return increment_op return increment_op
def create_optimization_pass(self, parameters_and_grads, loss): def create_optimization_pass(self,
parameters_and_grads,
loss,
init_program=None):
"""Add optimization operators to update gradients to variables. """Add optimization operators to update gradients to variables.
Args: Args:
...@@ -142,6 +144,7 @@ class Optimizer(object): ...@@ -142,6 +144,7 @@ class Optimizer(object):
optimization. This will include parameter update ops, global step optimization. This will include parameter update ops, global step
update ops and any other custom ops required by subclasses to manage update ops and any other custom ops required by subclasses to manage
their internal state. their internal state.
:param init_program: program passed to the LayerHelper that initializes the variables the optimizer creates (optional)
""" """
# This is a default implementation of create_optimization_pass that # This is a default implementation of create_optimization_pass that
# can be shared by most optimizers. This implementation assumes that # can be shared by most optimizers. This implementation assumes that
...@@ -151,6 +154,9 @@ class Optimizer(object): ...@@ -151,6 +154,9 @@ class Optimizer(object):
# for parameters and extend _finish_update method to add custom ops. # for parameters and extend _finish_update method to add custom ops.
# Create any accumulators # Create any accumulators
program = loss.block.program
self.helper = LayerHelper(
self.__class__.__name__, program=program, init_program=init_program)
self._create_accumulators(loss.block, self._create_accumulators(loss.block,
[p[0] for p in parameters_and_grads]) [p[0] for p in parameters_and_grads])
# Create any necessary tensors # Create any necessary tensors
...@@ -177,7 +183,11 @@ class Optimizer(object): ...@@ -177,7 +183,11 @@ class Optimizer(object):
return_ops.append(self._increment_global_step(loss.block)) return_ops.append(self._increment_global_step(loss.block))
return return_ops return return_ops
def minimize(self, loss, parameter_list=None, no_grad_set=None): def minimize(self,
loss,
init_program=None,
parameter_list=None,
no_grad_set=None):
"""Add operations to minimize `loss` by updating `parameter_list`. """Add operations to minimize `loss` by updating `parameter_list`.
This method combines interface `append_backward_ops()` and This method combines interface `append_backward_ops()` and
...@@ -187,7 +197,8 @@ class Optimizer(object): ...@@ -187,7 +197,8 @@ class Optimizer(object):
set()) set())
# Add regularization if any # Add regularization if any
params_grads = append_regularization_ops(params_grads) params_grads = append_regularization_ops(params_grads)
optimize_ops = self.create_optimization_pass(params_grads, loss) optimize_ops = self.create_optimization_pass(params_grads, loss,
init_program)
return optimize_ops return optimize_ops
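The accumulator bookkeeping used by `_add_accumulator` and `_get_accumulator` above is a two-level dictionary keyed first by accumulator name and then by parameter name. A framework-free sketch of that structure, storing plain floats instead of variables; the class `AccumulatorRegistry` is hypothetical:

```python
from collections import defaultdict


class AccumulatorRegistry(object):
    """Hypothetical, framework-free version of the accumulator dict."""

    def __init__(self):
        # {accumulator_name: {parameter_name: value}}
        self._accumulators = defaultdict(dict)

    def add(self, name, param_name, fill_value=0.0):
        if param_name in self._accumulators[name]:
            raise Exception("Accumulator {} already exists for parameter {}"
                            .format(name, param_name))
        self._accumulators[name][param_name] = float(fill_value)

    def get(self, name, param_name):
        if param_name not in self._accumulators[name]:
            raise Exception("Accumulator {} does not exist for parameter {}"
                            .format(name, param_name))
        return self._accumulators[name][param_name]


registry = AccumulatorRegistry()
registry.add("velocity", "fc_0.w")
print(registry.get("velocity", "fc_0.w"))  # 0.0
```

In the diff the stored value is a persistable global variable with a `ConstantInitializer`, rather than a float.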
...@@ -202,24 +213,19 @@ class SGDOptimizer(Optimizer): ...@@ -202,24 +213,19 @@ class SGDOptimizer(Optimizer):
self._learning_rate = learning_rate self._learning_rate = learning_rate
def _initialize_tensors(self, block): def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1] lr_shape = [1]
# create a variable for learning_rate # create a variable for learning_rate
self._lr = block.create_var( self._lr = self.helper.create_global_variable(
dtype="float32", shape=lr_shape, lod_level=0) name=unique_name("learning_rate"),
dtype='float32',
# create an op to init the learning_rate shape=lr_shape,
# FIXME: Fix when Initialization design has been implemented lod_level=1,
# https://github.com/PaddlePaddle/Paddle/pull/4852 persistable=True)
block.append_op( self.helper.set_variable_initializer(
type="fill_constant", var=self._lr, initializer=ConstantInitializer(self._learning_rate))
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _append_optimize_op(self, block, param_and_grad): def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
# create the optimize op # create the optimize op
sgd_op = block.append_op( sgd_op = block.append_op(
type=self.type, type=self.type,
...@@ -255,23 +261,20 @@ class MomentumOptimizer(Optimizer): ...@@ -255,23 +261,20 @@ class MomentumOptimizer(Optimizer):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
lr_shape = [1] lr_shape = [1]
# create a variable for learning_rate # create a variable for learning_rate
self._lr = block.create_var( self._lr = self.helper.create_global_variable(
dtype="float32", shape=lr_shape, lod_level=0) name=unique_name("learning_rate"),
dtype='float32',
# create an op to init the learning_rate shape=lr_shape,
# FIXME: Fix when Initialization design has been implemented lod_level=1,
# https://github.com/PaddlePaddle/Paddle/pull/4852 persistable=True)
block.append_op( self.helper.set_variable_initializer(
type="fill_constant", var=self._lr, initializer=ConstantInitializer(self._learning_rate))
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters): def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
for p in parameters: for p in parameters:
self._add_accumulator(block, self._velocity_acc_str, p, 'float32') self._add_accumulator(self._velocity_acc_str, p)
def _append_optimize_op(self, block, param_and_grad): def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
...@@ -311,26 +314,22 @@ class AdagradOptimizer(Optimizer): ...@@ -311,26 +314,22 @@ class AdagradOptimizer(Optimizer):
self._epsilon = epsilon self._epsilon = epsilon
def _initialize_tensors(self, block): def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1] lr_shape = [1]
# create a variable for learning_rate # create a variable for learning_rate
self._lr = block.create_var( self._lr = self.helper.create_global_variable(
dtype="float32", shape=lr_shape, lod_level=0) name=unique_name("learning_rate"),
dtype='float32',
# create an op to init the learning_rate shape=lr_shape,
# FIXME: Fix when Initialization design has been implemented lod_level=1,
# https://github.com/PaddlePaddle/Paddle/pull/4852 persistable=True)
block.append_op( self.helper.set_variable_initializer(
type="fill_constant", var=self._lr, initializer=ConstantInitializer(self._learning_rate))
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters): def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
for p in parameters: for p in parameters:
self._add_accumulator(block, self._moment_acc_str, p, 'float32') self._add_accumulator(self._moment_acc_str, p)
def _append_optimize_op(self, block, param_and_grad): def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
...@@ -378,51 +377,46 @@ class AdamOptimizer(Optimizer): ...@@ -378,51 +377,46 @@ class AdamOptimizer(Optimizer):
self._epsilon = epsilon self._epsilon = epsilon
def _initialize_tensors(self, block): def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1] lr_shape = [1]
# create a variable for learning_rate # create a variable for learning_rate
self._lr = block.create_var( self._lr = self.helper.create_global_variable(
dtype="float32", shape=lr_shape, lod_level=0) name=unique_name("learning_rate"),
dtype='float32',
# create an op to init the learning_rate shape=lr_shape,
# FIXME: Fix when Initialization design has been implemented lod_level=1,
# https://github.com/PaddlePaddle/Paddle/pull/4852 persistable=True)
block.append_op( self.helper.set_variable_initializer(
type="fill_constant", var=self._lr, initializer=ConstantInitializer(self._learning_rate))
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters): def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
global_block = block.program.global_block() main_block = block.program.global_block()
# Create beta1 and beta2 power tensors # Create beta1 and beta2 power tensors
beta_shape = [1] beta_shape = [1]
# Create variables for beta1 and beta2 powers self._beta1_pow_acc = self.helper.create_global_variable(
self._beta1_pow_acc = global_block.create_var( name=unique_name('beta1_pow_acc'),
dtype="float32", shape=beta_shape, lod_level=0) dtype='float32',
self._beta2_pow_acc = global_block.create_var( shape=beta_shape,
dtype="float32", shape=beta_shape, lod_level=0) lod_level=0,
persistable=True)
# Initialize beta1 and beta2 power accumulators self.helper.set_variable_initializer(
# FIXME: Fix when Initialization design has been implemented self._beta1_pow_acc, initializer=ConstantInitializer(self._beta1))
# https://github.com/PaddlePaddle/Paddle/pull/4852
global_block.append_op( self._beta2_pow_acc = self.helper.create_global_variable(
type="fill_constant", name=unique_name('beta2_pow_acc'),
outputs={"Out": self._beta1_pow_acc}, dtype='float32',
attrs={"shape": beta_shape, shape=beta_shape,
"value": self._beta1}) lod_level=0,
global_block.append_op( persistable=True)
type="fill_constant",
outputs={"Out": self._beta2_pow_acc}, self.helper.set_variable_initializer(
attrs={"shape": beta_shape, self._beta2_pow_acc, initializer=ConstantInitializer(self._beta2))
"value": self._beta2})
# Create accumulator tensors for first and second moments # Create accumulator tensors for first and second moments
for p in parameters: for p in parameters:
self._add_accumulator(block, self._moment1_acc_str, p, 'float32') self._add_accumulator(self._moment1_acc_str, p)
self._add_accumulator(block, self._moment2_acc_str, p, 'float32') self._add_accumulator(self._moment2_acc_str, p)
def _append_optimize_op(self, block, param_and_grad): def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
...@@ -460,14 +454,14 @@ class AdamOptimizer(Optimizer): ...@@ -460,14 +454,14 @@ class AdamOptimizer(Optimizer):
"""Update Beta1 and Beta2 Power accumulators """Update Beta1 and Beta2 Power accumulators
""" """
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
global_block = block.program.global_block() main_block = block.program.global_block()
scale_beta1 = global_block.append_op( scale_beta1 = main_block.append_op(
type="scale", type="scale",
inputs={"X": self._beta1_pow_acc}, inputs={"X": self._beta1_pow_acc},
outputs={"Out": self._beta1_pow_acc}, outputs={"Out": self._beta1_pow_acc},
attrs={"scale": self._beta1}) attrs={"scale": self._beta1})
scale_beta2 = global_block.append_op( scale_beta2 = main_block.append_op(
type="scale", type="scale",
inputs={"X": self._beta2_pow_acc}, inputs={"X": self._beta2_pow_acc},
outputs={"Out": self._beta2_pow_acc}, outputs={"Out": self._beta2_pow_acc},
...@@ -500,43 +494,33 @@ class AdamaxOptimizer(Optimizer): ...@@ -500,43 +494,33 @@ class AdamaxOptimizer(Optimizer):
self._epsilon = epsilon self._epsilon = epsilon
def _initialize_tensors(self, block): def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1] lr_shape = [1]
# create a variable for learning_rate # create a variable for learning_rate
self._lr = block.create_var( self._lr = self.helper.create_global_variable(
dtype="float32", shape=lr_shape, lod_level=0) name=unique_name("learning_rate"),
dtype='float32',
# create an op to init the learning_rate shape=lr_shape,
# FIXME: Fix when Initialization design has been implemented lod_level=1,
# https://github.com/PaddlePaddle/Paddle/pull/4852 persistable=True)
block.append_op( self.helper.set_variable_initializer(
type="fill_constant", var=self._lr, initializer=ConstantInitializer(self._learning_rate))
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters): def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block)
global_block = block.program.global_block()
# Create beta1 power accumulator tensor # Create beta1 power accumulator tensor
beta_shape = [1] beta_shape = [1]
self._beta1_pow_acc = global_block.create_var( self._beta1_pow_acc = self.helper.create_global_variable(
dtype="float32", shape=beta_shape, lod_level=0) name=unique_name('beta1_pow_acc'),
dtype='float32',
# Initialize beta1 power accumulator shape=beta_shape,
# FIXME: Fix when Initialization design has been implemented lod_level=0,
# https://github.com/PaddlePaddle/Paddle/pull/4852 persistable=True)
global_block.append_op( self.helper.set_variable_initializer(
type="fill_constant", self._beta1_pow_acc, initializer=ConstantInitializer(self._beta1))
outputs={"Out": self._beta1_pow_acc},
attrs={"shape": beta_shape,
"value": self._beta1})
# Create accumulator tensors for first moment and infinity norm # Create accumulator tensors for first moment and infinity norm
for p in parameters: for p in parameters:
self._add_accumulator(block, self._moment_acc_str, p, 'float32') self._add_accumulator(self._moment_acc_str, p)
self._add_accumulator(block, self._inf_norm_acc_str, p, 'float32') self._add_accumulator(self._inf_norm_acc_str, p)
def _append_optimize_op(self, block, param_and_grad): def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
...@@ -572,8 +556,8 @@ class AdamaxOptimizer(Optimizer): ...@@ -572,8 +556,8 @@ class AdamaxOptimizer(Optimizer):
"""Update Beta1 Power accumulator """Update Beta1 Power accumulator
""" """
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
global_block = block.program.global_block() main_block = block.program.global_block()
scale_beta1 = global_block.append_op( scale_beta1 = main_block.append_op(
type="scale", type="scale",
inputs={"X": self._beta1_pow_acc}, inputs={"X": self._beta1_pow_acc},
outputs={"Out": self._beta1_pow_acc}, outputs={"Out": self._beta1_pow_acc},
......
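The beta power accumulators in the Adam and Adamax optimizers above start at `beta` and are multiplied by `beta` in place after every update, so after `t` updates they hold `beta ** (t + 1)`. A small numeric sketch of that recurrence; the function name is illustrative:

```python
def beta_powers(beta, num_steps):
    """Values of the power accumulator seen at each update (illustrative)."""
    acc = beta  # initial value, matching ConstantInitializer(beta) in the diff
    history = []
    for _ in range(num_steps):
        history.append(acc)
        acc *= beta  # what the in-place `scale` op does after each step
    return history


powers = beta_powers(0.9, 3)
print(powers)  # approximately [0.9, 0.81, 0.729]
```

Keeping the running product avoids recomputing a power from the step count on every iteration.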
...@@ -45,23 +45,36 @@ class TestConv2dTransposeOp(OpTest): ...@@ -45,23 +45,36 @@ class TestConv2dTransposeOp(OpTest):
filter_ = np.random.random(self.filter_size).astype("float32") filter_ = np.random.random(self.filter_size).astype("float32")
output = conv2dtranspose_forward_naive( output = conv2dtranspose_forward_naive(
input_, filter_, conv2dtranspose_param).astype('float32') input_, filter_, conv2dtranspose_param).astype('float32')
# print 'deconv output py', output, output.shape
self.inputs = {'Input': input_, 'Filter': filter_} self.inputs = {'Input': input_, 'Filter': filter_}
self.attrs = { self.attrs = {
'strides': self.stride, 'strides': self.stride,
'paddings': self.pad, 'paddings': self.pad,
# 'dilations': self.dilations 'dilations': self.dilations
} }
self.outputs = {'Output': output} self.outputs = {'Output': output}
def test_check_output(self): def test_check_output(self):
print 'check output here' print 'check output here for', self.op_type
self.check_output() self.check_output()
def test_check_grad(self): def init_test_case(self):
self.pad = [0, 0]
self.stride = [1, 1]
self.dilations = [1, 1]
self.input_size = [2, 3, 5, 5] # NCHW
f_c = self.input_size[1]
self.filter_size = [f_c, 6, 3, 3]
def init_op_type(self):
self.op_type = "conv2d_transpose"
def test_check_grad_no_input(self):
self.check_grad( self.check_grad(
set(['Input', 'Filter']), 'Output', max_relative_error=0.05) ['Filter'],
'Output',
max_relative_error=0.05,
no_grad_set=set(['Input']))
def test_check_grad_no_filter(self): def test_check_grad_no_filter(self):
self.check_grad( self.check_grad(
...@@ -70,33 +83,15 @@ class TestConv2dTransposeOp(OpTest): ...@@ -70,33 +83,15 @@ class TestConv2dTransposeOp(OpTest):
max_relative_error=0.05, max_relative_error=0.05,
no_grad_set=set(['Filter'])) no_grad_set=set(['Filter']))
def test_check_grad_no_input(self): def test_check_grad(self):
self.check_grad( self.check_grad(
['Filter'], set(['Input', 'Filter']), 'Output', max_relative_error=0.05)
'Output',
max_relative_error=0.05,
no_grad_set=set(['Input']))
def init_test_case(self):
self.pad = [0, 0]
self.stride = [1, 1]
self.dilations = [1, 1]
self.input_size = [2, 3, 5, 5] # NCHW
f_c = self.input_size[1]
self.filter_size = [f_c, 6, 3, 3]
class TestCudnn(TestConv2dTransposeOp):
def init_op_type(self): def init_op_type(self):
self.op_type = "conv2dtranspose" self.op_type = "conv2d_transpose_cudnn"
"""
class TestCudnn(TestConv2dOp):
def init_group(self):
self.groups = 1
def init_op_type(self):
self.op_type = "conv_cudnn"
"""
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -61,4 +61,5 @@ class TestEvaluator(unittest.TestCase): ...@@ -61,4 +61,5 @@ class TestEvaluator(unittest.TestCase):
if __name__ == '__main__': if __name__ == '__main__':
exit(0)
unittest.main() unittest.main()
...@@ -3,13 +3,27 @@ import numpy as np ...@@ -3,13 +3,27 @@ import numpy as np
from op_test import OpTest from op_test import OpTest
class TestFillConstantBatchSizeLikeOp(OpTest): class TestFillConstantBatchSizeLikeWhenFirstDimIsBatchSize(OpTest):
def setUp(self): def setUp(self):
self.op_type = "fill_constant_batch_size_like" self.op_type = "fill_constant_batch_size_like"
self.inputs = {'Input': np.random.random((219, 232)).astype("float32")} self.inputs = {'Input': np.random.random((219, 232)).astype("float32")}
self.attrs = {'value': 3.5, 'shape': [-1, 132, 777]} self.attrs = {'value': 3.5, 'shape': [-1, 132, 7]}
out = np.random.random((219, 132, 777)).astype("float32") out = np.random.random((219, 132, 7)).astype("float32")
out.fill(3.5)
self.outputs = {'Out': out}
def test_check_output(self):
self.check_output()
class TestFillConstantBatchSizeLikeWhenSecondDimIsBatchSize(OpTest):
def setUp(self):
self.op_type = "fill_constant_batch_size_like"
self.inputs = {'Input': np.random.random((219, 232)).astype("float32")}
self.attrs = {'value': 3.5, 'shape': [132, -1, 7], 'dim_idx': 1}
out = np.random.random((132, 232, 7)).astype("float32")
out.fill(3.5) out.fill(3.5)
self.outputs = {'Out': out} self.outputs = {'Out': out}
......
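The two test cases above suggest that the output shape is the `shape` attribute with the entry at `dim_idx` replaced by the input's size along that same dimension. The sketch below encodes only what those two tests show; it is inferred from the tests, not taken from the operator's implementation, and the function name is hypothetical:

```python
def batch_size_like_shape(input_shape, shape, dim_idx=0):
    """Output shape as the two tests in the diff expect it (a sketch)."""
    out_shape = list(shape)
    # The dimension at dim_idx is copied from the input tensor.
    out_shape[dim_idx] = input_shape[dim_idx]
    return out_shape


print(batch_size_like_shape((219, 232), [-1, 132, 7]))             # [219, 132, 7]
print(batch_size_like_shape((219, 232), [132, -1, 7], dim_idx=1))  # [132, 232, 7]
```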
...@@ -36,7 +36,7 @@ cost = layers.square_error_cost( ...@@ -36,7 +36,7 @@ cost = layers.square_error_cost(
avg_cost = layers.mean(x=cost, program=program, init_program=init_program) avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost) opts = sgd_optimizer.minimize(avg_cost, init_program)
BATCH_SIZE = 20 BATCH_SIZE = 20
......
...@@ -208,7 +208,7 @@ cost = layers.cross_entropy( ...@@ -208,7 +208,7 @@ cost = layers.cross_entropy(
avg_cost = layers.mean(x=cost, program=program, init_program=init_program) avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost) opts = sgd_optimizer.minimize(avg_cost, init_program)
BATCH_SIZE = 128 BATCH_SIZE = 128
PASS_NUM = 1 PASS_NUM = 1
......
...@@ -44,7 +44,7 @@ class TestBook(unittest.TestCase): ...@@ -44,7 +44,7 @@ class TestBook(unittest.TestCase):
x=cost, program=program, init_program=init_program) x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost) opts = sgd_optimizer.minimize(avg_cost, init_program)
place = core.CPUPlace() place = core.CPUPlace()
exe = executor.Executor(place) exe = executor.Executor(place)
......
import numpy as np
import unittest import unittest
import paddle.v2.framework.framework as framework import paddle.v2.framework.framework as framework
...@@ -116,5 +117,111 @@ class TestNormalInitializer(unittest.TestCase): ...@@ -116,5 +117,111 @@ class TestNormalInitializer(unittest.TestCase):
self.assertEqual(init_op.attr('seed'), 123) self.assertEqual(init_op.attr('seed'), 123)
class TestXavierInitializer(unittest.TestCase):
def test_uniform_xavier_initializer(self):
"""Test the Xavier initializer with a uniform distribution
for matrix multiply.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10],
lod_level=0,
name="param",
initializer=initializer.XavierInitializer())
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'uniform_random')
limit = np.sqrt(6.0 / (param.shape[0] + param.shape[1]))
self.assertAlmostEqual(init_op.attr('min'), -limit, delta=DELTA)
self.assertAlmostEqual(init_op.attr('max'), limit, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_uniform_xavier_initializer_conv(self):
"""Test the Xavier initializer with a uniform distribution
for convolutions.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10, 15, 20],
lod_level=0,
name="param",
initializer=initializer.XavierInitializer())
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'uniform_random')
receptive_field_size = float(15 * 20)
limit = np.sqrt(6.0 / (
(param.shape[0] + param.shape[1]) * receptive_field_size))
self.assertAlmostEqual(init_op.attr('min'), -limit, delta=DELTA)
self.assertAlmostEqual(init_op.attr('max'), limit, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_normal_xavier_initializer(self):
"""Test the Xavier initializer with a normal distribution
for matrix multiply.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10],
lod_level=0,
name="param",
initializer=initializer.XavierInitializer(uniform=False))
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'gaussian_random')
std = np.sqrt(2.0 / (param.shape[0] + param.shape[1]))
self.assertAlmostEqual(init_op.attr('mean'), 0.0, delta=DELTA)
self.assertAlmostEqual(init_op.attr('std'), std, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_normal_xavier_initializer_conv(self):
"""Test the Xavier initializer with a normal distribution
for convolutions.
"""
program = framework.Program()
block = program.global_block()
param = block.create_parameter(
dtype="float32",
shape=[5, 10, 15, 20],
lod_level=0,
name="param",
initializer=initializer.XavierInitializer(uniform=False))
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'gaussian_random')
receptive_field_size = float(15 * 20)
std = np.sqrt(2.0 / (
(param.shape[0] + param.shape[1]) * receptive_field_size))
self.assertAlmostEqual(init_op.attr('mean'), 0.0, delta=DELTA)
self.assertAlmostEqual(init_op.attr('std'), std, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
def test_xavier_initializer_supplied_arguments(self):
"""Test the Xavier initializer with supplied arguments
"""
program = framework.Program()
block = program.global_block()
block.create_parameter(
dtype="float32",
shape=[5, 10],
lod_level=0,
name="param",
initializer=initializer.XavierInitializer(
fan_in=12, fan_out=23, seed=134))
self.assertEqual(len(block.ops), 1)
init_op = block.ops[0]
self.assertEqual(init_op.type, 'uniform_random')
limit = np.sqrt(6.0 / (12 + 23))
self.assertAlmostEqual(init_op.attr('min'), -limit, delta=DELTA)
self.assertAlmostEqual(init_op.attr('max'), limit, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 134)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
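The expected values asserted in `TestXavierInitializer` can be reproduced with plain NumPy. The sketch below assumes the convention the tests imply: dimensions beyond the first two form the receptive field, and both fans are scaled by its size. The helper names `xavier_fans`, `xavier_uniform_limit` and `xavier_normal_std` are hypothetical and are not part of the `initializer` module.

```python
import numpy as np


def xavier_fans(shape):
    """Return (fan_in, fan_out) under the convention assumed from the tests."""
    # Dimensions after the first two are treated as the receptive field.
    receptive_field_size = int(np.prod(shape[2:])) if len(shape) > 2 else 1
    fan_in = shape[0] * receptive_field_size
    fan_out = shape[1] * receptive_field_size
    return fan_in, fan_out


def xavier_uniform_limit(shape, fan_in=None, fan_out=None):
    """Bound of the uniform range [-limit, limit]."""
    default_in, default_out = xavier_fans(shape)
    fan_in = default_in if fan_in is None else fan_in
    fan_out = default_out if fan_out is None else fan_out
    return float(np.sqrt(6.0 / (fan_in + fan_out)))


def xavier_normal_std(shape, fan_in=None, fan_out=None):
    """Standard deviation of the zero-mean normal distribution."""
    default_in, default_out = xavier_fans(shape)
    fan_in = default_in if fan_in is None else fan_in
    fan_out = default_out if fan_out is None else fan_out
    return float(np.sqrt(2.0 / (fan_in + fan_out)))


print(xavier_uniform_limit([5, 10]))
print(xavier_normal_std([5, 10, 15, 20]))
print(xavier_uniform_limit([5, 10], fan_in=12, fan_out=23))
```

Because only the sum of the two fans enters the formulas, the tests cannot distinguish which of the first two dimensions is `fan_in`; the assignment above is a guess.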
...@@ -7,6 +7,7 @@ from paddle.v2.framework.backward import append_backward_ops ...@@ -7,6 +7,7 @@ from paddle.v2.framework.backward import append_backward_ops
class TestOptimizer(unittest.TestCase): class TestOptimizer(unittest.TestCase):
def test_sgd_optimizer(self): def test_sgd_optimizer(self):
init_program = framework.Program()
program = framework.Program() program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
...@@ -22,12 +23,13 @@ class TestOptimizer(unittest.TestCase): ...@@ -22,12 +23,13 @@ class TestOptimizer(unittest.TestCase):
outputs={"Out": mul_out}, outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.01) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.01)
opts = sgd_optimizer.minimize(mul_out) opts = sgd_optimizer.minimize(mul_out, init_program)
self.assertEqual(len(opts), 1) self.assertEqual(len(opts), 1)
sgd_op = opts[0] sgd_op = opts[0]
self.assertEqual(sgd_op.type, "sgd") self.assertEqual(sgd_op.type, "sgd")
def test_sgd_optimizer_with_global_step(self): def test_sgd_optimizer_with_global_step(self):
init_program = framework.Program()
program = framework.Program() program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
...@@ -44,15 +46,22 @@ class TestOptimizer(unittest.TestCase): ...@@ -44,15 +46,22 @@ class TestOptimizer(unittest.TestCase):
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
global_step = block.create_var( global_step = block.create_var(
dtype="float32", shape=[1], lod_level=0, name="step") dtype="float32", shape=[1], lod_level=0, name="step")
learning_rate = 0.01
sgd_optimizer = optimizer.SGDOptimizer( sgd_optimizer = optimizer.SGDOptimizer(
learning_rate=0.01, global_step=global_step) learning_rate=learning_rate, global_step=global_step)
opts = sgd_optimizer.minimize(mul_out) opts = sgd_optimizer.minimize(mul_out, init_program)
self.assertEqual(len(opts), 2) self.assertEqual(len(opts), 2)
sgd_op = opts[0] sgd_op = opts[0]
self.assertEqual(sgd_op.type, "sgd") self.assertEqual(sgd_op.type, "sgd")
increment_op = opts[1] increment_op = opts[1]
self.assertEqual(increment_op.type, "increment") self.assertEqual(increment_op.type, "increment")
# Check init_program
init_ops = init_program.global_block().ops
self.assertEqual(len(init_ops), 1)
self.assertEqual(init_ops[0].type, "fill_constant")
self.assertAlmostEqual(init_ops[0].attr('value'), learning_rate)
class TestMomentumOptimizer(unittest.TestCase): class TestMomentumOptimizer(unittest.TestCase):
class MockMomentum(optimizer.MomentumOptimizer): class MockMomentum(optimizer.MomentumOptimizer):
...@@ -63,6 +72,7 @@ class TestMomentumOptimizer(unittest.TestCase): ...@@ -63,6 +72,7 @@ class TestMomentumOptimizer(unittest.TestCase):
return self._velocity_acc_str return self._velocity_acc_str
def test_vanilla_momentum_optimizer(self): def test_vanilla_momentum_optimizer(self):
init_program = framework.Program()
program = framework.Program() program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
...@@ -77,12 +87,14 @@ class TestMomentumOptimizer(unittest.TestCase): ...@@ -77,12 +87,14 @@ class TestMomentumOptimizer(unittest.TestCase):
"Y": mul_y}, "Y": mul_y},
outputs={"Out": mul_out}, outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
momentum_optimizer = self.MockMomentum(learning_rate=0.01, momentum=0.2) learning_rate = 0.01
momentum_optimizer = self.MockMomentum(
learning_rate=learning_rate, momentum=0.2)
params_grads = append_backward_ops(mul_out) params_grads = append_backward_ops(mul_out)
self.assertEqual(len(params_grads), 1) self.assertEqual(len(params_grads), 1)
self.assertEqual(len(momentum_optimizer.get_accumulators()), 0) self.assertEqual(len(momentum_optimizer.get_accumulators()), 0)
opts = momentum_optimizer.create_optimization_pass(params_grads, opts = momentum_optimizer.create_optimization_pass(
mul_out) params_grads, mul_out, init_program)
self.assertEqual(len(opts), 1) self.assertEqual(len(opts), 1)
sgd_op = opts[0] sgd_op = opts[0]
self.assertEqual(sgd_op.type, "momentum") self.assertEqual(sgd_op.type, "momentum")
...@@ -96,7 +108,16 @@ class TestMomentumOptimizer(unittest.TestCase): ...@@ -96,7 +108,16 @@ class TestMomentumOptimizer(unittest.TestCase):
self.assertEqual(len(velocity_acc), 1) self.assertEqual(len(velocity_acc), 1)
self.assertTrue(mul_x.name in velocity_acc) self.assertTrue(mul_x.name in velocity_acc)
# Check init_program
init_ops = init_program.global_block().ops
self.assertEqual(len(init_ops), 2)
self.assertEqual(init_ops[0].type, "fill_constant")
self.assertAlmostEqual(init_ops[0].attr('value'), learning_rate)
self.assertEqual(init_ops[1].type, "fill_constant")
self.assertAlmostEqual(init_ops[1].attr('value'), 0.0)
def test_nesterov_momentum_optimizer(self): def test_nesterov_momentum_optimizer(self):
init_program = framework.Program()
program = framework.Program() program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
...@@ -111,13 +132,14 @@ class TestMomentumOptimizer(unittest.TestCase): ...@@ -111,13 +132,14 @@ class TestMomentumOptimizer(unittest.TestCase):
"Y": mul_y}, "Y": mul_y},
outputs={"Out": mul_out}, outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
learning_rate = 0.01
momentum_optimizer = self.MockMomentum( momentum_optimizer = self.MockMomentum(
learning_rate=0.01, momentum=0.2, use_nesterov=True) learning_rate=learning_rate, momentum=0.2, use_nesterov=True)
params_grads = append_backward_ops(mul_out) params_grads = append_backward_ops(mul_out)
self.assertEqual(len(params_grads), 1) self.assertEqual(len(params_grads), 1)
self.assertEqual(len(momentum_optimizer.get_accumulators()), 0) self.assertEqual(len(momentum_optimizer.get_accumulators()), 0)
opts = momentum_optimizer.create_optimization_pass(params_grads, opts = momentum_optimizer.create_optimization_pass(
mul_out) params_grads, mul_out, init_program)
self.assertEqual(len(opts), 1) self.assertEqual(len(opts), 1)
sgd_op = opts[0] sgd_op = opts[0]
self.assertEqual(sgd_op.type, "momentum") self.assertEqual(sgd_op.type, "momentum")
...@@ -131,6 +153,14 @@ class TestMomentumOptimizer(unittest.TestCase): ...@@ -131,6 +153,14 @@ class TestMomentumOptimizer(unittest.TestCase):
self.assertEqual(len(velocity_acc), 1) self.assertEqual(len(velocity_acc), 1)
self.assertTrue(mul_x.name in velocity_acc) self.assertTrue(mul_x.name in velocity_acc)
# Check init_program
init_ops = init_program.global_block().ops
self.assertEqual(len(init_ops), 2)
self.assertEqual(init_ops[0].type, "fill_constant")
self.assertAlmostEqual(init_ops[0].attr('value'), learning_rate)
self.assertEqual(init_ops[1].type, "fill_constant")
self.assertAlmostEqual(init_ops[1].attr('value'), 0.0)
class TestAdagradOptimizer(unittest.TestCase): class TestAdagradOptimizer(unittest.TestCase):
class MockAdagrad(optimizer.AdagradOptimizer): class MockAdagrad(optimizer.AdagradOptimizer):
...@@ -141,6 +171,7 @@ class TestAdagradOptimizer(unittest.TestCase): ...@@ -141,6 +171,7 @@ class TestAdagradOptimizer(unittest.TestCase):
return self._moment_acc_str return self._moment_acc_str
def test_adagrad_optimizer(self): def test_adagrad_optimizer(self):
init_program = framework.Program()
program = framework.Program() program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
...@@ -155,11 +186,14 @@ class TestAdagradOptimizer(unittest.TestCase): ...@@ -155,11 +186,14 @@ class TestAdagradOptimizer(unittest.TestCase):
"Y": mul_y}, "Y": mul_y},
outputs={"Out": mul_out}, outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
adagrad_optimizer = self.MockAdagrad(learning_rate=0.01, epsilon=1.0e-6) learning_rate = 0.01
adagrad_optimizer = self.MockAdagrad(
learning_rate=learning_rate, epsilon=1.0e-6)
params_grads = append_backward_ops(mul_out) params_grads = append_backward_ops(mul_out)
self.assertEqual(len(params_grads), 1) self.assertEqual(len(params_grads), 1)
self.assertEqual(len(adagrad_optimizer.get_accumulators()), 0) self.assertEqual(len(adagrad_optimizer.get_accumulators()), 0)
opts = adagrad_optimizer.create_optimization_pass(params_grads, mul_out) opts = adagrad_optimizer.create_optimization_pass(params_grads, mul_out,
init_program)
self.assertEqual(len(opts), 1) self.assertEqual(len(opts), 1)
adagrad_op = opts[0] adagrad_op = opts[0]
self.assertEqual(adagrad_op.type, "adagrad") self.assertEqual(adagrad_op.type, "adagrad")
...@@ -172,6 +206,14 @@ class TestAdagradOptimizer(unittest.TestCase): ...@@ -172,6 +206,14 @@ class TestAdagradOptimizer(unittest.TestCase):
self.assertEqual(len(moment_acc), 1) self.assertEqual(len(moment_acc), 1)
self.assertTrue(mul_x.name in moment_acc) self.assertTrue(mul_x.name in moment_acc)
# Check init_program
init_ops = init_program.global_block().ops
self.assertEqual(len(init_ops), 2)
self.assertEqual(init_ops[0].type, "fill_constant")
self.assertAlmostEqual(init_ops[0].attr('value'), learning_rate)
self.assertEqual(init_ops[1].type, "fill_constant")
self.assertAlmostEqual(init_ops[1].attr('value'), 0.0)
class TestAdamOptimizer(unittest.TestCase): class TestAdamOptimizer(unittest.TestCase):
class MockAdam(optimizer.AdamOptimizer): class MockAdam(optimizer.AdamOptimizer):
...@@ -185,6 +227,7 @@ class TestAdamOptimizer(unittest.TestCase): ...@@ -185,6 +227,7 @@ class TestAdamOptimizer(unittest.TestCase):
return self._moment2_acc_str return self._moment2_acc_str
def test_adam_optimizer(self): def test_adam_optimizer(self):
init_program = framework.Program()
program = framework.Program() program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
...@@ -199,12 +242,14 @@ class TestAdamOptimizer(unittest.TestCase): ...@@ -199,12 +242,14 @@ class TestAdamOptimizer(unittest.TestCase):
"Y": mul_y}, "Y": mul_y},
outputs={"Out": mul_out}, outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
learning_rate = 0.01
adam_optimizer = self.MockAdam( adam_optimizer = self.MockAdam(
learning_rate=0.01, beta1=0.9, beta2=0.999) learning_rate=learning_rate, beta1=0.9, beta2=0.999)
params_grads = append_backward_ops(mul_out) params_grads = append_backward_ops(mul_out)
self.assertEqual(len(params_grads), 1) self.assertEqual(len(params_grads), 1)
self.assertEqual(len(adam_optimizer.get_accumulators()), 0) self.assertEqual(len(adam_optimizer.get_accumulators()), 0)
opts = adam_optimizer.create_optimization_pass(params_grads, mul_out) opts = adam_optimizer.create_optimization_pass(params_grads, mul_out,
init_program)
self.assertEqual(len(opts), 3) self.assertEqual(len(opts), 3)
adam_op = opts[0] adam_op = opts[0]
self.assertEqual(adam_op.type, "adam") self.assertEqual(adam_op.type, "adam")
...@@ -221,6 +266,12 @@ class TestAdamOptimizer(unittest.TestCase): ...@@ -221,6 +266,12 @@ class TestAdamOptimizer(unittest.TestCase):
self.assertTrue(mul_x.name in moment1_acc) self.assertTrue(mul_x.name in moment1_acc)
self.assertTrue(mul_x.name in moment2_acc) self.assertTrue(mul_x.name in moment2_acc)
# Check init_program
init_ops = init_program.global_block().ops
self.assertEqual(len(init_ops), 5)
self.assertEqual(init_ops[0].type, "fill_constant")
self.assertAlmostEqual(init_ops[0].attr('value'), learning_rate)
class TestAdamaxOptimizer(unittest.TestCase): class TestAdamaxOptimizer(unittest.TestCase):
class MockAdamax(optimizer.AdamaxOptimizer): class MockAdamax(optimizer.AdamaxOptimizer):
...@@ -234,6 +285,7 @@ class TestAdamaxOptimizer(unittest.TestCase): ...@@ -234,6 +285,7 @@ class TestAdamaxOptimizer(unittest.TestCase):
return self._inf_norm_acc_str return self._inf_norm_acc_str
def test_adamax_optimizer(self): def test_adamax_optimizer(self):
init_program = framework.Program()
program = framework.Program() program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
...@@ -248,12 +300,14 @@ class TestAdamaxOptimizer(unittest.TestCase): ...@@ -248,12 +300,14 @@ class TestAdamaxOptimizer(unittest.TestCase):
"Y": mul_y}, "Y": mul_y},
outputs={"Out": mul_out}, outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1}) attrs={"x_num_col_dims": 1})
learning_rate = 0.01
adamax_optimizer = self.MockAdamax( adamax_optimizer = self.MockAdamax(
learning_rate=0.01, beta1=0.9, beta2=0.999) learning_rate=learning_rate, beta1=0.9, beta2=0.999)
params_grads = append_backward_ops(mul_out) params_grads = append_backward_ops(mul_out)
self.assertEqual(len(params_grads), 1) self.assertEqual(len(params_grads), 1)
self.assertEqual(len(adamax_optimizer.get_accumulators()), 0) self.assertEqual(len(adamax_optimizer.get_accumulators()), 0)
opts = adamax_optimizer.create_optimization_pass(params_grads, mul_out) opts = adamax_optimizer.create_optimization_pass(params_grads, mul_out,
init_program)
self.assertEqual(len(opts), 2) self.assertEqual(len(opts), 2)
adam_op = opts[0] adam_op = opts[0]
self.assertEqual(adam_op.type, "adamax") self.assertEqual(adam_op.type, "adamax")
...@@ -270,6 +324,12 @@ class TestAdamaxOptimizer(unittest.TestCase): ...@@ -270,6 +324,12 @@ class TestAdamaxOptimizer(unittest.TestCase):
self.assertTrue(mul_x.name in moment_acc) self.assertTrue(mul_x.name in moment_acc)
self.assertTrue(mul_x.name in inf_norm_acc) self.assertTrue(mul_x.name in inf_norm_acc)
# Check init_program
init_ops = init_program.global_block().ops
self.assertEqual(len(init_ops), 4)
self.assertEqual(init_ops[0].type, "fill_constant")
self.assertAlmostEqual(init_ops[0].attr('value'), learning_rate)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
import unittest
import numpy as np
from op_test import OpTest
def calc_precision(tp_count, fp_count):
if tp_count > 0.0 or fp_count > 0.0:
return tp_count / (tp_count + fp_count)
return 1.0
def calc_recall(tp_count, fn_count):
if tp_count > 0.0 or fn_count > 0.0:
return tp_count / (tp_count + fn_count)
return 1.0
def calc_f1_score(precision, recall):
if precision > 0.0 or recall > 0.0:
return 2 * precision * recall / (precision + recall)
return 0.0
def get_states(idxs, labels, cls_num, weights=None):
ins_num = idxs.shape[0]
# TP FP TN FN
states = np.zeros((cls_num, 4)).astype('float32')
for i in xrange(ins_num):
w = weights[i] if weights is not None else 1.0
idx = idxs[i][0]
label = labels[i][0]
if idx == label:
states[idx][0] += w
for j in xrange(cls_num):
states[j][2] += w
states[idx][2] -= w
else:
states[label][3] += w
states[idx][1] += w
for j in xrange(cls_num):
states[j][2] += w
states[label][2] -= w
states[idx][2] -= w
return states
def compute_metrics(states, cls_num):
total_tp_count = 0.0
total_fp_count = 0.0
total_fn_count = 0.0
macro_avg_precision = 0.0
macro_avg_recall = 0.0
for i in xrange(cls_num):
total_tp_count += states[i][0]
total_fp_count += states[i][1]
total_fn_count += states[i][3]
macro_avg_precision += calc_precision(states[i][0], states[i][1])
macro_avg_recall += calc_recall(states[i][0], states[i][3])
metrics = []
macro_avg_precision /= cls_num
macro_avg_recall /= cls_num
metrics.append(macro_avg_precision)
metrics.append(macro_avg_recall)
metrics.append(calc_f1_score(macro_avg_precision, macro_avg_recall))
micro_avg_precision = calc_precision(total_tp_count, total_fp_count)
metrics.append(micro_avg_precision)
micro_avg_recall = calc_recall(total_tp_count, total_fn_count)
metrics.append(micro_avg_recall)
metrics.append(calc_f1_score(micro_avg_precision, micro_avg_recall))
return np.array(metrics).astype('float32')
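`compute_metrics` reports both macro and micro averages, and the difference is easiest to see on a tiny example. The following sketch uses made-up counts for two classes and a hypothetical `safe_ratio` helper that plays the role of `calc_precision`; it is an illustration only, not part of the test file.

```python
import numpy as np


def safe_ratio(numerator, denominator, default):
    """Divide, falling back to `default` when the denominator is zero."""
    return numerator / denominator if denominator > 0.0 else default


# Columns are TP, FP, TN, FN, as in get_states; the counts are invented.
states = np.array([[8.0, 2.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0, 3.0]])
tp, fp = states[:, 0], states[:, 1]

# Macro average: compute precision per class, then average the ratios.
per_class_precision = [safe_ratio(t, t + f, 1.0) for t, f in zip(tp, fp)]
macro_precision = float(np.mean(per_class_precision))

# Micro average: pool the counts over all classes, then take one ratio.
micro_precision = float(safe_ratio(tp.sum(), tp.sum() + fp.sum(), 1.0))

print(macro_precision, micro_precision)
```

The macro average weights every class equally, while the micro average is dominated by the class with more predictions, so the two values differ whenever the classes are imbalanced.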
class TestPrecisionRecallOp_0(OpTest):
def setUp(self):
self.op_type = "precision_recall"
ins_num = 64
cls_num = 10
max_probs = np.random.uniform(0, 1.0, (ins_num, 1)).astype('float32')
idxs = np.random.choice(xrange(cls_num), ins_num).reshape(
(ins_num, 1)).astype('int32')
labels = np.random.choice(xrange(cls_num), ins_num).reshape(
(ins_num, 1)).astype('int32')
states = get_states(idxs, labels, cls_num)
metrics = compute_metrics(states, cls_num)
self.attrs = {'class_number': cls_num}
self.inputs = {'MaxProbs': max_probs, 'Indices': idxs, 'Labels': labels}
self.outputs = {
'BatchMetrics': metrics,
'AccumMetrics': metrics,
'AccumStatesInfo': states
}
def test_check_output(self):
self.check_output()
class TestPrecisionRecallOp_1(OpTest):
def setUp(self):
self.op_type = "precision_recall"
ins_num = 64
cls_num = 10
max_probs = np.random.uniform(0, 1.0, (ins_num, 1)).astype('float32')
idxs = np.random.choice(xrange(cls_num), ins_num).reshape(
(ins_num, 1)).astype('int32')
weights = np.random.uniform(0, 1.0, (ins_num, 1)).astype('float32')
labels = np.random.choice(xrange(cls_num), ins_num).reshape(
(ins_num, 1)).astype('int32')
states = get_states(idxs, labels, cls_num, weights)
metrics = compute_metrics(states, cls_num)
self.attrs = {'class_number': cls_num}
self.inputs = {
'MaxProbs': max_probs,
'Indices': idxs,
'Labels': labels,
'Weights': weights
}
self.outputs = {
'BatchMetrics': metrics,
'AccumMetrics': metrics,
'AccumStatesInfo': states
}
def test_check_output(self):
self.check_output()
class TestPrecisionRecallOp_2(OpTest):
def setUp(self):
self.op_type = "precision_recall"
ins_num = 64
cls_num = 10
max_probs = np.random.uniform(0, 1.0, (ins_num, 1)).astype('float32')
idxs = np.random.choice(xrange(cls_num), ins_num).reshape(
(ins_num, 1)).astype('int32')
weights = np.random.uniform(0, 1.0, (ins_num, 1)).astype('float32')
labels = np.random.choice(xrange(cls_num), ins_num).reshape(
(ins_num, 1)).astype('int32')
states = np.random.randint(0, 30, (cls_num, 4)).astype('float32')
accum_states = get_states(idxs, labels, cls_num, weights)
batch_metrics = compute_metrics(accum_states, cls_num)
accum_states += states
accum_metrics = compute_metrics(accum_states, cls_num)
self.attrs = {'class_number': cls_num}
self.inputs = {
'MaxProbs': max_probs,
'Indices': idxs,
'Labels': labels,
'Weights': weights,
'StatesInfo': states
}
self.outputs = {
'BatchMetrics': batch_metrics,
'AccumMetrics': accum_metrics,
'AccumStatesInfo': accum_states
}
def test_check_output(self):
self.check_output()
if __name__ == '__main__':
unittest.main()
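`TestPrecisionRecallOp_2` checks that states from earlier batches are added to the states of the current batch. A minimal sketch of that accumulation, written for Python 3 without weights, is shown below; `count_states` is a hypothetical re-implementation of the counting rule in `get_states`, and the predictions and labels are invented.

```python
import numpy as np


def count_states(idxs, labels, cls_num):
    """Return a (cls_num, 4) array of TP, FP, TN, FN counts (unweighted)."""
    states = np.zeros((cls_num, 4), dtype="float32")
    for idx, label in zip(idxs, labels):
        states[:, 2] += 1.0            # count a true negative for every class
        if idx == label:
            states[idx, 0] += 1.0      # true positive for the predicted class
            states[idx, 2] -= 1.0
        else:
            states[idx, 1] += 1.0      # false positive for the predicted class
            states[label, 3] += 1.0    # false negative for the true class
            states[idx, 2] -= 1.0
            states[label, 2] -= 1.0
    return states


batch_1 = count_states([0, 1, 1], [0, 1, 0], cls_num=2)
batch_2 = count_states([1, 0], [1, 1], cls_num=2)
accum = batch_1 + batch_2  # accumulated states are a plain element-wise sum
print(accum)
```

Each row of a states matrix sums to the number of instances seen, which gives a quick sanity check after accumulation.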
...@@ -54,8 +54,10 @@ avg_cost = layers.mean(x=cost, program=program) ...@@ -54,8 +54,10 @@ avg_cost = layers.mean(x=cost, program=program)
accuracy = layers.accuracy( accuracy = layers.accuracy(
input=predict, label=label, program=program, init_program=init_program) input=predict, label=label, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) # optimizer = optimizer.MomentumOptimizer(learning_rate=0.1 / 128.0,
opts = sgd_optimizer.minimize(avg_cost) # momentum=0.9)
optimizer = optimizer.AdamOptimizer(learning_rate=0.01, beta1=0.9, beta2=0.999)
opts = optimizer.minimize(avg_cost, init_program)
BATCH_SIZE = 50 BATCH_SIZE = 50
PASS_NUM = 3 PASS_NUM = 3
......
...@@ -58,8 +58,8 @@ cost = layers.cross_entropy( ...@@ -58,8 +58,8 @@ cost = layers.cross_entropy(
input=predict, label=label, program=program, init_program=init_program) input=predict, label=label, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program, init_program=init_program) avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) optimizer = optimizer.MomentumOptimizer(learning_rate=0.001, momentum=0.9)
opts = sgd_optimizer.minimize(avg_cost) opts = optimizer.minimize(avg_cost, init_program)
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.reader.shuffle( paddle.reader.shuffle(
...@@ -89,6 +89,7 @@ for pass_id in range(PASS_NUM): ...@@ -89,6 +89,7 @@ for pass_id in range(PASS_NUM):
'y': tensor_y}, 'y': tensor_y},
fetch_list=[avg_cost]) fetch_list=[avg_cost])
out = np.array(outs[0]) out = np.array(outs[0])
if out[0] < 5.0: if out[0] < 5.0:
exit(0) # if avg cost less than 5.0, we think our code is good. exit(0) # if avg cost less than 5.0, we think our code is good.
exit(1) exit(1)
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.nets as nets
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program
from paddle.v2.framework.executor import Executor
import numpy as np
init_program = Program()
program = Program()
is_sparse = True
use_gpu = False
BATCH_SIZE = 256
def get_usr_combined_features():
# FIXME(dzh): the old API integer_value(10) may have a range check.
# Currently we don't have a user-configured check.
USR_DICT_SIZE = paddle.dataset.movielens.max_user_id() + 1
uid = layers.data(
name='user_id',
shape=[1],
data_type='int64',
program=program,
init_program=init_program)
usr_emb = layers.embedding(
input=uid,
data_type='float32',
size=[USR_DICT_SIZE, 32],
param_attr={'name': 'user_table'},
is_sparse=is_sparse,
program=program,
init_program=init_program)
usr_fc = layers.fc(input=usr_emb,
size=32,
program=program,
init_program=init_program)
USR_GENDER_DICT_SIZE = 2
usr_gender_id = layers.data(
name='gender_id',
shape=[1],
data_type='int64',
program=program,
init_program=init_program)
usr_gender_emb = layers.embedding(
input=usr_gender_id,
size=[USR_GENDER_DICT_SIZE, 16],
param_attr={'name': 'gender_table'},
is_sparse=is_sparse,
program=program,
init_program=init_program)
usr_gender_fc = layers.fc(input=usr_gender_emb,
size=16,
program=program,
init_program=init_program)
USR_AGE_DICT_SIZE = len(paddle.dataset.movielens.age_table)
usr_age_id = layers.data(
name='age_id',
shape=[1],
data_type="int64",
program=program,
init_program=init_program)
usr_age_emb = layers.embedding(
input=usr_age_id,
size=[USR_AGE_DICT_SIZE, 16],
is_sparse=is_sparse,
param_attr={'name': 'age_table'},
program=program,
init_program=init_program)
usr_age_fc = layers.fc(input=usr_age_emb,
size=16,
program=program,
init_program=init_program)
USR_JOB_DICT_SIZE = paddle.dataset.movielens.max_job_id() + 1
usr_job_id = layers.data(
name='job_id',
shape=[1],
data_type="int64",
program=program,
init_program=init_program)
usr_job_emb = layers.embedding(
input=usr_job_id,
size=[USR_JOB_DICT_SIZE, 16],
param_attr={'name': 'job_table'},
is_sparse=is_sparse,
program=program,
init_program=init_program)
usr_job_fc = layers.fc(input=usr_job_emb,
size=16,
program=program,
init_program=init_program)
concat_embed = layers.concat(
input=[usr_fc, usr_gender_fc, usr_age_fc, usr_job_fc],
axis=1,
program=program,
init_program=init_program)
usr_combined_features = layers.fc(input=concat_embed,
size=200,
act="tanh",
program=program,
init_program=init_program)
return usr_combined_features
def get_mov_combined_features():
MOV_DICT_SIZE = paddle.dataset.movielens.max_movie_id() + 1
mov_id = layers.data(
name='movie_id',
shape=[1],
data_type='int64',
program=program,
init_program=init_program)
mov_emb = layers.embedding(
input=mov_id,
data_type='float32',
size=[MOV_DICT_SIZE, 32],
param_attr={'name': 'movie_table'},
is_sparse=is_sparse,
program=program,
init_program=init_program)
mov_fc = layers.fc(input=mov_emb,
size=32,
program=program,
init_program=init_program)
CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())
category_id = layers.data(
name='category_id',
shape=[1],
data_type='int64',
program=program,
init_program=init_program)
mov_categories_emb = layers.embedding(
input=category_id,
size=[CATEGORY_DICT_SIZE, 32],
is_sparse=is_sparse,
program=program,
init_program=init_program)
mov_categories_hidden = layers.sequence_pool(
input=mov_categories_emb,
pool_type="sum",
program=program,
init_program=init_program)
MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())
mov_title_id = layers.data(
name='movie_title',
shape=[1],
data_type='int64',
program=program,
init_program=init_program)
mov_title_emb = layers.embedding(
input=mov_title_id,
size=[MOV_TITLE_DICT_SIZE, 32],
is_sparse=is_sparse,
program=program,
init_program=init_program)
mov_title_conv = nets.sequence_conv_pool(
input=mov_title_emb,
num_filters=32,
filter_size=3,
act="tanh",
pool_type="sum",
program=program,
init_program=init_program)
concat_embed = layers.concat(
input=[mov_fc, mov_categories_hidden, mov_title_conv],
axis=1,
program=program,
init_program=init_program)
# FIXME(dzh) : need tanh operator
mov_combined_features = layers.fc(input=concat_embed,
size=200,
act="tanh",
program=program,
init_program=init_program)
return mov_combined_features
def model():
usr_combined_features = get_usr_combined_features()
mov_combined_features = get_mov_combined_features()
# need cos sim
inference = layers.cos_sim(
X=usr_combined_features,
Y=mov_combined_features,
program=program,
init_program=init_program)
label = layers.data(
name='score',
shape=[1],
data_type='float32',
program=program,
init_program=init_program)
square_cost = layers.square_error_cost(
input=inference,
label=label,
program=program,
init_program=init_program)
avg_cost = layers.mean(
x=square_cost, program=program, init_program=init_program)
return avg_cost
def main():
cost = model()
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.2)
opts = sgd_optimizer.minimize(cost, init_program=init_program)
block = program.block(0)
if use_gpu:
place = core.GPUPlace(0)
else:
place = core.CPUPlace()
exe = Executor(place)
exe.run(init_program, feed={}, fetch_list=[])
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.movielens.train(), buf_size=8192),
batch_size=BATCH_SIZE)
feeding = {
'user_id': 0,
'gender_id': 1,
'age_id': 2,
'job_id': 3,
'movie_id': 4,
'category_id': 5,
'movie_title': 6,
'score': 7
}
def func_feed(feeding, data):
feed_tensors = {}
for (key, idx) in feeding.iteritems():
tensor = core.LoDTensor()
if key != "category_id" and key != "movie_title":
if key == "score":
numpy_data = np.array(map(lambda x: x[idx], data)).astype(
"float32")
else:
numpy_data = np.array(map(lambda x: x[idx], data)).astype(
"int64")
else:
numpy_data = map(lambda x: np.array(x[idx]).astype("int64"),
data)
lod_info = [len(item) for item in numpy_data]
offset = 0
lod = [offset]
for item in lod_info:
offset += item
lod.append(offset)
numpy_data = np.concatenate(numpy_data, axis=0)
tensor.set_lod([lod])
numpy_data = numpy_data.reshape([numpy_data.shape[0], 1])
tensor.set(numpy_data, place)
feed_tensors[key] = tensor
return feed_tensors
PASS_NUM = 100
for pass_id in range(PASS_NUM):
for data in train_reader():
outs = exe.run(program,
feed=func_feed(feeding, data),
fetch_list=[cost])
out = np.array(outs[0])
if out[0] < 6.0:
# If the average cost drops below 6.0, treat the run as successful.
exit(0)
main()
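The `func_feed` helper above turns variable-length fields (`category_id`, `movie_title`) into one flat column plus a list of cumulative offsets that is passed to `set_lod`. The following is a minimal sketch of that offset computation in plain Python, without Paddle or NumPy. The names `lengths_to_lod` and `flatten_to_column` are hypothetical and used only for illustration.

```python
def lengths_to_lod(sequences):
    """Return cumulative offsets [0, len(s0), len(s0) + len(s1), ...]."""
    offsets = [0]
    for seq in sequences:
        offsets.append(offsets[-1] + len(seq))
    return offsets


def flatten_to_column(sequences):
    """Concatenate all sequences and give every item its own row,
    mirroring the reshape to [N, 1] in func_feed."""
    return [[item] for seq in sequences for item in seq]


# Three samples with 2, 1 and 3 ids respectively (made-up values).
batch = [[3, 7], [5], [1, 2, 4]]
lod = lengths_to_lod(batch)
column = flatten_to_column(batch)
print(lod)
print(len(column))
```

Sample `i` then occupies rows `lod[i]` up to, but not including, `lod[i + 1]` of the flattened column.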
import logging
import paddle.v2.framework.core as core
import unittest import unittest
import numpy as np
from paddle.v2.framework.op import Operator, RecurrentOp
from op_test import get_numeric_gradient
def py_sigmoid(x): import logging
return 1. / (1. + np.exp(-x))
from op_test import get_numeric_gradient
from paddle.v2.framework.layers import *
from paddle.v2.framework.framework import Program
from paddle.v2.framework.executor import Executor
from paddle.v2.framework.backward import append_backward_ops
import numpy as np
import paddle.v2.framework.core as core
class PySimpleRNN(object):
'''
A simple implementation of RNN based on numpy, to further test RecurrentOp's algorithm
'''
def __init__(self, input_dim=30, batch_size=50, weight_dim=15, sent_len=11): class PyRNNBase(object):
self.x = np.random.normal(size=(sent_len, batch_size, def __init__(self, input_shape, output_shape):
input_dim)).astype("float32") self.x = np.ones(shape=input_shape).astype("float32")
self.W = np.random.normal(size=(input_dim, input_dim)).astype("float32") self.y = np.zeros(shape=output_shape).astype("float32")
self.U = np.random.normal(size=(input_dim, input_dim)).astype("float32")
self.h_boot = np.random.normal(size=(batch_size,
input_dim)).astype("float32")
# memories def step(self):
self.mems = [ pass
np.zeros(shape=(batch_size, input_dim)).astype("float32")
for i in range(sent_len)
]
def forward(self): def forward(self):
xs = self.segment_inputs()
for step_id in range(self.x.shape[0]): for step_id in range(self.x.shape[0]):
self.step(step_id, xs[step_id]) self.step(step_id, self.x[step_id])
return self.concat_outputs() return np.array([np.mean(self.y)])
def segment_inputs(self): def segment_inputs(self):
return [self.x[i] for i in range(self.x.shape[0])] return [self.x[i] for i in range(self.x.shape[0])]
def concat_outputs(self):
return np.array(self.mems).astype("float32") class PySimpleRNN1(PyRNNBase):
def __init__(self, input_shape, output_shape):
super(PySimpleRNN1, self).__init__(input_shape, output_shape)
seq_len, batch_size, input_dim = input_shape
self.h_boot = np.random.normal(size=(batch_size,
input_dim)).astype("float32")
self.scale = 1.0 / 2.0
men_dim = (seq_len, batch_size, input_dim)
self.mems = np.zeros(shape=men_dim).astype("float32")
def step(self, step_id, x):
if step_id == 0:
pre_mem = self.h_boot
else:
pre_mem = self.mems[step_id - 1]
self.mems[step_id] = (pre_mem + x) * self.scale
self.y[step_id] = self.mems[step_id]
class PySimpleRNN2(PyRNNBase):
def __init__(self, input_shape, output_shape):
super(PySimpleRNN2, self).__init__(input_shape, output_shape)
seq_len, batch_size, input_dim = input_shape
self.W = np.random.normal(size=(input_dim, input_dim)).astype("float32")
self.U = np.random.normal(size=(input_dim, input_dim)).astype("float32")
self.h_boot = np.ones(shape=(batch_size, input_dim)).astype("float32")
men_dim = (seq_len, batch_size, input_dim)
self.mems = np.zeros(shape=men_dim).astype("float32")
def step(self, step_id, x): def step(self, step_id, x):
'''
run a step
'''
mem = self.mems[step_id]
if step_id > 0: if step_id > 0:
pre_mem = self.mems[step_id - 1] pre_mem = self.mems[step_id - 1]
else: else:
...@@ -53,108 +69,124 @@ class PySimpleRNN(object): ...@@ -53,108 +69,124 @@ class PySimpleRNN(object):
xW = np.matmul(x, self.W).astype("float32") xW = np.matmul(x, self.W).astype("float32")
hU = np.matmul(pre_mem, self.U).astype("float32") hU = np.matmul(pre_mem, self.U).astype("float32")
sum = xW + hU def py_sigmoid(x):
self.mems[step_id] = py_sigmoid(sum) return 1. / (1. + np.exp(-x))
class PySimpleRNNTest(unittest.TestCase): self.mems[step_id] = py_sigmoid(xW + hU)
def setUp(self): self.y[step_id] = self.mems[step_id]
self.rnn = PySimpleRNN()
def test_forward(self):
output = self.rnn.forward()
def create_tensor(scope, name, shape, np_data): def create_tensor(np_data, place):
tensor = scope.var(name).get_tensor() tensor = core.LoDTensor()
tensor.set_dims(shape) tensor.set(np_data, place)
tensor.set(np_data, core.CPUPlace())
return tensor return tensor
class RecurrentOpTest(unittest.TestCase): class RecurrentOpTest1(unittest.TestCase):
''' '''
Test RNNOp Test RNNOp
equation: equation:
h_t = \sigma (W x_t + U h_{t-1}) h_t = ( x_t + h_{t-1} ) * scale
weights:
- W
- U
vars: vars:
- x - x
memories: memories:
- h - h
outputs: outputs:
- h - h
''' '''
input_dim = 30 input_dim = 2
batch_size = 50 batch_size = 1
weight_dim = 15 sent_len = 1
sent_len = 11
def init_program(self):
self.program = Program()
self.init_program = Program()
self.p_info = {
"program": self.program,
"init_program": self.init_program
}
self.place = core.CPUPlace()
def setUp(self): def setUp(self):
self.py_rnn = PySimpleRNN(self.input_dim, self.batch_size, self.init_program()
self.weight_dim, self.sent_len) self.data_field = {"x", "h_boot"}
def forward(self): self.input_shape = (self.sent_len, self.batch_size, self.input_dim)
self.scope = core.Scope() self.output_shape = (self.sent_len, self.batch_size, self.input_dim)
self.create_global_variables() self.py_rnn = PySimpleRNN1(self.input_shape, self.output_shape)
self.create_rnn_op()
self.create_step_net() self.output = mean(x=self.create_rnn_op(), **self.p_info)
ctx = core.DeviceContext.create(core.CPUPlace())
self.rnnop.run(self.scope, ctx)
return np.array(self.scope.find_var("h@mem").get_tensor()).astype(
"float32")
def create_global_variables(self):
# create inlink
x_np_data = self.py_rnn.x
create_tensor(self.scope, "x",
[self.sent_len, self.batch_size, self.input_dim],
x_np_data)
W_np_data = self.py_rnn.W
create_tensor(self.scope, "W", [self.input_dim, self.input_dim],
W_np_data)
U_np_data = self.py_rnn.U
create_tensor(self.scope, "U", [self.input_dim, self.input_dim],
U_np_data)
h_boot_np_data = self.py_rnn.h_boot
create_tensor(self.scope, "h_boot", [self.batch_size, self.input_dim],
h_boot_np_data)
self.scope.var("step_scopes")
self.scope.var("h@mem")
def create_rnn_op(self): def create_rnn_op(self):
# create RNNOp x = data(
self.rnnop = RecurrentOp( shape=[self.sent_len, self.batch_size, self.input_dim],
# inputs data_type='float32',
inputs=["x"], name='x',
initial_states=["h_boot"], append_batch_size=False,
step_net="stepnet", **self.p_info)
# outputs h_boot = data(
outputs=["h@mem"], shape=[self.input_dim],
step_scopes="step_scopes", data_type='float32',
# attributes name='h_boot',
ex_states=["h@pre"], **self.p_info)
states=["h@mem"])
rnn = StaticRNN(program=self.program)
def create_step_net(self): with rnn.step():
stepnet = core.Net.create() h_pre = rnn.memory(init=h_boot)
x_fc_op = Operator("mul", X="x", Y="W", Out="Wx") x_t = rnn.step_input(x)
h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh")
sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum") h = scale(
sig_op = Operator("sigmoid", X="sum", Y="h@mem") x=elementwise_add(
x=h_pre, y=x_t, **self.p_info),
for op in [x_fc_op, h_fc_op, sum_op, sig_op]: scale=self.py_rnn.scale,
stepnet.append_op(op) **self.p_info)
stepnet.complete_add_op(True)
self.rnnop.set_stepnet(stepnet) rnn.update_memory(h_pre, h)
rnn.output(h)
def test_forward(self):
return rnn()
def forward(self):
self.feed_map = {
x: create_tensor(getattr(self.py_rnn, x), self.place)
for x in self.data_field
}
exe = Executor(self.place)
out = exe.run(self.program,
feed=self.feed_map,
fetch_list=[self.output])
return np.array(out[0])
def backward(self):
self.feed_map = {
x: create_tensor(getattr(self.py_rnn, x), self.place)
for x in self.data_field
}
fetch_list = [
self.program.global_block().var(x + "@GRAD")
for x in self.data_field
]
exe = Executor(self.place)
return exe.run(self.program, feed=self.feed_map, fetch_list=fetch_list)
def test_backward(self):
self.check_forward()
append_backward_ops(self.output)
ana_grad = [np.array(x) for x in self.backward()]
num_grad = self.get_numerical_gradient()
for idx, name in enumerate(self.data_field):
self.assertEqual(num_grad[idx].shape, ana_grad[idx].shape)
self.assertTrue(
np.isclose(
num_grad[idx], ana_grad[idx], rtol=0.1).all())
def check_forward(self):
print 'test recurrent op forward' print 'test recurrent op forward'
pd_output = self.forward() pd_output = self.forward()
py_output = self.py_rnn.forward() py_output = self.py_rnn.forward()
...@@ -164,44 +196,190 @@ class RecurrentOpTest(unittest.TestCase): ...@@ -164,44 +196,190 @@ class RecurrentOpTest(unittest.TestCase):
self.assertEqual(pd_output.shape, py_output.shape) self.assertEqual(pd_output.shape, py_output.shape)
self.assertTrue(np.isclose(pd_output, py_output, rtol=0.1).all()) self.assertTrue(np.isclose(pd_output, py_output, rtol=0.1).all())
def get_numerical_gradient(self, delta=0.005):
dloss_dout = 1.0
feed_list = [getattr(self.py_rnn, x) for x in self.data_field]
grad_list = [np.zeros_like(x) for x in feed_list]
for feed, grad in zip(feed_list, grad_list):
for f, g in np.nditer([feed, grad], op_flags=['readwrite']):
o = float(f)
f[...] = o + delta
y_pos = self.forward()
class RecurrentGradientOpTest(unittest.TestCase): f[...] = o - delta
def create_forward_op(self): y_neg = self.forward()
self.forward_op = RecurrentOp(
# inputs f[...] = o
inputs=["x"], dout_dfeed = (y_pos - y_neg) / (delta * 2)
initial_states=["h_boot"], g[...] = dout_dfeed[0]
step_net="stepnet",
# outputs return grad_list
outputs=["h"],
step_scopes="step_scopes",
# attributes class RecurrentOpTest2(RecurrentOpTest1):
ex_states=["h@pre"], '''
states=["h@alias"]) Test RNNOp
equation:
# create a stepnet for RNN h_t = \sigma (W x_t + U h_{t-1})
stepnet = core.Net.create() weights:
x_fc_op = Operator("mul", X="x@alias", Y="W", Out="Wx") - W
h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh") - U
sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum") vars:
sig_op = Operator("sigmoid", X="sum", Y="h@alias") - x
memories:
for op in [x_fc_op, h_fc_op, sum_op, sig_op]: - h
stepnet.append_op(op) outputs:
stepnet.complete_add_op(True) - h
self.forward_op.set_stepnet(stepnet) '''
def create_gradient_op(self): input_dim = 2
a = set() batch_size = 10
backward_op = core.RecurrentOp.backward(self.forward_op, a) sent_len = 2
def test_grad(self): def setUp(self):
self.create_forward_op() self.init_program()
self.create_gradient_op()
self.data_field = {"x", "h_boot", "W", "U"}
self.input_shape = (self.sent_len, self.batch_size, self.input_dim)
self.output_shape = (self.sent_len, self.batch_size, self.input_dim)
self.py_rnn = PySimpleRNN2(self.input_shape, self.output_shape)
self.output = mean(x=self.create_rnn_op(), **self.p_info)
def create_rnn_op(self):
x = data(
shape=[self.sent_len, self.batch_size, self.input_dim],
data_type='float32',
name='x',
append_batch_size=False,
**self.p_info)
h_boot = data(
shape=[self.input_dim],
data_type='float32',
name='h_boot',
**self.p_info)
rnn = StaticRNN(program=self.program)
with rnn.step():
h_pre = rnn.memory(init=h_boot)
x_t = rnn.step_input(x)
temp_l = fc(input=x_t,
size=self.input_dim,
param_attr={'name': 'W'},
bias_attr=False,
**self.p_info)
temp_r = fc(input=h_pre,
size=self.input_dim,
param_attr={'name': 'U'},
bias_attr=False,
**self.p_info)
h = sigmoid(
x=elementwise_add(
x=temp_l, y=temp_r, **self.p_info),
**self.p_info)
rnn.update_memory(h_pre, h)
rnn.output(h)
return rnn()
class RecurrentOpTest3(RecurrentOpTest1):
'''
Test RNNOp with two memories
equation:
h_1 = h_pre_1
h_2 = h_pre_2
y = h_1 + h_2
vars:
- x
memories:
- h_1, h_2
outputs:
- y
'''
class PySimpleRNN3(PyRNNBase):
def __init__(self, input_shape, output_shape):
super(RecurrentOpTest3.PySimpleRNN3, self).__init__(input_shape,
output_shape)
seq_len, batch_size, input_dim = input_shape
self.h_boot1 = np.random.normal(size=(batch_size,
input_dim)).astype("float32")
self.h_boot2 = np.random.normal(size=(batch_size,
input_dim)).astype("float32")
men_dim = (seq_len, batch_size, input_dim)
self.mems1 = np.zeros(shape=men_dim).astype("float32")
self.mems2 = np.zeros(shape=men_dim).astype("float32")
def step(self, step_id, x):
if step_id == 0:
pre_mem1 = self.h_boot1
pre_mem2 = self.h_boot2
else:
pre_mem1 = self.mems1[step_id - 1]
pre_mem2 = self.mems2[step_id - 1]
self.mems1[step_id] = pre_mem1
self.mems2[step_id] = pre_mem2
self.y[step_id] = self.mems1[step_id] + self.mems2[step_id] + x
input_dim = 1
batch_size = 1
sent_len = 2
def setUp(self):
self.init_program()
self.data_field = {"x", "h_boot1", "h_boot2"}
self.input_shape = (self.sent_len, self.batch_size, self.input_dim)
self.output_shape = (self.sent_len, self.batch_size, self.input_dim)
self.py_rnn = RecurrentOpTest3.PySimpleRNN3(self.input_shape,
self.output_shape)
self.output = mean(x=self.create_rnn_op(), **self.p_info)
def create_rnn_op(self):
x = data(
shape=[self.sent_len, self.batch_size, self.input_dim],
data_type='float32',
name='x',
append_batch_size=False,
**self.p_info)
h_boot1 = data(
shape=[self.batch_size, self.input_dim],
data_type='float32',
name='h_boot1',
append_batch_size=False,
**self.p_info)
h_boot2 = data(
shape=[self.batch_size, self.input_dim],
data_type='float32',
name='h_boot2',
append_batch_size=False,
**self.p_info)
rnn = StaticRNN(program=self.program)
with rnn.step():
h_pre1 = rnn.memory(init=h_boot1)
h_pre2 = rnn.memory(init=h_boot2)
x_t = rnn.step_input(x)
mem1 = scale(x=h_pre1, scale=1.0, **self.p_info)
mem2 = scale(x=h_pre2, scale=1.0, **self.p_info)
out = sums(input=[mem1, x_t, mem2], **self.p_info)
rnn.update_memory(h_pre1, mem1)
rnn.update_memory(h_pre2, mem2)
rnn.output(out)
return rnn()
if __name__ == '__main__': if __name__ == '__main__':
exit(
0
) # FIXME(qijun): https://github.com/PaddlePaddle/Paddle/issues/5101#issuecomment-339814957
unittest.main() unittest.main()
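`get_numerical_gradient` in the test above checks the analytic gradients with a central difference: each input element is nudged by `+delta` and `-delta`, the forward pass is rerun, and the slope is `(y_pos - y_neg) / (2 * delta)`. Below is a minimal sketch of the same check in plain Python, applied to the `PySimpleRNN1` recurrence `h_t = (h_{t-1} + x_t) * scale` with the mean of all outputs as the loss. The function names are hypothetical, and the sketch uses lists instead of NumPy arrays and a single batch row.

```python
def rnn_mean_output(x_steps, h_boot, scale=0.5):
    """Mean over all outputs of h_t = (h_{t-1} + x_t) * scale."""
    h_prev = list(h_boot)
    outputs = []
    for x_t in x_steps:
        h_prev = [(h + x) * scale for h, x in zip(h_prev, x_t)]
        outputs.extend(h_prev)
    return sum(outputs) / len(outputs)


def numerical_gradient(loss_fn, values, delta=0.005):
    """Central-difference gradient of loss_fn with respect to each entry
    of values. Entries are perturbed in place and then restored."""
    grads = []
    for i in range(len(values)):
        original = values[i]
        values[i] = original + delta
        loss_pos = loss_fn(values)
        values[i] = original - delta
        loss_neg = loss_fn(values)
        values[i] = original
        grads.append((loss_pos - loss_neg) / (2.0 * delta))
    return grads


# Same sizes as RecurrentOpTest1: sent_len=1, batch_size=1, input_dim=2.
x_flat = [1.0, 1.0]
h_boot = [0.3, -0.7]  # made-up initial state

grad_h_boot = numerical_gradient(
    lambda h: rnn_mean_output([x_flat], h), h_boot)
grad_x = numerical_gradient(
    lambda x: rnn_mean_output([x], h_boot), x_flat)
print(grad_h_boot, grad_x)
```

With one step and two output elements, the loss is linear in every input, so each gradient entry should be close to `scale / 2 = 0.25`.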
import unittest
from paddle.v2.framework.layers import *
from paddle.v2.framework.framework import g_program
class TestRNN(unittest.TestCase):
def test_rnn(self):
img = data(
shape=[
80, # sequence length
22, # image height
22
], # image width
data_type='float32',
name='image')
hidden = fc(input=img, size=100, act='sigmoid', num_flatten_dims=2)
self.assertEqual((-1, 80, 100), hidden.shape)
hidden = fc(input=hidden, size=100, act='sigmoid', num_flatten_dims=2)
self.assertEqual((-1, 80, 100), hidden.shape)
rnn = StaticRNN()
with rnn.step():
hidden = rnn.step_input(hidden)
self.assertEqual((-1, 100), hidden.shape)
memory = rnn.memory(shape=(-1, 32), dtype='float32', init_value=0.0)
rnn_out = fc(input=[hidden, memory], size=32, act='sigmoid')
self.assertEqual((-1, 32), rnn_out.shape)
rnn.update_memory(memory, rnn_out)
rnn.output(rnn_out)
out = rnn()
self.assertEqual((-1, 80, 32), out.shape)
print g_program
if __name__ == '__main__':
unittest.main()
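The assertions in `test_rnn` above pin down how shapes change around a `StaticRNN` block: inside `rnn.step()` the sequence axis disappears, and calling `rnn()` puts it back on the step output. A minimal sketch of that bookkeeping follows. The two helpers are hypothetical and only mirror the shapes asserted in this test (sequence axis at index 1); they are not taken from the `StaticRNN` implementation.

```python
def step_input_shape(seq_shape):
    """(batch, seq_len, d1, ...) -> (batch, d1, ...): drop the sequence axis."""
    return (seq_shape[0],) + tuple(seq_shape[2:])


def rnn_output_shape(step_shape, seq_len):
    """(batch, d1, ...) -> (batch, seq_len, d1, ...): restore the sequence axis."""
    return (step_shape[0], seq_len) + tuple(step_shape[1:])


hidden_shape = (-1, 80, 100)   # output of the second fc layer
step_shape = step_input_shape(hidden_shape)
out_shape = rnn_output_shape((-1, 32), seq_len=hidden_shape[1])
print(step_shape, out_shape)
```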
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.nets as nets
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program, g_init_program
from paddle.v2.framework.executor import Executor
import numpy as np
def convolution_net(input_dim, class_dim=2, emb_dim=32, hid_dim=32):
data = layers.data(name="words", shape=[1], data_type="int64")
label = layers.data(name="label", shape=[1], data_type="int64")
emb = layers.embedding(input=data, size=[input_dim, emb_dim])
conv_3 = nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=3,
act="tanh",
pool_type="sqrt")
conv_4 = nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=4,
act="tanh",
pool_type="sqrt")
prediction = layers.fc(input=[conv_3, conv_4],
size=class_dim,
act="softmax")
cost = layers.cross_entropy(input=prediction, label=label)
avg_cost = layers.mean(x=cost)
adam_optimizer = optimizer.AdamOptimizer(learning_rate=0.002)
opts = adam_optimizer.minimize(avg_cost)
acc = layers.accuracy(input=prediction, label=label)
return avg_cost, acc
def to_lodtensor(data, place):
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = core.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
def main():
BATCH_SIZE = 100
PASS_NUM = 5
word_dict = paddle.dataset.imdb.word_dict()
dict_dim = len(word_dict)
class_dim = 2
cost, acc = convolution_net(input_dim=dict_dim, class_dim=class_dim)
train_data = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.imdb.train(word_dict), buf_size=1000),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(g_init_program)
for pass_id in xrange(PASS_NUM):
for data in train_data():
tensor_words = to_lodtensor(map(lambda x: x[0], data), place)
label = np.array(map(lambda x: x[1], data)).astype("int64")
label = label.reshape([BATCH_SIZE, 1])
tensor_label = core.LoDTensor()
tensor_label.set(label, place)
outs = exe.run(g_program,
feed={"words": tensor_words,
"label": tensor_label},
fetch_list=[cost, acc])
cost_val = np.array(outs[0])
acc_val = np.array(outs[1])
print("cost=" + str(cost_val) + " acc=" + str(acc_val))
if cost_val < 1.0 and acc_val > 0.7:
exit(0)
exit(1)
if __name__ == '__main__':
main()
...@@ -109,7 +109,7 @@ cost = layers.cross_entropy( ...@@ -109,7 +109,7 @@ cost = layers.cross_entropy(
avg_cost = layers.mean(x=cost, program=program, init_program=init_program) avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost) opts = sgd_optimizer.minimize(avg_cost, init_program)
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.dataset.imikolov.train(word_dict, N), batch_size) paddle.dataset.imikolov.train(word_dict, N), batch_size)
......
...@@ -56,7 +56,7 @@ class Ploter(object): ...@@ -56,7 +56,7 @@ class Ploter(object):
assert isinstance(data, PlotData) assert isinstance(data, PlotData)
data.append(step, value) data.append(step, value)
def plot(self): def plot(self, path=None):
if self.__plot_is_disabled__(): if self.__plot_is_disabled__():
return return
...@@ -68,8 +68,11 @@ class Ploter(object): ...@@ -68,8 +68,11 @@ class Ploter(object):
titles.append(title) titles.append(title)
self.plt.plot(data.step, data.value) self.plt.plot(data.step, data.value)
self.plt.legend(titles, loc='upper left') self.plt.legend(titles, loc='upper left')
self.display.clear_output(wait=True) if path is None:
self.display.display(self.plt.gcf()) self.display.clear_output(wait=True)
self.display.display(self.plt.gcf())
else:
self.plt.savefig(path)
self.plt.gcf().clear() self.plt.gcf().clear()
def reset(self): def reset(self):
......
...@@ -7,3 +7,4 @@ rarfile ...@@ -7,3 +7,4 @@ rarfile
scipy>=0.19.0 scipy>=0.19.0
Pillow Pillow
nltk>=3.2.2 nltk>=3.2.2
graphviz