Commit c0005d58, authored by dangqingqing

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into lstm_fix

./doc/howto/dev/contribute_to_paddle_en.md
# Contribute Code
We sincerely appreciate your contribution. This document explains our workflow and work style.
## Workflow
PaddlePaddle uses this [Git branching model](http://nvie.com/posts/a-successful-git-branching-model/). The following steps guide usual contributions.
1. Fork
Our development community has been growing fast; it doesn't make sense for everyone to write into the official repo. So, please file Pull Requests from your fork. To make a fork, just head over to the GitHub page and click the ["Fork" button](https://help.github.com/articles/fork-a-repo/).
1. Clone
To make a copy of your fork on your local computer, please run
```bash
git clone https://github.com/your-github-account/paddle
cd paddle
```
1. Create the local feature branch
For daily work such as adding a new feature or fixing a bug, please create a feature branch before coding:
```bash
git checkout -b my-cool-stuff
```
1. Commit
Before issuing your first `git commit` command, please install [`pre-commit`](http://pre-commit.com/) by running the following commands:
```bash
pip install pre-commit
pre-commit install
```
Our pre-commit configuration requires clang-format 3.8 for auto-formatting C/C++ code and yapf for Python.
Once installed, `pre-commit` checks the style of code and documentation in every commit. We will see something like the following when you run `git commit`:
```
➜ git commit
CRLF end-lines remover...............................(no files to check)Skipped
yapf.................................................(no files to check)Skipped
Check for added large files..............................................Passed
Check for merge conflicts................................................Passed
Check for broken symlinks................................................Passed
Detect Private Key...................................(no files to check)Skipped
Fix End of Files.....................................(no files to check)Skipped
clang-formater.......................................(no files to check)Skipped
[my-cool-stuff c703c041] add test file
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 233
```
1. Build and test
Users can build PaddlePaddle natively on Linux and Mac OS X. But to unify the building environment and to make it easy for debugging, the recommended way is [using Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/build_en.md).
1. Keep pulling
An experienced Git user pulls from the official repo often, daily or even hourly, so they notice conflicts with others' work early, and it's easier to resolve smaller conflicts.
```bash
git remote add upstream https://github.com/PaddlePaddle/Paddle
git pull upstream develop
```
1. Push and file a pull request
You can "push" your local work into your forked repo:
```bash
git push origin my-cool-stuff
```
The push allows you to create a pull request, requesting owners of this [official repo](https://github.com/PaddlePaddle/Paddle) to pull your change into the official one.
To create a pull request, please follow [these steps](https://help.github.com/articles/creating-a-pull-request/).
If your change fixes an issue, please write ["Fixes <issue-URL>"](https://help.github.com/articles/closing-issues-using-keywords/) in the description section of your pull request. GitHub will close the issue when the owners merge your pull request.
Please remember to specify some reviewers for your pull request. If you don't know who the right ones are, please follow GitHub's recommendation.
1. Delete local and remote branches
To keep your local workspace and your fork clean, you might want to remove merged branches:
```bash
git push origin :my-cool-stuff
git checkout develop
git pull upstream develop
git branch -d my-cool-stuff
```
### Code Review
- Please feel free to ping your reviewers by sending them the URL of your pull request via IM or email. Please do this after your pull request passes the CI.
- Please respond to every comment from your reviewers. If you follow a comment, please write "Done"; otherwise, please give a reason.
- If you don't want your reviewers to get overwhelmed by email notifications, you might reply to their comments [in a batch](https://help.github.com/articles/reviewing-proposed-changes-in-a-pull-request/).
- Reduce unnecessary commits. Some developers commit often. It is recommended to fold a sequence of small changes into one commit by running `git commit --amend` instead of `git commit`.
## Coding Standard
### Code Style
Our C/C++ code follows the [Google style guide](http://google.github.io/styleguide/cppguide.html).
Our Python code follows the [PEP8 style guide](https://www.python.org/dev/peps/pep-0008/).
Our build process helps to check the code style. In [`build.sh`](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/paddle/scripts/docker/build.sh#L42), the entry point of our [builder Docker image](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/Dockerfile#L88), the CMake argument `WITH_STYLE_CHECK` is set to `ON` by default. This flag is on
Please install pre-commit, which automatically reformats the changes to C/C++ and Python code whenever we run `git commit`. To check the whole codebase, we can run the command `pre-commit run -a`, as in the [`check_style.sh` file](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/paddle/scripts/travis/check_style.sh#L30), which is invoked by [our Travis CI configuration](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/.travis.yml#L43).
### Unit Tests
Please remember to add related unit tests.
- For C/C++ code, please follow [`google-test` Primer](https://github.com/google/googletest/blob/master/googletest/docs/Primer.md).
- For Python code, please use [Python's standard `unittest` package](http://pythontesting.net/framework/unittest/unittest-introduction/).
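As a rough illustration of the Python side, here is a minimal `unittest` sketch. The `clip` helper is hypothetical and exists only to give the test case something to check; it is not part of PaddlePaddle.

```python
import unittest


def clip(value, low, high):
    """Hypothetical helper under test: clamp value into [low, high]."""
    return max(low, min(value, high))


class TestClip(unittest.TestCase):
    def test_inside_range_is_unchanged(self):
        self.assertEqual(clip(5, 0, 10), 5)

    def test_below_range_returns_low(self):
        self.assertEqual(clip(-3, 0, 10), 0)

    def test_above_range_returns_high(self):
        self.assertEqual(clip(42, 0, 10), 10)
```

Saved to a file, a test case like this can be run with `python -m unittest <file>`.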
### Writing Logs
We use [glog](https://github.com/google/glog) for logging in our C/C++ code.
For general information, please use `LOG`. For debug information, please use [`VLOG`](http://htmlpreview.github.io/?https://github.com/google/glog/blob/master/doc/glog.html#verbose). The reason is explained [here](https://groups.google.com/a/chromium.org/d/msg/chromium-dev/3NDNd1KzXeY/AZKMMx37fdQJ).
`VLOG` requires a *verbose level* parameter. For example:
```c++
VLOG(3) << "Operator FC is taking " << num_inputs << " inputs.";
```
When we run a PaddlePaddle application or test, we can specify a verbose threshold. For example:
```bash
GLOG_vmodule=buddy_allocator=2 \
GLOG_v=10 \
python \
../python/paddle/v2/framework/tests/test_recurrent_op.py
```
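This enables VLOG messages up to level 2 for `buddy_allocator.{h,cc}` and up to level 10 everywhere else, because a `GLOG_vmodule` entry overrides `GLOG_v` for the module it names. The example VLOG message above, which is at level 3, is therefore displayed as long as it does not come from `buddy_allocator.{h,cc}`. A message is printed only when its level does not exceed the threshold, so we output overall messages at lower verbose levels, where they are displayed more often. When coding C++, please follow the verbose level convention as follows: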
- verbose level 1: [framework](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework)
- verbose level 3: [operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators)
- verbose level 5: [memory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory), [platform](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/platform)
- verbose level 7: [math](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/math)
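To make the threshold rule concrete, the following Python sketch models how glog decides whether a `VLOG(level)` message is printed: the message appears when its level is at or below the applicable threshold, and a `GLOG_vmodule` entry, when present for a module, is used instead of `GLOG_v`. This is a simplified model, not glog's implementation: it matches exact module names only (glog also accepts glob patterns), and the module name `fc_op` is a made-up example.

```python
def parse_vmodule(spec):
    """Parse a GLOG_vmodule-style string such as "buddy_allocator=2,foo=1"."""
    levels = {}
    for item in spec.split(","):
        if not item:
            continue
        name, _, level = item.partition("=")
        levels[name.strip()] = int(level)
    return levels


def vlog_enabled(level, module, v=0, vmodule=None):
    """Return True if VLOG(level) issued from `module` would be printed.

    A per-module entry takes precedence over the global threshold `v`.
    Simplification: exact module names only, no glob patterns.
    """
    threshold = (vmodule or {}).get(module, v)
    return level <= threshold


overrides = parse_vmodule("buddy_allocator=2")
# "fc_op" is a hypothetical module name used only for illustration.
print(vlog_enabled(3, "fc_op", v=10, vmodule=overrides))
print(vlog_enabled(3, "buddy_allocator", v=10, vmodule=overrides))
```

Under this model, a framework-level message at level 1 passes almost any threshold, while a math-level message at level 7 needs a high one.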
# Build the PaddlePaddle Library for Raspberry Pi
There are usually two ways to build a version for Raspberry Pi:
1. Log in to the Raspberry Pi system, for example over SSH, and build there. The required development tools and third-party libraries are listed in [`/Dockerfile`](https://github.com/PaddlePaddle/Paddle/blob/develop/Dockerfile).
1. Cross-compile. This document describes the method and steps for cross-compiling PaddlePaddle for Raspberry Pi on Linux/x64.
## Install the Cross-Compiler
Clone the following GitHub repo:
```bash
git clone https://github.com/raspberrypi/tools.git
```
The cross-compiler arm-linux-gnueabihf-gcc 4.8.3 can then be found in the `./tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64` directory. Running this toolchain requires a Linux x64 machine with glibc 2.14 or later.
## Configure the Cross-Compiling Arguments
CMake [supports cross-compiling](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling). The configuration for PaddlePaddle for Raspberry Pi is in [cmake/cross_compiling/raspberry_pi.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/raspberry_pi.cmake).
When cross-compiling the PaddlePaddle library for Raspberry Pi, some arguments must be configured:
- `CMAKE_SYSTEM_NAME`: The target platform of the CMake build. Must be set to `RPi`. Only after `CMAKE_SYSTEM_NAME=RPi` is set does PaddlePaddle's CMake system treat the build as a cross-compile for Raspberry Pi; it then automatically builds the host version of the protoc executable, the target version of the protobuf library, and the target version of the OpenBLAS library.
- `RPI_TOOLCHAIN`: The absolute path of the toolchain, or its path relative to the build directory. PaddlePaddle's CMake system uses this value to set the cross-compilers automatically; otherwise, the user must set them manually when running cmake. No default value.
- `RPI_ARM_NEON`: Whether to use NEON instructions. Currently it must be set to `ON`, which is also the default.
- `HOST_C/CXX_COMPILER`: The C/C++ compiler of the host. It is needed to build the host version of the protoc executable and the target version of the OpenBLAS library. It defaults to the value of the environment variable `CC`; if `CC` is not set, the `cc` compiler is used.
A commonly used CMake configuration is as follows:
```
cmake -DCMAKE_SYSTEM_NAME=RPi \
...@@ -47,7 +44,9 @@ cmake -DCMAKE_SYSTEM_NAME=RPi \
..
```
Here `WITH_C_API=ON` means that the inference library is to be built.
You can also set other build arguments as needed. For example, to minimize the size of the generated library, set `CMAKE_BUILD_TYPE` to `MinSizeRel`; for the fastest execution speed, set `CMAKE_BUILD_TYPE` to `Release`.
## Build and Install
...@@ -60,6 +59,4 @@ make install
Note: If you have previously built the PaddlePaddle library for another platform in the source directory, first delete the `third_party` and `build` directories with `rm -rf`, so that all third-party dependencies and the PaddlePaddle code are rebuilt against the new CMake configuration.
After the install command finishes, `your/path/to/install` contains the `include` and `lib` directories: `include` holds the C-API header files, and `lib` holds a library built for Raspberry Pi.
# Build PaddlePaddle for Raspberry Pi
You may use either of the following two approaches to build the inference library of PaddlePaddle for Raspberry Pi:
1. Build using SSH: Log in to a Raspberry Pi using SSH and build the library. The required development tools and third-party dependencies are listed here: [`/Dockerfile`](https://github.com/PaddlePaddle/Paddle/blob/develop/Dockerfile).
1. Cross-compile: This article describes in detail how to cross-compile PaddlePaddle for Raspberry Pi on a Linux/x64 machine.
## The Cross-Compiling Toolchain
Step 1. Clone the GitHub repo by running the following command.
```bash
git clone https://github.com/raspberrypi/tools.git
```
Step 2. Use the pre-built cross-compiler found in `./tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64`. To run it on a Linux computer, glibc version >= 2.14 is needed.
## CMake Arguments
CMake supports [cross-compiling](https://cmake.org/cmake/help/v3.0/manual/cmake-toolchains.7.html#cross-compiling). All CMake configuration arguments required for the cross-compilation for Raspberry Pi can be found in [`cmake/cross_compiling/raspberry_pi.cmake`](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/raspberry_pi.cmake).
Some important arguments that need to be set:
- `CMAKE_SYSTEM_NAME`: The target platform. Must be `RPi`.
- `RPI_TOOLCHAIN`: The absolute path of the cross-compiling toolchain.
- `RPI_ARM_NEON`: Use ARM NEON intrinsics. This argument is required and defaults to `ON`.
- `HOST_C/CXX_COMPILER`: The C/C++ compiler for the host. It is used to build tools that run on the host, for example, protoc.
A commonly-used CMake configuration is as follows:
```
cmake -DCMAKE_SYSTEM_NAME=RPi \
-DRPI_TOOLCHAIN=your/path/to/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64 \
-DRPI_ARM_NEON=ON \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_GPU=OFF \
-DWITH_C_API=ON \
-DWITH_PYTHON=OFF \
-DWITH_SWIG_PY=OFF \
..
```
To build the inference library, please set the argument `WITH_C_API` to `ON`: `WITH_C_API=ON`.
You can add more arguments. For example, to minimize the size of the generated inference library, you may use `CMAKE_BUILD_TYPE=MinSizeRel`. For performance optimization, you may use `CMAKE_BUILD_TYPE=Release`.
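When the configuration is driven from a script, the command line above can be assembled from a dictionary of CMake cache variables. The sketch below is only an illustration: `cmake_command` is a hypothetical helper, the paths are the placeholders used in this article, and `CMAKE_BUILD_TYPE=MinSizeRel` is added as an example of an optional argument.

```python
def cmake_command(options, source_dir=".."):
    """Build a cmake argument list from a dict of cache variables."""
    args = ["cmake"]
    for name, value in options.items():
        if isinstance(value, bool):
            # CMake options are conventionally written as ON/OFF.
            value = "ON" if value else "OFF"
        args.append("-D{}={}".format(name, value))
    args.append(source_dir)
    return args


# Placeholder paths, copied from the configuration shown in this article.
rpi_options = {
    "CMAKE_SYSTEM_NAME": "RPi",
    "RPI_TOOLCHAIN": "your/path/to/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64",
    "RPI_ARM_NEON": True,
    "CMAKE_INSTALL_PREFIX": "your/path/to/install",
    "WITH_GPU": False,
    "WITH_C_API": True,
    "WITH_PYTHON": False,
    "WITH_SWIG_PY": False,
    "CMAKE_BUILD_TYPE": "MinSizeRel",
}

print(" \\\n      ".join(cmake_command(rpi_options)))
```

The returned list can be passed to `subprocess.run` from inside the build directory.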
## Build and Install
The following commands build the inference library of PaddlePaddle for Raspberry Pi and third-party dependencies.
```bash
make
make install
```
The intermediate files will be stored in `build`. Third-party libraries will be located in `build/third_party`. If you have already built it for other platforms like Android or iOS, you may want to clear these directories by running the command: `rm -rf build`.
The inference library will be in `your/path/to/install/lib`, with related header files in `your/path/to/install/include`.
# Contribute Code
We sincerely appreciate your contributions. You can use the fork and pull request
workflow to merge your code.
## Code Requirements
- Your code must be fully documented with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/)-style comments.
- Make sure the compiler option `WITH_STYLE_CHECK` is on and the code
passes the style check.
- All code must have unit tests.
- All unit tests must pass.
The following tutorial guides you through submitting your contribution.
## [Creating a Fork](https://help.github.com/articles/fork-a-repo/)
Just head over to the GitHub page and click the "Fork" button.
It's just that simple.
## Clone
Clone remote repository.
```bash
➜ git clone https://github.com/USERNAME/Paddle
cd Paddle
```
## Create a local branch
Paddle currently uses the [Git-flow branching model](http://nvie.com/posts/a-successful-git-branching-model/).
All feature and bug-fix development should be done on a new branch, generally created from the `develop` branch.
```bash
➜ git checkout -b my-cool-stuff
```
Before the checkout, keep the working directory of the current branch clean; otherwise, untracked files will be carried over to the new branch. You can inspect the working directory with `git status`.
## Using `pre-commit` hook
Paddle developers use the [pre-commit](http://pre-commit.com/) tool to manage Git
pre-commit hooks. It helps us format source code (C++, Python) and check some
basic things before each commit (for example, exactly one EOL at the end of each file,
and no huge files added to Git). The `pre-commit` checks are part of the unit tests in Travis-CI, so a
PR that does not pass the hooks cannot be merged into Paddle.
To use [pre-commit](http://pre-commit.com/), install it with
`pip install pre-commit`. Paddle currently uses `clang-format` to format
C/C++ sources, so please make sure clang-format 3.8+ is installed.
Install and run it as follows:
```bash
➜ pip install pre-commit
➜ pre-commit install
```
When you commit your code, the pre-commit hook checks the local changes for
anything that is not suitable to commit.
## Start to develop
In this tutorial, we delete a line in README.md and create a new file.
We can use `git status` to inspect the changes in the current directory and `git diff` to see the differences.
```bash
➜ git status
On branch test
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: README.md
Untracked files:
(use "git add <file>..." to include in what will be committed)
test
no changes added to commit (use "git add" and/or "git commit -a")
```
## Build and Test
We package PaddlePaddle's build environment into a Docker image, called the develop image and named `paddle:dev`. It contains all the compiling tools that PaddlePaddle needs.
If you want to build the develop image, just run:
```bash
➜ docker build -t paddle:dev .
```
Then we can use the develop image to build PaddlePaddle source. For example:
```bash
➜ docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TEST=ON" paddle:dev
```
The above command compiles PaddlePaddle and creates a Dockerfile for building the production image. All the generated files are in the build directory. "WITH_GPU" controls whether the generated production image supports GPU. "WITH_AVX" controls whether it supports AVX. "WITH_TEST" controls whether the unit tests are built.
Then we can generate the production image by copying the compiled PaddlePaddle program into the image by
```bash
➜ docker build -t paddle:prod -f build/Dockerfile .
```
Finally, run the unit tests:
```bash
➜ docker run -it -v $(pwd):/paddle paddle:dev bash -c "cd /paddle/build && ctest"
```
For more details, you can read [this doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
## Commit
Next, we revert the changes to the README.md file and then commit our changes with the following commands:
```bash
➜ git checkout -- README.md
➜ git status
On branch test
Untracked files:
(use "git add <file>..." to include in what will be committed)
test
nothing added to commit but untracked files present (use "git add" to track)
➜ git add test
```
We should write a description for each commit with `git commit` so that others know
what changed in these files.
```bash
➜ git commit
CRLF end-lines remover...............................(no files to check)Skipped
yapf.................................................(no files to check)Skipped
Check for added large files..............................................Passed
Check for merge conflicts................................................Passed
Check for broken symlinks................................................Passed
Detect Private Key...................................(no files to check)Skipped
Fix End of Files.....................................(no files to check)Skipped
clang-formater.......................................(no files to check)Skipped
[my-cool-stuff c703c041] add test file
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 233
```
## Keeping Fork Up to Date
Before filing your pull request, sync your code with the latest PaddlePaddle.
To do this, first add a remote:
```bash
➜ git remote add upstream https://github.com/PaddlePaddle/Paddle
➜ git remote
origin
upstream
```
Update your fork with the latest upstream changes:
```bash
➜ git fetch upstream
➜ git pull upstream develop
```
Now, your local branch is up-to-date with everything modified upstream.
## Push to GitHub
```bash
# push to your repository in Github
➜ git push origin my-cool-stuff
```
## Create an issue and a Pull Request
Create an issue to describe the problem and record its number.
Go to the page for your fork on GitHub, select your development branch,
and click `New pull request`.
<img width="295" alt="screen shot 2017-04-26 at 9 09 28 pm" src="https://cloud.githubusercontent.com/assets/11692045/25436054/a6d98c66-2ac4-11e7-9cb1-18dd13150230.png">
Then select the target branch:
<img width="750" alt="screen shot 2017-04-26 at 9 11 52 pm" src="https://cloud.githubusercontent.com/assets/11692045/25436139/f83b1e6c-2ac4-11e7-8c0e-add499023c46.png">
We can add `resolve #Issue number` to the PR description to close the issue automatically after the PR is merged. More details are at <https://help.github.com/articles/closing-issues-via-commit-messages/>.
Then wait for review. If changes are needed, repeat the steps above to update the corresponding branch on origin.
## Delete origin branch
After the PR is merged into the main repository, we can delete the remote branch on the PR page.
<img width="775" alt="screen shot 2017-04-26 at 9 18 24 pm" src="https://cloud.githubusercontent.com/assets/11692045/25436457/e4cdd472-2ac5-11e7-9272-badc76c4a23e.png">
Or just run:
```bash
➜ git push origin :my-cool-stuff
```
## Delete local branch
Finally, we delete the local branch:
```bash
➜ git checkout develop
# delete my-cool-stuff branch
➜ git branch -D my-cool-stuff
```
../../../CONTRIBUTING.md
\ No newline at end of file
...@@ -21,7 +21,6 @@
dev/build_cn.rst
dev/write_docs_cn.rst
dev/contribute_to_paddle_cn.md
Model Configuration
-------------------
......
...@@ -19,7 +19,7 @@
* [Launching Cluster Job](#launching-cluster-job-1)
* [Cluster Training Using Kubernetes](#cluster-training-using-kubernetes)
## Introduction
This article describes how to use PaddlePaddle for distributed training on different cluster frameworks. The distributed training architecture is shown in the figure below:
<img src="https://user-images.githubusercontent.com/13348433/31772175-5f419eca-b511-11e7-9db7-5231fe3d9ccb.png" width="500">
...@@ -32,7 +32,7 @@
When training a neural network with synchronous SGD, PaddlePaddle uses a synchronization barrier so that gradient submission and parameter updates are executed in order. With asynchronous SGD, parameters are updated without waiting for all trainers to submit their gradients, which greatly increases parallelism: parameter servers do not depend on each other and receive gradients and update parameters in parallel; a parameter server does not wait until every trainer has submitted its gradients before starting the next step; and trainers do not depend on each other and train the model in parallel. Asynchronous SGD therefore increases the parallelism of parameter updates, but it cannot guarantee that parameters are updated in sync: at any moment, the parameters held by one parameter server may be newer than those held by another, so the gradients are noisier than with synchronous SGD.
## Preparations
1. Prepare your compute cluster. A compute cluster usually consists of a group of Linux servers (from a few to a few thousand). The servers are connected through a local area network (LAN), and each server has an IP address (or a DNS-resolvable host name) that is unique within the cluster. Each computer in the cluster is usually called a "node".
1. Install PaddlePaddle on every node of the cluster. To use GPUs, also install the matching GPU driver and CUDA on the nodes. The ways to install PaddlePaddle are described in [build_and_install](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/getstarted/build_and_install). We recommend the [Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst) installation for a quick setup.
...@@ -51,8 +51,8 @@ PaddlePaddle 0.10.0, compiled with
The following uses the code in `doc/howto/usage/cluster/src/word2vec` as an example to introduce distributed training with the PaddlePaddle v2 API.
## Command-line arguments
### Starting parameter server
Run the following command to start a parameter server, which then waits to exchange data with the trainers:
```bash
$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1
...@@ -70,7 +70,7 @@ $ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num
| ports_num_for_sparse | required | 1 | number of ports used for sparse parameter communication |
| num_gradient_servers | required | 1 | total number of pservers in the current training job |
### Starting trainer
Run the following command to start a trainer program written in Python (the file name is arbitrary, for example train.py):
```bash
$ python train.py
...@@ -117,7 +117,7 @@ paddle.init(
| pservers | required | 127.0.0.1 | list of IPs of the pservers started for the current training job, separated by "," |
### Prepare Training Dataset
Following the sample data preparation script [prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py), prepare the training and validation datasets. We use the paddle.dataset.imikolov dataset and, according to the parallelism of the distributed job (the number of trainer nodes), set `SPLIT_COUNT` at the beginning of `prepare.py` to split the data into multiple parts.
...@@ -149,7 +149,7 @@ test.txt-00002
Training data formats and the `reader()` of the training program differ greatly between training jobs, so developers need to split the training data and write `reader()` according to the actual scenario of their own job.
### Prepare Training program
For every training job, a workspace is created on each node. It contains the user's training program, its dependencies, and the mounted or downloaded training data shards.
...@@ -184,7 +184,7 @@ test.txt-00002
- `train_data_dir`: the directory that contains the training data; it can be mounted from distributed storage or downloaded to local disk before the job starts.
- `test_data_dir`: the directory that contains the test dataset.
## Use cluster platforms or cluster management tools
PaddlePaddle can build distributed jobs on a variety of distributed computing platforms, including:
- [Kubernetes](http://kubernetes.io), an open-source container cluster scheduling framework from Google that offers a complete solution for large-scale production clusters.
...@@ -195,12 +195,12 @@ PaddlePaddle can build distributed jobs on a variety of distributed computing platforms
When training on a distributed computing platform, the platform usually provides the parameters a job needs, such as the node ID, IP, and number of nodes, through an API or environment variables once the job is scheduled onto the cluster.
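As a sketch of how a training script might pick up such platform-provided values, the Python snippet below reads the trainer ID and the pserver list from environment variables and falls back to the defaults given in the argument tables (`0` and `127.0.0.1`). The variable names `JOB_TRAINER_ID` and `JOB_PSERVER_IPS` are hypothetical; each platform defines its own names.

```python
import os


def trainer_config(environ=None):
    """Collect distributed-training settings from environment variables.

    JOB_TRAINER_ID and JOB_PSERVER_IPS are hypothetical names chosen for
    this example; substitute whatever your platform actually exports.
    """
    env = os.environ if environ is None else environ
    ips = [ip.strip() for ip in env.get("JOB_PSERVER_IPS", "127.0.0.1").split(",")]
    return {
        "trainer_id": int(env.get("JOB_TRAINER_ID", "0")),
        # Comma-separated list of IPs, the format described for `pservers`.
        "pservers": ",".join(ip for ip in ips if ip),
    }


print(trainer_config({"JOB_TRAINER_ID": "2", "JOB_PSERVER_IPS": "10.0.0.1, 10.0.0.2"}))
```

The resulting dictionary holds the values for the `trainer_id` and `pservers` arguments listed in the trainer argument table.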
### Cluster Training Using Fabric
#### Prepare a Linux cluster
In the `paddle/scripts/cluster_train_v2/fabric/docker_cluster` directory, run `kubectl create -f ssh_servers.yaml` to start a test cluster, and use `kubectl get po -o wide` to get the IP addresses of its nodes.
#### Launching Cluster Job
`paddle.py` provides automated scripts to start all PaddlePaddle cluster processes on the different nodes. By default, all command-line options can be set as `paddle.py` command options, and `paddle.py` transparently and automatically passes them on to the underlying PaddlePaddle processes.
...@@ -216,10 +216,10 @@ sh run.sh
The cluster job will start in a few seconds.
#### Kill Cluster Job
`paddle.py` catches the SIGINT signal sent by `Ctrl + C` and automatically kills all processes it launched. Simply interrupt `paddle.py` to kill the cluster job. If the program crashes, you can also kill the processes manually.
#### Check Cluster Training Result
For details, check the logs in $workspace/log. Every node has the same log structure.
`paddle_trainer.INFO`
...@@ -234,13 +234,13 @@ sh run.sh
`train.log`
Contains the stderr and stdout of the training process. Check it for error messages when training fails.
#### Check Model Output
After the run finishes, the model files are written to the `output` directory on node 0.
The `nodefile` in the workspace indicates the node ID of the current cluster job.
### Cluster Training Using OpenMPI
#### Prepare an OpenMPI cluster
Run the following command to start an OpenMPI cluster of 3 nodes plus one "head" node:
...@@ -252,7 +252,7 @@ kubectl create -f mpi-nodes.yaml
You can then log in from the head node to every OpenMPI node over SSH without a password.
#### Launching Cluster Job
Follow the steps below to submit a paddle training job to the OpenMPI cluster:
...@@ -280,6 +280,6 @@ scp train.txt-00002 test.txt-00002 [node3IP]:/home/tutorial
mpirun -hostfile machines -n 3 /home/tutorial/start_mpi_train.sh
```
### Cluster Training Using Kubernetes
For usage, see [here](../k8s/k8s_distributed_cn.md).
...@@ -19,7 +19,7 @@
* [Launching Cluster Job](#launching-cluster-job-1)
* [Cluster Training Using Kubernetes](#cluster-training-using-kubernetes)
## Introduction
In this article, we'll explain how to run distributed training jobs with PaddlePaddle on different types of clusters. The diagram below shows the main architecture of a distributed training job:
...@@ -33,7 +33,7 @@ PaddlePaddle can support both synchronize stochastic gradient descent (SGD) and
When training with synchronous SGD, PaddlePaddle uses an internal "synchronization barrier" that keeps gradient updates and parameter downloads in strict order. Asynchronous SGD, on the other hand, does not wait for all trainers to finish uploading at each step, which increases the parallelism of distributed training: parameter servers do not depend on each other and optimize parameters concurrently. Parameter servers do not wait for trainers, so trainers also work concurrently. However, asynchronous SGD introduces more randomness and noise into the gradient.
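The difference between the two modes can be illustrated with a toy, single-parameter Python sketch. It is not PaddlePaddle's implementation: the quadratic loss, the averaging of gradients in the synchronous case, and the function names are all assumptions made for the example.

```python
def gradient(weight, target):
    """Gradient of the toy loss 0.5 * (weight - target) ** 2."""
    return weight - target


def sync_update(weight, targets, lr):
    """Synchronous SGD: wait for every trainer, then apply one averaged update."""
    grads = [gradient(weight, t) for t in targets]
    return weight - lr * sum(grads) / len(grads)


def async_update(weight, targets, lr):
    """Asynchronous SGD: each gradient is applied as soon as it arrives.

    All trainers computed their gradient from the same starting weight, so
    every gradient after the first is stale by the time it is applied.
    """
    stale_weight = weight
    for t in targets:
        weight = weight - lr * gradient(stale_weight, t)
    return weight


# Two trainers whose data pull the weight toward 1.0 and 3.0 respectively.
print(sync_update(0.0, [1.0, 3.0], lr=0.1))
print(async_update(0.0, [1.0, 3.0], lr=0.1))
```

With these inputs the synchronous variant takes one averaged step to about 0.2, while the asynchronous variant applies both gradients separately and ends near 0.4. The second of those updates uses a stale gradient, which is the source of the extra noise mentioned above.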
# Preparations ## Preparations
1. Prepare your computer cluster. It's normally a bunch of Linux servers connected by LAN. Each server will be assigned a unique IP address. The computers in the cluster can be called "nodes". 1. Prepare your computer cluster. It's normally a bunch of Linux servers connected by LAN. Each server will be assigned a unique IP address. The computers in the cluster can be called "nodes".
2. Install PaddlePaddle on every node. If you are going to take advantage of GPU cards, you'll also need to install proper driver and CUDA libraries. To install PaddlePaddle please read [this build and install](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/getstarted/build_and_install) document. We strongly recommend using [Docker installation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst). 2. Install PaddlePaddle on every node. If you are going to take advantage of GPU cards, you'll also need to install proper driver and CUDA libraries. To install PaddlePaddle please read [this build and install](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/getstarted/build_and_install) document. We strongly recommend using [Docker installation](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
...@@ -52,9 +52,9 @@ PaddlePaddle 0.10.0rc, compiled with ...@@ -52,9 +52,9 @@ PaddlePaddle 0.10.0rc, compiled with
We'll take `doc/howto/usage/cluster/src/word2vec` as an example to introduce distributed training using the PaddlePaddle v2 API.
# Command-line arguments ## Command-line arguments
## Starting parameter server ### Starting parameter server
Run the command below to start a parameter server, which will wait for trainers to connect:
...@@ -74,7 +74,7 @@ $ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num ...@@ -74,7 +74,7 @@ $ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num
| ports_num_for_sparse | required | 1 | number of ports that serve sparse parameter updates |
| num_gradient_servers | required | 1 | total number of gradient servers |
## Starting trainer ### Starting trainer
Run the command below to start the trainer (name the file whatever you want, such as "train.py"):
```bash ```bash
...@@ -122,7 +122,7 @@ paddle.init( ...@@ -122,7 +122,7 @@ paddle.init(
| trainer_id | required | 0 | ID of each trainer, starting from 0 |
| pservers | required | 127.0.0.1 | list of IPs of parameter servers, separated by "," |
## Prepare Training Dataset ### Prepare Training Dataset
The example script [prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py) downloads the public `imikolov` dataset and splits it into multiple files according to the job parallelism (the number of trainers). Modify `SPLIT_COUNT` at the beginning of `prepare.py` to change the number of output files.
...@@ -155,7 +155,7 @@ When job started, every trainer needs to get it's own part of data. In some dist ...@@ -155,7 +155,7 @@ When job started, every trainer needs to get it's own part of data. In some dist
Different training jobs may have different data formats and `reader()` functions, so developers may need to write their own data preparation scripts and `reader()` functions for each job.
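As a rough illustration of the split-then-read pattern described above, the sketch below writes a text file into several parts and builds a `reader()` for one trainer. It is not the code of `prepare.py`: the helper names `split_file` and `make_reader` are hypothetical, the round-robin assignment of lines is an assumption, and the `-00000` style suffix is modelled on file names such as `train.txt-00002` used elsewhere in this document.

```python
import os
import tempfile


def split_file(src_path, split_count, out_dir):
    """Write the lines of src_path round-robin into split_count files
    named <basename>-00000, <basename>-00001, ... (assumed naming)."""
    base = os.path.basename(src_path)
    paths = [os.path.join(out_dir, "%s-%05d" % (base, i))
             for i in range(split_count)]
    outputs = [open(p, "w") for p in paths]
    try:
        with open(src_path) as src:
            for line_no, line in enumerate(src):
                outputs[line_no % split_count].write(line)
    finally:
        for f in outputs:
            f.close()
    return paths


def make_reader(data_dir, basename, trainer_id):
    """Return a reader() that yields only the lines of this trainer's part."""
    path = os.path.join(data_dir, "%s-%05d" % (basename, trainer_id))

    def reader():
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")

    return reader


# Usage: split ten lines into three parts and read the part of trainer 1.
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "train.txt")
with open(src, "w") as f:
    f.write("".join("sample %d\n" % i for i in range(10)))
parts = split_file(src, split_count=3, out_dir=work_dir)
trainer1_lines = list(make_reader(work_dir, "train.txt", trainer_id=1)())
```

Keying the file name on the trainer ID means each trainer can locate its own part without any coordination, provided the number of parts matches the number of trainers.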
## Prepare Training program ### Prepare Training program
We'll create a *workspace* directory on each node to store your training program, its dependencies, and the mounted or downloaded dataset directory.
...@@ -191,7 +191,7 @@ Your workspace may looks like: ...@@ -191,7 +191,7 @@ Your workspace may looks like:
- `train_data_dir`: contains training data. Mount it from a storage service or copy the training data here.
- `test_data_dir`: contains testing data.
# Use cluster platforms or cluster management tools ## Use cluster platforms or cluster management tools
PaddlePaddle supports running jobs on several platforms, including:
- [Kubernetes](http://kubernetes.io), an open-source system from Google for automating deployment, scaling, and management of containerized applications.
...@@ -202,13 +202,13 @@ We'll introduce cluster job management on these platforms. The examples can be f ...@@ -202,13 +202,13 @@ We'll introduce cluster job management on these platforms. The examples can be f
These cluster platforms provide APIs or environment variables to training processes when the job is dispatched to different nodes, such as the node ID, IP address, or total number of nodes.
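A trainer script could pick up such values roughly as follows. This is a sketch under stated assumptions: the environment variable names `PADDLE_TRAINER_ID` and `PADDLE_PSERVERS` are hypothetical (each platform defines its own), and only the argument names `trainer_id` and `pservers` and their defaults come from the trainer argument table.

```python
import os


def trainer_args_from_env(env=None):
    """Build trainer keyword arguments from environment variables.

    PADDLE_TRAINER_ID and PADDLE_PSERVERS are hypothetical names chosen
    for this sketch; substitute whatever your platform actually exports.
    """
    env = os.environ if env is None else env
    return {
        "trainer_id": int(env.get("PADDLE_TRAINER_ID", "0")),
        "pservers": env.get("PADDLE_PSERVERS", "127.0.0.1"),
    }


# Values as a platform might inject them on the third node of a job.
args = trainer_args_from_env(
    {"PADDLE_TRAINER_ID": "2", "PADDLE_PSERVERS": "10.0.0.1,10.0.0.2"})
# With nothing set, the defaults from the argument table apply.
defaults = trainer_args_from_env({})
```

The resulting dictionary could then be forwarded as keyword arguments to the trainer's initialization call, assuming that call accepts these names.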
## Cluster Training Using Fabric ### Cluster Training Using Fabric
### Prepare a Linux cluster #### Prepare a Linux cluster
Running `kubectl create -f ssh_servers.yaml` under the directory `paddle/scripts/cluster_train_v2/fabric/docker_cluster` will launch a demo cluster. Run `kubectl get po -o wide` to get the IP addresses of these nodes.
### Launching Cluster Job #### Launching Cluster Job
`paddle.py` provides automated scripts to start all PaddlePaddle cluster processes on different nodes. By default, all command-line options can be set as `paddle.py` command options, and `paddle.py` will transparently pass these options to the lower-level PaddlePaddle processes.
`paddle.py` provides two distinct command options for easy job launching.
...@@ -224,10 +224,10 @@ sh run.sh ...@@ -224,10 +224,10 @@ sh run.sh
The cluster job will start in several seconds.
### Kill Cluster Job #### Kill Cluster Job
`paddle.py` captures the `Ctrl + C` SIGINT signal and automatically kills all processes it launched, so simply stop `paddle.py` to kill the cluster job. You should kill the job manually if the program crashes.
### Check Cluster Training Result #### Check Cluster Training Result
Check the logs in `$workspace/log` for details; every node has the same log structure.
`paddle_trainer.INFO`
...@@ -242,13 +242,13 @@ It provides stderr and stdout of parameter server process. Check error log if tr ...@@ -242,13 +242,13 @@ It provides stderr and stdout of parameter server process. Check error log if tr
`train.log`
It provides the stderr and stdout of the trainer process. Check the error log if training crashes.
### Check Model Output #### Check Model Output
After one pass finishes, model files will be written to the `output` directory on node 0.
`nodefile` in the workspace indicates the node ID of the current cluster job.
## Cluster Training Using OpenMPI ### Cluster Training Using OpenMPI
### Prepare an OpenMPI cluster #### Prepare an OpenMPI cluster
Run the following command to start a 3-node MPI cluster and one "head" node.
...@@ -260,7 +260,7 @@ kubectl create -f mpi-nodes.yaml ...@@ -260,7 +260,7 @@ kubectl create -f mpi-nodes.yaml
Then you can log in to every OpenMPI node using ssh without entering any passwords.
### Launching Cluster Job #### Launching Cluster Job
Follow these steps to launch a PaddlePaddle training job in an OpenMPI cluster:
...@@ -288,6 +288,6 @@ scp train.txt-00002 test.txt-00002 [node3IP]:/home/tutorial ...@@ -288,6 +288,6 @@ scp train.txt-00002 test.txt-00002 [node3IP]:/home/tutorial
mpirun -hostfile machines -n 3 /home/tutorial/start_mpi_train.sh mpirun -hostfile machines -n 3 /home/tutorial/start_mpi_train.sh
``` ```
## Cluster Training Using Kubernetes ### Cluster Training Using Kubernetes
The details can be found [here](../k8s/k8s_cn.md).
vendor/ vendor/
.glide/ .glide/
proto/*.go
hash: 51d9e2e46d7fd9173ff11ecada40f7b7728756be18d5e2f032535f66465e6e15 hash: 107c058cf5c9163a75d40eef2273a793c36112683c25d72aa8288827fdde3a19
updated: 2017-10-24T15:04:09.987751592-07:00 updated: 2017-10-30T03:46:19.137696069Z
imports: imports:
- name: github.com/alecthomas/gometalinter - name: github.com/alecthomas/gometalinter
version: bae2f1293d092fd8167939d5108d1b025eaef9de version: bae2f1293d092fd8167939d5108d1b025eaef9de
......
...@@ -30,3 +30,4 @@ import: ...@@ -30,3 +30,4 @@ import:
version: v2.13 version: v2.13
- package: github.com/go-stack/stack - package: github.com/go-stack/stack
version: v1.6.0 version: v1.6.0
- package: github.com/golang/protobuf
# Ignore everything in this directory
*
# Except this file
!.gitignore
...@@ -13,5 +13,5 @@ ...@@ -13,5 +13,5 @@
# limitations under the License. # limitations under the License.
# #
if(WITH_TESTING) if(WITH_TESTING)
go_test(pserver_test DEPS paddle_go_optimizer) go_test(pserver_test DEPS paddle_go_optimizer gen_proto_go)
endif() endif()
...@@ -17,6 +17,7 @@ package pserver ...@@ -17,6 +17,7 @@ package pserver
import ( import (
"bufio" "bufio"
"bytes" "bytes"
"encoding/binary"
"encoding/gob" "encoding/gob"
"encoding/json" "encoding/json"
"errors" "errors"
...@@ -26,11 +27,15 @@ import ( ...@@ -26,11 +27,15 @@ import (
"os" "os"
"path" "path"
"strconv" "strconv"
"strings"
"sync" "sync"
"time" "time"
"github.com/golang/protobuf/proto"
uuid "github.com/satori/go.uuid" uuid "github.com/satori/go.uuid"
pb "github.com/PaddlePaddle/Paddle/go/proto"
log "github.com/inconshreveable/log15" log "github.com/inconshreveable/log15"
) )
...@@ -65,6 +70,46 @@ type Parameter struct { ...@@ -65,6 +70,46 @@ type Parameter struct {
Content []byte Content []byte
} }
func float32ToString(b []byte) string {
f := make([]float32, len(b)/4)
buf := bytes.NewReader(b)
err := binary.Read(buf, binary.LittleEndian, &f)
if err != nil {
return ""
}
return fmt.Sprintf("%v", f)
}
func float32ByteToString(c []byte) string {
var a []byte
var b []byte
if len(c) <= 80 {
a = c
} else {
a = c[0:40]
b = c[len(c)-40:]
}
var s string
s = float32ToString(a)
if b == nil {
return s
}
s = strings.Replace(s, "]", "", -1) + "..." + strings.Replace(float32ToString(b), "[", "", -1)
return s
}
func (p Parameter) String() string {
if p.ElementType != Float32 {
return fmt.Sprintf("name:%v ElementType:%v",
p.Name, p.ElementType)
}
return float32ByteToString(p.Content)
}
// ParameterWithConfig contains the parameter and the configuration. // ParameterWithConfig contains the parameter and the configuration.
type ParameterWithConfig struct { type ParameterWithConfig struct {
Param Parameter Param Parameter
...@@ -189,7 +234,9 @@ func (s *Service) InitParam(paramWithConfigs ParameterWithConfig, _ *int) error ...@@ -189,7 +234,9 @@ func (s *Service) InitParam(paramWithConfigs ParameterWithConfig, _ *int) error
default: default:
} }
// TODO(helin): parse parameter config c := &pb.OptimizerConfig{}
proto.Unmarshal(paramWithConfigs.Config, c)
log.Debug(fmt.Sprintf("OptimizerConfig:%v", c))
s.mu.Lock() s.mu.Lock()
defer s.mu.Unlock() defer s.mu.Unlock()
...@@ -239,7 +286,8 @@ func (s *Service) SendGrad(g Gradient, _ *int) error { ...@@ -239,7 +286,8 @@ func (s *Service) SendGrad(g Gradient, _ *int) error {
select { select {
case <-s.initialized: case <-s.initialized:
default: default:
log.Warn("received gradient before initialization.", "name", g.Name, "size", len(g.Content), "type", g.ElementType) log.Warn("received gradient before initialization.",
"name", g.Name, "size", len(g.Content), "type", g.ElementType)
return errors.New(Uninitialized) return errors.New(Uninitialized)
} }
...@@ -248,10 +296,14 @@ func (s *Service) SendGrad(g Gradient, _ *int) error { ...@@ -248,10 +296,14 @@ func (s *Service) SendGrad(g Gradient, _ *int) error {
o, ok := s.optMap[g.Name] o, ok := s.optMap[g.Name]
if !ok { if !ok {
log.Warn("received gradient but can't find name.",
"name", g.Name, "size", len(g.Content), "type", g.ElementType)
return fmt.Errorf("parameter: %s does not exist", g.Name) return fmt.Errorf("parameter: %s does not exist", g.Name)
} }
log.Info("received gradient from trainer, updating gradient.", "name", g.Name, "size", len(g.Content), "type", g.ElementType) log.Debug(Parameter(g).String())
log.Info("received gradient from trainer, updating gradient.",
"name", g.Name, "size", len(g.Content), "type", g.ElementType)
return o.UpdateParameter(g) return o.UpdateParameter(g)
} }
...@@ -277,7 +329,7 @@ func (s *Service) GetParam(name string, parameter *Parameter) error { ...@@ -277,7 +329,7 @@ func (s *Service) GetParam(name string, parameter *Parameter) error {
parameter.Name = name parameter.Name = name
parameter.ElementType = opt.elementType parameter.ElementType = opt.elementType
parameter.Content = opt.GetWeights() parameter.Content = opt.GetWeights()
log.Debug(parameter.String())
log.Info("sending parameter to the trainer", "name", parameter.Name, "size", len(parameter.Content), "type", parameter.ElementType) log.Info("sending parameter to the trainer", "name", parameter.Name, "size", len(parameter.Content), "type", parameter.ElementType)
return nil return nil
} }
......
...@@ -15,6 +15,7 @@ ...@@ -15,6 +15,7 @@
package pserver_test package pserver_test
import ( import (
"fmt"
"io/ioutil" "io/ioutil"
"reflect" "reflect"
"sync" "sync"
...@@ -178,3 +179,33 @@ func TestBlockUntilInitialized(t *testing.T) { ...@@ -178,3 +179,33 @@ func TestBlockUntilInitialized(t *testing.T) {
wg.Wait() wg.Wait()
} }
func TestGradientString(t *testing.T) {
g := pserver.Parameter{}
g.ElementType = pserver.Float32
g.Content = []byte{0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40, 0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40}
if g.String() != "[3.3702806e+12 2.142699 3.3702806e+12 2.142699]" {
t.Fatal("get float data error!")
}
g.Content = []byte{0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40,
0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40}
if g.String() != "[3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699...3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699 3.3702806e+12 2.142699]" {
t.Fatal("get float data error!", g.String())
}
fmt.Println(g)
}
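For readers who want to check the byte patterns used in `TestGradientString` outside Go, the head-and-tail preview can be approximated in Python. This is a sketch, not a port: `float32_preview` is a hypothetical name, it returns lists instead of a formatted string, and Python's float formatting differs from Go's `%v`, so only the decoded values are compared.

```python
import struct


def float32_preview(content, edge_bytes=40):
    """Decode little-endian float32 values, keeping only the head and tail
    of buffers longer than 2 * edge_bytes (80 bytes, as in the Go helper)."""
    def decode(chunk):
        count = len(chunk) // 4
        return list(struct.unpack("<%df" % count, chunk[:count * 4]))

    if len(content) <= 2 * edge_bytes:
        return decode(content), []
    return decode(content[:edge_bytes]), decode(content[-edge_bytes:])


# The 8-byte pattern repeated in the Go test.
pair = bytes([0x18, 0x2D, 0x44, 0x54, 0xFB, 0x21, 0x09, 0x40])
short_head, short_tail = float32_preview(pair * 2)   # 16 bytes: no truncation
long_head, long_tail = float32_preview(pair * 16)    # 128 bytes: head and tail only
```

With the 128-byte buffer, ten values are kept from each end, which matches the ten values on either side of the `...` in the expected string of the Go test.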
# gserver package unittests # gserver package unittests
if(NOT MOBILE_INFERENCE) add_simple_unittest(test_LinearChainCRF)
################### test_ProtoDataProvider ############ add_simple_unittest(test_MultinomialSampler)
add_unittest_without_exec(test_ProtoDataProvider add_simple_unittest(test_RecurrentLayer)
test_ProtoDataProvider.cpp)
# test_ProtoDataProvider will mkdir as same name,
# so if WORKING_DIRECTORY is default directory, then
# mkdir will get error.
add_test(NAME test_ProtoDataProvider
COMMAND ${CMAKE_CURRENT_BINARY_DIR}/test_ProtoDataProvider
WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle)
endif()
################# test_LayerGrad ####################### function(gserver_test TARGET)
add_unittest_without_exec(test_LayerGrad add_unittest_without_exec(${TARGET}
test_LayerGrad.cpp ${TARGET}.cpp
LayerGradUtil.cpp) LayerGradUtil.cpp)
add_test(NAME test_LayerGrad add_test(NAME ${TARGET}
COMMAND test_LayerGrad) COMMAND ${TARGET})
endfunction()
gserver_test(test_LayerGrad)
gserver_test(test_CRFLayerGrad)
gserver_test(test_CrossEntropyOverBeamGrad)
gserver_test(test_SeqSliceLayerGrad)
gserver_test(test_ActivationGrad)
gserver_test(test_ConvTrans)
gserver_test(test_PriorBox)
gserver_test(test_DetectionOutput)
gserver_test(test_ConvUnify)
gserver_test(test_BatchNorm)
gserver_test(test_KmaxSeqScore)
gserver_test(test_Expand)
########## test_Mkldnn layers and activations ########## ########## test_Mkldnn layers and activations ##########
if(WITH_MKLDNN) if(WITH_MKLDNN)
...@@ -32,89 +37,6 @@ if(WITH_MKLDNN) ...@@ -32,89 +37,6 @@ if(WITH_MKLDNN)
WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle) WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle)
endif() endif()
################ test_CRFLayerGrad ####################
add_unittest_without_exec(test_CRFLayerGrad
test_CRFLayerGrad.cpp
LayerGradUtil.cpp)
add_test(NAME test_CRFLayerGrad
COMMAND test_CRFLayerGrad)
################ test_CrossEntropyOverBeam ####################
add_unittest_without_exec(test_CrossEntropyOverBeam
test_CrossEntropyOverBeamGrad.cpp
LayerGradUtil.cpp)
add_test(NAME test_CrossEntropyOverBeam
COMMAND test_CrossEntropyOverBeam)
################ test_SeqSliceLayerGrad ####################
add_unittest_without_exec(test_SeqSliceLayerGrad
test_SeqSliceLayerGrad.cpp
LayerGradUtil.cpp)
add_test(NAME test_SeqSliceLayerGrad
COMMAND test_SeqSliceLayerGrad)
add_unittest_without_exec(test_ActivationGrad
test_ActivationGrad.cpp
LayerGradUtil.cpp)
add_test(NAME test_ActivationGrad
COMMAND test_ActivationGrad)
################# test_ConvTrans #######################
add_unittest_without_exec(test_ConvTrans
test_ConvTrans.cpp
LayerGradUtil.cpp)
add_test(NAME test_ConvTrans
COMMAND test_ConvTrans)
################# test_PriorBox #######################
add_unittest_without_exec(test_PriorBox
test_PriorBox.cpp
LayerGradUtil.cpp)
add_test(NAME test_PriorBox
COMMAND test_PriorBox)
################# test_DetectionOutput #######################
add_unittest_without_exec(test_DetectionOutput
test_DetectionOutput.cpp
LayerGradUtil.cpp)
add_test(NAME test_DetectionOutput
COMMAND test_DetectionOutput)
################# test_ConvUnify #######################
add_unittest_without_exec(test_ConvUnify
test_ConvUnify.cpp
LayerGradUtil.cpp)
add_test(NAME test_ConvUnify
COMMAND test_ConvUnify)
################# test_BatchNorm #######################
add_unittest_without_exec(test_BatchNorm
test_BatchNorm.cpp
LayerGradUtil.cpp)
add_test(NAME test_BatchNorm
COMMAND test_BatchNorm)
################# test_KmaxSeqScore #######################
add_unittest_without_exec(test_KmaxSeqScore
test_KmaxSeqScore.cpp
LayerGradUtil.cpp)
add_test(NAME test_KmaxSeqScore
COMMAND test_KmaxSeqScore)
if(NOT MOBILE_INFERENCE)
################## test_Evaluator #######################
add_unittest(test_Evaluator
test_Evaluator.cpp)
endif()
################ test_LinearChainCRF ####################
add_simple_unittest(test_LinearChainCRF)
############## test_MultinomialSampler ###################
add_simple_unittest(test_MultinomialSampler)
############## test_PyDataProvider ######################## ############## test_PyDataProvider ########################
if(WITH_PYTHON) if(WITH_PYTHON)
add_unittest_without_exec(test_PyDataProvider add_unittest_without_exec(test_PyDataProvider
...@@ -125,9 +47,6 @@ if(WITH_PYTHON) ...@@ -125,9 +47,6 @@ if(WITH_PYTHON)
WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle) WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle)
endif() endif()
############### test_RecurrentLayer #######################
add_simple_unittest(test_RecurrentLayer)
############### test_WarpCTCLayer ####################### ############### test_WarpCTCLayer #######################
if(NOT WITH_DOUBLE) if(NOT WITH_DOUBLE)
add_unittest_without_exec(test_WarpCTCLayer add_unittest_without_exec(test_WarpCTCLayer
...@@ -139,6 +58,21 @@ if(NOT WITH_DOUBLE) ...@@ -139,6 +58,21 @@ if(NOT WITH_DOUBLE)
endif() endif()
if(NOT MOBILE_INFERENCE) if(NOT MOBILE_INFERENCE)
################### test_ProtoDataProvider ############
add_unittest_without_exec(test_ProtoDataProvider
test_ProtoDataProvider.cpp)
# test_ProtoDataProvider will mkdir as same name,
# so if WORKING_DIRECTORY is default directory, then
# mkdir will get error.
add_test(NAME test_ProtoDataProvider
COMMAND ${CMAKE_CURRENT_BINARY_DIR}/test_ProtoDataProvider
WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle)
################## test_Evaluator #######################
add_unittest(test_Evaluator
test_Evaluator.cpp)
############### test_RecurrentGradientMachine ############### ############### test_RecurrentGradientMachine ###############
# TODO(yuyang18): There is some bug in test_RecurrentGradientMachine # TODO(yuyang18): There is some bug in test_RecurrentGradientMachine
# I will fix it. # I will fix it.
...@@ -149,9 +83,8 @@ if(NOT MOBILE_INFERENCE) ...@@ -149,9 +83,8 @@ if(NOT MOBILE_INFERENCE)
${PADDLE_SOURCE_DIR}/python:${PADDLE_SOURCE_DIR}/paddle/gserver/tests ${PADDLE_SOURCE_DIR}/python:${PADDLE_SOURCE_DIR}/paddle/gserver/tests
${CMAKE_CURRENT_BINARY_DIR}/test_RecurrentGradientMachine ${CMAKE_CURRENT_BINARY_DIR}/test_RecurrentGradientMachine
WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle) WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}/paddle)
endif()
if(NOT MOBILE_INFERENCE) ############### test_NetworkCompare ###############
add_unittest_without_exec(test_NetworkCompare add_unittest_without_exec(test_NetworkCompare
test_NetworkCompare.cpp) test_NetworkCompare.cpp)
if(WITH_GPU) if(WITH_GPU)
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <gtest/gtest.h>
#include <string>
#include <vector>
#include "LayerGradUtil.h"
#include "paddle/testing/TestUtil.h"
using namespace paddle; // NOLINT
using namespace std; // NOLINT
// Do one forward pass of expand layer and check to see if its output
// matches the given result. (Only the CPU path is tested currently.)
void doOneExpandTest(string trans_type,
bool hasSubseq,
bool useGpu,
Argument& input1,
Argument& input2,
Argument& result) {
FLAGS_use_gpu = false;
// Setting up the expand layer
TestConfig config;
config.layerConfig.set_type("expand");
auto inputType1 =
trans_type == "non-seq" ? INPUT_DENSE_DIM_DATA : INPUT_SEQUENCE_DATA;
config.inputDefs.push_back({inputType1, "layer0", 1, 0});
auto inputType2 =
hasSubseq ? INPUT_HASSUB_SEQUENCE_DATA : INPUT_SEQUENCE_DATA;
config.inputDefs.push_back({inputType2, "layer1", 1, 0});
config.layerConfig.add_inputs();
config.layerConfig.add_inputs();
config.layerConfig.set_trans_type(trans_type);
// data layer initialize
std::vector<DataLayerPtr> dataLayers;
LayerMap layerMap;
vector<Argument> datas;
initDataLayer(
config, &dataLayers, &datas, &layerMap, "expand", 1, false, useGpu);
dataLayers[0]->getOutput() = input1;
dataLayers[1]->getOutput() = input2;
// test layer initialize
std::vector<ParameterPtr> parameters;
LayerPtr expandLayer;
initTestLayer(config, &layerMap, &parameters, &expandLayer);
expandLayer->forward(PASS_GC);
checkMatrixEqual(expandLayer->getOutputValue(), result.value);
}
TEST(Layer, ExpandLayerFwd) {
bool useGpu = false;
// Assume batch_size =3 in all cases.
// CPU case 1. non-seq expand to seq
// input1 = 1,2,3
// input2 = [4,5],[6],[7,8,9]
// result = [1,1],[2],[3,3,3]
Argument input1, input2, result;
input1.value = Matrix::create(3, 1, false, useGpu);
real input1Data[] = {1, 2, 3};
input1.value->setData(input1Data);
input2.value = Matrix::create(6, 1, false, useGpu);
real input2Data[] = {4, 5, 6, 7, 8, 9};
input2.value->setData(input2Data);
input2.sequenceStartPositions = ICpuGpuVector::create(4, useGpu);
int input2Seq[] = {0, 2, 3, 6};
input2.sequenceStartPositions->copyFrom(input2Seq, 4, useGpu);
result.value = Matrix::create(6, 1, false, useGpu);
real resultData[] = {1, 1, 2, 3, 3, 3};
result.value->setData(resultData);
doOneExpandTest("non-seq", false, useGpu, input1, input2, result);
// CPU case 2. non-seq expand to sub-seq
  // NOTE: input1.batch_size == input2.sequencelength in this case,
// i.e, input1 expands by input2.sequence
// input1 = 1,2,3
// input2 = [[4,5]],[[6]],[[7],[8,9]]
// result = [[1,1]],[[2]],[[3],[3,3]]
input2.subSequenceStartPositions = ICpuGpuVector::create(5, useGpu);
int input2SubSeq[] = {0, 2, 3, 4, 6};
input2.subSequenceStartPositions->copyFrom(input2SubSeq, 5, useGpu);
doOneExpandTest("non-seq", true, useGpu, input1, input2, result);
// CPU case 3. seq expand to sub-seq
// input1 = [1,2],[3],[4]
// input2 = [[4,5]],[[6]],[[7],[8,9]]
// result = [[1,1]],[[2]],[[3],[4,4]]
Matrix::resizeOrCreate(input1.value, 4, 1, false, useGpu);
real input1Data_case3[] = {1, 2, 3, 4};
input1.value->setData(input1Data_case3);
input1.sequenceStartPositions = ICpuGpuVector::create(4, useGpu);
int input1Seq[] = {0, 2, 3, 4};
input1.sequenceStartPositions->copyFrom(input1Seq, 4, useGpu);
real resultData_case3[] = {1, 1, 2, 3, 4, 4};
result.value->setData(resultData_case3);
doOneExpandTest("seq", true, useGpu, input1, input2, result);
}
int main(int argc, char** argv) {
testing::InitGoogleTest(&argc, argv);
initMain(argc, argv);
return RUN_ALL_TESTS();
}
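The expected outputs in the test cases above follow a simple rule: each input element is repeated once for every position in the target (sub-)sequence it is paired with. A minimal Python sketch of that rule, using a hypothetical `expand` helper rather than any Paddle API, reproduces the values from cases 1 and 3:

```python
def expand(values, start_positions):
    """Repeat values[i] once for every slot in
    [start_positions[i], start_positions[i + 1])."""
    if len(start_positions) != len(values) + 1:
        raise ValueError("need exactly one more start position than values")
    out = []
    for i, value in enumerate(values):
        out.extend([value] * (start_positions[i + 1] - start_positions[i]))
    return out


# Case 1: non-seq input expanded by sequence start positions {0, 2, 3, 6}.
case1 = expand([1, 2, 3], [0, 2, 3, 6])
# Case 3: seq input expanded by sub-sequence start positions {0, 2, 3, 4, 6}.
case3 = expand([1, 2, 3, 4], [0, 2, 3, 4, 6])
```

Case 2 uses the same sequence start positions as case 1, which is why the test reuses the same expected result for it.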
...@@ -22,22 +22,35 @@ class AccuracyOp : public framework::OperatorWithKernel { ...@@ -22,22 +22,35 @@ class AccuracyOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext *ctx) const override { void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Inference"), PADDLE_ENFORCE(ctx->HasInput("Out"),
"Input(Inference) of AccuracyOp should not be null."); "Input (Out) of accuracy op should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Indices"),
"Input (Indices) of accuracy op should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"),
"Input(Label) of AccuracyOp should not be null."); "Input (Label) of accuracy op should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Accuracy"), PADDLE_ENFORCE(ctx->HasOutput("Accuracy"),
"Output(Accuracy) of AccuracyOp should not be null."); "Output (Accuracy) of AccuracyOp should not be null.");
auto inference_dim = ctx->GetInputDim("Inference"); auto inference_dim = ctx->GetInputDim("Out");
auto label_dim = ctx->GetInputDim("Label"); auto label_dim = ctx->GetInputDim("Label");
    // Assume indices has the same shape as inference, because
// it's the output of topk.
PADDLE_ENFORCE_EQ(label_dim.size(), 1, "label must be a vector"); PADDLE_ENFORCE_EQ(label_dim.size(), 2, "label's rank must be 2.");
PADDLE_ENFORCE_EQ(label_dim[1], 1, "label's second dimension must be 1");
PADDLE_ENFORCE_EQ(inference_dim[0], label_dim[0], PADDLE_ENFORCE_EQ(inference_dim[0], label_dim[0],
"inference size must be the same as label size"); "the inference tensor's num_rows must be"
" the same as label.");
ctx->SetOutputDim("Accuracy", {1}); ctx->SetOutputDim("Accuracy", {1});
ctx->ShareLoD("Inference", /*->*/ "Accuracy"); ctx->ShareLoD("Out", /*->*/ "Accuracy");
}
protected:
// IndicateDataType
framework::DataType IndicateDataType(
const framework::ExecutionContext &ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("Out")->type());
} }
}; };
...@@ -47,7 +60,8 @@ class AccuracyOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -47,7 +60,8 @@ class AccuracyOpMaker : public framework::OpProtoAndCheckerMaker {
framework::OpAttrChecker *op_checker) framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
// TODO(typhoonzero): support both inference value and indices. // TODO(typhoonzero): support both inference value and indices.
AddInput("Inference", "topk(indices) the network output"); AddInput("Out", "topk (inferences) the network output");
AddInput("Indices", "topk (indices) the network output");
AddInput("Label", "Label of the training data"); AddInput("Label", "Label of the training data");
// TODO(typhoonzero): AddInput("Weight", ... // TODO(typhoonzero): AddInput("Weight", ...
AddOutput("Accuracy", "The accuracy of current batch"); AddOutput("Accuracy", "The accuracy of current batch");
...@@ -58,7 +72,7 @@ The accuracy is: ...@@ -58,7 +72,7 @@ The accuracy is:
.. math:: .. math::
accuracy = \\frac{NumOfCorrectPredicts}{NumOfAllSamples}) accuracy = \\frac{NumOfCorrectPredicts}{NumOfAllSamples})
Both the input `Inference` and `Label` can carry the LoD (Level of Details) Both the input `Out` and `Label` can carry the LoD (Level of Details)
information, or not. But the output only shares the LoD with input `Inference`. information, or not. But the output only shares the LoD with input `Inference`.
)DOC"); )DOC");
} }
...@@ -68,7 +82,10 @@ information, or not. But the output only shares the LoD with input `Inference`. ...@@ -68,7 +82,10 @@ information, or not. But the output only shares the LoD with input `Inference`.
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker); REGISTER_OPERATOR(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker,
REGISTER_OP_CPU_KERNEL( paddle::framework::EmptyGradOpMaker);
    accuracy, ops::AccuracyKernel<paddle::platform::CPUPlace, int>, // FIXME(typhoonzero): types of T is for inference data.
ops::AccuracyKernel<paddle::platform::CPUPlace, int64_t>); // label data is always int.
REGISTER_OP_CPU_KERNEL(accuracy,
ops::AccuracyKernel<paddle::platform::CPUPlace, float>,
ops::AccuracyKernel<paddle::platform::CPUPlace, double>);
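The shape checks above imply that `Indices` holds the top-k class ids per sample (N rows of K values) and `Label` holds one class id per sample (N rows of 1 value). Assuming a sample counts as correct when its label appears anywhere in its row of indices, the computation can be sketched in plain Python. `topk_accuracy` is a hypothetical helper, and returning 0.0 for an empty batch is a choice of this sketch rather than something the diff specifies.

```python
def topk_accuracy(indices, labels):
    """indices: N rows of K predicted class ids.
    labels: N rows of the form [class_id]."""
    if len(indices) != len(labels):
        raise ValueError("indices and labels must have the same number of rows")
    if not indices:
        return 0.0  # sketch-only convention for an empty batch
    correct = 0
    for row, (label,) in zip(indices, labels):
        # A sample is correct if its label is among the top-k indices.
        if label in row:
            correct += 1
    return correct / float(len(indices))


# Three samples with k = 2: the first and third labels are found, the second is not.
acc = topk_accuracy([[1, 2], [0, 3], [4, 5]], [[2], [1], [4]])
```

This corresponds to the formula in the operator documentation: the number of correct predictions divided by the number of samples.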
...@@ -21,9 +21,10 @@ namespace paddle { ...@@ -21,9 +21,10 @@ namespace paddle {
namespace operators { namespace operators {
using platform::PADDLE_CUDA_NUM_THREADS; using platform::PADDLE_CUDA_NUM_THREADS;
template <typename T, int BlockSize> template <int BlockSize>
__global__ void AccuracyCudaKernel(const int N, const int D, const T* Xdata, __global__ void AccuracyCudaKernel(const int N, const int D,
const T* labeldata, float* accuracy) { const int64_t* Xdata,
const int64_t* labeldata, float* accuracy) {
int count = 0; int count = 0;
__shared__ int total[BlockSize]; __shared__ int total[BlockSize];
...@@ -52,13 +53,14 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -52,13 +53,14 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace."); "It must use GPUPlace.");
auto* inference = ctx.Input<Tensor>("Inference"); auto* inference = ctx.Input<Tensor>("Out");
auto* indices = ctx.Input<Tensor>("Indices");
auto* label = ctx.Input<Tensor>("Label"); auto* label = ctx.Input<Tensor>("Label");
auto* accuracy = ctx.Output<Tensor>("Accuracy"); auto* accuracy = ctx.Output<Tensor>("Accuracy");
// FIXME(typhoonzero): only support indices currently // FIXME(typhoonzero): only support indices currently
// if add support for output values, how to detect the data type? // if add support for output values, how to detect the data type?
const T* inference_data = inference->data<T>(); const int64_t* indices_data = indices->data<int64_t>();
const T* label_data = label->data<T>(); const int64_t* label_data = label->data<int64_t>();
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace()); float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
size_t num_samples = inference->dims()[0]; size_t num_samples = inference->dims()[0];
...@@ -69,11 +71,11 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -69,11 +71,11 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
return; return;
} }
AccuracyCudaKernel<T, PADDLE_CUDA_NUM_THREADS><<< AccuracyCudaKernel<PADDLE_CUDA_NUM_THREADS><<<
1, PADDLE_CUDA_NUM_THREADS, 0, 1, PADDLE_CUDA_NUM_THREADS, 0,
reinterpret_cast<const platform::CUDADeviceContext&>( reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
.stream()>>>(num_samples, infer_width, inference_data, label_data, .stream()>>>(num_samples, infer_width, indices_data, label_data,
accuracy_data); accuracy_data);
} }
}; };
...@@ -81,5 +83,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -81,5 +83,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
REGISTER_OP_GPU_KERNEL(accuracy, paddle::operators::AccuracyOpCUDAKernel<int>, // FIXME(typhoonzero): types of T is for inference data.
paddle::operators::AccuracyOpCUDAKernel<int64_t>); // label data is always int
REGISTER_OP_GPU_KERNEL(accuracy, paddle::operators::AccuracyOpCUDAKernel<float>,
paddle::operators::AccuracyOpCUDAKernel<double>);
...@@ -38,14 +38,15 @@ template <typename Place, typename T> ...@@ -38,14 +38,15 @@ template <typename Place, typename T>
class AccuracyKernel : public framework::OpKernel<T> { class AccuracyKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* inference = ctx.Input<Tensor>("Inference"); auto* inference = ctx.Input<Tensor>("Out");
auto* indices = ctx.Input<Tensor>("Indices");
auto* label = ctx.Input<Tensor>("Label"); auto* label = ctx.Input<Tensor>("Label");
auto* accuracy = ctx.Output<Tensor>("Accuracy"); auto* accuracy = ctx.Output<Tensor>("Accuracy");
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace()); float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
const T* inference_data = inference->data<T>(); const int64_t* indices_data = indices->data<int64_t>();
const T* label_data = label->data<T>(); const int64_t* label_data = label->data<int64_t>();
size_t num_samples = inference->dims()[0]; size_t num_samples = inference->dims()[0];
size_t class_dim = inference->dims()[1]; size_t class_dim = inference->dims()[1];
...@@ -60,7 +61,7 @@ class AccuracyKernel : public framework::OpKernel<T> { ...@@ -60,7 +61,7 @@ class AccuracyKernel : public framework::OpKernel<T> {
for (size_t i = 0; i < num_samples; ++i) { for (size_t i = 0; i < num_samples; ++i) {
PADDLE_ENFORCE_GE(label_data[i], 0, "label must >= 0"); PADDLE_ENFORCE_GE(label_data[i], 0, "label must >= 0");
for (size_t j = 0; j < class_dim; ++j) { for (size_t j = 0; j < class_dim; ++j) {
if (inference_data[i * class_dim + j] == label_data[i]) { if (indices_data[i * class_dim + j] == label_data[i]) {
++num_correct; ++num_correct;
break; break;
} }
......
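The CPU accuracy loop above counts a sample as correct when any of its top-k indices equals the label. A minimal standalone sketch of that counting logic follows. `TopKAccuracy` and its `std::vector` arguments are hypothetical names chosen for illustration; they are not part of Paddle, which reads the data from `Tensor` inputs instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone helper; not part of the Paddle code base.
// indices: row-major [num_samples x k] class ids, e.g. the index output of top-k.
// labels:  one ground-truth class id per sample.
float TopKAccuracy(const std::vector<int64_t>& indices,
                   const std::vector<int64_t>& labels, std::size_t k) {
  assert(k > 0 && indices.size() == labels.size() * k);
  if (labels.empty()) return 0.0f;

  std::size_t num_correct = 0;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    // A sample counts once, even if the label could match several columns.
    for (std::size_t j = 0; j < k; ++j) {
      if (indices[i * k + j] == labels[i]) {
        ++num_correct;
        break;
      }
    }
  }
  return static_cast<float>(num_correct) / static_cast<float>(labels.size());
}
```

With four samples and k = 2, three of which contain their label among the two indices, the function returns 0.75.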
...@@ -547,6 +547,7 @@ struct ELUGradFunctor : public BaseActivationFunctor<T> { ...@@ -547,6 +547,7 @@ struct ELUGradFunctor : public BaseActivationFunctor<T> {
} }
}; };
// FIXME(qijun) https://github.com/PaddlePaddle/Paddle/issues/5198
template <typename T> template <typename T>
struct PowFunctor : public BaseActivationFunctor<T> { struct PowFunctor : public BaseActivationFunctor<T> {
float factor; float factor;
......
...@@ -23,18 +23,26 @@ class AucOp : public framework::OperatorWithKernel { ...@@ -23,18 +23,26 @@ class AucOp : public framework::OperatorWithKernel {
protected: protected:
void InferShape(framework::InferShapeContext *ctx) const override { void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("Inference"), PADDLE_ENFORCE(ctx->HasInput("Out"), "Input of Out must be initialized.");
"Input of Inference must be initialized."); PADDLE_ENFORCE(ctx->HasInput("Indices"),
"Input of Indices must be initialized.");
PADDLE_ENFORCE(ctx->HasInput("Label"), PADDLE_ENFORCE(ctx->HasInput("Label"),
"Input of Label must be initialized."); "Input of Label must be initialized.");
auto inference_dim = ctx->GetInputDim("Inference"); auto inference_height = ctx->GetInputDim("Out")[0];
auto label_dim = ctx->GetInputDim("Label"); auto label_height = ctx->GetInputDim("Label")[0];
PADDLE_ENFORCE_EQ(inference_dim, label_dim, PADDLE_ENFORCE_EQ(inference_height, label_height,
"inference and label should have same shape"); "Out and Label should have same height.");
ctx->SetOutputDim("AUC", {1}); ctx->SetOutputDim("AUC", {1});
ctx->ShareLoD("Inference", /*->*/ "AUC"); ctx->ShareLoD("Out", /*->*/ "AUC");
}
protected:
// IndicateDataType
framework::DataType IndicateDataType(
const framework::ExecutionContext &ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("Out")->type());
} }
}; };
...@@ -42,12 +50,18 @@ class AucOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -42,12 +50,18 @@ class AucOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AucOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker) AucOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Inference", AddInput("Out",
"A floating point tensor of arbitrary shape and whose values" "A floating point 2D tensor, values are in the range [0, 1]."
"are in the range [0, 1]."); "Each row is descend sorted. This input should be the"
"output of topk."
"Typically, this tensor indicates the probability of each label");
AddInput("Indices",
"An int 2D tensor, indicating the indices of original"
"tensor before sort. Typically, this tensor indicates which label"
"the probability stands for.");
AddInput("Label", AddInput("Label",
"A tensor whose shape matches " "A 2D int tensor indicating the label of the training data."
"Inference. Will be cast to bool."); "The height is batch size and width is always 1.");
// TODO(typhoonzero): support weight input // TODO(typhoonzero): support weight input
AddOutput("AUC", AddOutput("AUC",
"A scalar representing the " "A scalar representing the "
......
...@@ -29,7 +29,7 @@ template <typename Place, typename T> ...@@ -29,7 +29,7 @@ template <typename Place, typename T>
class AucKernel : public framework::OpKernel<T> { class AucKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* inference = ctx.Input<Tensor>("Inference"); auto* inference = ctx.Input<Tensor>("Out");
auto* label = ctx.Input<Tensor>("Label"); auto* label = ctx.Input<Tensor>("Label");
auto* auc = ctx.Output<Tensor>("AUC"); auto* auc = ctx.Output<Tensor>("AUC");
...@@ -46,18 +46,11 @@ class AucKernel : public framework::OpKernel<T> { ...@@ -46,18 +46,11 @@ class AucKernel : public framework::OpKernel<T> {
thresholds_list[0] = 0.0f - kEpsilon; thresholds_list[0] = 0.0f - kEpsilon;
thresholds_list[num_thresholds - 1] = 1.0f + kEpsilon; thresholds_list[num_thresholds - 1] = 1.0f + kEpsilon;
size_t num_samples = inference->numel(); size_t batch_size = inference->dims()[0];
size_t inference_width = inference->dims()[1];
const T* inference_data = inference->data<T>(); const T* inference_data = inference->data<T>();
Tensor label_casted; const int64_t* label_data = label->data<int64_t>();
label_casted.Resize(label->dims());
bool* label_casted_data = label_casted.mutable_data<bool>(ctx.GetPlace());
const int* label_data = label->data<int>();
// cast label_data to bool
for (size_t i = 0; i < num_samples; i++) {
label_casted_data[i] = static_cast<bool>(label_data[i]);
}
// Create local tensor for storing the curve: TP, FN, TN, FP // Create local tensor for storing the curve: TP, FN, TN, FP
// TODO(typhoonzero): use eigen op to calculate these values. // TODO(typhoonzero): use eigen op to calculate these values.
...@@ -68,23 +61,27 @@ class AucKernel : public framework::OpKernel<T> { ...@@ -68,23 +61,27 @@ class AucKernel : public framework::OpKernel<T> {
true_negative.Resize({num_thresholds}); true_negative.Resize({num_thresholds});
false_positive.Resize({num_thresholds}); false_positive.Resize({num_thresholds});
int* tp_data = true_positive.mutable_data<int>(ctx.GetPlace()); int64_t* tp_data = true_positive.mutable_data<int64_t>(ctx.GetPlace());
int* fn_data = false_negative.mutable_data<int>(ctx.GetPlace()); int64_t* fn_data = false_negative.mutable_data<int64_t>(ctx.GetPlace());
int* tn_data = true_negative.mutable_data<int>(ctx.GetPlace()); int64_t* tn_data = true_negative.mutable_data<int64_t>(ctx.GetPlace());
int* fp_data = false_positive.mutable_data<int>(ctx.GetPlace()); int64_t* fp_data = false_positive.mutable_data<int64_t>(ctx.GetPlace());
for (int idx_thresh = 0; idx_thresh < num_thresholds; idx_thresh++) { for (int idx_thresh = 0; idx_thresh < num_thresholds; idx_thresh++) {
// calculate TP, FN, TN, FP for current thresh // calculate TP, FN, TN, FP for current thresh
int tp = 0, fn = 0, tn = 0, fp = 0; int64_t tp = 0, fn = 0, tn = 0, fp = 0;
for (size_t i = 0; i < num_samples; i++) { for (size_t i = 0; i < batch_size; i++) {
if (label_casted_data[i]) { // NOTE: label_data used as bool, labels >0 will be treated as true.
if (inference_data[i] >= (thresholds_list[idx_thresh])) { if (label_data[i]) {
// use first(max) data in each row
if (inference_data[i * inference_width] >=
(thresholds_list[idx_thresh])) {
tp++; tp++;
} else { } else {
fn++; fn++;
} }
} else { } else {
if (inference_data[i] >= (thresholds_list[idx_thresh])) { if (inference_data[i * inference_width] >=
(thresholds_list[idx_thresh])) {
fp++; fp++;
} else { } else {
tn++; tn++;
......
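The updated AUC loop compares only the first value of each row of `Out` against the threshold (the maximum, given that rows are sorted in descending order) and treats a nonzero label as positive. The sketch below reproduces that per-threshold counting in plain C++. `ConfusionCounts` and `CountAtThreshold` are hypothetical names used only for illustration, and `std::vector` stands in for the tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the four counters; not a Paddle type.
struct ConfusionCounts {
  int64_t tp = 0;
  int64_t fn = 0;
  int64_t tn = 0;
  int64_t fp = 0;
};

// out:    row-major [batch_size x width] scores, each row sorted descending.
// labels: one label per row; nonzero is treated as positive.
ConfusionCounts CountAtThreshold(const std::vector<float>& out,
                                 const std::vector<int64_t>& labels,
                                 std::size_t width, float threshold) {
  assert(width > 0 && out.size() == labels.size() * width);
  ConfusionCounts counts;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    // Only the first (largest) score of the row is compared.
    const bool predicted_positive = out[i * width] >= threshold;
    const bool is_positive = labels[i] != 0;
    if (is_positive) {
      predicted_positive ? ++counts.tp : ++counts.fn;
    } else {
      predicted_positive ? ++counts.fp : ++counts.tn;
    }
  }
  return counts;
}
```

Calling this once per entry of the threshold list yields the TP, FN, TN and FP curves that the kernel stores in its local tensors.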
...@@ -16,36 +16,36 @@ limitations under the License. */ ...@@ -16,36 +16,36 @@ limitations under the License. */
#include "paddle/framework/eigen.h" #include "paddle/framework/eigen.h"
#include "paddle/framework/lod_tensor.h" #include "paddle/framework/lod_tensor.h"
#include "paddle/framework/tensor.h"
#include "paddle/operators/math/im2col.h" #include "paddle/operators/math/im2col.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
namespace math { namespace math {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
template <typename T, int MajorType = Eigen::RowMajor, template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex> typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
/* /*
* \brief Context projection concatenate features in adjacent time steps in * \brief Context projection concatenates features in adjacent time-steps in
* a sequence. The i-th row of the output is the concatenation of * a sequence. The i-th row of the output is the concatenation of
* context_length rows of the input. The context_length rows are the * context_length rows of the input. The context_length rows are the
* consecutive rows from the i+shift_start row. * consecutive rows from the i+shift_start row.
* ContextProjectGradFunctor is the inverse process of ContextProjectFunctor.
*
* \param in Input data. * \param in Input data.
* \param Shape The shape of Input data, * \param Shape The shape of Input data:
* [minibatch, number_of_input_features]. * [mini-batch, input_hidden_size].
* \param type A float LoDTensor.
* *
* \param padding_data Padding data. * \param padding_data Padding data.
* \param Shape The shape of Padding data, * \param Shape The shape of Padding data:
* [up_pad + down_pad, number_of_input_features]. * [up_pad + down_pad, input_hidden_size].
* \param type A float Tensor.
* *
* \param col Col data. * \param col Col data.
* \param Shape The shape of Col data, * \param Shape The shape of Col data:
* [minibatch, context_length * number_of_input_features]. * [mini-batch, context_length * input_hidden_size].
* \param type A float Tensor.
* *
* For a mini-batch of 2 variable-length sentences, containing 3 and 1 * For a mini-batch of 2 variable-length sentences, containing 3 and 1
* time-steps: * time-steps:
...@@ -65,7 +65,7 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; ...@@ -65,7 +65,7 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
* - Case1: * - Case1:
* If context_start is -1 and padding_trainable is false, we use zero to pad * If context_start is -1 and padding_trainable is false, we use zero to pad
* instead of learned weight to pad, * instead of learned weight to pad,
* and the context_lenth is 3, the output (Out) is: * and the context_length is 3, the output (Out) is:
* *
* Out =[[0, 0, a1, a2, b1, b2; * Out =[[0, 0, a1, a2, b1, b2;
* a1, a2, b1, b2, c1, c2; * a1, a2, b1, b2, c1, c2;
...@@ -75,7 +75,7 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; ...@@ -75,7 +75,7 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
* - Case2: * - Case2:
* If context_start is -1 and padding_trainable is true, we use learned weight * If context_start is -1 and padding_trainable is true, we use learned weight
* to pad, * to pad,
* and the context_lenth is 3, the output (Out) is: * and the context_length is 3, the output (Out) is:
* *
* Out = [[w1, w2, a1, a2, b1, b2; * Out = [[w1, w2, a1, a2, b1, b2;
* a1, a2, b1, b2, c1, c2; * a1, a2, b1, b2, c1, c2;
...@@ -87,48 +87,146 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; ...@@ -87,48 +87,146 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename Place, typename T> template <typename Place, typename T>
class ContextProjectFunctor { class ContextProjectFunctor {
public: public:
void operator()(const platform::DeviceContext& context, void operator()(const platform::DeviceContext& context, const LoDTensor& in,
framework::LoDTensor& in, framework::Tensor& padding_data, const Tensor& padding_data, Tensor& col,
framework::Tensor& col, bool padding_trainable, bool padding_trainable, int context_start, int context_length,
int context_start, int context_length, int context_stride, int context_stride, int up_pad, int down_pad) {
int up_pad, int down_pad, bool gradient, bool input_grad,
bool pad_grad) {
auto lod_level_0 = in.lod()[0]; auto lod_level_0 = in.lod()[0];
paddle::operators::math::Im2ColFunctor< math::Im2ColFunctor<math::ColFormat::kOCF, Place, float> im2col_ocf;
paddle::operators::math::ColFormat::kOCF, Place, float>
im2col_ocf;
paddle::operators::math::Col2ImFunctor<
paddle::operators::math::ColFormat::kOCF, Place, float>
col2im_ocf;
int input_row_begin, input_row_end; int input_row_begin, input_row_end;
int sequence_height, sequence_width; int sequence_height, sequence_width;
sequence_width = in.dims()[1]; sequence_width = in.dims()[1];
input_grad = gradient && input_grad;
pad_grad = gradient && pad_grad;
if (!gradient || input_grad) {
for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) {
input_row_begin = (context_start > 0) input_row_begin = (context_start > 0)
? static_cast<int>(lod_level_0[i]) + context_start ? static_cast<int>(lod_level_0[i]) + context_start
: static_cast<int>(lod_level_0[i]); : static_cast<int>(lod_level_0[i]);
input_row_end = static_cast<int>(lod_level_0[i + 1]); input_row_end = static_cast<int>(lod_level_0[i + 1]);
framework::Tensor out_t = Tensor out_t = col.Slice(static_cast<int>(lod_level_0[i]),
col.Slice(static_cast<int>(lod_level_0[i]),
static_cast<int>(lod_level_0[i + 1])); static_cast<int>(lod_level_0[i + 1]));
sequence_height = static_cast<int>(out_t.dims()[0]); sequence_height = static_cast<int>(out_t.dims()[0]);
if (input_row_begin < input_row_end) { if (input_row_begin < input_row_end) {
framework::Tensor in_t = in.Slice(input_row_begin, input_row_end); Tensor in_t = in.Slice(input_row_begin, input_row_end);
std::vector<int64_t> output_shape( std::vector<int64_t> output_shape(
{sequence_height, 1, 1, context_length, {sequence_height, 1, 1, context_length,
sequence_width}); // output_height, output_width, sequence_width}); // output_height, output_width,
// input_channels, filter_height, filter_width // input_channels, filter_height, filter_width
out_t.Resize(framework::make_ddim(output_shape));
std::vector<int64_t> input_shape(
{1, input_row_end - input_row_begin,
sequence_width}); // input_channels, input_height, input_width
in_t.Resize(framework::make_ddim(input_shape));
im2col_ocf(context, in_t, out_t,
/*stride_height*/ context_stride, /*stride_width*/ 1, up_pad,
down_pad, 0, 0);
out_t.Resize({sequence_height, context_length * sequence_width});
}
}
if (padding_trainable) {
for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) {
Tensor out_t = col.Slice(static_cast<int>(lod_level_0[i]),
static_cast<int>(lod_level_0[i + 1]));
sequence_height = static_cast<int>(out_t.dims()[0]);
// add up trainable data
out_t.Resize({sequence_height * context_length, sequence_width});
if (up_pad > 0) { // add up pad
int padding_rows = std::min(
up_pad, static_cast<int>(lod_level_0[i + 1] - lod_level_0[i]));
for (int k = 0; k < padding_rows; ++k) {
int padding_size =
k + context_length < up_pad ? context_length : up_pad - k;
Tensor out_t_sub = out_t.Slice(k * context_length,
k * context_length + padding_size);
Tensor w_sub = padding_data.Slice(k, k + padding_size);
auto out_t_sub_e = EigenMatrix<T>::From(out_t_sub);
auto w_sub_e = EigenMatrix<T>::From(w_sub);
out_t_sub_e.device(*context.GetEigenDevice<Place>()) = w_sub_e;
}
}
if (down_pad > 0) { // add down pad
int down_pad_begin_row =
std::max(0,
(sequence_height - context_start - context_length) + 1) +
1;
int padding_begin = std::max(0, context_start - sequence_height);
int padding_size =
sequence_height - context_start >= context_length
? 1
: context_length - (sequence_height - context_start);
if (context_start >= sequence_height) padding_size = context_length;
int padding_idx = padding_begin;
for (int t = 0; t + down_pad_begin_row <= sequence_height;
++t, ++padding_size) {
if (context_start >= sequence_height) padding_size = context_length;
if (padding_size > context_length) {
padding_size = context_length;
padding_idx++;
}
if (padding_begin > 0 || sequence_height == context_start)
padding_idx = padding_begin + t;
Tensor out_t_sub = out_t.Slice(
(down_pad_begin_row + t) * context_length - padding_size,
(down_pad_begin_row + t) * context_length);
Tensor w_sub = padding_data.Slice(
up_pad + padding_idx, up_pad + padding_idx + padding_size);
auto out_t_sub_e = EigenMatrix<T>::From(out_t_sub);
auto w_sub_e = EigenMatrix<T>::From(w_sub);
out_t_sub_e.device(*context.GetEigenDevice<Place>()) = w_sub_e;
}
}
out_t.Resize({sequence_height, context_length * sequence_width});
}
}
}
};
template <typename Place, typename T>
class ContextProjectGradFunctor {
public:
void operator()(const platform::DeviceContext& context, LoDTensor& in,
Tensor& padding_data, Tensor& col, bool padding_trainable,
int context_start, int context_length, int context_stride,
int up_pad, int down_pad, bool input_grad, bool pad_grad) {
auto lod_level_0 = in.lod()[0];
math::Col2ImFunctor<math::ColFormat::kOCF, Place, float> col2im_ocf;
int input_row_begin, input_row_end;
int sequence_height, sequence_width;
sequence_width = in.dims()[1];
if (input_grad) {
for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) {
input_row_begin = (context_start > 0)
? static_cast<int>(lod_level_0[i]) + context_start
: static_cast<int>(lod_level_0[i]);
input_row_end = static_cast<int>(lod_level_0[i + 1]);
Tensor out_t = col.Slice(static_cast<int>(lod_level_0[i]),
static_cast<int>(lod_level_0[i + 1]));
sequence_height = static_cast<int>(out_t.dims()[0]);
if (input_row_begin < input_row_end) {
Tensor in_t = in.Slice(input_row_begin, input_row_end);
std::vector<int64_t> output_shape(
{sequence_height, 1, 1, context_length,
sequence_width}); // output_height, output_width,
// input_channels, filter_height, filter_width
out_t.Resize(framework::make_ddim(output_shape)); out_t.Resize(framework::make_ddim(output_shape));
std::vector<int64_t> input_shape( std::vector<int64_t> input_shape(
...@@ -136,53 +234,39 @@ class ContextProjectFunctor { ...@@ -136,53 +234,39 @@ class ContextProjectFunctor {
sequence_width}); // input_channels, input_height, input_width sequence_width}); // input_channels, input_height, input_width
in_t.Resize(framework::make_ddim(input_shape)); in_t.Resize(framework::make_ddim(input_shape));
if (gradient) {
col2im_ocf(context, in_t, out_t, col2im_ocf(context, in_t, out_t,
/*stride_height*/ context_stride, /*stride_width*/ 1, /*stride_height*/ context_stride, /*stride_width*/ 1,
up_pad, down_pad, 0, 0); up_pad, down_pad, 0, 0);
} else {
im2col_ocf(context, in_t, out_t,
/*stride_height*/ context_stride, /*stride_width*/ 1,
up_pad, down_pad, 0, 0);
}
out_t.Resize({sequence_height, context_length * sequence_width}); out_t.Resize({sequence_height, context_length * sequence_width});
} }
} }
} }
if (!gradient || pad_grad) { if (pad_grad) {
if (padding_trainable) { if (padding_trainable) {
for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) {
framework::Tensor out_t = Tensor out_t = col.Slice(static_cast<int>(lod_level_0[i]),
col.Slice(static_cast<int>(lod_level_0[i]),
static_cast<int>(lod_level_0[i + 1])); static_cast<int>(lod_level_0[i + 1]));
sequence_height = static_cast<int>(out_t.dims()[0]); sequence_height = static_cast<int>(out_t.dims()[0]);
// add up trainable data
out_t.Resize({sequence_height * context_length, sequence_width}); out_t.Resize({sequence_height * context_length, sequence_width});
if (up_pad > 0) { // add up pad if (up_pad > 0) {
int padding_rows = std::min( int padding_rows = std::min(
up_pad, static_cast<int>(lod_level_0[i + 1] - lod_level_0[i])); up_pad, static_cast<int>(lod_level_0[i + 1] - lod_level_0[i]));
for (int k = 0; k < padding_rows; ++k) { for (int k = 0; k < padding_rows; ++k) {
int padding_size = int padding_size =
k + context_length < up_pad ? context_length : up_pad - k; k + context_length < up_pad ? context_length : up_pad - k;
framework::Tensor out_t_sub = out_t.Slice( Tensor out_t_sub = out_t.Slice(k * context_length,
k * context_length, k * context_length + padding_size); k * context_length + padding_size);
framework::Tensor w_sub = padding_data.Slice(k, k + padding_size); Tensor w_sub = padding_data.Slice(k, k + padding_size);
// in this block, using EigenVector<T>::Flatten is ok too.
auto out_t_sub_e = EigenMatrix<T>::From(out_t_sub); auto out_t_sub_e = EigenMatrix<T>::From(out_t_sub);
auto w_sub_e = EigenMatrix<T>::From(w_sub); auto w_sub_e = EigenMatrix<T>::From(w_sub);
if (gradient) {
w_sub_e.device(*context.GetEigenDevice<Place>()) = w_sub_e.device(*context.GetEigenDevice<Place>()) =
w_sub_e + out_t_sub_e; w_sub_e + out_t_sub_e;
} else {
out_t_sub_e.device(*context.GetEigenDevice<Place>()) = w_sub_e;
}
} }
} }
if (down_pad > 0) { // add down pad if (down_pad > 0) {
int down_pad_begin_row = int down_pad_begin_row =
std::max( std::max(
0, (sequence_height - context_start - context_length) + 1) + 0, (sequence_height - context_start - context_length) + 1) +
...@@ -204,19 +288,16 @@ class ContextProjectFunctor { ...@@ -204,19 +288,16 @@ class ContextProjectFunctor {
} }
if (padding_begin > 0 || sequence_height == context_start) if (padding_begin > 0 || sequence_height == context_start)
padding_idx = padding_begin + t; padding_idx = padding_begin + t;
framework::Tensor out_t_sub = out_t.Slice(
Tensor out_t_sub = out_t.Slice(
(down_pad_begin_row + t) * context_length - padding_size, (down_pad_begin_row + t) * context_length - padding_size,
(down_pad_begin_row + t) * context_length); (down_pad_begin_row + t) * context_length);
framework::Tensor w_sub = padding_data.Slice( Tensor w_sub = padding_data.Slice(
up_pad + padding_idx, up_pad + padding_idx + padding_size); up_pad + padding_idx, up_pad + padding_idx + padding_size);
auto out_t_sub_e = EigenMatrix<T>::From(out_t_sub); auto out_t_sub_e = EigenMatrix<T>::From(out_t_sub);
auto w_sub_e = EigenMatrix<T>::From(w_sub); auto w_sub_e = EigenMatrix<T>::From(w_sub);
if (gradient) {
w_sub_e.device(*context.GetEigenDevice<Place>()) = w_sub_e.device(*context.GetEigenDevice<Place>()) =
w_sub_e + out_t_sub_e; w_sub_e + out_t_sub_e;
} else {
out_t_sub_e.device(*context.GetEigenDevice<Place>()) = w_sub_e;
}
} }
} }
out_t.Resize({sequence_height, context_length * sequence_width}); out_t.Resize({sequence_height, context_length * sequence_width});
......
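The doc comment in this file describes context projection with zero padding (Case1). A minimal reference sketch of that case, assuming a stride of 1 and no trainable padding, is given below. `ContextProjectZeroPad` is a hypothetical name, and plain `std::vector` buffers with a level-0 LoD offset list replace `LoDTensor`; the real functor delegates the copy to `Im2ColFunctor`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference implementation; not part of Paddle.
// in:    row-major [total_rows x width] input.
// lod:   level-0 sequence offsets, e.g. {0, 3, 4} for lengths 3 and 1.
// Returns row-major [total_rows x (context_length * width)] output where
// positions that fall outside the current sequence stay zero.
std::vector<float> ContextProjectZeroPad(const std::vector<float>& in,
                                         const std::vector<int64_t>& lod,
                                         int64_t width, int64_t context_start,
                                         int64_t context_length) {
  assert(!lod.empty() && width > 0 && context_length > 0);
  assert(static_cast<int64_t>(in.size()) == lod.back() * width);

  const int64_t rows = lod.back();
  std::vector<float> out(
      static_cast<std::size_t>(rows * context_length * width), 0.0f);

  for (std::size_t s = 0; s + 1 < lod.size(); ++s) {
    const int64_t begin = lod[s];
    const int64_t end = lod[s + 1];
    for (int64_t i = begin; i < end; ++i) {
      for (int64_t c = 0; c < context_length; ++c) {
        const int64_t src = i + context_start + c;
        if (src < begin || src >= end) continue;  // padding slot, left at zero
        for (int64_t w = 0; w < width; ++w) {
          out[(i * context_length + c) * width + w] = in[src * width + w];
        }
      }
    }
  }
  return out;
}
```

For two sequences of lengths 3 and 1 with two features per step, `context_start = -1` and `context_length = 3`, the first output row is `[0, 0, a1, a2, b1, b2]`, matching the Case1 example in the comment.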
...@@ -48,7 +48,7 @@ class SeqExpandKernel : public framework::OpKernel<T> { ...@@ -48,7 +48,7 @@ class SeqExpandKernel : public framework::OpKernel<T> {
x_t(x_data, 1, element_len); x_t(x_data, 1, element_len);
Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor, Eigen::DenseIndex>> Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor, Eigen::DenseIndex>>
out_t(out_data, scale, element_len); out_t(out_data, scale, element_len);
Eigen::array<int, 2> cast({scale, 1}); Eigen::array<int, 2> cast({{scale, 1}});
out_t.device(place) = x_t.broadcast(cast); out_t.device(place) = x_t.broadcast(cast);
x_data += element_len; x_data += element_len;
out_data += element_len * scale; out_data += element_len * scale;
......
...@@ -30,19 +30,20 @@ class SequenceConvOp : public framework::OperatorWithKernel { ...@@ -30,19 +30,20 @@ class SequenceConvOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE(ctx->HasOutput("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of SequenceConvOp should not be null."); "Output(Out) of SequenceConvOp should not be null.");
int context_length = ctx->Attrs().Get<int>("context_length"); int context_length = ctx->Attrs().Get<int>("contextLength");
bool padding_trainable = ctx->Attrs().Get<bool>("padding_trainable"); int context_start = ctx->Attrs().Get<int>("contextStart");
int context_start = ctx->Attrs().Get<int>("context_start");
auto in_dims = ctx->GetInputDim("X"); auto in_dims = ctx->GetInputDim("X");
auto filter_dims = ctx->GetInputDim("Filter"); auto filter_dims = ctx->GetInputDim("Filter");
PADDLE_ENFORCE(ctx->Attrs().Get<int>("contextStride") == 1,
"Currently, SequenceConvOp only supports contextStride=1.");
PADDLE_ENFORCE(in_dims.size() == 2 && filter_dims.size() == 2, PADDLE_ENFORCE(in_dims.size() == 2 && filter_dims.size() == 2,
"Input(X, Filter) should be 2-D tensor."); "Input(X, Filter) should be 2-D tensor.");
PADDLE_ENFORCE(filter_dims[0] == context_length * in_dims[1], PADDLE_ENFORCE(filter_dims[0] == context_length * in_dims[1],
"Filter's height should be context_length * " "Filter's height should be context_length * "
"number_of_input_features ."); "input_hidden_size .");
if (padding_trainable) { if (ctx->Attrs().Get<bool>("paddingTrainable")) {
PADDLE_ENFORCE( PADDLE_ENFORCE(
ctx->HasInput("PaddingData"), ctx->HasInput("PaddingData"),
"Input(PaddingData) of SequenceConvOp should not be null."); "Input(PaddingData) of SequenceConvOp should not be null.");
...@@ -54,7 +55,7 @@ class SequenceConvOp : public framework::OperatorWithKernel { ...@@ -54,7 +55,7 @@ class SequenceConvOp : public framework::OperatorWithKernel {
if (context_start == 0 && context_length == 1) { if (context_start == 0 && context_length == 1) {
PADDLE_THROW( PADDLE_THROW(
"If context_start is 0 and context_length is 1, padding_trainable " "If context_start is 0 and context_length is 1, paddingTrainable "
"should be false."); "should be false.");
} }
PADDLE_ENFORCE(padding_dim.size() == 2, PADDLE_ENFORCE(padding_dim.size() == 2,
...@@ -81,13 +82,14 @@ class SequenceConvGradOp : public framework::OperatorWithKernel { ...@@ -81,13 +82,14 @@ class SequenceConvGradOp : public framework::OperatorWithKernel {
"Gradient of output(Out) should not be null."); "Gradient of output(Out) should not be null.");
PADDLE_ENFORCE(ctx->HasInput("X"), "The input(X) should not be null."); PADDLE_ENFORCE(ctx->HasInput("X"), "The input(X) should not be null.");
if (ctx->Attrs().Get<bool>("padding_trainable") && if (ctx->Attrs().Get<bool>("paddingTrainable") &&
ctx->HasOutput(framework::GradVarName("PaddingData"))) { ctx->HasOutput(framework::GradVarName("PaddingData"))) {
ctx->SetOutputDim(framework::GradVarName("PaddingData"), ctx->SetOutputDim(framework::GradVarName("PaddingData"),
ctx->GetInputDim("PaddingData")); ctx->GetInputDim("PaddingData"));
} }
if (ctx->HasOutput(framework::GradVarName("X"))) { if (ctx->HasOutput(framework::GradVarName("X"))) {
ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X")); ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
ctx->ShareLoD(framework::GradVarName("X"), "X");
} }
if (ctx->HasOutput(framework::GradVarName("Filter"))) { if (ctx->HasOutput(framework::GradVarName("Filter"))) {
ctx->SetOutputDim(framework::GradVarName("Filter"), ctx->SetOutputDim(framework::GradVarName("Filter"),
...@@ -105,54 +107,58 @@ class SequenceConvOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -105,54 +107,58 @@ class SequenceConvOpMaker : public framework::OpProtoAndCheckerMaker {
"X", "X",
"(LoDTensor) the input(X) is a LodTensor, which support " "(LoDTensor) the input(X) is a LodTensor, which support "
"variable-time length input sequence. The underlying tensor in " "variable-time length input sequence. The underlying tensor in "
"this LoDTensor is a matrix with shape (T, D), where, T is the " "this LoDTensor is a matrix with shape (T, N), where, T is the "
"total time steps in this mini-batch, D is the input feature size."); "total time steps in this mini-batch, N is the input_hidden_size.");
AddInput("PaddingData", AddInput("PaddingData",
"(Tensor, optional) the input(PaddingData) is an optional " "(Tensor, optional) the input(PaddingData) is an optional "
"parameter, and it is learnable. " "parameter, and it is learnable. "
"This is a tensor with shape (N, D), where N is the " "This is a tensor with shape (P, N), where P is the "
"top_pad + bottom_pad, D is the input feature size. In order to " "top_pad + bottom_pad, N is the input_hidden_size. In order to "
"ensure the equal length of sequence before and after " "ensure the equal length of sequence before and after "
"convolution, it is necessary to fill the top and bottom of each " "convolution, it is necessary to fill the top and bottom of each "
"sequence according to context_length, context_stride and " "sequence according to context_length, context_stride and "
"context_start") "context_start")
.AsDispensable(); .AsDispensable();
AddInput("Filter", AddInput(
"Filter",
"(Tensor) the input(Filter) is an learnable parameter." "(Tensor) the input(Filter) is an learnable parameter."
"This is a tensor with shape (N, D), where N is the " "This is a tensor with shape (K, M), where K is the "
"context_length, D is the output feature size."); "context_length * input_hidden_size, M is the output feature size.");
AddOutput( AddOutput(
"Out", "Out",
"(LoDTensor) the output(Out) is a LoDTensor, which supports " "(LoDTensor) the output(Out) is a LoDTensor, which supports "
"variable-time length output sequence. The underlying tensor in " "variable-time length output sequence. The underlying tensor in "
"this LoDTensor is a matrix with shape (T, D), where, T is the " "this LoDTensor is a matrix with shape (T, M), where, T is the "
"total time steps in this mini-batch, D is the output feature size."); "total time steps in this mini-batch, M is the output feature size.");
AddAttr<bool>("padding_trainable", AddAttr<bool>("paddingTrainable",
"(bool, default false) the padding data of SequenceConvOp " "(bool, default:false) the padding data of SequenceConvOp "
"is trainable or not.") "is trainable or not.")
.SetDefault(false); .SetDefault(false);
AddAttr<int>("context_length", AddAttr<int>("contextLength",
"(int, default 3) the context_length of SequenceConvOp is the " "(int) the contextLength of SequenceConvOp is the "
"height of the convolution kernel.") "height of the convolution kernel.")
.SetDefault(3)
.GreaterThan(0); .GreaterThan(0);
AddAttr<int>("context_start", AddAttr<int>("contextStart",
"(int, default 0) the context_start of SequenceConvOp " "(int, default:0) the contextStart of SequenceConvOp "
"represents the beginning of the convolution of the number of " "represents the beginning of the convolution of the number of "
"rows of sequence, which can be negative.") "rows of sequence, which can be negative. The negative number "
"means to pad contextStart time-steps of zeros or learnable "
"parameters at the beginning of each instance. The positive "
"number means to skip contextStart time-steps of each "
"instance.")
.SetDefault(0); .SetDefault(0);
AddAttr<int>("context_stride", AddAttr<int>("contextStride",
"(int, default 1) the context_stride of SequenceConvOp " "(int, default:1) the contextStride of SequenceConvOp "
"represents the step length of convolution. " "represents the stride length of convolution kernel. "
"Currently, SequenceConvOp only supports " "Currently, SequenceConvOp only supports "
"context_stride=1.") "contextStride=1.")
.SetDefault(1) .SetDefault(1)
.GreaterThan(0); .GreaterThan(0);
AddComment(R"DOC( AddComment(R"DOC(
SequenceConvOp performs convolution operation on features of SequenceConvOp performs convolution operation on features of
context_length time-steps of each instance. contextLength time-steps of each instance.
The convolution operation calculates the output based on the input, filter The convolution operation calculates the output based on the input, filter
and strides, paddings parameters. The size of each dimension of the and strides, paddings parameters. The size of each dimension of the
parameters is checked in the infer-shape. In order to ensure the equal parameters is checked in the infer-shape. In order to ensure the equal
......
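The shapes documented in `SequenceConvOpMaker` above can be illustrated with a small pure-Python sketch. It is not the Paddle kernel: it assumes `contextStride=1`, non-trainable (zero) padding, and a column layout in which the `contextLength` neighbouring time-steps are concatenated in time order. The function name `sequence_conv_sketch` and the list-of-lists representation are hypothetical. Input `x` is (T, N), `filt` is (K, M) with K = contextLength * N, and the result is (T, M).

```python
def sequence_conv_sketch(x, lod, filt, context_start, context_length):
    """Illustrative only: x is a list of T rows (N floats each), lod holds the
    sequence offsets (e.g. [0, 3, 5]), filt is a list of K rows (M floats each)."""
    n = len(x[0])
    m = len(filt[0])
    out = []
    for s in range(len(lod) - 1):
        begin, end = lod[s], lod[s + 1]
        for t in range(begin, end):
            # Build one row of the im2col-style matrix: context_length
            # neighbouring time-steps of the same sequence, concatenated.
            # Steps that fall outside the sequence contribute zeros.
            col = []
            for j in range(context_length):
                src = t + context_start + j
                col.extend(x[src] if begin <= src < end else [0.0] * n)
            # Multiply the (1, K) row by the (K, M) filter.
            out.append([sum(col[k] * filt[k][c] for k in range(len(col)))
                        for c in range(m)])
    return out


# Two sequences of lengths 3 and 2, N = 1, M = 1, a window of 3 centred on t.
x = [[1.0], [2.0], [3.0], [4.0], [5.0]]
filt = [[1.0], [1.0], [1.0]]
result = sequence_conv_sketch(x, [0, 3, 5], filt, context_start=-1, context_length=3)
# result == [[3.0], [6.0], [5.0], [9.0], [9.0]]
```

With `context_start=-1` and `context_length=3`, the kernel's own formulas give `up_pad = max(0, -context_start) = 1` and `down_pad = max(0, context_start + context_length - 1) = 1`, which matches the one zero row the sketch implicitly adds at each end of every sequence.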
...@@ -35,12 +35,11 @@ class SequenceConvKernel : public framework::OpKernel<T> { ...@@ -35,12 +35,11 @@ class SequenceConvKernel : public framework::OpKernel<T> {
out->mutable_data<T>(context.GetPlace()); out->mutable_data<T>(context.GetPlace());
context.ShareLoD("X", "Out"); context.ShareLoD("X", "Out");
int context_start = context.Attr<int>("context_start"); int context_start = context.Attr<int>("contextStart");
int context_length = context.Attr<int>("context_length"); int context_length = context.Attr<int>("contextLength");
int context_stride = context.Attr<int>("context_stride"); int context_stride = context.Attr<int>("contextStride");
bool padding_trainable = context.Attr<bool>("padding_trainable"); bool padding_trainable = context.Attr<bool>("paddingTrainable");
// InferShape by in_lod
PADDLE_ENFORCE_EQ(in->lod().size(), 1UL, PADDLE_ENFORCE_EQ(in->lod().size(), 1UL,
"Only support one level sequence now."); "Only support one level sequence now.");
...@@ -51,26 +50,21 @@ class SequenceConvKernel : public framework::OpKernel<T> { ...@@ -51,26 +50,21 @@ class SequenceConvKernel : public framework::OpKernel<T> {
int up_pad = std::max(0, -context_start); int up_pad = std::max(0, -context_start);
int down_pad = std::max(0, context_start + context_length - 1); int down_pad = std::max(0, context_start + context_length - 1);
int sequence_width; int sequence_width = static_cast<int>(in->dims()[1]);
sequence_width = static_cast<int>(in->dims()[1]);
// Use col_shape in the im2col calculation.
framework::DDim col_shape = {in->dims()[0], framework::DDim col_shape = {in->dims()[0],
sequence_width * context_length}; context_length * sequence_width};
Tensor col; Tensor col;
col.mutable_data<T>(col_shape, context.GetPlace()); col.mutable_data<T>(col_shape, context.GetPlace());
math::SetConstant<Place, T> set_zero;
// Because if padding_trainable is false, padding data should be zeros. // Because if padding_trainable is false, padding data should be zeros.
math::SetConstant<Place, T> set_zero;
set_zero(context.device_context(), &col, static_cast<T>(0)); set_zero(context.device_context(), &col, static_cast<T>(0));
paddle::operators::math::ContextProjectFunctor<Place, T> math::ContextProjectFunctor<Place, T> seq_project_functor;
seq_project_functor;
LoDTensor* input = const_cast<LoDTensor*>(in);
Tensor* pad_data = const_cast<Tensor*>(padding_data);
seq_project_functor(context.device_context(), *input, *pad_data, col, seq_project_functor(context.device_context(), *in, *padding_data, col,
padding_trainable, context_start, context_length, padding_trainable, context_start, context_length,
context_stride, up_pad, down_pad, false, false, false); context_stride, up_pad, down_pad);
math::matmul<Place, T>(context.device_context(), col, false, filter, false, math::matmul<Place, T>(context.device_context(), col, false, filter, false,
static_cast<T>(1.0), out, static_cast<T>(0.0)); static_cast<T>(1.0), out, static_cast<T>(0.0));
...@@ -81,18 +75,18 @@ template <typename Place, typename T> ...@@ -81,18 +75,18 @@ template <typename Place, typename T>
class SequenceConvGradKernel : public framework::OpKernel<T> { class SequenceConvGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* out_g = context.Input<LoDTensor>(framework::GradVarName("Out"));
auto* in_g = context.Output<LoDTensor>(framework::GradVarName("X")); auto* in_g = context.Output<LoDTensor>(framework::GradVarName("X"));
auto* out_g = context.Input<LoDTensor>(framework::GradVarName("Out"));
auto* filter_g = context.Output<Tensor>(framework::GradVarName("Filter")); auto* filter_g = context.Output<Tensor>(framework::GradVarName("Filter"));
auto* padding_data_g = auto* padding_data_g =
context.Output<Tensor>(framework::GradVarName("PaddingData")); context.Output<Tensor>(framework::GradVarName("PaddingData"));
auto* in = context.Input<LoDTensor>("X"); auto* in = context.Input<LoDTensor>("X");
auto* filter = context.Input<Tensor>("Filter"); auto* filter = context.Input<Tensor>("Filter");
int context_start = context.Attr<int>("context_start"); int context_start = context.Attr<int>("contextStart");
int context_length = context.Attr<int>("context_length"); int context_length = context.Attr<int>("contextLength");
int context_stride = context.Attr<int>("context_stride"); int context_stride = context.Attr<int>("contextStride");
bool padding_trainable = context.Attr<bool>("padding_trainable"); bool padding_trainable = context.Attr<bool>("paddingTrainable");
PADDLE_ENFORCE_EQ(in->lod().size(), 1UL, PADDLE_ENFORCE_EQ(in->lod().size(), 1UL,
"Only support one level sequence now."); "Only support one level sequence now.");
...@@ -115,17 +109,18 @@ class SequenceConvGradKernel : public framework::OpKernel<T> { ...@@ -115,17 +109,18 @@ class SequenceConvGradKernel : public framework::OpKernel<T> {
math::matmul<Place, T>(context.device_context(), *out_g, false, *filter, math::matmul<Place, T>(context.device_context(), *out_g, false, *filter,
true, T(1.0), &col, T(1.0)); true, T(1.0), &col, T(1.0));
} }
paddle::operators::math::ContextProjectFunctor<Place, T> math::ContextProjectFunctor<Place, T> seq_project_functor;
seq_project_functor; math::ContextProjectGradFunctor<Place, T> seq_project_grad_functor;
if (in_g) { if (in_g) {
in_g->mutable_data<T>(context.GetPlace()); in_g->mutable_data<T>(context.GetPlace());
in_g->set_lod(in->lod()); in_g->set_lod(in->lod());
set_zero(context.device_context(), in_g, static_cast<T>(0)); set_zero(context.device_context(), in_g, static_cast<T>(0));
seq_project_functor(context.device_context(), *in_g, *padding_data_g, col, seq_project_grad_functor(context.device_context(), *in_g, *padding_data_g,
padding_trainable, context_start, context_length, col, padding_trainable, context_start,
context_stride, up_pad, down_pad, true, true, false); context_length, context_stride, up_pad, down_pad,
true, false);
} }
if (padding_trainable && padding_data_g) { if (padding_trainable && padding_data_g) {
...@@ -133,9 +128,10 @@ class SequenceConvGradKernel : public framework::OpKernel<T> { ...@@ -133,9 +128,10 @@ class SequenceConvGradKernel : public framework::OpKernel<T> {
set_zero(context.device_context(), padding_data_g, static_cast<T>(0)); set_zero(context.device_context(), padding_data_g, static_cast<T>(0));
LoDTensor* input = const_cast<LoDTensor*>(in); LoDTensor* input = const_cast<LoDTensor*>(in);
seq_project_functor(context.device_context(), *input, *padding_data_g, seq_project_grad_functor(context.device_context(), *input,
col, padding_trainable, context_start, context_length, *padding_data_g, col, padding_trainable,
context_stride, up_pad, down_pad, true, false, true); context_start, context_length, context_stride,
up_pad, down_pad, false, true);
} }
if (filter_g) { if (filter_g) {
...@@ -150,15 +146,9 @@ class SequenceConvGradKernel : public framework::OpKernel<T> { ...@@ -150,15 +146,9 @@ class SequenceConvGradKernel : public framework::OpKernel<T> {
padding_data = context.Input<Tensor>("PaddingData"); padding_data = context.Input<Tensor>("PaddingData");
} }
sequence_width = static_cast<int>(in->dims()[1]); seq_project_functor(context.device_context(), *in, *padding_data, col,
LoDTensor* input = const_cast<LoDTensor*>(in);
Tensor* pad_data = const_cast<Tensor*>(padding_data);
seq_project_functor(context.device_context(), *input, *pad_data, col,
padding_trainable, context_start, context_length, padding_trainable, context_start, context_length,
context_stride, up_pad, down_pad, false, false, context_stride, up_pad, down_pad);
false);
math::matmul<Place, T>(context.device_context(), col, true, out_grad, math::matmul<Place, T>(context.device_context(), col, true, out_grad,
false, T(1.0), &filter_grad, T(1.0)); false, T(1.0), &filter_grad, T(1.0));
......
...@@ -52,7 +52,11 @@ class TopkOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -52,7 +52,11 @@ class TopkOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("Out", "The output tensor of Topk op"); AddOutput("Out", "The output tensor of Topk op");
AddOutput("Indices", "The indices of Topk elements of input"); AddOutput("Indices", "The indices of Topk elements of input");
AddComment( AddComment(
R"DOC(If the input is a vector (1d tensor), finds the k largest entries in the vector and outputs their values and indices as vectors. Thus values[j] is the j-th largest entry in input, and its index is indices[j]. R"DOC(If the input is a vector (1d tensor),
finds the k largest entries in the vector
and outputs their values and indices as vectors.
Thus values[j] is the j-th largest entry in input,
and its index is indices[j].
For matrices, computes the top k entries in each row. )DOC"); For matrices, computes the top k entries in each row. )DOC");
AddAttr<int>("k", AddAttr<int>("k",
...@@ -66,6 +70,7 @@ class TopkOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -66,6 +70,7 @@ class TopkOpMaker : public framework::OpProtoAndCheckerMaker {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(top_k, ops::TopkOp, ops::TopkOpMaker); REGISTER_OPERATOR(top_k, ops::TopkOp, ops::TopkOpMaker,
paddle::framework::EmptyGradOpMaker);
REGISTER_OP_CPU_KERNEL(top_k, REGISTER_OP_CPU_KERNEL(top_k,
ops::TopkKernel<paddle::platform::CPUPlace, float>); ops::TopkKernel<paddle::platform::CPUPlace, float>);
...@@ -23,9 +23,9 @@ using Tensor = framework::Tensor; ...@@ -23,9 +23,9 @@ using Tensor = framework::Tensor;
template <typename T> template <typename T>
struct Pair { struct Pair {
__device__ __forceinline__ Pair() {} __device__ __forceinline__ Pair() {}
__device__ __forceinline__ Pair(T value, int id) : v(value), id(id) {} __device__ __forceinline__ Pair(T value, int64_t id) : v(value), id(id) {}
__device__ __forceinline__ void set(T value, int id) { __device__ __forceinline__ void set(T value, int64_t id) {
v = value; v = value;
this->id = id; this->id = id;
} }
...@@ -48,7 +48,7 @@ struct Pair { ...@@ -48,7 +48,7 @@ struct Pair {
} }
T v; T v;
int id; int64_t id;
}; };
template <typename T> template <typename T>
...@@ -197,7 +197,7 @@ __device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam, ...@@ -197,7 +197,7 @@ __device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam,
template <typename T, int MaxLength, int BlockSize> template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void BlockReduce(Pair<T>* sh_topk, int* maxid, __device__ __forceinline__ void BlockReduce(Pair<T>* sh_topk, int* maxid,
Pair<T> topk[], T** topVal, Pair<T> topk[], T** topVal,
int** topIds, int& beam, int& k, int64_t** topIds, int& beam, int& k,
const int tid, const int warp) { const int tid, const int warp) {
while (true) { while (true) {
__syncthreads(); __syncthreads();
...@@ -249,7 +249,7 @@ __device__ __forceinline__ void BlockReduce(Pair<T>* sh_topk, int* maxid, ...@@ -249,7 +249,7 @@ __device__ __forceinline__ void BlockReduce(Pair<T>* sh_topk, int* maxid,
* 4. go back to the first step until the topk values are obtained. * 4. go back to the first step until the topk values are obtained.
*/ */
template <typename T, int MaxLength, int BlockSize> template <typename T, int MaxLength, int BlockSize>
__global__ void KeMatrixTopK(T* output, int output_stride, int* indices, __global__ void KeMatrixTopK(T* output, int output_stride, int64_t* indices,
const T* src, int lds, int dim, int k) { const T* src, int lds, int dim, int k) {
__shared__ Pair<T> sh_topk[BlockSize]; __shared__ Pair<T> sh_topk[BlockSize];
__shared__ int maxid[BlockSize / 2]; __shared__ int maxid[BlockSize / 2];
...@@ -293,7 +293,7 @@ class TopkOpCUDAKernel : public framework::OpKernel<T> { ...@@ -293,7 +293,7 @@ class TopkOpCUDAKernel : public framework::OpKernel<T> {
T* output_data = output->mutable_data<T>(ctx.GetPlace()); T* output_data = output->mutable_data<T>(ctx.GetPlace());
// FIXME(typhoonzero): data is always converted to type T? // FIXME(typhoonzero): data is always converted to type T?
int* indices_data = indices->mutable_data<int>(ctx.GetPlace()); int64_t* indices_data = indices->mutable_data<int64_t>(ctx.GetPlace());
size_t input_height = input->dims()[0]; size_t input_height = input->dims()[0];
size_t input_width = input->dims()[1]; size_t input_width = input->dims()[1];
......
...@@ -40,7 +40,7 @@ class TopkKernel : public framework::OpKernel<T> { ...@@ -40,7 +40,7 @@ class TopkKernel : public framework::OpKernel<T> {
const size_t k = static_cast<int>(ctx.Attr<int>("k")); const size_t k = static_cast<int>(ctx.Attr<int>("k"));
T* output_data = output->mutable_data<T>(ctx.GetPlace()); T* output_data = output->mutable_data<T>(ctx.GetPlace());
T* indices_data = indices->mutable_data<T>(ctx.GetPlace()); int64_t* indices_data = indices->mutable_data<int64_t>(ctx.GetPlace());
auto eg_input = EigenMatrix<T>::From(*input); auto eg_input = EigenMatrix<T>::From(*input);
...@@ -66,7 +66,7 @@ class TopkKernel : public framework::OpKernel<T> { ...@@ -66,7 +66,7 @@ class TopkKernel : public framework::OpKernel<T> {
}); });
for (size_t j = 0; j < k; j++) { for (size_t j = 0; j < k; j++) {
output_data[i * k + j] = vec[j].first; output_data[i * k + j] = vec[j].first;
indices_data[i * k + j] = vec[j].second; indices_data[i * k + j] = int64_t(vec[j].second);
} }
} }
} }
......
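The `top_k` comment above says that, for a matrix, the op returns the k largest entries of each row together with their column indices, and this change stores those indices as `int64`. A minimal pure-Python sketch of that behaviour follows. The function name `top_k_sketch` is hypothetical, and the tie-breaking rule (lower index first) is an assumption of the sketch, not something the diff specifies.

```python
def top_k_sketch(matrix, k):
    """Return (values, indices) holding the k largest entries of each row."""
    values, indices = [], []
    for row in matrix:
        # Sort column ids by descending value; sorted() is stable, so equal
        # values keep their original (ascending index) order.
        order = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        values.append([row[j] for j in order])
        indices.append(order)  # plain ints here; the C++ kernels use int64_t
    return values, indices


vals, ids = top_k_sketch([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]], 2)
# vals == [[0.7, 0.2], [0.9, 0.05]]
# ids  == [[1, 2], [0, 1]]
```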
...@@ -9,7 +9,6 @@ cc_test(place_test SRCS place_test.cc DEPS place glog gflags) ...@@ -9,7 +9,6 @@ cc_test(place_test SRCS place_test.cc DEPS place glog gflags)
add_subdirectory(dynload) add_subdirectory(dynload)
cc_test(enforce_test SRCS enforce_test.cc DEPS stringpiece) cc_test(enforce_test SRCS enforce_test.cc DEPS stringpiece)
cc_test(environment_test SRCS environment_test.cc DEPS stringpiece)
IF(WITH_GPU) IF(WITH_GPU)
set(GPU_CTX_DEPS dynload_cuda dynamic_loader) set(GPU_CTX_DEPS dynload_cuda dynamic_loader)
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <stdlib.h>
#include <unistd.h>
#include <vector>
#include "paddle/platform/enforce.h"
#include "paddle/string/piece.h"
extern char** environ; // for environment variables
namespace paddle {
namespace platform {
inline void SetEnvVariable(const std::string& name, const std::string& value) {
PADDLE_ENFORCE_NE(setenv(name.c_str(), value.c_str(), 1), -1,
"Failed to set environment variable %s=%s", name, value);
}
inline void UnsetEnvVariable(const std::string& name) {
PADDLE_ENFORCE_NE(unsetenv(name.c_str()), -1,
"Failed to unset environment variable %s", name);
}
inline bool IsEnvVarDefined(const std::string& name) {
return std::getenv(name.c_str()) != nullptr;
}
inline std::string GetEnvValue(const std::string& name) {
PADDLE_ENFORCE(IsEnvVarDefined(name),
"Tried to access undefined environment variable %s", name);
return std::getenv(name.c_str());
}
inline std::vector<std::string> GetAllEnvVariables() {
std::vector<std::string> vars;
for (auto var = environ; *var != nullptr; ++var) {
auto tail = string::Index(*var, "=");
auto name = string::SubStr(*var, 0, tail).ToString();
vars.push_back(name);
}
return vars;
}
} // namespace platform
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/platform/environment.h"
#include "glog/logging.h"
#include "gtest/gtest.h"
TEST(ENVIRONMENT, ACCESS) {
namespace platform = paddle::platform;
namespace string = paddle::string;
platform::SetEnvVariable("PADDLE_USE_ENV", "TRUE");
EXPECT_TRUE(platform::IsEnvVarDefined("PADDLE_USE_ENV"));
EXPECT_EQ(platform::GetEnvValue("PADDLE_USE_ENV"), "TRUE");
platform::UnsetEnvVariable("PADDLE_USE_ENV");
EXPECT_FALSE(platform::IsEnvVarDefined("PADDLE_USE_ENV"));
platform::SetEnvVariable("PADDLE_USE_ENV1", "Hello ");
platform::SetEnvVariable("PADDLE_USE_ENV2", "World, ");
platform::SetEnvVariable("PADDLE_USE_ENV3", "PaddlePaddle!");
std::string env_info;
auto vars = platform::GetAllEnvVariables();
for_each(vars.begin(), vars.end(), [&](const std::string& var) {
env_info += platform::GetEnvValue(var);
});
EXPECT_TRUE(string::Contains(env_info, "Hello World, PaddlePaddle!"));
platform::UnsetEnvVariable("PADDLE_USE_ENV1");
platform::UnsetEnvVariable("PADDLE_USE_ENV2");
platform::UnsetEnvVariable("PADDLE_USE_ENV3");
env_info.clear();
vars = platform::GetAllEnvVariables();
for_each(vars.begin(), vars.end(), [&](const std::string& var) {
env_info += platform::GetEnvValue(var);
});
EXPECT_FALSE(string::Contains(env_info, "Hello World, PaddlePaddle!"));
EXPECT_FALSE(platform::IsEnvVarDefined("PADDLE_USE_ENV1"));
EXPECT_FALSE(platform::IsEnvVarDefined("PADDLE_USE_ENV2"));
EXPECT_FALSE(platform::IsEnvVarDefined("PADDLE_USE_ENV3"));
}
...@@ -17,7 +17,6 @@ limitations under the License. */ ...@@ -17,7 +17,6 @@ limitations under the License. */
#include "gflags/gflags.h" #include "gflags/gflags.h"
#include "paddle/platform/enforce.h" #include "paddle/platform/enforce.h"
#include "paddle/platform/environment.h"
DEFINE_double(fraction_of_gpu_memory_to_use, 0.95, DEFINE_double(fraction_of_gpu_memory_to_use, 0.95,
"Default use 95% of GPU memory for PaddlePaddle," "Default use 95% of GPU memory for PaddlePaddle,"
...@@ -75,13 +74,6 @@ size_t GpuMaxChunkSize() { ...@@ -75,13 +74,6 @@ size_t GpuMaxChunkSize() {
GpuMemoryUsage(available, total); GpuMemoryUsage(available, total);
if (IsEnvVarDefined(kEnvFractionGpuMemoryToUse)) {
auto val = std::stod(GetEnvValue(kEnvFractionGpuMemoryToUse));
PADDLE_ENFORCE_GT(val, 0.0);
PADDLE_ENFORCE_LE(val, 1.0);
FLAGS_fraction_of_gpu_memory_to_use = val;
}
// Reserving the rest memory for page tables, etc. // Reserving the rest memory for page tables, etc.
size_t reserving = (1 - FLAGS_fraction_of_gpu_memory_to_use) * total; size_t reserving = (1 - FLAGS_fraction_of_gpu_memory_to_use) * total;
......
...@@ -14,6 +14,9 @@ limitations under the License. */ ...@@ -14,6 +14,9 @@ limitations under the License. */
#include "paddle/pybind/protobuf.h" #include "paddle/pybind/protobuf.h"
#include <mutex> // for call_once
#include <unordered_map>
#include "gflags/gflags.h"
#include "paddle/framework/backward.h" #include "paddle/framework/backward.h"
#include "paddle/framework/executor.h" #include "paddle/framework/executor.h"
#include "paddle/framework/feed_fetch_method.h" #include "paddle/framework/feed_fetch_method.h"
...@@ -40,9 +43,27 @@ limitations under the License. */ ...@@ -40,9 +43,27 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace pybind { namespace pybind {
static size_t UniqueIntegerGenerator() { static size_t UniqueIntegerGenerator(const std::string &prefix) {
static std::atomic<size_t> generator; static std::unordered_map<std::string, std::atomic<size_t>> generators;
return generator.fetch_add(1); return generators[prefix].fetch_add(1);
}
std::once_flag gflags_init_flag;
// TODO(qijun) move init gflags to init.cc
void InitGflags(std::vector<std::string> &argv) {
std::call_once(gflags_init_flag, [&]() {
int argc = argv.size();
char **arr = new char *[argv.size()];
std::string line;
for (size_t i = 0; i < argv.size(); i++) {
arr[i] = &argv[i][0];
line += argv[i];
line += ' ';
}
google::ParseCommandLineFlags(&argc, &arr, true);
VLOG(1) << "Init commandline: " << line;
});
} }
bool IsCompileGPU() { bool IsCompileGPU() {
...@@ -483,6 +504,7 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -483,6 +504,7 @@ All parameter, weight, gradient are variables in Paddle.
}); });
m.def("unique_integer", UniqueIntegerGenerator); m.def("unique_integer", UniqueIntegerGenerator);
m.def("init_gflags", InitGflags);
m.def("is_compile_gpu", IsCompileGPU); m.def("is_compile_gpu", IsCompileGPU);
m.def("set_feed_variable", framework::SetFeedVariable); m.def("set_feed_variable", framework::SetFeedVariable);
......
...@@ -27,3 +27,30 @@ foreach(filename ${proto_filenames}) ...@@ -27,3 +27,30 @@ foreach(filename ${proto_filenames})
endforeach() endforeach()
add_custom_target(gen_proto_py ALL DEPENDS ${PROTO_GEN_PY}) add_custom_target(gen_proto_py ALL DEPENDS ${PROTO_GEN_PY})
if (WITH_GOLANG)
add_custom_target(protoc-gen-go)
add_custom_command(TARGET protoc-gen-go
COMMAND go
ARGS "get" "-u" "github.com/golang/protobuf/protoc-gen-go")
set(PROTO_GEN_GO)
file(GLOB proto_filenames . OptimizerConfig.proto)
foreach(filename ${proto_filenames})
message(STATUS ${filename})
get_filename_component(ABS_FIL ${filename} ABSOLUTE)
get_filename_component(FIL_WE ${filename} NAME_WE)
set(CUR_PROTO_GEN_GO
${PADDLE_SOURCE_DIR}/paddle/go/proto/${FIL_WE}.pb.go)
set(PROTO_GEN_GO
${CUR_PROTO_GEN_GO}
${PROTO_GEN_GO})
add_custom_command(OUTPUT ${CUR_PROTO_GEN_GO}
COMMAND ${PROTOBUF_PROTOC_EXECUTABLE}
ARGS "--go_out=${PADDLE_SOURCE_DIR}/go/proto"
"-I" ${CMAKE_CURRENT_SOURCE_DIR} ${ABS_FIL}
DEPENDS ${ABS_FIL} protoc protoc-gen-go)
endforeach()
add_custom_target(gen_proto_go ALL DEPENDS ${PROTO_GEN_GO})
endif()
import sys
import core
__all__ = ['proto'] __all__ = ['proto']
argv = []
if core.is_compile_gpu():
argv = list(sys.argv) + [
"--tryfromenv=fraction_of_gpu_memory_to_use,use_pinned_memory"
]
else:
argv = list(sys.argv) + ["--tryfromenv=use_pinned_memory"]
core.init_gflags(argv)
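The module above replaces the removed `environment.h` lookup with gflags' `--tryfromenv` option, which takes a flag's value from the environment variable `FLAGS_<name>` when that variable is set and otherwise keeps the default. The sketch below only mimics that lookup in Python to show the intended effect; `build_gflags_argv` and `resolve_from_env` are hypothetical helpers, not part of Paddle or gflags.

```python
import os


def build_gflags_argv(argv, compiled_with_gpu):
    """Append the --tryfromenv argument the way the module above does."""
    flags = ["use_pinned_memory"]
    if compiled_with_gpu:
        flags.insert(0, "fraction_of_gpu_memory_to_use")
    return list(argv) + ["--tryfromenv=" + ",".join(flags)]


def resolve_from_env(flag_names, defaults, environ=os.environ):
    """Mimic --tryfromenv: use FLAGS_<name> if present, else the default."""
    return {name: environ.get("FLAGS_" + name, defaults[name])
            for name in flag_names}


gpu_argv = build_gflags_argv(["train.py"], compiled_with_gpu=True)
resolved = resolve_from_env(
    ["fraction_of_gpu_memory_to_use"],
    {"fraction_of_gpu_memory_to_use": "0.95"},
    environ={"FLAGS_fraction_of_gpu_memory_to_use": "0.5"})
```

Under this scheme a user would export `FLAGS_fraction_of_gpu_memory_to_use=0.5` before starting Python instead of relying on a Paddle-specific environment variable.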
...@@ -119,8 +119,9 @@ class Variable(object): ...@@ -119,8 +119,9 @@ class Variable(object):
@staticmethod @staticmethod
def _unique_var_name_(): def _unique_var_name_():
uid = core.unique_integer() # unique during whole process. prefix = "_generated_var"
return "_generated_var_%d" % uid uid = core.unique_integer(prefix) # unique during whole process.
return "_".join([prefix, str(uid)])
@staticmethod @staticmethod
def _convert_np_dtype_to_dtype_(np_dtype): def _convert_np_dtype_to_dtype_(np_dtype):
......
...@@ -8,7 +8,7 @@ from paddle.v2.framework.framework import Variable, g_program, \ ...@@ -8,7 +8,7 @@ from paddle.v2.framework.framework import Variable, g_program, \
def unique_name(prefix): def unique_name(prefix):
uid = core.unique_integer() # unique during whole process. uid = core.unique_integer(prefix) # unique during whole process.
return "_".join([prefix, str(uid)]) return "_".join([prefix, str(uid)])
......
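After this change `core.unique_integer(prefix)` keeps one counter per prefix instead of a single process-wide counter, so generated names restart at 0 for every prefix. A minimal Python sketch of the same idea, using a hypothetical module-level `_counters` table in place of the C++ `unordered_map`:

```python
import collections
import itertools

# One independent counter per prefix; itertools.count() starts at 0.
_counters = collections.defaultdict(itertools.count)


def unique_name_sketch(prefix):
    """Return prefix_<n>, where n counts earlier calls with the same prefix."""
    return "_".join([prefix, str(next(_counters[prefix]))])


names = [unique_name_sketch("fc"), unique_name_sketch("fc"),
         unique_name_sketch("conv")]
# names == ["fc_0", "fc_1", "conv_0"]
```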
...@@ -5,7 +5,7 @@ import re ...@@ -5,7 +5,7 @@ import re
__all__ = [ __all__ = [
'fc', 'data', 'cross_entropy', 'conv2d', 'pool2d', 'embedding', 'concat', 'fc', 'data', 'cross_entropy', 'conv2d', 'pool2d', 'embedding', 'concat',
'StaticRNN', 'cast' 'StaticRNN', 'cast', 'sequence_conv', 'sequence_pool', 'accuracy'
] ]
...@@ -150,7 +150,7 @@ def _create_op_func_(op_type): ...@@ -150,7 +150,7 @@ def _create_op_func_(op_type):
outputs[name] = [helper.create_tmp_variable(dtype=dtype)] outputs[name] = [helper.create_tmp_variable(dtype=dtype)]
helper.append_op( helper.append_op(
type=op_type, inputs=inputs, outputs=outputs, attrs=kwargs) type=op_type, inputs=inputs, outputs=outputs, attrs=kwargs)
return out return helper.append_activation(out)
func.__name__ = op_type func.__name__ = op_type
globals()[op_type] = func globals()[op_type] = func
...@@ -160,10 +160,23 @@ def _create_op_func_(op_type): ...@@ -160,10 +160,23 @@ def _create_op_func_(op_type):
_create_op_func_('mean') _create_op_func_('mean')
_create_op_func_('mul') _create_op_func_('mul')
_create_op_func_('elementwise_add')
_create_op_func_('dropout') _create_op_func_('dropout')
_create_op_func_('reshape') _create_op_func_('reshape')
def cast(x, data_type, program=None):
helper = LayerHelper('cast', **locals())
out = helper.create_tmp_variable(dtype=data_type)
helper.append_op(
type='cast',
inputs={'X': [x]},
outputs={'Out': [out]},
attrs={'in_data_type': x.data_type,
'out_data_type': out.data_type})
return out
def cast(x, data_type, program=None): def cast(x, data_type, program=None):
helper = LayerHelper('cast', **locals()) helper = LayerHelper('cast', **locals())
out = helper.create_tmp_variable(dtype=data_type) out = helper.create_tmp_variable(dtype=data_type)
...@@ -212,13 +225,73 @@ def square_error_cost(input, label, **kwargs): ...@@ -212,13 +225,73 @@ def square_error_cost(input, label, **kwargs):
square_out = helper.create_tmp_variable(dtype=input.data_type) square_out = helper.create_tmp_variable(dtype=input.data_type)
helper.append_op( helper.append_op(
type='pow', type='square', inputs={'X': [minus_out]}, outputs={'Y': [square_out]})
inputs={'X': [minus_out]},
outputs={'Y': [square_out]},
attrs={'factor': 2.0})
return square_out return square_out
def accuracy(input, label, k=1, **kwargs):
helper = LayerHelper("accuracy", **kwargs)
topk_out = helper.create_tmp_variable(dtype=input.data_type)
topk_indices = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="top_k",
inputs={"X": [input]},
outputs={"Out": [topk_out],
"Indices": [topk_indices]},
attrs={"k": k})
acc_out_dtype = kwargs.get("out_dtype", "float32")
acc_out = helper.create_tmp_variable(dtype=acc_out_dtype)
helper.append_op(
type="accuracy",
inputs={
"Out": [topk_out],
"Indices": [topk_indices],
"Label": [label]
},
outputs={"Accuracy": [acc_out]})
return acc_out
def sequence_conv(input,
num_filters,
name=None,
filter_size=3,
act=None,
stride=1,
padding=None,
bias_attr=None,
param_attr=None,
program=None,
init_program=None):
# FIXME(dzh): we want to unify the arguments of the Python layer
# functions, so some unnecessary attributes are ignored here,
# such as padding_trainable and context_start.
helper = LayerHelper('sequence_conv', **locals())
dtype = helper.input_dtype()
filter_shape = [num_filters, filter_size]
filter = helper.create_parameter(
attr=helper.param_attr, shape=filter_shape, dtype=dtype)
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type='sequence_conv',
inputs={
'X': [input],
'Filter': filter,
},
outputs={"Out": pre_bias},
attrs={
'contextStride': stride,
'contextStart': 0,
'contextLength': filter_size
})
pre_act = helper.append_bias_op(pre_bias)
return helper.append_activation(pre_act)
def conv2d(input, def conv2d(input,
num_filters, num_filters,
name=None, name=None,
...@@ -271,6 +344,35 @@ def conv2d(input, ...@@ -271,6 +344,35 @@ def conv2d(input,
return helper.append_activation(pre_act) return helper.append_activation(pre_act)
def sequence_pool(input,
pool_size,
pool_type,
pool_stride=1,
pool_padding=0,
global_pooling=False,
program=None,
init_program=None):
# FIXME(dzh): we want to unify the arguments of the Python layer
# functions, so some unnecessary attributes are ignored here.
ENUM_POOL_TYPE = set(["max", "avg", "sqrt", "last", "first"])
if pool_type not in ENUM_POOL_TYPE:
raise ValueError("Unknown pool_type: '%s'. It can only be %s." %
(str(pool_type), " ".join(ENUM_POOL_TYPE)))
helper = LayerHelper('sequence_pool', **locals())
dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype)
helper.append_op(
type="sequence_pool",
inputs={"X": [input]},
outputs={"Out": pool_out},
attrs={"strategy": pool_type})
return pool_out
def pool2d(input, def pool2d(input,
pool_size, pool_size,
pool_type, pool_type,
...@@ -290,7 +392,7 @@ def pool2d(input, ...@@ -290,7 +392,7 @@ def pool2d(input,
if isinstance(pool_padding, int): if isinstance(pool_padding, int):
pool_padding = [pool_padding, pool_padding] pool_padding = [pool_padding, pool_padding]
helper = LayerHelper('conv2d', **locals()) helper = LayerHelper('pool2d', **locals())
dtype = helper.input_dtype() dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype) pool_out = helper.create_tmp_variable(dtype)
......
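The `accuracy` layer added in this file chains a `top_k` op with an `accuracy` op. What that pair is expected to compute can be sketched in plain Python: a sample counts as correct when its label appears among the k highest-scoring classes. The function name `accuracy_sketch` and the tie-breaking rule are assumptions of the sketch.

```python
def accuracy_sketch(scores, labels, k=1):
    """Fraction of rows whose label is among the k largest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Column ids of the k largest scores; ties keep the lower index first.
        top = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        if label in top:
            correct += 1
    return correct / float(len(labels))


scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
labels = [1, 1, 2]
top1 = accuracy_sketch(scores, labels, k=1)  # 2 of 3 rows are correct
top2 = accuracy_sketch(scores, labels, k=2)  # all 3 rows are correct
```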
import paddle.v2.framework.layers as layers import paddle.v2.framework.layers as layers
__all__ = ["simple_img_conv_pool", "sequence_conv_pool"]
def simple_img_conv_pool(input, def simple_img_conv_pool(input,
filter_size,
num_filters, num_filters,
filter_size,
pool_size, pool_size,
pool_stride, pool_stride,
act, act,
...@@ -94,3 +96,29 @@ def img_conv_group(input, ...@@ -94,3 +96,29 @@ def img_conv_group(input,
program=program, program=program,
init_program=init_program) init_program=init_program)
return pool_out return pool_out
def sequence_conv_pool(input,
num_filters,
filter_size,
pool_size,
pool_stride,
act,
program=None,
init_program=None):
conv_out = layers.sequence_conv(
input=input,
num_filters=num_filters,
filter_size=filter_size,
act=act,
program=program,
init_program=init_program)
pool_out = layers.sequence_pool(
input=conv_out,
pool_size=pool_size,
pool_type='max',
pool_stride=pool_stride,
program=program,
init_program=init_program)
return pool_out
...@@ -281,7 +281,8 @@ class OpTest(unittest.TestCase): ...@@ -281,7 +281,8 @@ class OpTest(unittest.TestCase):
type(sub_out)) type(sub_out))
for sub_out_name, expect in sub_out: for sub_out_name, expect in sub_out:
idx = find_actual(sub_out_name, fetch_list) idx = find_actual(sub_out_name, fetch_list)
actual_t = np.array(outs[idx]) actual = outs[idx]
actual_t = np.array(actual)
expect_t = expect[0] \ expect_t = expect[0] \
if isinstance(expect, tuple) else expect if isinstance(expect, tuple) else expect
self.assertTrue( self.assertTrue(
...@@ -291,11 +292,12 @@ class OpTest(unittest.TestCase): ...@@ -291,11 +292,12 @@ class OpTest(unittest.TestCase):
str(place)) str(place))
if isinstance(expect, tuple): if isinstance(expect, tuple):
self.assertListEqual( self.assertListEqual(
actual_t.lod(), expect[1], "Output (" + sub_out_name actual.lod(), expect[1], "Output (" + sub_out_name +
+ ") has different lod at " + str(place)) ") has different lod at " + str(place))
else: else:
idx = find_actual(out_name, fetch_list) idx = find_actual(out_name, fetch_list)
actual_t = outs[idx] actual = outs[idx]
actual_t = np.array(actual)
expect = self.outputs[out_name] expect = self.outputs[out_name]
expect_t = expect[0] if isinstance(expect, tuple) else expect expect_t = expect[0] if isinstance(expect, tuple) else expect
self.assertTrue( self.assertTrue(
...@@ -303,7 +305,7 @@ class OpTest(unittest.TestCase): ...@@ -303,7 +305,7 @@ class OpTest(unittest.TestCase):
actual_t, expect_t, atol=atol), actual_t, expect_t, atol=atol),
"Output (" + out_name + ") has diff at " + str(place)) "Output (" + out_name + ") has diff at " + str(place))
if isinstance(expect, tuple): if isinstance(expect, tuple):
self.assertListEqual(actual_t.lod(), expect[1], self.assertListEqual(actual.lod(), expect[1],
"Output (" + out_name + "Output (" + out_name +
") has different lod at " + str(place)) ") has different lod at " + str(place))
......
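The `OpTest` change above keeps the fetched object in `actual` and only converts a copy to NumPy, because the converted array no longer carries LoD information. A minimal sketch of that pitfall follows; `FakeLoDTensor` is a hypothetical stand-in written for this illustration and is not part of Paddle.

```python
import numpy as np


class FakeLoDTensor(object):
    """Hypothetical stand-in for a fetched tensor: data plus LoD offsets."""

    def __init__(self, data, lod):
        self._data = np.asarray(data)
        self._lod = lod

    def __array__(self, dtype=None, copy=None):
        # Lets np.array(tensor) work, as it does for the fetched outputs.
        return self._data if dtype is None else self._data.astype(dtype)

    def lod(self):
        return self._lod


actual = FakeLoDTensor([[1.0], [2.0], [3.0]], [[0, 1, 3]])
actual_t = np.array(actual)  # plain ndarray, used for value comparison

values_ok = np.allclose(actual_t, [[1.0], [2.0], [3.0]])
lod_from_tensor = actual.lod()  # works: the tensor object still has lod()
array_has_lod = hasattr(actual_t, "lod")  # False: the ndarray does not
```

Under these assumptions, comparing values through `actual_t` and LoD through `actual` mirrors the corrected test code.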
...@@ -7,12 +7,13 @@ class TestAccuracyOp(OpTest): ...@@ -7,12 +7,13 @@ class TestAccuracyOp(OpTest):
def setUp(self): def setUp(self):
self.op_type = "accuracy" self.op_type = "accuracy"
n = 8192 n = 8192
infer = np.random.randint(0, 2, (n, 1)).astype("int") infer = np.random.random((n, 1)).astype("float32")
label = np.random.randint(0, 2, (n, )).astype("int") indices = np.random.randint(0, 2, (n, 1))
self.inputs = {'Inference': infer, "Label": label} label = np.random.randint(0, 2, (n, 1))
self.inputs = {'Out': infer, 'Indices': indices, "Label": label}
num_correct = 0 num_correct = 0
for rowid in xrange(n): for rowid in xrange(n):
for ele in infer[rowid]: for ele in indices[rowid]:
if ele == label[rowid]: if ele == label[rowid]:
num_correct += 1 num_correct += 1
break break
......
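The reference loop in `TestAccuracyOp` counts a row as correct when any entry of `indices` for that row equals the label. A vectorized NumPy sketch of the same rule is shown below; the helper name `topk_accuracy` is hypothetical and the sample data is made up for illustration.

```python
import numpy as np


def topk_accuracy(indices, label):
    """Fraction of rows whose candidate indices contain the label."""
    # Broadcast the (n, 1) label column against the (n, k) indices.
    hits = (indices == label.reshape(-1, 1)).any(axis=1)
    return hits.sum(), float(hits.mean())


indices = np.array([[0, 1], [2, 3], [1, 1]])
label = np.array([[1], [0], [1]])
num_correct, accuracy = topk_accuracy(indices, label)
```

Here rows 0 and 2 contain their label, so two of three rows are counted as correct.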
...@@ -6,10 +6,11 @@ from op_test import OpTest ...@@ -6,10 +6,11 @@ from op_test import OpTest
class TestAucOp(OpTest): class TestAucOp(OpTest):
def setUp(self): def setUp(self):
self.op_type = "auc" self.op_type = "auc"
pred = np.random.random((128)).astype("float32") pred = np.random.random((128, 2)).astype("float32")
labels = np.random.randint(0, 2, (128, )) indices = np.random.randint(0, 2, (128, 2))
labels = np.random.randint(0, 2, (128, 1))
num_thresholds = 200 num_thresholds = 200
self.inputs = {'Inference': pred, 'Label': labels} self.inputs = {'Out': pred, 'Indices': indices, 'Label': labels}
self.attrs = {'curve': 'ROC', 'num_thresholds': num_thresholds} self.attrs = {'curve': 'ROC', 'num_thresholds': num_thresholds}
# NOTE: sklearn use a different way to generate thresholds # NOTE: sklearn use a different way to generate thresholds
# which will cause the result differs slightly: # which will cause the result differs slightly:
...@@ -31,12 +32,12 @@ class TestAucOp(OpTest): ...@@ -31,12 +32,12 @@ class TestAucOp(OpTest):
tp, fn, tn, fp = 0, 0, 0, 0 tp, fn, tn, fp = 0, 0, 0, 0
for i, lbl in enumerate(labels): for i, lbl in enumerate(labels):
if lbl: if lbl:
if pred[i] >= thresh: if pred[i, 0] >= thresh:
tp += 1 tp += 1
else: else:
fn += 1 fn += 1
else: else:
if pred[i] >= thresh: if pred[i, 0] >= thresh:
fp += 1 fp += 1
else: else:
tn += 1 tn += 1
...@@ -62,6 +63,5 @@ class TestAucOp(OpTest): ...@@ -62,6 +63,5 @@ class TestAucOp(OpTest):
self.check_output() self.check_output()
# TODO(typhoonzero): add this back till we fix it if __name__ == "__main__":
#if __name__ == "__main__": unittest.main()
# unittest.main()
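After this change the AUC test scores each sample with `pred[i, 0]`, since `pred` is now two-dimensional. The counting step for a single threshold can be sketched in vectorized form as follows; `confusion_counts` is a hypothetical helper, and the rest of the AUC computation is elided in the diff and is not reproduced here.

```python
import numpy as np


def confusion_counts(pred, labels, thresh):
    """Return (tp, fn, tn, fp), scoring each row with column 0 of pred."""
    positive = labels.reshape(-1).astype(bool)
    fired = pred[:, 0] >= thresh
    tp = int(np.sum(positive & fired))
    fn = int(np.sum(positive & ~fired))
    tn = int(np.sum(~positive & ~fired))
    fp = int(np.sum(~positive & fired))
    return tp, fn, tn, fp


pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]],
                dtype="float32")
labels = np.array([[1], [1], [0], [0]])
counts = confusion_counts(pred, labels, 0.5)
```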
...@@ -37,7 +37,7 @@ class TestLayer(unittest.TestCase): ...@@ -37,7 +37,7 @@ class TestLayer(unittest.TestCase):
layers.batch_norm( layers.batch_norm(
input=images, program=program, init_program=init_program) input=images, program=program, init_program=init_program)
#print str(program) # print str(program)
def test_dropout_layer(self): def test_dropout_layer(self):
program = Program() program = Program()
...@@ -53,7 +53,7 @@ class TestLayer(unittest.TestCase): ...@@ -53,7 +53,7 @@ class TestLayer(unittest.TestCase):
program=program, program=program,
init_program=init_program) init_program=init_program)
#print str(program) # print str(program)
def test_img_conv_group(self): def test_img_conv_group(self):
program = Program() program = Program()
...@@ -70,6 +70,29 @@ class TestLayer(unittest.TestCase): ...@@ -70,6 +70,29 @@ class TestLayer(unittest.TestCase):
# print str(program) # print str(program)
def test_elementwise_add_with_act(self):
program = Program()
init_program = Program()
image1 = layers.data(
name='pixel1',
shape=[3, 48, 48],
data_type='float32',
program=program,
init_program=init_program)
image2 = layers.data(
name='pixel2',
shape=[3, 48, 48],
data_type='float32',
program=program,
init_program=init_program)
out = layers.elementwise_add(
x=image1,
y=image2,
act='relu',
program=program,
init_program=init_program)
# print(program)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -10,6 +10,120 @@ from paddle.v2.framework.executor import Executor ...@@ -10,6 +10,120 @@ from paddle.v2.framework.executor import Executor
import numpy as np import numpy as np
def resnet_cifar10(input, depth=32, program=None, init_program=None):
def conv_bn_layer(input,
ch_out,
filter_size,
stride,
padding,
act='relu',
program=None,
init_program=None):
tmp = layers.conv2d(
input=input,
filter_size=filter_size,
num_filters=ch_out,
stride=stride,
padding=padding,
act=None,
bias_attr=False,
program=program,
init_program=init_program)
return layers.batch_norm(
input=tmp, act=act, program=program, init_program=init_program)
def shortcut(input, ch_in, ch_out, stride, program, init_program):
if ch_in != ch_out:
return conv_bn_layer(input, ch_out, 1, stride, 0, None, program,
init_program)
else:
return input
def basicblock(input,
ch_in,
ch_out,
stride,
program=program,
init_program=init_program):
tmp = conv_bn_layer(
input,
ch_out,
3,
stride,
1,
program=program,
init_program=init_program)
tmp = conv_bn_layer(
tmp,
ch_out,
3,
1,
1,
act=None,
program=program,
init_program=init_program)
short = shortcut(input, ch_in, ch_out, stride, program, init_program)
return layers.elementwise_add(
x=tmp,
y=short,
act='relu',
program=program,
init_program=init_program)
def layer_warp(block_func, input, ch_in, ch_out, count, stride, program,
init_program):
tmp = block_func(input, ch_in, ch_out, stride, program, init_program)
for i in range(1, count):
tmp = block_func(tmp, ch_out, ch_out, 1, program, init_program)
return tmp
assert (depth - 2) % 6 == 0
n = (depth - 2) / 6
conv1 = conv_bn_layer(
input=input,
ch_out=16,
filter_size=3,
stride=1,
padding=1,
program=program,
init_program=init_program)
res1 = layer_warp(
basicblock,
conv1,
16,
16,
n,
1,
program=program,
init_program=init_program)
res2 = layer_warp(
basicblock,
res1,
16,
32,
n,
2,
program=program,
init_program=init_program)
res3 = layer_warp(
basicblock,
res2,
32,
64,
n,
2,
program=program,
init_program=init_program)
pool = layers.pool2d(
input=res3,
pool_size=8,
pool_type='avg',
pool_stride=1,
program=program,
init_program=init_program)
return pool
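`resnet_cifar10` derives the number of residual blocks per stage from `depth` with `(depth - 2) / 6`. That expression yields an integer under Python 2, which these tests target; under Python 3 it would produce a float and `range(1, count)` would fail. A small sketch using floor division, which behaves the same in both versions, is given below. The helper name `blocks_per_stage` is hypothetical.

```python
def blocks_per_stage(depth):
    """Number of basic blocks in each of the three stages."""
    if (depth - 2) % 6 != 0:
        raise ValueError("depth must satisfy (depth - 2) %% 6 == 0, got %d" %
                         depth)
    # Three stages, two 3x3 convolutions per block: depth = 6 * n + 2.
    return (depth - 2) // 6


n_default = blocks_per_stage(32)
n_small = blocks_per_stage(20)
```

With the default `depth=32` used in the test, each stage gets five blocks.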
def vgg16_bn_drop(input, program, init_program): def vgg16_bn_drop(input, program, init_program):
def conv_block(input, def conv_block(input,
num_filter, num_filter,
...@@ -75,8 +189,16 @@ label = layers.data( ...@@ -75,8 +189,16 @@ label = layers.data(
data_type='int64', data_type='int64',
program=program, program=program,
init_program=init_program) init_program=init_program)
vgg_net = vgg16_bn_drop(images, program, init_program)
predict = layers.fc(input=vgg_net, # Add neural network config
# option 1. resnet
net = resnet_cifar10(images, 32, program, init_program)
# option 2. vgg
# net = vgg16_bn_drop(images, program, init_program)
# print(program)
predict = layers.fc(input=net,
size=classdim, size=classdim,
act='softmax', act='softmax',
program=program, program=program,
...@@ -123,8 +245,8 @@ for pass_id in range(PASS_NUM): ...@@ -123,8 +245,8 @@ for pass_id in range(PASS_NUM):
fetch_list=[avg_cost]) fetch_list=[avg_cost])
loss = np.array(outs[0]) loss = np.array(outs[0])
# print("pass_id:" + str(pass_id) + " batch_id:" + str(batch_id) + print("pass_id:" + str(pass_id) + " batch_id:" + str(batch_id) +
# " loss:" + str(loss)) " loss:" + str(loss))
batch_id = batch_id + 1 batch_id = batch_id + 1
if batch_id > 1: if batch_id > 1:
......
...@@ -51,12 +51,14 @@ predict = layers.fc(input=conv_pool_2, ...@@ -51,12 +51,14 @@ predict = layers.fc(input=conv_pool_2,
cost = layers.cross_entropy( cost = layers.cross_entropy(
input=predict, label=label, program=program, init_program=init_program) input=predict, label=label, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program) avg_cost = layers.mean(x=cost, program=program)
accuracy = layers.accuracy(
input=predict, label=label, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001) sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost) opts = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 50 BATCH_SIZE = 50
PASS_NUM = 1 PASS_NUM = 3
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.reader.shuffle( paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=500), paddle.dataset.mnist.train(), buf_size=500),
...@@ -83,10 +85,11 @@ for pass_id in range(PASS_NUM): ...@@ -83,10 +85,11 @@ for pass_id in range(PASS_NUM):
outs = exe.run(program, outs = exe.run(program,
feed={"pixel": tensor_img, feed={"pixel": tensor_img,
"label": tensor_y}, "label": tensor_y},
fetch_list=[avg_cost]) fetch_list=[avg_cost, accuracy])
loss = np.array(outs[0]) loss = np.array(outs[0])
acc = np.array(outs[1])
if loss < 10.0: if loss < 10.0 and acc > 0.9:
exit(0) # if avg cost less than 10.0, we think our code is good. # if avg cost less than 10.0 and accuracy is larger than 0.9, we think our code is good.
exit(0)
exit(1) exit(1)
...@@ -45,10 +45,10 @@ class TestSeqProject(OpTest): ...@@ -45,10 +45,10 @@ class TestSeqProject(OpTest):
self.inputs_val_no_f = ['PaddingData', 'X'] self.inputs_val_no_f = ['PaddingData', 'X']
self.attrs = { self.attrs = {
'context_start': self.context_start, 'contextStart': self.context_start,
'context_length': self.context_length, 'contextLength': self.context_length,
'padding_trainable': self.padding_trainable, 'paddingTrainable': self.padding_trainable,
'context_stride': self.context_stride 'contextStride': self.context_stride
} }
out = np.zeros( out = np.zeros(
(self.input_size[0], self.output_represention)).astype('float32') (self.input_size[0], self.output_represention)).astype('float32')
......
...@@ -9,7 +9,7 @@ class TestTopkOp(OpTest): ...@@ -9,7 +9,7 @@ class TestTopkOp(OpTest):
k = 1 k = 1
input = np.random.random((32, 84)).astype("float32") input = np.random.random((32, 84)).astype("float32")
output = np.ndarray((32, k)) output = np.ndarray((32, k))
indices = np.ndarray((32, k)) indices = np.ndarray((32, k)).astype("int64")
self.inputs = {'X': input} self.inputs = {'X': input}
self.attrs = {'k': k} self.attrs = {'k': k}
...@@ -32,7 +32,7 @@ class TestTopkOp3d(OpTest): ...@@ -32,7 +32,7 @@ class TestTopkOp3d(OpTest):
input = np.random.random((32, 2, 84)).astype("float32") input = np.random.random((32, 2, 84)).astype("float32")
input_flat_2d = input.reshape(64, 84) input_flat_2d = input.reshape(64, 84)
output = np.ndarray((64, k)) output = np.ndarray((64, k))
indices = np.ndarray((64, k)).astype("int") indices = np.ndarray((64, k)).astype("int64")
# FIXME: should use 'X': input for a 3d input # FIXME: should use 'X': input for a 3d input
self.inputs = {'X': input_flat_2d} self.inputs = {'X': input_flat_2d}
......
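Both top-k tests now allocate `indices` as `int64`. The reference computation itself is elided above, so here is a short NumPy sketch of one way to obtain the expected values and `int64` indices; `topk_reference` is a hypothetical name and this is not necessarily how the test fills its arrays.

```python
import numpy as np


def topk_reference(x, k):
    """Row-wise top-k values and their int64 column indices."""
    # Negate so that argsort returns the largest entries first.
    order = np.argsort(-x, axis=1)[:, :k]
    values = np.take_along_axis(x, order, axis=1)
    return values, order.astype("int64")


x = np.array([[0.1, 0.7, 0.3], [0.9, 0.2, 0.5]], dtype="float32")
values, indices = topk_reference(x, 2)
```

For a 3-D input, reshaping to 2-D first, as the `TestTopkOp3d` case does with `input_flat_2d`, lets the same helper be reused.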
...@@ -205,7 +205,8 @@ class SGD(object): ...@@ -205,7 +205,8 @@ class SGD(object):
""" """
Testing method. Will test input data. Testing method. Will test input data.
:param reader: A reader that reads and yields data items. :param reader: A batch reader that reads and yields data items,
it should be a paddle.v2.batch.
:type reader: collections.Iterable :type reader: collections.Iterable
:param feeding: Feeding is a map of neural network input name and array :param feeding: Feeding is a map of neural network input name and array
index that reader returns. index that reader returns.
......