提交 37792e54 编写于 作者: Y Yu Yang

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into HEAD

8 合并请求!11636[IMPORTANT] MKLDNN layout: Support for sum operator,!11358Merge release 0.12.0 and 0.13.0 to master,!11107merge release/0.12.0 release/0.13.0,!11094Master release/0.12.0 to master,!11092Release/0.12.0,!8411compile with nccl2,!8361Backward on parallel do using nccl,!8196compile with nccl2
......@@ -2,7 +2,7 @@
|---|---|
| backyes | Yan-Fei Wang |
| beckett1124 | Bin Qi |
| Canpio | Jia-Yi Feng |
| JiayiFeng | Jia-Yi Feng |
| chengxiaohua1105 | Xiao-Hua Cheng |
| cxwangyi, yiwangbaidu, wangkuiyi | Yi Wang |
| cxysteven | Xing-Yi Cheng |
......
# 用Docker编译和测试PaddlePaddle
## 需要的软硬件
为了开发PaddlePaddle,我们需要
1. 一台电脑,可以装的是 Linux, BSD, Windows 或者 MacOS 操作系统,以及
1. Docker。
不需要依赖其他任何软件了。即便是 Python 和 GCC 都不需要,因为我们会把所有编译工具都安装进一个 Docker image 里。
## 总体流程
1. 获取源码
```bash
git clone https://github.com/paddlepaddle/paddle
```
2. 安装开发工具到 Docker image 里
```bash
cd paddle; docker build -t paddle:dev .
```
请注意这个命令结尾处的 `.`;它表示 `docker build` 应该读取当前目录下的 [`Dockerfile`文件](https://github.com/PaddlePaddle/Paddle/blob/develop/Dockerfile),按照其内容创建一个名为 `paddle:dev` 的 Docker image,并且把各种开发工具安装进去。
3. 编译
以下命令启动一个 Docker container 来执行 `paddle:dev` 这个 Docker image,同时把当前目录(源码树根目录)映射为 container 里的 `/paddle` 目录,并且运行 `Dockerfile` 描述的默认入口程序 [`build.sh`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build.sh)。这个脚本调用 `cmake``make` 来编译 `/paddle` 里的源码,结果输出到 `/paddle/build`,也就是本地的源码树根目录里的 `build` 子目录。
```bash
docker run --rm -v $PWD:/paddle paddle:dev
```
上述命令编译出一个 CUDA-enabled 版本。如果我们只需要编译一个只支持 CPU 的版本,可以用
```bash
docker run --rm -e WITH_GPU=OFF -v $PWD:/paddle paddle:dev
```
4. 运行单元测试
用本机的第一个 GPU 来运行包括 GPU 单元测试在内的所有单元测试:
```bash
NV_GPU=0 nvidia-docker run --rm -v $PWD:/paddle paddle:dev bash -c "cd /paddle/build; ctest"
```
如果编译的时候我们用了 `WITH_GPU=OFF` 选项,那么编译过程只会产生 CPU-based 单元测试,那么我们也就不需要 nvidia-docker 来运行单元测试了。我们只需要:
```bash
docker run --rm -v $PWD:/paddle paddle:dev bash -c "cd /paddle/build; ctest"
```
有时候我们只想运行一个特定的单元测试,比如 `memory_test`,我们可以
```bash
nvidia-docker run --rm -v $PWD:/paddle paddle:dev bash -c "cd /paddle/build; ctest -V -R memory_test"
```
5. 清理
有时候我们会希望清理掉已经下载的第三方依赖以及已经编译的二进制文件。此时只需要:
```bash
rm -rf build
```
## 为什么要 Docker 呀?
- 什么是 Docker?
如果您没有听说 Docker,可以把它想象为一个类似 virtualenv 的系统,但是虚拟的不仅仅是 Python 的运行环境。
- Docker 还是虚拟机?
有人用虚拟机来类比 Docker。需要强调的是:Docker 不会虚拟任何硬件,Docker container 里运行的编译工具实际上都是在本机的 CPU 和操作系统上直接运行的,性能和把编译工具安装在本机运行一样。
- 为什么用 Docker?
把工具和配置都安装在一个 Docker image 里可以标准化编译环境。这样如果遇到问题,其他人可以复现问题以便帮助。
另外,对于习惯使用Windows和MacOS的开发者来说,使用Docker就不用配置交叉编译环境了。
- 我可以选择不用Docker吗?
当然可以。大家可以用把开发工具安装进入 Docker image 一样的方式,把这些工具安装到本机。这篇文档介绍基于 Docker 的开发流程,是因为这个流程比其他方法都更简便。
- 学习 Docker 有多难?
理解 Docker 并不难,大概花十分钟看一下[这篇文章](https://zhuanlan.zhihu.com/p/19902938)。这可以帮您省掉花一小时安装和配置各种开发工具,以及切换机器时需要新安装的辛苦。别忘了 PaddlePaddle 更新可能导致需要新的开发工具。更别提简化问题复现带来的好处了。
- 我可以用 IDE 吗?
当然可以,因为源码就在本机上。IDE 默认调用 make 之类的程序来编译源码,我们只需要配置 IDE 来调用 Docker 命令编译源码即可。
很多 PaddlePaddle 开发者使用 Emacs。他们在自己的 `~/.emacs` 配置文件里加两行
```emacs
(global-set-key "\C-cc" 'compile)
(setq compile-command
"docker run --rm -it -v $(git rev-parse --show-toplevel):/paddle paddle:dev")
```
就可以按 `Ctrl-C``c` 键来启动编译了。
- 可以并行编译吗?
是的。我们的 Docker image 运行一个 [Bash 脚本](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build.sh)。这个脚本调用 `make -j$(nproc)` 来启动和 CPU 核一样多的进程来并行编译。
## 可能碰到的问题
- Docker 需要 sudo
如果用自己的电脑开发,自然也就有管理员权限(sudo)了。如果用公用的电脑开发,需要请管理员安装和配置好 Docker。此外,PaddlePaddle 项目在努力开始支持其他不需要 sudo 的集装箱技术,比如 rkt。
- 在 Windows/MacOS 上编译很慢
Docker 在 Windows 和 MacOS 都可以运行。不过实际上是运行在一个 Linux 虚拟机上。可能需要注意给这个虚拟机多分配一些 CPU 和内存,以保证编译高效。具体做法请参考[这个issue](https://github.com/PaddlePaddle/Paddle/issues/627)
- 磁盘不够
本文中的例子里,`docker run` 命令里都用了 `--rm` 参数,这样保证运行结束之后的 containers 不会保留在磁盘上。可以用 `docker ps -a` 命令看到停止后但是没有删除的 containers。`docker build` 命令有时候会产生一些中间结果,是没有名字的 images,也会占用磁盘。可以参考[这篇文章](https://zaiste.net/posts/removing_docker_containers/)来清理这些内容。
# Build using Docker
## What Developers Need
To contribute to PaddlePaddle, you need
1. A computer -- Linux, BSD, Windows, MacOS, and
1. Docker.
Nothing else. Not even Python and GCC, because you can install all build tools into a Docker image. We run all the tools by running this image.
## General Process
1. Retrieve source code.
```bash
git clone https://github.com/paddlepaddle/paddle
```
2. Install build tools into a Docker image.
```bash
cd paddle; docker build -t paddle:dev .
```
Please be aware of the `.` at the end of the command, which refers to the [`./Dockerfile` file](https://github.com/PaddlePaddle/Paddle/blob/develop/Dockerfile). `docker build` follows instructions in this file to create a Docker image named `paddle:dev`, and installs building tools into it.
3. Build from source.
This following command starts a Docker container that executes the Docker image `paddle:dev`, mapping the current directory to `/paddle/` in the container, and runs the default entry-point [`build.sh`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build.sh) as specified in the Dockefile. `build.sh` invokes `cmake` and `make` to build PaddlePaddle source code, which had been mapped to `/paddle`, and writes outputs to `/paddle/build`, which maps to `build` in the current source directory on the computer.
```bash
docker run -v $PWD:/paddle paddle:dev
```
Above command builds a CUDA-enabled version. If we want to build a CPU-only version, we can type
```bash
docker run -e WITH_GPU=OFF -v $PWD:/paddle paddle:dev
```
4. Run unit tests.
To run all unit tests using the first GPU of a node:
```bash
NV_GPU=0 nvidia-docker run -v $PWD:/paddle paddle:dev bash -c "cd /paddle/build; ctest"
```
If we used `WITH_GPU=OFF` at build time, it generates only CPU-based unit tests, and we don't need nvidia-docker to run them. We can just run
```bash
docker run -v $PWD:/paddle paddle:dev bash -c "cd /paddle/build; ctest"
```
Sometimes we want to run a specific unit test, say `memory_test`, we can run
```bash
nvidia-docker run -v $PWD:/paddle paddle:dev bash -c "cd /paddle/build; ctest -V -R memory_test"
```
5. Clean Build.
Sometimes, we might want to clean all thirt-party dependents and built binaries. To do so, just
```bash
rm -rf build
```
## Docker, Or Not?
- What is Docker?
If you haven't heard of it, consider it something like Python's virtualenv.
- Docker or virtual machine?
Some people compare Docker with VMs, but Docker doesn't virtualize any hardware nor running a guest OS, which means there is no compromise on the performance.
- Why Docker?
Using a Docker image of build tools standardizes the building environment, which makes it easier for others to reproduce your problems and to help.
Also, some build tools don't run on Windows or Mac or BSD, but Docker runs almost everywhere, so developers can use whatever computer they want.
- Can I choose not to use Docker?
Sure, you don't have to install build tools into a Docker image; instead, you can install them in your local computer. This document exists because Docker would make the development way easier.
- How difficult is it to learn Docker?
It takes you ten minutes to read [an introductory article](https://docs.docker.com/get-started) and saves you more than one hour to install all required build tools, configure them, especially when new versions of PaddlePaddle require some new tools. Not even to mention the time saved when other people trying to reproduce the issue you have.
- Can I use my favorite IDE?
Yes, of course. The source code resides on your local computer, and you can edit it using whatever editor you like.
Many PaddlePaddle developers are using Emacs. They add the following few lines into their `~/.emacs` configure file:
```emacs
(global-set-key "\C-cc" 'compile)
(setq compile-command
"docker run --rm -it -v $(git rev-parse --show-toplevel):/paddle paddle:dev")
```
so they could type `Ctrl-C` and `c` to build PaddlePaddle from source.
- Does Docker do parallel building?
Our building Docker image runs a [Bash script](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build.sh), which calls `make -j$(nproc)` to starts as many processes as the number of your CPU cores.
## Some Gotchas
- Docker requires sudo
An owner of a computer has the administrative privilege, a.k.a., sudo, and Docker requires this privilege to work properly. If you use a shared computer for development, please ask the administrator to install and configure Docker. We will do our best to support rkt, another container technology that doesn't require sudo.
- Docker on Windows/MacOS builds slowly
On Windows and MacOS, Docker containers run in a Linux VM. You might want to give this VM some more memory and CPUs so to make the building efficient. Please refer to [this issue](https://github.com/PaddlePaddle/Paddle/issues/627) for details.
- Not enough disk space
Examples in this article uses option `--rm` with the `docker run` command. This option ensures that stopped containers do not exist on hard disks. We can use `docker ps -a` to list all containers, including stopped. Sometimes `docker build` generates some intermediate dangling images, which also take disk space. To clean them, please refer to [this article](https://zaiste.net/posts/removing_docker_containers/).
从源码编译
======================
.. _requirements:
需要的软硬件
----------------
为了编译PaddlePaddle,我们需要
1. 一台电脑,可以装的是 Linux, Windows 或者 MacOS 操作系统
1. Docker
不需要依赖其他任何软件了。即便是 Python 和 GCC 都不需要,因为我们会把所有编译工具都安装进一个 Docker 镜像里。
.. _build_step:
编译方法
----------------
PaddlePaddle主要使用 `CMake <https://cmake.org>`_ 以及GCC, G++作为编译工具。
我们推荐您使用PaddlePaddle Docker编译环境镜像完成编译,这样可以免去单独安装编译依赖的步骤,可选的不同编译环境Docker镜像
可以在 `这里 <https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/>`_ 找到
PaddlePaddle需要使用Docker环境完成编译,这样可以免去单独安装编译依赖的步骤,可选的不同编译环境Docker镜像
可以在 `这里 <https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/>`_ 找到。或者
参考下述可选步骤,从源码中构建用于编译PaddlePaddle的Docker镜像
如果您选择不使用Docker镜像,则需要在本机安装下面章节列出的 `编译依赖`_ 之后才能开始编译的步骤。
......@@ -16,15 +28,19 @@ PaddlePaddle主要使用 `CMake <https://cmake.org>`_ 以及GCC, G++作为编译
.. code-block:: bash
# 1. 获取源码
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
# 如果使用Docker编译环境,执行下面的命令编译CPU-Only的二进制
# 2. 可选步骤:源码中构建用于编译PaddlePaddle的Docker镜像
docker build -t paddle:dev .
# 3. 执行下面的命令编译CPU-Only的二进制
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 bash -x /paddle/paddle/scripts/docker/build.sh
# 如果不使用Docker编译环境,执行下面的命令
mkdir build
cd build
cmake -DWITH_GPU=OFF -DWITH_TESTING=OFF ..
make
# 4. 或者也可以使用为上述可选步骤构建的镜像(必须先执行第2步)
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" paddle:dev
注:上述命令把当前目录(源码树根目录)映射为 container 里的 :code:`/paddle` 目录。如果使用自行
构建的镜像(上述第4步)会执行 :code:`Dockerfile` 描述的默认入口程序 :code:`build.sh` 可以省略步骤3中
最后的执行脚本的命令。
编译完成后会在build/python/dist目录下生成输出的whl包,可以选在在当前机器安装也可以拷贝到目标机器安装:
......@@ -50,28 +66,83 @@ PaddlePaddle主要使用 `CMake <https://cmake.org>`_ 以及GCC, G++作为编译
如果您期望在编译完成后立即执行所有的单元测试,可以按照下面的方法:
使用Docker的情况下,设置 :code:`RUN_TEST=ON` 和 :code:`WITH_TESTING=ON` 就会在完成编译之后,立即执行单元测试。
设置 :code:`RUN_TEST=ON` 和 :code:`WITH_TESTING=ON` 就会在完成编译之后,立即执行单元测试。
开启 :code:`WITH_GPU=ON` 可以指定同时执行GPU上的单元测试。
.. code-block:: bash
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=ON" -e "RUN_TEST=ON" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 bash -x /paddle/paddle/scripts/docker/build.sh
如果不使用Docker,可以执行ctest命令即可
如果期望执行其中一个单元测试,(比如 :code:`test_sum_op` )
.. code-block:: bash
mkdir build
cd build
cmake -DWITH_GPU=OFF -DWITH_TESTING=OFF ..
make
ctest
# 指定执行其中一个单元测试 test_mul_op
ctest -R test_mul_op
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=ON" -e "RUN_TEST=OFF" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 /bin/bash
bash /paddle/paddle/scripts/docker/build.sh
cd /paddle/build
ctest -R test_sum_op -V
.. _faq_docker:
常见问题
----------------
- 什么是 Docker?
如果您没有听说 Docker,可以把它想象为一个类似 virtualenv 的系统,但是虚拟的不仅仅是 Python 的运行环境。
- Docker 还是虚拟机?
有人用虚拟机来类比 Docker。需要强调的是:Docker 不会虚拟任何硬件,Docker container 里运行的编译工具实际上都是在本机的 CPU 和操作系统上直接运行的,性能和把编译工具安装在本机运行一样。
- 为什么用 Docker?
把工具和配置都安装在一个 Docker image 里可以标准化编译环境。这样如果遇到问题,其他人可以复现问题以便帮助。
另外,对于习惯使用Windows和MacOS的开发者来说,使用Docker就不用配置交叉编译环境了。
- 我可以选择不用Docker吗?
当然可以。大家可以用把开发工具安装进入 Docker image 一样的方式,把这些工具安装到本机。这篇文档介绍基于 Docker 的开发流程,是因为这个流程比其他方法都更简便。
- 学习 Docker 有多难?
理解 Docker 并不难,大概花十分钟看一下[这篇文章](https://zhuanlan.zhihu.com/p/19902938)。这可以帮您省掉花一小时安装和配置各种开发工具,以及切换机器时需要新安装的辛苦。别忘了 PaddlePaddle 更新可能导致需要新的开发工具。更别提简化问题复现带来的好处了。
- 我可以用 IDE 吗?
当然可以,因为源码就在本机上。IDE 默认调用 make 之类的程序来编译源码,我们只需要配置 IDE 来调用 Docker 命令编译源码即可。
很多 PaddlePaddle 开发者使用 Emacs。他们在自己的 `~/.emacs` 配置文件里加两行
```emacs
(global-set-key "\C-cc" 'compile)
(setq compile-command
"docker run --rm -it -v $(git rev-parse --show-toplevel):/paddle paddle:dev")
```
就可以按 `Ctrl-C` 和 `c` 键来启动编译了。
- 可以并行编译吗?
是的。我们的 Docker image 运行一个 [Bash 脚本](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build.sh)。这个脚本调用 `make -j$(nproc)` 来启动和 CPU 核一样多的进程来并行编译。
- Docker 需要 sudo
如果用自己的电脑开发,自然也就有管理员权限(sudo)了。如果用公用的电脑开发,需要请管理员安装和配置好 Docker。此外,PaddlePaddle 项目在努力开始支持其他不需要 sudo 的集装箱技术,比如 rkt。
- 在 Windows/MacOS 上编译很慢
Docker 在 Windows 和 MacOS 都可以运行。不过实际上是运行在一个 Linux 虚拟机上。可能需要注意给这个虚拟机多分配一些 CPU 和内存,以保证编译高效。具体做法请参考[这个issue](https://github.com/PaddlePaddle/Paddle/issues/627)。
- 磁盘不够
本文中的例子里,`docker run` 命令里都用了 `--rm` 参数,这样保证运行结束之后的 containers 不会保留在磁盘上。可以用 `docker ps -a` 命令看到停止后但是没有删除的 containers。`docker build` 命令有时候会产生一些中间结果,是没有名字的 images,也会占用磁盘。可以参考[这篇文章](https://zaiste.net/posts/removing_docker_containers/)来清理这些内容。
.. _compile_deps:
编译依赖
附录:编译依赖
----------------
PaddlePaddle编译需要使用到下面的依赖(包含但不限于),其他的依赖软件,会自动在编译时下载。
......@@ -91,7 +162,7 @@ PaddlePaddle编译需要使用到下面的依赖(包含但不限于),其
.. _build_options:
编译选项
附录:编译选项
----------------
PaddlePaddle的编译选项,包括生成CPU/GPU二进制文件、链接何种BLAS库等。
......
Build from Sources
==========================
.. _build_step:
.. _requirements:
How To Build
Requirements
----------------
PaddlePaddle mainly uses `CMake <https://cmake.org>`_ and GCC, G++ as compile
tools. We recommend you to use our pre-built Docker image to run the build
to avoid installing dependencies by yourself. We have several build environment
Docker images `here <https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/>`_ .
To build PaddlePaddle, you need
1. A computer -- Linux, Windows, MacOS.
1. Docker.
Nothing else. Not even Python and GCC, because you can install all build tools into a Docker image.
We run all the tools by running this image.
.. _build_step:
If you choose not to use Docker image for your build, you need to install the
below `Compile Dependencies`_ before run the build.
How To Build
----------------
Then run:
You need to use Docker to build PaddlePaddle
to avoid installing dependencies by yourself. We have several pre-built
Docker images `here <https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/>`_ ,
Or you can build your own image from source as the optional step below:
.. code-block:: bash
# 1. clone the source code
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
# run the following command to build a CPU-Only binaries if you are using docker
# 2. Optional: build development docker image from source
docker build -t paddle:dev .
# 3. Run the following command to build a CPU-Only binaries
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 bash -x /paddle/paddle/scripts/docker/build.sh
# else run these commands
mkdir build
cd build
cmake -DWITH_GPU=OFF -DWITH_TESTING=OFF ..
make
# 4. Or, use your built Docker image to build PaddlePaddle (must run step 2)
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" paddle:dev
NOTE: The above command try to mount the current working directory (root directory of source code)
into :code:`/paddle` directory inside docker container. If you are using your own image
(Step 4) it will run default entry-point :code:`build.sh` , so you could omit the last
command in step 3.
When the compile finishes, you can get the output whl package under
build/python/dist, then you can choose to install the whl on local
......@@ -61,22 +74,75 @@ Set :code:`WITH_GPU=ON` Can also run tests on GPU.
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=ON" -e "RUN_TEST=ON" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 bash -x paddle/paddle/scripts/docker/build.sh
If you don't use Docker, just run ctest will start the tests:
If you wish to run only one unit test, like :code:`test_sum_op`:
.. code-block:: bash
mkdir build
cd build
cmake -DWITH_GPU=OFF -DWITH_TESTING=ON ..
make
ctest
# run a single test like test_mul_op
ctest -R test_mul_op
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=ON" -e "RUN_TEST=OFF" paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn5 /bin/bash
bash /paddle/paddle/scripts/docker/build.sh
cd /paddle/build
ctest -R test_sum_op -V
.. _faq_docker:
Frequently Asked Questions
----------------
- What is Docker?
If you haven't heard of it, consider it something like Python's virtualenv.
- Docker or virtual machine?
Some people compare Docker with VMs, but Docker doesn't virtualize any hardware nor running a guest OS, which means there is no compromise on the performance.
- Why Docker?
Using a Docker image of build tools standardizes the building environment, which makes it easier for others to reproduce your problems and to help.
Also, some build tools don't run on Windows or Mac or BSD, but Docker runs almost everywhere, so developers can use whatever computer they want.
- Can I choose not to use Docker?
Sure, you don't have to install build tools into a Docker image; instead, you can install them on your local computer. This document exists because Docker would make the development way easier.
- How difficult is it to learn Docker?
It takes you ten minutes to read [an introductory article](https://docs.docker.com/get-started) and saves you more than one hour to install all required build tools, configure them, especially when new versions of PaddlePaddle require some new tools. Not even to mention the time saved when other people trying to reproduce the issue you have.
- Can I use my favorite IDE?
Yes, of course. The source code resides on your local computer, and you can edit it using whatever editor you like.
Many PaddlePaddle developers are using Emacs. They add the following few lines into their `~/.emacs` configure file:
```emacs
(global-set-key "\C-cc" 'compile)
(setq compile-command
"docker run --rm -it -v $(git rev-parse --show-toplevel):/paddle paddle:dev")
```
so they could type `Ctrl-C` and `c` to build PaddlePaddle from source.
- Does Docker do parallel building?
Our building Docker image runs a [Bash script](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/scripts/docker/build.sh), which calls `make -j$(nproc)` to starts as many processes as the number of your CPU cores.
- Docker requires sudo
An owner of a computer has the administrative privilege, a.k.a., sudo, and Docker requires this privilege to work properly. If you use a shared computer for development, please ask the administrator to install and configure Docker. We will do our best to support rkt, another container technology that doesn't require sudo.
- Docker on Windows/MacOS builds slowly
On Windows and MacOS, Docker containers run in a Linux VM. You might want to give this VM some more memory and CPUs so to make the building efficient. Please refer to [this issue](https://github.com/PaddlePaddle/Paddle/issues/627) for details.
- Not enough disk space
Examples in this article use option `--rm` with the `docker run` command. This option ensures that stopped containers do not exist on hard disks. We can use `docker ps -a` to list all containers, including stopped. Sometimes `docker build` generates some intermediate dangling images, which also take disk space. To clean them, please refer to [this article](https://zaiste.net/posts/removing_docker_containers/).
.. _compile_deps:
Compile Dependencies
Appendix: Compile Dependencies
----------------
PaddlePaddle need the following dependencies when compiling, other dependencies
......@@ -97,17 +163,13 @@ will be downloaded automatically.
.. _build_options:
Build Options
Appendix: Build Options
----------------
Build options include whether build binaries for CPU or GPU, which BLAS
library to use etc. You may pass these settings when running cmake.
For detailed cmake tutorial please refer to `here <https://cmake.org/cmake-tutorial>`_ 。
.. _build_options_bool:
Bool Type Options
----------------
You can add :code:`-D` argument to pass such options, like:
......
......@@ -13,7 +13,6 @@ PaddlePaddle提供pip和Docker的安装方式:
pip_install_cn.rst
docker_install_cn.rst
build_cn.md
编译流程
++++++++
......
......@@ -13,8 +13,6 @@ You can choose either pip or Docker to complete your install:
pip_install_en.rst
docker_install_en.rst
build_en.md
Build from Source
-----------------
......
文件已移动
......@@ -82,7 +82,7 @@ language = 'zh_CN'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build', '**/*_en*', '*_en*']
exclude_patterns = ['_build', '**/*_en*', '*_en*', 'api/*']
# The reST default role (used for this markup: `text`) to use for all
# documents.
......
......@@ -82,7 +82,7 @@ language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build', '**/*_cn*', '*_cn*']
exclude_patterns = ['_build', '**/*_cn*', '*_cn*', 'api/*']
# The reST default role (used for this markup: `text`) to use for all
# documents.
......
......@@ -11,7 +11,6 @@ if(MOBILE_INFERENCE)
else()
add_subdirectory(pserver)
add_subdirectory(trainer)
add_subdirectory(string)
add_subdirectory(scripts)
if(WITH_C_API)
......
......@@ -4,3 +4,4 @@ add_subdirectory(framework)
add_subdirectory(operators)
add_subdirectory(pybind)
add_subdirectory(inference)
add_subdirectory(string)
......@@ -314,5 +314,15 @@ DDim stride(const DDim& ddim) {
}
return framework::make_ddim(strides);
}
DDim stride_numel(const framework::DDim& ddim) {
std::vector<int64_t> strides(ddim.size());
strides[ddim.size() - 1] = ddim[ddim.size() - 1];
for (int i = ddim.size() - 2; i >= 0; --i) {
strides[i] = strides[i + 1] * ddim[i];
}
return framework::make_ddim(strides);
}
} // namespace framework
} // namespace paddle
......@@ -125,6 +125,8 @@ DDim flatten_to_2d(const DDim& src, int num_col_dims);
DDim flatten_to_1d(const DDim& src);
DDim stride(const DDim& ddim);
DDim stride_numel(const DDim& ddim);
} // namespace framework
} // namespace paddle
......
......@@ -20,7 +20,7 @@ limitations under the License. */
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/string/piece.h"
#include "paddle/fluid/string/piece.h"
namespace paddle {
namespace framework {
......
......@@ -37,9 +37,8 @@ class Vector {
// Fill vector with value. The vector size is `count`.
explicit Vector(size_t count, const T& value = T()) {
if (count == 0) {
InitEmpty();
} else {
InitEmpty();
if (count != 0) {
resize(count);
T* ptr = begin();
for (size_t i = 0; i < count; ++i) {
......@@ -107,9 +106,11 @@ class Vector {
// std::vector iterator methods. Based on CPU data access method
size_t size() const { return size_; }
T* begin() { return &this->operator[](0); }
T* begin() { return capacity() == 0 ? &EmptyDummy() : &this->operator[](0); }
T* end() { return &this->operator[](size()); }
T* end() {
return capacity() == 0 ? &EmptyDummy() : &this->operator[](size());
}
T& front() { return *begin(); }
......@@ -119,8 +120,17 @@ class Vector {
return *it;
}
const T* begin() const { return &this->operator[](0); }
const T* end() const { return &this->operator[](size()); }
const T* begin() const {
return capacity() == 0 ? &EmptyDummy() : &this->operator[](0);
}
const T* end() const {
return capacity() == 0 ? &EmptyDummy() : &this->operator[](size());
}
const T* cbegin() const { return begin(); }
const T* cend() const { return end(); }
const T& back() const {
auto it = end();
......@@ -244,7 +254,9 @@ class Vector {
bool operator==(const Vector<T>& other) const {
if (size() != other.size()) return false;
for (auto it1 = begin(), it2 = other.begin(); it1 < end(); ++it1, ++it2) {
auto it1 = cbegin();
auto it2 = other.cbegin();
for (; it1 < cend(); ++it1, ++it2) {
if (*it1 != *it2) {
return false;
}
......@@ -353,6 +365,11 @@ class Vector {
}
}
static T& EmptyDummy() {
static T dummy = T();
return dummy;
}
mutable int flag_;
mutable Tensor cpu_vec_;
mutable Tensor cuda_vec_;
......
......@@ -26,10 +26,10 @@ TEST(mixed_vector, CPU_VECTOR) {
for (int i = 0; i < 10; ++i) {
tmp.push_back(i);
}
ASSERT_EQ(tmp.size(), 10);
ASSERT_EQ(tmp.size(), 10UL);
vec<int> tmp2;
tmp2 = tmp;
ASSERT_EQ(tmp2.size(), 10);
ASSERT_EQ(tmp2.size(), 10UL);
for (int i = 0; i < 10; ++i) {
ASSERT_EQ(tmp2[i], i);
ASSERT_EQ(tmp2[i], tmp[i]);
......@@ -58,7 +58,7 @@ TEST(mixed_vector, GPU_VECTOR) {
for (int i = 0; i < 10; ++i) {
tmp.push_back(i);
}
ASSERT_EQ(tmp.size(), 10);
ASSERT_EQ(tmp.size(), 10UL);
paddle::platform::CUDAPlace gpu(0);
multiply_10<<<1, 1, 0, GetCUDAStream(gpu)>>>(tmp.MutableData(gpu));
......@@ -79,7 +79,7 @@ TEST(mixed_vector, MultiGPU) {
for (int i = 0; i < 10; ++i) {
tmp.push_back(i);
}
ASSERT_EQ(tmp.size(), 10);
ASSERT_EQ(tmp.size(), 10UL);
paddle::platform::CUDAPlace gpu0(0);
paddle::platform::SetDeviceId(0);
multiply_10<<<1, 1, 0, GetCUDAStream(gpu0)>>>(tmp.MutableData(gpu0));
......@@ -91,3 +91,16 @@ TEST(mixed_vector, MultiGPU) {
ASSERT_EQ(tmp[i], i * 100);
}
}
TEST(mixed_vector, InitWithCount) {
paddle::framework::Vector<int> vec(10, 10);
for (int i = 0; i < 10; ++i) {
ASSERT_EQ(vec[i], 10);
}
}
TEST(mixed_vector, ForEach) {
vec<int> tmp;
for (auto& v : tmp) {
}
}
......@@ -18,7 +18,7 @@ limitations under the License. */
#include <mutex> // for call_once
#include "glog/logging.h"
#include "paddle/fluid/framework/threadpool.h"
#include "paddle/string/printf.h"
#include "paddle/fluid/string/printf.h"
DEFINE_bool(benchmark, false,
"Doing memory benchmark. It will make deleting scope synchronized, "
......
......@@ -29,6 +29,6 @@ inference_test(image_classification ARGS vgg resnet)
inference_test(label_semantic_roles)
inference_test(recognize_digits ARGS mlp)
inference_test(recommender_system)
inference_test(rnn_encoder_decoder)
#inference_test(rnn_encoder_decoder)
inference_test(understand_sentiment)
inference_test(word2vec)
......@@ -102,3 +102,5 @@ REGISTER_LOGICAL_OP(less_equal, "Out = X <= Y");
REGISTER_LOGICAL_KERNEL(less_equal, CPU, paddle::operators::LessEqualFunctor);
REGISTER_LOGICAL_OP(equal, "Out = X == Y");
REGISTER_LOGICAL_KERNEL(equal, CPU, paddle::operators::EqualFunctor);
REGISTER_LOGICAL_OP(not_equal, "Out = X != Y");
REGISTER_LOGICAL_KERNEL(not_equal, CPU, paddle::operators::NotEqualFunctor);
......@@ -17,3 +17,4 @@ limitations under the License. */
REGISTER_LOGICAL_KERNEL(less_than, CUDA, paddle::operators::LessThanFunctor);
REGISTER_LOGICAL_KERNEL(less_equal, CUDA, paddle::operators::LessEqualFunctor);
REGISTER_LOGICAL_KERNEL(equal, CUDA, paddle::operators::EqualFunctor);
REGISTER_LOGICAL_KERNEL(not_equal, CUDA, paddle::operators::NotEqualFunctor);
......@@ -48,6 +48,14 @@ struct EqualFunctor {
}
};
template <typename T>
struct NotEqualFunctor {
using ELEM_TYPE = T;
HOSTDEVICE bool operator()(const T& a, const T& b) const {
return !EqualFunctor<T>()(a, b);
}
};
template <typename DeviceContext, typename Functor>
class CompareOpKernel
: public framework::OpKernel<typename Functor::ELEM_TYPE> {
......
......@@ -28,17 +28,18 @@ class ConcatKernel : public framework::OpKernel<T> {
auto ins = ctx.MultiInput<framework::Tensor>("X");
auto* out = ctx.Output<framework::Tensor>("Out");
int64_t axis = static_cast<int64_t>(ctx.Attr<int>("axis"));
const size_t n = ins.size();
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto out_stride = framework::stride_numel(out->dims());
size_t output_offset = 0;
out->mutable_data<T>(ctx.GetPlace());
auto out_stride = framework::stride(out->dims());
for (size_t i = 0; i < n; i++) {
auto& in = ins[i];
auto axis_dim = in->dims()[axis];
auto in_stride = framework::stride(in->dims());
StridedMemcpy<T>(ctx.device_context(), in->data<T>(), in_stride,
in->dims(), out_stride, out->data<T>() + output_offset);
output_offset += axis_dim * in_stride[axis];
for (auto* in : ins) {
auto in_stride = framework::stride_numel(in->dims());
StridedNumelCopyWithAxis<T>(ctx.device_context(), axis,
out->data<T>() + output_offset, out_stride,
in->data<T>(), in_stride);
output_offset += in_stride[axis];
}
}
};
......@@ -50,17 +51,16 @@ class ConcatGradKernel : public framework::OpKernel<T> {
auto* in = ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto outs = ctx.MultiOutput<framework::Tensor>(framework::GradVarName("X"));
int64_t axis = static_cast<int64_t>(ctx.Attr<int>("axis"));
const size_t n = outs.size();
size_t input_offset = 0;
auto in_stride = framework::stride(in->dims());
for (size_t i = 0; i < n; i++) {
auto& out = outs[i];
auto in_stride = framework::stride_numel(in->dims());
for (auto& out : outs) {
out->mutable_data<T>(ctx.GetPlace());
size_t axis_dim = out->dims()[axis];
auto out_stride = framework::stride(out->dims());
StridedMemcpy<T>(ctx.device_context(), in->data<T>() + input_offset,
in_stride, out->dims(), out_stride, out->data<T>());
input_offset += axis_dim * in_stride[axis];
auto out_stride = framework::stride_numel(out->dims());
StridedNumelCopyWithAxis<T>(ctx.device_context(), axis, out->data<T>(),
out_stride, in->data<T>() + input_offset,
in_stride);
input_offset += out_stride[axis];
}
}
};
......
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/detection_map_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class DetectionMAPOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("DetectRes"),
"Input(DetectRes) of DetectionMAPOp should not be null.");
PADDLE_ENFORCE(ctx->HasInput("Label"),
"Input(Label) of DetectionMAPOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("AccumPosCount"),
"Output(AccumPosCount) of DetectionMAPOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("AccumTruePos"),
"Output(AccumTruePos) of DetectionMAPOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("AccumFalsePos"),
"Output(AccumFalsePos) of DetectionMAPOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("MAP"),
"Output(MAP) of DetectionMAPOp should not be null.");
auto det_dims = ctx->GetInputDim("DetectRes");
PADDLE_ENFORCE_EQ(det_dims.size(), 2UL,
"The rank of Input(DetectRes) must be 2, "
"the shape is [N, 6].");
PADDLE_ENFORCE_EQ(det_dims[1], 6UL,
"The shape is of Input(DetectRes) [N, 6].");
auto label_dims = ctx->GetInputDim("Label");
PADDLE_ENFORCE_EQ(label_dims.size(), 2UL,
"The rank of Input(Label) must be 2, "
"the shape is [N, 6].");
PADDLE_ENFORCE_EQ(label_dims[1], 6UL,
"The shape is of Input(Label) [N, 6].");
if (ctx->HasInput("PosCount")) {
PADDLE_ENFORCE(ctx->HasInput("TruePos"),
"Input(TruePos) of DetectionMAPOp should not be null when "
"Input(TruePos) is not null.");
PADDLE_ENFORCE(
ctx->HasInput("FalsePos"),
"Input(FalsePos) of DetectionMAPOp should not be null when "
"Input(FalsePos) is not null.");
}
ctx->SetOutputDim("MAP", framework::make_ddim({1}));
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
framework::ToDataType(
ctx.Input<framework::Tensor>("DetectRes")->type()),
ctx.device_context());
}
};
class DetectionMAPOpMaker : public framework::OpProtoAndCheckerMaker {
public:
DetectionMAPOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("DetectRes",
"(LoDTensor) A 2-D LoDTensor with shape [M, 6] represents the "
"detections. Each row has 6 values: "
"[label, confidence, xmin, ymin, xmax, ymax], M is the total "
"number of detect results in this mini-batch. For each instance, "
"the offsets in first dimension are called LoD, the number of "
"offset is N + 1, if LoD[i + 1] - LoD[i] == 0, means there is "
"no detected data.");
AddInput("Label",
"(LoDTensor) A 2-D LoDTensor with shape[N, 6] represents the"
"Labeled ground-truth data. Each row has 6 values: "
"[label, is_difficult, xmin, ymin, xmax, ymax], N is the total "
"number of ground-truth data in this mini-batch. For each "
"instance, the offsets in first dimension are called LoD, "
"the number of offset is N + 1, if LoD[i + 1] - LoD[i] == 0, "
"means there is no ground-truth data.");
AddInput("PosCount",
"(Tensor) A tensor with shape [Ncls, 1], store the "
"input positive example count of each class, Ncls is the count of "
"input classification. "
"This input is used to pass the AccumPosCount generated by the "
"previous mini-batch when the multi mini-batches cumulative "
"calculation carried out. "
"When the input(PosCount) is empty, the cumulative "
"calculation is not carried out, and only the results of the "
"current mini-batch are calculated.")
.AsDispensable();
AddInput("TruePos",
"(LoDTensor) A 2-D LoDTensor with shape [Ntp, 2], store the "
"input true positive example of each class."
"This input is used to pass the AccumTruePos generated by the "
"previous mini-batch when the multi mini-batches cumulative "
"calculation carried out. ")
.AsDispensable();
AddInput("FalsePos",
"(LoDTensor) A 2-D LoDTensor with shape [Nfp, 2], store the "
"input false positive example of each class."
"This input is used to pass the AccumFalsePos generated by the "
"previous mini-batch when the multi mini-batches cumulative "
"calculation carried out. ")
.AsDispensable();
AddOutput("AccumPosCount",
"(Tensor) A tensor with shape [Ncls, 1], store the "
"positive example count of each class. It combines the input "
"input(PosCount) and the positive example count computed from "
"input(Detection) and input(Label).");
AddOutput("AccumTruePos",
"(LoDTensor) A LoDTensor with shape [Ntp', 2], store the "
"true positive example of each class. It combines the "
"input(TruePos) and the true positive examples computed from "
"input(Detection) and input(Label).");
AddOutput("AccumFalsePos",
"(LoDTensor) A LoDTensor with shape [Nfp', 2], store the "
"false positive example of each class. It combines the "
"input(FalsePos) and the false positive examples computed from "
"input(Detection) and input(Label).");
AddOutput("MAP",
"(Tensor) A tensor with shape [1], store the mAP evaluate "
"result of the detection.");
AddAttr<float>(
"overlap_threshold",
"(float) "
"The lower bound jaccard overlap threshold of detection output and "
"ground-truth data.")
.SetDefault(.3f);
AddAttr<bool>("evaluate_difficult",
"(bool, default true) "
"Switch to control whether the difficult data is evaluated.")
.SetDefault(true);
AddAttr<std::string>("ap_type",
"(string, default 'integral') "
"The AP algorithm type, 'integral' or '11point'.")
.SetDefault("integral")
.InEnum({"integral", "11point"})
.AddCustomChecker([](const std::string& ap_type) {
PADDLE_ENFORCE_NE(GetAPType(ap_type), APType::kNone,
"The ap_type should be 'integral' or '11point.");
});
AddComment(R"DOC(
Detection mAP evaluate operator.
The general steps are as follows. First, calculate the true positive and
false positive according to the input of detection and labels, then
calculate the mAP evaluate value.
Supporting '11 point' and 'integral' mAP algorithm. Please get more information
from the following articles:
https://sanchom.wordpress.com/tag/average-precision/
https://arxiv.org/abs/1512.02325
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(detection_map, ops::DetectionMAPOp,
ops::DetectionMAPOpMaker);
REGISTER_OP_CPU_KERNEL(
detection_map, ops::DetectionMAPOpKernel<paddle::platform::CPUPlace, float>,
ops::DetectionMAPOpKernel<paddle::platform::CPUPlace, double>);
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
namespace paddle {
namespace operators {
enum APType { kNone = 0, kIntegral, k11point };
APType GetAPType(std::string str) {
if (str == "integral") {
return APType::kIntegral;
} else if (str == "11point") {
return APType::k11point;
} else {
return APType::kNone;
}
}
template <typename T>
inline bool SortScorePairDescend(const std::pair<float, T>& pair1,
const std::pair<float, T>& pair2) {
return pair1.first > pair2.first;
}
template <typename T>
inline void GetAccumulation(std::vector<std::pair<T, int>> in_pairs,
std::vector<int>* accu_vec) {
std::stable_sort(in_pairs.begin(), in_pairs.end(), SortScorePairDescend<int>);
accu_vec->clear();
size_t sum = 0;
for (size_t i = 0; i < in_pairs.size(); ++i) {
auto count = in_pairs[i].second;
sum += count;
accu_vec->push_back(sum);
}
}
template <typename Place, typename T>
class DetectionMAPOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* in_detect = ctx.Input<framework::LoDTensor>("DetectRes");
auto* in_label = ctx.Input<framework::LoDTensor>("Label");
auto* out_map = ctx.Output<framework::Tensor>("MAP");
auto* in_pos_count = ctx.Input<framework::Tensor>("PosCount");
auto* in_true_pos = ctx.Input<framework::LoDTensor>("TruePos");
auto* in_false_pos = ctx.Input<framework::LoDTensor>("FalsePos");
auto* out_pos_count = ctx.Output<framework::Tensor>("AccumPosCount");
auto* out_true_pos = ctx.Output<framework::LoDTensor>("AccumTruePos");
auto* out_false_pos = ctx.Output<framework::LoDTensor>("AccumFalsePos");
float overlap_threshold = ctx.Attr<float>("overlap_threshold");
float evaluate_difficult = ctx.Attr<bool>("evaluate_difficult");
auto ap_type = GetAPType(ctx.Attr<std::string>("ap_type"));
auto label_lod = in_label->lod();
auto detect_lod = in_detect->lod();
PADDLE_ENFORCE_EQ(label_lod.size(), 1UL,
"Only support one level sequence now.");
PADDLE_ENFORCE_EQ(label_lod[0].size(), detect_lod[0].size(),
"The batch_size of input(Label) and input(Detection) "
"must be the same.");
std::vector<std::map<int, std::vector<Box>>> gt_boxes;
std::vector<std::map<int, std::vector<std::pair<T, Box>>>> detect_boxes;
GetBoxes(*in_label, *in_detect, gt_boxes, detect_boxes);
std::map<int, int> label_pos_count;
std::map<int, std::vector<std::pair<T, int>>> true_pos;
std::map<int, std::vector<std::pair<T, int>>> false_pos;
if (in_pos_count != nullptr) {
GetInputPos(*in_pos_count, *in_true_pos, *in_false_pos, label_pos_count,
true_pos, false_pos);
}
CalcTrueAndFalsePositive(gt_boxes, detect_boxes, evaluate_difficult,
overlap_threshold, label_pos_count, true_pos,
false_pos);
T map = CalcMAP(ap_type, label_pos_count, true_pos, false_pos);
GetOutputPos(ctx, label_pos_count, true_pos, false_pos, *out_pos_count,
*out_true_pos, *out_false_pos);
T* map_data = out_map->mutable_data<T>(ctx.GetPlace());
map_data[0] = map;
}
protected:
struct Box {
Box(T xmin, T ymin, T xmax, T ymax)
: xmin(xmin), ymin(ymin), xmax(xmax), ymax(ymax), is_difficult(false) {}
T xmin, ymin, xmax, ymax;
bool is_difficult;
};
inline T JaccardOverlap(const Box& box1, const Box& box2) const {
if (box2.xmin > box1.xmax || box2.xmax < box1.xmin ||
box2.ymin > box1.ymax || box2.ymax < box1.ymin) {
return 0.0;
} else {
T inter_xmin = std::max(box1.xmin, box2.xmin);
T inter_ymin = std::max(box1.ymin, box2.ymin);
T inter_xmax = std::min(box1.xmax, box2.xmax);
T inter_ymax = std::min(box1.ymax, box2.ymax);
T inter_width = inter_xmax - inter_xmin;
T inter_height = inter_ymax - inter_ymin;
T inter_area = inter_width * inter_height;
T bbox_area1 = (box1.xmax - box1.xmin) * (box1.ymax - box1.ymin);
T bbox_area2 = (box2.xmax - box2.xmin) * (box2.ymax - box2.ymin);
return inter_area / (bbox_area1 + bbox_area2 - inter_area);
}
}
void GetBoxes(const framework::LoDTensor& input_label,
const framework::LoDTensor& input_detect,
std::vector<std::map<int, std::vector<Box>>>& gt_boxes,
std::vector<std::map<int, std::vector<std::pair<T, Box>>>>&
detect_boxes) const {
auto labels = framework::EigenTensor<T, 2>::From(input_label);
auto detect = framework::EigenTensor<T, 2>::From(input_detect);
auto label_lod = input_label.lod();
auto detect_lod = input_detect.lod();
int batch_size = label_lod[0].size() - 1;
auto label_index = label_lod[0];
for (int n = 0; n < batch_size; ++n) {
std::map<int, std::vector<Box>> boxes;
for (int i = label_index[n]; i < label_index[n + 1]; ++i) {
Box box(labels(i, 2), labels(i, 3), labels(i, 4), labels(i, 5));
int label = labels(i, 0);
auto is_difficult = labels(i, 1);
if (std::abs(is_difficult - 0.0) < 1e-6)
box.is_difficult = false;
else
box.is_difficult = true;
boxes[label].push_back(box);
}
gt_boxes.push_back(boxes);
}
auto detect_index = detect_lod[0];
for (int n = 0; n < batch_size; ++n) {
std::map<int, std::vector<std::pair<T, Box>>> boxes;
for (int i = detect_index[n]; i < detect_index[n + 1]; ++i) {
Box box(detect(i, 2), detect(i, 3), detect(i, 4), detect(i, 5));
int label = detect(i, 0);
auto score = detect(i, 1);
boxes[label].push_back(std::make_pair(score, box));
}
detect_boxes.push_back(boxes);
}
}
void GetOutputPos(
const framework::ExecutionContext& ctx,
const std::map<int, int>& label_pos_count,
const std::map<int, std::vector<std::pair<T, int>>>& true_pos,
const std::map<int, std::vector<std::pair<T, int>>>& false_pos,
framework::Tensor& output_pos_count,
framework::LoDTensor& output_true_pos,
framework::LoDTensor& output_false_pos) const {
int max_class_id = 0;
int true_pos_count = 0;
int false_pos_count = 0;
for (auto it = label_pos_count.begin(); it != label_pos_count.end(); ++it) {
int label = it->first;
if (label > max_class_id) max_class_id = label;
int label_num_pos = it->second;
if (label_num_pos == 0 || true_pos.find(label) == true_pos.end())
continue;
auto label_true_pos = true_pos.find(label)->second;
auto label_false_pos = false_pos.find(label)->second;
true_pos_count += label_true_pos.size();
false_pos_count += label_false_pos.size();
}
int* pos_count_data = output_pos_count.mutable_data<int>(
framework::make_ddim({max_class_id + 1, 1}), ctx.GetPlace());
T* true_pos_data = output_true_pos.mutable_data<T>(
framework::make_ddim({true_pos_count, 2}), ctx.GetPlace());
T* false_pos_data = output_false_pos.mutable_data<T>(
framework::make_ddim({false_pos_count, 2}), ctx.GetPlace());
true_pos_count = 0;
false_pos_count = 0;
std::vector<size_t> true_pos_starts = {0};
std::vector<size_t> false_pos_starts = {0};
for (int i = 0; i <= max_class_id; ++i) {
auto it_count = label_pos_count.find(i);
pos_count_data[i] = 0;
if (it_count != label_pos_count.end()) {
pos_count_data[i] = it_count->second;
}
auto it_true_pos = true_pos.find(i);
if (it_true_pos != true_pos.end()) {
const std::vector<std::pair<T, int>>& true_pos_vec =
it_true_pos->second;
for (const std::pair<T, int>& tp : true_pos_vec) {
true_pos_data[true_pos_count * 2] = tp.first;
true_pos_data[true_pos_count * 2 + 1] = static_cast<T>(tp.second);
true_pos_count++;
}
}
true_pos_starts.push_back(true_pos_count);
auto it_false_pos = false_pos.find(i);
if (it_false_pos != false_pos.end()) {
const std::vector<std::pair<T, int>>& false_pos_vec =
it_false_pos->second;
for (const std::pair<T, int>& fp : false_pos_vec) {
false_pos_data[false_pos_count * 2] = fp.first;
false_pos_data[false_pos_count * 2 + 1] = static_cast<T>(fp.second);
false_pos_count++;
}
}
false_pos_starts.push_back(false_pos_count);
}
framework::LoD true_pos_lod;
true_pos_lod.emplace_back(true_pos_starts);
framework::LoD false_pos_lod;
false_pos_lod.emplace_back(false_pos_starts);
output_true_pos.set_lod(true_pos_lod);
output_false_pos.set_lod(false_pos_lod);
return;
}
void GetInputPos(
const framework::Tensor& input_pos_count,
const framework::LoDTensor& input_true_pos,
const framework::LoDTensor& input_false_pos,
std::map<int, int>& label_pos_count,
std::map<int, std::vector<std::pair<T, int>>>& true_pos,
std::map<int, std::vector<std::pair<T, int>>>& false_pos) const {
constexpr T kEPS = static_cast<T>(1e-6);
int class_number = input_pos_count.dims()[0];
const int* pos_count_data = input_pos_count.data<int>();
for (int i = 0; i < class_number; ++i) {
label_pos_count[i] = pos_count_data[i];
}
auto SetData = [](const framework::LoDTensor& pos_tensor,
std::map<int, std::vector<std::pair<T, int>>>& pos) {
const T* pos_data = pos_tensor.data<T>();
auto pos_data_lod = pos_tensor.lod();
for (int i = 0; i < pos_data_lod.size(); ++i) {
for (int j = pos_data_lod[0][i]; j < pos_data_lod[0][i + 1]; ++j) {
T score = pos_data[j * 2];
int flag = 1;
if (pos_data[j * 2 + 1] < kEPS) flag = 0;
pos[i].push_back(std::make_pair(score, flag));
}
}
};
SetData(input_true_pos, true_pos);
SetData(input_false_pos, false_pos);
return;
}
void CalcTrueAndFalsePositive(
const std::vector<std::map<int, std::vector<Box>>>& gt_boxes,
const std::vector<std::map<int, std::vector<std::pair<T, Box>>>>&
detect_boxes,
bool evaluate_difficult, float overlap_threshold,
std::map<int, int>& label_pos_count,
std::map<int, std::vector<std::pair<T, int>>>& true_pos,
std::map<int, std::vector<std::pair<T, int>>>& false_pos) const {
int batch_size = gt_boxes.size();
for (int n = 0; n < batch_size; ++n) {
auto image_gt_boxes = gt_boxes[n];
for (auto it = image_gt_boxes.begin(); it != image_gt_boxes.end(); ++it) {
size_t count = 0;
auto labeled_bboxes = it->second;
if (evaluate_difficult) {
count = labeled_bboxes.size();
} else {
for (size_t i = 0; i < labeled_bboxes.size(); ++i)
if (!(labeled_bboxes[i].is_difficult)) ++count;
}
if (count == 0) {
continue;
}
int label = it->first;
if (label_pos_count.find(label) == label_pos_count.end()) {
label_pos_count[label] = count;
} else {
label_pos_count[label] += count;
}
}
}
for (size_t n = 0; n < detect_boxes.size(); ++n) {
auto image_gt_boxes = gt_boxes[n];
auto detections = detect_boxes[n];
if (image_gt_boxes.size() == 0) {
for (auto it = detections.begin(); it != detections.end(); ++it) {
auto pred_boxes = it->second;
int label = it->first;
for (size_t i = 0; i < pred_boxes.size(); ++i) {
auto score = pred_boxes[i].first;
true_pos[label].push_back(std::make_pair(score, 0));
false_pos[label].push_back(std::make_pair(score, 1));
}
}
continue;
}
for (auto it = detections.begin(); it != detections.end(); ++it) {
int label = it->first;
auto pred_boxes = it->second;
if (image_gt_boxes.find(label) == image_gt_boxes.end()) {
for (size_t i = 0; i < pred_boxes.size(); ++i) {
auto score = pred_boxes[i].first;
true_pos[label].push_back(std::make_pair(score, 0));
false_pos[label].push_back(std::make_pair(score, 1));
}
continue;
}
auto matched_bboxes = image_gt_boxes.find(label)->second;
std::vector<bool> visited(matched_bboxes.size(), false);
// Sort detections in descend order based on scores
std::sort(pred_boxes.begin(), pred_boxes.end(),
SortScorePairDescend<Box>);
for (size_t i = 0; i < pred_boxes.size(); ++i) {
T max_overlap = -1.0;
size_t max_idx = 0;
auto score = pred_boxes[i].first;
for (size_t j = 0; j < matched_bboxes.size(); ++j) {
T overlap = JaccardOverlap(pred_boxes[i].second, matched_bboxes[j]);
if (overlap > max_overlap) {
max_overlap = overlap;
max_idx = j;
}
}
if (max_overlap > overlap_threshold) {
bool match_evaluate_difficult =
evaluate_difficult ||
(!evaluate_difficult && !matched_bboxes[max_idx].is_difficult);
if (match_evaluate_difficult) {
if (!visited[max_idx]) {
true_pos[label].push_back(std::make_pair(score, 1));
false_pos[label].push_back(std::make_pair(score, 0));
visited[max_idx] = true;
} else {
true_pos[label].push_back(std::make_pair(score, 0));
false_pos[label].push_back(std::make_pair(score, 1));
}
}
} else {
true_pos[label].push_back(std::make_pair(score, 0));
false_pos[label].push_back(std::make_pair(score, 1));
}
}
}
}
}
T CalcMAP(
APType ap_type, const std::map<int, int>& label_pos_count,
const std::map<int, std::vector<std::pair<T, int>>>& true_pos,
const std::map<int, std::vector<std::pair<T, int>>>& false_pos) const {
T mAP = 0.0;
int count = 0;
for (auto it = label_pos_count.begin(); it != label_pos_count.end(); ++it) {
int label = it->first;
int label_num_pos = it->second;
if (label_num_pos == 0 || true_pos.find(label) == true_pos.end())
continue;
auto label_true_pos = true_pos.find(label)->second;
auto label_false_pos = false_pos.find(label)->second;
// Compute average precision.
std::vector<int> tp_sum;
GetAccumulation<T>(label_true_pos, &tp_sum);
std::vector<int> fp_sum;
GetAccumulation<T>(label_false_pos, &fp_sum);
std::vector<T> precision, recall;
size_t num = tp_sum.size();
// Compute Precision.
for (size_t i = 0; i < num; ++i) {
precision.push_back(static_cast<T>(tp_sum[i]) /
static_cast<T>(tp_sum[i] + fp_sum[i]));
recall.push_back(static_cast<T>(tp_sum[i]) / label_num_pos);
}
// VOC2007 style
if (ap_type == APType::k11point) {
std::vector<T> max_precisions(11, 0.0);
int start_idx = num - 1;
for (int j = 10; j >= 0; --j)
for (int i = start_idx; i >= 0; --i) {
if (recall[i] < j / 10.) {
start_idx = i;
if (j > 0) max_precisions[j - 1] = max_precisions[j];
break;
} else {
if (max_precisions[j] < precision[i])
max_precisions[j] = precision[i];
}
}
for (int j = 10; j >= 0; --j) mAP += max_precisions[j] / 11;
++count;
} else if (ap_type == APType::kIntegral) {
// Nature integral
float average_precisions = 0.;
float prev_recall = 0.;
for (size_t i = 0; i < num; ++i) {
if (fabs(recall[i] - prev_recall) > 1e-6)
average_precisions += precision[i] * fabs(recall[i] - prev_recall);
prev_recall = recall[i];
}
mAP += average_precisions;
++count;
} else {
LOG(FATAL) << "Unkown ap version: " << ap_type;
}
}
if (count != 0) mAP /= count;
return mAP * 100;
}
}; // namespace operators
} // namespace operators
} // namespace paddle
......@@ -27,7 +27,7 @@ limitations under the License. */
#include "paddle/fluid/operators/detail/grpc_server.h"
#include "paddle/fluid/operators/detail/sendrecvop_utils.h"
#include "paddle/fluid/operators/detail/simple_block_queue.h"
#include "paddle/string/printf.h"
#include "paddle/fluid/string/printf.h"
namespace paddle {
namespace operators {
......@@ -101,11 +101,15 @@ class ListenAndServOp : public framework::OperatorBase {
// TODO(typhoonzero): change this to a while_op for every cluster-batch.
bool exit_flag = false;
// Record received sparse variables, so that
// we could reset those after execute optimize program
std::vector<framework::Variable *> sparse_vars;
while (!exit_flag) {
// Get from multiple trainers, we don't care about the order in which
// the gradients arrives, just add suffix 0~n and merge the gradient.
rpc_service_->SetCond(0);
size_t recv_var_cnt = 0;
size_t update_param_cnt = 0;
int batch_barrier = 0;
while (batch_barrier != fan_in) {
const detail::MessageWithName &v = rpc_service_->Get();
......@@ -126,13 +130,14 @@ class ListenAndServOp : public framework::OperatorBase {
std::string param_var_name;
if (it != grad_list.end()) {
param_var_name = param_list[it - grad_list.begin()];
update_param_cnt++;
VLOG(3) << "received grad: " << grad_var_name
<< " updating param: " << param_var_name;
} else {
LOG(ERROR) << "grad has no paired param:" << grad_var_name;
VLOG(3) << "received variable: " << grad_var_name
<< " no need to update param";
}
VLOG(3) << "received grad: " << grad_var_name
<< " updating param: " << param_var_name;
if (fan_in > 1) {
if (fan_in > 1 && !param_var_name.empty()) {
grad_var_name = this->GetGradVarNameForTrainer(grad_var_name);
}
auto *var = recv_scope.FindVar(grad_var_name);
......@@ -141,23 +146,35 @@ class ListenAndServOp : public framework::OperatorBase {
PADDLE_THROW("Can not find server side var");
}
detail::DeserializeFromMessage(v.second, dev_ctx, var);
if (var->IsType<framework::SelectedRows>()) {
sparse_vars.push_back(var);
}
}
}
VLOG(3) << "recv " << recv_var_cnt << " parmeters for one barrier.";
// TODO(Yancey1989): merge SelectedRows variables here
if (exit_flag) {
rpc_service_->ShutDown();
}
VLOG(3) << "run optimize graph...";
try {
executor.Run(*program, &recv_scope, block->ID(), /*global_block*/
false /*create_local_scope*/, false /*create_vars*/);
} catch (std::exception &e) {
LOG(ERROR) << "run sub program error " << e.what();
}
// Reset the received sparse variables, the sum operator would not
// sum the input sparse variables which rows is empty at the next
// mini-batch.
// TOOD(Yancey1989): move the reset action into an operator, we couldn't
// have any hide logic in the operator.
for (auto &var : sparse_vars) {
var->GetMutable<framework::SelectedRows>()->mutable_rows()->clear();
}
rpc_service_->SetCond(1);
rpc_service_->WaitClientGet(recv_var_cnt);
rpc_service_->WaitClientGet(update_param_cnt);
grads_counter_.clear();
sparse_vars.clear();
} // while(true)
}
......
......@@ -38,22 +38,22 @@ class MultiClassNMSOp : public framework::OperatorWithKernel {
auto box_dims = ctx->GetInputDim("BBoxes");
auto score_dims = ctx->GetInputDim("Scores");
PADDLE_ENFORCE_EQ(box_dims.size(), 2,
"The rank of Input(BBoxes) must be 2.");
PADDLE_ENFORCE_EQ(box_dims.size(), 3,
"The rank of Input(BBoxes) must be 3.");
PADDLE_ENFORCE_EQ(score_dims.size(), 3,
"The rank of Input(Scores) must be 3.");
PADDLE_ENFORCE_EQ(box_dims[1], 4,
PADDLE_ENFORCE_EQ(box_dims[2], 4,
"The 2nd dimension of Input(BBoxes) must be 4, "
"represents the layout of coordinate "
"[xmin, ymin, xmax, ymax]");
PADDLE_ENFORCE_EQ(box_dims[0], score_dims[2],
PADDLE_ENFORCE_EQ(box_dims[1], score_dims[2],
"The 1st dimensiong of Input(BBoxes) must be equal to "
"3rd dimension of Input(Scores), which represents the "
"predicted bboxes.");
// Here the box_dims[0] is not the real dimension of output.
// It will be rewritten in the computing kernel.
ctx->SetOutputDim("Out", {box_dims[0], 6});
ctx->SetOutputDim("Out", {box_dims[1], 6});
}
protected:
......@@ -260,15 +260,20 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
int64_t batch_size = score_dims[0];
int64_t class_num = score_dims[1];
int64_t predict_dim = score_dims[2];
int64_t box_dim = boxes->dims()[2];
std::vector<std::map<int, std::vector<int>>> all_indices;
std::vector<size_t> batch_starts = {0};
for (int64_t i = 0; i < batch_size; ++i) {
Tensor ins_score = scores->Slice(i, i + 1);
ins_score.Resize({class_num, predict_dim});
Tensor ins_boxes = boxes->Slice(i, i + 1);
ins_boxes.Resize({predict_dim, box_dim});
std::map<int, std::vector<int>> indices;
int num_nmsed_out = 0;
MultiClassNMS(ctx, ins_score, *boxes, indices, num_nmsed_out);
MultiClassNMS(ctx, ins_score, ins_boxes, indices, num_nmsed_out);
all_indices.push_back(indices);
batch_starts.push_back(batch_starts.back() + num_nmsed_out);
}
......@@ -282,11 +287,15 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
for (int64_t i = 0; i < batch_size; ++i) {
Tensor ins_score = scores->Slice(i, i + 1);
ins_score.Resize({class_num, predict_dim});
Tensor ins_boxes = boxes->Slice(i, i + 1);
ins_boxes.Resize({predict_dim, box_dim});
int64_t s = batch_starts[i];
int64_t e = batch_starts[i + 1];
if (e > s) {
Tensor out = outs->Slice(s, e);
MultiClassOutput(ins_score, *boxes, all_indices[i], &out);
MultiClassOutput(ins_score, ins_boxes, all_indices[i], &out);
}
}
}
......@@ -303,9 +312,9 @@ class MultiClassNMSOpMaker : public framework::OpProtoAndCheckerMaker {
MultiClassNMSOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("BBoxes",
"(Tensor) A 2-D Tensor with shape [M, 4] represents the "
"predicted locations of M bounding bboxes. Each bounding box "
"has four coordinate values and the layout is "
"(Tensor) A 3-D Tensor with shape [N, M, 4] represents the "
"predicted locations of M bounding bboxes, N is the batch size. "
"Each bounding box has four coordinate values and the layout is "
"[xmin, ymin, xmax, ymax].");
AddInput("Scores",
"(Tensor) A 3-D Tensor with shape [N, C, M] represents the "
......
......@@ -24,6 +24,22 @@ limitations under the License. */
namespace paddle {
namespace operators {
static bool IsVariableInitialized(const framework::Scope& scope,
const std::string& varname) {
auto* var = scope.FindVar(varname);
PADDLE_ENFORCE_NOT_NULL(var, "Can not find variable '%s' in the send side.",
varname);
if (var->IsType<framework::LoDTensor>()) {
return var->Get<framework::LoDTensor>().IsInitialized();
} else if (var->IsType<framework::SelectedRows>()) {
return var->Get<framework::SelectedRows>().value().IsInitialized();
} else {
PADDLE_THROW(
"Variable type in send side should be in "
"[LodTensor, SelectedRows]");
}
return false;
}
class SendOp : public framework::OperatorBase {
public:
......@@ -51,8 +67,12 @@ class SendOp : public framework::OperatorBase {
detail::RPCClient* rpc_client = client_var->GetMutable<detail::RPCClient>();
for (size_t i = 0; i < ins.size(); i++) {
VLOG(3) << "sending " << ins[i] << " to " << epmap[i];
rpc_client->AsyncSendVariable(epmap[i], ctx, scope, ins[i]);
if (IsVariableInitialized(scope, ins[i])) {
VLOG(3) << "sending " << ins[i] << " to " << epmap[i];
rpc_client->AsyncSendVariable(epmap[i], ctx, scope, ins[i]);
} else {
VLOG(3) << "don't send no-initialied variable: " << ins[i];
}
}
PADDLE_ENFORCE(rpc_client->Wait());
......
......@@ -22,7 +22,7 @@ limitations under the License. */
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/operators/math/selected_rows_functor.h"
#include "paddle/string/printf.h"
#include "paddle/fluid/string/printf.h"
USE_NO_KERNEL_OP(send);
USE_NO_KERNEL_OP(listen_and_serv);
......
......@@ -29,7 +29,9 @@ class SequenceExpandOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE(ctx->HasOutput("Out"));
PADDLE_ENFORCE(ctx->HasInput("Y"));
framework::DDim out_dim;
out_dim = ctx->GetInputDim("Y");
auto y_dim = ctx->GetInputDim("Y");
out_dim = ctx->GetInputDim("X");
out_dim[0] = y_dim[0];
ctx->ShareLoD("Y", "Out");
ctx->SetOutputDim("Out", out_dim);
}
......
......@@ -44,7 +44,6 @@ class SmoothL1LossOp : public framework::OperatorWithKernel {
}
};
template <typename AttrType>
class SmoothL1LossOpMaker : public framework::OpProtoAndCheckerMaker {
public:
SmoothL1LossOpMaker(OpProto* proto, OpAttrChecker* op_checker)
......@@ -73,10 +72,10 @@ class SmoothL1LossOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("Out",
"(Tensor, default Tensor<float>) A tensor with rank be 2. "
"The output smooth l1 loss with shape [batch_size, 1].");
AddAttr<AttrType>("sigma",
"Hyper parameter of smooth l1 loss op."
"A float scalar with default value 3.0.")
.SetDefault(3.0);
AddAttr<float>("sigma",
"Hyper parameter of smooth l1 loss op."
"A float scalar with default value 3.0.")
.SetDefault(1.0);
AddComment(R"DOC(
Smooth L1 Loss Operator.
......@@ -133,9 +132,8 @@ class SmoothL1LossGradOp : public framework::OperatorWithKernel {
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(smooth_l1_loss, ops::SmoothL1LossOp,
ops::SmoothL1LossOpMaker<float>, smooth_l1_loss_grad,
ops::SmoothL1LossGradOp);
REGISTER_OP(smooth_l1_loss, ops::SmoothL1LossOp, ops::SmoothL1LossOpMaker,
smooth_l1_loss_grad, ops::SmoothL1LossGradOp);
REGISTER_OP_CPU_KERNEL(
smooth_l1_loss,
ops::SmoothL1LossKernel<paddle::platform::CPUDeviceContext, float>);
......
......@@ -14,6 +14,7 @@ limitations under the License. */
#pragma once
#include <chrono>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/strided_memcpy.h"
......@@ -27,18 +28,18 @@ class SplitOpKernel : public framework::OpKernel<T> {
void Compute(const framework::ExecutionContext& ctx) const override {
auto* in = ctx.Input<framework::Tensor>("X");
auto outs = ctx.MultiOutput<framework::Tensor>("Out");
auto in_stride = framework::stride(in->dims());
auto in_stride = framework::stride_numel(in->dims());
int64_t axis = static_cast<int64_t>(ctx.Attr<int>("axis"));
const size_t n = outs.size();
auto place = ctx.GetPlace();
size_t input_offset = 0;
for (size_t i = 0; i < n; i++) {
auto& out = outs[i];
for (auto& out : outs) {
out->mutable_data<T>(ctx.GetPlace());
size_t axis_dim = out->dims()[axis];
auto out_stride = framework::stride(out->dims());
StridedMemcpy<T>(ctx.device_context(), in->data<T>() + input_offset,
in_stride, out->dims(), out_stride, out->data<T>());
input_offset += axis_dim * in_stride[axis];
auto out_stride = framework::stride_numel(out->dims());
StridedNumelCopyWithAxis<T>(ctx.device_context(), axis, out->data<T>(),
out_stride, in->data<T>() + input_offset,
in_stride);
input_offset += out_stride[axis];
}
}
};
......
......@@ -22,7 +22,7 @@ class SplitSelectedRowsOpMaker : public framework::OpProtoAndCheckerMaker {
SplitSelectedRowsOpMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input SelectedRows.");
AddOutput("Out", "The outputs of input SelectedRows.").AsDuplicable();
AddOutput("Out", "The outputs of the input SelectedRows.").AsDuplicable();
AddAttr<std::vector<int>>("height_sections",
"Height for each output SelectedRows.")
.SetDefault(std::vector<int>({}));
......@@ -56,27 +56,6 @@ class SplitSelectedRowsOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE(ctx->HasInput("X"), "SplitSelectedRowsOp must has input X.");
PADDLE_ENFORCE(ctx->HasOutputs("Out"),
"SplitSelectedRowsOp must has output Out.");
std::vector<int> height_sections =
ctx->Attrs().Get<std::vector<int>>("height_sections");
int64_t n = ctx->Outputs("Out").size();
std::vector<framework::DDim> outs_dims;
outs_dims.reserve(n);
// make output dims
for (int64_t i = 0; i < n; ++i) {
auto dims = ctx->GetInputDim("X");
if (height_sections.size()) {
PADDLE_ENFORCE_EQ(
height_sections.size(), static_cast<size_t>(n),
"The size of height section should be the same with height"
" section size.");
dims[0] = height_sections[i];
}
outs_dims.push_back(dims);
}
ctx->SetOutputsDim("Out", outs_dims);
}
};
......
......@@ -55,6 +55,7 @@ class SplitSelectedRowsOpKernel : public framework::OpKernel<T> {
for (size_t i = 0; i < outs_rows_idx.size(); ++i) {
auto rows_idx = outs_rows_idx[i];
outs[i]->set_height(height_sections[i]);
if (rows_idx.size() > 0) {
auto dims = x->GetCompleteDims();
dims[0] = rows_idx.size();
......
......@@ -41,5 +41,62 @@ inline void StridedMemcpy(const platform::DeviceContext& dev_ctx, const T* src,
StridedCopyDimVisitor<T> func(dev_ctx, src, src_stride, dst_stride, dst);
boost::apply_visitor(func, dst_dim);
}
// Strided numel memory copy from src to dst by the specified axis
//
// For example, for a tensor dims [4, 20, 100], the strieded numel is
// [8000, 2000, 100]
//
// NOTE: The src and dst tensor should have the same elements
// except the specified axis.
template <typename T>
inline void StridedNumelCopyWithAxis(const platform::DeviceContext& ctx,
int64_t axis, T* dst,
const framework::DDim& dst_stride_numel,
const T* src,
const framework::DDim& src_stride_numel) {
int64_t before = dst_stride_numel[0] / dst_stride_numel[axis];
int64_t src_after = src_stride_numel[axis];
int64_t dst_after = dst_stride_numel[axis];
auto place = ctx.GetPlace();
PADDLE_ENFORCE_EQ(src_stride_numel.size(), dst_stride_numel.size(),
"src and dst tensor should have the same dims size.");
for (int64_t i = 0; i < axis; ++i) {
if (i < axis) {
PADDLE_ENFORCE_EQ(src_stride_numel[i] / src_stride_numel[axis],
dst_stride_numel[i] / dst_stride_numel[axis],
"src and dst should have the same elements "
"except the specified axis.");
} else if (i == axis) {
continue;
} else {
PADDLE_ENFORCE_EQ(src_stride_numel[i], dst_stride_numel[i],
"src and dst should have the same elements "
"except the specified axis.");
}
}
for (int64_t i = 0; i < before; ++i) {
if (platform::is_cpu_place(place)) {
auto& cpu_place = boost::get<platform::CPUPlace>(place);
memory::Copy(cpu_place, dst + i * dst_after, cpu_place,
src + i * src_after, sizeof(T) * src_after);
} else {
#ifdef PADDLE_WITH_CUDA
auto& gpu_place = boost::get<platform::CUDAPlace>(place);
auto& cuda_ctx =
reinterpret_cast<const platform::CUDADeviceContext&>(ctx);
memory::Copy(gpu_place, dst + i * dst_after, gpu_place,
src + i * src_after, sizeof(T) * src_after,
cuda_ctx.stream());
#else
PADDLE_THROW("Paddle is not compiled with GPU");
#endif
}
}
}
} // namespace operators
} // namespace paddle
......@@ -116,7 +116,9 @@ class SumKernel : public framework::OpKernel<T> {
int64_t offset = 0;
for (int i = 0; i < N; i++) {
auto &sel_row = get_selected_row(i);
if (!sel_row.value().IsInitialized() || sel_row.rows().size() == 0) {
continue;
}
PADDLE_ENFORCE_EQ(out->height(), sel_row.height());
functor(context.template device_context<DeviceContext>(), sel_row,
offset, out);
......
......@@ -22,69 +22,43 @@ class TargetAssignOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
// checkout inputs
PADDLE_ENFORCE(ctx->HasInput("EncodedGTBBox"),
"Input(EncodedGTBBox) of TargetAssignOp should not be null");
PADDLE_ENFORCE(ctx->HasInput("GTScoreLabel"),
"Input(GTScoreLabel) of TargetAssignOp should not be null");
PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of TargetAssignOp should not be null");
PADDLE_ENFORCE(ctx->HasInput("MatchIndices"),
"Input(MatchIndices) of TargetAssignOp should not be null");
PADDLE_ENFORCE(ctx->HasInput("NegIndices"),
"Input(NegIndices) of TargetAssignOp should not be null");
// checkout outputs
PADDLE_ENFORCE(
ctx->HasOutput("PredBBoxLabel"),
"Output(PredBBoxLabel) of TargetAssignOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("PredBBoxWeight"),
"Output(PredBBoxWeight) of TargetAssignOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("PredScoreLabel"),
"Output(PredScoreLabel) of TargetAssignOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("PredScoreWeight"),
"Output(PredScoreWeight) of TargetAssignOp should not be null.");
auto blabel_dims = ctx->GetInputDim("EncodedGTBBox");
auto slabel_dims = ctx->GetInputDim("GTScoreLabel");
PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of TargetAssignOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("OutWeight"),
"Output(OutWeight) of TargetAssignOp should not be null.");
auto in_dims = ctx->GetInputDim("X");
auto mi_dims = ctx->GetInputDim("MatchIndices");
auto neg_dims = ctx->GetInputDim("NegIndices");
PADDLE_ENFORCE_EQ(blabel_dims.size(), 3UL,
"The rank of Input(EncodedGTBBox) must be 3.");
PADDLE_ENFORCE_EQ(slabel_dims.size(), 2UL,
"The rank of Input(GTScoreLabel) must be 2.");
PADDLE_ENFORCE_EQ(mi_dims.size(), 2UL,
PADDLE_ENFORCE_EQ(in_dims.size(), 3, "The rank of Input(X) must be 3.");
PADDLE_ENFORCE_EQ(mi_dims.size(), 2,
"The rank of Input(MatchIndices) must be 2.");
PADDLE_ENFORCE_EQ(neg_dims.size(), 2UL,
"The rank of Input(NegIndices) must be 2.");
PADDLE_ENFORCE_EQ(blabel_dims[0], slabel_dims[0],
"The 1st dimension (means the total number of "
"ground-truth bounding boxes) of Input(EncodedGTBBox) "
"and Input(GTScoreLabel) must be the same.");
PADDLE_ENFORCE_EQ(blabel_dims[1], mi_dims[1],
"The 2nd dimension (means the number of priod boxes) "
"of Input(EncodedGTBBox) and "
"Input(MatchIndices) must be the same.");
PADDLE_ENFORCE_EQ(blabel_dims[2], 4,
"The 3rd dimension of Input(EncodedGTBBox) must be 4.");
if (ctx->HasInput("NegIndices")) {
auto neg_dims = ctx->GetInputDim("NegIndices");
PADDLE_ENFORCE_EQ(neg_dims.size(), 2,
"The rank of Input(NegIndices) must be 2.");
PADDLE_ENFORCE_EQ(neg_dims[1], 1,
"The last dimenstion of Out(NegIndices) must be 1.");
}
auto n = mi_dims[0];
auto np = mi_dims[1];
ctx->SetOutputDim("PredBBoxLabel", {n, np, 4});
ctx->SetOutputDim("PredBBoxWeight", {n, np, 1});
ctx->SetOutputDim("PredScoreLabel", {n, np, 1});
ctx->SetOutputDim("PredScoreWeight", {n, np, 1});
auto m = mi_dims[1];
auto k = in_dims[in_dims.size() - 1];
ctx->SetOutputDim("Out", {n, m, k});
ctx->SetOutputDim("OutWeight", {n, m, 1});
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
framework::ToDataType(
ctx.Input<framework::LoDTensor>("EncodedGTBBox")->type()),
framework::ToDataType(ctx.Input<framework::LoDTensor>("X")->type()),
ctx.device_context());
}
};
......@@ -93,102 +67,87 @@ class TargetAssignOpMaker : public framework::OpProtoAndCheckerMaker {
public:
TargetAssignOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("EncodedGTBBox",
"(LoDTensor), The encoded ground-truth bounding boxes with shape "
"[Ng, Np, 4], where Ng is the total number of ground-truth boxes "
"in this mini-batch, Np the number of predictions, 4 is the "
"number of coordinate in [xmin, ymin, xmax, ymax] layout.");
AddInput("GTScoreLabel",
"(LoDTensor, default LoDTensor<int>), The input ground-truth "
"labels with shape [Ng, 1], where the Ng is the same as it in "
"the input of EncodedGTBBox.");
AddInput("X",
"(LoDTensor), This input is a 3D LoDTensor with shape [M, P, K]. "
"Some elements in X will be assigned to Out based on the "
"MatchIndices and NegIndices.");
AddInput("MatchIndices",
"(Tensor, default Tensor<int>), The input matched indices "
"with shape [N, Np], where N is the batch size, Np is the same "
"as it in the input of EncodedGTBBox. If MatchIndices[i][j] "
"is -1, the j-th prior box is not matched to any ground-truh "
"box in i-th instance.");
"with shape [N, P], If MatchIndices[i][j] is -1, the j-th entity "
"of column is not matched to any entity of row in i-th instance.");
AddInput("NegIndices",
"(LoDTensor, default LoDTensor<int>), The input negative example "
"indices with shape [Neg, 1], where is the total number of "
"negative example indices.");
AddAttr<int>("background_label",
"(int, default 0), Label index of background class.")
"indices are an optional input with shape [Neg, 1], where Neg is "
"the total number of negative example indices.")
.AsDispensable();
AddAttr<int>("mismatch_value",
"(int, default 0), Fill this value to the "
"mismatched location.")
.SetDefault(0);
AddOutput("PredBBoxLabel",
"(Tensor), The output encoded ground-truth labels "
"with shape [N, Np, 4], N is the batch size and Np, 4 is the "
"same as they in input of EncodedGTBBox. If MatchIndices[i][j] "
"is -1, the PredBBoxLabel[i][j][:] is the encoded ground-truth "
"box for background_label in i-th instance.");
AddOutput("PredBBoxWeight",
"(Tensor), The weight for PredBBoxLabel with the shape "
"of [N, Np, 1]");
AddOutput("PredScoreLabel",
"(Tensor, default Tensor<int>), The output score labels for "
"each predictions with shape [N, Np, 1]. If MatchIndices[i][j] "
"is -1, PredScoreLabel[i][j] = background_label.");
AddOutput("PredScoreWeight",
"(Tensor), The weight for PredScoreLabel with the shape "
"of [N, Np, 1]");
AddOutput("Out",
"(Tensor), The output is a 3D Tensor with shape [N, P, K], "
"N and P is the same as they are in NegIndices, K is the "
"same as it in input of X. If MatchIndices[i][j] "
"is -1, the Out[i][j][0 : K] is the mismatch_value.");
AddOutput("OutWeight",
"(Tensor), The weight for output with the shape of [N, P, 1]");
AddComment(R"DOC(
This operator is, for given the encoded boxes between prior boxes and
ground-truth boxes and ground-truth class labels, to assign classification
and regression targets to each prior box as well as weights to each
prior box. The weights is used to specify which prior box would not contribute
to training loss.
For each instance, the output `PredBBoxLabel`, `PredBBoxWeight`,
`PredScoreLabel` and `PredScoreWeight` are assigned based on `MatchIndices`.
Assumed that the row offset for each instance in `EncodedGTBBox` is called lod,
this operato assigns classification/regression targets by performing the
This operator can be, for given the target bounding boxes or labels,
to assign classification and regression targets to each prediction as well as
weights to prediction. The weights is used to specify which prediction would
not contribute to training loss.
For each instance, the output `Out` and`OutWeight` are assigned based on
`MatchIndices` and `NegIndices`.
Assumed that the row offset for each instance in `X` is called lod,
this operator assigns classification/regression targets by performing the
following steps:
1. Assigning all outpts based on `MatchIndices`:
If id = MatchIndices[i][j] > 0,
PredBBoxLabel[i][j] = EncodedGTBBox[lod[i] + id][j]
PredBBoxWeight[i][j] = 1.
PredScoreLabel[i][j] = GTScoreLabel[lod[i] + id]
PredScoreWeight[i][j] = 1.
Out[i][j][0 : K] = X[lod[i] + id][j % P][0 : K]
OutWeight[i][j] = 1.
Otherwise,
PredBBoxLabel[j][j] = [0., 0., 0., 0.]
PredBBoxWeight[i][j] = 0.
PredScoreLabel[i][j] = background_label
PredScoreWeight[i][j] = 0.
Out[j][j][0 : K] = {mismatch_value, mismatch_value, ...}
OutWeight[i][j] = 0.
2. Assigning PredScoreWeight based on `NegIndices`:
2. Assigning OutWeight based on `NegIndices` if `NegIndices` is provided:
Assumed that the row offset for each instance in `NegIndices` is caleed neg_lod,
for i-th instance and all ids of NegIndices in this instance:
Assumed that the row offset for each instance in `NegIndices` is called neg_lod,
for i-th instance and each `id` of NegIndices in this instance:
PredScoreLabel[i][id] = background_label
PredScoreWeight[i][id] = 1.0
Out[i][id][0 : K] = {mismatch_value, mismatch_value, ...}
OutWeight[i][id] = 1.0
)DOC");
}
};
template <typename T>
struct NegTargetAssignFunctor<platform::CPUDeviceContext, T> {
template <typename T, typename WT>
struct NegTargetAssignFunctor<platform::CPUDeviceContext, T, WT> {
void operator()(const platform::CPUDeviceContext& ctx, const int* neg_indices,
const size_t* lod, const int num, const int num_prior_box,
const int background_label, int* out_label, T* out_label_wt) {
for (int i = 0; i < num; ++i) {
const size_t* lod, const int N, const int M, const int K,
const int mismatch_value, T* out, WT* out_wt) {
for (int i = 0; i < N; ++i) {
for (size_t j = lod[i]; j < lod[i + 1]; ++j) {
int id = neg_indices[j];
out_label[i * num_prior_box + id] = background_label;
out_label_wt[i * num_prior_box + id] = static_cast<T>(1.0);
int off = (i * M + id) * K;
for (int k = 0; k < K; ++k) {
out[off + k] = mismatch_value;
out_wt[off + k] = static_cast<WT>(1.0);
}
}
}
}
};
template struct NegTargetAssignFunctor<platform::CPUDeviceContext, float>;
template struct NegTargetAssignFunctor<platform::CPUDeviceContext, double>;
template struct NegTargetAssignFunctor<platform::CPUDeviceContext, int, float>;
template struct NegTargetAssignFunctor<platform::CPUDeviceContext, float,
float>;
} // namespace operators
} // namespace paddle
......@@ -198,5 +157,5 @@ REGISTER_OP_WITHOUT_GRADIENT(target_assign, ops::TargetAssignOp,
ops::TargetAssignOpMaker);
REGISTER_OP_CPU_KERNEL(
target_assign,
ops::TargetAssignKernel<paddle::platform::CPUDeviceContext, float>,
ops::TargetAssignKernel<paddle::platform::CPUDeviceContext, double>);
ops::TargetAssignKernel<paddle::platform::CPUDeviceContext, int, float>,
ops::TargetAssignKernel<paddle::platform::CPUDeviceContext, float, float>);
......@@ -17,39 +17,41 @@ limitations under the License. */
namespace paddle {
namespace operators {
template <typename T>
template <typename T, typename WT>
__global__ void NegTargetAssignKernel(const int* neg_indices, const size_t* lod,
const int num, const int num_prior_box,
const int background_label,
int* out_label, T* out_label_wt) {
const int N, const int M, const int K,
const int mismatch_value, T* out,
WT* out_wt) {
int bidx = blockIdx.x;
int st = lod[bidx];
int ed = lod[bidx + 1];
int row_start = bidx * num_prior_box;
int row_start = bidx * M;
for (int i = st + threadIdx.x; i < ed; i += blockDim.x) {
int id = row_start + neg_indices[i];
out_label[id] = background_label;
out_label_wt[id] = 1.;
for (int k = 0; k < K; ++k) {
out[id * K + k] = T(mismatch_value);
out_wt[id * K + k] = WT(1.);
}
}
}
template <typename T>
struct NegTargetAssignFunctor<platform::CUDADeviceContext, T> {
template <typename T, typename WT>
struct NegTargetAssignFunctor<platform::CUDADeviceContext, T, WT> {
void operator()(const platform::CUDADeviceContext& ctx,
const int* neg_indices, const size_t* lod, const int num,
const int num_prior_box, const int background_label,
int* out_label, T* out_label_wt) {
const int* neg_indices, const size_t* lod, const int N,
const int M, const int K, const int mismatch_value, T* out,
WT* out_wt) {
const int block_size = 256;
const int grid_size = num;
NegTargetAssignKernel<T><<<grid_size, block_size, 0, ctx.stream()>>>(
neg_indices, lod, num, num_prior_box, background_label, out_label,
out_label_wt);
const int grid_size = N;
NegTargetAssignKernel<T, WT><<<grid_size, block_size, 0, ctx.stream()>>>(
neg_indices, lod, N, M, K, mismatch_value, out, out_wt);
}
};
template struct NegTargetAssignFunctor<platform::CUDADeviceContext, float>;
template struct NegTargetAssignFunctor<platform::CUDADeviceContext, double>;
template struct NegTargetAssignFunctor<platform::CUDADeviceContext, int, float>;
template struct NegTargetAssignFunctor<platform::CUDADeviceContext, float,
float>;
} // namespace operators
} // namespace paddle
......@@ -57,5 +59,5 @@ template struct NegTargetAssignFunctor<platform::CUDADeviceContext, double>;
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
target_assign,
ops::TargetAssignKernel<paddle::platform::CUDADeviceContext, float>,
ops::TargetAssignKernel<paddle::platform::CUDADeviceContext, double>);
ops::TargetAssignKernel<paddle::platform::CUDADeviceContext, int, float>,
ops::TargetAssignKernel<paddle::platform::CUDADeviceContext, float, float>);
......@@ -19,140 +19,113 @@ limitations under the License. */
namespace paddle {
namespace operators {
template <typename T>
template <typename T, typename WT>
struct TargetAssignFunctor {
const T* gt_box_;
const int* gt_label_;
const T* in_;
const int* match_indices_;
const size_t* lod_;
const int background_label_;
const int64_t num_;
const int64_t num_prior_box_;
T* out_box_;
T* out_box_wt_;
int* out_label_;
T* out_label_wt_;
TargetAssignFunctor(const T* gt_box, const int* gt_label,
const int* match_indices, const size_t* lod,
const int background_label, const int64_t num,
const int64_t np, T* out_box, T* out_box_wt,
int* out_label, T* out_label_wt)
: gt_box_(gt_box),
gt_label_(gt_label),
const int mismatch_value_;
const int64_t N_;
const int64_t M_;
const int64_t P_;
const int64_t K_;
T* out_;
WT* out_wt_;
TargetAssignFunctor(const T* input, const int* match_indices,
const size_t* lod, const int mismatch_value,
const int64_t N, const int64_t M, const int64_t P,
const int64_t K, T* out, WT* out_wt)
: in_(input),
match_indices_(match_indices),
lod_(lod),
background_label_(background_label),
num_(num),
num_prior_box_(np),
out_box_(out_box),
out_box_wt_(out_box_wt),
out_label_(out_label),
out_label_wt_(out_label_wt) {}
mismatch_value_(mismatch_value),
N_(N),
M_(M),
P_(P),
K_(K),
out_(out),
out_wt_(out_wt) {}
HOSTDEVICE void operator()(size_t i) const {
int row = i / num_prior_box_;
int col = i - row * num_prior_box_;
int h = i / M_;
int w = i - h * M_;
size_t row_off = lod_[row];
int offset = row * num_prior_box_ + col;
size_t off = lod_[h];
int id = match_indices_[i];
int id = match_indices_[offset];
T* obox = out_box_ + offset * 4;
int* olabel = out_label_ + offset;
T* obox_wt = out_box_wt_ + offset;
T* olabel_wt = out_label_wt_ + offset;
T* out = out_ + i * K_;
WT* out_wt = out_wt_ + i;
if (id > -1) {
const T* gtbox = gt_box_ + ((row_off + id) * num_prior_box_ + col) * 4;
obox[0] = gtbox[0];
obox[1] = gtbox[1];
obox[2] = gtbox[2];
obox[3] = gtbox[3];
olabel[0] = gt_label_[row_off + id];
obox_wt[0] = static_cast<T>(1.);
olabel_wt[0] = static_cast<T>(1.);
int w_off = w % P_;
const T* in = in_ + ((off + id) * P_ + w_off) * K_;
for (int64_t k = 0; k < K_; ++k) {
out[k] = in[k];
}
out_wt[0] = static_cast<WT>(1.);
} else {
obox[0] = static_cast<T>(0.);
obox[1] = static_cast<T>(0.);
obox[2] = static_cast<T>(0.);
obox[3] = static_cast<T>(0.);
olabel[0] = background_label_;
obox_wt[0] = static_cast<T>(0.);
olabel_wt[0] = static_cast<T>(0.);
for (int64_t k = 0; k < K_; ++k) {
out[k] = static_cast<T>(mismatch_value_);
}
out_wt[0] = static_cast<WT>(0.);
}
}
};
template <typename DeviceContext, typename T>
template <typename DeviceContext, typename T, typename WT>
struct NegTargetAssignFunctor {
void operator()(const platform::DeviceContext& ctx, const int* neg_indices,
const size_t* lod, const int num, const int num_prior_box,
const int background_label, int* out_label,
T* out_label_wt) const;
const size_t* lod, const int N, const int M, const int K,
const int mismatch_value, T* out, WT* out_wt) const;
};
template <typename DeviceContext, typename T>
template <typename DeviceContext, typename T, typename WT>
class TargetAssignKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* enc_gt_box = ctx.Input<framework::LoDTensor>("EncodedGTBBox");
auto* gt_label = ctx.Input<framework::LoDTensor>("GTScoreLabel");
auto* x = ctx.Input<framework::LoDTensor>("X");
auto* match_indices = ctx.Input<framework::Tensor>("MatchIndices");
auto* neg_indices = ctx.Input<framework::LoDTensor>("NegIndices");
auto* out_box = ctx.Output<framework::Tensor>("PredBBoxLabel");
auto* out_box_wt = ctx.Output<framework::Tensor>("PredBBoxWeight");
auto* out_label = ctx.Output<framework::Tensor>("PredScoreLabel");
auto* out_label_wt = ctx.Output<framework::Tensor>("PredScoreWeight");
PADDLE_ENFORCE_EQ(enc_gt_box->lod().size(), 1UL);
PADDLE_ENFORCE_EQ(gt_label->lod().size(), 1UL);
PADDLE_ENFORCE_EQ(neg_indices->lod().size(), 1UL);
auto* out = ctx.Output<framework::Tensor>("Out");
auto* out_wt = ctx.Output<framework::Tensor>("OutWeight");
int background_label = ctx.Attr<int>("background_label");
PADDLE_ENFORCE_EQ(x->lod().size(), 1UL);
int mismatch_value = ctx.Attr<int>("mismatch_value");
const T* box_data = enc_gt_box->data<T>();
const int* label_data = gt_label->data<int>();
const T* x_data = x->data<T>();
const int* match_idx_data = match_indices->data<int>();
const int* neg_idx_data = neg_indices->data<int>();
T* obox_data = out_box->mutable_data<T>(ctx.GetPlace());
T* obox_wt_data = out_box_wt->mutable_data<T>(ctx.GetPlace());
int* olabel_data = out_label->mutable_data<int>(ctx.GetPlace());
T* olabel_wt_data = out_label_wt->mutable_data<T>(ctx.GetPlace());
T* out_data = out->mutable_data<T>(ctx.GetPlace());
WT* out_wt_data = out_wt->mutable_data<WT>(ctx.GetPlace());
int64_t num = match_indices->dims()[0];
int64_t num_prior_box = match_indices->dims()[1];
int64_t n = match_indices->dims()[0];
int64_t m = match_indices->dims()[1];
int64_t p = x->dims()[1];
int64_t k = x->dims()[2];
auto gt_lod = enc_gt_box->lod().back();
auto gt_label_lod = gt_label->lod().back();
auto neg_lod = neg_indices->lod().back();
for (size_t i = 0; i < gt_lod.size(); ++i) {
PADDLE_ENFORCE_EQ(gt_lod.data()[i], gt_label_lod.data()[i]);
}
size_t* gt_lod_data = gt_lod.MutableData(ctx.GetPlace());
size_t* neg_lod_data = neg_lod.MutableData(ctx.GetPlace());
auto x_lod = x->lod().back();
size_t* x_lod_data = x_lod.MutableData(ctx.GetPlace());
TargetAssignFunctor<T> functor(box_data, label_data, match_idx_data,
gt_lod_data, background_label, num,
num_prior_box, obox_data, obox_wt_data,
olabel_data, olabel_wt_data);
TargetAssignFunctor<T, WT> functor(x_data, match_idx_data, x_lod_data,
mismatch_value, n, m, p, k, out_data,
out_wt_data);
auto& device_ctx = ctx.template device_context<DeviceContext>();
platform::ForRange<DeviceContext> for_range(device_ctx,
num * num_prior_box);
platform::ForRange<DeviceContext> for_range(device_ctx, n * m);
for_range(functor);
NegTargetAssignFunctor<DeviceContext, T> neg_trg_functor;
neg_trg_functor(device_ctx, neg_idx_data, neg_lod_data, num, num_prior_box,
background_label, olabel_data, olabel_wt_data);
auto* neg_indices = ctx.Input<framework::LoDTensor>("NegIndices");
if (neg_indices) {
PADDLE_ENFORCE_EQ(neg_indices->lod().size(), 1UL);
const int* neg_idx_data = neg_indices->data<int>();
auto neg_lod = neg_indices->lod().back();
size_t* neg_lod_data = neg_lod.MutableData(ctx.GetPlace());
NegTargetAssignFunctor<DeviceContext, T, WT> neg_trg_functor;
neg_trg_functor(device_ctx, neg_idx_data, neg_lod_data, n, m, k,
mismatch_value, out_data, out_wt_data);
}
}
};
......
......@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/platform/cpu_info.h"
#include "paddle/string/printf.h"
#include "paddle/fluid/string/printf.h"
#include <ostream>
#include <sstream>
......
......@@ -23,8 +23,8 @@ limitations under the License. */
#include <string>
#include "paddle/fluid/platform/macros.h"
#include "paddle/string/printf.h"
#include "paddle/string/to_string.h"
#include "paddle/fluid/string/printf.h"
#include "paddle/fluid/string/to_string.h"
#ifdef __GNUC__
#include <cxxabi.h> // for __cxa_demangle
......
......@@ -15,7 +15,7 @@ limitations under the License. */
#include "gtest/gtest.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/string/piece.h"
#include "paddle/fluid/string/piece.h"
using StringPiece = paddle::string::Piece;
using paddle::string::HasPrefix;
......
......@@ -35,7 +35,7 @@ limitations under the License. */
#include "paddle/fluid/pybind/exception.h"
#include "paddle/fluid/pybind/pybind.h"
#include "paddle/fluid/pybind/tensor_py.h"
#include "paddle/string/to_string.h"
#include "paddle/fluid/string/to_string.h"
#ifdef PADDLE_WITH_CUDA
#include "paddle/fluid/operators/nccl/nccl_gpu_common.h"
......
......@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/string/piece.h"
#include "piece.h"
#include <string.h>
......
......@@ -28,7 +28,7 @@ namespace string {
// its syntax is simple as it doesn't own/manage the string, it is
// cheap to construct Pieces and pass them around.
class Piece {
public:
public:
static const size_t npos = static_cast<size_t>(-1);
// We provide non-explicit singleton constructors so users can
......@@ -55,7 +55,7 @@ public:
// Return a string that contains the copy of the referenced data.
std::string ToString() const { return std::string(data_, size_); }
private:
private:
const char* data_;
size_t size_;
......
......@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/string/piece.h"
#include "paddle/fluid/string/piece.h"
#include <sstream>
......
......@@ -71,7 +71,7 @@
#include <iostream>
#include <sstream>
#include "paddle/string/tinyformat/tinyformat.h" // https://github.com/c42f/tinyformat
#include "tinyformat/tinyformat.h" // https://github.com/c42f/tinyformat
namespace paddle {
namespace string {
......
......@@ -11,7 +11,7 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/string/printf.h"
#include "printf.h"
#include <string>
......@@ -24,6 +24,6 @@ TEST(StringPrintf, StringPrintf) {
long hour = 14;
int min = 44;
EXPECT_EQ(std::string("Wednesday, July 27, 14:44"),
paddle::string::Sprintf(
"%s, %s %d, %.2d:%.2d", weekday, month, day, hour, min));
paddle::string::Sprintf("%s, %s %d, %.2d:%.2d", weekday, month, day,
hour, min));
}
......@@ -147,7 +147,7 @@ namespace detail {
// Test whether type T1 is convertible to type T2
template <typename T1, typename T2>
struct is_convertible {
private:
private:
// two types of different size
struct fail {
char dummy[2];
......@@ -160,7 +160,7 @@ private:
static succeed tryConvert(const T2 &);
static const T1 &makeT1();
public:
public:
// Standard trick: the (...) version of tryConvert will be chosen from
// the overload set only if the version taking a T2 doesn't match.
// Then we compare the sizes of the return types to check which
......@@ -170,8 +170,7 @@ public:
// Format the value by casting to type fmtT. This default implementation
// should never be called.
template <typename T,
typename fmtT,
template <typename T, typename fmtT,
bool convertible = is_convertible<T, fmtT>::value>
struct formatValueAsType {
static void invoke(std::ostream & /*out*/, const T & /*value*/) { assert(0); }
......@@ -241,11 +240,8 @@ TINYFORMAT_DEFINE_FORMAT_TRUNCATED_CSTR(char)
/// operator<< to format the type T, with special cases for the %c and %p
/// conversions.
template <typename T>
inline void formatValue(std::ostream &out,
const char * /*fmtBegin*/,
const char *fmtEnd,
int ntrunc,
const T &value) {
inline void formatValue(std::ostream &out, const char * /*fmtBegin*/,
const char *fmtEnd, int ntrunc, const T &value) {
// The mess here is to support the %c and %p conversions: if these
// conversions are active we try to convert the type to a char or const
// void* respectively and format that instead of the value itself. For the
......@@ -267,25 +263,22 @@ inline void formatValue(std::ostream &out,
}
// Overloaded version for char types to support printing as an integer
#define TINYFORMAT_DEFINE_FORMATVALUE_CHAR(charType) \
inline void formatValue(std::ostream &out, \
const char * /*fmtBegin*/, \
const char *fmtEnd, \
int /**/, \
charType value) { \
switch (*(fmtEnd - 1)) { \
case 'u': \
case 'd': \
case 'i': \
case 'o': \
case 'X': \
case 'x': \
out << static_cast<int>(value); \
break; \
default: \
out << value; \
break; \
} \
#define TINYFORMAT_DEFINE_FORMATVALUE_CHAR(charType) \
inline void formatValue(std::ostream &out, const char * /*fmtBegin*/, \
const char *fmtEnd, int /**/, charType value) { \
switch (*(fmtEnd - 1)) { \
case 'u': \
case 'd': \
case 'i': \
case 'o': \
case 'X': \
case 'x': \
out << static_cast<int>(value); \
break; \
default: \
out << value; \
break; \
} \
}
// per 3.9.1: char, signed char and unsigned char are all distinct types
TINYFORMAT_DEFINE_FORMATVALUE_CHAR(char)
......@@ -482,7 +475,7 @@ namespace detail {
// each argument to be allocated as a homogenous array inside FormatList
// whereas a naive implementation based on inheritance does not.
class FormatArg {
public:
public:
FormatArg() {}
template <typename T>
......@@ -491,22 +484,17 @@ public:
m_formatImpl(&formatImpl<T>),
m_toIntImpl(&toIntImpl<T>) {}
void format(std::ostream &out,
const char *fmtBegin,
const char *fmtEnd,
void format(std::ostream &out, const char *fmtBegin, const char *fmtEnd,
int ntrunc) const {
m_formatImpl(out, fmtBegin, fmtEnd, ntrunc, m_value);
}
int toInt() const { return m_toIntImpl(m_value); }
private:
private:
template <typename T>
static void formatImpl(std::ostream &out,
const char *fmtBegin,
const char *fmtEnd,
int ntrunc,
const void *value) {
static void formatImpl(std::ostream &out, const char *fmtBegin,
const char *fmtEnd, int ntrunc, const void *value) {
formatValue(out, fmtBegin, fmtEnd, ntrunc, *static_cast<const T *>(value));
}
......@@ -516,11 +504,8 @@ private:
}
const void *m_value;
void (*m_formatImpl)(std::ostream &out,
const char *fmtBegin,
const char *fmtEnd,
int ntrunc,
const void *value);
void (*m_formatImpl)(std::ostream &out, const char *fmtBegin,
const char *fmtEnd, int ntrunc, const void *value);
int (*m_toIntImpl)(const void *value);
};
......@@ -569,12 +554,10 @@ inline const char *printFormatStringLiteral(std::ostream &out,
// necessary to pull out variable width and precision . The function returns a
// pointer to the character after the end of the current format spec.
inline const char *streamStateFromFormat(std::ostream &out,
bool &spacePadPositive,
int &ntrunc,
bool &spacePadPositive, int &ntrunc,
const char *fmtStart,
const detail::FormatArg *formatters,
int &argIndex,
int numFormatters) {
int &argIndex, int numFormatters) {
if (*fmtStart != '%') {
TINYFORMAT_ERROR(
"tinyformat: Not enough conversion specifiers in format string");
......@@ -750,10 +733,8 @@ inline const char *streamStateFromFormat(std::ostream &out,
}
//------------------------------------------------------------------------------
inline void formatImpl(std::ostream &out,
const char *fmt,
const detail::FormatArg *formatters,
int numFormatters) {
inline void formatImpl(std::ostream &out, const char *fmt,
const detail::FormatArg *formatters, int numFormatters) {
// Saved stream state
std::streamsize origWidth = out.width();
std::streamsize origPrecision = out.precision();
......@@ -765,13 +746,9 @@ inline void formatImpl(std::ostream &out,
fmt = printFormatStringLiteral(out, fmt);
bool spacePadPositive = false;
int ntrunc = -1;
const char *fmtEnd = streamStateFromFormat(out,
spacePadPositive,
ntrunc,
fmt,
formatters,
argIndex,
numFormatters);
const char *fmtEnd =
streamStateFromFormat(out, spacePadPositive, ntrunc, fmt, formatters,
argIndex, numFormatters);
if (argIndex >= numFormatters) {
// Check args remain after reading any variable width/precision
TINYFORMAT_ERROR("tinyformat: Not enough format arguments");
......@@ -820,15 +797,14 @@ inline void formatImpl(std::ostream &out,
/// information has been stripped from the arguments, leaving just enough of a
/// common interface to perform formatting as required.
class FormatList {
public:
public:
FormatList(detail::FormatArg *formatters, int N)
: m_formatters(formatters), m_N(N) {}
friend void vformat(std::ostream &out,
const char *fmt,
friend void vformat(std::ostream &out, const char *fmt,
const FormatList &list);
private:
private:
const detail::FormatArg *m_formatters;
int m_N;
};
......@@ -841,7 +817,7 @@ namespace detail {
// Format list subclass with fixed storage to avoid dynamic allocation
template <int N>
class FormatListN : public FormatList {
public:
public:
template <typename... Args>
FormatListN(const Args &... args)
: FormatList(&m_formatterStore[0], N),
......@@ -849,14 +825,14 @@ public:
static_assert(sizeof...(args) == N, "Number of args must be N");
}
private:
private:
FormatArg m_formatterStore[N];
};
// Special 0-arg version - MSVC says zero-sized C array in struct is nonstandard
template <>
class FormatListN<0> : public FormatList {
public:
public:
FormatListN() : FormatList(0, 0) {}
};
......
文件已移动
......@@ -12,12 +12,12 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/string/to_string.h"
#include "to_string.h"
#include <gtest/gtest.h>
constexpr char kOutputString[] = "User Defined Output";
class UserDefinedClass {
public:
public:
};
std::ostream& operator<<(std::ostream& s, const UserDefinedClass& ins) {
......
......@@ -117,8 +117,8 @@ EOF
-DWITH_AVX=${WITH_AVX:-ON} \
-DWITH_SWIG_PY=ON \
-DWITH_STYLE_CHECK=OFF
make -j `nproc` gen_proto_py
make -j `nproc` paddle_python
make -j `nproc` gen_proto_py framework_py_proto
make -j `nproc` copy_paddle_pybind
make -j `nproc` paddle_docs paddle_docs_cn paddle_api_docs
popd
fi
......
......@@ -6,9 +6,9 @@ mkdir -p $TRAVIS_BUILD_DIR/build
cd $TRAVIS_BUILD_DIR/build
# Compile Documentation only.
cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_GPU=OFF -DWITH_MKL=OFF -DWITH_DOC=ON
make -j `nproc` gen_proto_py
make -j `nproc` paddle_python
cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=OFF -DWITH_DOC=ON -DWITH_STYLE_CHECK=OFF
make -j `nproc` gen_proto_py framework_py_proto
make -j `nproc` copy_paddle_pybind
make -j `nproc` paddle_docs paddle_docs_cn paddle_api_docs
# check websites for broken links
......
......@@ -33,6 +33,57 @@ class VarBlock:
return "%s:%d:%d" % (self.varname, self.offset, self.size)
class UnionFind(object):
""" Union-find data struct.
Union-find is a data struct that keeps track of a set of elements partitioned
into a number of disjoint (non-overlapping) subsets.
Reference:
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Args:
elements(list): The initialize element list.
"""
def __init__(self, elementes=None):
self._parents = [] # index -> parent index
self._index = {} # element -> index
self._curr_idx = 0
if not elementes:
elementes = []
for ele in elementes:
self._parents.append(self._curr_idx)
self._index.update({ele: self._curr_idx})
self._curr_idx += 1
def find(self, x):
# Find the root index of given element x,
# execute the path compress while findind the root index
if not x in self._index:
return -1
idx = self._index[x]
while idx != self._parents[idx]:
t = self._parents[idx]
self._parents[idx] = self._parents[t]
idx = t
return idx
def union(self, x, y):
# Union two given element
x_root = self.find(x)
y_root = self.find(y)
if x_root == y_root:
return
self._parents[x_root] = y_root
def is_connected(self, x, y):
# If two given elements have the same root index,
# then they are connected.
return self.find(x) == self.find(y)
def same_or_split_var(p_name, var_name):
return p_name == var_name or p_name.startswith(var_name + ".block")
......@@ -140,6 +191,7 @@ class DistributeTranspiler:
for b in param_blocks:
varname, block_id, _ = b.split(":")
send_outputs.append(param_var_mapping[varname][int(block_id)])
# let send_op know which endpoint to send which var to, eplist has the same
# order as send_inputs.
eplist = split_method(send_inputs, pserver_endpoints)
......@@ -178,6 +230,21 @@ class DistributeTranspiler:
outputs={"Out": [orig_param]},
attrs={"axis": 0})
self.lr_param_mapping = self._create_lr_param_mapping()
def _create_lr_param_mapping(self):
lr_mapping = dict()
for _, opt_op in enumerate(self.optimize_ops):
if not opt_op.inputs or not opt_op.inputs.has_key("LearningRate") \
or not opt_op.inputs.has_key("Param"):
continue
lr = opt_op.inputs["LearningRate"].name
param = opt_op.inputs["Param"].name
if not lr_mapping.has_key(lr):
lr_mapping.update({lr: list()})
lr_mapping[lr].append(param)
return lr_mapping
def _create_vars_from_blocklist(self, program, block_list):
# Create respective variables using the block_list
block_map = dict()
......@@ -208,6 +275,7 @@ class DistributeTranspiler:
name="%s.block%d" % (varname, i),
psersistable=False,
dtype=orig_var.dtype,
type=orig_var.type,
shape=splited_shape) # flattend splited var
var_mapping[varname].append(var)
return var_mapping
......@@ -269,6 +337,7 @@ class DistributeTranspiler:
name="%s.trainer_%d" % (var.name, i),
psersistable=var.persistable,
dtype=var.dtype,
type=var.type,
shape=var.shape)
var_list.append(var_each)
return var_list
......@@ -300,52 +369,15 @@ class DistributeTranspiler:
pass
return orig_shape
def _op_input_var(self, op, varname):
pass
def _is_op_on_pserver(self, endpoint, all_ops, idx):
"""
Recursively check if the op need to run on current server.
Assume that ops are in the execution order.
"""
param_names = [
p.name for p in self.param_grad_ep_mapping[endpoint]["params"]
]
op = all_ops[idx]
input_names = set(op.input_names)
# TODO(typhoonzero): using Param and Grad input name to identify
# that the operator is an optimization operator, need a better way.
if "Param" in input_names:
if op.input("Param")[0] in param_names:
return True
else:
for n in param_names:
if same_or_split_var(n, op.input("Param")[0]) \
and n != op.input("Param")[0]:
return True
return False
else:
j = idx - 1
while j >= 0:
prev_op = all_ops[j]
# prev_output_names = [o.name for o in prev_op.outputs.values()]
# prev_input_names = [o.name for o in prev_op.inputs.values()]
# NOTE(typhoonzero): consider list input/output
prev_output_names = prev_op.desc.output_arg_names()
prev_input_names = prev_op.desc.input_arg_names()
found1 = False
found2 = False
for varname in op.desc.input_arg_names():
if varname in prev_output_names:
found1 = self._is_op_on_pserver(endpoint, all_ops, j)
# later ops may produce output for prev op's next batch use.
for varname in op.desc.output_arg_names():
if varname in prev_input_names:
found2 = self._is_op_on_pserver(endpoint, all_ops, j)
if found1 or found2:
return True
j -= 1
return False
def _fetch_var_names(self, param_dict):
res = []
if not param_dict:
return res
for _, values in param_dict.iteritems():
if not isinstance(values, list):
values = [values]
res += [v.name for v in values]
return res
def _append_pserver_ops(self, optimize_block, opt_op, endpoint):
program = optimize_block.program
......@@ -363,11 +395,7 @@ class DistributeTranspiler:
# do not append this op if current endpoint
# is not dealing with this grad block
return
merged_var = program.global_block().create_var(
name=grad_block.name,
persistable=grad_block.persistable,
dtype=grad_block.dtype,
shape=grad_block.shape)
merged_var = program.global_block().vars[grad_block.name]
# append merging ops if trainers > 1
if self.trainers > 1:
vars2merge = self._create_var_for_trainers(
......@@ -398,13 +426,19 @@ class DistributeTranspiler:
shape=param_block.shape)
new_inputs[key] = tmpvar
elif key == "LearningRate":
# leraning rate variable has already be created by non-optimize op,
# don't create it once again.
new_inputs[key] = program.global_block().vars[opt_op.input(key)[
0]]
for key in opt_op.input_names:
if key in ["Param", "Grad"]:
new_shape = None
if key in ["Param", "Grad", "LearningRate"]:
continue
var = program.global_block().vars[opt_op.input(key)[0]]
# update accumulator variable shape
param_shape = new_inputs["Param"].shape
var = program.global_block().vars[opt_op.input(key)[0]]
new_shape = self._get_optimizer_input_shape(opt_op.type, key,
var.shape, param_shape)
tmpvar = program.global_block().create_var(
......@@ -415,12 +449,11 @@ class DistributeTranspiler:
new_inputs[key] = tmpvar
# change output's ParamOut variable
outputs = self._get_output_map_from_op(program.global_block(), opt_op)
outputs["ParamOut"] = new_inputs["Param"]
opt_op.outputs["ParamOut"] = new_inputs["Param"]
optimize_block.append_op(
type=opt_op.type,
inputs=new_inputs,
outputs=outputs,
outputs=opt_op.outputs,
attrs=opt_op.attrs)
def _append_pserver_non_opt_ops(self, optimize_block, opt_op):
......@@ -428,11 +461,10 @@ class DistributeTranspiler:
# Append the ops for parameters that do not need to be optimized/updated
inputs = self._get_input_map_from_op(self.program.global_block().vars,
opt_op)
for var in inputs.itervalues():
if type(var) == list:
varlist = var
else:
varlist = [var]
for varlist in inputs.itervalues():
if not isinstance(varlist, list):
varlist = [varlist]
for var in varlist:
if not program.global_block().vars.has_key(var.name):
program.global_block().create_var(
......@@ -444,12 +476,70 @@ class DistributeTranspiler:
outputs = self._get_output_map_from_op(self.program.global_block().vars,
opt_op)
for varlist in outputs.itervalues():
if not isinstance(varlist, list):
varlist = [varlist]
for var in varlist:
program.global_block().create_var(
name=var.name,
persistable=var.persistable,
dtype=var.dtype,
shape=var.shape)
optimize_block.append_op(
type=opt_op.type,
inputs=inputs,
outputs=outputs,
attrs=opt_op.attrs)
def _is_op_connected(self, op1, op2):
# If one op's input is another op's output or
# one op's output is another op's input, we say
# the two operator is connected.
op1_input_names = self._fetch_var_names(op1.inputs)
op1_output_names = self._fetch_var_names(op1.outputs)
op2_input_names = self._fetch_var_names(op2.inputs)
op2_output_names = self._fetch_var_names(op2.outputs)
if set(op1_output_names) & set(op2_input_names) or \
set(op1_input_names) & set(op2_output_names):
return True
return False
def _create_ufind(self, optimize_ops):
# Create a unit find data struct by optimize ops
ufind = UnionFind(optimize_ops)
for i in xrange(len(optimize_ops)):
for j in xrange(i, len(optimize_ops)):
op1 = optimize_ops[i]
op2 = optimize_ops[j]
if self._is_op_connected(op1, op2):
ufind.union(op1, op2)
return ufind
def _is_opt_op(self, op):
# NOTE: It's a HACK implement.
# optimize op: SGDOptimize, MomentumOptimizer, AdamOptimizer and etc...
if op.inputs and op.inputs.has_key("Param") \
and op.inputs.has_key("LearningRate"):
return True
return False
def _is_opt_op_on_pserver(self, endpoint, op):
param_names = [
p.name for p in self.param_grad_ep_mapping[endpoint]["params"]
]
if op.inputs["Param"].name in param_names:
return True
else:
for n in param_names:
param = op.inputs["Param"].name
if same_or_split_var(n, param) and n != op.inputs["Param"].name:
return True
return False
return False
def get_pserver_program(self, endpoint):
"""
Get pserver side program using the endpoint
......@@ -469,26 +559,38 @@ class DistributeTranspiler:
pserver_program.global_block().create_var(
name=v.name, persistable=True, dtype=v.dtype, shape=v.shape)
for trainer_id in xrange(self.trainers):
print("create variable for program: %s.trainer_%d" %
(v.name, trainer_id))
pserver_program.global_block().create_var(
name="%s.trainer_%d" % (v.name, trainer_id),
persistable=True,
dtype=v.dtype,
shape=v.shape)
# step6
optimize_block = pserver_program.create_block(0)
# Iterate through the ops and append ops as needed
for idx, opt_op in enumerate(self.optimize_ops):
is_op_on_pserver = self._is_op_on_pserver(endpoint,
self.optimize_ops, idx)
if not is_op_on_pserver:
continue
if "Grad" in opt_op.desc.input_arg_names():
self._append_pserver_ops(optimize_block, opt_op, endpoint)
else:
self._append_pserver_non_opt_ops(optimize_block, opt_op)
# step 6.1
# Create a union-find data struct by optimize ops,
# If two ops are connected, we could add these two ops
# into one set.
ufind = self._create_ufind(self.optimize_ops)
# step 6.2
# Iterate through the ops and append optimize op which
# located on current pserver
opt_op_on_pserver = []
for _, op in enumerate(self.optimize_ops):
if self._is_opt_op(op) and self._is_opt_op_on_pserver(endpoint, op):
opt_op_on_pserver.append(op)
# step 6.3
# Iterate through the ops, and if an op and the optimize ops
# which located on current pserver are in one set, then
# append it into the sub program.
for _, op in enumerate(self.optimize_ops):
for _, opt_op in enumerate(opt_op_on_pserver):
if ufind.is_connected(op, opt_op):
if self._is_opt_op(op):
self._append_pserver_ops(optimize_block, op, endpoint)
else:
self._append_pserver_non_opt_ops(optimize_block, op)
break
# Append the listen_and_serv op
pserver_program.global_block().append_op(
type="listen_and_serv",
......
......@@ -16,6 +16,8 @@ import ops
from ops import *
import nn
from nn import *
import detection
from detection import *
import io
from io import *
import tensor
......@@ -28,6 +30,7 @@ import math_op_patch
from math_op_patch import *
__all__ = []
__all__ += detection.__all__
__all__ += nn.__all__
__all__ += io.__all__
__all__ += tensor.__all__
......
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
All layers just related to the detection neural network.
"""
from ..layer_helper import LayerHelper
__all__ = ['detection_output', ]
def detection_output(scores,
loc,
prior_box,
prior_box_var,
background_label=0,
nms_threshold=0.3,
nms_top_k=400,
keep_top_k=200,
score_threshold=0.01,
nms_eta=1.0):
"""
**Detection Output Layer**
This layer applies the NMS to the output of network and computes the
predict bounding box location. The output's shape of this layer could
be zero if there is no valid bounding box.
Args:
scores(Variable): A 3-D Tensor with shape [N, C, M] represents the
predicted confidence predictions. N is the batch size, C is the
class number, M is number of bounding boxes. For each category
there are total M scores which corresponding M bounding boxes.
loc(Variable): A 3-D Tensor with shape [N, M, 4] represents the
predicted locations of M bounding bboxes. N is the batch size,
and each bounding box has four coordinate values and the layout
is [xmin, ymin, xmax, ymax].
prior_box(Variable): A 2-D Tensor with shape [M, 4] holds M boxes,
each box is represented as [xmin, ymin, xmax, ymax],
[xmin, ymin] is the left top coordinate of the anchor box,
if the input is image feature map, they are close to the origin
of the coordinate system. [xmax, ymax] is the right bottom
coordinate of the anchor box.
prior_box_var(Variable): A 2-D Tensor with shape [M, 4] holds M group
of variance.
background_label(float): The index of background label,
the background label will be ignored. If set to -1, then all
categories will be considered.
nms_threshold(float): The threshold to be used in NMS.
nms_top_k(int): Maximum number of detections to be kept according
to the confidences aftern the filtering detections based on
score_threshold.
keep_top_k(int): Number of total bboxes to be kept per image after
NMS step. -1 means keeping all bboxes after NMS step.
score_threshold(float): Threshold to filter out bounding boxes with
low confidence score. If not provided, consider all boxes.
nms_eta(float): The parameter for adaptive NMS.
Returns:
The detected bounding boxes which are a Tensor.
Examples:
.. code-block:: python
pb = layers.data(name='prior_box', shape=[10, 4],
append_batch_size=False, dtype='float32')
pbv = layers.data(name='prior_box_var', shape=[10, 4],
append_batch_size=False, dtype='float32')
loc = layers.data(name='target_box', shape=[21, 4],
append_batch_size=False, dtype='float32')
scores = layers.data(name='scores', shape=[2, 21, 10],
append_batch_size=False, dtype='float32')
nmsed_outs = fluid.layers.detection_output(scores=scores,
loc=loc,
prior_box=pb,
prior_box_var=pbv)
"""
helper = LayerHelper("detection_output", **locals())
decoded_box = helper.create_tmp_variable(dtype=loc.dtype)
helper.append_op(
type="box_coder",
inputs={
'PriorBox': prior_box,
'PriorBoxVar': prior_box_var,
'TargetBox': loc
},
outputs={'OutputBox': decoded_box},
attrs={'code_type': 'decode_center_size'})
nmsed_outs = helper.create_tmp_variable(dtype=decoded_box.dtype)
helper.append_op(
type="multiclass_nms",
inputs={'Scores': scores,
'BBoxes': decoded_box},
outputs={'Out': nmsed_outs},
attrs={
'background_label': 0,
'nms_threshold': nms_threshold,
'nms_top_k': nms_top_k,
'keep_top_k': keep_top_k,
'score_threshold': score_threshold,
'nms_eta': 1.0
})
return nmsed_outs
......@@ -117,6 +117,7 @@ def monkey_patch_variable():
tmp_name = unique_tmp_name()
out = self.block.create_var(name=tmp_name, dtype=lhs_dtype)
self.block.append_op(
type=op_type,
inputs={'X': [self],
......@@ -151,7 +152,12 @@ def monkey_patch_variable():
("__div__", "elementwise_div", False),
("__rdiv__", "elementwise_div", True),
("__pow__", "elementwise_pow", False),
("__rpow__", "elementwise_pow", True)):
("__rpow__", "elementwise_pow", True),
# for logical compare
("__eq__", "equal", False),
("__ne__", "not_equal", False),
("__lt__", "less_than", False),
("__le__", "less_equal", False)):
setattr(Variable, method_name,
_elemwise_method_creator_(method_name, op_type, reverse))
......
......@@ -66,6 +66,8 @@ __all__ = [
'row_conv',
'multiplex',
'layer_norm',
'softmax_with_cross_entropy',
'smooth_l1',
]
......@@ -3091,3 +3093,122 @@ def multiplex(inputs, index):
'Ids': index},
outputs={'Out': [out]})
return out
def softmax_with_cross_entropy(logits, label, soft_label=False):
"""
**Softmax With Cross Entropy Operator.**
Cross entropy loss with softmax is used as the output layer extensively. This
operator computes the softmax normalized values for each row of the input
tensor, after which cross-entropy loss is computed. This provides a more
numerically stable gradient.
Because this operator performs a softmax on logits internally, it expects
unscaled logits. This operator should not be used with the output of
softmax operator since that would produce incorrect results.
When the attribute soft_label is set false, this operators expects mutually
exclusive hard labels, each sample in a batch is in exactly one class with a
probability of 1.0. Each sample in the batch will have a single label.
The equation is as follows:
1) Hard label (one-hot label, so every sample has exactly one class)
.. math::
loss_j = -\\text{logit}_{label_j} +
\\log\\left(\\sum_{i=0}^{K}\\exp(\\text{logit}_i)\\right), j = 1,..., K
2) Soft label (each sample can have a distribution over all classes)
.. math::
loss_j = -\\sum_{i=0}^{K}\\text{label}_i
\\left(\\text{logit}_i - \\log\\left(\\sum_{i=0}^{K}
\\exp(\\text{logit}_i)\\right)\\right), j = 1,...,K
Args:
logits (Variable): The unscaled log probabilities, which is a 2-D tensor
with shape [N x K]. N is the batch_size, and K is the class number.
label (Variable): The ground truth which is a 2-D tensor. If soft_label
is set to false, Label is a Tensor<int64> with shape [N x 1]. If
soft_label is set to true, Label is a Tensor<float/double> with
soft_label (bool): A flag to indicate whether to interpretate the given
labels as soft labels. By default, `soft_label` is set to False.
Returns:
Variable: The cross entropy loss is a 2-D tensor with shape [N x 1].
Examples:
.. code-block:: python
data = fluid.layers.data(name='data', shape=[128], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
fc = fluid.layers.fc(input=data, size=100)
out = fluid.layers.softmax_with_cross_entropy(logits=fc, label=label)
"""
helper = LayerHelper('softmax_with_cross_entropy', **locals())
softmax = helper.create_tmp_variable(dtype=logits.dtype)
loss = helper.create_tmp_variable(dtype=logits.dtype)
helper.append_op(
type='softmax_with_cross_entropy',
inputs={'Logits': logits,
'Label': label},
outputs={'Softmax': softmax,
'Loss': loss},
attrs={'soft_label': soft_label})
return loss
def smooth_l1(x, y, inside_weight=None, outside_weight=None, sigma=None):
"""
**Smooth L1 Loss Operator. **
This operator computes the smooth l1 loss for X and Y.
The operator takes the first dimension of X and Y as batch size.
For each instance, it computes the smooth l1 loss element by element first
and then sums all the losses. So the shape of Out is [batch_size, 1].
Args:
x (Variable): A tensor with rank at least 2. The input value of smooth
l1 loss op with shape [batch_size, dim1, ..., dimN].
y (Variable): A tensor with rank at least 2. The target value of smooth
l1 loss op with same shape as x.
inside_weight (Variable|None): A tensor with rank at least 2. This
input is optional and should have same shape with x. If provided,
the result of (x - y) will be multiplied by this tensor element by
element.
outside_weight (Variable|None): A tensor with rank at least 2. This
input is optional and should have same shape with x. If provided,
the out smooth l1 loss will be multiplied by this tensor element
by element.
sigma (float|None): Hyper parameter of smooth l1 loss op. A float scalar
with default value 1.0.
Returns:
Variable: A tensor with rank be 2. The output smooth l1 loss with
shape [batch_size, 1].
Examples:
.. code-block:: python
data = fluid.layers.data(name='data', shape=[128], dtype='float32')
label = fluid.layers.data(name='label', shape=[100], dtype='int64')
fc = fluid.layers.fc(input=data, size=100)
out = fluid.layers.smooth_l1(logits=fc, label=label)
"""
helper = LayerHelper('smooth_l1_loss', **locals())
diff = helper.create_tmp_variable(dtype=x.dtype)
loss = helper.create_tmp_variable(dtype=x.dtype)
helper.append_op(
type='smooth_l1_loss',
inputs={
'X': x,
'Y': y,
'InsideWeight': inside_weight,
'OutsideWeight': outside_weight
},
outputs={'Diff': diff,
'Out': loss},
attrs={'sigma': sigma})
return loss
......@@ -179,7 +179,7 @@ def polynomial_decay(learning_rate,
shape=[1], dtype='float32', value=1.0)
with layers.Switch() as switch:
with switch.case(layers.equal(x=global_step, y=zero_var)):
with switch.case(global_step == zero_var):
layers.assign(input=one_var, output=div_res)
decay_steps = decay_steps * div_res
else:
......@@ -229,7 +229,7 @@ def piecewise_decay(global_step, boundaries, values):
shape=[1], dtype='float32', value=float(boundaries[i]))
value_var = layers.fill_constant(
shape=[1], dtype='float32', value=float(values[i]))
with switch.case(layers.less_than(global_step, boundary_val)):
with switch.case(global_step < boundary_val):
layers.assign(value_var, lr)
last_value_var = layers.fill_constant(
shape=[1],
......
......@@ -99,7 +99,7 @@ elif training_role == "TRAINER":
exe.run(fluid.default_startup_program())
for pass_id in range(PASS_NUM):
for data in train_reader():
avg_cost_np = exe.run(fluid.default_main_program(),
avg_cost_np = exe.run(t.get_trainer_program(),
feed=feeder.feed(data),
fetch_list=[avg_cost])
print("avg_cost_np", avg_cost_np)
......
......@@ -64,9 +64,7 @@ exe = fluid.Executor(place)
[res1, res2] = exe.run(prog, fetch_list=[out1, out2])
test_pass = res1.shape == (10, 2) and res2.shape == (10, 1)
if not test_pass:
if not (res1.shape == (10, 2) and res2.shape == (10, 1)):
exit(1)
exit(0)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
import paddle.v2.fluid.layers as layers
from paddle.v2.fluid.framework import Program, program_guard
class TestBook(unittest.TestCase):
def test_detection_output(self):
program = Program()
with program_guard(program):
pb = layers.data(
name='prior_box',
shape=[10, 4],
append_batch_size=False,
dtype='float32')
pbv = layers.data(
name='prior_box_var',
shape=[10, 4],
append_batch_size=False,
dtype='float32')
loc = layers.data(
name='target_box',
shape=[20, 4],
append_batch_size=False,
dtype='float32')
scores = layers.data(
name='scores',
shape=[2, 20, 10],
append_batch_size=False,
dtype='float32')
out = layers.detection_output(
scores=scores, loc=loc, prior_box=pb, prior_box_var=pbv)
self.assertIsNotNone(out)
print(str(program))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
import sys
import collections
import math
from op_test import OpTest
class TestDetectionMAPOp(OpTest):
def set_data(self):
self.init_test_case()
self.mAP = [self.calc_map(self.tf_pos, self.tf_pos_lod)]
self.label = np.array(self.label).astype('float32')
self.detect = np.array(self.detect).astype('float32')
self.mAP = np.array(self.mAP).astype('float32')
if (len(self.class_pos_count) > 0):
self.class_pos_count = np.array(self.class_pos_count).astype(
'int32')
self.true_pos = np.array(self.true_pos).astype('float32')
self.false_pos = np.array(self.false_pos).astype('float32')
self.inputs = {
'Label': (self.label, self.label_lod),
'DetectRes': (self.detect, self.detect_lod),
'PosCount': self.class_pos_count,
'TruePos': (self.true_pos, self.true_pos_lod),
'FalsePos': (self.false_pos, self.false_pos_lod)
}
else:
self.inputs = {
'Label': (self.label, self.label_lod),
'DetectRes': (self.detect, self.detect_lod),
}
self.attrs = {
'overlap_threshold': self.overlap_threshold,
'evaluate_difficult': self.evaluate_difficult,
'ap_type': self.ap_type
}
self.out_class_pos_count = np.array(self.out_class_pos_count).astype(
'int')
self.out_true_pos = np.array(self.out_true_pos).astype('float32')
self.out_false_pos = np.array(self.out_false_pos).astype('float32')
self.outputs = {
'MAP': self.mAP,
'AccumPosCount': self.out_class_pos_count,
'AccumTruePos': (self.out_true_pos, self.out_true_pos_lod),
'AccumFalsePos': (self.out_false_pos, self.out_false_pos_lod)
}
def init_test_case(self):
self.overlap_threshold = 0.3
self.evaluate_difficult = True
self.ap_type = "integral"
self.label_lod = [[0, 2, 4]]
# label difficult xmin ymin xmax ymax
self.label = [[1, 0, 0.1, 0.1, 0.3, 0.3], [1, 1, 0.6, 0.6, 0.8, 0.8],
[2, 0, 0.3, 0.3, 0.6, 0.5], [1, 0, 0.7, 0.1, 0.9, 0.3]]
# label score xmin ymin xmax ymax difficult
self.detect_lod = [[0, 3, 7]]
self.detect = [
[1, 0.3, 0.1, 0.0, 0.4, 0.3], [1, 0.7, 0.0, 0.1, 0.2, 0.3],
[1, 0.9, 0.7, 0.6, 0.8, 0.8], [2, 0.8, 0.2, 0.1, 0.4, 0.4],
[2, 0.1, 0.4, 0.3, 0.7, 0.5], [1, 0.2, 0.8, 0.1, 1.0, 0.3],
[3, 0.2, 0.8, 0.1, 1.0, 0.3]
]
# label score true_pos false_pos
self.tf_pos_lod = [[0, 3, 7]]
self.tf_pos = [[1, 0.9, 1, 0], [1, 0.7, 1, 0], [1, 0.3, 0, 1],
[1, 0.2, 1, 0], [2, 0.8, 0, 1], [2, 0.1, 1, 0],
[3, 0.2, 0, 1]]
self.class_pos_count = []
self.true_pos_lod = [[]]
self.true_pos = [[]]
self.false_pos_lod = [[]]
self.false_pos = [[]]
def calc_map(self, tf_pos, tf_pos_lod):
mAP = 0.0
count = 0
def get_input_pos(class_pos_count, true_pos, true_pos_lod, false_pos,
false_pos_lod):
class_pos_count_dict = collections.Counter()
true_pos_dict = collections.defaultdict(list)
false_pos_dict = collections.defaultdict(list)
for i, count in enumerate(class_pos_count):
class_pos_count_dict[i] = count
for i in range(len(true_pos_lod[0]) - 1):
start = true_pos_lod[0][i]
end = true_pos_lod[0][i + 1]
for j in range(start, end):
true_pos_dict[i].append(true_pos[j])
for i in range(len(false_pos_lod[0]) - 1):
start = false_pos_lod[0][i]
end = false_pos_lod[0][i + 1]
for j in range(start, end):
false_pos_dict[i].append(false_pos[j])
return class_pos_count_dict, true_pos_dict, false_pos_dict
def get_output_pos(label_count, true_pos, false_pos):
max_label = 0
for (label, label_pos_num) in label_count.items():
if max_label < label:
max_label = label
label_number = max_label + 1
out_class_pos_count = []
out_true_pos_lod = [0]
out_true_pos = []
out_false_pos_lod = [0]
out_false_pos = []
for i in range(label_number):
out_class_pos_count.append([label_count[i]])
true_pos_list = true_pos[i]
out_true_pos += true_pos_list
out_true_pos_lod.append(len(out_true_pos))
false_pos_list = false_pos[i]
out_false_pos += false_pos_list
out_false_pos_lod.append(len(out_false_pos))
return out_class_pos_count, out_true_pos, [
out_true_pos_lod
], out_false_pos, [out_false_pos_lod]
def get_accumulation(pos_list):
sorted_list = sorted(pos_list, key=lambda pos: pos[0], reverse=True)
sum = 0
accu_list = []
for (score, count) in sorted_list:
sum += count
accu_list.append(sum)
return accu_list
label_count, true_pos, false_pos = get_input_pos(
self.class_pos_count, self.true_pos, self.true_pos_lod,
self.false_pos, self.false_pos_lod)
for (label, difficult, xmin, ymin, xmax, ymax) in self.label:
if self.evaluate_difficult:
label_count[label] += 1
elif not difficult:
label_count[label] += 1
true_pos = collections.defaultdict(list)
false_pos = collections.defaultdict(list)
for (label, score, tp, fp) in tf_pos:
true_pos[label].append([score, tp])
false_pos[label].append([score, fp])
for (label, label_pos_num) in label_count.items():
if label_pos_num == 0 or label not in true_pos: continue
label_true_pos = true_pos[label]
label_false_pos = false_pos[label]
accu_tp_sum = get_accumulation(label_true_pos)
accu_fp_sum = get_accumulation(label_false_pos)
precision = []
recall = []
for i in range(len(accu_tp_sum)):
precision.append(
float(accu_tp_sum[i]) /
float(accu_tp_sum[i] + accu_fp_sum[i]))
recall.append(float(accu_tp_sum[i]) / label_pos_num)
if self.ap_type == "11point":
max_precisions = [0.0] * 11
start_idx = len(accu_tp_sum) - 1
for j in range(10, -1, -1):
for i in range(start_idx, -1, -1):
if recall[i] < float(j) / 10.0:
start_idx = i
if j > 0:
max_precisions[j - 1] = max_precisions[j]
break
else:
if max_precisions[j] < precision[i]:
max_precisions[j] = precision[i]
for j in range(10, -1, -1):
mAP += max_precisions[j] / 11
count += 1
elif self.ap_type == "integral":
average_precisions = 0.0
prev_recall = 0.0
for i in range(len(accu_tp_sum)):
if math.fabs(recall[i] - prev_recall) > 1e-6:
average_precisions += precision[i] * \
math.fabs(recall[i] - prev_recall)
prev_recall = recall[i]
mAP += average_precisions
count += 1
self.out_class_pos_count, self.out_true_pos, self.out_true_pos_lod, self.out_false_pos, self.out_false_pos_lod = get_output_pos(
label_count, true_pos, false_pos)
if count != 0:
mAP /= count
return mAP * 100.0
def setUp(self):
self.op_type = "detection_map"
self.set_data()
def test_check_output(self):
self.check_output()
class TestDetectionMAPOpSkipDiff(TestDetectionMAPOp):
def init_test_case(self):
super(TestDetectionMAPOpSkipDiff, self).init_test_case()
self.evaluate_difficult = False
self.tf_pos_lod = [[0, 2, 6]]
# label score true_pos false_pos
self.tf_pos = [[1, 0.7, 1, 0], [1, 0.3, 0, 1], [1, 0.2, 1, 0],
[2, 0.8, 0, 1], [2, 0.1, 1, 0], [3, 0.2, 0, 1]]
class TestDetectionMAPOp11Point(TestDetectionMAPOp):
def init_test_case(self):
super(TestDetectionMAPOp11Point, self).init_test_case()
self.ap_type = "11point"
class TestDetectionMAPOpMultiBatch(TestDetectionMAPOp):
def init_test_case(self):
super(TestDetectionMAPOpMultiBatch, self).init_test_case()
self.class_pos_count = [0, 2, 1]
self.true_pos_lod = [[0, 0, 3, 5]]
self.true_pos = [[0.7, 1.], [0.3, 0.], [0.2, 1.], [0.8, 0.], [0.1, 1.]]
self.false_pos_lod = [[0, 0, 3, 5]]
self.false_pos = [[0.7, 0.], [0.3, 1.], [0.2, 0.], [0.8, 1.], [0.1, 0.]]
if __name__ == '__main__':
unittest.main()
......@@ -161,8 +161,8 @@ class TestBook(unittest.TestCase):
label=label,
chunk_scheme="IOB",
num_chunk_types=(label_dict_len - 1) / 2)
self.assertNotEqual(crf, None)
self.assertNotEqual(crf_decode, None)
self.assertFalse(crf is None)
self.assertFalse(crf_decode is None)
print(str(program))
......@@ -309,6 +309,24 @@ class TestBook(unittest.TestCase):
self.assertIsNotNone(out)
print(str(program))
def test_softmax_with_cross_entropy(self):
program = Program()
with program_guard(program):
x = layers.data(name='x', shape=[16], dtype='float32')
y = layers.data(name='label', shape=[1], dtype='int64')
loss = layers.softmax_with_cross_entropy(x, y)
self.assertIsNotNone(loss)
print(str(program))
def test_smooth_l1(self):
program = Program()
with program_guard(program):
x = layers.data(name='x', shape=[4], dtype='float32')
y = layers.data(name='label', shape=[4], dtype='float32')
loss = layers.smooth_l1(x, y)
self.assertIsNotNone(loss)
print(str(program))
if __name__ == '__main__':
unittest.main()
......@@ -137,7 +137,7 @@ def batched_multiclass_nms(boxes, scores, background, score_threshold,
det_outs = []
lod = [0]
for n in range(batch_size):
nmsed_outs, nmsed_num = multiclass_nms(boxes, scores[n], background,
nmsed_outs, nmsed_num = multiclass_nms(boxes[n], scores[n], background,
score_threshold, nms_threshold,
nms_top_k, keep_top_k)
lod.append(lod[-1] + nmsed_num)
......@@ -145,7 +145,7 @@ def batched_multiclass_nms(boxes, scores, background, score_threshold,
for c, indices in nmsed_outs.iteritems():
for idx in indices:
xmin, ymin, xmax, ymax = boxes[idx][:]
xmin, ymin, xmax, ymax = boxes[n][idx][:]
det_outs.append([c, scores[n][c][idx], xmin, ymin, xmax, ymax])
return det_outs, lod
......@@ -179,9 +179,9 @@ class TestMulticlassNMSOp(OpTest):
scores = np.reshape(scores, (N, M, C))
scores = np.transpose(scores, (0, 2, 1))
boxes = np.random.random((M, BOX_SIZE)).astype('float32')
boxes[:, 0:2] = boxes[:, 0:2] * 0.5
boxes[:, 2:4] = boxes[:, 2:4] * 0.5 + 0.5
boxes = np.random.random((N, M, BOX_SIZE)).astype('float32')
boxes[:, :, 0:2] = boxes[:, :, 0:2] * 0.5
boxes[:, :, 2:4] = boxes[:, :, 2:4] * 0.5 + 0.5
nmsed_outs, lod = batched_multiclass_nms(boxes, scores, background,
score_threshold, nms_threshold,
......
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
import paddle.v2.fluid.layers as layers
import paddle.v2.fluid.framework as framework
import paddle.v2.fluid as fluid
class TestPythonOperatorOverride(unittest.TestCase):
def check_result(self, fn, place, dtype):
shape = [9, 10]
x_data = np.random.random(size=shape).astype(dtype)
y_data = np.random.random(size=shape).astype(dtype)
python_out = fn(x_data, y_data)
x_var = layers.create_global_var(
name='x', shape=shape, value=0.0, dtype=dtype, persistable=True)
y_var = layers.create_global_var(
name='y', shape=shape, value=0.0, dtype=dtype, persistable=True)
out = fn(x_var, y_var)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
fluid_out = exe.run(fluid.default_main_program(),
feed={'x': x_data,
'y': y_data},
fetch_list=[out])
np.testing.assert_array_equal(python_out, fluid_out[0])
def test_override(self):
# compare func to check
compare_fns = [
lambda _a, _b: _a == _b,
lambda _a, _b: _a != _b,
lambda _a, _b: _a < _b,
lambda _a, _b: _a <= _b,
lambda _a, _b: _a > _b,
lambda _a, _b: _a >= _b,
]
# places to check
places = [fluid.CPUPlace()]
if fluid.core.is_compiled_with_cuda():
places.append(fluid.CUDAPlace(0))
# dtypes to check
dtypes = ['int32', 'float32']
for place in places:
for dtype in dtypes:
for compare_fn in compare_fns:
with framework.program_guard(framework.Program(),
framework.Program()):
self.check_result(compare_fn, place, dtype)
if __name__ == '__main__':
unittest.main()
......@@ -73,5 +73,20 @@ class TestSequenceExpandCase3(TestSequenceExpand):
self.inputs = {'X': (x_data, x_lod), 'Y': (y_data, y_lod)}
class TestSequenceExpandCase4(TestSequenceExpand):
def set_data(self):
x_data = np.array(
[0.1, 0.3, 0.2, 0.15, 0.25, 0.2, 0.15, 0.25, 0.1, 0.3]).reshape(
[2, 5]).astype('float32')
x_lod = [[
0,
1,
2,
]]
y_data = np.random.uniform(0.1, 1, [2, 1]).astype('float32')
y_lod = [[0, 1, 2], [0, 1, 2]]
self.inputs = {'X': (x_data, x_lod), 'Y': (y_data, y_lod)}
if __name__ == '__main__':
unittest.main()
......@@ -20,11 +20,11 @@ from op_test import OpTest
class TestSplitOp(OpTest):
def setUp(self):
self.op_type = "split"
axis = 0
x = np.random.random((4, 2, 5)).astype('float32')
out = np.split(x, [1, 3], axis)
axis = 1
x = np.random.random((4, 5, 6)).astype('float32')
out = np.split(x, [2, 3], axis)
self.inputs = {'X': x}
self.attrs = {'axis': axis, 'sections': [1, 2, 1]}
self.attrs = {'axis': axis, 'sections': [2, 1, 2]}
self.outputs = {'Out': [('out%d' % i, out[i]) \
for i in xrange(len(out))]}
......
......@@ -43,7 +43,7 @@ def gen_match_and_neg_indices(num_prior, gt_lod, neg_lod):
def target_assign(encoded_box, gt_label, match_indices, neg_indices, gt_lod,
neg_lod, background_label):
neg_lod, mismatch_value):
batch_size, num_prior = match_indices.shape
# init target bbox
......@@ -52,7 +52,7 @@ def target_assign(encoded_box, gt_label, match_indices, neg_indices, gt_lod,
trg_box_wt = np.zeros((batch_size, num_prior, 1)).astype('float32')
# init target label
trg_label = np.ones((batch_size, num_prior, 1)).astype('int32')
trg_label = trg_label * background_label
trg_label = trg_label * mismatch_value
# init weight for target label
trg_label_wt = np.zeros((batch_size, num_prior, 1)).astype('float32')
......@@ -65,53 +65,90 @@ def target_assign(encoded_box, gt_label, match_indices, neg_indices, gt_lod,
# target bbox
for v, c in zip(col_val + gt_start, col_ids[0].tolist()):
trg_box[i][c][:] = encoded_box[v][c][:]
# weight for target bbox
trg_box_wt[i][col_ids] = 1.0
trg_label[i][col_ids] = gt_label[col_val + gt_start]
trg_label_wt[i][col_ids] = 1.0
# set target label weight to 1.0 for the negative samples
neg_ids = neg_indices[neg_lod[i]:neg_lod[i + 1]]
trg_label_wt[i][neg_ids] = 1.0
if neg_indices is not None:
neg_ids = neg_indices[neg_lod[i]:neg_lod[i + 1]]
trg_label_wt[i][neg_ids] = 1.0
return trg_box, trg_box_wt, trg_label, trg_label_wt
class TestTargetAssginOp(OpTest):
class TestTargetAssginFloatType(OpTest):
def setUp(self):
self.op_type = "target_assign"
num_prior = 120
num_class = 21
gt_lod = [0, 5, 11, 23]
neg_lod = [0, 4, 7, 13]
mismatch_value = 0
batch_size = len(gt_lod) - 1
num_gt = gt_lod[-1]
encoded_box = np.random.random((num_gt, num_prior, 4)).astype('float32')
gt_label = np.random.randint(
num_class, size=(num_gt, 1)).astype('int32')
match_indices, neg_indices = gen_match_and_neg_indices(num_prior,
gt_lod, neg_lod)
out, out_wt, _, _ = target_assign(encoded_box, gt_label, match_indices,
neg_indices, gt_lod, neg_lod,
mismatch_value)
# assign regression targets
x = encoded_box
self.inputs = {
'X': (x, [gt_lod]),
'MatchIndices': match_indices,
}
self.attrs = {'mismatch_value': mismatch_value}
self.outputs = {
'Out': out,
'OutWeight': out_wt,
}
def test_check_output(self):
self.check_output()
class TestTargetAssginIntType(OpTest):
def setUp(self):
self.op_type = "target_assign"
num_prior = 120
num_class = 21
gt_lod = [0, 5, 11, 23]
neg_lod = [0, 4, 7, 13]
mismatch_value = 0
batch_size = len(gt_lod) - 1
num_gt = gt_lod[-1]
background_label = 0
encoded_box = np.random.random((num_gt, num_prior, 4)).astype('float32')
gt_label = np.random.randint(
num_class, size=(num_gt, 1)).astype('int32')
match_indices, neg_indices = gen_match_and_neg_indices(num_prior,
gt_lod, neg_lod)
trg_box, trg_box_wt, trg_label, trg_label_wt = target_assign(
encoded_box, gt_label, match_indices, neg_indices, gt_lod, neg_lod,
background_label)
_, _, out, out_wt, = target_assign(encoded_box, gt_label, match_indices,
neg_indices, gt_lod, neg_lod,
mismatch_value)
# assign cassification argets
x = np.reshape(gt_label, (num_gt, 1, 1))
self.inputs = {
'EncodedGTBBox': (encoded_box, [gt_lod]),
'GTScoreLabel': (gt_label, [gt_lod]),
'MatchIndices': (match_indices),
'X': (x, [gt_lod]),
'MatchIndices': match_indices,
'NegIndices': (neg_indices, [neg_lod]),
}
self.attrs = {'background_label': background_label}
self.attrs = {'mismatch_value': mismatch_value}
self.outputs = {
'PredBBoxLabel': (trg_box),
'PredBBoxWeight': (trg_box_wt),
'PredScoreLabel': (trg_label),
'PredScoreWeight': (trg_label_wt),
'Out': out,
'OutWeight': out_wt,
}
def test_check_output(self):
......
......@@ -52,3 +52,5 @@ RUN wget -O /opt/swig-2.0.12.tar.gz https://sourceforge.net/projects/swig/files/
RUN mkdir -p /src && cd /src && git clone https://github.com/NVIDIA/nccl.git nccl && cd nccl &&\
make -j `nproc` install <NCCL_MAKE_OPTS> && cd .. && rm -rf nccl
CMD ["bash", "/paddle/paddle/scripts/docker/build.sh"]
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册
新手
引导
客服 返回
顶部