提交 d9f20576 编写于 作者: L Luo Tao

Merge branch 'develop' into mkl_script

...@@ -2,8 +2,8 @@ ...@@ -2,8 +2,8 @@
[![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle) [![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle)
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://doc.paddlepaddle.org/develop/doc/) [![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html)
[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://doc.paddlepaddle.org/develop/doc_cn/) [![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html)
[![Coverage Status](https://coveralls.io/repos/github/PaddlePaddle/Paddle/badge.svg?branch=develop)](https://coveralls.io/github/PaddlePaddle/Paddle?branch=develop) [![Coverage Status](https://coveralls.io/repos/github/PaddlePaddle/Paddle/badge.svg?branch=develop)](https://coveralls.io/github/PaddlePaddle/Paddle?branch=develop)
[![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases) [![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
......
...@@ -7,3 +7,4 @@ API ...@@ -7,3 +7,4 @@ API
模型配置 <v2/model_configs.rst> 模型配置 <v2/model_configs.rst>
数据访问 <v2/data.rst> 数据访问 <v2/data.rst>
训练与应用 <v2/run_logic.rst> 训练与应用 <v2/run_logic.rst>
v2/fluid.rst
# Intel® MKL Packed on PaddlePaddle: Design Doc
## Contents
- [Overview](#overview)
- [Key Points](#key-points)
- [Background](#background)
- [Solution](#solution)
- [Actions](#actions)
- [CMake](#cmake)
- [Layers](#layers)
- [Unit Tests](#unit-tests)
- [Python API](#python-api)
- [Benchmarking](#benchmarking)
## Overview
我们计划将 Intel® MKL 中引入的 GEMM Packed APIs\[[1](#references)\] 集成到 PaddlePaddle 中,充分发挥英特尔平台的优势,有效提升PaddlePaddle在英特尔架构上的性能。
现阶段的优化主要针对 Recurrent Neural Network(以下简称RNN)相关层(包括`RecurrentLayer`, `GatedRecurrentLayer``LstmLayer`), 以及 PaddlePaddle V1 API。
## Key Points
### Background
目前PaddlePaddle采用了 Intel® MKL库的[cblas_?gemm](https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm)函数,这个函数本身会在计算前将原数据转换为更适合英特尔平台的内部格式。
1. 转换耗时 \
这一数据格式的转换操作(Packing),在问题本身的计算量比较小的时候,显得相对来说较为耗时。例如在DeepSpeech2 \[[2](#references)\] 的Vanilla RNN部分中,矩阵大小是`batch_size * 2048`
2. 转换冗余 \
由于在现有的某些情况下(例如RNN),多次调用 cblas_?gemm 会使用相同的原数据,因此,每次调用时对原数据的重复Packing便成为了冗余。
为了最大程度减少多次调用 cblas_?gemm 在Packing上的耗时,Intel® MKL 引入了以下四个API:
* cblas_?gemm_alloc
* cblas_?gemm_pack
* cblas_?gemm_compute
* cblas_?gemm_free
通过使用这些API,我们可以先完成对原数据的Packing操作,再把已转换为Packed格式的数据传递给那些复用同一数据的gemm_compute函数,从而避免了Packing冗余。
### Solution
在RNN的情况下,同一次前向、后向(forward/backward)过程中所有时间步(time step)共享同一个权重(weight)。当只做推断(inference)时,各次前向之间也都使用了相同的权重,没有必要在每次前向中每个时间步的计算时对权重进行重复的Packing操作。
我们通过使用新引入的GEMM Packed APIs,在层初始化的时候,先完成对权重的Packing操作,然后在前向,后向时复用已经转换过的权重,并在每次权重更新后,对新的权重进行转换用于下次迭代。
* 优化前,对于序列长度(sequence length)为`T`的网络模型(model), `N`次迭代执行的转换次数为:
- `inference``N * T`
- `training``2 * N * T`
* 优化后,对于同样设置的网络模型,其转换次数减少至:
- `inference``1`
- `training``2 * N`
## Actions
添加的相关文件和目录结构如下:
```txt
PaddlePaddle/Paddle
├── ...
└── paddle/
├── ...
└── gserver/
├── ...
├── layers/
│ ├── ...
│ ├── MKLPackedRecurrentLayer.*
| ├── MKLPackedGatedRecurrentLayer.*
| ├── MKLPackedLstmLayer.*
| └── MKLPackedGemm.h
└── tests/
├── ...
└── test_MKLPacked.cpp
```
### CMake
在对应的`CMakeLists.txt`中根据`WITH_MKL`是否打开,来决定是否开启MKL Packed相关功能。
### Layers
所有的`MKLPacked*Layer`都继承于PaddlePaddle的基类`Layer`, 并添加头文件 `MKLPackedGemm.h`,该文件对相关GEMM Packed APIs做了封装。
### Unit Tests
我们会添加`test_MKLPacked.cpp`用于MKL Packed优化后layer的测试。
对于每一个新加的RNN layer,我们会对比如下2个方面:
1. 对比优化后layer自身,sequence mode(`rnn_use_batch=false`)与batch mode(`rnn_use_batch=true`)的结果。
2. 对比优化后layer与相对应的PaddlePaddle原有layer, 在batch mode下的结果。
### Python API
TBD
### Benchmarking
会添加相应的脚本用于测试和对比在使用MKL Packed recurrent layers 前后的网络性能。
## References
1. [Introducing the new Packed APIs for GEMM](https://software.intel.com/en-us/articles/introducing-the-new-packed-apis-for-gemm)
2. [DeepSpeech2 on PaddlePaddle](https://github.com/PaddlePaddle/DeepSpeech#deepspeech2-on-paddlepaddle)
...@@ -208,4 +208,3 @@ if use_mkldnn ...@@ -208,4 +208,3 @@ if use_mkldnn
但是在PaddlePaddle中,无论是重构前的layer还是重构后的op,都不会想要知道next layer/op的信息。 但是在PaddlePaddle中,无论是重构前的layer还是重构后的op,都不会想要知道next layer/op的信息。
4. MKL-DNN的高性能格式与PaddlePaddle原有的`NCHW`不同(PaddlePaddle中的cuDNN部分使用的也是`NCHW`,所以不存在这个问题)。 4. MKL-DNN的高性能格式与PaddlePaddle原有的`NCHW`不同(PaddlePaddle中的cuDNN部分使用的也是`NCHW`,所以不存在这个问题)。
所以需要引入一个转换方法,并且只需要在必要的时候转换这种格式,才能更好的发挥MKL-DNN的性能。 所以需要引入一个转换方法,并且只需要在必要的时候转换这种格式,才能更好的发挥MKL-DNN的性能。
# Design Doc: NCCL support in Paddle Fluid
## Abstract
This Design Doc refers to the NCCL feature in paddle. We propose an approach to support NCCL library both on a single machine and multiple machines. We wrapper the NCCL primitives `Broadcast`, `Allreduce`, `Reduce` as operators to utilize Multi-GPU powers in one script.
## Motivation
[NCCL](https://developer.nvidia.com/nccl) is a NVIDIA library support Multi-GPU communicating and optimized for NVIDIA GPUs, it provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, that can achieve high bandwidth over PCIe and NVLink high-speed interconnect. With NCCL library, we can easily accelerate the training in parallel.
- Pros
1. easily plug-in with [NCCL2](https://developer.nvidia.com/nccl) library.
1. high performance in NVIDIA GPUs.
1. MPI like primitives, which have low learning cost for users.
- Cons
1. Only design for NVIDIA GPUs, not a general multi-device solution.
1. Although NCCL1 is opensourced under BSD license, but NCCL2 is not opensourced anymore.
At the beginning of training, the framework needs to distribute the same parameters to every GPU, and merge the gradients at any time user interests.
As a result, during training, we need the operations of peer to peer copy between different GPUs, aggregating gradients/parameters from GPUs, and broadcasting parameters to GPUs. Every GPU only need to run the operator with correct place information.
Besides, it needs interfaces to synchronize model update with each different GPU Cards.
## Implementation
As mentioned above, we wrap the NCCL routines as several kinds of operators. Need to note that NCCL need to create Communicator between gpu at the beginning, so there is a NCCLInit operator created.
### Transpiler
To be compatible with [parameter server design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/ops/dist_train.md), the transpiler compiles the user defined operation graph into sub-graphs to be executed on different devices.
1. The user-defined model will be a single device program
2. Broadcast/Reduce operators between GPUs will be inserted into the program, even for the multi-node, may insert the `Send`, `Recv` operator.
*Broadcast, AllReduce in a single machine. And Broadcast, AllReduce, [Send, Recv](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/ops/dist_train.md#graph-converter) in multiple machines*
<img src="images/multigpu_before_convert.png" width="300"/>
After compiling, the graph as shows
<img src="images/multigpu_allreduce.png" width="1000"/>
Operators are added to the sub-graphs. Every GPU assigned a role of `rank0`, `rank1` etc.
- **Broadcast**. Broadcast operator distribute initialized parameter to all the GPUs from the GPU who owns it. e.g. from`rank0` GPU.
- **AllReduce**. AllReduce operator synchronizes parameters/gradients between GPUs. AllReduce implemented in the Ring-Based communicating method, avoid of the bottle neck in a single GPU.
Need to notice that AllReduce operator force GPUs synchronized at that point. The whole training process in asynchronous or synchronous mode depends on the AllReduce point in the graph.
As it shown in the picture, when each GPU compute the gradient of `W`, followed with a `AllReduce` operator, accumulate the `dW` to full batch of data, then run the optimize process individually and apply the gradient to its `W`.
- **AllReduce**
Need to note that our AllReduce operator is a ring-base AllReduce implementation. If we use the NCCL2 AllReduce primitive, every GPU optimized full batch of data, wasted (n-1) GPU compute resources. In addition, NCCL2 built-in AllReduce will only utilize the communicating resource during synchronization, then update the gradient will be a subsequent phase. In fact, we can amortize the update gradient time cost into the communicating phase. The process is
1. Every parameter has its root card. That card will responsible for aggregating the gradients from GPUs.
2. The whole model's parameter will be hashed to different root card, ensure the load balance between GPUs.
3. Logically neighberhood card will start send parameter to the next one. After one round, the parameter main card will aggregate the full gradients.
4. Then the root card will optimize the parameter.
5. This parameter card will send its optimized result to its neighberhood, then the neighberhood will send parameter to its next one.
6. Finish the sychronization round.
The total time cost will be 2 * (n-1) * per-parameter-send-time, we reach the goal of amortize the upgrade time into communicating phase.
import paddle.v2 as paddle
import numpy as np
paddle.init(use_gpu=False)
x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(2))
y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear())
# loading the model which generated by training
with open('params_pass_90.tar', 'r') as f:
parameters = paddle.parameters.Parameters.from_tar(f)
# Input multiple sets of data,Output the infer result in a array.
i = [[[1, 2]], [[3, 4]], [[5, 6]]]
print paddle.infer(output_layer=y_predict, parameters=parameters, input=i)
# Will print:
# [[ -3.24491572]
# [ -6.94668722]
# [-10.64845848]]
...@@ -26,6 +26,11 @@ def event_handler(event): ...@@ -26,6 +26,11 @@ def event_handler(event):
if event.batch_id % 1 == 0: if event.batch_id % 1 == 0:
print "Pass %d, Batch %d, Cost %f" % (event.pass_id, event.batch_id, print "Pass %d, Batch %d, Cost %f" % (event.pass_id, event.batch_id,
event.cost) event.cost)
# product model every 10 pass
if isinstance(event, paddle.event.EndPass):
if event.pass_id % 10 == 0:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
trainer.save_parameter_to_tar(f)
# define training dataset reader # define training dataset reader
......
...@@ -147,4 +147,9 @@ PaddlePaddle支持不同类型的输入数据,主要包括四种类型,和 ...@@ -147,4 +147,9 @@ PaddlePaddle支持不同类型的输入数据,主要包括四种类型,和
.. literalinclude:: src/train.py .. literalinclude:: src/train.py
:linenos: :linenos:
使用以上训练好的模型进行预测,取其中一个模型params_pass_90.tar,输入需要预测的向量组,然后打印输出:
.. literalinclude:: src/infer.py
:linenos:
有关线性回归的实际应用,可以参考PaddlePaddle book的 `第一章节 <http://book.paddlepaddle.org/index.html>`_。 有关线性回归的实际应用,可以参考PaddlePaddle book的 `第一章节 <http://book.paddlepaddle.org/index.html>`_。
...@@ -6,10 +6,10 @@ Core: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework ...@@ -6,10 +6,10 @@ Core: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework
Operator: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators Operator: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators
Optimizer: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/optimizer
Memory: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory Memory: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory
Platform: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/platform
# Compile Time # Compile Time
The following **defines** the NN. The definition goes into this [protocol buffer](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto). The following **defines** the NN. The definition goes into this [protocol buffer](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto).
......
...@@ -18,11 +18,11 @@ PaddlePaddle为交叉编译提供了工具链配置文档[cmake/cross_compiling/ ...@@ -18,11 +18,11 @@ PaddlePaddle为交叉编译提供了工具链配置文档[cmake/cross_compiling/
- `CMAKE_SYSTEM_NAME`,CMake编译的目标平台,必须设置为`iOS`。在设置`CMAKE_SYSTEM_NAME=iOS`后,PaddlePaddle的CMake系统会自动编译所有的第三方依赖库,并且强制设置一些PaddlePaddle参数的值(`WITH_C_API=ON``WITH_GPU=OFF``WITH_AVX=OFF``WITH_PYTHON=OFF``WITH_RDMA=OFF`)。 - `CMAKE_SYSTEM_NAME`,CMake编译的目标平台,必须设置为`iOS`。在设置`CMAKE_SYSTEM_NAME=iOS`后,PaddlePaddle的CMake系统会自动编译所有的第三方依赖库,并且强制设置一些PaddlePaddle参数的值(`WITH_C_API=ON``WITH_GPU=OFF``WITH_AVX=OFF``WITH_PYTHON=OFF``WITH_RDMA=OFF`)。
- `WITH_C_API`,是否编译C-API预测库,必须设置为ON。在iOS平台上只支持使用C-API来预测。 - `WITH_C_API`,是否编译C-API预测库,必须设置为ON。在iOS平台上只支持使用C-API来预测。
- `WITH_SWIG_PY`,必须设置为ON。在iOS平台上不支持通过swig调用来训练或者预测。 - `WITH_SWIG_PY`,必须设置为`OFF`。在iOS平台上不支持通过swig调用来训练或者预测。
iOS平台可选配置参数: iOS平台可选配置参数:
- `IOS_PLATFORM`,可设置为`OS/SIMULATOR`,默认值为`OS` - `IOS_PLATFORM`,可设置为`OS`(默认值)或`SIMULATOR`
- `OS`,构建目标为`arm`架构的iPhone或者iPad等物理设备。 - `OS`,构建目标为`arm`架构的iPhone或者iPad等物理设备。
- `SIMULATOR`,构建目标为`x86`架构的模拟器平台。 - `SIMULATOR`,构建目标为`x86`架构的模拟器平台。
- `IOS_ARCH`,目标架构。针对不同的`IOS_PLATFORM`,可设置的目标架构如下表所示,默认编译所有架构: - `IOS_ARCH`,目标架构。针对不同的`IOS_PLATFORM`,可设置的目标架构如下表所示,默认编译所有架构:
......
# PaddlePaddle Compiling Guide for iOS
This tutorial will walk you through cross compiling the PaddlePaddle library for iOS from the source in MacOS.
## Preparation
Apple provides Xcode for cross-compiling and IDE for iOS development. Download from App store or [here](https://developer.apple.com/cn/xcode/). To verify your installation, run command as follows
```bash
$ xcodebuild -version
Xcode 9.0
Build version 9A235
```
## Cross-compiling configurations
PaddlePaddle provides cross-compiling toolchain configuration documentation [cmake/cross_compiling/ios.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/ios.cmake), which has some default settings for frequently used compilers.
There are some mandatory environment variables need to be set before cross compiling PaddlePaddle for iOS:
- `CMAKE_SYSTEM_NAME`, CMake compiling target platform name, has to be `iOS`. PaddlePaddle CMake will compile all the third party dependencies and enforce some parameters (`WITH_C_API=ON`, `WITH_GPU=OFF`, `WITH_AVX=OFF`, `WITH_PYTHON=OFF`,`WITH_RDMA=OFF`) when this variable is set with value `iOS`.
- `WITH_C_API`, Whether to compile inference C-API library, has to be `ON`, since C-API is the only supported interface for inferencing in iOS.
- `WITH_SWIG_PY`, has to be `OFF`. It's not supported to inference or train via swig in iOS.
Optional environment variables for iOS are:
- `IOS_PLATFORM`, either `OS` (default) or `SIMULATOR`.
- `OS`, build targets ARM-based physical devices like iPhone or iPad.
- `SIMULATOR`, build targets x86 architecture simulators.
- `IOS_ARCH`, target architecture. By default, all architecture types will be compiled. If you need to specify the architecture to compile for, please find valid values for different `IOS_PLATFORM` settings from the table below:
<table class="docutils">
<colgroup>
<col width="35%" />
<col width="65%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd">
<th class="head">IOS_PLATFORM</th>
<th class="head">IOS_ARCH</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even">
<td>OS</td>
<td>armv7, armv7s, arm64 </td>
</tr>
<tr class="row-odd">
<td>SIMULATOR</td>
<td>i386, x86_64 </td>
</tr>
</tbody>
</table>
- `IOS_DEPLOYMENT_TARGET`, minimum iOS version to deployment, `7.0` by default.
- `IOS_ENABLE_BITCODE`, whether to enable [Bitcode](https://developer.apple.com/library/content/documentation/IDEs/Conceptual/AppDistributionGuide/AppThinning/AppThinning.html#//apple_ref/doc/uid/TP40012582-CH35-SW3), values can be `ON/OFF`, `ON` by default.
- `IOS_USE_VECLIB_FOR_BLAS`, whether to use [vecLib](https://developer.apple.com/documentation/accelerate/veclib) framework for BLAS computing. values can be `ON/OFF`, `OFF` by default.
- `IOS_DEVELOPMENT_ROOT`, the path to `Developer` directory, can be explicitly set with your `/path/to/platform/Developer`. If left blank, PaddlePaddle will automatically pick the Xcode corresponding `platform`'s `Developer` directory based on your `IOS_PLATFORM` value.
- `IOS_SDK_ROOT`, the path to `SDK` root, can be explicitly set with your `/path/to/platform/Developer/SDKs/SDK`. if left black, PaddlePaddle will pick the latest SDK in the directory of `IOS_DEVELOPMENT_ROOT`.
other settings:
- `USE_EIGEN_FOR_BLAS`, whether to use Eigen for matrix computing. effective when `IOS_USE_VECLIB_FOR_BLAS=OFF`. Values can be `ON/OFF`, `OFF` by default.
- `HOST_C/CXX_COMPILER`, host C/C++ compiler. Uses value from environment variable `CC/CXX` by default or `cc/c++` if `CC/CXX` doesn't exist.
some typical cmake configurations:
```bash
cmake -DCMAKE_SYSTEM_NAME=iOS \
-DIOS_PLATFORM=OS \
-DIOS_ARCH="armv7;arm64" \
-DIOS_ENABLE_BITCODE=ON \
-DIOS_USE_VECLIB_FOR_BLAS=ON \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \
-DWITH_TESTING=OFF \
-DWITH_SWIG_PY=OFF \
..
```
```bash
cmake -DCMAKE_SYSTEM_NAME=iOS \
-DIOS_PLATFORM=SIMULATOR \
-DIOS_ARCH="x86_64" \
-DIOS_USE_VECLIB_FOR_BLAS=ON \
-DCMAKE_INSTALL_PREFIX=your/path/to/install \
-DWITH_C_API=ON \
-DWITH_TESTING=OFF \
-DWITH_SWIG_PY=OFF \
..
```
You can set other compiling parameters for your own need. I.E. if you are trying to minimize the library size, set `CMAKE_BUILD_TYPE` with `MinSizeRel`; or if the performance is your concern, set `CMAKE_BUILD_TYPE` with `Release`. You can even manipulate the PaddlePaddle compiling procedure by manually set `CMAKE_C/CXX_FLAGS` values.
**TIPS for a better performance**:
- set `CMAKE_BUILD_TYPE` with `Release`
- set `IOS_USE_VECLIB_FOR_BLAS` with `ON`
## Compile and install
After CMake, run following commands, PaddlePaddle will download the compile 3rd party dependencies, compile and install PaddlePaddle inference library.
```
$ make
$ make install
```
Please Note: if you compiled PaddlePaddle in the source directory for other platforms, do remove `third_party` and `build` directory within the source with `rm -rf` to ensure that all the 3rd party libraries dependencies and PaddlePaddle is newly compiled with current CMake configuration.
`your/path/to/install` directory will have following directories after `compile` and `install`:
- `include`, contains all the C-API header files.
- `lib`, contains PaddlePaddle C-API static library.
- `third_party` contains all the 3rd party libraries.
Please note: if PaddlePaddle library need to support both physical devices and simulators, you will need to compile correspondingly, then merge fat library with `lipo`.
Now you will have PaddlePaddle library compiled and installed, the fat library can be used in deep learning related iOS APPs. Please refer to C-API documentation for usage guides.
...@@ -5,4 +5,5 @@ MOBILE ...@@ -5,4 +5,5 @@ MOBILE
:maxdepth: 1 :maxdepth: 1
cross_compiling_for_android_en.md cross_compiling_for_android_en.md
cross_compiling_for_ios_en.md
cross_compiling_for_raspberry_en.md cross_compiling_for_raspberry_en.md
...@@ -14,7 +14,7 @@ limitations under the License. */ ...@@ -14,7 +14,7 @@ limitations under the License. */
#include "error.h" #include "error.h"
const char* paddle_error_string(paddle_error err) { extern "C" const char* paddle_error_string(paddle_error err) {
switch (err) { switch (err) {
case kPD_NULLPTR: case kPD_NULLPTR:
return "nullptr error"; return "nullptr error";
......
...@@ -29,9 +29,17 @@ typedef enum { ...@@ -29,9 +29,17 @@ typedef enum {
kPD_UNDEFINED_ERROR = -1, kPD_UNDEFINED_ERROR = -1,
} paddle_error; } paddle_error;
#ifdef __cplusplus
extern "C" {
#endif
/** /**
* Error string for Paddle API. * Error string for Paddle API.
*/ */
PD_API const char* paddle_error_string(paddle_error err); PD_API const char* paddle_error_string(paddle_error err);
#ifdef __cplusplus
}
#endif
#endif #endif
...@@ -58,3 +58,6 @@ cc_test(var_type_inference_test SRCS var_type_inference_test.cc DEPS op_registry ...@@ -58,3 +58,6 @@ cc_test(var_type_inference_test SRCS var_type_inference_test.cc DEPS op_registry
proto_desc) proto_desc)
cc_library(selected_rows SRCS selected_rows.cc DEPS tensor) cc_library(selected_rows SRCS selected_rows.cc DEPS tensor)
cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows) cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows)
cc_library(init SRCS init.cc DEPS gflags executor place stringpiece)
cc_test(init_test SRCS init_test.cc DEPS init)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <sstream> #include <sstream>
#include <vector> #include <vector>
......
...@@ -33,32 +33,12 @@ namespace framework { ...@@ -33,32 +33,12 @@ namespace framework {
const std::string kFeedOpType = "feed"; const std::string kFeedOpType = "feed";
const std::string kFetchOpType = "fetch"; const std::string kFetchOpType = "fetch";
Executor::Executor(const std::vector<platform::Place>& places) : own_(true) { DeviceContextPool* DeviceContextPool::pool = nullptr;
PADDLE_ENFORCE_GT(places.size(), 0);
device_contexts_.resize(places.size());
for (size_t i = 0; i < places.size(); i++) {
if (platform::is_cpu_place(places[i])) {
device_contexts_[i] = new platform::CPUDeviceContext(
boost::get<platform::CPUPlace>(places[i]));
} else if (platform::is_gpu_place(places[i])) {
#ifdef PADDLE_WITH_CUDA
device_contexts_[i] = new platform::CUDADeviceContext(
boost::get<platform::GPUPlace>(places[i]));
#else
PADDLE_THROW(
"'GPUPlace' is not supported, Please re-compile with WITH_GPU "
"option");
#endif
}
}
}
Executor::~Executor() { Executor::Executor(const std::vector<platform::Place>& places) {
if (own_) { DeviceContextPool& pool = DeviceContextPool::Get();
for (auto& device_context : device_contexts_) { auto borrowed_contexts = pool.Borrow(places);
delete device_context; device_contexts_.swap(borrowed_contexts);
}
}
} }
static void CreateTensor(Variable* var, VarDesc::VarType var_type) { static void CreateTensor(Variable* var, VarDesc::VarType var_type) {
...@@ -132,8 +112,5 @@ void Executor::Run(const ProgramDescBind& pdesc, Scope* scope, int block_id, ...@@ -132,8 +112,5 @@ void Executor::Run(const ProgramDescBind& pdesc, Scope* scope, int block_id,
} }
} }
Executor::Executor(const platform::DeviceContext& device)
: device_contexts_({&device}), own_(false) {}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -14,19 +14,98 @@ limitations under the License. */ ...@@ -14,19 +14,98 @@ limitations under the License. */
#pragma once #pragma once
#include <map>
#include <unordered_map>
#include "paddle/framework/op_info.h" #include "paddle/framework/op_info.h"
#include "paddle/framework/program_desc.h" #include "paddle/framework/program_desc.h"
#include "paddle/framework/scope.h" #include "paddle/framework/scope.h"
#include "paddle/framework/tensor.h" #include "paddle/framework/tensor.h"
#include "paddle/platform/device_context.h"
namespace paddle { namespace paddle {
namespace framework { namespace framework {
class DeviceContextPool {
public:
static DeviceContextPool& Get() {
PADDLE_ENFORCE_NOT_NULL(pool, "Need to Create DeviceContextPool first!");
return *pool;
}
static DeviceContextPool& Create(const std::vector<platform::Place>& places) {
if (pool == nullptr) {
pool = new DeviceContextPool(places);
}
return *pool;
}
std::vector<const platform::DeviceContext*> Borrow(
const std::vector<platform::Place>& places) {
PADDLE_ENFORCE_GT(places.size(), 0);
PADDLE_ENFORCE_LE(places.size(), device_contexts_.size());
std::vector<const platform::DeviceContext*> borrowed_contexts;
for (auto& place : places) {
auto range = device_contexts_.equal_range(place);
if (range.first == range.second) {
PADDLE_THROW(
"'Place' is not supported, Please re-compile with WITH_GPU "
"option");
}
// TODO(dzhwinter) : assign the first found device. Will enhanced later.
// device load balancer maybe useful here.
borrowed_contexts.emplace_back(range.first->second);
}
return borrowed_contexts;
}
explicit DeviceContextPool(const std::vector<platform::Place>& places) {
PADDLE_ENFORCE_GT(places.size(), 0);
for (size_t i = 0; i < places.size(); i++) {
if (platform::is_cpu_place(places[i])) {
device_contexts_.emplace(
places[i], new platform::CPUDeviceContext(
boost::get<platform::CPUPlace>(places[i])));
} else if (platform::is_gpu_place(places[i])) {
#ifdef PADDLE_WITH_CUDA
device_contexts_.emplace(
places[i], new platform::CUDADeviceContext(
boost::get<platform::GPUPlace>(places[i])));
#else
PADDLE_THROW(
"'GPUPlace' is not supported, Please re-compile with WITH_GPU "
"option");
#endif
}
}
}
~DeviceContextPool() {}
private:
static DeviceContextPool* pool;
struct Hash {
std::hash<int> hash_;
size_t operator()(const platform::Place& place) const {
return hash_(place.which());
}
};
std::unordered_multimap<const platform::Place, const platform::DeviceContext*,
Hash>
device_contexts_;
DISABLE_COPY_AND_ASSIGN(DeviceContextPool);
};
class Executor { class Executor {
public: public:
// TODO(dzhwinter) : Do not rely on this function, it will be removed
explicit Executor(const platform::DeviceContext& device)
: Executor(std::vector<platform::Place>({device.GetPlace()})) {}
explicit Executor(const platform::Place& place)
: Executor(std::vector<platform::Place>({place})) {}
explicit Executor(const std::vector<platform::Place>& places); explicit Executor(const std::vector<platform::Place>& places);
explicit Executor(const platform::DeviceContext& devices);
~Executor();
/* @Brief /* @Brief
* Runtime evaluation of the given ProgramDesc under certain Scope * Runtime evaluation of the given ProgramDesc under certain Scope
...@@ -39,7 +118,6 @@ class Executor { ...@@ -39,7 +118,6 @@ class Executor {
private: private:
std::vector<const platform::DeviceContext*> device_contexts_; std::vector<const platform::DeviceContext*> device_contexts_;
bool own_;
}; };
} // namespace framework } // namespace framework
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <algorithm>
#include <string>
#include "paddle/framework/executor.h"
#include "paddle/framework/init.h"
#include "paddle/platform/place.h"
#include "paddle/string/piece.h"
namespace paddle {
namespace framework {
std::once_flag gflags_init_flag;
// TODO(qijun) move init gflags to init.cc
void InitGflags(std::vector<std::string> &argv) {
std::call_once(gflags_init_flag, [&]() {
int argc = argv.size();
char **arr = new char *[argv.size()];
std::string line;
for (size_t i = 0; i < argv.size(); i++) {
arr[i] = &argv[i][0];
line += argv[i];
line += ' ';
}
google::ParseCommandLineFlags(&argc, &arr, true);
VLOG(1) << "Init commandline: " << line;
});
}
bool InitDevices(const std::vector<std::string> &devices) {
// device format
// CPU
// GPU:1
// TODO(dzhwinter) : add device format annotation for users.
std::vector<platform::Place> places;
for (auto &device : devices) {
auto p = string::Piece(device);
if (string::Find(p, ':', 0) == string::Piece::npos) {
places.emplace_back(platform::CPUPlace());
} else if (string::HasPrefix(p, "GPU")) {
#ifdef PADDLE_WITH_CUDA
auto pos = string::RFind(p, ':', string::Piece::npos);
auto number = device.substr(pos + 1);
places.emplace_back(platform::GPUPlace(std::stoi(number)));
#else
LOG(WARNING)
<< "'GPU' is not supported, Please re-compile with WITH_GPU option";
#endif
} else {
return false;
}
}
if (std::find_if(places.begin(), places.end(),
[&](const platform::Place &place) {
return platform::is_cpu_place(place);
}) == places.end()) {
places.emplace_back(platform::CPUPlace());
LOG(WARNING) << "Not specified any device, use CPU by Default.";
}
DeviceContextPool::Create(places);
return true;
return true;
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <mutex>
#include "gflags/gflags.h"
#include "glog/logging.h"
namespace paddle {
namespace framework {
void InitGflags(std::vector<std::string> &argv);
bool InitDevices(const std::vector<std::string> &devices);
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "gtest/gtest.h"
#include "paddle/framework/init.h"
TEST(Init, InitDevices) {
using paddle::framework::InitDevices;
std::vector<std::string> ds1 = {"CPU"};
ASSERT_EQ(InitDevices(ds1), true);
#ifdef PADDLE_WITH_CUDA
std::vector<std::string> ds2 = {"CPU", "GPU:0", "GPU:1"};
ASSERT_EQ(InitDevices(ds2), true);
#endif
}
...@@ -126,6 +126,11 @@ public: ...@@ -126,6 +126,11 @@ public:
inputData += inputChannels * inputHeight * inputWidth; inputData += inputChannels * inputHeight * inputWidth;
outputData += outputChannels * outputHeight * outputWidth; outputData += outputChannels * outputHeight * outputWidth;
} }
#ifdef PADDLE_MOBILE_INFERENCE
if (Device == DEVICE_TYPE_CPU) {
memory_.reset();
}
#endif
} }
}; };
......
...@@ -84,12 +84,15 @@ void ROIPoolLayer::forward(PassType passType) { ...@@ -84,12 +84,15 @@ void ROIPoolLayer::forward(PassType passType) {
size_t poolChannelOffset = pooledHeight_ * pooledWidth_; size_t poolChannelOffset = pooledHeight_ * pooledWidth_;
real* outputData = outputValue->getData(); real* outputData = outputValue->getData();
real* argmaxData = nullptr;
if (passType != PASS_TEST) {
Matrix::resizeOrCreate(maxIdxs_, Matrix::resizeOrCreate(maxIdxs_,
numROIs, numROIs,
channels_ * pooledHeight_ * pooledWidth_, channels_ * pooledHeight_ * pooledWidth_,
false, false,
false); false);
real* argmaxData = maxIdxs_->getData(); argmaxData = maxIdxs_->getData();
}
for (size_t n = 0; n < numROIs; ++n) { for (size_t n = 0; n < numROIs; ++n) {
// the first five elememts of each RoI should be: // the first five elememts of each RoI should be:
...@@ -128,23 +131,29 @@ void ROIPoolLayer::forward(PassType passType) { ...@@ -128,23 +131,29 @@ void ROIPoolLayer::forward(PassType passType) {
bool isEmpty = (hend <= hstart) || (wend <= wstart); bool isEmpty = (hend <= hstart) || (wend <= wstart);
size_t poolIndex = ph * pooledWidth_ + pw; size_t poolIndex = ph * pooledWidth_ + pw;
outputData[poolIndex] = isEmpty ? 0 : -FLT_MAX; outputData[poolIndex] = isEmpty ? 0 : -FLT_MAX;
if (argmaxData) {
argmaxData[poolIndex] = -1; argmaxData[poolIndex] = -1;
}
for (size_t h = hstart; h < hend; ++h) { for (size_t h = hstart; h < hend; ++h) {
for (size_t w = wstart; w < wend; ++w) { for (size_t w = wstart; w < wend; ++w) {
size_t index = h * width_ + w; size_t index = h * width_ + w;
if (batchData[index] > outputData[poolIndex]) { if (batchData[index] > outputData[poolIndex]) {
outputData[poolIndex] = batchData[index]; outputData[poolIndex] = batchData[index];
if (argmaxData) {
argmaxData[poolIndex] = index; argmaxData[poolIndex] = index;
} }
} }
} }
} }
} }
}
batchData += channelOffset; batchData += channelOffset;
outputData += poolChannelOffset; outputData += poolChannelOffset;
if (argmaxData) {
argmaxData += poolChannelOffset; argmaxData += poolChannelOffset;
} }
}
bottomROIs += roiOffset; bottomROIs += roiOffset;
} }
if (useGpu_) { if (useGpu_) {
......
...@@ -171,12 +171,31 @@ void SequenceToBatch::sequence2BatchCopy(Matrix &batch, ...@@ -171,12 +171,31 @@ void SequenceToBatch::sequence2BatchCopy(Matrix &batch,
hl_sequence2batch_copy( hl_sequence2batch_copy(
batchData, seqData, idxData, seqWidth, batchCount, seq2batch); batchData, seqData, idxData, seqWidth, batchCount, seq2batch);
} else { } else {
for (int i = 0; i < batchCount; ++i) {
if (seq2batch) { if (seq2batch) {
#ifdef PADDLE_USE_MKLML
const int blockMemSize = 8 * 1024;
const int blockSize = blockMemSize / sizeof(real);
#pragma omp parallel for collapse(2)
for (int i = 0; i < batchCount; ++i) {
for (int j = 0; j < seqWidth; j += blockSize) {
memcpy(batch.rowBuf(i) + j,
sequence.rowBuf(idxData[i]) + j,
(j + blockSize > seqWidth) ? (seqWidth - j) * sizeof(real)
: blockMemSize);
}
}
#else
for (int i = 0; i < batchCount; ++i) {
memcpy(batch.rowBuf(i), memcpy(batch.rowBuf(i),
sequence.rowBuf(idxData[i]), sequence.rowBuf(idxData[i]),
seqWidth * sizeof(real)); seqWidth * sizeof(real));
}
#endif
} else { } else {
#ifdef PADDLE_USE_MKLML
#pragma omp parallel for
#endif
for (int i = 0; i < batchCount; ++i) {
memcpy(sequence.rowBuf(idxData[i]), memcpy(sequence.rowBuf(idxData[i]),
batch.rowBuf(i), batch.rowBuf(i),
seqWidth * sizeof(real)); seqWidth * sizeof(real));
......
...@@ -79,7 +79,7 @@ public: ...@@ -79,7 +79,7 @@ public:
#ifdef PADDLE_CUDA_FP16 #ifdef PADDLE_CUDA_FP16
HOSTDEVICE inline explicit float16(const half& h) { HOSTDEVICE inline explicit float16(const half& h) {
#if CUDA_VERSION >= 9000 #if CUDA_VERSION >= 9000
x = reinterpret_cast<__half_raw*>(&h)->x; x = reinterpret_cast<__half_raw*>(const_cast<half*>(&h))->x;
#else #else
x = h.x; x = h.x;
#endif // CUDA_VERSION >= 9000 #endif // CUDA_VERSION >= 9000
...@@ -145,7 +145,7 @@ public: ...@@ -145,7 +145,7 @@ public:
#ifdef PADDLE_CUDA_FP16 #ifdef PADDLE_CUDA_FP16
HOSTDEVICE inline float16& operator=(const half& rhs) { HOSTDEVICE inline float16& operator=(const half& rhs) {
#if CUDA_VERSION >= 9000 #if CUDA_VERSION >= 9000
x = reinterpret_cast<__half_raw*>(&rhs)->x; x = reinterpret_cast<__half_raw*>(const_cast<half*>(&rhs))->x;
#else #else
x = rhs.x; x = rhs.x;
#endif #endif
......
...@@ -19,6 +19,7 @@ limitations under the License. */ ...@@ -19,6 +19,7 @@ limitations under the License. */
#include <stdlib.h> // for malloc and free #include <stdlib.h> // for malloc and free
#include <sys/mman.h> // for mlock and munlock #include <sys/mman.h> // for mlock and munlock
#include <algorithm> // for std::max
#include "gflags/gflags.h" #include "gflags/gflags.h"
...@@ -28,7 +29,7 @@ limitations under the License. */ ...@@ -28,7 +29,7 @@ limitations under the License. */
// of memory available to the system for paging. So, by default, we // of memory available to the system for paging. So, by default, we
// should set false to use_pinned_memory. // should set false to use_pinned_memory.
DEFINE_bool(use_pinned_memory, true, "If set, allocate cpu pinned memory."); DEFINE_bool(use_pinned_memory, true, "If set, allocate cpu pinned memory.");
DECLARE_double(fraction_of_gpu_memory_to_use);
namespace paddle { namespace paddle {
namespace memory { namespace memory {
namespace detail { namespace detail {
...@@ -77,45 +78,20 @@ void* GPUAllocator::Alloc(size_t& index, size_t size) { ...@@ -77,45 +78,20 @@ void* GPUAllocator::Alloc(size_t& index, size_t size) {
// CUDA documentation doesn't explain if cudaMalloc returns nullptr // CUDA documentation doesn't explain if cudaMalloc returns nullptr
// if size is 0. We just make sure it does. // if size is 0. We just make sure it does.
if (size <= 0) return nullptr; if (size <= 0) return nullptr;
void* p;
size_t available = 0;
size_t capacity = 0;
paddle::platform::GpuMemoryUsage(available, capacity);
// Reserve memory for page tables, etc.
size_t reserving = 0.05 * capacity + paddle::platform::GpuMinChunkSize();
size_t usable = available > reserving ? available - reserving : 0;
// If remaining size no less than expected size, using general
// cudaMalloc to allocate GPU memory.
void* p = 0;
if (size <= usable) {
cudaError_t result = cudaMalloc(&p, size); cudaError_t result = cudaMalloc(&p, size);
if (result == cudaSuccess) { if (result == cudaSuccess) {
index = 0; index = 0;
gpu_alloc_size_ += size; gpu_alloc_size_ += size;
return p; return p;
} } else {
} LOG(WARNING)
<< "Cannot malloc " << size / 1024.0 / 1024.0
// If remaining size less than expected size or cudaMalloc failed, << " MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use "
// cudaMallocHost will be considered as a fallback allocator. "environment variable to a lower value. Current value is "
// << FLAGS_fraction_of_gpu_memory_to_use;
// NOTE: here, we use GpuMaxAllocSize() as the maximum memory size
// of host fallback allocation. Allocates too much would reduce
// the amount of memory available to the underlying system for paging.
usable = paddle::platform::GpuMaxAllocSize() - fallback_alloc_size_;
if (size > usable) return nullptr;
cudaError_t result = cudaMallocHost(&p, size);
if (result == cudaSuccess) {
index = 1;
fallback_alloc_size_ += size;
return p;
}
return nullptr; return nullptr;
}
} }
void GPUAllocator::Free(void* p, size_t size, size_t index) { void GPUAllocator::Free(void* p, size_t size, size_t index) {
......
...@@ -32,6 +32,13 @@ class ChunkEvalOp : public framework::OperatorWithKernel { ...@@ -32,6 +32,13 @@ class ChunkEvalOp : public framework::OperatorWithKernel {
"Output(Recall) of ChunkEvalOp should not be null."); "Output(Recall) of ChunkEvalOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("F1-Score"), PADDLE_ENFORCE(ctx->HasOutput("F1-Score"),
"Output(F1-Score) of ChunkEvalOp should not be null."); "Output(F1-Score) of ChunkEvalOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("NumInferChunks"),
"Output(NumInferChunks) of ChunkEvalOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("NumLabelChunks"),
"Output(NumLabelChunks) of ChunkEvalOp should not be null.");
PADDLE_ENFORCE(
ctx->HasOutput("NumCorrectChunks"),
"Output(NumCorrectChunks) of ChunkEvalOp should not be null.");
auto inference_dim = ctx->GetInputDim("Inference"); auto inference_dim = ctx->GetInputDim("Inference");
auto label_dim = ctx->GetInputDim("Label"); auto label_dim = ctx->GetInputDim("Label");
...@@ -42,6 +49,9 @@ class ChunkEvalOp : public framework::OperatorWithKernel { ...@@ -42,6 +49,9 @@ class ChunkEvalOp : public framework::OperatorWithKernel {
ctx->SetOutputDim("Precision", {1}); ctx->SetOutputDim("Precision", {1});
ctx->SetOutputDim("Recall", {1}); ctx->SetOutputDim("Recall", {1});
ctx->SetOutputDim("F1-Score", {1}); ctx->SetOutputDim("F1-Score", {1});
ctx->SetOutputDim("NumInferChunks", {1});
ctx->SetOutputDim("NumLabelChunks", {1});
ctx->SetOutputDim("NumCorrectChunks", {1});
} }
protected: protected:
...@@ -70,6 +80,16 @@ class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -70,6 +80,16 @@ class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker {
"sensitivity) of chunks on the given mini-batch."); "sensitivity) of chunks on the given mini-batch.");
AddOutput("F1-Score", AddOutput("F1-Score",
"(float). The evaluated F1-Score on the given mini-batch."); "(float). The evaluated F1-Score on the given mini-batch.");
AddOutput("NumInferChunks",
"(int64_t). The number of chunks in Inference on the given "
"mini-batch.");
AddOutput(
"NumLabelChunks",
"(int64_t). The number of chunks in Label on the given mini-batch.");
AddOutput(
"NumCorrectChunks",
"(int64_t). The number of chunks both in Inference and Label on the "
"given mini-batch.");
AddAttr<int>("num_chunk_types", AddAttr<int>("num_chunk_types",
"(int). The number of chunk type. See below for details."); "(int). The number of chunk type. See below for details.");
AddAttr<std::string>( AddAttr<std::string>(
......
...@@ -111,9 +111,7 @@ class ChunkEvalKernel : public framework::OpKernel<T> { ...@@ -111,9 +111,7 @@ class ChunkEvalKernel : public framework::OpKernel<T> {
std::vector<Segment> label_segments; std::vector<Segment> label_segments;
std::vector<Segment> output_segments; std::vector<Segment> output_segments;
std::set<int> excluded_chunk_types; std::set<int> excluded_chunk_types;
int64_t num_output_segments = 0;
int64_t num_label_segments = 0;
int64_t num_correct = 0;
if (context.Attr<std::string>("chunk_scheme") == "IOB") { if (context.Attr<std::string>("chunk_scheme") == "IOB") {
num_tag_types = 2; num_tag_types = 2;
tag_begin = 0; tag_begin = 0;
...@@ -151,12 +149,24 @@ class ChunkEvalKernel : public framework::OpKernel<T> { ...@@ -151,12 +149,24 @@ class ChunkEvalKernel : public framework::OpKernel<T> {
auto* precision = context.Output<Tensor>("Precision"); auto* precision = context.Output<Tensor>("Precision");
auto* recall = context.Output<Tensor>("Recall"); auto* recall = context.Output<Tensor>("Recall");
auto* f1 = context.Output<Tensor>("F1-Score"); auto* f1 = context.Output<Tensor>("F1-Score");
auto* num_infer_chunks = context.Output<Tensor>("NumInferChunks");
auto* num_label_chunks = context.Output<Tensor>("NumLabelChunks");
auto* num_correct_chunks = context.Output<Tensor>("NumCorrectChunks");
const int64_t* inference_data = inference->data<int64_t>(); const int64_t* inference_data = inference->data<int64_t>();
const int64_t* label_data = label->data<int64_t>(); const int64_t* label_data = label->data<int64_t>();
T* precision_data = precision->mutable_data<T>(context.GetPlace()); T* precision_data = precision->mutable_data<T>(context.GetPlace());
T* racall_data = recall->mutable_data<T>(context.GetPlace()); T* racall_data = recall->mutable_data<T>(context.GetPlace());
T* f1_data = f1->mutable_data<T>(context.GetPlace()); T* f1_data = f1->mutable_data<T>(context.GetPlace());
int64_t* num_infer_chunks_data =
num_infer_chunks->mutable_data<int64_t>(context.GetPlace());
int64_t* num_label_chunks_data =
num_label_chunks->mutable_data<int64_t>(context.GetPlace());
int64_t* num_correct_chunks_data =
num_correct_chunks->mutable_data<int64_t>(context.GetPlace());
*num_infer_chunks_data = 0;
*num_label_chunks_data = 0;
*num_correct_chunks_data = 0;
auto lod = label->lod(); auto lod = label->lod();
PADDLE_ENFORCE_EQ(lod.size(), 1UL, "Only support one level sequence now."); PADDLE_ENFORCE_EQ(lod.size(), 1UL, "Only support one level sequence now.");
...@@ -166,16 +176,22 @@ class ChunkEvalKernel : public framework::OpKernel<T> { ...@@ -166,16 +176,22 @@ class ChunkEvalKernel : public framework::OpKernel<T> {
for (int i = 0; i < num_sequences; ++i) { for (int i = 0; i < num_sequences; ++i) {
int seq_length = lod[0][i + 1] - lod[0][i]; int seq_length = lod[0][i + 1] - lod[0][i];
EvalOneSeq(inference_data + lod[0][i], label_data + lod[0][i], seq_length, EvalOneSeq(inference_data + lod[0][i], label_data + lod[0][i], seq_length,
output_segments, label_segments, num_output_segments, output_segments, label_segments, *num_infer_chunks_data,
num_label_segments, num_correct, num_chunk_types, *num_label_chunks_data, *num_correct_chunks_data,
num_tag_types, other_chunk_type, tag_begin, tag_inside, num_chunk_types, num_tag_types, other_chunk_type, tag_begin,
tag_end, tag_single, excluded_chunk_types); tag_inside, tag_end, tag_single, excluded_chunk_types);
} }
*precision_data = !num_output_segments ? 0 : static_cast<T>(num_correct) / *precision_data = !(*num_infer_chunks_data)
num_output_segments; ? 0
*racall_data = !num_label_segments ? 0 : static_cast<T>(num_correct) / : static_cast<T>(*num_correct_chunks_data) /
num_label_segments; (*num_infer_chunks_data);
*f1_data = !num_correct ? 0 : 2 * (*precision_data) * (*racall_data) / *racall_data = !(*num_label_chunks_data)
? 0
: static_cast<T>(*num_correct_chunks_data) /
(*num_label_chunks_data);
*f1_data = !(*num_correct_chunks_data)
? 0
: 2 * (*precision_data) * (*racall_data) /
((*precision_data) + (*racall_data)); ((*precision_data) + (*racall_data));
} }
......
...@@ -277,6 +277,14 @@ void set_constant_with_place<platform::CPUPlace>( ...@@ -277,6 +277,14 @@ void set_constant_with_place<platform::CPUPlace>(
TensorSetConstantCPU(tensor, value)); TensorSetConstantCPU(tensor, value));
} }
template <>
void set_constant_with_place<platform::MKLDNNPlace>(
const platform::DeviceContext& context, framework::Tensor* tensor,
float value) {
framework::VisitDataType(framework::ToDataType(tensor->type()),
TensorSetConstantCPU(tensor, value));
}
struct TensorSetConstantWithPlace : public boost::static_visitor<void> { struct TensorSetConstantWithPlace : public boost::static_visitor<void> {
TensorSetConstantWithPlace(const platform::DeviceContext& context, TensorSetConstantWithPlace(const platform::DeviceContext& context,
framework::Tensor* tensor, float value) framework::Tensor* tensor, float value)
......
...@@ -274,7 +274,7 @@ void set_constant_with_place<platform::GPUPlace>( ...@@ -274,7 +274,7 @@ void set_constant_with_place<platform::GPUPlace>(
} }
template <> template <>
void set_constant_with_place<platform::CudnnPlace>( void set_constant_with_place<platform::CUDNNPlace>(
const platform::DeviceContext& context, framework::Tensor* tensor, const platform::DeviceContext& context, framework::Tensor* tensor,
float value) { float value) {
set_constant_with_place<platform::GPUPlace>(context, tensor, value); set_constant_with_place<platform::GPUPlace>(context, tensor, value);
......
...@@ -34,20 +34,32 @@ class ReshapeOp : public framework::OperatorWithKernel { ...@@ -34,20 +34,32 @@ class ReshapeOp : public framework::OperatorWithKernel {
auto shape = ctx->Attrs().Get<std::vector<int>>("shape"); auto shape = ctx->Attrs().Get<std::vector<int>>("shape");
PADDLE_ENFORCE(shape.size() > 0, "Attr(shape) shouldn't be empty."); PADDLE_ENFORCE(shape.size() > 0, "Attr(shape) shouldn't be empty.");
auto x_dims = ctx->GetInputDim("X"); auto x_dims = ctx->GetInputDim("X");
// TODO(qiao) change batch_size
for (size_t i = 1; i < shape.size(); ++i) { std::vector<size_t> neg_dims_idx;
PADDLE_ENFORCE(shape[i] > 0, // set some dimension to -1 if it is unknown
"Each dimension of Attr(shape) " const int unknown_size = -1;
"must be positive except the first one."); for (size_t i = 0; i < shape.size(); ++i) {
PADDLE_ENFORCE(shape[i] > 0 || shape[i] == unknown_size,
"Each dimension of Attr(shape) must be positive or %d.",
unknown_size);
if (shape[i] == unknown_size) {
neg_dims_idx.push_back(i);
PADDLE_ENFORCE(neg_dims_idx.size() <= 1,
"Only one dimension of Attr(shape) can be unknown.");
} }
if (shape[0] < 0) {
shape[0] = x_dims[0];
} }
// capacity check
int64_t capacity = int64_t capacity =
std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>()); std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>());
int64_t in_size = framework::product(x_dims); int64_t in_size = framework::product(x_dims);
PADDLE_ENFORCE_EQ(capacity, in_size, if (neg_dims_idx.size() == 1) {
// dim infer
shape[neg_dims_idx[0]] = in_size / (-capacity);
// recalculate capacity
capacity = shape[neg_dims_idx[0]] * (-capacity);
}
// capacity check
PADDLE_ENFORCE(capacity == in_size,
"The size of Input(X) mismatches with Attr(shape)."); "The size of Input(X) mismatches with Attr(shape).");
// resize output // resize output
std::vector<int64_t> shape_int64(shape.size(), 0); std::vector<int64_t> shape_int64(shape.size(), 0);
...@@ -88,6 +100,9 @@ the tensor X into a 2-D tensor: ...@@ -88,6 +100,9 @@ the tensor X into a 2-D tensor:
[[1, 2, 3, 4]] [[1, 2, 3, 4]]
One dimension in the target shape can be set -1, representing that its
size is unknown. In this case, the real dimension will be infered from
the original shape of Input(X) and other dimensions in the target shape.
)DOC"); )DOC");
} }
}; };
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Indicesou may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/spp_op.h"
namespace paddle {
namespace operators {
class SppOpMaker : public framework::OpProtoAndCheckerMaker {
public:
SppOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(
"X",
"(Tensor) The input tensor of spp operator. "
"The format of input tensor is NCHW. Where N is batch size, C is the "
"number of channels, H and W is the height and width of feature.");
AddOutput("Out",
"(Tensor) The output tensor of spp operator."
"N * M."
"M = C * H * W");
AddAttr<int>("pyramid_height", "(int), multi level pooling");
AddAttr<std::string>(
"pooling_type",
"(string), pooling type, can be \"max\" for max-pooling "
"and \"avg\" for average-pooling.")
.InEnum({"max", "avg"});
AddComment(R"DOC(
"With spatial pyramid pooling, the input image can
be of any sizes. This not only allows arbitrary aspect
ratios, but also allows arbitrary scales. We can resize
the input image to any scale (e.g., min(w, h)=180, 224,
...) and apply the same deep network. When the
input image is at different scales, the network (with
the same filter sizes) will extract features at different
scales. The scales play important roles in traditional
methods.
Input shape: $(N, C_{in}, H_{in}, W_{in})$
Output shape: $(H_{out}, W_{out})$
Where
$$
H_{out} = N \\
W_{out} = (((4^pyramid_height) - 1) / (4 - 1))$ * C_{in}
$$
paper https://arxiv.org/pdf/1406.4729v4.pdf
)DOC");
}
};
class SppOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of SppOp"
"should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of SppOp should not be null.");
auto in_x_dims = ctx->GetInputDim("X");
int pyramid_height = ctx->Attrs().Get<int>("pyramid_height");
PADDLE_ENFORCE(in_x_dims.size() == 4,
"Spping intput must be of 4-dimensional.");
int outlen = ((std::pow(4, pyramid_height) - 1) / (4 - 1)) * in_x_dims[1];
std::vector<int64_t> output_shape({in_x_dims[0], outlen});
ctx->SetOutputDim("Out", framework::make_ddim(output_shape));
}
};
class SppOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null.");
PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("X")),
"Input(X@GRAD) should not be null.");
ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(spp, ops::SppOp, ops::SppOpMaker, spp_grad, ops::SppOpGrad);
REGISTER_OP_CPU_KERNEL(
spp, ops::SppKernel<paddle::platform::CPUDeviceContext, float>,
ops::SppKernel<paddle::platform::CPUDeviceContext, double>);
REGISTER_OP_CPU_KERNEL(
spp_grad, ops::SppGradKernel<paddle::platform::CPUDeviceContext, float>,
ops::SppGradKernel<paddle::platform::CPUDeviceContext, double>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Indicesou may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/spp_op.h"
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
spp, ops::SppKernel<paddle::platform::CUDADeviceContext, float>,
ops::SppKernel<paddle::platform::CUDADeviceContext, double>);
REGISTER_OP_CUDA_KERNEL(
spp_grad, ops::SppGradKernel<paddle::platform::CUDADeviceContext, float>,
ops::SppGradKernel<paddle::platform::CUDADeviceContext, double>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Indicesou may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h"
#include "paddle/operators/math/pooling.h"
#include "paddle/operators/strided_memcpy.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class SppKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
const framework::Tensor* in_x = context.Input<framework::Tensor>("X");
auto* out = context.Output<framework::Tensor>("Out");
int pyramid_height = context.template Attr<int>("pyramid_height");
std::string pooling_type =
context.template Attr<std::string>("pooling_type");
out->mutable_data<T>(context.GetPlace());
auto out_stride = framework::stride(out->dims());
int input_h = in_x->dims()[2];
int input_w = in_x->dims()[3];
size_t output_offset = 0;
for (int p = 0; p < pyramid_height; ++p) {
int bins = std::pow(2, p);
int kernel_size_h = std::ceil(input_h / static_cast<double>(bins));
int kernel_size_w = std::ceil(input_w / static_cast<double>(bins));
int padding_h = (kernel_size_h * bins - input_h + 1) / 2;
int padding_w = (kernel_size_w * bins - input_w + 1) / 2;
std::vector<int> kernel_size({kernel_size_h, kernel_size_w});
std::vector<int> strides({kernel_size_h, kernel_size_w});
std::vector<int> paddings({padding_h, padding_w});
// pooling output shape
framework::Tensor out_level;
std::vector<int64_t> output_shape_vec(
{in_x->dims()[0], in_x->dims()[1], bins, bins});
framework::DDim output_shape(framework::make_ddim(output_shape_vec));
out_level.mutable_data<T>(output_shape, context.GetPlace());
// pooling
if (pooling_type == "max") {
math::Pool2dFunctor<DeviceContext, math::MaxPool<T>, T> pool_forward;
math::MaxPool<T> max_process;
pool_forward(context.template device_context<DeviceContext>(), *in_x,
kernel_size, strides, paddings, max_process, &out_level);
} else if (pooling_type == "avg") {
math::Pool2dFunctor<DeviceContext, math::AvgPool<T>, T> pool_forward;
math::AvgPool<T> avg_process;
pool_forward(context.template device_context<DeviceContext>(), *in_x,
kernel_size, strides, paddings, avg_process, &out_level);
}
// flatten pooling output shape
int output_flatten_w = in_x->dims()[1] * bins * bins;
std::vector<int64_t> output_flatten_shape_vec(
{in_x->dims()[0], output_flatten_w});
framework::DDim output_flatten_shape(
framework::make_ddim(output_flatten_shape_vec));
out_level.Resize(output_flatten_shape);
// concat
auto out_level_stride = framework::stride(out_level.dims());
StridedMemcpy<T>(context.template device_context<DeviceContext>(),
out_level.data<T>(), out_level_stride, out_level.dims(),
out_stride, out->data<T>() + output_offset);
output_offset += out_level.dims()[1] * out_level_stride[1];
}
}
};
template <typename DeviceContext, typename T>
class SppGradKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
const framework::Tensor* in_x = context.Input<framework::Tensor>("X");
const framework::Tensor* out = context.Input<framework::Tensor>("Out");
const framework::Tensor* out_grad =
context.Input<framework::Tensor>(framework::GradVarName("Out"));
framework::Tensor* in_x_grad =
context.Output<framework::Tensor>(framework::GradVarName("X"));
int pyramid_height = context.template Attr<int>("pyramid_height");
std::string pooling_type =
context.template Attr<std::string>("pooling_type");
auto& device_ctx = context.template device_context<DeviceContext>();
math::SetConstant<DeviceContext, T> zero;
in_x_grad->mutable_data<T>(context.GetPlace());
zero(device_ctx, in_x_grad, static_cast<T>(0));
auto out_stride = framework::stride(out->dims());
int input_h = in_x->dims()[2];
int input_w = in_x->dims()[3];
size_t out_offset = 0;
for (int p = 0; p < pyramid_height; ++p) {
int bins = std::pow(2, p);
int kernel_size_h = std::ceil(input_h / static_cast<double>(bins));
int kernel_size_w = std::ceil(input_w / static_cast<double>(bins));
int padding_h = (kernel_size_h * bins - input_h + 1) / 2;
int padding_w = (kernel_size_w * bins - input_w + 1) / 2;
std::vector<int> kernel_size({kernel_size_h, kernel_size_w});
std::vector<int> strides({kernel_size_h, kernel_size_w});
std::vector<int> paddings({padding_h, padding_w});
// split out and outgrad ... to flatten
framework::Tensor out_level;
framework::Tensor outgrad_level;
int out_flatten_w = in_x->dims()[1] * bins * bins;
std::vector<int64_t> out_flatten_shape_vec(
{in_x->dims()[0], out_flatten_w});
framework::DDim out_flatten_shape(
framework::make_ddim(out_flatten_shape_vec));
out_level.mutable_data<T>(out_flatten_shape, context.GetPlace());
outgrad_level.mutable_data<T>(out_flatten_shape, context.GetPlace());
auto flatten_stride = framework::stride(out_level.dims());
// memcpy
StridedMemcpy<T>(context.template device_context<DeviceContext>(),
out->data<T>() + out_offset, out_stride,
out_level.dims(), flatten_stride, out_level.data<T>());
StridedMemcpy<T>(context.template device_context<DeviceContext>(),
out_grad->data<T>() + out_offset, out_stride,
outgrad_level.dims(), flatten_stride,
outgrad_level.data<T>());
out_offset += out_level.dims()[1] * out_stride[1];
// flatten backward to nchw
std::vector<int64_t> out_shape_vec({in_x->dims()[0], in_x->dims()[1]});
out_shape_vec.push_back(
(input_h - kernel_size_h + 2 * padding_h) / kernel_size_h + 1);
out_shape_vec.push_back(
(input_w - kernel_size_w + 2 * padding_w) / kernel_size_w + 1);
framework::DDim out_shape(framework::make_ddim(out_shape_vec));
out_level.ShareDataWith(out_level);
out_level.Resize(out_shape);
outgrad_level.ShareDataWith(outgrad_level);
outgrad_level.Resize(out_shape);
// pooling backward
if (pooling_type == "max") {
math::MaxPool2dGradFunctor<DeviceContext, T> pool2d_backward;
pool2d_backward(context.template device_context<DeviceContext>(), *in_x,
*&out_level, *&outgrad_level, kernel_size, strides,
paddings, in_x_grad);
} else if (pooling_type == "avg") {
math::Pool2dGradFunctor<DeviceContext, math::AvgPoolGrad<T>, T>
pool_backward;
math::AvgPoolGrad<T> avg_process;
pool_backward(context.template device_context<DeviceContext>(), *in_x,
*&out_level, *&outgrad_level, kernel_size, strides,
paddings, avg_process, in_x_grad);
}
}
}
};
} // namespace operators
} // namespace paddle
...@@ -125,21 +125,21 @@ cudnnHandle_t CUDADeviceContext::cudnn_handle() const { return cudnn_handle_; } ...@@ -125,21 +125,21 @@ cudnnHandle_t CUDADeviceContext::cudnn_handle() const { return cudnn_handle_; }
cudaStream_t CUDADeviceContext::stream() const { return stream_; } cudaStream_t CUDADeviceContext::stream() const { return stream_; }
CudnnDeviceContext::CudnnDeviceContext(CudnnPlace place) CUDNNDeviceContext::CUDNNDeviceContext(CUDNNPlace place)
: CUDADeviceContext(place), place_(place) { : CUDADeviceContext(place), place_(place) {
PADDLE_ENFORCE(dynload::cudnnCreate(&cudnn_handle_)); PADDLE_ENFORCE(dynload::cudnnCreate(&cudnn_handle_));
PADDLE_ENFORCE(dynload::cudnnSetStream(cudnn_handle_, stream())); PADDLE_ENFORCE(dynload::cudnnSetStream(cudnn_handle_, stream()));
} }
CudnnDeviceContext::~CudnnDeviceContext() { CUDNNDeviceContext::~CUDNNDeviceContext() {
SetDeviceId(place_.device); SetDeviceId(place_.device);
Wait(); Wait();
PADDLE_ENFORCE(dynload::cudnnDestroy(cudnn_handle_)); PADDLE_ENFORCE(dynload::cudnnDestroy(cudnn_handle_));
} }
Place CudnnDeviceContext::GetPlace() const { return CudnnPlace(); } Place CUDNNDeviceContext::GetPlace() const { return CUDNNPlace(); }
cudnnHandle_t CudnnDeviceContext::cudnn_handle() const { return cudnn_handle_; } cudnnHandle_t CUDNNDeviceContext::cudnn_handle() const { return cudnn_handle_; }
#endif #endif
......
...@@ -86,10 +86,10 @@ class CUDADeviceContext : public DeviceContext { ...@@ -86,10 +86,10 @@ class CUDADeviceContext : public DeviceContext {
cublasHandle_t cublas_handle_; cublasHandle_t cublas_handle_;
}; };
class CudnnDeviceContext : public CUDADeviceContext { class CUDNNDeviceContext : public CUDADeviceContext {
public: public:
explicit CudnnDeviceContext(CudnnPlace place); explicit CUDNNDeviceContext(CUDNNPlace place);
virtual ~CudnnDeviceContext(); virtual ~CUDNNDeviceContext();
/*! \brief Return place in the device context. */ /*! \brief Return place in the device context. */
Place GetPlace() const final; Place GetPlace() const final;
...@@ -99,7 +99,7 @@ class CudnnDeviceContext : public CUDADeviceContext { ...@@ -99,7 +99,7 @@ class CudnnDeviceContext : public CUDADeviceContext {
private: private:
cudnnHandle_t cudnn_handle_; cudnnHandle_t cudnn_handle_;
CudnnPlace place_; CUDNNPlace place_;
}; };
#endif #endif
......
...@@ -47,14 +47,14 @@ TEST(Device, CUDADeviceContext) { ...@@ -47,14 +47,14 @@ TEST(Device, CUDADeviceContext) {
} }
} }
TEST(Device, CudnnDeviceContext) { TEST(Device, CUDNNDeviceContext) {
using paddle::platform::CudnnDeviceContext; using paddle::platform::CUDNNDeviceContext;
using paddle::platform::CudnnPlace; using paddle::platform::CUDNNPlace;
if (paddle::platform::dynload::HasCUDNN()) { if (paddle::platform::dynload::HasCUDNN()) {
int count = paddle::platform::GetCUDADeviceCount(); int count = paddle::platform::GetCUDADeviceCount();
for (int i = 0; i < count; ++i) { for (int i = 0; i < count; ++i) {
CudnnDeviceContext* device_context = CUDNNDeviceContext* device_context =
new CudnnDeviceContext(CudnnPlace(i)); new CUDNNDeviceContext(CUDNNPlace(i));
cudnnHandle_t cudnn_handle = device_context->cudnn_handle(); cudnnHandle_t cudnn_handle = device_context->cudnn_handle();
ASSERT_NE(nullptr, cudnn_handle); ASSERT_NE(nullptr, cudnn_handle);
ASSERT_NE(nullptr, device_context->stream()); ASSERT_NE(nullptr, device_context->stream());
......
...@@ -25,6 +25,11 @@ void *nccl_dso_handle; ...@@ -25,6 +25,11 @@ void *nccl_dso_handle;
NCCL_RAND_ROUTINE_EACH(DEFINE_WRAP); NCCL_RAND_ROUTINE_EACH(DEFINE_WRAP);
void LoadNCCLDSO() {
platform::call_once(nccl_dso_flag,
[] { GetNCCLDsoHandle(&nccl_dso_handle); });
}
} // namespace dynload } // namespace dynload
} // namespace platform } // namespace platform
} // namespace paddle } // namespace paddle
...@@ -28,14 +28,14 @@ extern std::once_flag nccl_dso_flag; ...@@ -28,14 +28,14 @@ extern std::once_flag nccl_dso_flag;
extern void* nccl_dso_handle; extern void* nccl_dso_handle;
#ifdef PADDLE_USE_DSO #ifdef PADDLE_USE_DSO
extern void LoadNCCLDSO();
#define DECLARE_DYNAMIC_LOAD_NCCL_WRAP(__name) \ #define DECLARE_DYNAMIC_LOAD_NCCL_WRAP(__name) \
struct DynLoad__##__name { \ struct DynLoad__##__name { \
template <typename... Args> \ template <typename... Args> \
auto operator()(Args... args) -> decltype(__name(args...)) { \ auto operator()(Args... args) -> decltype(__name(args...)) { \
using nccl_func = decltype(__name(args...)) (*)(Args...); \ using nccl_func = decltype(__name(args...)) (*)(Args...); \
platform::call_once(nccl_dso_flag, \ paddle::platform::dynload::LoadNCCLDSO(); \
paddle::platform::dynload::GetNCCLDsoHandle, \
&nccl_dso_handle); \
void* p_##__name = dlsym(nccl_dso_handle, #__name); \ void* p_##__name = dlsym(nccl_dso_handle, #__name); \
return reinterpret_cast<nccl_func>(p_##__name)(args...); \ return reinterpret_cast<nccl_func>(p_##__name)(args...); \
} \ } \
......
...@@ -73,19 +73,20 @@ size_t GpuMaxChunkSize() { ...@@ -73,19 +73,20 @@ size_t GpuMaxChunkSize() {
size_t available = 0; size_t available = 0;
GpuMemoryUsage(available, total); GpuMemoryUsage(available, total);
VLOG(10) << "GPU Usage " << available / 1024 / 1024 << "M/"
// Reserving the rest memory for page tables, etc. << total / 1024 / 1024 << "M";
size_t reserving = 0.05 * total; size_t reserving = static_cast<size_t>(0.05 * total);
// If available less than minimum chunk size, no usable memory exists. // If available less than minimum chunk size, no usable memory exists.
available = available =
std::max(std::max(available, GpuMinChunkSize()) - GpuMinChunkSize(), std::min(std::max(available, GpuMinChunkSize()) - GpuMinChunkSize(),
reserving) - total - reserving);
reserving;
// Reserving the rest memory for page tables, etc.
size_t allocating = FLAGS_fraction_of_gpu_memory_to_use * total; size_t allocating = static_cast<size_t>(FLAGS_fraction_of_gpu_memory_to_use *
(total - reserving));
PADDLE_ENFORCE_LT(allocating, available); PADDLE_ENFORCE_LE(allocating, available);
return allocating; return allocating;
} }
......
...@@ -31,7 +31,7 @@ namespace platform { ...@@ -31,7 +31,7 @@ namespace platform {
TEST(NCCL, init) { TEST(NCCL, init) {
std::vector<ncclComm_t> comms; std::vector<ncclComm_t> comms;
comms.resize(dev_count); comms.resize(dev_count);
PADDLE_ENFORCE(dynload::ncclCommInitAll(comms.data(), dev_count, nullptr)); dynload::ncclCommInitAll(comms.data(), dev_count, nullptr);
for (int i = 0; i < dev_count; ++i) { for (int i = 0; i < dev_count; ++i) {
dynload::ncclCommDestroy(comms[i]); dynload::ncclCommDestroy(comms[i]);
} }
...@@ -62,7 +62,7 @@ TEST(NCCL, all_reduce) { ...@@ -62,7 +62,7 @@ TEST(NCCL, all_reduce) {
std::vector<ncclComm_t> comms; std::vector<ncclComm_t> comms;
comms.resize(dev_count); comms.resize(dev_count);
VLOG(1) << "Initializing ncclComm"; VLOG(1) << "Initializing ncclComm";
PADDLE_ENFORCE(dynload::ncclCommInitAll(comms.data(), dev_count, nullptr)); dynload::ncclCommInitAll(comms.data(), dev_count, nullptr);
VLOG(1) << "ncclComm initialized"; VLOG(1) << "ncclComm initialized";
VLOG(1) << "Creating thread data"; VLOG(1) << "Creating thread data";
std::vector<std::unique_ptr<PerThreadData<double>>> data; std::vector<std::unique_ptr<PerThreadData<double>>> data;
......
...@@ -23,6 +23,7 @@ class PlacePrinter : public boost::static_visitor<> { ...@@ -23,6 +23,7 @@ class PlacePrinter : public boost::static_visitor<> {
public: public:
explicit PlacePrinter(std::ostream &os) : os_(os) {} explicit PlacePrinter(std::ostream &os) : os_(os) {}
void operator()(const CPUPlace &) { os_ << "CPUPlace"; } void operator()(const CPUPlace &) { os_ << "CPUPlace"; }
void operator()(const MKLDNNPlace &) { os_ << "MKLDNNPlace"; }
void operator()(const GPUPlace &p) { os_ << "GPUPlace(" << p.device << ")"; } void operator()(const GPUPlace &p) { os_ << "GPUPlace(" << p.device << ")"; }
private: private:
...@@ -38,12 +39,17 @@ const Place &get_place() { return the_default_place; } ...@@ -38,12 +39,17 @@ const Place &get_place() { return the_default_place; }
const GPUPlace default_gpu() { return GPUPlace(0); } const GPUPlace default_gpu() { return GPUPlace(0); }
const CPUPlace default_cpu() { return CPUPlace(); } const CPUPlace default_cpu() { return CPUPlace(); }
const MKLDNNPlace default_mkldnn() { return MKLDNNPlace(); }
bool is_gpu_place(const Place &p) { bool is_gpu_place(const Place &p) {
return boost::apply_visitor(IsGPUPlace(), p); return boost::apply_visitor(IsGPUPlace(), p);
} }
bool is_cpu_place(const Place &p) { bool is_cpu_place(const Place &p) {
return !boost::apply_visitor(IsGPUPlace(), p); return !is_gpu_place(p) && !is_mkldnn_place(p);
}
bool is_mkldnn_place(const Place &p) {
return boost::apply_visitor(IsMKLDNNPlace(), p);
} }
bool places_are_same_class(const Place &p1, const Place &p2) { bool places_are_same_class(const Place &p1, const Place &p2) {
......
...@@ -31,6 +31,14 @@ struct CPUPlace { ...@@ -31,6 +31,14 @@ struct CPUPlace {
inline bool operator!=(const CPUPlace &) const { return false; } inline bool operator!=(const CPUPlace &) const { return false; }
}; };
struct MKLDNNPlace {
MKLDNNPlace() {}
// needed for variant equality comparison
inline bool operator==(const MKLDNNPlace &) const { return true; }
inline bool operator!=(const MKLDNNPlace &) const { return false; }
};
struct GPUPlace { struct GPUPlace {
GPUPlace() : GPUPlace(0) {} GPUPlace() : GPUPlace(0) {}
explicit GPUPlace(int d) : device(d) {} explicit GPUPlace(int d) : device(d) {}
...@@ -43,21 +51,28 @@ struct GPUPlace { ...@@ -43,21 +51,28 @@ struct GPUPlace {
int device; int device;
}; };
struct CudnnPlace : public GPUPlace { struct CUDNNPlace : public GPUPlace {
CudnnPlace() : GPUPlace() {} CUDNNPlace() : GPUPlace() {}
explicit CudnnPlace(int d) : GPUPlace(d) {} explicit CUDNNPlace(int d) : GPUPlace(d) {}
}; };
struct IsGPUPlace : public boost::static_visitor<bool> { struct IsGPUPlace : public boost::static_visitor<bool> {
bool operator()(const CPUPlace &) const { return false; } bool operator()(const CPUPlace &) const { return false; }
bool operator()(const MKLDNNPlace &) const { return false; }
bool operator()(const GPUPlace &gpu) const { return true; } bool operator()(const GPUPlace &gpu) const { return true; }
}; };
struct IsMKLDNNPlace : public boost::static_visitor<bool> {
bool operator()(const MKLDNNPlace &) const { return true; }
bool operator()(const CPUPlace &) const { return false; }
bool operator()(const GPUPlace &) const { return false; }
};
// Define the max number of Place in bit length. i.e., the max number of places // Define the max number of Place in bit length. i.e., the max number of places
// should be less equal than 2^(NUM_PLACE_TYPE_LIMIT_IN_BIT) // should be less equal than 2^(NUM_PLACE_TYPE_LIMIT_IN_BIT)
#define NUM_PLACE_TYPE_LIMIT_IN_BIT 4 #define NUM_PLACE_TYPE_LIMIT_IN_BIT 4
typedef boost::variant<CudnnPlace, GPUPlace, CPUPlace> Place; typedef boost::variant<CUDNNPlace, GPUPlace, CPUPlace, MKLDNNPlace> Place;
// static check number of place types is less equal than // static check number of place types is less equal than
// 2^(NUM_PLACE_TYPE_LIMIT_IN_BIT) // 2^(NUM_PLACE_TYPE_LIMIT_IN_BIT)
...@@ -70,9 +85,11 @@ const Place &get_place(); ...@@ -70,9 +85,11 @@ const Place &get_place();
const GPUPlace default_gpu(); const GPUPlace default_gpu();
const CPUPlace default_cpu(); const CPUPlace default_cpu();
const MKLDNNPlace default_mkldnn();
bool is_gpu_place(const Place &); bool is_gpu_place(const Place &);
bool is_cpu_place(const Place &); bool is_cpu_place(const Place &);
bool is_mkldnn_place(const Place &);
bool places_are_same_class(const Place &, const Place &); bool places_are_same_class(const Place &, const Place &);
std::ostream &operator<<(std::ostream &, const Place &); std::ostream &operator<<(std::ostream &, const Place &);
......
...@@ -5,25 +5,37 @@ ...@@ -5,25 +5,37 @@
TEST(Place, Equality) { TEST(Place, Equality) {
paddle::platform::CPUPlace cpu; paddle::platform::CPUPlace cpu;
paddle::platform::GPUPlace g0(0), g1(1), gg0(0); paddle::platform::GPUPlace g0(0), g1(1), gg0(0);
paddle::platform::CUDNNPlace d0(0), d1(1), dd0(0);
EXPECT_EQ(cpu, cpu); EXPECT_EQ(cpu, cpu);
EXPECT_EQ(g0, g0); EXPECT_EQ(g0, g0);
EXPECT_EQ(g1, g1); EXPECT_EQ(g1, g1);
EXPECT_EQ(g0, gg0); EXPECT_EQ(g0, gg0);
EXPECT_EQ(d0, dd0);
EXPECT_NE(g0, g1); EXPECT_NE(g0, g1);
EXPECT_NE(d0, d1);
EXPECT_TRUE(paddle::platform::places_are_same_class(g0, gg0)); EXPECT_TRUE(paddle::platform::places_are_same_class(g0, gg0));
EXPECT_FALSE(paddle::platform::places_are_same_class(g0, cpu)); EXPECT_FALSE(paddle::platform::places_are_same_class(g0, cpu));
EXPECT_TRUE(paddle::platform::is_gpu_place(d0));
EXPECT_FALSE(paddle::platform::places_are_same_class(g0, d0));
} }
TEST(Place, Default) { TEST(Place, Default) {
EXPECT_TRUE(paddle::platform::is_gpu_place(paddle::platform::get_place())); EXPECT_TRUE(paddle::platform::is_gpu_place(paddle::platform::get_place()));
EXPECT_TRUE(paddle::platform::is_gpu_place(paddle::platform::default_gpu())); EXPECT_TRUE(paddle::platform::is_gpu_place(paddle::platform::default_gpu()));
EXPECT_TRUE(paddle::platform::is_cpu_place(paddle::platform::default_cpu())); EXPECT_TRUE(paddle::platform::is_cpu_place(paddle::platform::default_cpu()));
EXPECT_TRUE(
paddle::platform::is_mkldnn_place(paddle::platform::default_mkldnn()));
paddle::platform::set_place(paddle::platform::CPUPlace()); paddle::platform::set_place(paddle::platform::CPUPlace());
EXPECT_TRUE(paddle::platform::is_cpu_place(paddle::platform::get_place())); EXPECT_TRUE(paddle::platform::is_cpu_place(paddle::platform::get_place()));
paddle::platform::set_place(paddle::platform::MKLDNNPlace());
EXPECT_FALSE(paddle::platform::is_cpu_place(paddle::platform::get_place()));
EXPECT_TRUE(paddle::platform::is_mkldnn_place(paddle::platform::get_place()));
} }
TEST(Place, Print) { TEST(Place, Print) {
......
...@@ -14,6 +14,19 @@ ...@@ -14,6 +14,19 @@
#pragma once #pragma once
#ifdef __CUDACC__
#ifdef __CUDACC_VER_MAJOR__
// CUDA 9 define `__CUDACC_VER__` as a warning message, manually define
// __CUDACC_VER__ instead.
#undef __CUDACC_VER__
#define __CUDACC_VER__ \
(__CUDACC_VER_MAJOR__ * 10000 + __CUDACC_VER_MINOR__ * 100 + \
__CUDACC_VER_BUILD__)
#endif
#endif
#include <boost/config.hpp> #include <boost/config.hpp>
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
......
if(WITH_PYTHON) if(WITH_PYTHON)
cc_library(paddle_pybind SHARED cc_library(paddle_pybind SHARED
SRCS pybind.cc exception.cc protobuf.cc SRCS pybind.cc exception.cc protobuf.cc
DEPS pybind python backward proto_desc paddle_memory executor prune DEPS pybind python backward proto_desc paddle_memory executor prune init
${GLOB_OP_LIB}) ${GLOB_OP_LIB})
endif(WITH_PYTHON) endif(WITH_PYTHON)
......
...@@ -16,11 +16,11 @@ limitations under the License. */ ...@@ -16,11 +16,11 @@ limitations under the License. */
#include <mutex> // for call_once #include <mutex> // for call_once
#include <unordered_map> #include <unordered_map>
#include "gflags/gflags.h"
#include "paddle/framework/backward.h" #include "paddle/framework/backward.h"
#include "paddle/framework/executor.h" #include "paddle/framework/executor.h"
#include "paddle/framework/feed_fetch_method.h" #include "paddle/framework/feed_fetch_method.h"
#include "paddle/framework/framework.pb.h" #include "paddle/framework/framework.pb.h"
#include "paddle/framework/init.h"
#include "paddle/framework/lod_rank_table.h" #include "paddle/framework/lod_rank_table.h"
#include "paddle/framework/lod_tensor.h" #include "paddle/framework/lod_tensor.h"
#include "paddle/framework/lod_tensor_array.h" #include "paddle/framework/lod_tensor_array.h"
...@@ -51,24 +51,6 @@ static size_t UniqueIntegerGenerator(const std::string &prefix) { ...@@ -51,24 +51,6 @@ static size_t UniqueIntegerGenerator(const std::string &prefix) {
return generators[prefix].fetch_add(1); return generators[prefix].fetch_add(1);
} }
std::once_flag gflags_init_flag;
// TODO(qijun) move init gflags to init.cc
void InitGflags(std::vector<std::string> &argv) {
std::call_once(gflags_init_flag, [&]() {
int argc = argv.size();
char **arr = new char *[argv.size()];
std::string line;
for (size_t i = 0; i < argv.size(); i++) {
arr[i] = &argv[i][0];
line += argv[i];
line += ' ';
}
google::ParseCommandLineFlags(&argc, &arr, true);
VLOG(1) << "Init commandline: " << line;
});
}
bool IsCompileGPU() { bool IsCompileGPU() {
#ifndef PADDLE_WITH_CUDA #ifndef PADDLE_WITH_CUDA
return false; return false;
...@@ -282,6 +264,23 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -282,6 +264,23 @@ All parameter, weight, gradient are variables in Paddle.
} }
return ret_values; return ret_values;
}); });
m.def("get_grad_op_descs",
[](const OpDescBind &op_desc,
const std::unordered_set<std::string> &no_grad_set,
std::unordered_map<std::string, std::string> &grad_to_var,
const std::vector<BlockDescBind *> &grad_sub_block) {
std::vector<std::unique_ptr<OpDescBind>> grad_op_descs =
framework::OpInfoMap::Instance()
.Get(op_desc.Type())
.GradOpMaker()(op_desc, no_grad_set, &grad_to_var,
grad_sub_block);
std::vector<OpDescBind *> grad_op_desc_ptrs(grad_op_descs.size());
std::transform(
grad_op_descs.begin(), grad_op_descs.end(),
grad_op_desc_ptrs.begin(),
[](std::unique_ptr<OpDescBind> &p) { return p.release(); });
return grad_op_desc_ptrs;
});
m.def("prune", [](const ProgramDescBind &origin, m.def("prune", [](const ProgramDescBind &origin,
const std::vector<std::array<size_t, 2>> &targets) { const std::vector<std::array<size_t, 2>> &targets) {
ProgramDescBind prog_with_targets(origin); ProgramDescBind prog_with_targets(origin);
...@@ -421,7 +420,8 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -421,7 +420,8 @@ All parameter, weight, gradient are variables in Paddle.
.def("run", &Executor::Run); .def("run", &Executor::Run);
m.def("unique_integer", UniqueIntegerGenerator); m.def("unique_integer", UniqueIntegerGenerator);
m.def("init_gflags", InitGflags); m.def("init_gflags", framework::InitGflags);
m.def("init_devices", &framework::InitDevices);
m.def("is_compile_gpu", IsCompileGPU); m.def("is_compile_gpu", IsCompileGPU);
m.def("set_feed_variable", framework::SetFeedVariable); m.def("set_feed_variable", framework::SetFeedVariable);
......
...@@ -2,6 +2,7 @@ ...@@ -2,6 +2,7 @@
build build
dist dist
paddle.egg-info paddle.egg-info
paddlepaddle_gpu.egg-info
.idea .idea
paddle/proto/*.py paddle/proto/*.py
paddle/proto/*.pyc paddle/proto/*.pyc
...@@ -1119,8 +1119,9 @@ def simple_gru2(input, ...@@ -1119,8 +1119,9 @@ def simple_gru2(input,
:param gru_bias_attr: bias parameter attribute of gru layer, :param gru_bias_attr: bias parameter attribute of gru layer,
False means no bias, None means default bias. False means no bias, None means default bias.
:type gru_bias_attr: ParameterAttribute|False|None :type gru_bias_attr: ParameterAttribute|False|None
:param gru_layer_attr: Extra attribute of the gru layer. :param gru_param_attr: param parameter attribute of gru layer,
:type gru_layer_attr: ExtraLayerAttribute None means default param.
:type gru_param_attr: ParameterAttribute|None
:return: the gru group. :return: the gru group.
:rtype: LayerOutput :rtype: LayerOutput
""" """
......
...@@ -4,7 +4,7 @@ import layers ...@@ -4,7 +4,7 @@ import layers
from framework import Program, unique_name, Variable from framework import Program, unique_name, Variable
from layer_helper import LayerHelper from layer_helper import LayerHelper
__all__ = ['Accuracy'] __all__ = ['Accuracy', 'ChunkEvaluator']
def _clone_var_(block, var): def _clone_var_(block, var):
...@@ -132,3 +132,74 @@ class Accuracy(Evaluator): ...@@ -132,3 +132,74 @@ class Accuracy(Evaluator):
correct = layers.cast(correct, dtype='float32', **kwargs) correct = layers.cast(correct, dtype='float32', **kwargs)
out = layers.elementwise_div(x=correct, y=total, **kwargs) out = layers.elementwise_div(x=correct, y=total, **kwargs)
return np.array(executor.run(eval_program, fetch_list=[out])[0]) return np.array(executor.run(eval_program, fetch_list=[out])[0])
class ChunkEvaluator(Evaluator):
"""
Accumulate counter numbers output by chunk_eval from mini-batches and
compute the precision recall and F1-score using the accumulated counter
numbers.
"""
def __init__(self,
input,
label,
chunk_scheme,
num_chunk_types,
excluded_chunk_types=None,
**kwargs):
super(ChunkEvaluator, self).__init__("chunk_eval", **kwargs)
main_program = self.helper.main_program
if main_program.current_block().idx != 0:
raise ValueError("You can only invoke Evaluator in root block")
self.num_infer_chunks = self.create_state(
dtype='int64', shape=[1], suffix='num_infer_chunks')
self.num_label_chunks = self.create_state(
dtype='int64', shape=[1], suffix='num_label_chunks')
self.num_correct_chunks = self.create_state(
dtype='int64', shape=[1], suffix='num_correct_chunks')
kwargs = {'main_program': main_program}
precision, recall, f1_score, num_infer_chunks, num_label_chunks, num_correct_chunks = layers.chunk_eval(
input=input,
label=label,
chunk_scheme=chunk_scheme,
num_chunk_types=num_chunk_types,
excluded_chunk_types=excluded_chunk_types,
**kwargs)
layers.sums(
input=[self.num_infer_chunks, num_infer_chunks],
out=self.num_infer_chunks,
**kwargs)
layers.sums(
input=[self.num_label_chunks, num_label_chunks],
out=self.num_label_chunks,
**kwargs)
layers.sums(
input=[self.num_correct_chunks, num_correct_chunks],
out=self.num_correct_chunks,
**kwargs)
self.metrics.extend([precision, recall, f1_score])
def eval(self, executor, eval_program=None):
if eval_program is None:
eval_program = Program()
block = eval_program.current_block()
kwargs = {'main_program': eval_program}
num_infer_chunks, num_label_chunks, num_correct_chunks = executor.run(
eval_program,
fetch_list=[_clone_var_(block, state) for state in self.states])
num_infer_chunks = num_infer_chunks[0]
num_label_chunks = num_label_chunks[0]
num_correct_chunks = num_correct_chunks[0]
precision = float(
num_correct_chunks) / num_infer_chunks if num_infer_chunks else 0
recall = float(
num_correct_chunks) / num_label_chunks if num_label_chunks else 0
f1_score = float(2 * precision * recall) / (
precision + recall) if num_correct_chunks else 0
return np.array(
[precision], dtype='float32'), np.array(
[recall], dtype='float32'), np.array(
[f1_score], dtype='float32')
...@@ -46,6 +46,13 @@ class Executor(object): ...@@ -46,6 +46,13 @@ class Executor(object):
p.set_place(each) p.set_place(each)
act_places.append(p) act_places.append(p)
# TODO(dzhwinter) : consider that our fluid tests all written in
# GPUPlace(gpu_id), this will be changed in next PR.
if core.is_compile_gpu():
core.init_devices(["CPU", "GPU:0"])
else:
core.init_devices(["CPU"])
self.executor = core.Executor(act_places) self.executor = core.Executor(act_places)
self.places = places self.places = places
......
import ops
from ops import *
import nn
from nn import *
import io
from io import *
import tensor
from tensor import *
import control_flow
from control_flow import *
__all__ = []
__all__ += nn.__all__
__all__ += io.__all__
__all__ += tensor.__all__
__all__ += control_flow.__all__
__all__ += ops.__all__
from ..layer_helper import LayerHelper, unique_name
from ..framework import Program, Variable, Operator
from .. import core
from tensor import assign, fill_constant
import contextlib import contextlib
import proto.framework_pb2 as framework_pb2
import core
from framework import OpProtoHolder, Variable, Program, Operator
from initializer import Constant, Normal, Xavier, Initializer
from paddle.v2.fluid.layer_helper import LayerHelper, unique_name
from registry import register_layer
from param_attr import ParamAttr
__all__ = [ __all__ = [
'fc', 'data', 'cross_entropy', 'conv2d', 'pool2d', 'embedding', 'concat', 'split_lod_tensor', 'merge_lod_tensor', 'BlockGuard', 'StaticRNNGuard',
'StaticRNN', 'cast', 'sequence_conv', 'sequence_pool', 'sums', 'cos_sim', 'StaticRNNMemoryLink', 'WhileGuard', 'While', 'lod_rank_table',
'batch_norm', 'accuracy', 'split_lod_tensor', 'While' 'max_sequence_len', 'topk', 'lod_tensor_to_array', 'array_to_lod_tensor',
] 'increment', 'array_write', 'create_array', 'less_than', 'array_read',
'shrink_memory', 'array_length', 'IfElse', 'DynamicRNN', 'ConditionalBlock',
_REGISTER_LAYER_FROM_OPS = [ 'StaticRNN'
'mean', 'mul', 'dropout', 'reshape', 'sigmoid', 'scale', 'transpose',
'sigmoid_cross_entropy_with_logits', 'elementwise_add', 'elementwise_div',
'elementwise_sub', 'elementwise_mul', 'clip', 'abs'
] ]
for _OP in set(_REGISTER_LAYER_FROM_OPS):
globals()[_OP] = register_layer(_OP)
__all__.append(_OP)
def fc(input,
size,
num_flatten_dims=1,
param_attr=None,
bias_attr=None,
act=None,
name=None,
main_program=None,
startup_program=None):
"""
Fully Connected Layer.
Args:
input: The input tensor to the function
size: The size of the layer
num_flatten_dims: Number of columns in input
param_attr: The parameters/weights to the FC Layer
param_initializer: Initializer used for the weight/parameter. If None, XavierInitializer() is used
bias_attr: The bias parameter for the FC layer
bias_initializer: Initializer used for the bias. If None, then ConstantInitializer() is used
act: Activation to be applied to the output of FC layer
name: Name/alias of the function
main_program: Name of the main program that calls this
startup_program: Name of the startup program
This function can take in multiple inputs and performs the Fully Connected
function (linear transformation) on top of each of them.
So for input x, the output will be : Wx + b. Where W is the parameter,
b the bias and x is the input.
The function also applies an activation (non-linearity) on top of the
output, if activation is passed in the input.
All the input variables of this function are passed in as local variables
to the LayerHelper constructor.
"""
helper = LayerHelper('fc', **locals())
dtype = helper.input_dtype()
mul_results = []
for input_var, param_attr in helper.iter_inputs_and_params():
input_shape = input_var.shape
param_shape = [
reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1)
] + [size]
w = helper.create_parameter(
attr=param_attr, shape=param_shape, dtype=dtype, is_bias=False)
tmp = helper.create_tmp_variable(dtype)
helper.append_op(
type="mul",
inputs={
"X": input_var,
"Y": w,
},
outputs={"Out": tmp},
attrs={'x_num_col_dims': num_flatten_dims,
'y_num_col_dims': 1})
mul_results.append(tmp)
# sum
if len(mul_results) == 1:
pre_bias = mul_results[0]
else:
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type="sum", inputs={"X": mul_results}, outputs={"Out": pre_bias})
# add bias
pre_activation = helper.append_bias_op(pre_bias)
# add activation
return helper.append_activation(pre_activation)
def embedding(input,
size,
is_sparse=False,
param_attr=None,
dtype='float32',
main_program=None,
startup_program=None):
"""
Embedding Layer.
Args:
param_initializer:
input: The input to the function
size: The size of the layer
is_sparse: A flag that decleares whether the input is sparse
param_attr: Parameters for this layer
dtype: The type of data : float32, float_16, int etc
main_program: Name of the main program that calls this
startup_program: Name of the startup program
This function can take in the input (which is a vector of IDs) and
performs a lookup in the lookup_table using these IDs, to result into
the embedding of each ID in the input.
All the input variables of this function are passed in as local variables
to the LayerHelper constructor.
"""
helper = LayerHelper('embedding', **locals())
w = helper.create_parameter(
attr=helper.param_attr, shape=size, dtype=dtype, is_bias=False)
tmp = helper.create_tmp_variable(dtype)
helper.append_op(
type='lookup_table',
inputs={'Ids': input,
'W': w},
outputs={'Out': tmp},
attrs={'is_sparse': is_sparse})
return tmp
# TODO(qijun): expose H0 and C0
def dynamic_lstm(input,
size,
param_attr=None,
bias_attr=None,
use_peepholes=True,
is_reverse=False,
gate_activation='sigmoid',
cell_activation='tanh',
candidate_activation='tanh',
dtype='float32',
main_program=None,
startup_program=None):
helper = LayerHelper('lstm', **locals())
size = size / 4
weight = helper.create_parameter(
attr=helper.param_attr, shape=[size, 4 * size], dtype=dtype)
bias_size = [1, 7 * size]
if not use_peepholes:
bias_size[1] = 4 * size
bias = helper.create_parameter(
attr=helper.bias_attr, shape=bias_size, dtype=dtype, is_bias=True)
hidden = helper.create_tmp_variable(dtype)
cell = helper.create_tmp_variable(dtype)
batch_gate = helper.create_tmp_variable(dtype)
batch_cell_pre_act = helper.create_tmp_variable(dtype)
helper.append_op(
type='lstm',
inputs={'Input': input,
'Weight': weight,
'Bias': bias},
outputs={
'Hidden': hidden,
'Cell': cell,
'BatchGate': batch_gate,
'BatchCellPreAct': batch_cell_pre_act
},
attrs={
'use_peepholes': use_peepholes,
'is_reverse': is_reverse,
'gate_activation': gate_activation,
'cell_activation': cell_activation,
'candidate_activation': candidate_activation
})
return hidden, cell
def gru_unit(input,
hidden,
size,
weight=None,
bias=None,
activation='tanh',
gate_activation='sigmoid',
main_program=None,
startup_program=None):
"""
GRUUnit Operator implements partial calculations of the GRU unit as following:
$$
update \ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\
reset \ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r) \\
output \ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\
output: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)
$$
which is same as one time step of GRU Operator.
@note To implement the complete GRU unit, fully-connected operator must be
used before to feed xu, xr and xc as the Input of GRUUnit operator.
TODO(ChunweiYan) add more document here
"""
activation_dict = dict(
identity=0,
sigmoid=1,
tanh=2,
relu=3, )
activation = activation_dict[activation]
gate_activation = activation_dict[gate_activation]
helper = LayerHelper('gru_unit', **locals())
dtype = helper.input_dtype()
size = size / 3
# create weight
if weight is None:
weight = helper.create_parameter(
attr=helper.param_attr, shape=[size, 3 * size], dtype=dtype)
# create bias
if bias is None:
bias_size = [1, 3 * size]
bias = helper.create_parameter(
attr=helper.bias_attr, shape=bias_size, dtype=dtype, is_bias=True)
gate = helper.create_tmp_variable(dtype)
reset_hidden_pre = helper.create_tmp_variable(dtype)
updated_hidden = helper.create_tmp_variable(dtype)
helper.append_op(
type='gru_unit',
inputs={'Input': input,
'HiddenPrev': hidden,
'Weight': weight},
outputs={
'Gate': gate,
'ResetHiddenPrev': reset_hidden_pre,
'Hidden': updated_hidden,
},
attrs={
'activation': 0,
'gate_activation': 1,
})
return updated_hidden, reset_hidden_pre, gate
def data(name,
shape,
append_batch_size=True,
dtype='float32',
lod_level=0,
type=core.VarDesc.VarType.LOD_TENSOR,
main_program=None,
startup_program=None,
stop_gradient=True):
"""
Data Layer.
Args:
name: The name/alias of the function
shape: Tuple declaring the shape.
append_batch_size: Whether or not to append the data as a batch.
dtype: The type of data : float32, float_16, int etc
type: The output type. By default it is LOD_TENSOR.
lod_level(int): The LoD Level. 0 means the input data is not a sequence.
main_program: Name of the main program that calls this
startup_program: Name of the startup program
stop_gradient: A boolean that mentions whether gradient should flow.
This function takes in input and based on whether data has
to be returned back as a minibatch, it creates the global variable using
the helper functions. The global variables can be accessed by all the
following operations and layers in the graph.
All the input variables of this function are passed in as local variables
to the LayerHelper constructor.
"""
helper = LayerHelper('data', **locals())
shape = list(shape)
for i in xrange(len(shape)):
if shape[i] is None:
shape[i] = -1
append_batch_size = False
elif shape[i] < 0:
append_batch_size = False
if append_batch_size:
shape = [-1] + shape # append batch size as -1
return helper.create_global_variable(
name=name,
shape=shape,
dtype=dtype,
type=type,
stop_gradient=stop_gradient,
lod_level=lod_level)
def create_tensor(dtype, name=None, main_program=None, startup_program=None):
helper = LayerHelper("create_tensor", **locals())
return helper.create_variable(name=helper.name, dtype=dtype)
def cast(x, dtype, main_program=None):
"""
This function takes in the input with input_dtype
and casts it to the output_dtype as the output.
"""
helper = LayerHelper('cast', **locals())
out = helper.create_tmp_variable(dtype=dtype)
helper.append_op(
type='cast',
inputs={'X': [x]},
outputs={'Out': [out]},
attrs={'in_dtype': x.dtype,
'out_dtype': out.dtype})
return out
def concat(input, axis, main_program=None, startup_program=None):
"""
This function concats the input along the axis mentioned
and returns that as the output.
"""
helper = LayerHelper('concat', **locals())
out = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='concat',
inputs={'X': input},
outputs={'Out': [out]},
attrs={'axis': axis})
return out
def sums(input, out=None, main_program=None, startup_program=None):
"""
This function takes in the input and performs the sum operation on it
and returns that as the output.
"""
helper = LayerHelper('sum', **locals())
if out is None:
out = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(type='sum', inputs={'X': input}, outputs={'Out': out})
return out
def linear_chain_crf(input,
label,
param_attr=None,
main_program=None,
startup_program=None):
helper = LayerHelper('linear_chain_crf', **locals())
size = input.shape[1]
transition = helper.create_parameter(
attr=helper.param_attr,
shape=[size + 2, size],
dtype=helper.input_dtype())
alpha = helper.create_tmp_variable(dtype=helper.input_dtype())
emission_exps = helper.create_tmp_variable(dtype=helper.input_dtype())
transition_exps = helper.create_tmp_variable(dtype=helper.input_dtype())
log_likelihood = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='linear_chain_crf',
inputs={"Emission": [input],
"Transition": transition,
"Label": label},
outputs={
"Alpha": [alpha],
"EmissionExps": [emission_exps],
"TransitionExps": transition_exps,
"LogLikelihood": log_likelihood
})
return log_likelihood
def crf_decoding(input,
param_attr,
label=None,
main_program=None,
startup_program=None):
helper = LayerHelper('crf_decoding', **locals())
transition = helper.get_parameter(param_attr.name)
viterbi_path = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='crf_decoding',
inputs={"Emission": [input],
"Transition": transition,
"Label": label},
outputs={"ViterbiPath": [viterbi_path]})
return viterbi_path
def assign(input, output, main_program=None, startup_program=None):
helper = LayerHelper('assign', **locals())
helper.append_op(
type='scale',
inputs={'X': [input]},
outputs={'Out': [output]},
attrs={'scale': 1.0})
return output
def split_lod_tensor(input, def split_lod_tensor(input,
mask, mask,
...@@ -460,404 +54,6 @@ def merge_lod_tensor(in_true, ...@@ -460,404 +54,6 @@ def merge_lod_tensor(in_true,
return out return out
def cos_sim(X, Y, **kwargs):
"""
This function performs the cosine similarity between two tensors
X and Y and returns that as the output.
"""
helper = LayerHelper('cos_sim', **kwargs)
out = helper.create_tmp_variable(dtype=X.dtype)
xnorm = helper.create_tmp_variable(dtype=X.dtype)
ynorm = helper.create_tmp_variable(dtype=X.dtype)
helper.append_op(
type='cos_sim',
inputs={'X': [X],
'Y': [Y]},
outputs={'Out': [out],
'XNorm': [xnorm],
'YNorm': [ynorm]})
return out
def cross_entropy(input, label, **kwargs):
"""
This function computes cross_entropy using the input and label.
"""
helper = LayerHelper('cross_entropy', **kwargs)
out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='cross_entropy',
inputs={'X': [input],
'Label': [label]},
outputs={'Y': [out]},
attrs=kwargs)
return out
def square_error_cost(input, label, **kwargs):
"""
This functions returns the squared error cost using the input and label.
The output is appending the op to do the above.
"""
helper = LayerHelper('square_error_cost', **kwargs)
minus_out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='elementwise_sub',
inputs={'X': [input],
'Y': [label]},
outputs={'Out': [minus_out]})
square_out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='square', inputs={'X': [minus_out]}, outputs={'Y': [square_out]})
return square_out
def accuracy(input, label, k=1, correct=None, total=None, **kwargs):
"""
This function computes the accuracy using the input and label.
The output is the top_k inputs and their indices.
"""
helper = LayerHelper("accuracy", **kwargs)
topk_out = helper.create_tmp_variable(dtype=input.dtype)
topk_indices = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="top_k",
inputs={"X": [input]},
outputs={"Out": [topk_out],
"Indices": [topk_indices]},
attrs={"k": k})
acc_out = helper.create_tmp_variable(dtype="float32")
if correct is None:
correct = helper.create_tmp_variable(dtype="int64")
if total is None:
total = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="accuracy",
inputs={
"Out": [topk_out],
"Indices": [topk_indices],
"Label": [label]
},
outputs={
"Accuracy": [acc_out],
"Correct": [correct],
"Total": [total],
})
return acc_out
def chunk_eval(input,
label,
chunk_scheme,
num_chunk_types,
excluded_chunk_types=None,
**kwargs):
"""
This function computes the accuracy using the input and label.
The output is the top_k inputs and their indices.
"""
helper = LayerHelper("chunk_eval", **kwargs)
# prepare output
precision = helper.create_tmp_variable(dtype="float32")
recall = helper.create_tmp_variable(dtype="float32")
f1_score = helper.create_tmp_variable(dtype="float32")
helper.append_op(
type="chunk_eval",
inputs={"Inference": [input],
"Label": [label]},
outputs={
"Precision": [precision],
"Recall": [recall],
"F1-Score": [f1_score]
},
attrs={
"num_chunk_types": num_chunk_types,
'chunk_scheme': chunk_scheme,
'excluded_chunk_types': excluded_chunk_types or []
})
return precision, recall, f1_score
def sequence_conv(input,
num_filters,
filter_size=3,
filter_stride=1,
padding=None,
bias_attr=None,
param_attr=None,
act=None,
main_program=None,
startup_program=None):
"""
This function creates the op for sequence_conv, using the inputs and
other convolutional configurations for the filters and stride as given
in the input parameters to the function.
"""
# FIXME(dzh) : want to unify the argument of python layer
# function. So we ignore some unecessary attributes.
# such as, padding_trainable, context_start.
helper = LayerHelper('sequence_conv', **locals())
dtype = helper.input_dtype()
filter_shape = [filter_size * input.shape[1], num_filters]
filter_param = helper.create_parameter(
attr=helper.param_attr, shape=filter_shape, dtype=dtype)
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type='sequence_conv',
inputs={
'X': [input],
'Filter': [filter_param],
},
outputs={"Out": pre_bias},
attrs={
'contextStride': filter_stride,
'contextStart': -int(filter_size / 2),
'contextLength': filter_size
})
pre_act = helper.append_bias_op(pre_bias)
return helper.append_activation(pre_act)
def conv2d(input,
num_filters,
filter_size,
stride=None,
padding=None,
groups=None,
param_attr=None,
bias_attr=None,
act=None,
name=None,
main_program=None,
startup_program=None):
"""
This function creates the op for a 2-dimensional Convolution.
This is performed using the parameters of filters(size, dimensionality etc)
, stride and other configurations for a Convolution operation.
This funciton can also append an activation on top of the
conv-2d output, if mentioned in the input parameters.
"""
if stride is None:
stride = [1, 1]
helper = LayerHelper('conv2d', **locals())
dtype = helper.input_dtype()
num_channels = input.shape[1]
if groups is None:
num_filter_channels = num_channels
else:
if num_channels % groups != 0:
raise ValueError("num_channels must be divisible by groups.")
num_filter_channels = num_channels / groups
if isinstance(filter_size, int):
filter_size = [filter_size, filter_size]
if isinstance(stride, int):
stride = [stride, stride]
if isinstance(padding, int):
padding = [padding, padding]
input_shape = input.shape
filter_shape = [num_filters, num_filter_channels] + filter_size
def _get_default_param_initializer():
std = (2.0 / (filter_size[0]**2 * num_channels))**0.5
return Normal(0.0, std, 0)
filter_param = helper.create_parameter(
attr=helper.param_attr,
shape=filter_shape,
dtype=dtype,
default_initializer=_get_default_param_initializer())
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type='conv2d_cudnn',
inputs={
'Input': input,
'Filter': filter_param,
},
outputs={"Output": pre_bias},
attrs={'strides': stride,
'paddings': padding,
'groups': groups})
pre_act = helper.append_bias_op(pre_bias, dim_start=1, dim_end=2)
return helper.append_activation(pre_act)
def sequence_pool(input, pool_type, **kwargs):
"""
This function add the operator for sequence pooling.
This is applied on top of the input using pool_type mentioned
in the parameters.
"""
helper = LayerHelper('sequence_pool', input=input, **kwargs)
dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype)
max_index = helper.create_tmp_variable(dtype)
helper.append_op(
type="sequence_pool",
inputs={"X": input},
outputs={"Out": pool_out,
"MaxIndex": max_index},
attrs={"pooltype": pool_type.upper()})
return pool_out
def pool2d(input,
pool_size,
pool_type,
pool_stride=None,
pool_padding=None,
global_pooling=False,
main_program=None,
startup_program=None):
"""
This function adds the operator for pooling in 2 dimensions, using the
pooling configurations mentioned in input parameters.
"""
if pool_padding is None:
pool_padding = [0, 0]
if pool_stride is None:
pool_stride = [1, 1]
if pool_type not in ["max", "avg"]:
raise ValueError(
"Unknown pool_type: '%s'. It can only be 'max' or 'avg'.",
str(pool_type))
if isinstance(pool_size, int):
pool_size = [pool_size, pool_size]
if isinstance(pool_stride, int):
pool_stride = [pool_stride, pool_stride]
if isinstance(pool_padding, int):
pool_padding = [pool_padding, pool_padding]
helper = LayerHelper('pool2d', **locals())
dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype)
helper.append_op(
type="pool2d",
inputs={"X": input},
outputs={"Out": pool_out},
attrs={
"pooling_type": pool_type,
"ksize": pool_size,
"global_pooling": global_pooling,
"strides": pool_stride,
"paddings": pool_padding
})
return pool_out
def batch_norm(input,
act=None,
is_test=False,
momentum=0.9,
epsilon=1e-05,
param_attr=None,
bias_attr=None,
data_layout='NCHW',
main_program=None,
startup_program=None):
"""
This function helps create an operator to implement
the BatchNorm layer using the configurations from the input parameters.
"""
helper = LayerHelper('batch_norm', **locals())
dtype = helper.input_dtype()
input_shape = input.shape
if data_layout == 'NCHW':
channel_num = input_shape[1]
else:
if data_layout == 'NHWC':
channel_num = input_shape[-1]
else:
raise ValueError("unsupported data layout:" + data_layout)
param_shape = [channel_num]
# create parameter
scale = helper.create_parameter(
attr=helper.param_attr,
shape=param_shape,
dtype=dtype,
default_initializer=Constant(1.0))
bias = helper.create_parameter(
attr=helper.param_attr, shape=param_shape, dtype=dtype, is_bias=True)
mean = helper.create_global_variable(
dtype=input.dtype, shape=param_shape, persistable=True)
helper.set_variable_initializer(var=mean, initializer=Constant(0.0))
variance = helper.create_global_variable(
dtype=input.dtype, shape=param_shape, persistable=True)
helper.set_variable_initializer(var=variance, initializer=Constant(1.0))
# create output
# mean and mean_out share the same memory
mean_out = mean
# variance and variance out share the same memory
variance_out = variance
saved_mean = helper.create_tmp_variable(dtype)
saved_variance = helper.create_tmp_variable(dtype)
batch_norm_out = helper.create_tmp_variable(dtype)
helper.append_op(
type="batch_norm",
inputs={
"X": input,
"Scale": scale,
"Bias": bias,
"Mean": mean,
"Variance": variance
},
outputs={
"Y": batch_norm_out,
"MeanOut": mean_out,
"VarianceOut": variance_out,
"SavedMean": saved_mean,
"SavedVariance": saved_variance
},
attrs={"momentum": momentum,
"epsilon": epsilon,
"is_test": is_test})
return helper.append_activation(batch_norm_out)
def beam_search_decode(ids, scores, main_program=None, startup_program=None):
helper = LayerHelper('beam_search_decode', **locals())
sentence_ids = helper.create_tmp_variable(dtype=ids.dtype)
sentence_scores = helper.create_tmp_variable(dtype=ids.dtype)
helper.append_op(
type="beam_search_decode",
inputs={"Ids": ids,
"Scores": scores},
outputs={
"SentenceIds": sentence_ids,
"SentenceScores": sentence_scores
})
return sentence_ids, sentence_scores
class BlockGuard(object): class BlockGuard(object):
""" """
BlockGuard class. BlockGuard class.
...@@ -1210,50 +406,6 @@ class While(object): ...@@ -1210,50 +406,6 @@ class While(object):
attrs={'sub_block': while_block}) attrs={'sub_block': while_block})
def lstm(x,
c_pre_init,
hidden_dim,
forget_bias=None,
main_program=None,
startup_program=None):
"""
This function helps create an operator for the LSTM (Long Short Term
Memory) cell that can be used inside an RNN.
"""
helper = LayerHelper('lstm_unit', **locals())
rnn = StaticRNN()
with rnn.step():
c_pre = rnn.memory(init=c_pre_init)
x_t = rnn.step_input(x)
before_fc = concat(
input=[x_t, c_pre],
axis=1,
main_program=main_program,
startup_program=startup_program)
after_fc = fc(input=before_fc,
size=hidden_dim * 4,
main_program=main_program,
startup_program=startup_program)
dtype = x.dtype
c = helper.create_tmp_variable(dtype)
h = helper.create_tmp_variable(dtype)
helper.append_op(
type='lstm_unit',
inputs={"X": after_fc,
"C_prev": c_pre},
outputs={"C": c,
"H": h},
attrs={"forget_bias": forget_bias})
rnn.update_memory(c_pre, c)
rnn.output(h)
return rnn()
def lod_rank_table(x, level=0, main_program=None): def lod_rank_table(x, level=0, main_program=None):
""" """
This function creates an operator for creating a LOD_RANK_TABLE This function creates an operator for creating a LOD_RANK_TABLE
...@@ -1331,72 +483,6 @@ def array_to_lod_tensor(x, table, main_program=None, startup_program=None): ...@@ -1331,72 +483,6 @@ def array_to_lod_tensor(x, table, main_program=None, startup_program=None):
return tmp return tmp
def fill_constant(shape,
dtype,
value,
out=None,
main_program=None,
startup_program=None):
"""
This function creates a tensor , with shape as mentioned in the input and
specified dtype and fills this up with a constant value that
comes in the input. It also sets the stop_gradient to be True.
"""
helper = LayerHelper("fill_constant", **locals())
if out is None:
out = helper.create_tmp_variable(dtype=dtype)
helper.append_op(
type='fill_constant',
inputs={},
outputs={'Out': [out]},
attrs={'shape': shape,
'dtype': out.dtype,
'value': float(value)})
out.stop_gradient = True
return out
def fill_constant_batch_size_like(input,
shape,
dtype,
value,
input_dim_idx=0,
output_dim_idx=0,
main_program=None,
startup_program=None):
helper = LayerHelper("fill_constant_batch_size_like", **locals())
out = helper.create_tmp_variable(dtype=dtype)
helper.append_op(
type='fill_constant_batch_size_like',
inputs={'Input': input},
outputs={'Out': [out]},
attrs={
'shape': shape,
'dtype': out.dtype,
'value': float(value),
'input_dim_idx': input_dim_idx,
'output_dim_idx': output_dim_idx
})
out.stop_gradient = True
return out
def ones(shape, dtype, main_program=None):
"""
This function performs the same function as fill_constant() declared above
with the constant value being 1.0.
"""
return fill_constant(value=1.0, **locals())
def zeros(shape, dtype, main_program=None):
"""
This function performs the same function as fill_constant() declared above
with the constant value being 0.0.
"""
return fill_constant(value=0.0, **locals())
def increment(x, def increment(x,
value=1.0, value=1.0,
in_place=True, in_place=True,
...@@ -1508,95 +594,6 @@ def array_length(array, main_program=None): ...@@ -1508,95 +594,6 @@ def array_length(array, main_program=None):
return tmp return tmp
def conv2d_transpose(input,
num_filters,
output_size=None,
filter_size=None,
padding=None,
stride=None,
param_attr=None,
main_program=None,
startup_program=None):
"""
The transpose of conv2d layer.
This layer is also known as deconvolution layer.
Args:
input(Variable): The input image with [N, C, H, W] format.
num_filters(int): The number of filter. It is as same as the output
image channel.
output_size(int|tuple|None): The output image size. If output size is a
tuple, it must contain two integers, (image_H, image_W). This
parameter only works when filter_size is None.
filter_size(int|tuple|None): The filter size. If filter_size is a tuple,
it must contain two integers, (filter_size_H, filter_size_W).
Otherwise, the filter will be a square. None if use output size to
calculate filter_size
padding(int|tuple): The padding size. If padding is a tuple, it must
contain two integers, (padding_H, padding_W). Otherwise, the
padding_H = padding_W = padding.
stride(int|tuple): The stride size. If stride is a tuple, it must
contain two integers, (stride_H, stride_W). Otherwise, the
stride_H = stride_W = stride.
param_attr: Parameter Attribute.
main_program(Program): the main program
startup_program(Program): the startup program
Returns:
Variable: Output image.
"""
helper = LayerHelper("conv2d_transpose", **locals())
if not isinstance(input, Variable):
raise TypeError("Input of conv2d_transpose must be Variable")
input_channel = input.shape[1]
op_attr = dict()
if isinstance(padding, int):
op_attr['paddings'] = [padding, padding]
elif padding is not None:
op_attr['paddings'] = padding
if isinstance(stride, int):
op_attr['strides'] = stride
elif stride is not None:
op_attr['strides'] = stride
if filter_size is None:
if output_size is None:
raise ValueError("output_size must be set when filter_size is None")
if isinstance(output_size, int):
output_size = [output_size, output_size]
padding = op_attr.get('paddings', [0, 0])
stride = op_attr.get('strides', [1, 1])
h_in = input.shape[2]
w_in = input.shape[3]
filter_size_h = output_size[0] - \
(h_in - 1) * stride[0] + 2 * padding[0]
filter_size_w = output_size[1] - \
(w_in - 1) * stride[1] + 2 * padding[1]
filter_size = [filter_size_h, filter_size_w]
elif isinstance(filter_size, int):
filter_size = [filter_size, filter_size]
filter_shape = [input_channel, num_filters] + filter_size
img_filter = helper.create_parameter(
dtype=input.dtype, shape=filter_shape, attr=helper.param_attr)
out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='conv2d_transpose',
inputs={'Input': [input],
'Filter': [img_filter]},
outputs={'Output': out},
attrs=op_attr)
return out
class ConditionalBlockGuard(BlockGuard): class ConditionalBlockGuard(BlockGuard):
def __init__(self, block): def __init__(self, block):
if not isinstance(block, ConditionalBlock): if not isinstance(block, ConditionalBlock):
......
from .. import core
from ..layer_helper import LayerHelper
__all__ = ['data']
def data(name,
shape,
append_batch_size=True,
dtype='float32',
lod_level=0,
type=core.VarDesc.VarType.LOD_TENSOR,
main_program=None,
startup_program=None,
stop_gradient=True):
"""
Data Layer.
Args:
name: The name/alias of the function
shape: Tuple declaring the shape.
append_batch_size: Whether or not to append the data as a batch.
dtype: The type of data : float32, float_16, int etc
type: The output type. By default it is LOD_TENSOR.
lod_level(int): The LoD Level. 0 means the input data is not a sequence.
main_program: Name of the main program that calls this
startup_program: Name of the startup program
stop_gradient: A boolean that mentions whether gradient should flow.
This function takes in input and based on whether data has
to be returned back as a minibatch, it creates the global variable using
the helper functions. The global variables can be accessed by all the
following operations and layers in the graph.
All the input variables of this function are passed in as local variables
to the LayerHelper constructor.
"""
helper = LayerHelper('data', **locals())
shape = list(shape)
for i in xrange(len(shape)):
if shape[i] is None:
shape[i] = -1
append_batch_size = False
elif shape[i] < 0:
append_batch_size = False
if append_batch_size:
shape = [-1] + shape # append batch size as -1
return helper.create_global_variable(
name=name,
shape=shape,
dtype=dtype,
type=type,
stop_gradient=stop_gradient,
lod_level=lod_level)
"""
All layers just related to the neural network.
"""
from ..layer_helper import LayerHelper
from ..initializer import Normal, Constant
from ..framework import Variable
__all__ = [
'fc', 'embedding', 'dynamic_lstm', 'gru_unit', 'linear_chain_crf',
'crf_decoding', 'cos_sim', 'cross_entropy', 'square_error_cost', 'accuracy',
'chunk_eval', 'sequence_conv', 'conv2d', 'sequence_pool', 'pool2d',
'batch_norm', 'beam_search_decode', 'conv2d_transpose'
]
def fc(input,
size,
num_flatten_dims=1,
param_attr=None,
bias_attr=None,
act=None,
name=None,
main_program=None,
startup_program=None):
"""
Fully Connected Layer.
Args:
input: The input tensor to the function
size: The size of the layer
num_flatten_dims: Number of columns in input
param_attr: The parameters/weights to the FC Layer
param_initializer: Initializer used for the weight/parameter. If None, XavierInitializer() is used
bias_attr: The bias parameter for the FC layer
bias_initializer: Initializer used for the bias. If None, then ConstantInitializer() is used
act: Activation to be applied to the output of FC layer
name: Name/alias of the function
main_program: Name of the main program that calls this
startup_program: Name of the startup program
This function can take in multiple inputs and performs the Fully Connected
function (linear transformation) on top of each of them.
So for input x, the output will be : Wx + b. Where W is the parameter,
b the bias and x is the input.
The function also applies an activation (non-linearity) on top of the
output, if activation is passed in the input.
All the input variables of this function are passed in as local variables
to the LayerHelper constructor.
"""
helper = LayerHelper('fc', **locals())
dtype = helper.input_dtype()
mul_results = []
for input_var, param_attr in helper.iter_inputs_and_params():
input_shape = input_var.shape
param_shape = [
reduce(lambda a, b: a * b, input_shape[num_flatten_dims:], 1)
] + [size]
w = helper.create_parameter(
attr=param_attr, shape=param_shape, dtype=dtype, is_bias=False)
tmp = helper.create_tmp_variable(dtype)
helper.append_op(
type="mul",
inputs={
"X": input_var,
"Y": w,
},
outputs={"Out": tmp},
attrs={'x_num_col_dims': num_flatten_dims,
'y_num_col_dims': 1})
mul_results.append(tmp)
# sum
if len(mul_results) == 1:
pre_bias = mul_results[0]
else:
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type="sum", inputs={"X": mul_results}, outputs={"Out": pre_bias})
# add bias
pre_activation = helper.append_bias_op(pre_bias)
# add activation
return helper.append_activation(pre_activation)
def embedding(input,
size,
is_sparse=False,
param_attr=None,
dtype='float32',
main_program=None,
startup_program=None):
"""
Embedding Layer.
Args:
param_initializer:
input: The input to the function
size: The size of the layer
is_sparse: A flag that decleares whether the input is sparse
param_attr: Parameters for this layer
dtype: The type of data : float32, float_16, int etc
main_program: Name of the main program that calls this
startup_program: Name of the startup program
This function can take in the input (which is a vector of IDs) and
performs a lookup in the lookup_table using these IDs, to result into
the embedding of each ID in the input.
All the input variables of this function are passed in as local variables
to the LayerHelper constructor.
"""
helper = LayerHelper('embedding', **locals())
w = helper.create_parameter(
attr=helper.param_attr, shape=size, dtype=dtype, is_bias=False)
tmp = helper.create_tmp_variable(dtype)
helper.append_op(
type='lookup_table',
inputs={'Ids': input,
'W': w},
outputs={'Out': tmp},
attrs={'is_sparse': is_sparse})
return tmp
# TODO(qijun): expose H0 and C0
def dynamic_lstm(input,
size,
param_attr=None,
bias_attr=None,
use_peepholes=True,
is_reverse=False,
gate_activation='sigmoid',
cell_activation='tanh',
candidate_activation='tanh',
dtype='float32',
main_program=None,
startup_program=None):
helper = LayerHelper('lstm', **locals())
size = size / 4
weight = helper.create_parameter(
attr=helper.param_attr, shape=[size, 4 * size], dtype=dtype)
bias_size = [1, 7 * size]
if not use_peepholes:
bias_size[1] = 4 * size
bias = helper.create_parameter(
attr=helper.bias_attr, shape=bias_size, dtype=dtype, is_bias=True)
hidden = helper.create_tmp_variable(dtype)
cell = helper.create_tmp_variable(dtype)
batch_gate = helper.create_tmp_variable(dtype)
batch_cell_pre_act = helper.create_tmp_variable(dtype)
helper.append_op(
type='lstm',
inputs={'Input': input,
'Weight': weight,
'Bias': bias},
outputs={
'Hidden': hidden,
'Cell': cell,
'BatchGate': batch_gate,
'BatchCellPreAct': batch_cell_pre_act
},
attrs={
'use_peepholes': use_peepholes,
'is_reverse': is_reverse,
'gate_activation': gate_activation,
'cell_activation': cell_activation,
'candidate_activation': candidate_activation
})
return hidden, cell
def gru_unit(input,
hidden,
size,
weight=None,
bias=None,
activation='tanh',
gate_activation='sigmoid',
main_program=None,
startup_program=None):
"""
GRUUnit Operator implements partial calculations of the GRU unit as following:
$$
update \ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\
reset \ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r) \\
output \ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\
output: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)
$$
which is same as one time step of GRU Operator.
@note To implement the complete GRU unit, fully-connected operator must be
used before to feed xu, xr and xc as the Input of GRUUnit operator.
TODO(ChunweiYan) add more document here
"""
activation_dict = dict(
identity=0,
sigmoid=1,
tanh=2,
relu=3, )
activation = activation_dict[activation]
gate_activation = activation_dict[gate_activation]
helper = LayerHelper('gru_unit', **locals())
dtype = helper.input_dtype()
size = size / 3
# create weight
if weight is None:
weight = helper.create_parameter(
attr=helper.param_attr, shape=[size, 3 * size], dtype=dtype)
# create bias
if bias is None:
bias_size = [1, 3 * size]
bias = helper.create_parameter(
attr=helper.bias_attr, shape=bias_size, dtype=dtype, is_bias=True)
gate = helper.create_tmp_variable(dtype)
reset_hidden_pre = helper.create_tmp_variable(dtype)
updated_hidden = helper.create_tmp_variable(dtype)
helper.append_op(
type='gru_unit',
inputs={'Input': input,
'HiddenPrev': hidden,
'Weight': weight},
outputs={
'Gate': gate,
'ResetHiddenPrev': reset_hidden_pre,
'Hidden': updated_hidden,
},
attrs={
'activation': 0,
'gate_activation': 1,
})
return updated_hidden, reset_hidden_pre, gate
def linear_chain_crf(input,
label,
param_attr=None,
main_program=None,
startup_program=None):
helper = LayerHelper('linear_chain_crf', **locals())
size = input.shape[1]
transition = helper.create_parameter(
attr=helper.param_attr,
shape=[size + 2, size],
dtype=helper.input_dtype())
alpha = helper.create_tmp_variable(dtype=helper.input_dtype())
emission_exps = helper.create_tmp_variable(dtype=helper.input_dtype())
transition_exps = helper.create_tmp_variable(dtype=helper.input_dtype())
log_likelihood = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='linear_chain_crf',
inputs={"Emission": [input],
"Transition": transition,
"Label": label},
outputs={
"Alpha": [alpha],
"EmissionExps": [emission_exps],
"TransitionExps": transition_exps,
"LogLikelihood": log_likelihood
})
return log_likelihood
def crf_decoding(input,
param_attr,
label=None,
main_program=None,
startup_program=None):
helper = LayerHelper('crf_decoding', **locals())
transition = helper.get_parameter(param_attr.name)
viterbi_path = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='crf_decoding',
inputs={"Emission": [input],
"Transition": transition,
"Label": label},
outputs={"ViterbiPath": [viterbi_path]})
return viterbi_path
def cos_sim(X, Y, **kwargs):
"""
This function performs the cosine similarity between two tensors
X and Y and returns that as the output.
"""
helper = LayerHelper('cos_sim', **kwargs)
out = helper.create_tmp_variable(dtype=X.dtype)
xnorm = helper.create_tmp_variable(dtype=X.dtype)
ynorm = helper.create_tmp_variable(dtype=X.dtype)
helper.append_op(
type='cos_sim',
inputs={'X': [X],
'Y': [Y]},
outputs={'Out': [out],
'XNorm': [xnorm],
'YNorm': [ynorm]})
return out
def cross_entropy(input, label, **kwargs):
"""
This function computes cross_entropy using the input and label.
"""
helper = LayerHelper('cross_entropy', **kwargs)
out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='cross_entropy',
inputs={'X': [input],
'Label': [label]},
outputs={'Y': [out]},
attrs=kwargs)
return out
def square_error_cost(input, label, **kwargs):
"""
This functions returns the squared error cost using the input and label.
The output is appending the op to do the above.
"""
helper = LayerHelper('square_error_cost', **kwargs)
minus_out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='elementwise_sub',
inputs={'X': [input],
'Y': [label]},
outputs={'Out': [minus_out]})
square_out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='square', inputs={'X': [minus_out]}, outputs={'Y': [square_out]})
return square_out
def accuracy(input, label, k=1, correct=None, total=None, **kwargs):
"""
This function computes the accuracy using the input and label.
The output is the top_k inputs and their indices.
"""
helper = LayerHelper("accuracy", **kwargs)
topk_out = helper.create_tmp_variable(dtype=input.dtype)
topk_indices = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="top_k",
inputs={"X": [input]},
outputs={"Out": [topk_out],
"Indices": [topk_indices]},
attrs={"k": k})
acc_out = helper.create_tmp_variable(dtype="float32")
if correct is None:
correct = helper.create_tmp_variable(dtype="int64")
if total is None:
total = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="accuracy",
inputs={
"Out": [topk_out],
"Indices": [topk_indices],
"Label": [label]
},
outputs={
"Accuracy": [acc_out],
"Correct": [correct],
"Total": [total],
})
return acc_out
def chunk_eval(input,
label,
chunk_scheme,
num_chunk_types,
excluded_chunk_types=None,
**kwargs):
"""
This function computes and outputs the precision, recall and
F1-score of chunk detection.
"""
helper = LayerHelper("chunk_eval", **kwargs)
# prepare output
precision = helper.create_tmp_variable(dtype="float32")
recall = helper.create_tmp_variable(dtype="float32")
f1_score = helper.create_tmp_variable(dtype="float32")
num_infer_chunks = helper.create_tmp_variable(dtype="int64")
num_label_chunks = helper.create_tmp_variable(dtype="int64")
num_correct_chunks = helper.create_tmp_variable(dtype="int64")
helper.append_op(
type="chunk_eval",
inputs={"Inference": [input],
"Label": [label]},
outputs={
"Precision": [precision],
"Recall": [recall],
"F1-Score": [f1_score],
"NumInferChunks": [num_infer_chunks],
"NumLabelChunks": [num_label_chunks],
"NumCorrectChunks": [num_correct_chunks]
},
attrs={
"num_chunk_types": num_chunk_types,
'chunk_scheme': chunk_scheme,
'excluded_chunk_types': excluded_chunk_types or []
})
return precision, recall, f1_score, num_infer_chunks, num_label_chunks, num_correct_chunks
def sequence_conv(input,
num_filters,
filter_size=3,
filter_stride=1,
padding=None,
bias_attr=None,
param_attr=None,
act=None,
main_program=None,
startup_program=None):
"""
This function creates the op for sequence_conv, using the inputs and
other convolutional configurations for the filters and stride as given
in the input parameters to the function.
"""
# FIXME(dzh) : want to unify the argument of python layer
# function. So we ignore some unecessary attributes.
# such as, padding_trainable, context_start.
helper = LayerHelper('sequence_conv', **locals())
dtype = helper.input_dtype()
filter_shape = [filter_size * input.shape[1], num_filters]
filter_param = helper.create_parameter(
attr=helper.param_attr, shape=filter_shape, dtype=dtype)
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type='sequence_conv',
inputs={
'X': [input],
'Filter': [filter_param],
},
outputs={"Out": pre_bias},
attrs={
'contextStride': filter_stride,
'contextStart': -int(filter_size / 2),
'contextLength': filter_size
})
pre_act = helper.append_bias_op(pre_bias)
return helper.append_activation(pre_act)
def conv2d(input,
num_filters,
filter_size,
stride=None,
padding=None,
groups=None,
param_attr=None,
bias_attr=None,
act=None,
name=None,
main_program=None,
startup_program=None):
"""
This function creates the op for a 2-dimensional Convolution.
This is performed using the parameters of filters(size, dimensionality etc)
, stride and other configurations for a Convolution operation.
This funciton can also append an activation on top of the
conv-2d output, if mentioned in the input parameters.
"""
if stride is None:
stride = [1, 1]
helper = LayerHelper('conv2d', **locals())
dtype = helper.input_dtype()
num_channels = input.shape[1]
if groups is None:
num_filter_channels = num_channels
else:
if num_channels % groups != 0:
raise ValueError("num_channels must be divisible by groups.")
num_filter_channels = num_channels / groups
if isinstance(filter_size, int):
filter_size = [filter_size, filter_size]
if isinstance(stride, int):
stride = [stride, stride]
if isinstance(padding, int):
padding = [padding, padding]
input_shape = input.shape
filter_shape = [num_filters, num_filter_channels] + filter_size
def _get_default_param_initializer():
std = (2.0 / (filter_size[0]**2 * num_channels))**0.5
return Normal(0.0, std, 0)
filter_param = helper.create_parameter(
attr=helper.param_attr,
shape=filter_shape,
dtype=dtype,
default_initializer=_get_default_param_initializer())
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type='conv2d_cudnn',
inputs={
'Input': input,
'Filter': filter_param,
},
outputs={"Output": pre_bias},
attrs={'strides': stride,
'paddings': padding,
'groups': groups})
pre_act = helper.append_bias_op(pre_bias, dim_start=1, dim_end=2)
return helper.append_activation(pre_act)
def sequence_pool(input, pool_type, **kwargs):
"""
This function add the operator for sequence pooling.
This is applied on top of the input using pool_type mentioned
in the parameters.
"""
helper = LayerHelper('sequence_pool', input=input, **kwargs)
dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype)
max_index = helper.create_tmp_variable(dtype)
helper.append_op(
type="sequence_pool",
inputs={"X": input},
outputs={"Out": pool_out,
"MaxIndex": max_index},
attrs={"pooltype": pool_type.upper()})
return pool_out
def pool2d(input,
pool_size,
pool_type,
pool_stride=None,
pool_padding=None,
global_pooling=False,
main_program=None,
startup_program=None):
"""
This function adds the operator for pooling in 2 dimensions, using the
pooling configurations mentioned in input parameters.
"""
if pool_padding is None:
pool_padding = [0, 0]
if pool_stride is None:
pool_stride = [1, 1]
if pool_type not in ["max", "avg"]:
raise ValueError(
"Unknown pool_type: '%s'. It can only be 'max' or 'avg'.",
str(pool_type))
if isinstance(pool_size, int):
pool_size = [pool_size, pool_size]
if isinstance(pool_stride, int):
pool_stride = [pool_stride, pool_stride]
if isinstance(pool_padding, int):
pool_padding = [pool_padding, pool_padding]
helper = LayerHelper('pool2d', **locals())
dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype)
helper.append_op(
type="pool2d",
inputs={"X": input},
outputs={"Out": pool_out},
attrs={
"pooling_type": pool_type,
"ksize": pool_size,
"global_pooling": global_pooling,
"strides": pool_stride,
"paddings": pool_padding
})
return pool_out
def batch_norm(input,
act=None,
is_test=False,
momentum=0.9,
epsilon=1e-05,
param_attr=None,
bias_attr=None,
data_layout='NCHW',
main_program=None,
startup_program=None):
"""
This function helps create an operator to implement
the BatchNorm layer using the configurations from the input parameters.
"""
helper = LayerHelper('batch_norm', **locals())
dtype = helper.input_dtype()
input_shape = input.shape
if data_layout == 'NCHW':
channel_num = input_shape[1]
else:
if data_layout == 'NHWC':
channel_num = input_shape[-1]
else:
raise ValueError("unsupported data layout:" + data_layout)
param_shape = [channel_num]
# create parameter
scale = helper.create_parameter(
attr=helper.param_attr,
shape=param_shape,
dtype=dtype,
default_initializer=Constant(1.0))
bias = helper.create_parameter(
attr=helper.param_attr, shape=param_shape, dtype=dtype, is_bias=True)
mean = helper.create_global_variable(
dtype=input.dtype, shape=param_shape, persistable=True)
helper.set_variable_initializer(var=mean, initializer=Constant(0.0))
variance = helper.create_global_variable(
dtype=input.dtype, shape=param_shape, persistable=True)
helper.set_variable_initializer(var=variance, initializer=Constant(1.0))
# create output
# mean and mean_out share the same memory
mean_out = mean
# variance and variance out share the same memory
variance_out = variance
saved_mean = helper.create_tmp_variable(dtype)
saved_variance = helper.create_tmp_variable(dtype)
batch_norm_out = helper.create_tmp_variable(dtype)
helper.append_op(
type="batch_norm",
inputs={
"X": input,
"Scale": scale,
"Bias": bias,
"Mean": mean,
"Variance": variance
},
outputs={
"Y": batch_norm_out,
"MeanOut": mean_out,
"VarianceOut": variance_out,
"SavedMean": saved_mean,
"SavedVariance": saved_variance
},
attrs={"momentum": momentum,
"epsilon": epsilon,
"is_test": is_test})
return helper.append_activation(batch_norm_out)
def beam_search_decode(ids, scores, main_program=None, startup_program=None):
helper = LayerHelper('beam_search_decode', **locals())
sentence_ids = helper.create_tmp_variable(dtype=ids.dtype)
sentence_scores = helper.create_tmp_variable(dtype=ids.dtype)
helper.append_op(
type="beam_search_decode",
inputs={"Ids": ids,
"Scores": scores},
outputs={
"SentenceIds": sentence_ids,
"SentenceScores": sentence_scores
})
return sentence_ids, sentence_scores
def conv2d_transpose(input,
num_filters,
output_size=None,
filter_size=None,
padding=None,
stride=None,
param_attr=None,
main_program=None,
startup_program=None):
"""
The transpose of conv2d layer.
This layer is also known as deconvolution layer.
Args:
input(Variable): The input image with [N, C, H, W] format.
num_filters(int): The number of filter. It is as same as the output
image channel.
output_size(int|tuple|None): The output image size. If output size is a
tuple, it must contain two integers, (image_H, image_W). This
parameter only works when filter_size is None.
filter_size(int|tuple|None): The filter size. If filter_size is a tuple,
it must contain two integers, (filter_size_H, filter_size_W).
Otherwise, the filter will be a square. None if use output size to
calculate filter_size
padding(int|tuple): The padding size. If padding is a tuple, it must
contain two integers, (padding_H, padding_W). Otherwise, the
padding_H = padding_W = padding.
stride(int|tuple): The stride size. If stride is a tuple, it must
contain two integers, (stride_H, stride_W). Otherwise, the
stride_H = stride_W = stride.
param_attr: Parameter Attribute.
main_program(Program): the main program
startup_program(Program): the startup program
Returns:
Variable: Output image.
"""
helper = LayerHelper("conv2d_transpose", **locals())
if not isinstance(input, Variable):
raise TypeError("Input of conv2d_transpose must be Variable")
input_channel = input.shape[1]
op_attr = dict()
if isinstance(padding, int):
op_attr['paddings'] = [padding, padding]
elif padding is not None:
op_attr['paddings'] = padding
if isinstance(stride, int):
op_attr['strides'] = stride
elif stride is not None:
op_attr['strides'] = stride
if filter_size is None:
if output_size is None:
raise ValueError("output_size must be set when filter_size is None")
if isinstance(output_size, int):
output_size = [output_size, output_size]
padding = op_attr.get('paddings', [0, 0])
stride = op_attr.get('strides', [1, 1])
h_in = input.shape[2]
w_in = input.shape[3]
filter_size_h = output_size[0] - \
(h_in - 1) * stride[0] + 2 * padding[0]
filter_size_w = output_size[1] - \
(w_in - 1) * stride[1] + 2 * padding[1]
filter_size = [filter_size_h, filter_size_w]
elif isinstance(filter_size, int):
filter_size = [filter_size, filter_size]
filter_shape = [input_channel, num_filters] + filter_size
img_filter = helper.create_parameter(
dtype=input.dtype, shape=filter_shape, attr=helper.param_attr)
out = helper.create_tmp_variable(dtype=input.dtype)
helper.append_op(
type='conv2d_transpose',
inputs={'Input': [input],
'Filter': [img_filter]},
outputs={'Output': out},
attrs=op_attr)
return out
from ..registry import register_layer
__all__ = [
'mean', 'mul', 'dropout', 'reshape', 'sigmoid', 'scale', 'transpose',
'sigmoid_cross_entropy_with_logits', 'elementwise_add', 'elementwise_div',
'elementwise_sub', 'elementwise_mul', 'clip', 'abs'
]
for _OP in set(__all__):
globals()[_OP] = register_layer(_OP)
from ..layer_helper import LayerHelper
__all__ = [
'create_tensor', 'cast', 'concat', 'sums', 'assign',
'fill_constant_batch_size_like', 'fill_constant', 'ones', 'zeros'
]
def create_tensor(dtype, name=None, main_program=None, startup_program=None):
helper = LayerHelper("create_tensor", **locals())
return helper.create_variable(name=helper.name, dtype=dtype)
def cast(x, dtype, main_program=None):
"""
This function takes in the input with input_dtype
and casts it to the output_dtype as the output.
"""
helper = LayerHelper('cast', **locals())
out = helper.create_tmp_variable(dtype=dtype)
helper.append_op(
type='cast',
inputs={'X': [x]},
outputs={'Out': [out]},
attrs={'in_dtype': x.dtype,
'out_dtype': out.dtype})
return out
def concat(input, axis, main_program=None, startup_program=None):
"""
This function concats the input along the axis mentioned
and returns that as the output.
"""
helper = LayerHelper('concat', **locals())
out = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='concat',
inputs={'X': input},
outputs={'Out': [out]},
attrs={'axis': axis})
return out
def sums(input, out=None, main_program=None, startup_program=None):
"""
This function takes in the input and performs the sum operation on it
and returns that as the output.
"""
helper = LayerHelper('sum', **locals())
if out is None:
out = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(type='sum', inputs={'X': input}, outputs={'Out': out})
return out
def assign(input, output, main_program=None, startup_program=None):
helper = LayerHelper('assign', **locals())
helper.append_op(
type='scale',
inputs={'X': [input]},
outputs={'Out': [output]},
attrs={'scale': 1.0})
return output
def fill_constant(shape,
dtype,
value,
out=None,
main_program=None,
startup_program=None):
"""
This function creates a tensor , with shape as mentioned in the input and
specified dtype and fills this up with a constant value that
comes in the input. It also sets the stop_gradient to be True.
"""
helper = LayerHelper("fill_constant", **locals())
if out is None:
out = helper.create_tmp_variable(dtype=dtype)
helper.append_op(
type='fill_constant',
inputs={},
outputs={'Out': [out]},
attrs={'shape': shape,
'dtype': out.dtype,
'value': float(value)})
out.stop_gradient = True
return out
def fill_constant_batch_size_like(input,
shape,
dtype,
value,
input_dim_idx=0,
output_dim_idx=0,
main_program=None,
startup_program=None):
helper = LayerHelper("fill_constant_batch_size_like", **locals())
out = helper.create_tmp_variable(dtype=dtype)
helper.append_op(
type='fill_constant_batch_size_like',
inputs={'Input': input},
outputs={'Out': [out]},
attrs={
'shape': shape,
'dtype': out.dtype,
'value': float(value),
'input_dim_idx': input_dim_idx,
'output_dim_idx': output_dim_idx
})
out.stop_gradient = True
return out
def ones(shape, dtype, main_program=None):
"""
This function performs the same function as fill_constant() declared above
with the constant value being 1.0.
"""
return fill_constant(value=1.0, **locals())
def zeros(shape, dtype, main_program=None):
"""
This function performs the same function as fill_constant() declared above
with the constant value being 0.0.
"""
return fill_constant(value=0.0, **locals())
from __future__ import print_function from __future__ import print_function
import numpy as np import sys
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.v2.fluid as fluid
import sys
def resnet_cifar10(input, depth=32): def resnet_cifar10(input, depth=32):
......
...@@ -150,7 +150,7 @@ def main(): ...@@ -150,7 +150,7 @@ def main():
crf_decode = fluid.layers.crf_decoding( crf_decode = fluid.layers.crf_decoding(
input=feature_out, param_attr=fluid.ParamAttr(name='crfw')) input=feature_out, param_attr=fluid.ParamAttr(name='crfw'))
precision, recall, f1_score = fluid.layers.chunk_eval( chunk_evaluator = fluid.evaluator.ChunkEvaluator(
input=crf_decode, input=crf_decode,
label=target, label=target,
chunk_scheme="IOB", chunk_scheme="IOB",
...@@ -176,20 +176,21 @@ def main(): ...@@ -176,20 +176,21 @@ def main():
batch_id = 0 batch_id = 0
for pass_id in xrange(PASS_NUM): for pass_id in xrange(PASS_NUM):
chunk_evaluator.reset(exe)
for data in train_data(): for data in train_data():
outs = exe.run(fluid.default_main_program(), cost, precision, recall, f1_score = exe.run(
fluid.default_main_program(),
feed=feeder.feed(data), feed=feeder.feed(data),
fetch_list=[avg_cost, precision, recall, f1_score]) fetch_list=[avg_cost] + chunk_evaluator.metrics)
avg_cost_val = np.array(outs[0]) pass_precision, pass_recall, pass_f1_score = chunk_evaluator.eval(
precision_val = np.array(outs[1]) exe)
recall_val = np.array(outs[2])
f1_score_val = np.array(outs[3])
if batch_id % 10 == 0: if batch_id % 10 == 0:
print("avg_cost=" + str(avg_cost_val)) print("avg_cost:" + str(cost) + " precision:" + str(
print("precision_val=" + str(precision_val)) precision) + " recall:" + str(recall) + " f1_score:" + str(
print("recall_val:" + str(recall_val)) f1_score) + " pass_precision:" + str(
print("f1_score_val:" + str(f1_score_val)) pass_precision) + " pass_recall:" + str(pass_recall)
+ " pass_f1_score:" + str(pass_f1_score))
# exit early for CI # exit early for CI
exit(0) exit(0)
......
import numpy as np import numpy as np
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.v2.fluid as fluid
from paddle.v2.fluid.layer_helper import LayerHelper
def lstm(x,
c_pre_init,
hidden_dim,
forget_bias=None,
main_program=None,
startup_program=None):
"""
This function helps create an operator for the LSTM (Long Short Term
Memory) cell that can be used inside an RNN.
"""
helper = LayerHelper('lstm_unit', **locals())
rnn = fluid.layers.StaticRNN()
with rnn.step():
c_pre = rnn.memory(init=c_pre_init)
x_t = rnn.step_input(x)
before_fc = fluid.layers.concat(
input=[x_t, c_pre],
axis=1,
main_program=main_program,
startup_program=startup_program)
after_fc = fluid.layers.fc(input=before_fc,
size=hidden_dim * 4,
main_program=main_program,
startup_program=startup_program)
dtype = x.dtype
c = helper.create_tmp_variable(dtype)
h = helper.create_tmp_variable(dtype)
helper.append_op(
type='lstm_unit',
inputs={"X": after_fc,
"C_prev": c_pre},
outputs={"C": c,
"H": h},
attrs={"forget_bias": forget_bias})
rnn.update_memory(c_pre, c)
rnn.output(h)
return rnn()
def lstm_net(dict_dim, class_dim=2, emb_dim=32, seq_len=80, batch_size=50): def lstm_net(dict_dim, class_dim=2, emb_dim=32, seq_len=80, batch_size=50):
...@@ -23,8 +68,7 @@ def lstm_net(dict_dim, class_dim=2, emb_dim=32, seq_len=80, batch_size=50): ...@@ -23,8 +68,7 @@ def lstm_net(dict_dim, class_dim=2, emb_dim=32, seq_len=80, batch_size=50):
c_pre_init = fluid.layers.fill_constant( c_pre_init = fluid.layers.fill_constant(
dtype=emb.dtype, shape=[batch_size, emb_dim], value=0.0) dtype=emb.dtype, shape=[batch_size, emb_dim], value=0.0)
c_pre_init.stop_gradient = False c_pre_init.stop_gradient = False
layer_1_out = fluid.layers.lstm( layer_1_out = lstm(emb, c_pre_init=c_pre_init, hidden_dim=emb_dim)
emb, c_pre_init=c_pre_init, hidden_dim=emb_dim)
layer_1_out = fluid.layers.transpose(x=layer_1_out, axis=[1, 0, 2]) layer_1_out = fluid.layers.transpose(x=layer_1_out, axis=[1, 0, 2])
prediction = fluid.layers.fc(input=layer_1_out, prediction = fluid.layers.fc(input=layer_1_out,
......
...@@ -147,7 +147,13 @@ class TestChunkEvalOp(OpTest): ...@@ -147,7 +147,13 @@ class TestChunkEvalOp(OpTest):
'Recall': np.asarray( 'Recall': np.asarray(
[recall], dtype='float32'), [recall], dtype='float32'),
'F1-Score': np.asarray( 'F1-Score': np.asarray(
[f1], dtype='float32') [f1], dtype='float32'),
'NumInferChunks': np.asarray(
[self.num_infer_chunks], dtype='int64'),
'NumLabelChunks': np.asarray(
[self.num_label_chunks], dtype='int64'),
'NumCorrectChunks': np.asarray(
[self.num_correct_chunks], dtype='int64')
} }
def setUp(self): def setUp(self):
......
...@@ -17,5 +17,19 @@ class TestReshapeOp(OpTest): ...@@ -17,5 +17,19 @@ class TestReshapeOp(OpTest):
self.check_grad(["X"], "Out") self.check_grad(["X"], "Out")
class TestReshapeOpDimInfer(OpTest):
def setUp(self):
self.op_type = "reshape"
self.inputs = {'X': np.random.random((10, 20)).astype("float32")}
self.attrs = {'shape': [4, -1, 5]}
self.outputs = {'Out': self.inputs['X'].reshape(self.attrs['shape'])}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(["X"], "Out")
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
import unittest
import numpy as np
from op_test import OpTest
from test_pool2d_op import max_pool2D_forward_naive
from test_pool2d_op import avg_pool2D_forward_naive
class TestSppOp(OpTest):
def setUp(self):
self.op_type = "spp"
self.init_test_case()
input = np.random.random(self.shape).astype("float32")
nsize, csize, hsize, wsize = input.shape
out_level_flatten = []
for i in xrange(self.pyramid_height):
bins = np.power(2, i)
kernel_size = [0, 0]
padding = [0, 0]
kernel_size[0] = np.ceil(hsize /
bins.astype("double")).astype("int32")
padding[0] = (
(kernel_size[0] * bins - hsize + 1) / 2).astype("int32")
kernel_size[1] = np.ceil(wsize /
bins.astype("double")).astype("int32")
padding[1] = (
(kernel_size[1] * bins - wsize + 1) / 2).astype("int32")
out_level = self.pool2D_forward_naive(input, kernel_size,
kernel_size, padding)
out_level_flatten.append(
out_level.reshape(nsize, bins * bins * csize))
if i == 0:
output = out_level_flatten[i]
else:
output = np.concatenate((output, out_level_flatten[i]), 1)
# output = np.concatenate(out_level_flatten.tolist(), 0);
self.inputs = {'X': input.astype('float32'), }
self.attrs = {
'pyramid_height': self.pyramid_height,
'pooling_type': self.pool_type
}
self.outputs = {'Out': output.astype('float32')}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
if self.pool_type != "avg":
self.check_grad(['X'], 'Out', max_relative_error=0.05)
def init_test_case(self):
self.shape = [3, 2, 4, 4]
self.pyramid_height = 3
self.pool2D_forward_naive = max_pool2D_forward_naive
self.pool_type = "max"
class TestCase2(TestSppOp):
def init_test_case(self):
self.shape = [3, 2, 4, 4]
self.pyramid_height = 3
self.pool2D_forward_naive = avg_pool2D_forward_naive
self.pool_type = "avg"
if __name__ == '__main__':
unittest.main()
...@@ -390,8 +390,6 @@ def pipe_reader(left_cmd, ...@@ -390,8 +390,6 @@ def pipe_reader(left_cmd,
if not callable(parser): if not callable(parser):
raise TypeError("parser must be a callable object") raise TypeError("parser must be a callable object")
process = subprocess.Popen(
left_cmd.split(" "), bufsize=bufsize, stdout=subprocess.PIPE)
# TODO(typhoonzero): add a thread to read stderr # TODO(typhoonzero): add a thread to read stderr
# Always init a decompress object is better than # Always init a decompress object is better than
...@@ -400,6 +398,8 @@ def pipe_reader(left_cmd, ...@@ -400,6 +398,8 @@ def pipe_reader(left_cmd,
32 + zlib.MAX_WBITS) # offset 32 to skip the header 32 + zlib.MAX_WBITS) # offset 32 to skip the header
def reader(): def reader():
process = subprocess.Popen(
left_cmd.split(" "), bufsize=bufsize, stdout=subprocess.PIPE)
remained = "" remained = ""
while True: while True:
buff = process.stdout.read(bufsize) buff = process.stdout.read(bufsize)
......
...@@ -145,5 +145,35 @@ class TestXmap(unittest.TestCase): ...@@ -145,5 +145,35 @@ class TestXmap(unittest.TestCase):
self.assertEqual(e, mapper(idx)) self.assertEqual(e, mapper(idx))
class TestPipeReader(unittest.TestCase):
def test_pipe_reader(self):
def simple_parser(lines):
return lines
import tempfile
records = [str(i) for i in xrange(5)]
temp = tempfile.NamedTemporaryFile()
try:
with open(temp.name, 'w') as f:
for r in records:
f.write('%s\n' % r)
cmd = "cat %s" % temp.name
reader = paddle.v2.reader.pipe_reader(
cmd, simple_parser, bufsize=128)
for i in xrange(4):
result = []
for r in reader():
result.append(r)
for idx, e in enumerate(records):
print e, result[idx]
self.assertEqual(e, result[idx])
finally:
# delete the temporary file
temp.close()
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -68,6 +68,7 @@ packages=['paddle', ...@@ -68,6 +68,7 @@ packages=['paddle',
'paddle.v2.plot', 'paddle.v2.plot',
'paddle.v2.fluid', 'paddle.v2.fluid',
'paddle.v2.fluid.proto', 'paddle.v2.fluid.proto',
'paddle.v2.fluid.layers',
'py_paddle'] 'py_paddle']
with open('@PADDLE_SOURCE_DIR@/python/requirements.txt') as f: with open('@PADDLE_SOURCE_DIR@/python/requirements.txt') as f:
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册