Commit f05330b7 authored by yangyaming

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-7195

@@ -37,6 +37,7 @@ Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddl
- Optimized math operations through SSE/AVX intrinsics, BLAS libraries
(e.g. MKL, OpenBLAS, cuBLAS) or customized CPU/GPU kernels.
- Optimized CNN networks through MKL-DNN library.
- Highly optimized recurrent networks which can handle **variable-length**
sequence without padding.
- Optimized local and distributed training for models with high dimensional
......
# Cluster Training Benchmark
## Setup
- Platform
- Kubernetes: v1.6.2
- Linux Kernel: v3.10.0
- Resource
- CPU: 10 Cores per Pod
- Memory: 5GB per Pod
- Docker Image
We use different base Docker images to run the benchmark on Kubernetes:
- PaddlePaddle v2: paddlepaddle/paddle:0.11.0
- PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
- TensorFlow: tensorflow/tensorflow:1.5.0-rc0
- Model
The VGG16 model is used in this benchmark.
## Cases
- Variable
- Batch Size of training data.
- PServer count of the training job.
- The number of trainers.
- Invariant
- The resources of each trainer/pserver Pod.
### Measure the Performance for Different Batch Size
- PServer Count: 40
- Trainer Count: 100
- Metrics: mini-batch / sec
| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |
### Measure the Performance for Different PServer Count
- Trainer Count: 100
- Batch Size: 64
- Metrics: mini-batch / sec
| PServer Count | 10 | 20 | 40 | 60 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |
### Measure Parallel Efficiency By Increasing Trainer Count
- PServer Count: 20
- Batch Size: 64
- Metrics:
$S = \frac{T_1}{T_N}$
where $S$ is the speedup, i.e. the ratio of $T_1$ to $T_N$, the training times with 1 and with N trainers respectively.
The parallel efficiency is:
$E = \frac{S}{N}$
| Trainer Count | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - | - | - | - | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - | - | - | - | - | - | - | - |
| TensorFlow | - | - | - | - | - | - | - | - | - | - | - |
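As a worked illustration of the two metrics above, here is a minimal C sketch. The function names are hypothetical and are not part of any benchmark tooling. The timings in the usage note are made-up inputs, not measured results.

```c
/* Speedup S = T1 / TN: how many times faster N trainers finish the same
 * training workload than a single trainer does. */
double speedup(double t1, double tn) {
    return t1 / tn;
}

/* Parallel efficiency E = S / N. E == 1.0 means perfectly linear scaling;
 * values below 1.0 mean some of the added trainers' capacity is lost,
 * for example to communication overhead. */
double parallel_efficiency(double t1, double tn, int n) {
    return speedup(t1, tn) / (double)n;
}
```

For instance, hypothetical timings of T1 = 1000 s and T10 = 125 s give S = 8 and E = 0.8.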
## Reproduce the benchmark
TODO
@@ -15,4 +15,4 @@ Fluid
fluid/param_attr.rst
fluid/profiler.rst
fluid/regularizer.rst
fluid/io.rst
===========
IO
===========
is_parameter
------------
.. autofunction:: paddle.v2.fluid.io.is_parameter
:noindex:
@@ -38,6 +38,16 @@ elementwise_add
.. autofunction:: paddle.v2.fluid.layers.elementwise_add
:noindex:
elementwise_sub
---------------
.. autofunction:: paddle.v2.fluid.layers.elementwise_sub
:noindex:
elementwise_mul
---------------
.. autofunction:: paddle.v2.fluid.layers.elementwise_mul
:noindex:
elementwise_div
---------------
.. autofunction:: paddle.v2.fluid.layers.elementwise_div
@@ -348,3 +358,132 @@ reduce_min
.. autofunction:: paddle.v2.fluid.layers.reduce_min
:noindex:
logsigmoid
----------
.. autofunction:: paddle.v2.fluid.layers.logsigmoid
:noindex:
exp
---
.. autofunction:: paddle.v2.fluid.layers.exp
:noindex:
relu
----
.. autofunction:: paddle.v2.fluid.layers.relu
:noindex:
tanh
----
.. autofunction:: paddle.v2.fluid.layers.tanh
:noindex:
tanh_shrink
-----------
.. autofunction:: paddle.v2.fluid.layers.tanh_shrink
:noindex:
softshrink
----------
.. autofunction:: paddle.v2.fluid.layers.softshrink
:noindex:
sqrt
----
.. autofunction:: paddle.v2.fluid.layers.sqrt
:noindex:
abs
----
.. autofunction:: paddle.v2.fluid.layers.abs
:noindex:
ceil
----
.. autofunction:: paddle.v2.fluid.layers.ceil
:noindex:
floor
-----
.. autofunction:: paddle.v2.fluid.layers.floor
:noindex:
round
-----
.. autofunction:: paddle.v2.fluid.layers.round
:noindex:
reciprocal
----------
.. autofunction:: paddle.v2.fluid.layers.reciprocal
:noindex:
log
---
.. autofunction:: paddle.v2.fluid.layers.log
:noindex:
square
------
.. autofunction:: paddle.v2.fluid.layers.square
:noindex:
softplus
--------
.. autofunction:: paddle.v2.fluid.layers.softplus
:noindex:
softsign
---------
.. autofunction:: paddle.v2.fluid.layers.softsign
:noindex:
brelu
-----
.. autofunction:: paddle.v2.fluid.layers.brelu
:noindex:
leaky_relu
----------
.. autofunction:: paddle.v2.fluid.layers.leaky_relu
:noindex:
soft_relu
---------
.. autofunction:: paddle.v2.fluid.layers.soft_relu
:noindex:
elu
----
.. autofunction:: paddle.v2.fluid.layers.elu
:noindex:
relu6
-----
.. autofunction:: paddle.v2.fluid.layers.relu6
:noindex:
pow
----
.. autofunction:: paddle.v2.fluid.layers.pow
:noindex:
hard_shrink
-----------
.. autofunction:: paddle.v2.fluid.layers.hard_shrink
:noindex:
thresholded_relu
----------------
.. autofunction:: paddle.v2.fluid.layers.thresholded_relu
:noindex:
hard_sigmoid
-------------
.. autofunction:: paddle.v2.fluid.layers.hard_sigmoid
:noindex:
swish
------
.. autofunction:: paddle.v2.fluid.layers.swish
:noindex:
@@ -202,8 +202,8 @@ This `OpDesc` value is in the `ops` field of the `BlockDesc` value representing
During the generation of the Protobuf message, the Block should store VarDesc (the Protobuf message which describes Variable) and OpDesc (the Protobuf message which describes Operator).
A VarDesc in a block should have its own name scope so that local variables do not affect the parent block's name scope.
A child block's name scope should inherit the parent's so that an OpDesc in the child block can reference a VarDesc stored in the parent block. For example:
```python
a = pd.Variable(shape=[20, 20])
```
......
# Design Doc: The Keys of Operator Kernel Type
## Problem
An operator can have different kernel implementations, and each operator has a map to store the related kernels. Fluid uses `OpKernelType` as a key to identify a unique kernel. Before an operator runs, a certain type of kernel must be chosen via a key of `OpKernelType`. Currently, `OpKernelType` is defined as follows:
```cpp
struct OpKernelType {
@@ -10,13 +10,13 @@ struct OpKernelType {
```
For more details, please refer to the [code](https://github.com/PaddlePaddle/Paddle/blob/2d5ec16bc8a09fb8e0f62c89b116b0cd1d333907/paddle/framework/operator.h#L348-L374) on GitHub.
It contains two keys, `Place` and `DataType`, which are hashed into a unique key representing a certain type of kernel. However, these two keys do not provide enough information; we need a more complete representation of `OpKernelType`.
We often implement a kernel of an operator with some computing library on a certain device (place). Note that computing libraries and devices do not have a one-to-one correspondence: a device can support many computing libraries, and a computing library can support different devices.
For example, the Eigen library supports Nvidia GPU/AMD GPU/CPU, and the MKLDNN library supports Intel CPU/Intel FPGA. Both `Place` and `Library` should be keys of `OpKernelType`.
Different DataTypes, such as fp64/fp32/int8, obviously have different kernels. But different data layouts of a Tensor also lead to different implementations; see the batch norm operator [kernels](https://github.com/PaddlePaddle/Paddle/blob/a948fac4d0ad7e0412d373b8aabeb711c2899563/paddle/operators/batch_norm_op.cc#L180-L209) as an example. Data layout should therefore also be taken into consideration.
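The argument above comes down to selecting a kernel by four keys instead of two. The following minimal C sketch uses hypothetical enums in place of Fluid's real C++ types; in Fluid, `Place` is a `boost::variant`, and the actual `OpKernelType` struct appears only partially in this diff. The sketch shows how four small keys can be packed into one value that is unique for each combination:

```c
#include <stdint.h>

/* Hypothetical C stand-ins for the four keys discussed above; they are
 * NOT Fluid's actual types, only an illustration of the idea. */
typedef enum { PLACE_CPU, PLACE_CUDA, PLACE_ROCM, PLACE_FPGA } Place;
typedef enum { LIB_PLAIN, LIB_MKLDNN, LIB_CUDNN } Library;
typedef enum { DTYPE_FP32, DTYPE_FP64, DTYPE_INT8 } DataType;
typedef enum { LAYOUT_NCHW, LAYOUT_NHWC } DataLayout;

typedef struct {
    Place place;
    Library library;
    DataType data_type;
    DataLayout layout;
} KernelKey;

/* Pack the four enums into one 32-bit value, 8 bits per field. Every enum
 * has fewer than 256 values, so two keys pack to the same value only if
 * all four fields are equal; the result can serve as a kernel-map key. */
uint32_t kernel_key_pack(KernelKey k) {
    return ((uint32_t)k.place << 24) | ((uint32_t)k.library << 16) |
           ((uint32_t)k.data_type << 8) | (uint32_t)k.layout;
}
```

Keying only on place and data type would make a plain fp32 CPU kernel and an MKLDNN fp32 CPU kernel collide. Adding the library and layout fields separates them.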
## Solution
@@ -31,17 +31,17 @@ struct OpKernelType {
```cpp
};
```
The details are as follows:
### Place
`Place` is defined as:
```cpp
typedef boost::variant<CUDAPlace, ROCmPlace, FPGAPlace, CPUPlace> Place;
```
`Place` represents the device memory where data is located.
### Library
@@ -52,10 +52,10 @@ One operator kernel is usually implemented based on one library. `Library` is de
```cpp
enum Library { Plain, MKLDNN, CUDNN };
```
We use the `Plain` enumerator to represent the default library. Since most operators in Fluid are implemented based on the `Eigen` library, we take the `Eigen` library as the `Plain` enumerator.
A library usually has a corresponding `DeviceContext` which contains some handles needed for computation. Fluid now has two default DeviceContexts, for CPU and CUDA: `CPUDeviceContext` and `CUDADeviceContext`. `CPUDeviceContext` contains an Eigen library handle, and `CUDADeviceContext` contains an Eigen library handle and a cuBLAS handle.
To support a new library, a new enumerator needs to be added to `Library`, and a corresponding new `LibraryDeviceContext` needs to be created.
### DataType
@@ -67,15 +67,15 @@ If we want to support new Library, a new enumerator need to be added to `Library
Actually, a Tensor is a view of a block of memory. Besides a pointer to the memory, we also need some other descriptions of this block of memory, such as shape (ddim), stride, and layout.
Different layouts lead to different implementations of the operator kernel. There are mainly 4 principles we have to follow to support layout in our Fluid framework.
- We take layout as a data member of Tensor. Layout is actually an enum variable. If Fluid is built with MKLDNN, the memory formats in MKLDNN will also be added to this enum variable.
- Users have to set the layout for input data. Some operators, like fill_constant/random, also have to set the layout of the data they generate. Of course, we can have a default layout, such as NCHW.
- The inference of Layout happens at run-time, not at compile-time.
- Every operator has to implement different kernels for different layouts. Taking MKLDNN as an example, to implement an MKLDNN convolution operator we have to implement kernels for all the different layouts, which are listed [here](http://01org.github.io/mkl-dnn/structmkldnn_1_1memory.html). We will also have a special macro to register kernels for MKLDNN operators.
`Layout` is also defined as an enum variable:
......
## Background
PaddlePaddle divides the description of neural network computation into two stages: compile time and runtime. At compile time, the neural network computation is described as a `ProgramDesc`, whereas at runtime an `Executor` interprets the `ProgramDesc` to compute the operations.
PaddlePaddle uses protobuf messages to describe the compile-time program because:
1. The computation program description must be serializable and saved in a file.
1. During distributed training, the serialized program will be sent to multiple workers. It should also be possible to break the program into different components, each of which can be executed on different workers.
The computation `Program` consists of nested `Blocks`. Each `Block` consists of data (i.e. `Variable`s) and `Operations`. The concepts used to represent them are listed in the table below.
| |compile time|runtime|
|---|---|---|
......
@@ -32,6 +32,16 @@ PaddlePaddle mainly uses `CMake <https://cmake.org>`_ and GCC, G++ as the build
pip install build/python/dist/*.whl
If PaddlePaddle is already installed on the machine, there are two options:
.. code-block:: bash
# 1. Uninstall the previous version first, then reinstall
pip uninstall paddlepaddle
pip install build/python/dist/*.whl
# 2. Upgrade directly to the newer version
pip install build/python/dist/*.whl -U
.. _run_test:
......
@@ -36,6 +36,16 @@ machine or copy it to the target machine.
pip install build/python/dist/*.whl
If PaddlePaddle is already installed on the machine, there are two options:
.. code-block:: bash
# 1. Uninstall the previous version, then reinstall
pip uninstall paddlepaddle
pip install build/python/dist/*.whl
# 2. Upgrade directly
pip install build/python/dist/*.whl -U
.. _run_test:
......
@@ -24,7 +24,7 @@
- `framework::OperatorWithKernel`: inherits from OperatorBase; an Op that has compute functions is said to have a Kernel.
- `class OpProtoAndCheckerMaker`: describes the Op's inputs, outputs, attributes and comments; mainly used to generate the Python API.
Depending on whether they contain a kernel, Ops fall into two kinds: Ops with a Kernel and Ops without a kernel. The former inherit from `OperatorWithKernel`, the latter from `OperatorBase`. This tutorial mainly covers how to write an Op with a Kernel. In brief, an Op needs to contain the following:
Content | Where it is defined
......
@@ -9,6 +9,7 @@
usage/cmd_parameter/index_cn.rst
usage/cluster/cluster_train_cn.md
usage/capi/index_cn.rst
Development Standards
---------------------
......
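## Building the PaddlePaddle Inference Library
### Overview
Running inference with the C-API requires compiling the PaddlePaddle core code into a link library. You only need to configure the following build options at compile time:
Required option:
- `WITH_C_API`: must be set to `ON`.
Recommended options:
- `WITH_PYTHON`: recommended `OFF`.
- `WITH_SWIG_PY`: recommended `OFF`.
- `WITH_GOLANG`: recommended `OFF`.
Optional options:
- `WITH_GPU`: `ON` or `OFF`.
- `WITH_MKL`: `ON` or `OFF`.
Set the recommended options as suggested to avoid linking unnecessary libraries. Set the other optional build options as needed.
The following snippet pulls the latest code from GitHub and configures the build options (replace PADDLE_ROOT with the installation path of the PaddlePaddle inference library):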
```shell
PADDLE_ROOT=/path/of/capi
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$PADDLE_ROOT \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_C_API=ON \
-DWITH_SWIG_PY=OFF \
-DWITH_GOLANG=OFF \
-DWITH_PYTHON=OFF \
-DWITH_MKL=OFF \
-DWITH_GPU=OFF \
..
```
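After the commands above generate the Makefile, run `make && make install`. After a successful build, all the dependencies needed to use the C-API are placed in the `PADDLE_ROOT` directory: (1) the compiled PaddlePaddle inference library and headers, and (2) the third-party libraries and headers.
The resulting directory structure under `PADDLE_ROOT` is shown below. It contains the compiled PaddlePaddle headers and libraries, plus the third-party libraries and headers; whether the latter are needed depends on the linking method.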
```text
├── include
│   └── paddle
│   ├── arguments.h
│   ├── capi.h
│   ├── capi_private.h
│   ├── config.h
│   ├── error.h
│   ├── gradient_machine.h
│   ├── main.h
│   ├── matrix.h
│   ├── paddle_capi.map
│   └── vector.h
├── lib
│   ├── libpaddle_capi_engine.a
│   ├── libpaddle_capi_layers.a
│   ├── libpaddle_capi_shared.so
│   └── libpaddle_capi_whole.a
└── third_party
├── gflags
│   ├── include
│   │   └── gflags
│   │   ├── gflags_completions.h
│   │   ├── gflags_declare.h
│   │   ...
│   └── lib
│   └── libgflags.a
├── glog
│   ├── include
│   │   └── glog
│   │   ├── config.h
│   │   ...
│   └── lib
│   └── libglog.a
├── openblas
│   ├── include
│   │   ├── cblas.h
│   │   ...
│   └── lib
│   ...
├── protobuf
│   ├── include
│   │   └── google
│   │   └── protobuf
│   │   ...
│   └── lib
│   └── libprotobuf-lite.a
└── zlib
├── include
│   ...
└── lib
...
```
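### Linking
Three linking methods are currently provided:
1. Link the shared library `libpaddle_capi_shared.so`
- When an inference program developed with the PaddlePaddle C-API links against `libpaddle_capi_shared.so`, note the following:
1. If the CPU version was built with the `OpenBLAS` math library, the inference program only needs to link the single library `libpaddle_capi_shared.so`.
1. If the CPU version was built with the `MKL` math library, the inference program must link the MKL libraries itself, because `MKL` ships its own separate shared library files.
1. If the GPU version was built, the CUDA libraries are loaded dynamically when the inference program runs, so their location must be added to the `LD_LIBRARY_PATH` environment variable.
- This method is the simplest, and linking is comparatively easy. **Unless you have special requirements, this method is recommended.**
2. Link the static library `libpaddle_capi_whole.a`
- When an inference program developed with the PaddlePaddle C-API links against `libpaddle_capi_whole.a`, note the following:
1. The `-Wl,--whole-archive` linker option must be specified.
1. Third-party libraries such as `gflags`, `glog`, `libz` and `protobuf` must be linked explicitly; they can be found under `PADDLE_ROOT/third_party`.
1. If the C-API was built with the OpenBLAS math library, `libopenblas.a` must be linked explicitly.
1. If the C-API was built with the MKL math library, the MKL shared libraries must be linked explicitly.
3. Link the static libraries `libpaddle_capi_layers.a` and `libpaddle_capi_engine.a`
- When an inference program developed with the PaddlePaddle C-API links against `libpaddle_capi_layers.a` and `libpaddle_capi_engine.a`, note the following:
1. This linking method is mainly intended for inference on mobile devices.
1. To reduce the size of the generated library, `libpaddle_capi_whole.a` is split into the two static libraries above.
1. Link with `-Wl,--whole-archive -lpaddle_capi_layers` and `-Wl,--no-whole-archive -lpaddle_capi_engine`.
1. Third-party dependencies must be linked explicitly in the same way as in method 2.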
PaddlePaddle C-API
==================
.. toctree::
:maxdepth: 1
compile_paddle_lib_cn.md
organization_of_the_inputs_cn.md
workflow_of_capi_cn.md
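## Organizing Input/Output Data
This document describes how to organize input data when using the PaddlePaddle C-API, and how to parse the output of the neural network's forward computation.
### Input/Output Data Types
In the C-API, following how the basic data types are defined and implemented inside PaddlePaddle, input data falls into two categories:
1. One-dimensional integer arrays
1. Two-dimensional floating-point matrices
- Dense matrices
- Sparse matrices
Notes:
1. One-dimensional arrays **only support integer values**. They are:
- commonly used in natural language processing tasks, e.g. to represent the index of a word in the dictionary;
- used for class labels in classification tasks.
1. Data with more than two logical dimensions (e.g. multi-channel images, video) is converted to a two-dimensional matrix in the implementation. Each domain has standard ways to perform this conversion; users need to understand them and do the conversion themselves.
1. A two-dimensional matrix can represent both row vectors and column vectors. Whenever a floating-point array (vector) is needed, represent it with a C-API matrix, not with a C-API one-dimensional array.
1. For both one-dimensional integer arrays and two-dimensional floating-point matrices, **attaching sequence information turns them into sequence inputs. PaddlePaddle determines whether a vector/matrix is a sequence by checking whether the data carries sequence information**. For non-sequence inputs, there is no need to handle sequence information. What "sequence information" means is explained in detail below.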
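The note on data with more than two dimensions leaves the conversion to the user. The following minimal C sketch shows one common case: flattening an interleaved H×W×C image, which is the order many image decoders produce, into a single CHW-ordered row of a dense matrix. The function name is hypothetical. CHW is an assumption: the flattening order must match whatever order the network was trained with.

```c
#include <stddef.h>

/* Flatten one interleaved H x W x C image into a single matrix row of
 * length C*H*W in CHW (channel-major) order. */
void hwc_to_chw_row(const float *hwc, size_t height, size_t width,
                    size_t channels, float *row) {
    for (size_t h = 0; h < height; ++h)
        for (size_t w = 0; w < width; ++w)
            for (size_t c = 0; c < channels; ++c)
                /* source index: pixel (h, w), channel c, interleaved;
                 * destination index: plane c, then pixel (h, w). */
                row[(c * height + h) * width + w] =
                    hwc[(h * width + w) * channels + c];
}
```

A batch of N images would then fill N rows of the matrix, one image per row, since the matrix height is the number of samples.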
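### Basic Concepts
- Inside PaddlePaddle, the input/output of a computation layer in a neural network is organized as an `Argument` struct. If the network has multiple inputs or multiple outputs, each input/output has its own `Argument`.
- An `Argument` does not actually "store" data; it ties the input/output information together.
- Inside an `Argument`, the data is actually stored in an `IVector` (the one-dimensional integer array mentioned above) and a `Matrix` (the two-dimensional floating-point matrix mentioned above); the sequence information of the input/output is described by `Sequence Start Positions` (explained in detail below).
- **Note**:
1. The rest of this document uses `argument` to refer specifically to one input/output of a neural network computation layer in PaddlePaddle.
1. `paddle_ivector` refers specifically to the one-dimensional integer array in PaddlePaddle.
1. `paddle_matrix` refers specifically to the two-dimensional floating-point matrix in PaddlePaddle.
### Organizing Input Data
- One-dimensional integer array
Conceptually, a `paddle_ivector` can be thought of as a one-dimensional integer array. It is typically used to represent discrete class labels or, in natural language processing tasks, the indices of words in a dictionary. The code snippet below creates a `paddle_ivector` containing the three elements `1`, `2` and `3`.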
```c
int ids[] = {1, 2, 3};
paddle_ivector ids_array =
paddle_ivector_create(ids, sizeof(ids) / sizeof(int), false, false);
CHECK(paddle_arguments_set_ids(in_args, 0, ids_array));
```
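- **Dense matrix**
- An `m×n` dense matrix is a rectangular array of floating-point elements arranged in `m` rows and `n` columns. For a neural network, the height `m` is the number of samples accepted in one prediction, and the width `n` is the `size` of the `paddle.layer.data` defined in the network.
- The code snippet below creates a dense matrix of height 1 and width `layer_size`, whose elements are randomly generated.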
```c
paddle_matrix mat = paddle_matrix_create(
/* height = batch size */ 1,
/* width = dimensionality of the data layer */ layer_size,
/* whether to use GPU */ false);
paddle_real* array;
// Get the pointer pointing to the start address of the first row of the
// created matrix.
CHECK(paddle_matrix_get_row(mat, 0, &array));
// Fill the matrix with a randomly generated test sample.
srand(time(0));
for (int i = 0; i < layer_size; ++i) {
array[i] = rand() / ((float)RAND_MAX);
}
// Assign the matrix to the argument.
CHECK(paddle_arguments_set_value(in_args, 0, mat));
```
- **Sparse matrix**
Sparse matrices in the PaddlePaddle C-API are stored in [CSR (Compressed Sparse Row)](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format)) format. The figure below illustrates how a sparse matrix is stored in CSR.
<p align="center">
<img src="https://user-images.githubusercontent.com/5842774/34159369-009fd328-e504-11e7-9e08-36bc6dc5e505.png" width=700><br> Figure 1. Sparse matrix storage in CSR format
</p>
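The CSR format determines the contents of a sparse matrix with three arrays: (1) the values of the non-zero elements (`values` in the figure above); (2) the row offsets (`row offsets` in the figure above), i.e. the starting offset of each row's elements within `values`, where `row offsets` always has one more element than the matrix has rows; and (3) the column indices of the non-zero elements (`column indices` in the figure above).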
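To make the three CSR arrays concrete, the following self-contained C sketch builds them from a small row-major dense matrix. It is a hypothetical helper, not part of the C-API. The `int` offset/index arrays and `float` values mirror the array types used in the C-API snippets below, so the output is in the form those snippets pass to `paddle_matrix_sparse_copy_from`.

```c
#include <stddef.h>

/* Build the CSR arrays for a dense rows x cols matrix stored row-major.
 * row_offsets must hold rows + 1 entries; col_indices and values must
 * hold at least as many entries as there are non-zeros.
 * Returns the number of non-zeros (nnz). */
size_t dense_to_csr(const float *dense, size_t rows, size_t cols,
                    int *row_offsets, int *col_indices, float *values) {
    size_t nnz = 0;
    for (size_t r = 0; r < rows; ++r) {
        row_offsets[r] = (int)nnz;          /* row r starts here in values */
        for (size_t c = 0; c < cols; ++c) {
            float v = dense[r * cols + c];
            if (v != 0.0f) {
                col_indices[nnz] = (int)c;
                values[nnz] = v;
                ++nnz;
            }
        }
    }
    row_offsets[rows] = (int)nnz;           /* one past the last row */
    return nnz;
}
```

For the 2×3 matrix `[[0, 2, 0], [1, 0, 3]]`, this gives row offsets `{0, 1, 3}`, column indices `{1, 0, 2}` and values `{2, 1, 3}`.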
In the PaddlePaddle C-API, a sparse matrix is created by calling the following function:
```c
PD_API paddle_matrix paddle_matrix_create_sparse(
uint64_t height, uint64_t width, uint64_t nnz, bool isBinary, bool useGpu);
```
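1. When creating a sparse matrix, you must explicitly specify its (1) height (`height`, which in a neural network equals the number of samples processed in one prediction), (2) width (`width`, the `size` of `paddle.layer.data`), and (3) number of non-zero elements (`nnz`).
1. When the fourth parameter `isBinary` of the function above is `true`, **only the row offsets (`row_offset`) and column indices (`column indices`) need to be set, and no element values (`values`) are required**. In this case, every element specified by the row offsets and column indices has the value 1.
The code snippet below creates a binary sparse matrix on the CPU: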
```c
paddle_matrix mat = paddle_matrix_create_sparse(1, layer_size, nnz, true, false);
int colIndices[] = {9, 93, 109}; // layer_size here is greater than 109.
int rowOffset[] = {0, sizeof(colIndices) / sizeof(int)};
CHECK(paddle_matrix_sparse_copy_from(mat,
rowOffset,
sizeof(rowOffset) / sizeof(int),
colIndices,
sizeof(colIndices) / sizeof(int),
NULL /* values array is NULL. */,
0 /* size of the values array is 0. */));
CHECK(paddle_arguments_set_value(in_args, 0, mat));
```
The code snippet below creates a sparse matrix with element values on the CPU:
```c
paddle_matrix mat = paddle_matrix_create_sparse(1, layer_size, nnz, false, false);
int colIndices[] = {9, 93, 109}; // layer_size here is greater than 109.
int rowOffset[] = {0, sizeof(colIndices) / sizeof(int)};
float values[] = {0.5, 0.5, 0.5};
CHECK(paddle_matrix_sparse_copy_from(mat,
rowOffset,
sizeof(rowOffset) / sizeof(int),
colIndices,
sizeof(colIndices) / sizeof(int),
values,
sizeof(values) / sizeof(float)));
```
Notes:
1. Inference on mobile devices **does not support** sparse matrices or the related functions.
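### Organizing Sequence Information
Multiple elements (integers, floating-point numbers, floating-point vectors, etc.) arranged in a row form a sequence, and the order of the elements is important information carried by the sequence. Different sequences may contain different numbers of elements. In PaddlePaddle, sequence input/output data is **the data input introduced above (a one-dimensional integer array or a two-dimensional floating-point matrix) with sequence information attached**. The following explains in detail what "sequence information" is.
We call all the input samples a neural network accepts in one computation a `batch` (which may contain one or more samples). The offset of each sequence within the whole `batch` is what PaddlePaddle calls **sequence information**, known as "sequence start positions". PaddlePaddle supports two types of sequences:
1. Single-level sequences
- Each element of the sequence is a non-sequence: it is the basic unit of computation and cannot be split further.
- Example: a sentence in natural language is a sequence whose elements are words.
1. Nested (two-level) sequences
- Each element of the sequence is itself a sequence.
- Example: a paragraph in natural language is a nested sequence: the paragraph is a sequence of sentences, and each sentence is a sequence of words.
- Nested sequences are useful for tasks that process long sequences or for building hierarchical models.
The rest of this document uses `sequence_start_positions` to refer specifically to the sequence information carried by the input/output of a neural network computation layer in PaddlePaddle.
For a nested sequence, it is not enough to provide the offset of each outer sequence within the whole `batch`. Each outer sequence contains several inner sequences, and the offset of each inner sequence within the whole `batch` must also be provided. In other words, **a nested sequence needs separate `sequence_start_positions` for the outer sequences and for the inner sequences**.
**Note:**
1. No matter how much memory each element of a sequence actually occupies, the offsets in `sequence_start_positions` are counted in units of "one element of the sequence", not in units of storage size relative to the starting address of the `batch`.
1. Non-sequence inputs do not carry `sequence_start_positions`, so there is no need to construct it for them.
1. **The sequence information of both single-level and nested sequences is stored in a `paddle_ivector` (the one-dimensional integer array in PaddlePaddle).**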
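Since `sequence_start_positions` are cumulative element counts, they can be computed from the sequence lengths with a running sum. The following is a minimal C sketch under that reading. The helper name is hypothetical and not part of the C-API. It rejects zero-length sequences, because empty sequences are not supported as inputs. For lengths 5, 3, 2 and 4 it yields `[0, 5, 8, 10, 14]`, which is the kind of array that would then be wrapped with `paddle_ivector_create`.

```c
#include <stddef.h>

/* Compute sequence_start_positions for a single-level sequence batch.
 * positions must hold num_seqs + 1 entries: positions[i] is where
 * sequence i starts (counted in sequence elements, not bytes), and
 * positions[num_seqs] is the total number of elements in the batch.
 * Returns 0 on success, -1 if any sequence length is not positive. */
int seq_start_positions(const int *lengths, size_t num_seqs, int *positions) {
    positions[0] = 0;
    for (size_t i = 0; i < num_seqs; ++i) {
        if (lengths[i] <= 0) return -1;   /* empty sequences not allowed */
        positions[i + 1] = positions[i] + lengths[i];
    }
    return 0;
}
```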
Figure 2 illustrates how single-level and nested sequences are stored in PaddlePaddle.
<p align="center">
<img src="https://user-images.githubusercontent.com/5842774/34159714-1f81a9be-e505-11e7-8a8a-4902146ec899.png" width=800><br>Figure 2. Sequence inputs
</p>
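- Single-level sequence
Figure 2 (a) shows a `batch` input containing 4 sequences:
1. The lengths of the 4 sequences are 5, 3, 2 and 4;
1. The `sequence_start_positions` is therefore `[0, 5, 8, 10, 14]`;
1. Whether the data field is of type `paddle_ivector` or `paddle_matrix`, you can attach sequence information to the original data input by calling the function below, turning it into a single-level sequence input: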
```c
int seq_pos_array[] = {0, 5, 8, 10, 14};
paddle_ivector seq_pos = paddle_ivector_create(
seq_pos_array, sizeof(seq_pos_array) / sizeof(int), false, false);
// Suppose the network only has one input data layer.
CHECK(paddle_arguments_set_sequence_start_pos(in_args, 0, 0, seq_pos));
```
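- Nested sequence
Figure 2 (b) shows a `batch` input containing 4 sequences:
1. The lengths of the 4 sequences are 5, 3, 2 and 4, and they contain 3, 2, 1 and 2 subsequences respectively;
1. In this case, both of the following must be provided:
- the start offsets of the outer sequences in the `batch`: `[0, 5, 8, 10, 14]`;
- the start offsets of the inner sequences in the `batch`: `[0, 2, 3, 5, 7, 8, 10, 13, 14]`;
1. Whether the data field is of type `paddle_ivector` or `paddle_matrix`, you need to call the functions that create sequence information and set it on the `argument` **twice**, once for the outer sequences and once for the inner sequences, which turns the input into a nested sequence input: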
```c
// Set the sequence start positions for the outer sequences.
int outer_seq_pos_array[] = {0, 5, 8, 10, 14};
paddle_ivector outer_seq_pos =
paddle_ivector_create(outer_seq_pos_array,
sizeof(outer_seq_pos_array) / sizeof(int),
false,
false);
// The third parameter of this API indicates the sequence level:
// 0 for the outer sequence, 1 for the inner sequence.
// If the input is a plain (non-nested) sequence, the third parameter is
// always 0.
CHECK(paddle_arguments_set_sequence_start_pos(in_args, 0, 0, outer_seq_pos));
// Set the sequence start positions for the inner sequences.
int inner_seq_pos_array[] = {0, 2, 3, 5, 7, 8, 10, 13, 14};
paddle_ivector inner_seq_pos = paddle_ivector_create(
inner_seq_pos_array, sizeof(inner_seq_pos_array) / sizeof(int), false, false);
// The third parameter of this API indicates the sequence level:
// 0 for the outer sequence, 1 for the inner sequence.
CHECK(paddle_arguments_set_sequence_start_pos(in_args, 0, 1, inner_seq_pos));
```
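The two position arrays above are consistent by construction: each outer offset equals the end of that outer sequence's last inner sequence. The following minimal C sketch is a hypothetical helper, not a C-API function. It derives both arrays from two inputs: the inner-sequence lengths, here 2, 1, 2, 2, 1, 2, 3, 1 as read off the inner offsets above, and the number of inner sequences in each outer sequence (3, 2, 1, 2).

```c
#include <stddef.h>

/* Compute both levels of sequence_start_positions for a nested sequence.
 *   inner_lengths: length of every inner sequence, in batch order
 *   subseq_counts: number of inner sequences in each outer sequence
 *   num_outer:     number of outer sequences
 * outer_pos needs num_outer + 1 entries; inner_pos needs
 * (total number of inner sequences) + 1 entries.
 * Returns the number of inner sequences, or -1 on an empty sequence. */
int nested_start_positions(const int *inner_lengths, const int *subseq_counts,
                           size_t num_outer, int *outer_pos, int *inner_pos) {
    int inner_idx = 0;
    outer_pos[0] = 0;
    inner_pos[0] = 0;
    for (size_t o = 0; o < num_outer; ++o) {
        if (subseq_counts[o] <= 0) return -1;
        for (int s = 0; s < subseq_counts[o]; ++s, ++inner_idx) {
            if (inner_lengths[inner_idx] <= 0) return -1;
            inner_pos[inner_idx + 1] =
                inner_pos[inner_idx] + inner_lengths[inner_idx];
        }
        /* An outer sequence ends exactly where its last inner one ends. */
        outer_pos[o + 1] = inner_pos[inner_idx];
    }
    return inner_idx;
}
```

With the inputs above, this reproduces `[0, 5, 8, 10, 14]` and `[0, 2, 3, 5, 7, 8, 10, 13, 14]`.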
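Notes:
1. When a `batch` contains multiple sequences, **sequences of length `0` (i.e. empty inputs) are not supported** as input. Different computation layers may handle empty inputs differently, which can lead to undefined behavior or runtime errors, so validate the input before feeding it in.
### Python-Side Data Types
The table below maps the data types exposed by the Python training API (the values of the `type` field of `paddle.layer.data`) to the data types that must be created when calling the C-API: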
<html>
<table border="2" frame="border">
<table>
<thead>
<tr>
<th style="text-align:left">Python data type</th>
<th style="text-align:left">C-API input data type</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">paddle.data_type.integer_value</td>
<td style="text-align:left">Integer array; no sequence information needed</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.dense_vector</td>
<td style="text-align:left">Floating-point dense matrix; no sequence information needed</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.sparse_binary_vector</td>
<td style="text-align:left">Floating-point sparse matrix; non-zero values are not provided and default to 1; no sequence information needed</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.sparse_vector</td>
<td style="text-align:left">Floating-point sparse matrix; non-zero values must be provided; no sequence information needed</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.integer_value_sequence</td>
<td style="text-align:left">Integer array; sequence information required</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.dense_vector_sequence</td>
<td style="text-align:left">Floating-point dense matrix; sequence information required</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.sparse_binary_vector_sequence</td>
<td style="text-align:left">Floating-point sparse matrix; non-zero values are not provided and default to 1; sequence information required</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.sparse_vector_sequence</td>
<td style="text-align:left">Floating-point sparse matrix; non-zero values must be provided; sequence information required</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.integer_value_sub_sequence</td>
<td style="text-align:left">Integer array; nested sequence information required</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.dense_vector_sub_sequence</td>
<td style="text-align:left">Floating-point dense matrix; nested sequence information required</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.sparse_binary_vector_sub_sequence</td>
<td style="text-align:left">Floating-point sparse matrix; non-zero values are not provided and default to 1; nested sequence information required</td>
</tr>
<tr>
<td style="text-align:left">paddle.data_type.sparse_vector_sub_sequence</td>
<td style="text-align:left">Floating-point sparse matrix; non-zero values must be provided; nested sequence information required</td>
</tr>
</tr>
</tbody>
</table>
</html>
<br>
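### Output Data
In PaddlePaddle, the output data of a computation layer is organized exactly like the input data. An output is likewise organized as an `argument`, which stores its data in a `paddle_matrix` or a `paddle_ivector`; if the output is a sequence, it also carries `sequence_start_positions`. Call the relevant C-API functions to read the results you need.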
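As one example of reading the results you need, consider a classifier such as the MNIST digit recognizer. Its output is typically a dense matrix with one row per sample and one column per class. The helper below is hypothetical and assumes the output rows have already been copied into a row-major `float` buffer, for instance row by row with `paddle_matrix_get_row` as in the dense-matrix snippet above. The element type there is `paddle_real`; treating it as `float` is an assumption.

```c
#include <stddef.h>

/* For a height x width output matrix (one row per sample, one column per
 * class), store the index of the highest-scoring class of each row in
 * labels. Ties keep the lowest index. */
void argmax_rows(const float *output, size_t height, size_t width,
                 int *labels) {
    for (size_t r = 0; r < height; ++r) {
        const float *row = output + r * width;
        size_t best = 0;
        for (size_t c = 1; c < width; ++c) {
            if (row[c] > row[best]) best = c;
        }
        labels[r] = (int)best;
    }
}
```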
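### Summary
- Inside PaddlePaddle, the input/output of a computation layer in a neural network is organized as an `argument`.
- An `argument` does not actually "store" data; it ties the input/output information together.
- Inside an `argument`, the data is actually stored in a `paddle_ivector` (one-dimensional integer array) and a `paddle_matrix` (two-dimensional floating-point matrix).
If the input/output is a sequence, its sequence information is recorded in `sequence start positions`.
So, when organizing the network inputs, you need to do the following:
1. Create an `argument` for each input/output.
- For the C-API functions that operate on `argument`, see [arguments.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/arguments.h).
1. Create a `paddle_matrix` or `paddle_ivector` for each `argument` to store the data.
- For the C-API functions that operate on `paddle_ivector`, see [vector.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/vector.h).
- For the C-API functions that operate on `paddle_matrix`, see [matrix.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/matrix.h).
1. If the input is sequence data, create and fill in the `sequence_start_positions` information.
- Call [`paddle_arguments_set_sequence_start_pos`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/arguments.h#L137) to add sequence information to an `argument`.
- Call [`paddle_arguments_get_sequence_start_pos`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/arguments.h#L150) to read the sequence information of an `argument`.
- For the interface documentation, see the [arguments.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/arguments.h) file.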
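## C-API Workflow
This document describes the overall workflow of using the PaddlePaddle C-API.
### Workflow
The workflow of using the C-API is shown in Figure 1. It consists of two major parts: (1) preparing the inference model and (2) developing the inference program.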
<p align="center">
<img src="https://user-images.githubusercontent.com/5842774/34658453-365f73ea-f46a-11e7-9b3f-0fd112b27bae.png" width=500><br> Figure 1. Overview of the C-API workflow
</p>
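- Prepare the inference model
  1. Serialize only the neural network structure.
     - Only the network structure is serialized; loading the model then requires specifying both the serialized network structure and the directory that stores the model parameters.
  1. Merge the network structure definition and the (multiple) parameter files saved at the end of training into a single file.
     - The network structure and the trained model are serialized and merged into one file.
     - Only one file needs to be loaded at inference time, which makes deployment easier.
  - **Note**: choose only one of the two approaches above.
- Develop the inference program with the C-API
  1. Initialize the PaddlePaddle runtime environment.
  1. Load the inference model.
  1. Create the network inputs and organize the input data.
  1. Run the forward computation and obtain the results.
  1. Clean up and exit.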
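### Preparing the Inference Model
We use the handwritten digit recognition task as the running example for preparing an inference model. The task defines a [simple fully connected network with two hidden layers](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.cn.md#softmax回归softmax-regression) that takes an image as input and classifies it into one of the class labels 0 to 9. The complete code is in the scripts in [this directory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense).
Developing an inference program with the C-API requires a trained model. Run the [mnist_v2.py](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/examples/model_inference/dense/mnist_v2.py) script in the [MNIST handwritten digit recognition directory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense) by executing `python mnist_v2.py` in a terminal; it trains on PaddlePaddle's built-in [MNIST dataset](http://yann.lecun.com/exdb/mnist/). By default, the trained model is saved in the `models` directory under the current working directory.
Next, we convert the model saved at the end of training into an inference model.
1. Serialize the neural network model configuration
PaddlePaddle uses protobuf to transmit the network structure and related parameters defined in the network configuration file. For inference with the C-API, the network structure must be serialized with protobuf and written to a file.
The `dump_v2_config` function in [`paddle.utils.dump_v2_config`](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/utils/dump_v2_config.py) dumps a neural network structure defined with the PaddlePaddle V2 API to a specified file, for example: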
```python
from paddle.utils.dump_v2_config import dump_v2_config
from mnist_v2 import network
predict = network(is_infer=True)
dump_v2_config(predict, "trainer_config.bin", True)
```
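For the [handwritten digit recognition](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense) example, the [`mnist_v2.py`](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense/mnist_v2.py) script already includes the step that serializes the network structure. Run `python mnist_v2.py --task dump_config` to serialize it; the result is written to `trainer_config.bin` in the current working directory.
With this approach, **all learnable parameters of the network must be placed in the same directory at runtime**; the C-API loads the trained model when given the serialized network structure file and the parameter directory separately.
2. Merge the model files (optional)
In some cases, for ease of deployment, you may want to package the serialized network structure and the trained model parameters into a single file. For this, use the `merge_v2_model` function in `paddle.utils.merge_model`, which serializes the network structure together with the trained parameters and writes the result into one file.
Example code: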
```python
from paddle.utils.merge_model import merge_v2_model
from mnist_v2 import network
net = network(is_infer=True)
param_file = "models/params_pass_4.tar"
output_file = "output.paddle.model"
merge_v2_model(net, param_file, output_file)
```
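For the [handwritten digit recognition](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense) example, you can directly run `python` [merge_v2_model.py](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense/merge_v2_model.py). The serialized result is written to `output.paddle.model` in the current working directory. With this approach, the C-API loads the inference model at runtime from the path of `output.paddle.model`.
#### Notes
1. To use the C-API, the `binary` argument must be set to `True` when calling `dump_v2_config` to serialize the network structure.
1. **The network structure used for inference often differs from the one used for training.** You usually need to remove (1) the class label layer, (2) the loss function layer, and (3) any `evaluator`, keeping only the core computation layers. Check whether your network structure needs to be modified.
1. At inference time, you can obtain the forward results of any number (one or more) of layers defined in the network. Put the layers whose results you need into a Python list and pass that list as the first argument of `dump_v2_config`.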
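### Writing the Inference Code
For more detailed example code, see the examples in the [C-API examples](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference) directory. This section explains the five steps of writing inference code shown in Figure 1.
#### step 1. Initialize the PaddlePaddle runtime environment
First, call [`paddle_init`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/main.h#L27) to initialize the PaddlePaddle runtime environment. It takes two arguments: the number of arguments and the argument list.
#### step 2. Load the model
This step introduces an important C-API concept: the Gradient Machine.
Conceptually, inside PaddlePaddle an object of the GradientMachine class manages a set of computation layers (PaddlePaddle Layers) to perform forward and backward computation, and handles all the related details. Inference through the C-API needs only the forward computation, never the backward one. In the rest of this document, `gradient machine` refers specifically to a GradientMachine object created through the PaddlePaddle C-API. Each `gradient machine` manages a trained model. The C-API provides two common ways to load a model:
1. Call [`paddle_gradient_machine_load_parameter_from_disk`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/gradient_machine.h#L61) to load the inference model from disk. The `gradient machine` then owns its own copy of the trained model.
1. Call [`paddle_gradient_machine_create_shared_param`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/gradient_machine.h#L88) to share an inference model already loaded by another `gradient machine`. This is common in multi-threaded inference, where threads share one model to reduce memory usage. See [this example](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/examples/model_inference/multi_thread/main.c).
- Notes
  1. When training with the PaddlePaddle V2 API, all learnable parameters of the model are saved as one compressed file. You must decompress it manually and place the parameters in a single directory; the C-API does not load the compressed file saved by the V2 API directly.
  1. If you used the `merge model` approach to serialize the network structure and trained parameters into one file, refer to this [example](https://github.com/PaddlePaddle/Mobile/blob/develop/Demo/linux/paddle_image_recognizer.cpp#L59).
  1. By combining the two functions above, models can be loaded in other ways as well; for example, another model can be loaded while the program is running.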
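The first note above requires unpacking the V2 parameter archive by hand. In `mnist_v2.py`, each pass is saved by writing `trainer.save_parameter_to_tar` into a `gzip.open(...)` stream, so the file is a gzip-compressed tar despite its `.tar` name. The sketch below unpacks it with Python's standard `tarfile` module; `extract_v2_params` is a hypothetical helper, and the target directory `models/pass_4` is only assumed to match `MODEL_PATH` in the dense inference demo. Only extract archives you trust.

```python
import os
import tarfile


def extract_v2_params(tar_path, out_dir):
    """Unpack a gzip-compressed V2 parameter archive into one directory.

    Returns the sorted names of the extracted regular files, so the caller
    can check that every learnable parameter ended up in `out_dir`.
    """
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as archive:
        # Keep regular files only; the C-API expects a flat parameter directory.
        members = [m for m in archive.getmembers() if m.isfile()]
        archive.extractall(out_dir, members=members)
    return sorted(m.name for m in members)


# Example for the MNIST demo (paths are assumptions, adjust as needed):
# extract_v2_params("models/params_pass_4.tar", "models/pass_4")
```

The directory passed to `paddle_gradient_machine_load_parameter_from_disk` would then be `out_dir`.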
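#### step 3. Create the network inputs and organize the input data
Basic concepts:
- Inside PaddlePaddle, the input and output of a computation layer are organized as an `Argument` struct. If the network has multiple inputs or outputs, each input/output has its own `Argument`.
- An `Argument` does not actually "store" data; it organizes the input/output data into one structure.
- Inside an `Argument`, the data is actually stored in: 1. a `Matrix` (two-dimensional matrix storing float inputs/outputs); 2. an `IVector` (one-dimensional array, **used only for integer values**, mostly in natural language processing tasks).
For all input data types supported by the C-API and how they are organized, see the section "Input/Output Data Organization".
In the rest of this document, `argument` refers specifically to one input/output of a neural network in the PaddlePaddle C-API, and `paddle_matrix` refers **specifically** to the `Matrix` object an `argument` uses to store its data.
When organizing the network inputs and retrieving the outputs, you need to complete the following tasks:
1. Create an `argument` for each input/output.
1. Create a `paddle_matrix` for each `argument` to store the data.
Unlike inputs, output `argument`s do not need memory allocated for their `paddle_matrix` objects through the C-API: after the forward computation, PaddlePaddle has already allocated and manages the storage for each layer's output.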
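For a dense input, the demo creates the input `paddle_matrix` with height = batch size and width = the dimensionality of the data layer (784 for MNIST), then fills each row through the pointer returned by `paddle_matrix_get_row`. The plain-Python sketch below models that layout as a contiguous row-major buffer so the sample-to-row mapping is explicit. It does not call the C-API; `pack_batch` and `get_row` are hypothetical helpers, and the contiguous row-major storage is our assumption about what the row pointer refers to.

```python
def pack_batch(samples, width):
    """Flatten samples into a row-major (len(samples) x width) float buffer.

    Each sample becomes one row: height = batch size and width = the
    dimensionality of the data layer (784 for the MNIST demo).
    """
    buffer = []
    for i, sample in enumerate(samples):
        if len(sample) != width:
            raise ValueError("sample %d has %d values, expected %d"
                             % (i, len(sample), width))
        buffer.extend(float(v) for v in sample)
    return buffer


def get_row(buffer, width, i):
    """Return row i, i.e. the `width` values starting at offset i * width."""
    return buffer[i * width:(i + 1) * width]


batch = pack_batch([[0, 1, 2], [3, 4, 5]], width=3)
print(get_row(batch, 3, 1))  # [3.0, 4.0, 5.0]
```

Checking each sample's length up front catches the most common mistake here: a width that does not match the data layer declared in the network configuration.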
#### step 4. Forward computation
Once the steps above are done, call [`paddle_gradient_machine_forward`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/gradient_machine.h#L73) to run the forward computation of the network.
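After the forward pass, the demo reads the output matrix's shape with `paddle_matrix_get_shape` and its rows with `paddle_matrix_get_row`; for the MNIST network each row holds the softmax scores of the 10 classes. The sketch below shows one way to turn such scores into predicted labels, assuming they have been copied into a row-major Python list. `predict_labels` is a hypothetical helper, not part of the C-API.

```python
def predict_labels(prob, height, width):
    """Return the arg-max class index of each row of a row-major buffer.

    `prob` holds height x width scores, e.g. the softmax output of the
    MNIST network (width = 10 classes).
    """
    if len(prob) != height * width:
        raise ValueError("buffer size does not match height * width")
    labels = []
    for r in range(height):
        row = prob[r * width:(r + 1) * width]
        # max() returns the first index on ties.
        labels.append(max(range(width), key=lambda c: row[c]))
    return labels


print(predict_labels([0.1, 0.7, 0.2, 0.5, 0.3, 0.2], height=2, width=3))  # [1, 0]
```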
#### step 5. Clean up
After inference finishes, clean up and release the intermediate variables and resources that were used.
...@@ -3,59 +3,82 @@ ...@@ -3,59 +3,82 @@
#include "../common/common.h" #include "../common/common.h"
// Modify this path as needed.
#define CONFIG_BIN "./trainer_config.bin" #define CONFIG_BIN "./trainer_config.bin"
// Modify this path as needed.
// This demo assumes that merged model is not used, then this path is the
// directory storing all the trained parameters.
// If the model is trained by PaddlePaddle V2 API, the model is saved as
// a compressed file. You need to uncompress the compressed file first.
#define MODEL_PATH "models/pass_4"
int main() { int main() {
// Initalize Paddle // Initalize the PaddlePaddle runtime environment.
char* argv[] = {"--use_gpu=False"}; char* argv[] = {"--use_gpu=False"};
CHECK(paddle_init(1, (char**)argv)); CHECK(paddle_init(1, (char**)argv));
// Reading config binary file. It is generated by `convert_protobin.sh` // Read the binary configuration file generated by `convert_protobin.sh`
long size; long size;
void* buf = read_config(CONFIG_BIN, &size); void* buf = read_config(CONFIG_BIN, &size);
// Create a gradient machine for inference. // Create the gradient machine for inference.
paddle_gradient_machine machine; paddle_gradient_machine machine;
CHECK(paddle_gradient_machine_create_for_inference(&machine, buf, (int)size)); CHECK(paddle_gradient_machine_create_for_inference(&machine, buf, (int)size));
CHECK(paddle_gradient_machine_randomize_param(machine));
// Loading parameter. Uncomment the following line and change the directory. // Load the trained model. Modify the parameter MODEL_PATH to set the correct
// CHECK(paddle_gradient_machine_load_parameter_from_disk(machine, // path of the trained model.
// "./some_where_to_params")); CHECK(paddle_gradient_machine_load_parameter_from_disk(machine, MODEL_PATH));
// Inputs and outputs of the network are organized as paddle_arguments object
// in C-API. In the comments below, "argument" specifically means one input of
// the neural network in PaddlePaddle C-API.
paddle_arguments in_args = paddle_arguments_create_none(); paddle_arguments in_args = paddle_arguments_create_none();
// There is only one input of this network. // There is only one data layer in this demo MNIST network, invoke this
// function to create one argument.
CHECK(paddle_arguments_resize(in_args, 1)); CHECK(paddle_arguments_resize(in_args, 1));
// Create input matrix. // Each argument needs one matrix or one ivector (integer vector, for sparse
paddle_matrix mat = paddle_matrix_create(/* sample_num */ 1, // index input, usually used in NLP task) to holds the real input data.
/* size */ 784, // In the comments below, "matrix" specifically means the object needed by
/* useGPU */ false); // argument to hold the data. Here we create the matrix for the above created
srand(time(0)); // agument to store the testing samples.
paddle_matrix mat =
paddle_matrix_create(/* height = batch size */ 1,
/* width = dimensionality of the data layer */ 784,
/* whether to use GPU */ false);
paddle_real* array; paddle_real* array;
// Get the pointer pointing to the start address of the first row of the
// Get First row. // created matrix.
CHECK(paddle_matrix_get_row(mat, 0, &array)); CHECK(paddle_matrix_get_row(mat, 0, &array));
// Fill the matrix with a randomly generated test sample.
srand(time(0));
for (int i = 0; i < 784; ++i) { for (int i = 0; i < 784; ++i) {
array[i] = rand() / ((float)RAND_MAX); array[i] = rand() / ((float)RAND_MAX);
} }
// Assign the matrix to the argument.
CHECK(paddle_arguments_set_value(in_args, 0, mat)); CHECK(paddle_arguments_set_value(in_args, 0, mat));
// Create the output argument.
paddle_arguments out_args = paddle_arguments_create_none(); paddle_arguments out_args = paddle_arguments_create_none();
// Invoke the forward computation.
CHECK(paddle_gradient_machine_forward(machine, CHECK(paddle_gradient_machine_forward(machine,
in_args, in_args,
out_args, out_args,
/* isTrain */ false)); /* is train taks or not */ false));
paddle_matrix prob = paddle_matrix_create_none();
// Create the matrix to hold the forward result of the neural network.
paddle_matrix prob = paddle_matrix_create_none();
// Access the matrix of the output argument, the predicted result is stored in
// which.
CHECK(paddle_arguments_get_value(out_args, 0, prob)); CHECK(paddle_arguments_get_value(out_args, 0, prob));
uint64_t height; uint64_t height;
uint64_t width; uint64_t width;
CHECK(paddle_matrix_get_shape(prob, &height, &width)); CHECK(paddle_matrix_get_shape(prob, &height, &width));
CHECK(paddle_matrix_get_row(prob, 0, &array)); CHECK(paddle_matrix_get_row(prob, 0, &array));
...@@ -68,6 +91,7 @@ int main() { ...@@ -68,6 +91,7 @@ int main() {
} }
printf("\n"); printf("\n");
// The cleaning up.
CHECK(paddle_matrix_destroy(prob)); CHECK(paddle_matrix_destroy(prob));
CHECK(paddle_arguments_destroy(out_args)); CHECK(paddle_arguments_destroy(out_args));
CHECK(paddle_matrix_destroy(mat)); CHECK(paddle_matrix_destroy(mat));
......
from paddle.utils.merge_model import merge_v2_model
from mnist_v2 import network
net = network(is_infer=True)
param_file = "models/params_pass_4.tar"
output_file = "output.paddle.model"
merge_v2_model(net, param_file, output_file)
import os
import sys
import gzip
import logging
import argparse
from PIL import Image
import numpy as np
import paddle.v2 as paddle
from paddle.utils.dump_v2_config import dump_v2_config
logger = logging.getLogger("paddle")
logger.setLevel(logging.INFO)
def multilayer_perceptron(img, layer_size, lbl_dim):
for idx, size in enumerate(layer_size):
hidden = paddle.layer.fc(input=(img if not idx else hidden),
size=size,
act=paddle.activation.Relu())
return paddle.layer.fc(input=hidden,
size=lbl_dim,
act=paddle.activation.Softmax())
def network(input_dim=784, lbl_dim=10, is_infer=False):
images = paddle.layer.data(
name='pixel', type=paddle.data_type.dense_vector(input_dim))
predict = multilayer_perceptron(
images, layer_size=[128, 64], lbl_dim=lbl_dim)
if is_infer:
return predict
else:
label = paddle.layer.data(
name='label', type=paddle.data_type.integer_value(lbl_dim))
return paddle.layer.classification_cost(input=predict, label=label)
def main(task="train", use_gpu=False, trainer_count=1, save_dir="models"):
if task == "train":
if not os.path.exists(save_dir):
os.mkdir(save_dir)
paddle.init(use_gpu=use_gpu, trainer_count=trainer_count)
cost = network()
parameters = paddle.parameters.create(cost)
optimizer = paddle.optimizer.Momentum(
learning_rate=0.1 / 128.0,
momentum=0.9,
regularization=paddle.optimizer.L2Regularization(rate=0.0005 * 128))
trainer = paddle.trainer.SGD(cost=cost,
parameters=parameters,
update_equation=optimizer)
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 100 == 0:
logger.info("Pass %d, Batch %d, Cost %f, %s" %
(event.pass_id, event.batch_id, event.cost,
event.metrics))
if isinstance(event, paddle.event.EndPass):
with gzip.open(
os.path.join(save_dir, "params_pass_%d.tar" %
event.pass_id), "w") as f:
trainer.save_parameter_to_tar(f)
trainer.train(
reader=paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=8192),
batch_size=128),
event_handler=event_handler,
num_passes=5)
elif task == "dump_config":
predict = network(is_infer=True)
dump_v2_config(predict, "trainer_config.bin", True)
else:
raise RuntimeError(("Error value for parameter task. "
"Available options are: train and dump_config."))
def parse_cmd():
parser = argparse.ArgumentParser(
description="PaddlePaddle MNIST demo for CAPI.")
parser.add_argument(
"--task",
type=str,
required=False,
help=("A string indicating the taks type. "
"Available options are: \"train\", \"dump_config\"."),
default="train")
parser.add_argument(
"--use_gpu",
type=bool,
help=("A bool flag indicating whether to use GPU device or not."),
default=False)
parser.add_argument(
"--trainer_count",
type=int,
help=("This parameter is only used in training task. It indicates "
"how many computing threads are created in training."),
default=1)
parser.add_argument(
"--save_dir",
type=str,
help=("This parameter is only used in training task. It indicates "
"path of the directory to save the trained models."),
default="models")
return parser.parse_args()
if __name__ == "__main__":
args = parse_cmd()
main(args.task, args.use_gpu, args.trainer_count, args.save_dir)
#include <paddle/capi.h> #include <paddle/capi.h>
#include <time.h> #include <time.h>
#include "../common/common.h" #include "../common/common.h"
#define CONFIG_BIN "./trainer_config.bin" #define CONFIG_BIN "./trainer_config.bin"
...@@ -9,16 +10,18 @@ int main() { ...@@ -9,16 +10,18 @@ int main() {
char* argv[] = {"--use_gpu=False"}; char* argv[] = {"--use_gpu=False"};
CHECK(paddle_init(1, (char**)argv)); CHECK(paddle_init(1, (char**)argv));
// Reading config binary file. It is generated by `convert_protobin.sh` // Read the binary configuration file which is generated by
// `convert_protobin.sh`
long size; long size;
void* buf = read_config(CONFIG_BIN, &size); void* buf = read_config(CONFIG_BIN, &size);
// Create a gradient machine for inference. // Create the gradient machine for inference.
paddle_gradient_machine machine; paddle_gradient_machine machine;
CHECK(paddle_gradient_machine_create_for_inference(&machine, buf, (int)size)); CHECK(paddle_gradient_machine_create_for_inference(&machine, buf, (int)size));
CHECK(paddle_gradient_machine_randomize_param(machine)); CHECK(paddle_gradient_machine_randomize_param(machine));
// Loading parameter. Uncomment the following line and change the directory. // Load the trained parameters. Uncomment the following line and change the
// directory as needed.
// CHECK(paddle_gradient_machine_load_parameter_from_disk(machine, // CHECK(paddle_gradient_machine_load_parameter_from_disk(machine,
// "./some_where_to_params")); // "./some_where_to_params"));
paddle_arguments in_args = paddle_arguments_create_none(); paddle_arguments in_args = paddle_arguments_create_none();
...@@ -26,7 +29,7 @@ int main() { ...@@ -26,7 +29,7 @@ int main() {
// There is only one input of this network. // There is only one input of this network.
CHECK(paddle_arguments_resize(in_args, 1)); CHECK(paddle_arguments_resize(in_args, 1));
// Create input matrix. // Create the input matrix.
paddle_matrix mat = paddle_matrix_create_sparse(1, 784, 3, true, false); paddle_matrix mat = paddle_matrix_create_sparse(1, 784, 3, true, false);
srand(time(0)); srand(time(0));
paddle_real* array; paddle_real* array;
......
...@@ -47,7 +47,7 @@ cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker) ...@@ -47,7 +47,7 @@ cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto) cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
cc_library(shape_inference SRCS shape_inference.cc DEPS ddim attribute device_context) cc_library(shape_inference SRCS shape_inference.cc DEPS ddim attribute device_context)
cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog
shape_inference data_transform) shape_inference data_transform lod_tensor)
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry init) cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry init)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS shape_inference op_info operator glog) cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS shape_inference op_info operator glog)
......
...@@ -31,15 +31,14 @@ static const platform::DeviceContext* GetDeviceContext( ...@@ -31,15 +31,14 @@ static const platform::DeviceContext* GetDeviceContext(
} }
} }
Tensor* DeviceTransform(const Tensor& in, const platform::Place& dst_place) { void DeviceTransform(const Tensor& in, const platform::Place& dst_place,
Tensor* out) {
VLOG(3) << "DeviceTransform in, src_place " << in.place() VLOG(3) << "DeviceTransform in, src_place " << in.place()
<< " dst_place: " << dst_place; << " dst_place: " << dst_place;
Tensor* out = new Tensor();
auto* dev_ctx = GetDeviceContext(in.place(), dst_place); auto* dev_ctx = GetDeviceContext(in.place(), dst_place);
dev_ctx->Wait(); dev_ctx->Wait();
Copy(in, dst_place, *dev_ctx, out); Copy(in, dst_place, *dev_ctx, out);
dev_ctx->Wait(); dev_ctx->Wait();
return out;
} }
} // namespace framework } // namespace framework
......
...@@ -21,7 +21,8 @@ limitations under the License. */ ...@@ -21,7 +21,8 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
Tensor* DeviceTransform(const Tensor& in, const platform::Place& dst_place); void DeviceTransform(const Tensor& in, const platform::Place& dst_place,
Tensor* out);
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -14,7 +14,9 @@ limitations under the License. */ ...@@ -14,7 +14,9 @@ limitations under the License. */
#pragma once #pragma once
#include <iostream> #include <cctype>
#include <ostream>
#include "paddle/platform/enforce.h" #include "paddle/platform/enforce.h"
namespace paddle { namespace paddle {
...@@ -27,12 +29,19 @@ enum class DataLayout { ...@@ -27,12 +29,19 @@ enum class DataLayout {
}; };
inline DataLayout StringToDataLayout(const std::string& str) { inline DataLayout StringToDataLayout(const std::string& str) {
if (str == "NHWC" || str == "nhwc") { std::string s(str);
for (size_t i = 0; i < s.size(); ++i) {
s[i] = toupper(s[i]);
}
if (s == "NHWC") {
return DataLayout::kNHWC; return DataLayout::kNHWC;
} else if (str == "NCHW" || str == "nchw") { } else if (s == "NCHW") {
return DataLayout::kNCHW; return DataLayout::kNCHW;
} else if (s == "ANYLAYOUT") {
return DataLayout::kAnyLayout;
} else { } else {
PADDLE_THROW("Unknown storage order string: %s", str); PADDLE_THROW("Unknown storage order string: %s", s);
} }
} }
...@@ -49,7 +58,7 @@ inline std::string DataLayoutToString(const DataLayout& data_layout) { ...@@ -49,7 +58,7 @@ inline std::string DataLayoutToString(const DataLayout& data_layout) {
} }
} }
inline std::ostream& operator<<(std::ostream& out, DataLayout l) { inline std::ostream& operator<<(std::ostream& out, const DataLayout& l) {
out << DataLayoutToString(l); out << DataLayoutToString(l);
return out; return out;
} }
......
...@@ -19,16 +19,14 @@ limitations under the License. */ ...@@ -19,16 +19,14 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
Tensor* DataTransform(const OpKernelType& expected_kernel_type, void DataTransform(const OpKernelType& expected_kernel_type,
const OpKernelType& kernel_type_for_var, const OpKernelType& kernel_type_for_var,
const Tensor& input_tensor) { const Tensor& input_tensor, Tensor* out) {
Tensor* out = nullptr;
if (!platform::is_same_place(kernel_type_for_var.place_, if (!platform::is_same_place(kernel_type_for_var.place_,
expected_kernel_type.place_)) { expected_kernel_type.place_)) {
out = DeviceTransform(input_tensor, expected_kernel_type.place_); DeviceTransform(input_tensor, expected_kernel_type.place_, out);
} }
PADDLE_ENFORCE_NOT_NULL(out, "out should not be null"); PADDLE_ENFORCE_NOT_NULL(out, "out should not be null");
return out;
} }
void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor, void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
......
...@@ -30,9 +30,9 @@ limitations under the License. */ ...@@ -30,9 +30,9 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
Tensor* DataTransform(const OpKernelType& expected_kernel_type, void DataTransform(const OpKernelType& expected_kernel_type,
const OpKernelType& kernel_type_for_var, const OpKernelType& kernel_type_for_var,
const Tensor& input_tensor); const Tensor& input_tensor, Tensor* out);
void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor, void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
Variable& out_var); Variable& out_var);
......
...@@ -11,6 +11,7 @@ distributed under the License is distributed on an "AS IS" BASIS, ...@@ -11,6 +11,7 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include <string.h> // for strdup
#include <algorithm> #include <algorithm>
#include <string> #include <string>
...@@ -60,7 +61,9 @@ void InitDevices() { ...@@ -60,7 +61,9 @@ void InitDevices() {
} }
void InitGLOG(const std::string &prog_name) { void InitGLOG(const std::string &prog_name) {
google::InitGoogleLogging(prog_name.c_str()); // glog will not hold the ARGV[0] inside.
// Use strdup to alloc a new string.
google::InitGoogleLogging(strdup(prog_name.c_str()));
google::InstallFailureSignalHandler(); google::InstallFailureSignalHandler();
} }
......
...@@ -69,6 +69,12 @@ std::ostream &operator<<(std::ostream &os, const LoDTensor &t) { ...@@ -69,6 +69,12 @@ std::ostream &operator<<(std::ostream &os, const LoDTensor &t) {
return os; return os;
} }
std::string LoDToString(const LoD &lod) {
std::ostringstream stream;
stream << lod;
return stream.str();
}
LoD SliceInLevel(const LoD &in, size_t level, size_t elem_begin, LoD SliceInLevel(const LoD &in, size_t level, size_t elem_begin,
size_t elem_end) { size_t elem_end) {
PADDLE_ENFORCE_LT(level, in.size()); PADDLE_ENFORCE_LT(level, in.size());
......
...@@ -60,6 +60,8 @@ using LoD = std::vector<Vector<size_t>>; ...@@ -60,6 +60,8 @@ using LoD = std::vector<Vector<size_t>>;
std::ostream& operator<<(std::ostream& os, const LoD& lod); std::ostream& operator<<(std::ostream& os, const LoD& lod);
std::ostream& operator<<(std::ostream& os, const LoDTensor& t); std::ostream& operator<<(std::ostream& os, const LoDTensor& t);
std::string LoDToString(const LoD& lod);
LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin, LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
size_t elem_end); size_t elem_end);
/* /*
......
...@@ -85,5 +85,10 @@ inline std::string KernelTypeToString(const OpKernelType& kernel_key) { ...@@ -85,5 +85,10 @@ inline std::string KernelTypeToString(const OpKernelType& kernel_key) {
return stream.str(); return stream.str();
} }
inline bool TransFromNeeded(const OpKernelType& l, const OpKernelType& r) {
return (!platform::places_are_same_class(l.place_, r.place_)) ||
(l.data_type_ != r.data_type_) || (l.data_layout_ != r.data_layout_);
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -368,24 +368,6 @@ TEST(OperatorRegistrar, OpWithMultiKernel) { ...@@ -368,24 +368,6 @@ TEST(OperatorRegistrar, OpWithMultiKernel) {
// TODO(qiao) add priority back // TODO(qiao) add priority back
// use all available kernels // use all available kernels
paddle::framework::UseALL();
op->Run(scope, cuda_place); op->Run(scope, cuda_place);
EXPECT_EQ(op_test_value, -10); EXPECT_EQ(op_test_value, -10);
// remove cuda kernels
paddle::framework::UseCPU();
op->Run(scope, cpu_place);
EXPECT_EQ(op_test_value, -9);
// add cuda kernels
paddle::framework::UseCUDA();
op->Run(scope, cuda_place);
EXPECT_EQ(op_test_value, -10);
// use cudnn kernel
paddle::framework::UseCUDNN();
op->Run(scope, cuda_place);
EXPECT_EQ(op_test_value, -20);
} }
...@@ -11,6 +11,7 @@ distributed under the License is distributed on an "AS IS" BASIS, ...@@ -11,6 +11,7 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include <gflags/gflags.h>
#include <glog/logging.h> #include <glog/logging.h>
#include <algorithm> #include <algorithm>
...@@ -21,61 +22,27 @@ limitations under the License. */ ...@@ -21,61 +22,27 @@ limitations under the License. */
#include "paddle/framework/shape_inference.h" #include "paddle/framework/shape_inference.h"
#include "paddle/framework/var_type.h" #include "paddle/framework/var_type.h"
DEFINE_bool(op_sync, false,
"Default cuda is asynchronous device, set to True will "
"force op run in synchronous mode.");
namespace paddle { namespace paddle {
namespace framework { namespace framework {
std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority; std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority = {
std::make_tuple(platform::CUDAPlace(0), LibraryType::kCUDNN),
void UseCPU() { std::make_tuple(platform::CUDAPlace(0), LibraryType::kPlain),
kKernelPriority.clear(); std::make_tuple(platform::CPUPlace(), LibraryType::kMKLDNN),
/*Plain CPU*/ std::make_tuple(platform::CPUPlace(), LibraryType::kPlain),
auto pair0 = std::make_tuple(platform::CPUPlace(), LibraryType::kPlain); };
kKernelPriority.insert(kKernelPriority.begin(), pair0);
}
void UseMKLDNN() {
UseCPU();
#if PADDLE_WITH_MKLML
{
/*MKLDNN Kernel*/
auto pair0 = std::make_tuple(platform::CPUPlace(), LibraryType::kMKLDNN);
kKernelPriority.insert(kKernelPriority.begin(), pair0);
}
#endif
}
void UseCUDA() {
UseMKLDNN();
#if PADDLE_WITH_CUDA
/*Plain GPU*/
auto pair0 = std::make_tuple(platform::CUDAPlace(0), LibraryType::kPlain);
kKernelPriority.insert(kKernelPriority.begin(), pair0);
#endif
}
void UseCUDNN() {
UseCUDA();
#if PADDLE_WITH_CUDA
if (platform::dynload::HasCUDNN()) {
/*CUDNN Kernel*/
auto pair0 = std::make_tuple(platform::CUDAPlace(0), LibraryType::kCUDNN);
kKernelPriority.insert(kKernelPriority.begin(), pair0);
}
#endif
}
void UseALL() {
UseCPU();
UseMKLDNN();
UseCUDA();
UseCUDNN();
}
static DDim GetDims(const Scope& scope, const std::string& name) { static DDim GetDims(const Scope& scope, const std::string& name) {
Variable* var = scope.FindVar(name); Variable* var = scope.FindVar(name);
if (var == nullptr) { if (var == nullptr) {
return DDim({-1}); return DDim({-1});
} else if (var->IsType<LoDTensor>()) { }
if (var->IsType<LoDTensor>()) {
return var->Get<LoDTensor>().dims(); return var->Get<LoDTensor>().dims();
} else if (var->IsType<SelectedRows>()) { } else if (var->IsType<SelectedRows>()) {
return var->Get<SelectedRows>().GetCompleteDims(); return var->Get<SelectedRows>().GetCompleteDims();
...@@ -84,6 +51,21 @@ static DDim GetDims(const Scope& scope, const std::string& name) { ...@@ -84,6 +51,21 @@ static DDim GetDims(const Scope& scope, const std::string& name) {
} }
} }
static LoD GetLoD(const Scope& scope, const std::string& name) {
Variable* var = scope.FindVar(name);
auto default_lod = LoD({{}});
if (var == nullptr) {
return default_lod;
}
if (var->IsType<LoDTensor>()) {
return var->Get<LoDTensor>().lod();
} else {
return default_lod;
}
}
std::string OperatorBase::Input(const std::string& name) const { std::string OperatorBase::Input(const std::string& name) const {
auto& ins = Inputs(name); auto& ins = Inputs(name);
PADDLE_ENFORCE_LE(ins.size(), 1UL, PADDLE_ENFORCE_LE(ins.size(), 1UL,
...@@ -125,7 +107,8 @@ std::string OperatorBase::DebugStringEx(const Scope* scope) const { ...@@ -125,7 +107,8 @@ std::string OperatorBase::DebugStringEx(const Scope* scope) const {
for (size_t i = 0; i < input.second.size(); ++i) { for (size_t i = 0; i < input.second.size(); ++i) {
ss << input.second[i]; ss << input.second[i];
if (scope) { if (scope) {
ss << "(" << GetDims(*scope, input.second[i]) << ")"; ss << "[" << GetDims(*scope, input.second[i]) << "]";
ss << "(" << GetLoD(*scope, input.second[i]) << ")";
} }
if (i != input.second.size() - 1) { if (i != input.second.size() - 1) {
ss << ", "; ss << ", ";
...@@ -144,7 +127,8 @@ std::string OperatorBase::DebugStringEx(const Scope* scope) const { ...@@ -144,7 +127,8 @@ std::string OperatorBase::DebugStringEx(const Scope* scope) const {
for (size_t i = 0; i < output.second.size(); ++i) { for (size_t i = 0; i < output.second.size(); ++i) {
ss << output.second[i]; ss << output.second[i];
if (scope) { if (scope) {
ss << "(" << GetDims(*scope, output.second[i]) << ")"; ss << "[" << GetDims(*scope, output.second[i]) << "]";
ss << "(" << GetLoD(*scope, output.second[i]) << ")";
} }
if (i != output.second.size() - 1) { if (i != output.second.size() - 1) {
ss << ", "; ss << ", ";
...@@ -247,36 +231,33 @@ static bool VarIsTensor(const Variable* var) { ...@@ -247,36 +231,33 @@ static bool VarIsTensor(const Variable* var) {
return var->IsType<LoDTensor>() || var->IsType<SelectedRows>(); return var->IsType<LoDTensor>() || var->IsType<SelectedRows>();
} }
static const Tensor* GetTensorFromVar(const Variable* var) { static const Tensor* GetTensorFromVar(Variable* var) {
const Tensor* t = nullptr;
if (var->IsType<LoDTensor>()) { if (var->IsType<LoDTensor>()) {
t = &(var->Get<LoDTensor>()); return var->GetMutable<LoDTensor>();
} else if (var->IsType<SelectedRows>()) { } else if (var->IsType<SelectedRows>()) {
t = &(var->Get<SelectedRows>().value()); return var->GetMutable<SelectedRows>()->mutable_value();
} else { } else {
PADDLE_THROW("Variable type_id %s, expect LoDTensor/SelectedRows.", PADDLE_THROW("Variable type_id %s, expect LoDTensor/SelectedRows.",
var->Type().name()); var->Type().name());
} }
return t;
} }
static Tensor* GetMutableTensorFromVar(Variable* var) { static Tensor* GetMutableTensorFromVar(Variable* var) {
Tensor* t = nullptr;
if (var->IsType<LoDTensor>()) { if (var->IsType<LoDTensor>()) {
t = var->GetMutable<LoDTensor>(); return var->GetMutable<LoDTensor>();
} else if (var->IsType<SelectedRows>()) { } else if (var->IsType<SelectedRows>()) {
t = var->GetMutable<SelectedRows>()->mutable_value(); return var->GetMutable<SelectedRows>()->mutable_value();
} else { } else {
PADDLE_THROW("Variable type_id %s, expect LoDTensor/SelectedRows.", PADDLE_THROW("Variable type_id %s, expect LoDTensor/SelectedRows.",
var->Type().name()); var->Type().name());
} }
return t;
} }
template <> template <>
const Tensor* ExecutionContext::Input<Tensor>(const std::string& name) const { const Tensor* ExecutionContext::Input<Tensor>(const std::string& name) const {
auto* var = InputVar(name); auto* var = InputVar(name);
return var == nullptr ? nullptr : GetTensorFromVar(var); return var == nullptr ? nullptr
: GetTensorFromVar(const_cast<Variable*>(var));
} }
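After this refactor, the const and mutable helpers share one dispatch path through `GetMutable`. The sketch below shows the same dispatch under stated assumptions: `std::variant` stands in for the type-erased `framework::Variable`, and `Tensor`, `LoDTensor` and `SelectedRows` are hypothetical simplifications, not Paddle's classes.

```cpp
#include <cassert>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical stand-ins for framework::Tensor, LoDTensor and SelectedRows.
struct Tensor {
  std::vector<float> data;
};
struct LoDTensor : Tensor {};
struct SelectedRows {
  std::vector<long> rows;  // indices of the rows that are present
  Tensor value;            // dense block holding those rows
};

// std::variant plays the role of the type-erased framework::Variable.
using Variable = std::variant<std::monostate, LoDTensor, SelectedRows>;

// Single dispatch point, as in GetMutableTensorFromVar: a LoDTensor is itself
// a Tensor, while SelectedRows exposes its dense payload through `value`.
Tensor* GetMutableTensor(Variable* var) {
  if (auto* lod = std::get_if<LoDTensor>(var)) return lod;
  if (auto* rows = std::get_if<SelectedRows>(var)) return &rows->value;
  throw std::invalid_argument("expect LoDTensor/SelectedRows");
}

// Read-only access reuses the mutable helper via const_cast, like the new
// Input<Tensor>. This is safe only because callers never write through the
// returned pointer.
const Tensor* GetTensor(const Variable* var) {
  return GetMutableTensor(const_cast<Variable*>(var));
}
```

Keeping one dispatch function means a new variable type has to be added in only one place.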
template <> template <>
...@@ -319,6 +300,7 @@ bool OpSupportGPU(const std::string& op_type) { ...@@ -319,6 +300,7 @@ bool OpSupportGPU(const std::string& op_type) {
auto it = all_kernels.find(op_type); auto it = all_kernels.find(op_type);
if (it == all_kernels.end()) { if (it == all_kernels.end()) {
// All control operators must support GPU // All control operators must support GPU
return true; return true;
} }
for (auto& kern_pair : it->second) { for (auto& kern_pair : it->second) {
...@@ -492,21 +474,17 @@ void OperatorWithKernel::Run(const Scope& scope, ...@@ -492,21 +474,17 @@ void OperatorWithKernel::Run(const Scope& scope,
} }
ExecutionContext ctx(*this, scope, *dev_ctx); ExecutionContext ctx(*this, scope, *dev_ctx);
auto expected_kernel_key = this->GetExpectedKernelType(ctx);
OpKernelMap& kernels = kernels_iter->second; OpKernelMap& kernels = kernels_iter->second;
for (auto& candidate : kKernelPriority) { // TODO(dzhwinter) : kernel fallback mechanism will be added when all the
auto candidate_key = // transform functions are ready.
OpKernelType(expected_kernel_key.data_type_, std::get<0>(candidate),
expected_kernel_key.data_layout_, std::get<1>(candidate));
if ((candidate_key == expected_kernel_key) || // for (auto& candidate : kKernelPriority) {
(kernels.count(candidate_key))) { // Do selection
expected_kernel_key = candidate_key; // }
break;
} auto expected_kernel_key = this->GetExpectedKernelType(ctx);
}
VLOG(3) << "expected_kernel_key:" << expected_kernel_key; VLOG(3) << "expected_kernel_key:" << expected_kernel_key;
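The loop removed above chose a kernel by walking `kKernelPriority`. The sketch below reproduces that selection rule using a hypothetical two-field key (place, library) in place of `OpKernelType`, which also carries data type and layout.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified kernel key: (place, library).
using KernelKey = std::pair<std::string, std::string>;
using KernelMap = std::map<KernelKey, std::string>;  // key -> kernel name

// Returns the first candidate that equals the expected key or has a
// registered kernel; if none qualifies, the expected key is kept. A
// registered candidate ranked ahead of the expected key wins even though
// the op asked for something else.
KernelKey SelectKernel(const KernelKey& expected,
                       const std::vector<KernelKey>& priority,
                       const KernelMap& kernels) {
  for (const auto& candidate : priority) {
    if (candidate == expected || kernels.count(candidate) > 0) {
      return candidate;
    }
  }
  return expected;
}
```

Because a higher-priority candidate can differ from the expected key in place or library, its inputs may first need converting. That is presumably why the TODO defers the fallback until the transform functions are ready.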
...@@ -520,7 +498,7 @@ void OperatorWithKernel::Run(const Scope& scope, ...@@ -520,7 +498,7 @@ void OperatorWithKernel::Run(const Scope& scope,
if (tensor_in->IsInitialized()) { if (tensor_in->IsInitialized()) {
auto kernel_type_for_var = this->GetKernelTypeForVar( auto kernel_type_for_var = this->GetKernelTypeForVar(
var_name_item.first, *tensor_in, expected_kernel_key); var_name_item.first, *tensor_in, expected_kernel_key);
if (kernel_type_for_var != expected_kernel_key) { if (TransFromNeeded(kernel_type_for_var, expected_kernel_key)) {
auto out_var_names = OutputVars(true); auto out_var_names = OutputVars(true);
if (std::find(out_var_names.begin(), out_var_names.end(), if (std::find(out_var_names.begin(), out_var_names.end(),
var_name) != out_var_names.end()) { var_name) != out_var_names.end()) {
...@@ -529,11 +507,13 @@ void OperatorWithKernel::Run(const Scope& scope, ...@@ -529,11 +507,13 @@ void OperatorWithKernel::Run(const Scope& scope,
"does not support transform", "does not support transform",
var_name); var_name);
} }
VLOG(3) << "need to do transform for var " << var_name; VLOG(3) << "Transform Variable " << var_name << " from "
<< kernel_type_for_var << " to " << expected_kernel_key;
auto* trans_var = new_scope.Var(var_name); auto* trans_var = new_scope.Var(var_name);
auto* out = DataTransform(expected_kernel_key, kernel_type_for_var, std::shared_ptr<Tensor> out(new Tensor);
*tensor_in); DataTransform(expected_kernel_key, kernel_type_for_var, *tensor_in,
CopyVariableWithTensor(*var, *out, *trans_var); out.get());
CopyVariableWithTensor(*var, *(out.get()), *trans_var);
} }
} }
} }
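The diff replaces the plain `kernel_type_for_var != expected_kernel_key` test with `TransFromNeeded(...)`, whose body is not shown here. The sketch below implements one plausible rule. It assumes a transform is needed only when the device class, the data type, or a concrete layout differs, and it ignores the library. The `KernelType` struct and `Layout` enum are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified kernel type; field names are illustrative only.
enum class Layout { kAny, kNCHW, kNHWC };
struct KernelType {
  std::string place;    // device class, e.g. "CPU" or "CUDA"
  std::string dtype;    // e.g. "float32"
  Layout layout;
  std::string library;  // e.g. "Plain" or "CUDNN"
};

// A layout change is needed only when both sides pin a concrete layout and
// those layouts differ; kAny is compatible with everything.
bool NeedLayoutTransform(Layout from, Layout to) {
  return from != Layout::kAny && to != Layout::kAny && from != to;
}

// Assumed rule: the library is ignored, so a tensor produced by a plain CUDA
// kernel can feed a cuDNN kernel directly. A plain `!=` on the whole key
// would force a needless transform in that case.
bool TransformNeeded(const KernelType& from, const KernelType& to) {
  return from.place != to.place || from.dtype != to.dtype ||
         NeedLayoutTransform(from.layout, to.layout);
}
```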
...@@ -542,8 +522,14 @@ void OperatorWithKernel::Run(const Scope& scope, ...@@ -542,8 +522,14 @@ void OperatorWithKernel::Run(const Scope& scope,
auto kernel_iter = kernels.find(expected_kernel_key); auto kernel_iter = kernels.find(expected_kernel_key);
kernel_iter->second->Compute(ExecutionContext( auto* new_dev_ctx = pool.Get(expected_kernel_key.place_);
*this, new_scope, *pool.Get(expected_kernel_key.place_))); kernel_iter->second->Compute(
ExecutionContext(*this, new_scope, *new_dev_ctx));
/*For profiling/benchmark only*/
if (FLAGS_op_sync) {
new_dev_ctx->Wait();
}
} }
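`FLAGS_op_sync` makes `Run` wait on the device context after each kernel. Device kernels are usually launched asynchronously, so without that wait a timer around `Run` would mostly measure launch overhead. The sketch below assumes `std::async` as a stand-in for an asynchronous device stream. `FakeDeviceContext` and `RunOp` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical device context: Launch() only enqueues work, as a stream
// launch would; Wait() blocks until everything queued has finished.
class FakeDeviceContext {
 public:
  void Launch(int* out, int value) {
    pending_.push_back(std::async(std::launch::async, [out, value] {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
      *out = value;
    }));
  }
  void Wait() {
    for (auto& f : pending_) f.wait();
    pending_.clear();
  }

 private:
  std::vector<std::future<void>> pending_;
};

// Runs one "kernel"; with op_sync set, it waits so that a timer around this
// call measures execution rather than just the launch.
void RunOp(FakeDeviceContext* ctx, int* out, int value, bool op_sync) {
  ctx->Launch(out, value);
  if (op_sync) ctx->Wait();
}
```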
proto::DataType OperatorWithKernel::IndicateDataType( proto::DataType OperatorWithKernel::IndicateDataType(
......
...@@ -54,33 +54,9 @@ constexpr char kGradVarSuffix[] = "@GRAD"; ...@@ -54,33 +54,9 @@ constexpr char kGradVarSuffix[] = "@GRAD";
constexpr char kZeroVarSuffix[] = "@ZERO"; constexpr char kZeroVarSuffix[] = "@ZERO";
// define some kernel priority // define some kernel priority
/* Define multiple kernel type fallback order*/
extern std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority; extern std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority;
/**
* @brief Use cpu kernel only
*/
void UseCPU();
/**
* @brief Prefer MKLDNN kernel over Plain CPU kernel
*/
void UseMKLDNN();
/**
* @brief Prefer CUDA kernel over Plain CPU kernel
*/
void UseCUDA();
/**
* @brief Prefer cudnn kernel over Plain CUDA kernel
*/
void UseCUDNN();
/**
* @brief Use all available kernels
*/
void UseALL();
inline std::string GradVarName(const std::string& var_name) { inline std::string GradVarName(const std::string& var_name) {
return var_name + kGradVarSuffix; return var_name + kGradVarSuffix;
} }
......
...@@ -116,8 +116,8 @@ inline void Copy(const Tensor& src, const platform::Place& dst_place, ...@@ -116,8 +116,8 @@ inline void Copy(const Tensor& src, const platform::Place& dst_place,
* @param[in] src The external tensor. * @param[in] src The external tensor.
* @param[in] ctx The device context contains device resources. * @param[in] ctx The device context contains device resources.
* *
* * @note CopyFromVector assumes that the tensor has been resized * * @note CopyFromVector will resize dst to a 1D tensor with the same
* before invoking. * size as src.
*/ */
template <typename T> template <typename T>
inline void CopyFromVector(const std::vector<T>& src, inline void CopyFromVector(const std::vector<T>& src,
......
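With the updated note, `CopyFromVector` always leaves `dst` as a 1-D tensor. A caller that needs another shape must reshape afterwards, as `AssignValueKernel` does with `out->Resize(...)`. The sketch below shows this copy-then-reshape pattern on a hypothetical host-only `HostTensor`. Unlike a bare dims update, its `ResizeSketch` checks that the element count is preserved.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical host-only tensor: a flat buffer plus a shape.
template <typename T>
struct HostTensor {
  std::vector<T> data;
  std::vector<int64_t> dims;
};

// Mirrors the documented behaviour: dst becomes a 1-D tensor holding a copy
// of src, whatever its previous shape was.
template <typename T>
void CopyFromVectorSketch(const std::vector<T>& src, HostTensor<T>* dst) {
  dst->dims = {static_cast<int64_t>(src.size())};
  dst->data = src;
}

// Reshapes in place after verifying the element count matches.
template <typename T>
void ResizeSketch(HostTensor<T>* t, const std::vector<int64_t>& dims) {
  int64_t n = std::accumulate(dims.begin(), dims.end(), int64_t{1},
                              std::multiplies<int64_t>());
  if (n != static_cast<int64_t>(t->data.size())) {
    throw std::invalid_argument("element count mismatch");
  }
  t->dims = dims;
}
```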
...@@ -135,9 +135,8 @@ op_library(detection_output_op DEPS softmax) ...@@ -135,9 +135,8 @@ op_library(detection_output_op DEPS softmax)
op_library(sequence_softmax_op DEPS softmax) op_library(sequence_softmax_op DEPS softmax)
op_library(sum_op DEPS selected_rows_functor) op_library(sum_op DEPS selected_rows_functor)
op_library(sgd_op DEPS selected_rows_functor) op_library(sgd_op DEPS selected_rows_functor)
op_library(print_op DEPS lod_tensor)
op_library(adagrad_op DEPS selected_rows_functor) op_library(adagrad_op DEPS selected_rows_functor)
op_library(conv_op DEPS vol2col)
op_library(pool_op DEPS pooling)
op_library(maxout_op DEPS maxouting) op_library(maxout_op DEPS maxouting)
op_library(unpool_op DEPS unpooling) op_library(unpool_op DEPS unpooling)
op_library(pool_with_index_op DEPS pooling) op_library(pool_with_index_op DEPS pooling)
...@@ -148,12 +147,27 @@ op_library(max_sequence_len_op DEPS lod_rank_table) ...@@ -148,12 +147,27 @@ op_library(max_sequence_len_op DEPS lod_rank_table)
op_library(sequence_conv_op DEPS context_project) op_library(sequence_conv_op DEPS context_project)
op_library(sequence_pool_op DEPS sequence_pooling) op_library(sequence_pool_op DEPS sequence_pooling)
op_library(lstm_op DEPS sequence2batch lstm_compute) op_library(lstm_op DEPS sequence2batch lstm_compute)
op_library(conv_transpose_op DEPS vol2col)
op_library(gru_op DEPS sequence2batch gru_compute) op_library(gru_op DEPS sequence2batch gru_compute)
op_library(recurrent_op DEPS executor) op_library(recurrent_op DEPS executor)
op_library(warpctc_op DEPS dynload_warpctc sequence_padding math_function) op_library(warpctc_op DEPS dynload_warpctc sequence_padding math_function)
op_library(cos_sim_op DEPS cos_sim_functor) op_library(cos_sim_op DEPS cos_sim_functor)
op_library(parallel_do_op DEPS executor) op_library(parallel_do_op DEPS executor)
# Register multiple kernels to pybind
if (WITH_GPU)
op_library(conv_op SRCS conv_op.cc conv_op.cu.cc conv_cudnn_op.cu.cc DEPS vol2col)
op_library(pool_op SRCS pool_op.cc pool_op.cu.cc pool_cudnn_op.cu.cc DEPS pooling)
op_library(conv_transpose_op SRCS conv_transpose_op.cc conv_transpose_op.cu.cc
conv_transpose_cudnn_op.cu.cc DEPS vol2col)
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(conv2d, CUDNN);\n")
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(pool2d, CUDNN);\n")
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(conv2d_transpose, CUDNN);\n")
else()
op_library(conv_op SRCS conv_op.cc DEPS vol2col)
op_library(pool_op SRCS pool_op.cc DEPS pooling)
op_library(conv_transpose_op SRCS conv_transpose_op.cc DEPS vol2col)
endif()
# FIXME(typhoonzero): save/load depends on lodtensor serialization functions # FIXME(typhoonzero): save/load depends on lodtensor serialization functions
op_library(save_op DEPS lod_tensor) op_library(save_op DEPS lod_tensor)
op_library(load_op DEPS lod_tensor) op_library(load_op DEPS lod_tensor)
......
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/assign_value_op.h"
namespace paddle {
namespace operators {
class AssignValueOp : public framework::OperatorWithKernel {
public:
AssignValueOp(const std::string &type,
const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: OperatorWithKernel(type, inputs, outputs, attrs) {}
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of AssignValueOp should not be null.");
auto shape = ctx->Attrs().Get<std::vector<int>>("shape");
ctx->SetOutputDim("Out", framework::make_ddim(shape));
}
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext &ctx) const override {
return framework::OpKernelType(
framework::proto::DataType(ctx.Attr<int>("dtype")), ctx.GetPlace());
}
};
class AssignValueOpMaker : public framework::OpProtoAndCheckerMaker {
public:
AssignValueOpMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "(Tensor) Output tensor of assign_value operator.");
AddAttr<std::vector<int>>("shape",
"(vector<int>) "
"Shape of values.");
AddAttr<int>("dtype", "data type of values")
.InEnum({framework::proto::DataType::INT32,
framework::proto::DataType::FP32});
AddAttr<std::vector<float>>("fp32_values", "store the float values")
.SetDefault({});
AddAttr<std::vector<int>>("int32_values", "store the int values")
.SetDefault({});
AddComment(R"DOC(
AssignValue operator
$$Out = values$$
)DOC");
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(assign_value, ops::AssignValueOp, ops::AssignValueOpMaker);
REGISTER_OP_CPU_KERNEL(assign_value, ops::AssignValueKernel<int>,
ops::AssignValueKernel<float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved. /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. you may not use this file except in compliance with the License.
You may obtain a copy of the License at You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#pragma once #include "paddle/operators/assign_value_op.h"
#include "paddle/framework/op_registry.h"
#include "paddle/operators/pool_op.h"
namespace paddle { namespace ops = paddle::operators;
namespace operators {} // namespace operators REGISTER_OP_CUDA_KERNEL(assign_value, ops::AssignValueKernel<int>,
} // namespace paddle ops::AssignValueKernel<float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved. /* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. you may not use this file except in compliance with the License.
...@@ -12,28 +12,39 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ...@@ -12,28 +12,39 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/operators/pool_cudnn_op.h" #pragma once
namespace ops = paddle::operators; #include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
REGISTER_OP(pool2d_cudnn, ops::PoolOp, ops::Pool2dOpMaker, pool2d_cudnn_grad, #include "paddle/platform/enforce.h"
ops::PoolOpGrad);
namespace paddle {
REGISTER_OP_CPU_KERNEL( namespace operators {
pool2d_cudnn, ops::PoolKernel<paddle::platform::CPUDeviceContext, float>,
ops::PoolKernel<paddle::platform::CPUDeviceContext, double>); template <typename T>
REGISTER_OP_CPU_KERNEL( class AssignValueKernel : public framework::OpKernel<T> {
pool2d_cudnn_grad, public:
ops::PoolGradKernel<paddle::platform::CPUDeviceContext, float>, virtual void Compute(const framework::ExecutionContext& ctx) const {
ops::PoolGradKernel<paddle::platform::CPUDeviceContext, double>) auto shape = ctx.Attr<std::vector<int>>("shape");
auto* out = ctx.Output<framework::Tensor>("Out");
REGISTER_OP(pool3d_cudnn, ops::PoolOp, ops::Pool3dOpMaker, pool3d_cudnn_grad, int dtype = ctx.Attr<int>("dtype");
ops::PoolOpGrad); const char* value_name = nullptr;
switch (dtype) {
REGISTER_OP_CPU_KERNEL( case framework::proto::DataType::INT32:
pool3d_cudnn, ops::PoolKernel<paddle::platform::CPUDeviceContext, float>, value_name = "int32_values";
ops::PoolKernel<paddle::platform::CPUDeviceContext, double>); break;
REGISTER_OP_CPU_KERNEL( case framework::proto::DataType::FP32:
pool3d_cudnn_grad, value_name = "fp32_values";
ops::PoolGradKernel<paddle::platform::CPUDeviceContext, float>, break;
ops::PoolGradKernel<paddle::platform::CPUDeviceContext, double>) default:
PADDLE_THROW("Unsupported dtype for assign_value_op: %d", dtype);
break;
}
auto values = ctx.Attr<std::vector<T>>(value_name);
framework::CopyFromVector(values, ctx.device_context(), out);
out->Resize(framework::make_ddim(shape));
}
};
} // namespace operators
} // namespace paddle
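`AssignValueKernel` picks the attribute to read from `dtype`, copies it as a 1-D tensor, and then reshapes it to `shape`. The kernel shown does not compare the number of values with the product of `shape`. The sketch below adds that check to the same dispatch, using hypothetical `DType` and `AssignValueAttrs` types in place of the framework's attribute map.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical enum; the real op uses framework::proto::DataType.
enum class DType { INT32, FP32, FP64 };

// Same mapping as AssignValueKernel: dtype -> name of the value attribute.
std::string ValueAttrName(DType dtype) {
  switch (dtype) {
    case DType::INT32:
      return "int32_values";
    case DType::FP32:
      return "fp32_values";
    default:
      throw std::invalid_argument("Unsupported dtype for assign_value_op");
  }
}

// Hypothetical attribute bag holding the attributes the op declares.
struct AssignValueAttrs {
  std::vector<int> shape;
  DType dtype;
  std::vector<int> int32_values;
  std::vector<float> fp32_values;
};

// ValueAttrName rejects unsupported dtypes before either list is read.
std::size_t NumValues(const AssignValueAttrs& a) {
  return ValueAttrName(a.dtype) == "int32_values" ? a.int32_values.size()
                                                  : a.fp32_values.size();
}

// Extra validation: the flat value list must fill the requested shape.
bool ShapeMatchesValues(const AssignValueAttrs& a) {
  long long n = 1;
  for (int d : a.shape) n *= d;
  return n == static_cast<long long>(NumValues(a));
}
```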
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/conv_op.h"
namespace paddle {
namespace operators {
class CudnnConv2DOpMaker : public Conv2DOpMaker {
public:
CudnnConv2DOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: Conv2DOpMaker(proto, op_checker) {
AddAttr<int>("workspace_size_MB",
"workspace size for cudnn, in MB, "
"workspace is a section of GPU memory which will be "
"allocated/freed each time the operator runs, larger "
"workspace size can increase performance but also requires "
"better hardware. This size should be chosen carefully.")
.SetDefault(4096);
}
};
class CudnnConv3DOpMaker : public Conv3DOpMaker {
public:
CudnnConv3DOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: Conv3DOpMaker(proto, op_checker) {
AddAttr<int>("workspace_size_MB",
"workspace size for cudnn, in MB, "
"workspace is a section of GPU memory which will be "
"allocated/freed each time the operator runs, larger "
"workspace size can increase performance but also requires "
"better hardware. This size should be chosen carefully.")
.SetDefault(4096);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(conv2d_cudnn, ops::ConvOp, ops::CudnnConv2DOpMaker,
conv2d_cudnn_grad, ops::ConvOpGrad);
REGISTER_OP(conv3d_cudnn, ops::ConvOp, ops::CudnnConv3DOpMaker,
conv3d_cudnn_grad, ops::ConvOpGrad);
REGISTER_OP_CPU_KERNEL(
conv2d_cudnn,
ops::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);
REGISTER_OP_CPU_KERNEL(
conv2d_cudnn_grad,
ops::GemmConvGradKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvGradKernel<paddle::platform::CPUDeviceContext, double>);
REGISTER_OP_CPU_KERNEL(
conv3d_cudnn,
ops::GemmConvKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvKernel<paddle::platform::CPUDeviceContext, double>);
REGISTER_OP_CPU_KERNEL(
conv3d_cudnn_grad,
ops::GemmConvGradKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvGradKernel<paddle::platform::CPUDeviceContext, double>);
...@@ -32,7 +32,7 @@ static constexpr size_t kCONV_CUDNN_WORKSPACE_LIMIT_BYTES = ...@@ -32,7 +32,7 @@ static constexpr size_t kCONV_CUDNN_WORKSPACE_LIMIT_BYTES =
static_cast<size_t>(1024) * 1024 * 1024; static_cast<size_t>(1024) * 1024 * 1024;
template <typename T> template <typename T>
class CudnnConvOpKernel : public framework::OpKernel<T> { class CUDNNConvOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -147,7 +147,7 @@ class CudnnConvOpKernel : public framework::OpKernel<T> { ...@@ -147,7 +147,7 @@ class CudnnConvOpKernel : public framework::OpKernel<T> {
}; };
template <typename T> template <typename T>
class CudnnConvGradOpKernel : public framework::OpKernel<T> { class CUDNNConvGradOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -315,17 +315,16 @@ class CudnnConvGradOpKernel : public framework::OpKernel<T> { ...@@ -315,17 +315,16 @@ class CudnnConvGradOpKernel : public framework::OpKernel<T> {
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
// TODO(dzhwinter) : below register should be removed REGISTER_OP_KERNEL(conv2d, CUDNN, ::paddle::platform::CUDAPlace,
REGISTER_OP_CUDA_KERNEL(conv2d_cudnn, paddle::operators::CUDNNConvOpKernel<float>,
paddle::operators::CudnnConvOpKernel<float>, paddle::operators::CUDNNConvOpKernel<double>);
paddle::operators::CudnnConvOpKernel<double>); REGISTER_OP_KERNEL(conv2d_grad, CUDNN, ::paddle::platform::CUDAPlace,
REGISTER_OP_CUDA_KERNEL(conv2d_cudnn_grad, paddle::operators::CUDNNConvGradOpKernel<float>,
paddle::operators::CudnnConvGradOpKernel<float>, paddle::operators::CUDNNConvGradOpKernel<double>);
paddle::operators::CudnnConvGradOpKernel<double>);
REGISTER_OP_KERNEL(conv3d, CUDNN, ::paddle::platform::CUDAPlace,
REGISTER_OP_CUDA_KERNEL(conv3d_cudnn, paddle::operators::CUDNNConvOpKernel<float>,
paddle::operators::CudnnConvOpKernel<float>, paddle::operators::CUDNNConvOpKernel<double>);
paddle::operators::CudnnConvOpKernel<double>); REGISTER_OP_KERNEL(conv3d_grad, CUDNN, ::paddle::platform::CUDAPlace,
REGISTER_OP_CUDA_KERNEL(conv3d_cudnn_grad, paddle::operators::CUDNNConvGradOpKernel<float>,
paddle::operators::CudnnConvGradOpKernel<float>, paddle::operators::CUDNNConvGradOpKernel<double>);
paddle::operators::CudnnConvGradOpKernel<double>);
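The cuDNN kernels are now registered with `REGISTER_OP_KERNEL(conv2d, CUDNN, CUDAPlace, ...)` under the plain op name, replacing the separate `conv2d_cudnn` operator. The kernel is therefore chosen by library rather than by op type. The sketch below shows a registry keyed on (op type, place, library). `RegisterKernel` and `FindKernel` are hypothetical helpers, not Paddle's macros.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

using KernelFn = std::function<std::string()>;
using RegistryKey = std::tuple<std::string, std::string, std::string>;

// One process-wide table: (op type, place, library) -> kernel.
std::map<RegistryKey, KernelFn>& Registry() {
  static std::map<RegistryKey, KernelFn> registry;
  return registry;
}

void RegisterKernel(const std::string& op, const std::string& place,
                    const std::string& library, KernelFn fn) {
  Registry()[RegistryKey{op, place, library}] = std::move(fn);
}

// Returns the kernel for the requested library, or nullptr if none exists.
const KernelFn* FindKernel(const std::string& op, const std::string& place,
                           const std::string& library) {
  auto it = Registry().find(RegistryKey{op, place, library});
  return it == Registry().end() ? nullptr : &it->second;
}
```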
...@@ -67,6 +67,23 @@ void ConvOp::InferShape(framework::InferShapeContext* ctx) const { ...@@ -67,6 +67,23 @@ void ConvOp::InferShape(framework::InferShapeContext* ctx) const {
ctx->ShareLoD("Input", "Output"); ctx->ShareLoD("Input", "Output");
} }
framework::OpKernelType ConvOp::GetExpectedKernelType(
const framework::ExecutionContext& ctx) const {
bool use_cudnn = ctx.Attr<bool>("use_cudnn");
framework::LibraryType library_;
if (use_cudnn) {
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
std::string data_format = ctx.Attr<std::string>("data_format");
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("Input")->type()), ctx.GetPlace(),
layout_, library_);
}
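`ConvOp::GetExpectedKernelType` maps two attributes to the kernel key: `use_cudnn` selects the library and `data_format` selects the layout. The sketch below shows that mapping with hypothetical enums. Its parser assumes `StringToDataLayout` accepts the same case-insensitive spellings (`NCHW`, `NHWC`, `AnyLayout`).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

enum class LibraryType { kPlain, kCUDNN };
enum class DataLayout { kAnyLayout, kNCHW, kNHWC };

// Case-insensitive layout parser (assumed spellings, see above).
DataLayout ParseLayout(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  if (s == "NCHW") return DataLayout::kNCHW;
  if (s == "NHWC") return DataLayout::kNHWC;
  if (s == "ANYLAYOUT") return DataLayout::kAnyLayout;
  throw std::invalid_argument("Unknown data layout: " + s);
}

struct KernelChoice {
  LibraryType library;
  DataLayout layout;
};

// The boolean picks the library and the string picks the layout.
KernelChoice ChooseConvKernel(bool use_cudnn, const std::string& data_format) {
  return {use_cudnn ? LibraryType::kCUDNN : LibraryType::kPlain,
          ParseLayout(data_format)};
}
```

`ConvOpGrad::GetExpectedKernelType` repeats the same logic. One option is to move it into a shared helper shaped like `ChooseConvKernel` so the forward and backward ops cannot drift apart.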
Conv2DOpMaker::Conv2DOpMaker(OpProto* proto, OpAttrChecker* op_checker) Conv2DOpMaker::Conv2DOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
...@@ -108,6 +125,26 @@ Conv2DOpMaker::Conv2DOpMaker(OpProto* proto, OpAttrChecker* op_checker) ...@@ -108,6 +125,26 @@ Conv2DOpMaker::Conv2DOpMaker(OpProto* proto, OpAttrChecker* op_checker)
"dilations(h_dilation, w_dilation) of " "dilations(h_dilation, w_dilation) of "
"convolution operator.") "convolution operator.")
.SetDefault({1, 1}); .SetDefault({1, 1});
AddAttr<bool>(
"use_cudnn",
"(bool, default false) Only used in cudnn kernel; requires cuDNN to be installed.")
.SetDefault(false);
AddAttr<std::string>(
"data_format",
"(string, default AnyLayout) An optional string from: "
"\"AnyLayout\", \"NHWC\", \"NCHW\". Specify the data format of the "
"output data; the input will be transformed automatically.")
.SetDefault("AnyLayout");
// TODO(dzhwinter): need to register layout transform functions
AddAttr<int>("workspace_size_MB",
"Only used in cudnn kernel; requires use_cudnn to be true. "
"Workspace size for cudnn, in MB. The workspace is a section of "
"GPU memory that is allocated/freed each time the operator runs. "
"A larger workspace can increase performance but also requires "
"better hardware, so this size should be chosen carefully.")
.SetDefault(4096);
AddComment(R"DOC( AddComment(R"DOC(
Convolution Operator. Convolution Operator.
...@@ -181,6 +218,25 @@ Conv3DOpMaker::Conv3DOpMaker(OpProto* proto, OpAttrChecker* op_checker) ...@@ -181,6 +218,25 @@ Conv3DOpMaker::Conv3DOpMaker(OpProto* proto, OpAttrChecker* op_checker)
"dilations(d_dilation, h_dilation, w_dilation) of " "dilations(d_dilation, h_dilation, w_dilation) of "
"convolution operator.") "convolution operator.")
.SetDefault({1, 1, 1}); .SetDefault({1, 1, 1});
AddAttr<bool>(
"use_cudnn",
"(bool, default false) Only used in cudnn kernel; requires cuDNN to be installed.")
.SetDefault(false);
AddAttr<std::string>(
"data_format",
"(string, default AnyLayout) An optional string from: "
"\"AnyLayout\", \"NHWC\", \"NCHW\". Specify the data format of the "
"output data; the input will be transformed automatically.")
.SetDefault("AnyLayout");
// TODO(dzhwinter): need to register layout transform functions
AddAttr<int>("workspace_size_MB",
"Only used in cudnn kernel; requires use_cudnn to be true. "
"Workspace size for cudnn, in MB. The workspace is a section of "
"GPU memory that is allocated/freed each time the operator runs. "
"A larger workspace can increase performance but also requires "
"better hardware, so this size should be chosen carefully.")
.SetDefault(4096);
AddComment(R"DOC( AddComment(R"DOC(
Convolution3D Operator. Convolution3D Operator.
...@@ -224,6 +280,23 @@ void ConvOpGrad::InferShape(framework::InferShapeContext* ctx) const { ...@@ -224,6 +280,23 @@ void ConvOpGrad::InferShape(framework::InferShapeContext* ctx) const {
} }
} }
framework::OpKernelType ConvOpGrad::GetExpectedKernelType(
const framework::ExecutionContext& ctx) const {
bool use_cudnn = ctx.Attr<bool>("use_cudnn");
framework::LibraryType library_;
if (use_cudnn) {
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
std::string data_format = ctx.Attr<std::string>("data_format");
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("Input")->type()), ctx.GetPlace(),
layout_, library_);
}
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
......
...@@ -62,12 +62,20 @@ class ConvOp : public framework::OperatorWithKernel { ...@@ -62,12 +62,20 @@ class ConvOp : public framework::OperatorWithKernel {
public: public:
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override; void InferShape(framework::InferShapeContext* ctx) const override;
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override;
}; };
class ConvOpGrad : public framework::OperatorWithKernel { class ConvOpGrad : public framework::OperatorWithKernel {
public: public:
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override; void InferShape(framework::InferShapeContext* ctx) const override;
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override;
}; };
template <typename DeviceContext, typename T> template <typename DeviceContext, typename T>
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/conv_transpose_op.h"
namespace paddle {
namespace operators {
class CudnnConv2DTransposeOpMaker : public Conv2DTransposeOpMaker {
public:
CudnnConv2DTransposeOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: Conv2DTransposeOpMaker(proto, op_checker) {
AddAttr<int>("workspace_size_MB",
"workspace size for cudnn, in MB, "
"workspace is a section of GPU memory which will be "
"allocated/freed each time the operator runs, larger "
"workspace size can increase performance but also requires "
"better hardware. This size should be chosen carefully.")
.SetDefault(4096);
}
};
class CudnnConv3DTransposeOpMaker : public Conv3DTransposeOpMaker {
public:
CudnnConv3DTransposeOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: Conv3DTransposeOpMaker(proto, op_checker) {
AddAttr<int>("workspace_size_MB",
"workspace size for cudnn, in MB, "
"workspace is a section of GPU memory which will be "
"allocated/freed each time the operator runs, larger "
"workspace size can increase performance but also requires "
"better hardware. This size should be chosen carefully.")
.SetDefault(4096);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(conv2d_transpose_cudnn, ops::ConvTransposeOp,
ops::CudnnConv2DTransposeOpMaker, conv2d_transpose_cudnn_grad,
ops::ConvTransposeOpGrad);
REGISTER_OP_CPU_KERNEL(
conv2d_transpose_cudnn,
ops::GemmConvTransposeKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvTransposeKernel<paddle::platform::CPUDeviceContext, double>);
REGISTER_OP_CPU_KERNEL(
conv2d_transpose_cudnn_grad,
ops::GemmConvTransposeGradKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvTransposeGradKernel<paddle::platform::CPUDeviceContext,
double>);
REGISTER_OP(conv3d_transpose_cudnn, ops::ConvTransposeOp,
ops::CudnnConv3DTransposeOpMaker, conv3d_transpose_cudnn_grad,
ops::ConvTransposeOpGrad);
REGISTER_OP_CPU_KERNEL(
conv3d_transpose_cudnn,
ops::GemmConvTransposeKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvTransposeKernel<paddle::platform::CPUDeviceContext, double>);
REGISTER_OP_CPU_KERNEL(
conv3d_transpose_cudnn_grad,
ops::GemmConvTransposeGradKernel<paddle::platform::CPUDeviceContext, float>,
ops::GemmConvTransposeGradKernel<paddle::platform::CPUDeviceContext,
double>);
...@@ -28,10 +28,10 @@ using ScopedFilterDescriptor = platform::ScopedFilterDescriptor; ...@@ -28,10 +28,10 @@ using ScopedFilterDescriptor = platform::ScopedFilterDescriptor;
using ScopedConvolutionDescriptor = platform::ScopedConvolutionDescriptor; using ScopedConvolutionDescriptor = platform::ScopedConvolutionDescriptor;
using DataLayout = platform::DataLayout; using DataLayout = platform::DataLayout;
static constexpr size_t kConvCudnnWorkspaceLimitBytes = 1024 * 1024 * 1024; static constexpr size_t kConvCUDNNWorkspaceLimitBytes = 1024 * 1024 * 1024;
template <typename T> template <typename T>
class CudnnConvTransposeOpKernel : public framework::OpKernel<T> { class CUDNNConvTransposeOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -77,7 +77,7 @@ class CudnnConvTransposeOpKernel : public framework::OpKernel<T> { ...@@ -77,7 +77,7 @@ class CudnnConvTransposeOpKernel : public framework::OpKernel<T> {
// ------------------- cudnn conv workspace --------------------- // ------------------- cudnn conv workspace ---------------------
void* cudnn_workspace = nullptr; void* cudnn_workspace = nullptr;
size_t workspace_size_in_bytes; // final workspace to allocate. size_t workspace_size_in_bytes; // final workspace to allocate.
size_t workspace_size_limit = kConvCudnnWorkspaceLimitBytes; size_t workspace_size_limit = kConvCUDNNWorkspaceLimitBytes;
if (user_workspace_size > 0) { if (user_workspace_size > 0) {
workspace_size_limit = static_cast<size_t>(user_workspace_size) * 1024 * 1024; workspace_size_limit = static_cast<size_t>(user_workspace_size) * 1024 * 1024;
} }
...@@ -116,7 +116,7 @@ class CudnnConvTransposeOpKernel : public framework::OpKernel<T> { ...@@ -116,7 +116,7 @@ class CudnnConvTransposeOpKernel : public framework::OpKernel<T> {
}; };
template <typename T> template <typename T>
class CudnnConvTransposeGradOpKernel : public framework::OpKernel<T> { class CUDNNConvTransposeGradOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -161,7 +161,7 @@ class CudnnConvTransposeGradOpKernel : public framework::OpKernel<T> { ...@@ -161,7 +161,7 @@ class CudnnConvTransposeGradOpKernel : public framework::OpKernel<T> {
cudnnConvolutionBwdFilterAlgo_t filter_algo; cudnnConvolutionBwdFilterAlgo_t filter_algo;
size_t bwd_filter_ws_size, fwd_ws_size; size_t bwd_filter_ws_size, fwd_ws_size;
size_t workspace_size_in_bytes = 0; size_t workspace_size_in_bytes = 0;
size_t workspace_size_limit = kConvCudnnWorkspaceLimitBytes; size_t workspace_size_limit = kConvCUDNNWorkspaceLimitBytes;
if (user_workspace_size > 0) { if (user_workspace_size > 0) {
workspace_size_limit = user_workspace_size * 1024 * 1024; workspace_size_limit = user_workspace_size * 1024 * 1024;
} }
...@@ -236,16 +236,16 @@ class CudnnConvTransposeGradOpKernel : public framework::OpKernel<T> { ...@@ -236,16 +236,16 @@ class CudnnConvTransposeGradOpKernel : public framework::OpKernel<T> {
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(conv2d_transpose_cudnn, REGISTER_OP_KERNEL(conv2d_transpose, CUDNN, ::paddle::platform::CUDAPlace,
ops::CudnnConvTransposeOpKernel<float>, ops::CUDNNConvTransposeOpKernel<float>,
ops::CudnnConvTransposeOpKernel<double>); ops::CUDNNConvTransposeOpKernel<double>);
REGISTER_OP_CUDA_KERNEL(conv2d_transpose_cudnn_grad, REGISTER_OP_KERNEL(conv2d_transpose_grad, CUDNN, ::paddle::platform::CUDAPlace,
ops::CudnnConvTransposeGradOpKernel<float>, ops::CUDNNConvTransposeGradOpKernel<float>,
ops::CudnnConvTransposeGradOpKernel<double>); ops::CUDNNConvTransposeGradOpKernel<double>);
REGISTER_OP_CUDA_KERNEL(conv3d_transpose_cudnn, REGISTER_OP_KERNEL(conv3d_transpose, CUDNN, ::paddle::platform::CUDAPlace,
ops::CudnnConvTransposeOpKernel<float>, ops::CUDNNConvTransposeOpKernel<float>,
ops::CudnnConvTransposeOpKernel<double>); ops::CUDNNConvTransposeOpKernel<double>);
REGISTER_OP_CUDA_KERNEL(conv3d_transpose_cudnn_grad, REGISTER_OP_KERNEL(conv3d_transpose_grad, CUDNN, ::paddle::platform::CUDAPlace,
ops::CudnnConvTransposeGradOpKernel<float>, ops::CUDNNConvTransposeGradOpKernel<float>,
ops::CudnnConvTransposeGradOpKernel<double>); ops::CUDNNConvTransposeGradOpKernel<double>);
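Both cuDNN conv-transpose kernels above choose their workspace limit the same way: a built-in default in bytes, overridden by `workspace_size_MB` when that attribute is positive. A minimal sketch of the rule, assuming the default is passed in (the actual value of `kConvCUDNNWorkspaceLimitBytes` is not shown in this diff); the function name `WorkspaceLimitBytes` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the cuDNN conv-transpose kernels: keep the
// default byte limit unless the user passed a positive size in MB.
std::size_t WorkspaceLimitBytes(int user_workspace_size_mb,
                                std::size_t default_limit_bytes) {
  if (user_workspace_size_mb > 0) {
    // Widen before multiplying: with plain int arithmetic,
    // 4096 * 1024 * 1024 would overflow a 32-bit int.
    return static_cast<std::size_t>(user_workspace_size_mb) * 1024 * 1024;
  }
  return default_limit_bytes;
}
```

Note that the attribute default shown for `workspace_size_MB` (4096) is exactly the case where the widening cast matters.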
...@@ -58,6 +58,23 @@ void ConvTransposeOp::InferShape(framework::InferShapeContext* ctx) const { ...@@ -58,6 +58,23 @@ void ConvTransposeOp::InferShape(framework::InferShapeContext* ctx) const {
ctx->SetOutputDim("Output", framework::make_ddim(output_shape)); ctx->SetOutputDim("Output", framework::make_ddim(output_shape));
} }
framework::OpKernelType ConvTransposeOp::GetExpectedKernelType(
const framework::ExecutionContext& ctx) const {
bool use_cudnn = ctx.Attr<bool>("use_cudnn");
framework::LibraryType library_;
if (use_cudnn) {
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
std::string data_format = ctx.Attr<std::string>("data_format");
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("Input")->type()), ctx.GetPlace(),
layout_, library_);
}
Conv2DTransposeOpMaker::Conv2DTransposeOpMaker(OpProto* proto, Conv2DTransposeOpMaker::Conv2DTransposeOpMaker(OpProto* proto,
OpAttrChecker* op_checker) OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
...@@ -94,6 +111,25 @@ Conv2DTransposeOpMaker::Conv2DTransposeOpMaker(OpProto* proto, ...@@ -94,6 +111,25 @@ Conv2DTransposeOpMaker::Conv2DTransposeOpMaker(OpProto* proto,
"(vector<int> default:{0, 0}), the paddings(h_pad, w_pad) of convolution " "(vector<int> default:{0, 0}), the paddings(h_pad, w_pad) of convolution "
"transpose operator.") "transpose operator.")
.SetDefault({0, 0}); .SetDefault({0, 0});
AddAttr<bool>(
"use_cudnn",
"(bool, default false) Only used in the cuDNN kernel; requires cuDNN to be installed.")
.SetDefault(false);
AddAttr<std::string>(
"data_format",
"(string, default \"AnyLayout\") An optional string from: "
"\"NHWC\", \"NCHW\". Specifies the data format of the output data; "
"the input will be transformed automatically.")
.SetDefault("AnyLayout");
// TODO(dzhwinter): need to register layout transform functions.
AddAttr<int>("workspace_size_MB",
"Used in the cuDNN kernel only. Workspace size for cuDNN, in MB. "
"The workspace is a section of GPU memory that is "
"allocated/freed each time the operator runs; a larger "
"workspace can increase performance but also requires "
"better hardware. This size should be set carefully.")
.SetDefault(4096);
AddComment(R"DOC( AddComment(R"DOC(
Convolution2D Transpose Operator. Convolution2D Transpose Operator.
...@@ -163,6 +199,25 @@ Conv3DTransposeOpMaker::Conv3DTransposeOpMaker(OpProto* proto, ...@@ -163,6 +199,25 @@ Conv3DTransposeOpMaker::Conv3DTransposeOpMaker(OpProto* proto,
"(vector<int> default:{0, 0, 0}), paddings(d_pad, " "(vector<int> default:{0, 0, 0}), paddings(d_pad, "
"h_pad, w_pad) of convolution transpose operator.") "h_pad, w_pad) of convolution transpose operator.")
.SetDefault({0, 0, 0}); .SetDefault({0, 0, 0});
AddAttr<bool>(
"use_cudnn",
"(bool, default false) Only used in the cuDNN kernel; requires cuDNN to be installed.")
.SetDefault(false);
AddAttr<std::string>(
"data_format",
"(string, default \"AnyLayout\") An optional string from: "
"\"NHWC\", \"NCHW\". Specifies the data format of the output data; "
"the input will be transformed automatically.")
.SetDefault("AnyLayout");
// TODO(dzhwinter): need to register layout transform functions.
AddAttr<int>("workspace_size_MB",
"Used in the cuDNN kernel only. Workspace size for cuDNN, in MB. "
"The workspace is a section of GPU memory that is "
"allocated/freed each time the operator runs; a larger "
"workspace can increase performance but also requires "
"better hardware. This size should be set carefully.")
.SetDefault(4096);
AddComment(R"DOC( AddComment(R"DOC(
Convolution3D Transpose Operator. Convolution3D Transpose Operator.
...@@ -205,6 +260,23 @@ void ConvTransposeOpGrad::InferShape(framework::InferShapeContext* ctx) const { ...@@ -205,6 +260,23 @@ void ConvTransposeOpGrad::InferShape(framework::InferShapeContext* ctx) const {
} }
} }
framework::OpKernelType ConvTransposeOpGrad::GetExpectedKernelType(
const framework::ExecutionContext& ctx) const {
bool use_cudnn = ctx.Attr<bool>("use_cudnn");
framework::LibraryType library_;
if (use_cudnn) {
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
std::string data_format = ctx.Attr<std::string>("data_format");
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("Input")->type()), ctx.GetPlace(),
layout_, library_);
}
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
......
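`GetExpectedKernelType` in both `ConvTransposeOp` and `ConvTransposeOpGrad` turns two attributes into part of the kernel key: `use_cudnn` selects the library and `data_format` selects the layout. The sketch below isolates that decision with hypothetical stand-in enums (`Library`, `Layout`) and omits data type and place; the exact-match layout parser is an assumption and may not match `framework::StringToDataLayout` in every detail.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for framework::LibraryType and framework::DataLayout.
enum class Library { kPlain, kCUDNN };
enum class Layout { kNHWC, kNCHW, kAnyLayout };

struct ExpectedKey {
  Library library;
  Layout layout;
};

// Assumed parsing rule for the data_format attribute.
Layout ParseLayout(const std::string& s) {
  if (s == "NHWC") return Layout::kNHWC;
  if (s == "NCHW") return Layout::kNCHW;
  if (s == "AnyLayout") return Layout::kAnyLayout;
  throw std::invalid_argument("unknown data_format: " + s);
}

// Same decision as GetExpectedKernelType: use_cudnn picks the library,
// data_format picks the layout.
ExpectedKey ExpectedKernelKey(bool use_cudnn, const std::string& data_format) {
  Library lib = use_cudnn ? Library::kCUDNN : Library::kPlain;
  return ExpectedKey{lib, ParseLayout(data_format)};
}
```

The same body is repeated in `ConvTransposeOp`, `ConvTransposeOpGrad`, `PoolOp` and `PoolOpGrad`, differing only in the input name (`Input` or `X`), so a shared helper along these lines is one possible way to remove the duplication.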
...@@ -42,12 +42,20 @@ class ConvTransposeOp : public framework::OperatorWithKernel { ...@@ -42,12 +42,20 @@ class ConvTransposeOp : public framework::OperatorWithKernel {
public: public:
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override; void InferShape(framework::InferShapeContext* ctx) const override;
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override;
}; };
class ConvTransposeOpGrad : public framework::OperatorWithKernel { class ConvTransposeOpGrad : public framework::OperatorWithKernel {
public: public:
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override; void InferShape(framework::InferShapeContext* ctx) const override;
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override;
}; };
template <typename DeviceContext, typename T> template <typename DeviceContext, typename T>
......
...@@ -21,7 +21,7 @@ class ElementwiseAddOpMaker : public ElementwiseOpMaker { ...@@ -21,7 +21,7 @@ class ElementwiseAddOpMaker : public ElementwiseOpMaker {
public: public:
ElementwiseAddOpMaker(OpProto* proto, OpAttrChecker* op_checker) ElementwiseAddOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) { : ElementwiseOpMaker(proto, op_checker) {
SetComment("Add", "$Out = X + Y$"); SetComment("Add", "Out = X + Y");
AddComment(comment_); AddComment(comment_);
} }
}; };
......
...@@ -21,7 +21,7 @@ class ElementwiseDivOpMaker : public ElementwiseOpMaker { ...@@ -21,7 +21,7 @@ class ElementwiseDivOpMaker : public ElementwiseOpMaker {
public: public:
ElementwiseDivOpMaker(OpProto* proto, OpAttrChecker* op_checker) ElementwiseDivOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) { : ElementwiseOpMaker(proto, op_checker) {
SetComment("Div", "$Out = X / Y$"); SetComment("Div", "Out = X / Y");
AddComment(comment_); AddComment(comment_);
} }
}; };
......
...@@ -22,7 +22,7 @@ class ElementwiseMulOpMaker : public ElementwiseOpMaker { ...@@ -22,7 +22,7 @@ class ElementwiseMulOpMaker : public ElementwiseOpMaker {
public: public:
ElementwiseMulOpMaker(OpProto* proto, OpAttrChecker* op_checker) ElementwiseMulOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) { : ElementwiseOpMaker(proto, op_checker) {
SetComment("Mul", "$Out = X \\odot\\ Y$"); SetComment("Mul", "Out = X \\odot\\ Y");
AddComment(comment_); AddComment(comment_);
} }
}; };
......
...@@ -58,7 +58,8 @@ Limited Elementwise {name} Operator. ...@@ -58,7 +58,8 @@ Limited Elementwise {name} Operator.
The equation is: The equation is:
{equation} .. math::
{equation}
X is a tensor of any dimension and the dimensions of tensor Y must be smaller than X is a tensor of any dimension and the dimensions of tensor Y must be smaller than
or equal to the dimensions of X. or equal to the dimensions of X.
...@@ -71,15 +72,16 @@ For case 2: ...@@ -71,15 +72,16 @@ For case 2:
Y will be broadcasted to match the shape of X and axis should be Y will be broadcasted to match the shape of X and axis should be
the starting dimension index for broadcasting Y onto X. the starting dimension index for broadcasting Y onto X.
example: For example
shape(X) = (2, 3, 4, 5), shape(Y) = (,) .. code-block:: python
shape(X) = (2, 3, 4, 5), shape(Y) = (5,)
shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)
shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
Both the input X and Y can carry the LoD (Level of Details) information, shape(X) = (2, 3, 4, 5), shape(Y) = (,)
or not. But the output only shares the LoD information with input X. shape(X) = (2, 3, 4, 5), shape(Y) = (5,)
shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)
shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
The inputs X and Y may each carry LoD (Level of Details) information, or neither may. However, the output shares LoD information only with input X.
)DOC"; )DOC";
AddComment(comment_); AddComment(comment_);
......
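The broadcasting rule for case 2 can be pictured by viewing X as a 3-D block `[pre, n, post]`, where `n` is the number of elements of Y and Y's dimensions must match X's dimensions starting at `axis` (with `axis = -1` meaning "align with X's trailing dimensions"). The sketch below is an illustration of those semantics under that assumption; `GetMidDims` and `BroadcastAdd` are hypothetical names, not the operator's actual kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// X viewed as [pre, n, post]; n covers the dimensions that Y spans.
struct MidDims {
  std::size_t pre = 1, n = 1, post = 1;
};

MidDims GetMidDims(const std::vector<int>& x_dims,
                   const std::vector<int>& y_dims, int axis) {
  if (axis == -1) {
    axis = static_cast<int>(x_dims.size()) - static_cast<int>(y_dims.size());
  }
  assert(axis >= 0 &&
         static_cast<std::size_t>(axis) + y_dims.size() <= x_dims.size());
  MidDims d;
  for (int i = 0; i < axis; ++i) d.pre *= static_cast<std::size_t>(x_dims[i]);
  for (std::size_t i = 0; i < y_dims.size(); ++i) {
    assert(x_dims[axis + i] == y_dims[i]);  // Y must match X's span.
    d.n *= static_cast<std::size_t>(y_dims[i]);
  }
  for (std::size_t i = static_cast<std::size_t>(axis) + y_dims.size();
       i < x_dims.size(); ++i) {
    d.post *= static_cast<std::size_t>(x_dims[i]);
  }
  return d;
}

// Out = X + Y, with Y repeated over the pre and post dimensions.
std::vector<float> BroadcastAdd(const std::vector<float>& x,
                                const std::vector<float>& y, const MidDims& d) {
  assert(x.size() == d.pre * d.n * d.post && y.size() == d.n);
  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < d.pre; ++i)
    for (std::size_t j = 0; j < d.n; ++j)
      for (std::size_t k = 0; k < d.post; ++k) {
        std::size_t idx = (i * d.n + j) * d.post + k;
        out[idx] = x[idx] + y[j];
      }
  return out;
}
```

For `shape(X) = (2, 3, 4, 5)` and `shape(Y) = (3, 4)` with `axis=1`, this view gives `pre = 2`, `n = 12`, `post = 5`.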
...@@ -21,7 +21,7 @@ class ElementwiseSubOpMaker : public ElementwiseOpMaker { ...@@ -21,7 +21,7 @@ class ElementwiseSubOpMaker : public ElementwiseOpMaker {
public: public:
ElementwiseSubOpMaker(OpProto* proto, OpAttrChecker* op_checker) ElementwiseSubOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) { : ElementwiseOpMaker(proto, op_checker) {
SetComment("Sub", "$Out = X - Y$"); SetComment("Sub", "Out = X - Y");
AddComment(comment_); AddComment(comment_);
} }
}; };
......
...@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and ...@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/operators/math/sequence2batch.h" #include "paddle/operators/math/sequence2batch.h"
#include "paddle/operators/math/math_function.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
......
...@@ -12,7 +12,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ...@@ -12,7 +12,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/operators/pool_cudnn_op.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/pool_op.h"
#include "paddle/platform/cudnn_helper.h" #include "paddle/platform/cudnn_helper.h"
namespace paddle { namespace paddle {
...@@ -25,7 +26,7 @@ using DataLayout = platform::DataLayout; ...@@ -25,7 +26,7 @@ using DataLayout = platform::DataLayout;
using PoolingMode = platform::PoolingMode; using PoolingMode = platform::PoolingMode;
template <typename T> template <typename T>
class PoolCudnnOpKernel : public framework::OpKernel<T> { class PoolCUDNNOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext &ctx) const override { void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -86,7 +87,7 @@ class PoolCudnnOpKernel : public framework::OpKernel<T> { ...@@ -86,7 +87,7 @@ class PoolCudnnOpKernel : public framework::OpKernel<T> {
}; };
template <typename T> template <typename T>
class PoolCudnnGradOpKernel : public framework::OpKernel<T> { class PoolCUDNNGradOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext &ctx) const override { void Compute(const framework::ExecutionContext &ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -162,12 +163,16 @@ class PoolCudnnGradOpKernel : public framework::OpKernel<T> { ...@@ -162,12 +163,16 @@ class PoolCudnnGradOpKernel : public framework::OpKernel<T> {
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(pool2d_cudnn, ops::PoolCudnnOpKernel<float>, REGISTER_OP_KERNEL(pool2d, CUDNN, ::paddle::platform::CUDAPlace,
ops::PoolCudnnOpKernel<double>); ops::PoolCUDNNOpKernel<float>,
REGISTER_OP_CUDA_KERNEL(pool2d_cudnn_grad, ops::PoolCudnnGradOpKernel<float>, ops::PoolCUDNNOpKernel<double>);
ops::PoolCudnnGradOpKernel<double>); REGISTER_OP_KERNEL(pool2d_grad, CUDNN, ::paddle::platform::CUDAPlace,
ops::PoolCUDNNGradOpKernel<float>,
REGISTER_OP_CUDA_KERNEL(pool3d_cudnn, ops::PoolCudnnOpKernel<float>, ops::PoolCUDNNGradOpKernel<double>);
ops::PoolCudnnOpKernel<double>);
REGISTER_OP_CUDA_KERNEL(pool3d_cudnn_grad, ops::PoolCudnnGradOpKernel<float>, REGISTER_OP_KERNEL(pool3d, CUDNN, ::paddle::platform::CUDAPlace,
ops::PoolCudnnGradOpKernel<double>); ops::PoolCUDNNOpKernel<float>,
ops::PoolCUDNNOpKernel<double>);
REGISTER_OP_KERNEL(pool3d_grad, CUDNN, ::paddle::platform::CUDAPlace,
ops::PoolCUDNNGradOpKernel<float>,
ops::PoolCUDNNGradOpKernel<double>);
...@@ -61,6 +61,23 @@ void PoolOp::InferShape(framework::InferShapeContext *ctx) const { ...@@ -61,6 +61,23 @@ void PoolOp::InferShape(framework::InferShapeContext *ctx) const {
ctx->ShareLoD("X", "Out"); ctx->ShareLoD("X", "Out");
} }
framework::OpKernelType PoolOp::GetExpectedKernelType(
const framework::ExecutionContext &ctx) const {
bool use_cudnn = ctx.Attr<bool>("use_cudnn");
framework::LibraryType library_;
if (use_cudnn) {
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
std::string data_format = ctx.Attr<std::string>("data_format");
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace(),
layout_, library_);
}
void PoolOpGrad::InferShape(framework::InferShapeContext *ctx) const { void PoolOpGrad::InferShape(framework::InferShapeContext *ctx) const {
PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null."); PADDLE_ENFORCE(ctx->HasInput("X"), "Input(X) must not be null.");
PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("X")), PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("X")),
...@@ -68,6 +85,23 @@ void PoolOpGrad::InferShape(framework::InferShapeContext *ctx) const { ...@@ -68,6 +85,23 @@ void PoolOpGrad::InferShape(framework::InferShapeContext *ctx) const {
ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X")); ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
} }
framework::OpKernelType PoolOpGrad::GetExpectedKernelType(
const framework::ExecutionContext &ctx) const {
bool use_cudnn = ctx.Attr<bool>("use_cudnn");
framework::LibraryType library_;
if (use_cudnn) {
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
std::string data_format = ctx.Attr<std::string>("data_format");
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace(),
layout_, library_);
}
Pool2dOpMaker::Pool2dOpMaker(OpProto *proto, OpAttrChecker *op_checker) Pool2dOpMaker::Pool2dOpMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
...@@ -101,15 +135,27 @@ Pool2dOpMaker::Pool2dOpMaker(OpProto *proto, OpAttrChecker *op_checker) ...@@ -101,15 +135,27 @@ Pool2dOpMaker::Pool2dOpMaker(OpProto *proto, OpAttrChecker *op_checker)
AddAttr<std::vector<int>>("strides", AddAttr<std::vector<int>>("strides",
"(vector<int>, default {1, 1}), strides(height, " "(vector<int>, default {1, 1}), strides(height, "
"width) of pooling operator.") "width) of pooling operator.")
.SetDefault({1, 1}); // TODO(Chengduo): Add checker. (Currently, .SetDefault({1, 1});
// TODO(Chengduo): Add checker. (Currently,
// TypedAttrChecker doesn't support vector type.) // TypedAttrChecker doesn't support vector type.)
AddAttr<std::vector<int>>( AddAttr<std::vector<int>>(
"paddings", "paddings",
"(vector<int>, default {0,0}), paddings(height, width) of pooling " "(vector<int>, default {0,0}), paddings(height, width) of pooling "
"operator." "operator."
"If global_pooling = true, paddings and ksize will be ignored.") "If global_pooling = true, paddings and ksize will be ignored.")
.SetDefault({0, 0}); // TODO(Chengduo): Add checker. (Currently, .SetDefault({0, 0});
// TypedAttrChecker don't support vector type.) AddAttr<bool>(
"use_cudnn",
"(bool, default false) Only used in the cuDNN kernel; requires cuDNN to be installed.")
.SetDefault(false);
AddAttr<std::string>(
"data_format",
"(string, default \"AnyLayout\") An optional string from: "
"\"NHWC\", \"NCHW\". Specifies the data format of the output data; "
"the input will be transformed automatically.")
.SetDefault("AnyLayout");
// TODO(dzhwinter): need to register layout transform functions.
AddComment(R"DOC( AddComment(R"DOC(
Pool2d Operator. Pool2d Operator.
...@@ -182,6 +228,19 @@ Pool3dOpMaker::Pool3dOpMaker(OpProto *proto, OpAttrChecker *op_checker) ...@@ -182,6 +228,19 @@ Pool3dOpMaker::Pool3dOpMaker(OpProto *proto, OpAttrChecker *op_checker)
.SetDefault({0, 0, 0}); // TODO(Chengduo): Add checker. (Currently, .SetDefault({0, 0, 0}); // TODO(Chengduo): Add checker. (Currently,
// TypedAttrChecker doesn't support vector type.) // TypedAttrChecker doesn't support vector type.)
AddAttr<bool>(
"use_cudnn",
"(bool, default false) Only used in the cuDNN kernel; requires cuDNN to be installed.")
.SetDefault(false);
AddAttr<std::string>(
"data_format",
"(string, default \"AnyLayout\") An optional string from: "
"\"NHWC\", \"NCHW\". Specifies the data format of the output data; "
"the input will be transformed automatically.")
.SetDefault("AnyLayout");
// TODO(dzhwinter): need to register layout transform functions.
AddComment(R"DOC( AddComment(R"DOC(
Pool3d Operator. Pool3d Operator.
......
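The pooling attributes above interact as follows: each output spatial size follows from the input size, `ksize`, `paddings` and `strides`, and with `global_pooling` the window covers the whole input and the paddings are ignored. A sketch assuming the common floor-mode formula `(input - ksize + 2 * padding) / stride + 1`; `PoolOutputSize` and `PoolOutputShape` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed floor-mode output size for one spatial dimension.
int PoolOutputSize(int input, int ksize, int padding, int stride) {
  assert(stride > 0);
  return (input - ksize + 2 * padding) / stride + 1;
}

// With global_pooling, ksize becomes the full input extent and paddings are
// reset to zero, matching the attribute documentation.
std::vector<int> PoolOutputShape(const std::vector<int>& input_hw,
                                 std::vector<int> ksize,
                                 std::vector<int> paddings,
                                 const std::vector<int>& strides,
                                 bool global_pooling) {
  assert(input_hw.size() == ksize.size() && ksize.size() == paddings.size() &&
         paddings.size() == strides.size());
  if (global_pooling) {
    ksize = input_hw;
    paddings.assign(paddings.size(), 0);
  }
  std::vector<int> out;
  for (std::size_t i = 0; i < input_hw.size(); ++i) {
    out.push_back(PoolOutputSize(input_hw[i], ksize[i], paddings[i], strides[i]));
  }
  return out;
}
```

Under global pooling every spatial dimension collapses to 1 regardless of the stride, since `(H - H + 0) / s + 1 = 1`.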
...@@ -29,6 +29,10 @@ class PoolOp : public framework::OperatorWithKernel { ...@@ -29,6 +29,10 @@ class PoolOp : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override; void InferShape(framework::InferShapeContext* ctx) const override;
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override;
}; };
class PoolOpGrad : public framework::OperatorWithKernel { class PoolOpGrad : public framework::OperatorWithKernel {
...@@ -36,6 +40,10 @@ class PoolOpGrad : public framework::OperatorWithKernel { ...@@ -36,6 +40,10 @@ class PoolOpGrad : public framework::OperatorWithKernel {
using framework::OperatorWithKernel::OperatorWithKernel; using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext* ctx) const override; void InferShape(framework::InferShapeContext* ctx) const override;
protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override;
}; };
class Pool2dOpMaker : public framework::OpProtoAndCheckerMaker { class Pool2dOpMaker : public framework::OpProtoAndCheckerMaker {
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <algorithm>
#include <ctime>
#include "paddle/framework/op_registry.h"
#include "paddle/framework/variable.h"
namespace paddle {
namespace operators {
#define CLOG std::cout
const std::string kForward = "FORWARD";
const std::string kBackward = "BACKWARD";
const std::string kBoth = "BOTH";
struct Formater {
std::string message;
std::string name;
std::vector<int> dims;
std::type_index dtype{typeid(char)};
framework::LoD lod;
int summarize;
void* data{nullptr};
void operator()(size_t size) {
PrintMessage();
PrintName();
PrintDims();
PrintDtype();
PrintLod();
PrintData(size);
}
private:
void PrintMessage() { CLOG << std::time(nullptr) << "\t" << message; }
void PrintName() {
if (!name.empty()) {
CLOG << "Tensor[" << name << "]" << std::endl;
}
}
void PrintDims() {
if (!dims.empty()) {
CLOG << "\tshape: [";
for (auto i : dims) {
CLOG << i << ",";
}
CLOG << "]" << std::endl;
}
}
void PrintDtype() {
if (dtype.hash_code() != typeid(char).hash_code()) {
CLOG << "\tdtype: " << dtype.name() << std::endl;
}
}
void PrintLod() {
if (!lod.empty()) {
CLOG << "\tLoD: [";
for (auto level : lod) {
CLOG << "[ ";
for (auto i : level) {
CLOG << i << ",";
}
CLOG << " ]";
}
CLOG << "]" << std::endl;
}
}
void PrintData(size_t size) {
PADDLE_ENFORCE_NOT_NULL(data);
// print float
if (dtype.hash_code() == typeid(float).hash_code()) {
Display<float>(size);
}
if (dtype.hash_code() == typeid(double).hash_code()) {
Display<double>(size);
}
if (dtype.hash_code() == typeid(int).hash_code()) {
Display<int>(size);
}
if (dtype.hash_code() == typeid(int64_t).hash_code()) {
Display<int64_t>(size);
}
}
template <typename T>
void Display(size_t size) {
auto* d = (T*)data;
CLOG << "\tdata: ";
if (summarize != -1) {
summarize = std::min(size, (size_t)summarize);
for (int i = 0; i < summarize; i++) {
CLOG << d[i] << ",";
}
} else {
for (size_t i = 0; i < size; i++) {
CLOG << d[i] << ",";
}
}
CLOG << std::endl;
}
};
// TODO(ChunweiYan) there should be some other printers for TensorArray
class TensorPrintOp : public framework::OperatorBase {
public:
TensorPrintOp(const std::string& type,
const framework::VariableNameMap& inputs,
const framework::VariableNameMap& outputs,
const framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {}
TensorPrintOp(const TensorPrintOp& o)
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
PADDLE_THROW("Not implemented.");
}
void Run(const framework::Scope& scope,
const platform::Place& place) const override {
const framework::Variable* in_var_ptr = nullptr;
std::string phase = kForward;
std::string printed_var_name = "";
auto& inputs = Inputs();
if (inputs.find("In") != inputs.end() && !Inputs("In").empty()) {
in_var_ptr = scope.FindVar(Input("In"));
printed_var_name = Inputs("In").front();
} else if (inputs.find("In@GRAD") != inputs.end() &&
!Inputs("In@GRAD").empty()) {
in_var_ptr = scope.FindVar(Input("In@GRAD"));
printed_var_name = Inputs("In@GRAD").front();
phase = kBackward;
} else {
PADDLE_THROW("Unknown phase, should be forward or backward.");
}
PADDLE_ENFORCE_NOT_NULL(in_var_ptr);
auto& in_tensor = in_var_ptr->Get<framework::LoDTensor>();
auto* out_var_ptr = scope.FindVar(Output("Out"));
auto& out_tensor = *out_var_ptr->GetMutable<framework::LoDTensor>();
// Just copy data from input tensor to output tensor
// output tensor share same memory with input tensor
out_tensor.ShareDataWith(in_tensor);
out_tensor.set_lod(in_tensor.lod());
std::string print_phase = Attr<std::string>("print_phase");
if (print_phase != phase && print_phase != kBoth) {
return;
}
int first_n = Attr<int>("first_n");
if (first_n > 0 && ++times_ > first_n) return;
framework::LoDTensor printed_tensor;
printed_tensor.set_lod(in_tensor.lod());
printed_tensor.Resize(in_tensor.dims());
if (platform::is_cpu_place(in_tensor.place())) {
printed_tensor.ShareDataWith(in_tensor);
} else {
// copy data to cpu to print
platform::CPUPlace place;
framework::Copy(in_tensor, place, &printed_tensor);
}
Formater formater;
formater.message = Attr<std::string>("message");
if (Attr<bool>("print_tensor_name")) {
formater.name = printed_var_name;
}
if (Attr<bool>("print_tensor_type")) {
formater.dtype = printed_tensor.type();
}
if (Attr<bool>("print_tensor_shape")) {
auto& dims = printed_tensor.dims();
formater.dims.resize(dims.size());
for (int i = 0; i < dims.size(); ++i) formater.dims[i] = dims[i];
}
if (Attr<bool>("print_tensor_lod")) {
formater.lod = printed_tensor.lod();
}
formater.summarize = Attr<int>("summarize");
formater.data = (void*)printed_tensor.data<void>();
formater(printed_tensor.numel());
}
private:
mutable int times_{0};
};
class PrintOpProtoAndCheckMaker : public framework::OpProtoAndCheckerMaker {
public:
PrintOpProtoAndCheckMaker(OpProto* proto, OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("In", "Input tensor to be displayed.");
AddAttr<int>("first_n", "Only log the first `first_n` times.");
AddAttr<std::string>("message", "A string message to print as a prefix.");
AddAttr<int>("summarize", "Number of elements to print; -1 prints all elements.");
AddAttr<bool>("print_tensor_name", "Whether to print the tensor name.");
AddAttr<bool>("print_tensor_type", "Whether to print the tensor's dtype.");
AddAttr<bool>("print_tensor_shape", "Whether to print the tensor's shape.");
AddAttr<bool>("print_tensor_lod", "Whether to print the tensor's lod.");
AddAttr<std::string>(
"print_phase",
"(string, default 'BOTH') Which phase to display: 'FORWARD', "
"'BACKWARD', or 'BOTH'.")
.SetDefault(kBoth)
.InEnum({kForward, kBackward, kBoth});
AddOutput("Out", "Output tensor with same data as input tensor.");
AddComment(R"DOC(
Creates a print op that prints when a tensor is accessed.
Wraps the tensor passed in so that whenever the tensor is accessed,
the message `message` is printed, along with the current value of the
input tensor `In`.)DOC");
}
};
class InferShapeForward : public framework::InferShapeBase {
public:
void operator()(framework::InferShapeContext* context) const override {
PADDLE_ENFORCE(context->HasInput("In"), "Input(In) should not be null.");
context->ShareLoD("In", /*->*/ "Out");
context->SetOutputDim("Out", context->GetInputDim("In"));
}
};
class InferShapeBackward : public framework::InferShapeBase {
public:
void operator()(framework::InferShapeContext* context) const override {
PADDLE_ENFORCE(context->HasInput("In@GRAD"),
"Input(In@GRAD) should not be null.");
context->ShareLoD("In@GRAD", /*->*/ "Out");
context->SetOutputDim("Out", context->GetInputDim("In@GRAD"));
}
};
class InferVarType : public framework::VarTypeInference {
public:
void operator()(const framework::OpDesc& op_desc,
framework::BlockDesc* block) const override {}
};
class PrintOpProtoAndCheckGradOpMaker
: public framework::SingleGradOpDescMaker {
public:
using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
std::unique_ptr<framework::OpDesc> Apply() const override {
auto* op_desc_ptr = new framework::OpDesc();
op_desc_ptr->SetType("print_grad");
op_desc_ptr->SetInput("In@GRAD", OutputGrad("Out"));
op_desc_ptr->SetOutput("Out", InputGrad("In"));
op_desc_ptr->SetAttrMap(Attrs());
return std::unique_ptr<framework::OpDesc>(op_desc_ptr);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(print, ops::TensorPrintOp, ops::PrintOpProtoAndCheckMaker,
ops::PrintOpProtoAndCheckGradOpMaker, ops::InferShapeForward,
ops::InferVarType);
REGISTER_OPERATOR(print_grad, ops::TensorPrintOp, ops::InferShapeBackward);
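The two gates in `TensorPrintOp::Run` (phase and `first_n`) and the truncation in `Formater::Display` (`summarize`) can be checked in isolation. This sketch reimplements them on plain types; `ShouldPrint` and `FormatData` are hypothetical names, and the `times` pointer stands in for the op's `mutable int times_` counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the checks in TensorPrintOp::Run: print only in the requested
// phase ("FORWARD", "BACKWARD" or "BOTH"), and at most first_n times when
// first_n > 0. The counter only advances once the phase check passes.
bool ShouldPrint(const std::string& phase, const std::string& print_phase,
                 int first_n, int* times) {
  if (print_phase != phase && print_phase != "BOTH") return false;
  if (first_n > 0 && ++(*times) > first_n) return false;
  return true;
}

// Mirrors Formater::Display: summarize == -1 prints every element,
// otherwise only the first min(size, summarize) elements.
template <typename T>
std::string FormatData(const std::vector<T>& data, int summarize) {
  std::size_t n = data.size();
  if (summarize != -1) n = std::min(n, static_cast<std::size_t>(summarize));
  std::ostringstream os;
  for (std::size_t i = 0; i < n; ++i) os << data[i] << ",";
  return os.str();
}
```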
...@@ -26,22 +26,44 @@ class ReorderLoDTensorByRankTableOpProtoMaker ...@@ -26,22 +26,44 @@ class ReorderLoDTensorByRankTableOpProtoMaker
ReorderLoDTensorByRankTableOpProtoMaker(OpProto *proto, ReorderLoDTensorByRankTableOpProtoMaker(OpProto *proto,
OpAttrChecker *op_checker) OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(LoDTensor) the input lod tensor need to be reordered."); AddInput("X",
"(LoDTensor), the input lod tensor to be reordered according to "
"Input(RankTable).");
AddInput("RankTable", AddInput("RankTable",
"(LoDRankTable) the rank table that input need follow"); "(LoDRankTable), the rank table according to which Input(X) is "
AddOutput("Out", "(LoDTensor) reordered lod tensor"); "reordered.");
AddComment(R"DOC(ReorderLoDTensorByRankTable AddOutput("Out", "(LoDTensor), the reordered lod tensor.");
AddComment(R"DOC(ReorderLoDTensorByRankTable operator.
Reorder the input X by the rank of `RankTable`. If `RankTable` is ordered by Input(X) is a batch of sequences. Input(RankTable) stores new orders of the
index [3, 0, 2, 1]. Input X will reorder its sequence, the third sequence of input sequence batch. The reorder_lod_tensor_by_rank operator reorders the
X will be the first sequence of Output. Input(X) according to the information provided by Input(RankTable).
NOTE: The RankTable does not need to be calculated by X.
For example: For example:
The X = [Seq0, Seq1, Seq2, Seq3]. The indices of RankTable are [3, 0, 2, 1].
The Out = [Seq3, Seq0, Seq2, Seq1] with correct LoD information. If the indices stored in the Input(RankTable) are [3, 0, 2, 1], the
Input(X) will be reordered so that the fourth sequence in Input(X) becomes the
first one, followed by the original first, third, and second ones.
That is:
X = [Seq0, Seq1, Seq2, Seq3]. The indices in RankTable are [3, 0, 2, 1].
Out = [Seq3, Seq0, Seq2, Seq1] with new LoD information.
If the LoD information of Input(X) is empty, this means Input(X) is not sequence
data. This is also identical to a batch of sequences where each sequence has a
fixed length 1. In this case, the reorder_lod_tensor_by_rank operator reorders
each slice of Input(X) along the first axis according to Input(RankTable).
That is:
X = [Slice0, Slice1, Slice2, Slice3] and its LoD information is empty. The
indices in RankTable are [3, 0, 2, 1].
Out = [Slice3, Slice0, Slice2, Slice1] with no LoD information appended.
NOTE: This operator sorts Input(X) according to a given LoDRankTable, which does
not need to be calculated from Input(X). The rank table can be calculated from
a different sequence, and this operator then sorts Input(X) according to it.
)DOC"); )DOC");
} }
}; };
......
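The reordering described in the ReorderLoDTensorByRankTable comment can be modelled in plain Python. This is a minimal sketch, not the operator's implementation: it assumes the rank table is given only as a list of indices (the real LoDRankTable carries more than that), and `reorder_by_rank` and `split_by_lod` are hypothetical helper names. It also shows how the new offset-style LoD is rebuilt from the reordered sequence lengths.

```python
from typing import List, Sequence, Tuple


def split_by_lod(flat: Sequence[float], lod: Sequence[int]) -> List[List[float]]:
    """Split a flat tensor into sequences using offset-style (level-0) LoD."""
    return [list(flat[lod[i]:lod[i + 1]]) for i in range(len(lod) - 1)]


def reorder_by_rank(seqs: Sequence[Sequence[float]],
                    rank_indices: Sequence[int]) -> Tuple[List[List[float]], List[int]]:
    """Reorder a batch of sequences and rebuild its offset-style LoD.

    rank_indices is assumed to be a permutation of range(len(seqs)); with
    [3, 0, 2, 1] the fourth input sequence becomes the first output sequence.
    """
    if sorted(rank_indices) != list(range(len(seqs))):
        raise ValueError("rank_indices must be a permutation of the batch")
    out = [list(seqs[i]) for i in rank_indices]
    # New LoD: cumulative offsets of the reordered sequence lengths.
    lod = [0]
    for s in out:
        lod.append(lod[-1] + len(s))
    return out, lod
```

For a non-sequence input (empty LoD), each slice can be treated as a length-1 sequence, e.g. `[[s] for s in slices]`, which matches the "fixed length 1" remark in the operator comment.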
...@@ -45,7 +45,7 @@ class ShrinkRNNMemoryOp : public ArrayOp { ...@@ -45,7 +45,7 @@ class ShrinkRNNMemoryOp : public ArrayOp {
rank_items.begin(); rank_items.begin();
auto *out_var = scope.FindVar(Output("Out")); auto *out_var = scope.FindVar(Output("Out"));
PADDLE_ENFORCE(out_var != nullptr, "Output Out must be set"); PADDLE_ENFORCE(out_var != nullptr, "Output(Out) must be set.");
auto &out_tensor = *out_var->GetMutable<framework::LoDTensor>(); auto &out_tensor = *out_var->GetMutable<framework::LoDTensor>();
size_t height = dst_num_rows; size_t height = dst_num_rows;
...@@ -76,15 +76,17 @@ class ShrinkRNNMemoryOpProtoMaker : public framework::OpProtoAndCheckerMaker { ...@@ -76,15 +76,17 @@ class ShrinkRNNMemoryOpProtoMaker : public framework::OpProtoAndCheckerMaker {
"(LoDTensor) The step index. The RNN step memory 'X' will be " "(LoDTensor) The step index. The RNN step memory 'X' will be "
"shrunk to match the size of the input of the index'th step."); "shrunk to match the size of the input of the index'th step.");
AddOutput("Out", "(LoDTensor) The shrunk RNN step memory."); AddOutput("Out", "(LoDTensor) The shrunk RNN step memory.");
AddComment( AddComment(R"DOC(
R"DOC( This operator shrinks the output batch of the memory defined in dynamic RNN.
In dynamic RNN, we are able to handle sequences of different lengths.
Because of the multiple lengths, the size of each step input can be Dynamic RNN is able to handle variable-length sequences, in which sequences in
different, which may lead to a mismatching between the input of a mini-batch are sorted by their lengths first. After that, the longest sequence
the current step and the memory generated by the previous one. This becomes the first one in the sorted batch, followed by the second longest, the
operator shrinks memory according to the size of the next step input, third longest, and so on. Dynamic RNN then slices a batch input timestep by
to make sure that they can match each other. timestep from the sorted input. Once any sequence in the input batch reaches its
)DOC"); end, memory defined in dynamic RNN has to shrink its outputs to adapt to the input
batch size for the next time step.
)DOC");
} }
}; };
...@@ -136,6 +138,7 @@ class ShrinkRNNMemoryGradOp : public ArrayOp { ...@@ -136,6 +138,7 @@ class ShrinkRNNMemoryGradOp : public ArrayOp {
math::set_constant(dev_ctx, &rest_tensor, 0.0f); math::set_constant(dev_ctx, &rest_tensor, 0.0f);
} }
} }
dx_tensor.set_lod(x_tensor.lod());
} }
}; };
......
...@@ -121,8 +121,8 @@ class WhileGradOp : public framework::OperatorBase { ...@@ -121,8 +121,8 @@ class WhileGradOp : public framework::OperatorBase {
for (size_t i = 0; i < outside_og_names.size(); ++i) { for (size_t i = 0; i < outside_og_names.size(); ++i) {
auto outside_og_name = outside_og_names[i]; auto outside_og_name = outside_og_names[i];
auto inside_og_name = inside_og_names[i]; auto inside_og_name = inside_og_names[i];
VLOG(10) << "Linking outside " << outside_og_name << " --> inside " VLOG(8) << "Linking outside " << outside_og_name << " --> inside "
<< inside_og_name; << inside_og_name;
auto &og_outside = auto &og_outside =
detail::Ref(scope.FindVar(outside_og_name), detail::Ref(scope.FindVar(outside_og_name),
"Cannot find Outside Gradient %s", outside_og_name); "Cannot find Outside Gradient %s", outside_og_name);
...@@ -141,11 +141,11 @@ class WhileGradOp : public framework::OperatorBase { ...@@ -141,11 +141,11 @@ class WhileGradOp : public framework::OperatorBase {
auto &outside_array = og_outside.Get<framework::LoDTensorArray>(); auto &outside_array = og_outside.Get<framework::LoDTensorArray>();
auto &inside_array = auto &inside_array =
detail::Ref(og_inside.GetMutable<framework::LoDTensorArray>()); detail::Ref(og_inside.GetMutable<framework::LoDTensorArray>());
VLOG(10) << outside_og_name << " size = " << outside_array.size(); VLOG(8) << outside_og_name << " size = " << outside_array.size();
inside_array.resize(outside_array.size()); inside_array.resize(outside_array.size());
for (size_t j = 0; j < inside_array.size(); ++j) { for (size_t j = 0; j < inside_array.size(); ++j) {
VLOG(10) << j << " " << outside_array[j].numel(); VLOG(8) << j << " " << outside_array[j].numel();
if (outside_array[j].numel() != 0) { if (outside_array[j].numel() != 0) {
inside_array[j].set_lod(outside_array[j].lod()); inside_array[j].set_lod(outside_array[j].lod());
inside_array[j].ShareDataWith(outside_array[j]); inside_array[j].ShareDataWith(outside_array[j]);
...@@ -187,10 +187,14 @@ class WhileGradOp : public framework::OperatorBase { ...@@ -187,10 +187,14 @@ class WhileGradOp : public framework::OperatorBase {
attrs["shape"] = framework::vectorize2int(inside_tensor.dims()); attrs["shape"] = framework::vectorize2int(inside_tensor.dims());
attrs["value"] = 0.0f; attrs["value"] = 0.0f;
auto var_name = pg_names[param_id];
auto zero_op = framework::OpRegistry::CreateOp( auto zero_op = framework::OpRegistry::CreateOp(
"fill_constant", framework::VariableNameMap{}, "fill_constant", framework::VariableNameMap{},
{{"Out", {pg_names[param_id]}}}, attrs); {{"Out", {var_name}}}, attrs);
zero_op->Run(scope, dev_place); zero_op->Run(scope, dev_place);
scope.FindVar(var_name)
->GetMutable<framework::LoDTensor>()
->set_lod(inside_tensor.lod());
} }
} }
...@@ -231,7 +235,7 @@ class WhileGradOpDescMaker : public framework::SingleGradOpDescMaker { ...@@ -231,7 +235,7 @@ class WhileGradOpDescMaker : public framework::SingleGradOpDescMaker {
auto igs = InputGrad(kX, /*do not drop empty gradient*/ false); auto igs = InputGrad(kX, /*do not drop empty gradient*/ false);
for (auto &each_ig : igs) { for (auto &each_ig : igs) {
if (inner_op_outputs.find(each_ig) == inner_op_outputs.end()) { if (inner_op_outputs.find(each_ig) == inner_op_outputs.end()) {
VLOG(10) << "Ignore " << each_ig; VLOG(8) << "Ignore " << each_ig;
each_ig = framework::kEmptyVarName; each_ig = framework::kEmptyVarName;
} }
} }
......
...@@ -44,7 +44,7 @@ CUDNN_DNN_ROUTINE_EACH_R7(DEFINE_WRAP); ...@@ -44,7 +44,7 @@ CUDNN_DNN_ROUTINE_EACH_R7(DEFINE_WRAP);
#ifdef PADDLE_USE_DSO #ifdef PADDLE_USE_DSO
bool HasCUDNN() { bool HasCUDNN() {
std::call_once(cudnn_dso_flag, GetCudnnDsoHandle, &cudnn_dso_handle); std::call_once(cudnn_dso_flag, GetCUDNNDsoHandle, &cudnn_dso_handle);
return cudnn_dso_handle != nullptr; return cudnn_dso_handle != nullptr;
} }
......
...@@ -36,7 +36,7 @@ extern void EnforceCUDNNLoaded(const char* fn_name); ...@@ -36,7 +36,7 @@ extern void EnforceCUDNNLoaded(const char* fn_name);
auto operator()(Args... args) -> decltype(__name(args...)) { \ auto operator()(Args... args) -> decltype(__name(args...)) { \
using cudnn_func = decltype(__name(args...)) (*)(Args...); \ using cudnn_func = decltype(__name(args...)) (*)(Args...); \
std::call_once(cudnn_dso_flag, \ std::call_once(cudnn_dso_flag, \
paddle::platform::dynload::GetCudnnDsoHandle, \ paddle::platform::dynload::GetCUDNNDsoHandle, \
&cudnn_dso_handle); \ &cudnn_dso_handle); \
EnforceCUDNNLoaded(#__name); \ EnforceCUDNNLoaded(#__name); \
void* p_##__name = dlsym(cudnn_dso_handle, #__name); \ void* p_##__name = dlsym(cudnn_dso_handle, #__name); \
......
...@@ -134,7 +134,7 @@ void GetCublasDsoHandle(void** dso_handle) { ...@@ -134,7 +134,7 @@ void GetCublasDsoHandle(void** dso_handle) {
#endif #endif
} }
void GetCudnnDsoHandle(void** dso_handle) { void GetCUDNNDsoHandle(void** dso_handle) {
#if defined(__APPLE__) || defined(__OSX__) #if defined(__APPLE__) || defined(__OSX__)
GetDsoHandleFromSearchPath(FLAGS_cudnn_dir, "libcudnn.dylib", dso_handle, GetDsoHandleFromSearchPath(FLAGS_cudnn_dir, "libcudnn.dylib", dso_handle,
false); false);
......
...@@ -32,7 +32,7 @@ void GetCublasDsoHandle(void** dso_handle); ...@@ -32,7 +32,7 @@ void GetCublasDsoHandle(void** dso_handle);
* @param **dso_handle dso handler * @param **dso_handle dso handler
* *
*/ */
void GetCudnnDsoHandle(void** dso_handle); void GetCUDNNDsoHandle(void** dso_handle);
/** /**
* @brief load the DSO of CURAND * @brief load the DSO of CURAND
......
...@@ -430,13 +430,8 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -430,13 +430,8 @@ All parameter, weight, gradient are variables in Paddle.
m.def("init_glog", framework::InitGLOG); m.def("init_glog", framework::InitGLOG);
m.def("init_devices", &framework::InitDevices); m.def("init_devices", &framework::InitDevices);
m.def("use_cpu", framework::UseCPU);
m.def("use_mkldnn", framework::UseMKLDNN);
m.def("use_cuda", framework::UseCUDA);
m.def("use_cudnn", framework::UseCUDNN);
m.def("use_all", framework::UseALL);
m.def("is_compile_gpu", IsCompileGPU); m.def("is_compile_gpu", IsCompileGPU);
m.def("set_feed_variable", framework::SetFeedVariable); m.def("set_feed_variable", framework::SetFeedVariable);
m.def("get_fetch_variable", framework::GetFetchVariable); m.def("get_fetch_variable", framework::GetFetchVariable);
......
...@@ -14,7 +14,7 @@ limitations under the License. */ ...@@ -14,7 +14,7 @@ limitations under the License. */
#pragma once #pragma once
#include <string> #include <string>
#include "paddle/framework/tensor.h" #include "paddle/framework/lod_tensor.h"
#include "paddle/memory/memcpy.h" #include "paddle/memory/memcpy.h"
#include "paddle/platform/device_context.h" #include "paddle/platform/device_context.h"
#include "pybind11/numpy.h" #include "pybind11/numpy.h"
...@@ -97,14 +97,27 @@ inline py::buffer_info CastToPyBuffer(framework::Tensor &tensor) { ...@@ -97,14 +97,27 @@ inline py::buffer_info CastToPyBuffer(framework::Tensor &tensor) {
template <typename T> template <typename T>
T TensorGetElement(framework::Tensor &self, size_t offset) { T TensorGetElement(framework::Tensor &self, size_t offset) {
PADDLE_ENFORCE(platform::is_cpu_place(self.place())); if (platform::is_cpu_place(self.place())) {
return self.data<T>()[offset]; return self.data<T>()[offset];
} else {
std::shared_ptr<framework::Tensor> dst(new framework::Tensor);
framework::Copy(self, platform::CPUPlace(), dst.get());
return dst->data<T>()[offset];
}
} }
// TODO(dzhwinter): fix the redundant Tensor allocation and deallocation.
template <typename T> template <typename T>
void TensorSetElement(framework::Tensor &self, size_t offset, T elem) { void TensorSetElement(framework::Tensor &self, size_t offset, T elem) {
PADDLE_ENFORCE(platform::is_cpu_place(self.place())); if (platform::is_gpu_place(self.place())) {
self.data<T>()[offset] = elem; std::shared_ptr<framework::Tensor> dst(new framework::Tensor);
framework::Copy(self, platform::CPUPlace(), dst.get());
dst->data<T>()[offset] = elem;
framework::Copy(*dst.get(), self.place(), &self);
} else if (platform::is_cpu_place(self.place())) {
self.data<T>()[offset] = elem;
}
} }
template <typename T> template <typename T>
......
...@@ -49,7 +49,18 @@ function cpu_config() { ...@@ -49,7 +49,18 @@ function cpu_config() {
if [ "@WITH_MKL@" == "OFF" ]; then if [ "@WITH_MKL@" == "OFF" ]; then
return 0 return 0
fi fi
ht=`lscpu |grep "per core"|awk -F':' '{print $2}'|xargs` platform="`uname -s`"
ht=0
if [ $platform == "Linux" ]; then
ht=`lscpu |grep "per core"|awk -F':' '{print $2}'|xargs`
elif [ $platform == "Darwin" ]; then
if [ `sysctl -n hw.physicalcpu` -eq `sysctl -n hw.logicalcpu` ]; then
# HT is OFF
ht=1
fi
else
return 0
fi
if [ $ht -eq 1 ]; then # HT is OFF if [ $ht -eq 1 ]; then # HT is OFF
if [ -z "$KMP_AFFINITY" ]; then if [ -z "$KMP_AFFINITY" ]; then
export KMP_AFFINITY="granularity=fine,compact,0,0" export KMP_AFFINITY="granularity=fine,compact,0,0"
...@@ -72,7 +83,15 @@ function threads_config() { ...@@ -72,7 +83,15 @@ function threads_config() {
# according to trainer_count and total processors # according to trainer_count and total processors
# only when MKL enabled # only when MKL enabled
# auto set OPENBLAS_NUM_THREADS when do not use MKL # auto set OPENBLAS_NUM_THREADS when do not use MKL
processors=`grep "processor" /proc/cpuinfo|sort -u|wc -l` platform="`uname -s`"
processors=0
if [ $platform == "Linux" ]; then
processors=`grep "processor" /proc/cpuinfo|sort -u|wc -l`
elif [ $platform == "Darwin" ]; then
processors=`sysctl -n hw.logicalcpu`
else
return 0
fi
trainers=`grep -Eo 'trainer_count.[0-9]+' <<< "$@" |grep -Eo '[0-9]+'|xargs` trainers=`grep -Eo 'trainer_count.[0-9]+' <<< "$@" |grep -Eo '[0-9]+'|xargs`
if [ -z $trainers ]; then if [ -z $trainers ]; then
trainers=1 trainers=1
...@@ -148,11 +167,7 @@ else: ...@@ -148,11 +167,7 @@ else:
sys.exit(0) sys.exit(0)
EOF EOF
if [ "`uname -s`" == "Linux" ]; then cpu_config
# only support on linux yet, with mac can use v2
cpu_config
fi
# echo $KMP_AFFINITY $OMP_DYNAMIC # echo $KMP_AFFINITY $OMP_DYNAMIC
case "$1" in case "$1" in
......
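Two pieces of the shell logic above are easy to misread: the `trainer_count.[0-9]+` pattern, where `.` matches any single separator character, and the Darwin hyper-threading test, which treats HT as off when the physical and logical core counts are equal. The Python sketch below mirrors both; the function names are hypothetical, and taking the first `trainer_count` match is an assumption (the shell pipeline would collect every match).

```python
import re


def parse_trainer_count(args: str, default: int = 1) -> int:
    """Mimic `grep -Eo 'trainer_count.[0-9]+' | grep -Eo '[0-9]+'`.

    The `.` accepts any one character, so both `--trainer_count=4` and
    `--trainer_count 4` match. Falls back to `default`, like the script's
    `trainers=1` branch, when the flag is absent.
    """
    match = re.search(r"trainer_count.([0-9]+)", args)
    return int(match.group(1)) if match else default


def hyperthreading_off(physical_cpus: int, logical_cpus: int) -> bool:
    """Darwin branch: HT is considered off when physical == logical cores."""
    return physical_cpus == logical_cpus
```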
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
from paddle.trainer_config_helpers.layers import LayerOutput
from paddle.v2.layer import parse_network
from paddle.proto import TrainerConfig_pb2
__all__ = ["dump_v2_config"]
def dump_v2_config(topology, save_path, binary=False):
""" Dump the network topology to a specified file.
This function is only used to dump networks defined with the PaddlePaddle V2
APIs. It will NOT dump any configuration related to the PaddlePaddle
optimizer.
:param topology: The output layers (more than one layer can be given in a
Python List or Tuple) of the entire network. Using the
specified layers as roots and traversing back to the data
layer(s), all the layers connected to the specified output
layers will be dumped. Layers not connected to the specified
output layers will not be dumped.
:type topology: LayerOutput|List|Tuple
:param save_path: The path to save the dumped network topology.
:type save_path: str
:param binary: Whether to dump the serialized network topology or not.
The default value is False. NOTE that if you call this
function to generate network topology for PaddlePaddle C-API,
a serialized version of network topology is required. When
using PaddlePaddle C-API, this flag MUST be set to True.
:type binary: bool
"""
if isinstance(topology, LayerOutput):
topology = [topology]
elif isinstance(topology, collections.Sequence):
for out_layer in topology:
assert isinstance(out_layer, LayerOutput), (
"The type of each element in the parameter topology "
"should be LayerOutput.")
else:
raise RuntimeError("Error input type for parameter topology.")
model_str = parse_network(topology)
with open(save_path, "wb" if binary else "w") as fout:
if binary:
fout.write(model_str.SerializeToString())
else:
fout.write(str(model_str))
...@@ -30,7 +30,8 @@ def merge_v2_model(net, param_file, output_file): ...@@ -30,7 +30,8 @@ def merge_v2_model(net, param_file, output_file):
which ends with .tar.gz. which ends with .tar.gz.
@param net The output layer of the network for inference. @param net The output layer of the network for inference.
@param param_file Path of the parameters (.tar.gz) which is stored by v2 api. @param param_file Path of the parameters (.tar.gz) which is stored by
v2 api.
@param output_file Path of the merged file which will be generated. @param output_file Path of the merged file which will be generated.
Usage: Usage:
......
...@@ -18,14 +18,29 @@ from param_attr import ParamAttr ...@@ -18,14 +18,29 @@ from param_attr import ParamAttr
from data_feeder import DataFeeder from data_feeder import DataFeeder
from core import LoDTensor, CPUPlace, CUDAPlace from core import LoDTensor, CPUPlace, CUDAPlace
from distribute_transpiler import DistributeTranspiler from distribute_transpiler import DistributeTranspiler
from distribute_transpiler_simple import SimpleDistributeTranspiler
import clip import clip
from memory_optimization_transpiler import memory_optimize from memory_optimization_transpiler import memory_optimize
Tensor = LoDTensor Tensor = LoDTensor
__all__ = framework.__all__ + executor.__all__ + [ __all__ = framework.__all__ + executor.__all__ + [
'io', 'initializer', 'layers', 'nets', 'optimizer', 'backward', 'io',
'regularizer', 'LoDTensor', 'CPUPlace', 'CUDAPlace', 'Tensor', 'ParamAttr' 'initializer',
'DataFeeder', 'clip', 'DistributeTranspiler', 'memory_optimize' 'layers',
'nets',
'optimizer',
'backward',
'regularizer',
'LoDTensor',
'CPUPlace',
'CUDAPlace',
'Tensor',
'ParamAttr',
'DataFeeder',
'clip',
'SimpleDistributeTranspiler',
'DistributeTranspiler',
'memory_optimize',
] ]
...@@ -58,7 +73,7 @@ def __bootstrap__(): ...@@ -58,7 +73,7 @@ def __bootstrap__():
read_env_flags = ['use_pinned_memory', 'check_nan_inf'] read_env_flags = ['use_pinned_memory', 'check_nan_inf']
if core.is_compile_gpu(): if core.is_compile_gpu():
read_env_flags.append('fraction_of_gpu_memory_to_use') read_env_flags += ['fraction_of_gpu_memory_to_use', 'op_sync']
core.init_gflags([sys.argv[0]] + core.init_gflags([sys.argv[0]] +
["--tryfromenv=" + ",".join(read_env_flags)]) ["--tryfromenv=" + ",".join(read_env_flags)])
core.init_glog(sys.argv[0]) core.init_glog(sys.argv[0])
......
...@@ -3,7 +3,10 @@ from . import core ...@@ -3,7 +3,10 @@ from . import core
import collections import collections
import copy import copy
__all__ = ['append_backward', 'calc_gradient'] __all__ = [
'append_backward',
'calc_gradient',
]
def _rename_arg_(op_descs, old_name, new_name, begin_idx=None, end_idx=None): def _rename_arg_(op_descs, old_name, new_name, begin_idx=None, end_idx=None):
......
...@@ -3,7 +3,10 @@ import layers ...@@ -3,7 +3,10 @@ import layers
from . import core from . import core
__all__ = [ __all__ = [
'GradientClipByValue', 'append_gradient_clip_ops', 'error_clip_callback' 'GradientClipByValue',
'ErrorClipByValue',
'append_gradient_clip_ops',
'error_clip_callback',
] ]
...@@ -23,12 +26,12 @@ class ErrorClipByValue(BaseErrorClipAttr): ...@@ -23,12 +26,12 @@ class ErrorClipByValue(BaseErrorClipAttr):
self.min = min self.min = min
def append_clip_op(self, block, grad_name): def append_clip_op(self, block, grad_name):
block.append_op( clip_op_desc = block.desc.append_op()
type="clip", clip_op_desc.set_type("clip")
inputs={"X": grad_name}, clip_op_desc.set_input("X", [grad_name])
outputs={"Out": grad_name}, clip_op_desc.set_output("Out", [grad_name])
attrs={"min": self.min, clip_op_desc.set_attr("min", self.min)
"max": self.max}) clip_op_desc.set_attr("max", self.max)
def error_clip_callback(block, context): def error_clip_callback(block, context):
...@@ -39,6 +42,11 @@ def error_clip_callback(block, context): ...@@ -39,6 +42,11 @@ def error_clip_callback(block, context):
op_desc.output_arg_names()): op_desc.output_arg_names()):
fwd_var = block.var_recursive(grad_to_var[grad_n]) fwd_var = block.var_recursive(grad_to_var[grad_n])
error_clip = getattr(fwd_var, "error_clip", None) error_clip = getattr(fwd_var, "error_clip", None)
if not (error_clip is None or isinstance(error_clip,
BaseErrorClipAttr)):
raise TypeError(
"Variable's error_clip should be an instance of BaseErrorClipAttr or None."
)
if error_clip is not None: if error_clip is not None:
error_clip.append_clip_op(block, grad_n) error_clip.append_clip_op(block, grad_n)
......
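The error-clip changes above do two things: they append a `clip` op that bounds a gradient to `[min, max]`, and they reject any `error_clip` attribute that is neither `None` nor a `BaseErrorClipAttr`. The pure-Python model below reproduces only those semantics on a list of floats. It is not the fluid API: the classes are stand-ins, `apply_error_clip` is hypothetical, and the default `min = -max` when `min` is omitted is an assumption because the constructor is not shown in this hunk.

```python
from typing import List, Optional


class BaseErrorClipAttr(object):
    """Stand-in for fluid's base class, used only for the type check."""

    def clip(self, grad: List[float]) -> List[float]:
        raise NotImplementedError


class ErrorClipByValue(BaseErrorClipAttr):
    """Clip every error (gradient) element into [min, max]."""

    def __init__(self, max_value: float, min_value: Optional[float] = None):
        max_value = float(max_value)
        # Assumed default: a symmetric range [-max, max] when min is omitted.
        min_value = -max_value if min_value is None else float(min_value)
        if min_value > max_value:
            raise ValueError("min_value must not exceed max_value")
        self.min = min_value
        self.max = max_value

    def clip(self, grad: List[float]) -> List[float]:
        return [min(max(g, self.min), self.max) for g in grad]


def apply_error_clip(error_clip, grad: List[float]) -> List[float]:
    # Same validation as error_clip_callback: None or a BaseErrorClipAttr.
    if not (error_clip is None or isinstance(error_clip, BaseErrorClipAttr)):
        raise TypeError("error_clip should be a BaseErrorClipAttr or None.")
    return list(grad) if error_clip is None else error_clip.clip(grad)
```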
""" """
Default scope function. Default scope function.
`Paddle` manages Scope as programming language's scope. It is just a `Paddle` manages Scope as programming language's scope. It is just a
thread-local stack of Scope. The top of the stack is the current scope, the bottom thread-local stack of Scope. The top of the stack is the current scope, the bottom
of that stack is all scopes' parent. of that stack is all scopes' parent.
Invoking `var/find_var` creates/finds a variable in the current scope. Invoking `var/find_var` creates/finds a variable in the current scope.
Invoking `enter_local_scope/leave_local_scope` creates or destroys a local Invoking `enter_local_scope/leave_local_scope` creates or destroys a local
scope. scope.
A `scoped_function` will take a `function` as input. That function will be A `scoped_function` will take a `function` as input. That function will be
invoked in a new local scope. invoked in a new local scope.
""" """
import paddle.v2.fluid.core import paddle.v2.fluid.core
...@@ -19,8 +19,12 @@ import threading ...@@ -19,8 +19,12 @@ import threading
__tl_scope__ = threading.local() __tl_scope__ = threading.local()
__all__ = [ __all__ = [
'get_cur_scope', 'enter_local_scope', 'leave_local_scope', 'var', 'get_cur_scope',
'find_var', 'scoped_function' 'enter_local_scope',
'leave_local_scope',
'var',
'find_var',
'scoped_function',
] ]
...@@ -71,7 +75,7 @@ def find_var(name): ...@@ -71,7 +75,7 @@ def find_var(name):
def scoped_function(func): def scoped_function(func):
""" """
Invoke `func` in a new scope. Invoke `func` in a new scope.
:param func: a callable function that will be run in a new scope. :param func: a callable function that will be run in a new scope.
:type func: callable :type func: callable
""" """
......
import framework
from framework import Program, default_main_program, Parameter, Variable
import optimizer
from layer_helper import LayerHelper
def hash_name_to_server(params_grads, pserver_endpoints):
"""
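:param params_grads: a list of (parameter, gradient) pairs
:param pserver_endpoints: a list of parameter server endpoints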
:return: a map of pserver endpoint ->
params -> [param list]
grads -> [grad list]
"""
def _hash_param(param_name, total):
return hash(param_name) % total
param_grad_map = dict()
for param, grad in params_grads:
if param.trainable is True and grad is not None:
server_id = _hash_param(param.name, len(pserver_endpoints))
server_for_param = pserver_endpoints[server_id]
if not param_grad_map.has_key(server_for_param):
param_grad_map[server_for_param] = {"params": [], "grads": []}
param_grad_map[server_for_param]["params"].append(param)
param_grad_map[server_for_param]["grads"].append(grad)
return param_grad_map
def round_robin(params_grads, pserver_endpoints):
assert (len(params_grads) > len(pserver_endpoints))
param_grad_map = dict()
pserver_idx = 0
for param, grad in params_grads:
if param.trainable is True:
server_for_param = pserver_endpoints[pserver_idx]
if not param_grad_map.has_key(server_for_param):
param_grad_map[server_for_param] = {"params": [], "grads": []}
param_grad_map[server_for_param]["params"].append(param)
param_grad_map[server_for_param]["grads"].append(grad)
pserver_idx += 1
if pserver_idx >= len(pserver_endpoints):
pserver_idx = 0
return param_grad_map
class SimpleDistributeTranspiler:
def transpile(self,
optimize_ops,
params_grads,
program=None,
pservers="127.0.0.1:6174",
trainers=1,
split_method=round_robin):
"""
Transpile the program to distributed data-parallelism programs.
The main_program will be transformed to use a remote parameter server
to do parameter optimization, and the optimization graph will be put
into a parameter server program.
Different methods can be used to split trainable variables across
parameter servers.
Example to run:
exe = fluid.Executor(place)
t = fluid.SimpleDistributeTranspiler()
t.transpile(optimize_ops, params_grads, pservers="127.0.0.1:6174", trainers=1)
pserver_endpoint = os.getenv("PSERVER")
if pserver_endpoint:
pserver_prog = t.get_pserver_program(pserver_endpoint, optimize_ops)
exe.run(fluid.default_startup_program())
exe.run(pserver_prog)
else:
feeder = fluid.DataFeeder(feed_list=[images, label], place=place)
exe.run(fluid.default_startup_program())
for pass_id in range(PASS_NUM):
...
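:param optimize_ops: op list of optimization, should be the
return value of Optimizer.minimize
:type optimize_ops: list
:param params_grads: list of (parameter, gradient) pairs to be
distributed to the parameter servers
:type params_grads: list
:param program: program to optimize, default default_main_program
:param pservers: parameter server endpoints like "m1:6174,m2:6174"
:type pservers: string
:param trainers: total number of trainers in the job
:type trainers: int
:param split_method: function that assigns parameters to parameter
servers, round_robin by default
:return: None. Call get_trainer_program() and get_pserver_program()
to obtain the transpiled programs.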
"""
if program is None:
program = default_main_program()
self.program = program
self.trainers = trainers
self.optimize_ops = optimize_ops
self._optimize_distributed(
optimize_ops,
program,
params_grads,
pservers=pservers,
trainers=trainers,
split_method=split_method)
def _clone_param(self, block, v):
assert isinstance(v, Parameter)
new_p = Parameter(
block=block,
shape=v.shape,
dtype=v.dtype,
type=v.type,
lod_level=v.lod_level,
stop_gradient=v.stop_gradient,
trainable=v.trainable,
optimize_attr=v.optimize_attr,
regularizer=v.regularizer,
name=v.name)
block.vars[new_p.name] = new_p
def _clone_var(self, block, var):
assert isinstance(var, Variable)
return block.create_var(
name=var.name,
shape=var.shape,
dtype=var.dtype,
type=var.type,
lod_level=var.lod_level,
persistable=var.persistable)
def _optimize_distributed(self, optimize_ops, program, params_and_grads,
**kwargs):
if kwargs.has_key("split_method"):
split_method = kwargs["split_method"]
else:
split_method = round_robin
assert (callable(split_method))
pserver_endpoints = kwargs["pservers"].split(",")
self.param_grad_map = split_method(params_and_grads, pserver_endpoints)
send_op_ordered_inputs = []
send_op_ordered_outputs = []
epmap = []
for ep, v in self.param_grad_map.iteritems():
send_op_ordered_inputs.extend(v["grads"])
send_op_ordered_outputs.extend(v["params"])
for i in v["grads"]:
epmap.append(ep)
send_op = program.global_block().append_op(
type="send",
inputs={"X": send_op_ordered_inputs
}, # inputs is a list of tensors to be sent
outputs={"Out": send_op_ordered_outputs},
attrs={"endpoints": pserver_endpoints,
"epmap": epmap})
def get_trainer_program(self):
# remove optimize ops and add a send op to main_program
self.program.global_block().delete_ops(self.optimize_ops)
return self.program
def _create_var_for_trainers(self, block, var, trainers):
var_list = []
for i in xrange(trainers):
var_each = block.create_var(
name="%s.trainer_%d" % (var.name, i),
persistable=var.persistable,
dtype=var.dtype,
shape=var.shape)
var_list.append(var_each)
return var_list
def get_pserver_program(self, endpoint, optimize_ops):
pserver_program = Program()
for v in self.param_grad_map[endpoint]["params"]:
self._clone_param(pserver_program.global_block(), v)
optimize_sub_program = Program()
grad_var_names = [
var.name for var in self.param_grad_map[endpoint]["grads"]
]
for opt_op in optimize_ops:
for _, var in opt_op.inputs.iteritems():
# NOTE: append operators to merge gradients from multiple
# trainers. If trainers == 1, this is not needed.
if self.trainers > 1 and var.name in grad_var_names:
vars2merge = self._create_var_for_trainers(
optimize_sub_program.global_block(), var, self.trainers)
merged_var = optimize_sub_program.global_block().create_var(
name=var.name,
persistable=var.persistable,
dtype=var.dtype,
shape=var.shape)
optimize_sub_program.global_block().append_op(
type="sum",
inputs={"X": vars2merge},
outputs={"Out": merged_var})
optimize_sub_program.global_block().append_op(
type="scale",
inputs={"X": merged_var},
outputs={"Out": merged_var},
attrs={"scale": 1.0 / float(self.trainers)})
else:
optimize_sub_program.global_block().create_var(
name=var.name,
persistable=var.persistable,
dtype=var.dtype,
shape=var.shape)
if opt_op.inputs.has_key("Grad"):
if opt_op.inputs["Grad"].name in grad_var_names:
optimize_sub_program.global_block().append_op(
type=opt_op.type,
inputs=opt_op.inputs,
outputs=opt_op.outputs,
attrs=opt_op.attrs)
else:
optimize_sub_program.global_block().append_op(
type=opt_op.type,
inputs=opt_op.inputs,
outputs=opt_op.outputs,
attrs=opt_op.attrs)
pserver_program.global_block().append_op(
type="recv",
inputs={"RX":
self.param_grad_map[endpoint]["grads"]}, # grads to recv
outputs={},
attrs={
"OptimizeProgram": optimize_sub_program.desc,
"endpoint": endpoint,
"ParamList":
[p.name for p in self.param_grad_map[endpoint]["params"]],
"GradList":
[p.name for p in self.param_grad_map[endpoint]["grads"]],
"Trainers": self.trainers
})
pserver_program.sync_with_cpp()
return pserver_program
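When `trainers > 1`, `get_pserver_program` creates one gradient variable per trainer (`<grad>.trainer_<i>`), then appends a `sum` op followed by a `scale` op with `1.0 / trainers`, which amounts to averaging the trainers' gradients before the optimizer runs. The sketch below reproduces that arithmetic on plain lists. The dict-of-lists input and both function names are hypothetical; only the naming scheme and the sum-then-scale order come from the code above.

```python
from typing import Dict, List


def trainer_grad_names(grad_name: str, trainers: int) -> List[str]:
    # Same naming scheme as _create_var_for_trainers: "<grad>.trainer_<i>".
    return ["%s.trainer_%d" % (grad_name, i) for i in range(trainers)]


def merge_trainer_grads(received: Dict[str, List[float]], grad_name: str,
                        trainers: int) -> List[float]:
    """Emulate the appended `sum` op followed by `scale` with 1 / trainers."""
    names = trainer_grad_names(grad_name, trainers)
    missing = [n for n in names if n not in received]
    if missing:
        raise KeyError("missing gradients from: %s" % ", ".join(missing))
    # Element-wise sum across trainers, then scale to get the mean.
    summed = [sum(vals) for vals in zip(*(received[n] for n in names))]
    scale = 1.0 / float(trainers)
    return [s * scale for s in summed]
```

Averaging rather than summing keeps the effective learning rate independent of the number of trainers.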
def hash_name(varlist, pserver_endpoints):
"""
Hash variable names to several endpoints.
:param varlist: a list of Variables
:param pserver_endpoints: a list of parameter server endpoints
:return: a list of pserver endpoints, one for each variable in varlist
"""
def _hash_block(block_str, total):
return hash(block_str) % total
eplist = []
for var in varlist:
server_id = _hash_block(var.name(), len(pserver_endpoints))
server_for_param = pserver_endpoints[server_id]
eplist.append(server_for_param)
return eplist
def round_robin(varlist, pserver_endpoints):
"""
Distribute variables to several endpoints in round-robin order.
:return: a list of pserver endpoints, one for each variable in varlist
"""
assert (len(varlist) > len(pserver_endpoints))
eplist = []
pserver_idx = 0
for var in varlist:
server_for_param = pserver_endpoints[pserver_idx]
eplist.append(server_for_param)
pserver_idx += 1
if pserver_idx >= len(pserver_endpoints):
pserver_idx = 0
return eplist
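Both split methods return one endpoint per variable. `round_robin` cycles through the endpoints, and `hash_name` uses Python's built-in `hash()`. That is deterministic across processes under Python 2, which this code targets. Under Python 3, `str` hashing is randomized per process by default (see `PYTHONHASHSEED`), so trainers and pservers could disagree on placement. The sketch below swaps in `zlib.crc32` as a stable hash. That substitution is our own choice, not something the source does, and it drops the original `len(varlist) > len(pserver_endpoints)` assertion.

```python
import zlib
from typing import List, Sequence


def round_robin(names: Sequence[str], endpoints: Sequence[str]) -> List[str]:
    # The i-th variable goes to endpoints[i % len(endpoints)].
    return [endpoints[i % len(endpoints)] for i in range(len(names))]


def stable_hash_name(names: Sequence[str], endpoints: Sequence[str]) -> List[str]:
    # crc32 yields the same value in every process, unlike str hash()
    # under Python 3's default hash randomization.
    return [endpoints[zlib.crc32(n.encode("utf-8")) % len(endpoints)]
            for n in names]
```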
...@@ -4,7 +4,10 @@ import layers ...@@ -4,7 +4,10 @@ import layers
from framework import Program, unique_name, Variable, program_guard from framework import Program, unique_name, Variable, program_guard
from layer_helper import LayerHelper from layer_helper import LayerHelper
__all__ = ['Accuracy', 'ChunkEvaluator'] __all__ = [
'Accuracy',
'ChunkEvaluator',
]
def _clone_var_(block, var): def _clone_var_(block, var):
...@@ -21,19 +24,19 @@ def _clone_var_(block, var): ...@@ -21,19 +24,19 @@ def _clone_var_(block, var):
class Evaluator(object): class Evaluator(object):
""" """
Base Class for all evaluators Base Class for all evaluators
Args: Args:
name(str): The name of evaluator, such as "accuracy". Used to generate name(str): The name of evaluator, such as "accuracy". Used to generate
temporary variable names. temporary variable names.
main_program(Program, optional): The evaluator should be added to this main_program(Program, optional): The evaluator should be added to this
main_program. Default default_main_program() main_program. Default default_main_program()
startup_program(Program, optional): The parameter should be added to this startup_program(Program, optional): The parameter should be added to this
startup_program. Default default_startup_program() startup_program. Default default_startup_program()
Attributes: Attributes:
states(list): The list of state variables. states will be reset to zero states(list): The list of state variables. states will be reset to zero
when `reset` is invoked. when `reset` is invoked.
metrics(list): The list of metrics variables. They will be calculated metrics(list): The list of metrics variables. They will be calculated
every mini-batch every mini-batch
""" """
...@@ -66,14 +69,14 @@ class Evaluator(object): ...@@ -66,14 +69,14 @@ class Evaluator(object):
def create_state(self, suffix, dtype, shape): def create_state(self, suffix, dtype, shape):
""" """
Create state variable. Create state variable.
NOTE: It is not a public API. NOTE: It is not a public API.
Args: Args:
suffix(str): the state suffix. suffix(str): the state suffix.
dtype(str|core.DataType): the state data type dtype(str|core.DataType): the state data type
shape(tuple|list): the shape of state shape(tuple|list): the shape of state
Returns: State variable Returns: State variable
...@@ -127,8 +130,8 @@ class Accuracy(Evaluator): ...@@ -127,8 +130,8 @@ class Accuracy(Evaluator):
class ChunkEvaluator(Evaluator): class ChunkEvaluator(Evaluator):
""" """
Accumulate counter numbers output by chunk_eval from mini-batches and Accumulate counter numbers output by chunk_eval from mini-batches and
compute the precision, recall and F1-score using the accumulated counter compute the precision, recall and F1-score using the accumulated counter
numbers. numbers.
""" """
......
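The ChunkEvaluator comment says the counters are accumulated across mini-batches first, and precision, recall and F1 are then computed from the totals. The sketch below shows that arithmetic. It assumes the three counters are the numbers of inferred, labelled and correctly inferred chunks per batch. The function names are hypothetical, and the zero-division guards are our own choice.

```python
from typing import Iterable, Tuple


def accumulate(batches: Iterable[Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Sum (num_infer, num_label, num_correct) counters over mini-batches."""
    infer = label = correct = 0
    for n_infer, n_label, n_correct in batches:
        infer += n_infer
        label += n_label
        correct += n_correct
    return infer, label, correct


def chunk_metrics(num_infer: int, num_label: int,
                  num_correct: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from accumulated chunk counters."""
    precision = float(num_correct) / num_infer if num_infer else 0.0
    recall = float(num_correct) / num_label if num_label else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Computing F1 from summed counters is not the same as averaging per-batch F1 scores, which is why the evaluator keeps the raw counters as its states.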
...@@ -7,9 +7,15 @@ import proto.framework_pb2 as framework_pb2 ...@@ -7,9 +7,15 @@ import proto.framework_pb2 as framework_pb2
from . import core from . import core
__all__ = [ __all__ = [
'Block', 'Variable', 'Program', 'Operator', 'default_startup_program', 'Block',
'default_main_program', 'program_guard', 'switch_startup_program', 'Variable',
'switch_main_program' 'Program',
'Operator',
'default_startup_program',
'default_main_program',
'program_guard',
'switch_startup_program',
'switch_main_program',
]
EMPTY_VAR_NAME = core.kEmptyVarName()
@@ -274,6 +280,9 @@ class Variable(object):
uid = core.unique_integer(prefix) # unique during whole process.
return "_".join([prefix, str(uid)])
def set_error_clip(self, error_clip):
self.error_clip = error_clip
def get_all_op_protos():
"""
......
import framework
import numpy as np
__all__ = ['Constant', 'Uniform', 'Normal', 'Xavier'] __all__ = [
'Constant',
'Uniform',
'Normal',
'Xavier',
]
class Initializer(object):
......
@@ -4,13 +4,29 @@ import cPickle as pickle
from paddle.v2.fluid.framework import Program, Parameter, default_main_program, Variable
__all__ = [
'save_vars', 'save_params', 'save_persistables', 'load_vars', 'load_params', 'save_vars',
'load_persistables', "save_inference_model", "load_inference_model", 'save_params',
"get_inference_program" 'save_persistables',
'load_vars',
'load_params',
'load_persistables',
'save_inference_model',
'load_inference_model',
'get_inference_program',
]
def is_parameter(var):
"""Check whether the variable is a Parameter.
This function checks whether the input variable is a Parameter.
Args:
var(Variable): The input variable.
Returns:
bool: True if the variable is a Parameter, False otherwise.
"""
return isinstance(var, Parameter)
......
@@ -12,7 +12,7 @@ __all__ = [
'array_to_lod_tensor', 'increment', 'array_write', 'create_array',
'less_than', 'array_read', 'shrink_memory', 'array_length', 'IfElse',
'DynamicRNN', 'ConditionalBlock', 'StaticRNN', 'reorder_lod_tensor_by_rank',
'ParallelDo' 'ParallelDo', 'Print'
]
@@ -110,6 +110,67 @@ def merge_lod_tensor(in_true, in_false, x, mask, level=0):
return out
def Print(input,
first_n=-1,
message=None,
summarize=-1,
print_tensor_name=True,
print_tensor_type=True,
print_tensor_shape=True,
print_tensor_lod=True,
print_phase='both'):
'''
**Print operator**
This creates a print op that will print when a tensor is accessed.
Wraps the tensor passed in so that whenever the tensor is accessed,
the message `message` is printed, along with the current value of the
tensor `input`.
Args:
input (Variable): A Tensor to print.
summarize (int): Number of elements of the tensor to print. All
elements are printed if it is negative.
message (str): A string message to print as a prefix.
first_n (int): Only print the first `first_n` times.
print_tensor_name (bool): Print the tensor name.
print_tensor_type (bool): Print the tensor type.
print_tensor_shape (bool): Print the tensor shape.
print_tensor_lod (bool): Print the tensor lod.
print_phase (str): Which phase to print, one of 'forward',
'backward' and 'both'. If set to 'backward' or 'both', the
gradients of the input tensor are printed.
Returns:
Variable: Output tensor, holding the same data as the input tensor.
Examples:
.. code-block:: python
value = some_layer(...)
Print(value, summarize=10,
message="The content of some_layer: ")
'''
helper = LayerHelper('print', **locals())
out = helper.create_tmp_variable(dtype=helper.input_dtype())
helper.append_op(
type='print',
inputs={'In': input},
attrs={
'first_n': first_n,
'summarize': summarize,
'message': message or "",
'print_tensor_name': print_tensor_name,
'print_tensor_type': print_tensor_type,
'print_tensor_shape': print_tensor_shape,
'print_tensor_lod': print_tensor_lod,
'print_phase': print_phase.upper()
},
outputs={'Out': out})
return out
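The first_n and summarize arguments of Print can be modelled with a small pass-through wrapper. PrintSketch is a hypothetical helper, not the print operator: it collects lines in a list instead of writing to stdout, and treating a negative first_n as "no limit" is an assumption based on its default of -1.

```python
class PrintSketch:
    """Hypothetical model of the documented Print semantics."""

    def __init__(self, first_n=-1, message="", summarize=-1):
        self.first_n = first_n
        self.message = message
        self.summarize = summarize
        self.times_printed = 0
        self.log = []  # collected output lines instead of stdout

    def __call__(self, name, values):
        # Assumption: a negative first_n means printing is never capped.
        if 0 <= self.first_n <= self.times_printed:
            return values
        # summarize < 0 prints everything, otherwise only the first elements.
        shown = values if self.summarize < 0 else values[:self.summarize]
        self.log.append("%s%s: %s" % (self.message, name, shown))
        self.times_printed += 1
        return values  # pass the data through unchanged, like the op output
```

Because the wrapper returns its input untouched, it can be inserted anywhere in a computation without changing results, which is the same design the Print layer follows by returning an output tensor with the input's data.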
class BlockGuard(object):
"""
BlockGuard class.
@@ -687,11 +748,10 @@ def topk(input, k):
def lod_tensor_to_array(x, table):
"""This function performs the operation that converts an LOD_Tensor to """Convert a LOD_TENSOR to a LOD_TENSOR_ARRAY.
an array.
Args:
x (Variable|list): The tensor that needs to be converted to an array. x (Variable|list): The LOD tensor to be converted to a LOD tensor array.
table (ParamAttr|list): The variable that stores the level of lod
which is ordered by sequence length in
descending order.
@@ -721,11 +781,10 @@ def lod_tensor_to_array(x, table):
def array_to_lod_tensor(x, table):
"""This function performs the operations that converts an array to """Convert a LOD_TENSOR_ARRAY to a LOD_TENSOR.
an LOD_Tensor.
Args:
x (Variable|list): The array that needs to be converted to a tensor. x (Variable|list): The LOD tensor array to be converted to a tensor.
table (ParamAttr|list): The variable that stores the level of lod
which is ordered by sequence length in
descending order.
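The conversion between a LOD tensor and a LOD tensor array, driven by a rank table ordered by sequence length in descending order, can be illustrated with plain lists. This sketch assumes a batch is a list of variable-length sequences and that the rank table is a list of (original index, length) pairs; make_rank_table and both helpers are hypothetical and only mirror the idea, not the C++ operators.

```python
def make_rank_table(seqs):
    """(original index, length) pairs, longest first; ties keep input order."""
    return sorted(((i, len(s)) for i, s in enumerate(seqs)),
                  key=lambda item: item[1], reverse=True)


def lod_tensor_to_array(seqs, table):
    """Time-major split: array[t] holds step t of every sequence longer than t."""
    max_len = table[0][1] if table else 0
    return [[seqs[i][t] for i, length in table if length > t]
            for t in range(max_len)]


def array_to_lod_tensor(array, table):
    """Inverse conversion: rebuild each sequence in its original position."""
    seqs = [None] * len(table)
    for rank, (i, length) in enumerate(table):
        # Sequences alive at step t form a prefix of the table, so the
        # sequence with this rank always sits at column `rank` of array[t].
        seqs[i] = [array[t][rank] for t in range(length)]
    return seqs
```

Sorting by length first is what makes each time step a contiguous prefix of the batch, so shorter sequences simply drop out of later steps instead of being padded.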
@@ -753,7 +812,8 @@ def array_to_lod_tensor(x, table):
def increment(x, value=1.0, in_place=True):
"""This function performs an operation that increments each value in the """
This function performs an operation that increments each value in the
input :math:`x` by the amount :math:`value`, as given by the input
parameter. This operation is performed in-place by default.
@@ -786,17 +846,24 @@ def increment(x, value=1.0, in_place=True):
def array_write(x, i, array=None):
"""This function performs the operation to write the data out as an """
LOD_TENSOR_ARRAY. This function writes the given input variable to the specified position
indicated by the array index in an output LOD_TENSOR_ARRAY. If the
output LOD_TENSOR_ARRAY is not given (None), a new one will be created and
returned.
Args:
x (Variable|list): The input tensor to be written into the array.
i (Variable|list): The subscript index in tensor array, that points the i (Variable|list): The index of the output LOD_TENSOR_ARRAY, pointing to
place from which data will be read. the position to which the input tensor will be
array (Variable|list): The data can be read into this variable if written.
this is assigned. array (Variable|list): The output LOD_TENSOR_ARRAY to which the input
tensor will be written. If this parameter is
None, a new LOD_TENSOR_ARRAY will be created and
returned.
Returns:
Variable: The tensor type variable that has the data written to it. Variable: The output LOD_TENSOR_ARRAY where the input tensor is written.
Examples:
.. code-block:: python
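The array_write semantics described in the docstring above (write at index i, create the array when it is None) can be modelled with a Python list. This is a hypothetical sketch; padding missing slots with None when i is past the end is an assumption of the sketch, not documented behaviour of the operator.

```python
def array_write(x, i, array=None):
    """Write x at position i, creating or growing the array as needed."""
    if array is None:
        array = []  # a new array is created when none is given
    if i < 0:
        raise IndexError("index must be non-negative")
    # Assumption: grow the array so that position i exists.
    while len(array) <= i:
        array.append(None)
    array[i] = x
    return array  # the same object is returned when one was passed in
```

Returning the array even when it is passed in lets the first call create it and later calls reuse it, which matches the documented "created and returned" behaviour.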
@@ -1159,7 +1226,8 @@ class DynamicRNN(object):
self.lod_rank_table = None
self.max_seq_len = None
self.step_idx = None
self.zero_idx = fill_constant(shape=[1], value=0, dtype='int64') self.zero_idx = fill_constant(
shape=[1], value=0, dtype='int64', force_cpu=True)
self.mem_dict = dict()
self.output_array = []
self.outputs = []
@@ -1173,7 +1241,7 @@ class DynamicRNN(object):
self._assert_in_rnn_block_("step_input")
if not isinstance(x, Variable):
raise TypeError(
"step_input() can only take a Variable as its input") "step_input() can only take a Variable as its input.")
parent_block = self._parent_block_()
if self.lod_rank_table is None:
self.lod_rank_table = parent_block.create_var(
@@ -1234,7 +1302,8 @@ class DynamicRNN(object):
def block(self):
if self.status != DynamicRNN.BEFORE_RNN:
raise ValueError("rnn.block() can only be invoked once")
self.step_idx = fill_constant(shape=[1], dtype='int64', value=0) self.step_idx = fill_constant(
shape=[1], dtype='int64', value=0, force_cpu=True)
self.step_idx.stop_gradient = False
self.status = DynamicRNN.IN_RNN
with self.while_op.block():
@@ -1254,8 +1323,8 @@ class DynamicRNN(object):
def __call__(self, *args, **kwargs):
if self.status != DynamicRNN.AFTER_RNN:
raise ValueError( raise ValueError(("Output of the dynamic RNN can only be accessed "
"Dynamic RNN outputs can only be retrieved after rnn block") "outside the rnn block."))
if len(self.outputs) == 1:
return self.outputs[0]
else:
......
@@ -9,12 +9,33 @@ from ..param_attr import ParamAttr
from tensor import concat
__all__ = [
'fc', 'embedding', 'dynamic_lstm', 'gru_unit', 'linear_chain_crf', 'fc',
'crf_decoding', 'cos_sim', 'cross_entropy', 'square_error_cost', 'accuracy', 'embedding',
'chunk_eval', 'sequence_conv', 'conv2d', 'sequence_pool', 'pool2d', 'dynamic_lstm',
'batch_norm', 'beam_search_decode', 'conv2d_transpose', 'sequence_expand', 'gru_unit',
'lstm_unit', 'reduce_sum', 'reduce_mean', 'reduce_max', 'reduce_min', 'linear_chain_crf',
'sequence_first_step', 'sequence_last_step', 'dropout' 'crf_decoding',
'cos_sim',
'cross_entropy',
'square_error_cost',
'accuracy',
'chunk_eval',
'sequence_conv',
'conv2d',
'sequence_pool',
'pool2d',
'batch_norm',
'beam_search_decode',
'conv2d_transpose',
'sequence_expand',
'lstm_unit',
'reduce_sum',
'reduce_mean',
'reduce_max',
'reduce_min',
'sequence_first_step',
'sequence_last_step',
'dropout',
]
@@ -248,13 +269,13 @@ def gru_unit(input,
h_t & = dot((1-u_t), m_t) + dot(u_t, h_{t-1})
The inputs of gru unit include :math:`z_t` and :math:`h_{t-1}`. In terms
of the equation above, :math:`z_t` is split into 3 parts -
:math:`xu_t`, :math:`xr_t` and :math:`xm_t`. This means that in order to
implement a full GRU unit operator for an input, a fully
connected layer has to be applied, such that :math:`z_t = W_{fc}x_t`.
The terms :math:`u_t` and :math:`r_t` represent the update and reset gates
of the GRU cell. Unlike LSTM, GRU has one fewer gate. However, there is
an intermediate candidate hidden output, which is denoted by :math:`m_t`.
This layer has three outputs :math:`h_t`, :math:`dot(r_t, h_{t-1})`
and the concatenation of :math:`u_t`, :math:`r_t` and :math:`m_t`.
@@ -276,7 +297,7 @@ def gru_unit(input,
.. code-block:: python
# assuming we have x_t_data and prev_hidden of size=10
x_t = fluid.layers.fc(input=x_t_data, size=30)
hidden_val, r_h_val, gate_val = fluid.layers.gru_unit(input=x_t,
hidden=prev_hidden,
size=30)
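One GRU unit step can be written out in plain Python to show how :math:`z_t` is split into :math:`xu_t`, :math:`xr_t` and :math:`xm_t` and where the three outputs come from. Only the :math:`h_t` equation appears in this excerpt; the gate equations used below (sigmoid gates, tanh candidate, no bias terms) are the standard GRU formulation and are assumptions here, as are all helper names.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w, v):
    """Multiply a D x D matrix (list of rows) by a length-D vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def gru_unit_step(z_t, h_prev, w_u, w_r, w_c):
    """One GRU step; z_t = W_fc x_t already holds [xu_t, xr_t, xm_t]."""
    d = len(h_prev)
    xu, xr, xm = z_t[:d], z_t[d:2 * d], z_t[2 * d:]
    u = [sigmoid(a + b) for a, b in zip(xu, matvec(w_u, h_prev))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(xr, matvec(w_r, h_prev))]  # reset gate
    r_h = [ri * hi for ri, hi in zip(r, h_prev)]                   # dot(r_t, h_{t-1})
    m = [math.tanh(a + b) for a, b in zip(xm, matvec(w_c, r_h))]   # candidate m_t
    # h_t = dot((1 - u_t), m_t) + dot(u_t, h_{t-1}), element-wise.
    h = [(1 - ui) * mi + ui * hi for ui, mi, hi in zip(u, m, h_prev)]
    return h, r_h, u + r + m  # the three outputs listed in the docstring
```

This is also why the fc layer in the example has size 30 for a hidden size of 10: :math:`z_t` must carry three hidden-sized slices.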
@@ -754,7 +775,7 @@ def conv2d(input,
pre_bias = helper.create_tmp_variable(dtype)
helper.append_op(
type='conv2d_cudnn', type='conv2d',
inputs={
'Input': input,
'Filter': filter_param,
@@ -983,7 +1004,7 @@ def batch_norm(input,
default_initializer=Constant(1.0))
bias = helper.create_parameter(
attr=helper.param_attr, shape=param_shape, dtype=dtype, is_bias=True) attr=helper.bias_attr, shape=param_shape, dtype=dtype, is_bias=True)
mean = helper.create_global_variable(
dtype=input.dtype,
......
from ..registry import register_layer
__activations__ = [
'abs', 'tanh', 'sigmoid', 'relu', 'sqrt', 'ceil', 'floor', 'log', 'round' 'sigmoid',
'logsigmoid',
'exp',
'relu',
'tanh',
'tanh_shrink',
'softshrink',
'sqrt',
'abs',
'ceil',
'floor',
'round',
'reciprocal',
'log',
'square',
'softplus',
'softsign',
'brelu',
'leaky_relu',
'soft_relu',
'elu',
'relu6',
'pow',
'stanh',
'hard_shrink',
'thresholded_relu',
'hard_sigmoid',
'swish',
]
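For reference, a few of the activations newly listed in `__activations__` can be written as scalar functions. These follow common textbook definitions; the default hyper-parameters (threshold, alpha, lambda_, slope, offset, beta) are assumptions for illustration and may differ from the attributes of Paddle's operators.

```python
import math


def relu6(x, threshold=6.0):
    return min(max(0.0, x), threshold)


def leaky_relu(x, alpha=0.02):
    return x if x > 0 else alpha * x


def softshrink(x, lambda_=0.5):
    # Shrinks values toward zero and zeroes the band [-lambda_, lambda_].
    if x > lambda_:
        return x - lambda_
    if x < -lambda_:
        return x + lambda_
    return 0.0


def tanh_shrink(x):
    return x - math.tanh(x)


def softsign(x):
    return x / (1.0 + abs(x))


def hard_sigmoid(x, slope=0.2, offset=0.5):
    # Piecewise-linear approximation of sigmoid, clipped to [0, 1].
    return max(0.0, min(1.0, slope * x + offset))


def swish(x, beta=1.0):
    # x * sigmoid(beta * x)
    return x / (1.0 + math.exp(-beta * x))
```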
__all__ = [
......
from ..layer_helper import LayerHelper
from ..param_attr import ParamAttr
from ..framework import convert_np_dtype_to_dtype_
from ..framework import Variable
from ..core import DataType
import numpy
__all__ = [
'create_tensor', 'create_parameter', 'cast', 'concat', 'sums', 'assign', 'create_tensor',
'fill_constant_batch_size_like', 'fill_constant', 'ones', 'zeros' 'create_parameter',
'cast',
'concat',
'sums',
'assign',
'fill_constant_batch_size_like',
'fill_constant',
'ones',
'zeros',
]
@@ -121,7 +133,7 @@ def assign(input, output):
This function copies the *input* Variable to the *output* Variable.
Args:
input(Variable): The source variable input(Variable|numpy.ndarray): The source variable
output(Variable): The destination variable
Returns:
@@ -134,37 +146,64 @@ def assign(input, output):
fluid.layers.assign(hidden, out)
"""
helper = LayerHelper('assign', **locals())
helper.append_op( if isinstance(input, Variable):
type='scale', helper.append_op(
inputs={'X': [input]}, type='scale',
outputs={'Out': [output]}, inputs={'X': [input]},
attrs={'scale': 1.0}) outputs={'Out': [output]},
attrs={'scale': 1.0})
elif isinstance(input, numpy.ndarray):
dtype = convert_np_dtype_to_dtype_(input.dtype)
if dtype == DataType.FP32:
value_name = "fp32_values"
values = [float(v) for v in input.flat]
elif dtype == DataType.INT32:
value_name = "int32_values"
values = [int(v) for v in input.flat]
else:
raise ValueError("Unsupported dtype %s" % input.dtype)
if input.size > 1024 * 1024:
raise ValueError("The size of input is too big. Please consider "
"saving it to a file and using 'load_op' to load it")
helper.append_op(
type='assign_value',
outputs={'Out': [output]},
attrs={
'dtype': dtype,
'shape': list(input.shape),
value_name: values
})
else:
raise ValueError("Wrong type for assign input: %s" % type(input))
return output
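The numpy.ndarray branch of assign flattens the array and stores it in a dtype-specific attribute (fp32_values or int32_values) together with the shape. The sketch below reproduces that bookkeeping for nested Python lists so it runs without numpy; FP32 and INT32 are string stand-ins for DataType.FP32 and DataType.INT32, and assign_value_attrs is a hypothetical helper, not a Paddle function.

```python
FP32, INT32 = "fp32", "int32"  # stand-ins for DataType.FP32 / DataType.INT32
MAX_ELEMENTS = 1024 * 1024     # same size limit as the assign branch


def _shape(value):
    """Shape of a rectangular nested list, outermost dimension first."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def _flatten(value):
    """Yield scalars in row-major order, like ndarray.flat."""
    if isinstance(value, list):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def assign_value_attrs(value, dtype):
    """Build the attribute dict the assign_value branch would attach."""
    if dtype == FP32:
        value_name, cast = "fp32_values", float
    elif dtype == INT32:
        value_name, cast = "int32_values", int
    else:
        raise ValueError("Unsupported dtype %s" % dtype)
    values = [cast(v) for v in _flatten(value)]
    if len(values) > MAX_ELEMENTS:
        raise ValueError("The size of input is too big.")
    return {"dtype": dtype, "shape": _shape(value), value_name: values}
```

Embedding values as op attributes keeps small constants inside the program description, which is why large arrays are rejected in favour of loading from a file.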
def fill_constant(shape, dtype, value, out=None): def fill_constant(shape, dtype, value, force_cpu=False, out=None):
"""
**fill_constant**
This function creates a tensor of specified *shape* and This function creates a tensor with specified `shape` and `dtype`, and
*dtype*, and initializes this with a constant supplied in *value*. initializes it with a constant specified by `value`.
It also sets *stop_gradient* to True. The attribute `stop_gradient` of the created tensor is set to True.
Args:
shape(tuple|list|None): Shape of output tensor shape(tuple|list|None): Shape of the output tensor.
dtype(np.dtype|core.DataType|str): Data type of output tensor dtype(np.dtype|core.DataType|str): Data type of the output tensor.
value(float): Constant value to initialize the output tensor value(float): The constant value used to initialize the output tensor.
force_cpu(bool): If True, the output tensor is placed in CPU memory.
out(Variable): Output Variable to initialize out(Variable): The output tensor.
Returns:
Variable: The tensor variable storing the output Variable: The tensor variable storing the output.
Examples:
.. code-block:: python
data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
"""
helper = LayerHelper("fill_constant", **locals())
if out is None:
out = helper.create_tmp_variable(dtype=dtype)
@@ -172,9 +211,12 @@ def fill_constant(shape, dtype, value, out=None):
type='fill_constant',
inputs={},
outputs={'Out': [out]},
attrs={'shape': shape, attrs={
'dtype': out.dtype, 'shape': shape,
'value': float(value)}) 'dtype': out.dtype,
'value': float(value),
'force_cpu': force_cpu
})
out.stop_gradient = True
return out
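The result of fill_constant can be pictured as a dense array of the requested shape with every element equal to `value`, flagged with stop_gradient. fill_constant_sketch is a hypothetical pure-Python illustration; its dtype-to-cast mapping is an assumption, and force_cpu has no counterpart here because there is no device placement.

```python
def fill_constant_sketch(shape, dtype, value):
    """Nested list of `shape` filled with `value`, plus a stop_gradient flag."""
    cast = {"float32": float, "int32": int, "int64": int}[dtype]

    def build(dims):
        # An empty shape yields a scalar; otherwise recurse one dimension down.
        if not dims:
            return cast(value)
        return [build(dims[1:]) for _ in range(dims[0])]

    return {"data": build(list(shape)), "stop_gradient": True}
```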
......
@@ -121,8 +121,10 @@ class ControlFlowGraph(object):
# and dtype_to_size[cache_dtype]
if x_dtype == cache_dtype:
print(
"Hit Cache !!!! cache pool index is %d, var name is %s, cached var name is %s, var shape is %s " ("Hit Cache !!!! cache pool index "
% "is %d, var name is %s, "
"cached var name is %s, "
"var shape is %s ") %
(index, x, cache_var, str(cache_shape)))
self.pool.pop(index)
_rename_arg_(
......
import layers
__all__ = ["simple_img_conv_pool", "sequence_conv_pool"] __all__ = [
"simple_img_conv_pool",
"sequence_conv_pool",
]
def simple_img_conv_pool(input,
......
@@ -8,7 +8,11 @@ import proto.framework_pb2 as framework_pb2
from framework import OpProtoHolder, Variable, Program, Operator
from paddle.v2.fluid.layer_helper import LayerHelper, unique_name
__all__ = ['deprecated', 'register_layer', 'autodoc'] __all__ = [
'deprecated',
'register_layer',
'autodoc',
]
def _convert_(name):
@@ -80,11 +84,10 @@ def _generate_doc_string_(op_proto):
def register_layer(op_type):
""" """Register the Python layer for an Operator.
Register an Python layer for an Operator
Args:
op_type: The name of the operator to be created op_type: The name of the operator to be created.
This function takes in the operator type (sigmoid, mean, average, etc.) and
creates the operator functionality.
@@ -98,16 +101,16 @@ def register_layer(op_type):
if len(not_intermediate_outputs) != 1:
raise ValueError("Only one non intermediate output operator can be "
"automatically generated.")
if not_intermediate_outputs[0].duplicable:
raise ValueError(
"Only non duplicable op can be automatically generated.")
for output in intermediate_outputs:
if output.duplicable:
raise ValueError("The op can be automatically generated only when "
"none of the intermediate outputs is duplicable.")
o_name = not_intermediate_outputs[0].name
intermediate_output_names = [output.name for output in intermediate_outputs]
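The checks register_layer performs before auto-generating a layer can be isolated into a small function. OutputProto is a hypothetical namedtuple standing in for an operator proto's output entries, and check_auto_generatable is likewise illustrative. Adjacent string literals are joined into a single message here; passing two comma-separated strings to ValueError would instead store them as two separate arguments.

```python
from collections import namedtuple

# Hypothetical stand-in for an OpProto output entry.
OutputProto = namedtuple("OutputProto", ["name", "intermediate", "duplicable"])


def check_auto_generatable(outputs):
    """Return the name of the single non-intermediate output, or raise."""
    main = [o for o in outputs if not o.intermediate]
    intermediate = [o for o in outputs if o.intermediate]
    if len(main) != 1:
        raise ValueError("Only one non intermediate output operator can be "
                         "automatically generated.")
    if main[0].duplicable:
        raise ValueError("Only non duplicable op can be automatically generated.")
    if any(o.duplicable for o in intermediate):
        raise ValueError("The op can be automatically generated only when "
                         "none of the intermediate outputs is duplicable.")
    return main[0].name
```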
......
import framework
__all__ = ['append_regularization_ops', 'L1Decay', 'L2Decay'] __all__ = [
'append_regularization_ops',
'L1Decay',
'L2Decay',
]
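L1Decay and L2Decay add a regularization term to each parameter's gradient before the optimizer update. A minimal sketch of that idea on plain lists follows: the L2 term is coeff * w and the L1 term is coeff * sign(w). The helper names are hypothetical and the sketch does not build any Paddle ops.

```python
def l2_decay_grad(param, coeff):
    """Gradient of 0.5 * coeff * ||w||^2, i.e. coeff * w."""
    return [coeff * p for p in param]


def l1_decay_grad(param, coeff):
    """Subgradient of coeff * ||w||_1, i.e. coeff * sign(w)."""
    return [coeff * ((p > 0) - (p < 0)) for p in param]


def append_regularization(params_and_grads, grad_term):
    """Return (param, grad + regularization term) pairs."""
    return [(p, [g + r for g, r in zip(grad, grad_term(p))])
            for p, grad in params_and_grads]
```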
def append_regularization_ops(parameters_and_grads, regularization=None):
......
@@ -5,3 +5,4 @@ foreach(src ${TEST_OPS})
endforeach()
add_subdirectory(book)
add_subdirectory(book_distribute)
file(GLOB TEST_OPS RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}" "test_*.py")
string(REPLACE ".py" "" TEST_OPS "${TEST_OPS}")
foreach(src ${TEST_OPS})
py_test(${src} SRCS ${src}.py)
endforeach()
import numpy as np
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
import os
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(x=cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
optimize_ops, params_grads = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 20
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500),
batch_size=BATCH_SIZE)
place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe = fluid.Executor(place)
t = fluid.DistributeTranspiler()
# list of all parameter server endpoints, used for splitting parameters
pserver_endpoints = os.getenv("PSERVERS")
# server endpoint for current node
current_endpoint = os.getenv("SERVER_ENDPOINT")
# run as trainer or parameter server
training_role = os.getenv("TRAINING_ROLE",
"TRAINER") # get the training role: trainer/pserver
t.transpile(optimize_ops, params_grads, pservers=pserver_endpoints, trainers=2)
if training_role == "PSERVER":
if not current_endpoint:
print("need env SERVER_ENDPOINT")
exit(1)
pserver_prog = t.get_pserver_program(current_endpoint, optimize_ops)
exe.run(fluid.default_startup_program())
exe.run(pserver_prog)
else:
trainer_prog = t.get_trainer_program()
exe.run(fluid.default_startup_program())
PASS_NUM = 100
for pass_id in range(PASS_NUM):
fluid.io.save_persistables(exe, "./fit_a_line.model/")
fluid.io.load_persistables(exe, "./fit_a_line.model/")
for data in train_reader():
avg_loss_value, = exe.run(trainer_prog,
feed=feeder.feed(data),
fetch_list=[avg_cost])
if avg_loss_value[0] < 10.0:
exit(0)
exit(1)
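The script above chooses whether to run as a trainer or a parameter server from environment variables. The same selection can be factored into a testable function; resolve_role is hypothetical, and splitting PSERVERS on commas assumes it holds a comma-separated list of ip:port endpoints (the script itself passes the raw string to transpile).

```python
import os


def resolve_role(env):
    """Mirror the script's role selection from an environment mapping."""
    role = env.get("TRAINING_ROLE", "TRAINER")
    raw = env.get("PSERVERS") or ""
    # Assumption: PSERVERS looks like "ip1:port1,ip2:port2".
    pservers = [e.strip() for e in raw.split(",") if e.strip()]
    endpoint = env.get("SERVER_ENDPOINT")
    if role == "PSERVER" and not endpoint:
        raise RuntimeError("need env SERVER_ENDPOINT")
    return role, pservers, endpoint


if __name__ == "__main__":
    print(resolve_role(os.environ))
```

Passing the mapping in instead of reading os.environ directly lets both roles be exercised in a single process.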