Commit ec92588a authored by yangyaming

Fix some conflicts and correct unittest.

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-3923-c
...@@ -21,7 +21,6 @@ addons: ...@@ -21,7 +21,6 @@ addons:
- python - python
- python-pip - python-pip
- python2.7-dev - python2.7-dev
- python-numpy
- python-wheel - python-wheel
- libboost-dev - libboost-dev
- curl - curl
...@@ -35,8 +34,8 @@ before_install: ...@@ -35,8 +34,8 @@ before_install:
- if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi - if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi
# Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python # Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python
# protobuf version. # protobuf version.
- pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt - sudo pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt
- pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker - sudo pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker
- curl https://glide.sh/get | bash - curl https://glide.sh/get | bash
- eval "$(GIMME_GO_VERSION=1.8.3 gimme)" - eval "$(GIMME_GO_VERSION=1.8.3 gimme)"
- go get -u github.com/alecthomas/gometalinter - go get -u github.com/alecthomas/gometalinter
......
...@@ -65,8 +65,8 @@ if(NOT CMAKE_BUILD_TYPE) ...@@ -65,8 +65,8 @@ if(NOT CMAKE_BUILD_TYPE)
endif() endif()
if(ANDROID) if(ANDROID)
if(${CMAKE_SYSTEM_VERSION} VERSION_LESS "21") if(${CMAKE_SYSTEM_VERSION} VERSION_LESS "16")
message(FATAL_ERROR "Unsupport standalone toolchains with Android API level lower than 21") message(FATAL_ERROR "Unsupport standalone toolchains with Android API level lower than 16")
endif() endif()
set(WITH_GPU OFF CACHE STRING set(WITH_GPU OFF CACHE STRING
......
...@@ -86,12 +86,13 @@ def layer.fc(X): ...@@ -86,12 +86,13 @@ def layer.fc(X):
We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in above illustrative example: We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in above illustrative example:
```
| C++ functions/functors | mul | add | | | | C++ functions/functors | mul | add | | |
|------------------------|--------------|--------------|-------------|----------|
| C++ operator class | mulOp | addOp | FCOp | | | C++ operator class | mulOp | addOp | FCOp | |
| Python binding | operator.mul | operator.add | operator.fc | | | Python binding | operator.mul | operator.add | operator.fc | |
| Python function | | | | layer.fc | | Python function | | | | layer.fc |
```
This is how we differentiate layer and operators in PaddlePaddle: This is how we differentiate layer and operators in PaddlePaddle:
......
# Design Doc: Operation Graph Based Parameter Server
## Abstract
We propose an approach to implement the parameter server. In this
approach, there is no fundamental difference between the trainer and
the parameter server: they both run subgraphs, but the subgraphs serve
different purposes.
## Background
The previous implementation of the parameter server does not run a
subgraph. Parameter initialization, optimizer computation, network
communication, and checkpointing are implemented twice, once on the
trainer and once on the parameter server.
It would be better if we could write the code once and use it on both the
trainer and the parameter server: this reduces code duplication and
improves extensibility. Given that the current refactor represents
everything as a computing graph on the trainer, representing everything
as a computing graph on the parameter server becomes a natural extension.
## Design
### Graph Converter
The *graph converter* converts the user-defined operation (OP) graph
into subgraphs to be scheduled on different nodes with the following
steps:
1. OP placement: the OPs will be placed on different nodes according
   to a heuristic that minimizes the estimated total computation
   time. Currently we use a simple heuristic that puts parameter
   variables on parameter server workers and everything else on trainer
   workers (a minimal sketch of this step follows this list).
1. Add communication OPs to enable the communication between nodes.
We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*.
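To make the placement heuristic concrete, here is a minimal C++ sketch: an OP that writes (updates) a parameter variable is assigned to a parameter server worker, and everything else to a trainer worker. The `OpDesc`/`Placement` types and the `parameter_vars` set are hypothetical simplifications for illustration, not the framework's real converter API.
```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified description of an OP in the user-defined graph.
struct OpDesc {
  std::string type;                  // e.g. "mul", "sgd"
  std::vector<std::string> inputs;   // input variable names
  std::vector<std::string> outputs;  // output variable names
};

enum class Placement { kTrainer, kParameterServer };

// Simple heuristic: an OP that writes (updates) a parameter variable,
// such as the optimizer update, runs on a parameter server worker;
// everything else stays on a trainer worker.
Placement PlaceOp(const OpDesc& op,
                  const std::unordered_set<std::string>& parameter_vars) {
  for (const auto& name : op.outputs) {
    if (parameter_vars.count(name)) return Placement::kParameterServer;
  }
  return Placement::kTrainer;
}
```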
Below is an example of converting the user-defined graph to the
subgraphs for the trainer and the parameter server:
<img src="src/local-graph.png" width="300"/>
After converting:
<img src="src/dist-graph.png" width="700"/>
1. The parameter variable W and its optimizer subgraph are placed on the parameter server.
1. Operators are added to the subgraphs.
- *Send* sends data to the connected *Recv* operator. The
  scheduler on the receiving node will only schedule the *Recv* operator
  to run after the *Send* operator has run (the *Send* OP will mark
  the *Recv* OP runnable automatically).
- *Enqueue* enqueues the input variable; it can block until space
  becomes available in the queue.
- *Dequeue* outputs a configurable number of tensors from the
  queue. It will block until the queue has the required number of
  tensors (a sketch of these queue semantics follows below).
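To illustrate the *Enqueue*/*Dequeue* semantics described above, here is a minimal sketch of a blocking queue, assuming a plain `std::condition_variable` implementation: `Enqueue` blocks while the queue is full, and `Dequeue` blocks until at least `min_count` items are available. This is only a model of the intended behavior, not the framework's actual queue.
```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

template <typename T>
class BlockingQueue {
 public:
  explicit BlockingQueue(size_t capacity) : capacity_(capacity) {}

  // Enqueue blocks until space becomes available in the queue.
  void Enqueue(T item) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return buf_.size() < capacity_; });
    buf_.push_back(std::move(item));
    not_empty_.notify_all();
  }

  // Dequeue blocks until the queue holds at least `min_count` items,
  // then returns all buffered items at once.
  std::vector<T> Dequeue(size_t min_count) {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock,
                    [this, min_count] { return buf_.size() >= min_count; });
    std::vector<T> out(std::make_move_iterator(buf_.begin()),
                       std::make_move_iterator(buf_.end()));
    buf_.clear();
    not_full_.notify_all();
    return out;
  }

 private:
  const size_t capacity_;
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
  std::deque<T> buf_;
};
```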
### Benefits
- Model parallelism becomes easier to implement: it is an extension of
  the trainer/parameter server approach. We already have the
  communication OPs, but need to extend the graph converter's
  placement functionality.
- A user-defined optimizer is easier to add: the user can now express it as
  a subgraph.
- No more duplicated logic between the trainer and the parameter
  server, as mentioned in the background section.
### Challenges
- It might be hard for the graph converter to cut a general graph
  (without any hint about which subgraph is the optimizer). We may need
  to label which subgraph inside the OP graph is the optimizer.
- It's important to balance the parameter shards across multiple
  parameter servers. If a single parameter is very big (e.g., the parameter
  of a word-embedding, fully connected, or softmax layer), we need to
  automatically partition that single parameter onto different
  parameter servers when possible (this only works when the optimizer
  depends on the parameter variable element-wise; see the sharding sketch below).
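As a rough sketch of the partitioning mentioned above, and only under the stated assumption that the optimizer touches the parameter element-wise, the helper below splits a large parameter into near-equal contiguous shards, one per parameter server. Everything here is illustrative, not the converter's real partitioning code.
```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Split `total_elements` of one large parameter into `num_pservers`
// contiguous [begin, end) ranges whose sizes differ by at most one element.
std::vector<std::pair<int64_t, int64_t>> ShardParameter(int64_t total_elements,
                                                        int num_pservers) {
  std::vector<std::pair<int64_t, int64_t>> shards;
  int64_t base = total_elements / num_pservers;
  int64_t remainder = total_elements % num_pservers;
  int64_t begin = 0;
  for (int i = 0; i < num_pservers; ++i) {
    int64_t size = base + (i < remainder ? 1 : 0);
    shards.emplace_back(begin, begin + size);
    begin += size;
  }
  return shards;
}
```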
### Discussion
- In the "Aync SGD" figure, the "W" variable on the parameter server
could be read and wrote concurrently, what is our locking strategy?
E.g., each variable have a lock cpp method to be invoked by every
OP, or, have a lock OP.
- Can the *Enqueue* OP be implemented under our current tensor design
  (i.e., by putting the input tensor into the queue tensor)?
- The *Dequeue* OP will have a variable number of outputs (depending on the
  `min_count` attribute); does our current design support this? (A similar
  question applies to the *Add* OP.)
### References:
[1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
...@@ -262,7 +262,7 @@ MulOp(const std::string &type, const framework::VariableNameMap &inputs, ...@@ -262,7 +262,7 @@ MulOp(const std::string &type, const framework::VariableNameMap &inputs,
- Building the library
There is no need to modify the [`paddle/pybind/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/pybind/CMakeLists.txt) file; newly added `*_op.cc` files under the `paddle/operators` directory are automatically linked into the generated library.
## Implementing unit tests
...@@ -354,11 +354,7 @@ class TestMulGradOp(GradientChecker): ...@@ -354,11 +354,7 @@ class TestMulGradOp(GradientChecker):
### Building and running unit tests
After a unit test is written, add the following to [`python/paddle/v2/framework/tests/CMakeLists.txt`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/framework/tests/CMakeLists.txt) to add it to the project: Newly added `test_*.py` unit tests under the `python/paddle/v2/framework/tests` directory are automatically added to the project and built.
```
py_test(test_mul_op SRCS test_mul_op.py)
```
Note that, **unlike building and testing a single Op, running the unit tests requires building the whole project**, with `WITH_TESTING` turned on, i.e. `cmake paddle_dir -DWITH_TESTING=ON`. After the build succeeds, run the following command to execute the unit tests:
......
...@@ -5,15 +5,13 @@ ...@@ -5,15 +5,13 @@
PaddlePaddle's documentation consists of the English documentation ``doc`` and the Chinese documentation ``doc_cn``. Both are built by `sphinx`_ driven by `cmake`_, and the generated documentation is stored in the ``doc`` and ``doc_cn`` subdirectories of the build directory.
How to build the PaddlePaddle documentation How to build the documentation
========================== ============
There are two ways to build the PaddlePaddle documentation, building directly or building with Docker; we provide the script build_docs.sh for this. There are two ways to build the PaddlePaddle documentation.
The environment required to build the PaddlePaddle documentation is relatively complex, so we recommend building the documentation with Docker.
Build the PaddlePaddle documentation with Docker
--------------------------------
Build with Docker
--------------
Building the PaddlePaddle documentation with Docker requires the Docker toolkit to be installed on your system first. For Docker installation, please refer to the `Docker website <https://docs.docker.com/>`_ . After Docker is installed, you can build the documentation with the script in the source tree, i.e.
...@@ -21,58 +19,46 @@ PaddlePaddle文档需要准备的环境相对较复杂,所以我们推荐使 ...@@ -21,58 +19,46 @@ PaddlePaddle文档需要准备的环境相对较复杂,所以我们推荐使
cd TO_YOUR_PADDLE_CLONE_PATH
cd paddle/scripts/tools/build_docs
bash build_docs.sh with_docker sh build_docs.sh
After the build finishes, two subdirectories are generated in the current directory:
* doc: the English documentation directory
* doc_cn: the Chinese documentation directory
After the build finishes, two subdirectories are generated in the current directory: doc (the English documentation) and doc_cn (the Chinese documentation).
Open index.html in the corresponding directory with a browser to view the local documentation.
Build the PaddlePaddle documentation directly
--------------------------
Build directly
--------
Because generating the PaddlePaddle v2 API documentation depends on the py_paddle Python package, first make sure that the py_paddle package is installed.
.. code-block:: bash
python -c "import py_paddle"
If this reports an error, you need to build and install PaddlePaddle locally; please refer to the `build from source documentation <http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/build_from_source_en.html>`_ .
Note that when building and installing PaddlePaddle for the first time, please turn the WITH_DOC option off. After a successful build and installation, confirm again that the py_paddle package is installed, then proceed to the next step.
If the import succeeds, run the following commands to build the documentation:
.. code-block:: bash
cd TO_YOUR_PADDLE_CLONE_PATH
cd paddle/scripts/tools/build_docs mkdir -p build
bash build_docs.sh local cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_GPU=OFF -DWITH_MKLDNN=OFF -DWITH_MKLML=OFF -DWITH_DOC=ON
make gen_proto_py
make paddle_docs paddle_docs_cn
After the build finishes, two subdirectories are generated in the current directory:
* doc: the English documentation directory
* doc_cn: the Chinese documentation directory
After the build finishes, two subdirectories are generated in the current directory: doc (the English documentation) and doc_cn (the Chinese documentation).
Open index.html in the corresponding directory with a browser to view the local documentation.
How to write the PaddlePaddle documentation How to write the documentation
========================== ============
The PaddlePaddle documentation is generated automatically with `sphinx`_ ; you can refer to the sphinx tutorial when writing it.
How to update the www.paddlepaddle.org documentation How to update the documentation theme
================================ ================
The PaddlePaddle documentation theme lives in the `TO_YOUR_PADDLE_CLONE_PATH/doc_theme` folder, which contains all files related to the front-end web design.
Comments that developers add to the PaddlePaddle code are submitted to github as PRs; see the `contribution documentation <http://doc.paddlepaddle.org/develop/doc_cn/howto/dev/contribute_to_paddle_cn.html>`_ for how to submit them. How to update doc.paddlepaddle.org
============================
Updated documentation is submitted to github as a PR; see the `contribution documentation <http://doc.paddlepaddle.org/develop/doc_cn/howto/dev/contribute_to_paddle_cn.html>`_ for how to submit it.
Currently the documentation of PaddlePaddle's develop branch is updated automatically by a trigger; you can view the latest `Chinese documentation <http://doc.paddlepaddle.org/develop/doc_cn/>`_ and
`English documentation <http://doc.paddlepaddle.org/develop/doc/>`_ .
.. _cmake: https://cmake.org/ .. _cmake: https://cmake.org/
.. _sphinx: http://www.sphinx-doc.org/en/1.4.8/ .. _sphinx: http://www.sphinx-doc.org/en/1.4.8/
...@@ -9,6 +9,7 @@ cc_test(eigen_test SRCS eigen_test.cc DEPS tensor) ...@@ -9,6 +9,7 @@ cc_test(eigen_test SRCS eigen_test.cc DEPS tensor)
cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor) cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor)
cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor) cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor)
nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor)
cc_test(variable_test SRCS variable_test.cc) cc_test(variable_test SRCS variable_test.cc)
......
...@@ -45,7 +45,19 @@ class GreaterThanChecker { ...@@ -45,7 +45,19 @@ class GreaterThanChecker {
public: public:
explicit GreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {} explicit GreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
void operator()(T& value) const { void operator()(T& value) const {
PADDLE_ENFORCE(value > lower_bound_, "larger_than check fail"); PADDLE_ENFORCE(value > lower_bound_, "larger_than check fails.");
}
private:
T lower_bound_;
};
template <typename T>
class EqualGreaterThanChecker {
public:
explicit EqualGreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
void operator()(T& value) const {
PADDLE_ENFORCE_GE(value, lower_bound_, "equal_larger_than check fails.");
} }
private: private:
...@@ -115,6 +127,11 @@ class TypedAttrChecker { ...@@ -115,6 +127,11 @@ class TypedAttrChecker {
return *this; return *this;
} }
TypedAttrChecker& EqualGreaterThan(const T& lower_bound) {
value_checkers_.push_back(EqualGreaterThanChecker<T>(lower_bound));
return *this;
}
// we can add more common limits, like LessThan(), Between()... // we can add more common limits, like LessThan(), Between()...
TypedAttrChecker& SetDefault(const T& default_value) { TypedAttrChecker& SetDefault(const T& default_value) {
......
...@@ -2,20 +2,31 @@ ...@@ -2,20 +2,31 @@
## Motivation ## Motivation
In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass will be generated with respect to the forward pass. In a neural network, many models are currently solved by the backpropagation algorithm (known as BP). Technically, it calculates the gradient of the loss function and then distributes it back through the network. Following the chain rule, we need a module that chains the gradient operators/expressions together to construct the backward pass. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass will be generated with respect to the forward pass.
## Backward Operator Registry
A backward network is built up with several backward operators. Backward operators take forward operators' inputs, outputs and output gradients and then calculate its input gradients. ## Implementation
In this design doc, we export only one API for generating the backward pass.
```c++
std::unique_ptr<OperatorBase> Backward(const OperatorBase& forwardOp,
const std::unordered_set<std::string>& no_grad_vars);
```
The implementation behind it can be divided into two parts: **Backward Operator Creating** and **Backward Network Building**.
### Backward Operator Registry
A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs, and output gradients, and then calculate the corresponding input gradients.
| | forward operator | backward operator | | forward operator | backward operator
| ---------------------- | ---------------- |------------------------- | | ---------------------- | ---------------- |------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients | | **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients | | **Operator::outputs_** | Outputs | InputGradients |
In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced. In most cases, there is a one-to-one correspondence between the forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
For example, we have got a `mul_op`, and we can register its information and the corresponding backward operator by the following macro:
```cpp ```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad); REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
...@@ -25,58 +36,65 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad); ...@@ -25,58 +36,65 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
`mul_grad` is the type of backward operator, and `MulOpGrad` is its class name. `mul_grad` is the type of backward operator, and `MulOpGrad` is its class name.
## Backward Operator Creating ### Backward Operator Creating
Given a certain forward operator, we can get its corresponding backward operator by calling:
```cpp ```cpp
OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op); OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op);
``` ```
The function `BuildGradOp` will sequentially execute the following processes (a schematic sketch follows the list):
1. Get the `type_` of the given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.
2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` to the map `inputs`, except those entries that are not necessary for gradient computing.
3. Add the forward inputs' gradient variables into the map `outputs`, and add the forward outputs' gradient variables into the map `inputs`.
4. Build the backward operator with `inputs`, `outputs`, and the forward operator's attributes.
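The sketch below mirrors the four steps with simplified, hypothetical types (`VariableNameMap` as a plain map, a `GradVarName` helper, a stub type lookup); the real `BuildGradOp` works on `OperatorBase` and the global `OpInfoMap`, so treat this only as a schematic of the flow.
```cpp
#include <map>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Simplified stand-ins for the framework types used in the four steps above.
using VariableNameMap = std::map<std::string, std::vector<std::string>>;

struct OpDesc {
  std::string type;
  VariableNameMap inputs;
  VariableNameMap outputs;
  std::map<std::string, int> attrs;  // attributes are type-erased in reality
};

std::string GradVarName(const std::string& name) { return name + "@GRAD"; }

// Hypothetical lookup: forward op type -> backward op type (step 1).
std::string LookupGradOpType(const std::string& fwd_type) {
  return fwd_type + "_grad";
}

OpDesc BuildGradOpSketch(const OpDesc& fwd,
                         const std::unordered_set<std::string>& no_grad_vars) {
  OpDesc grad;
  // 1. Look up the backward operator's type.
  grad.type = LookupGradOpType(fwd.type);

  // 2. Temporarily store the backward inputs/outputs; copy the forward
  //    inputs and outputs, skipping variables not needed for gradients.
  VariableNameMap inputs;
  for (const auto& kv : fwd.inputs) {
    for (const auto& v : kv.second) {
      if (!no_grad_vars.count(v)) inputs[kv.first].push_back(v);
    }
  }
  for (const auto& kv : fwd.outputs) inputs[kv.first] = kv.second;

  // 3. Forward outputs' gradients become backward inputs;
  //    forward inputs' gradients become backward outputs.
  VariableNameMap outputs;
  for (const auto& kv : fwd.outputs) {
    for (const auto& v : kv.second) {
      inputs[GradVarName(kv.first)].push_back(GradVarName(v));
    }
  }
  for (const auto& kv : fwd.inputs) {
    for (const auto& v : kv.second) {
      outputs[GradVarName(kv.first)].push_back(GradVarName(v));
    }
  }

  // 4. Build the backward operator from the maps and forward attributes.
  grad.inputs = std::move(inputs);
  grad.outputs = std::move(outputs);
  grad.attrs = fwd.attrs;
  return grad;
}
```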
## Backward Network Building ### Backward Network Building
A backward network is a series of backward operators. The main idea of building a backward network is to create the backward operators in the inverted sequence and append them one by one. There are some corner cases that need special processing.
In our design, the network itself is also a kind of operator, so the operators contained in a big network may themselves be small networks.
Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.
1. Op 1. Op
When the input forward network is an Op, return its gradient operator immediately. If all of its outputs are in the no-gradient set, then return a special `NOP`.
2. NetOp 2. NetOp
In our design, the network itself is also a kind of operator (**NetOp**), so the operators contained in a big network may themselves be small networks. When the input forward network is a NetOp, it needs to call the backward function of its sub NetOps/Operators recursively. During the process, we need to collect the `OutputGradients` names according to the forward NetOp.
3. RnnOp
RnnOp is a nested stepnet operator. The backward module needs to recursively call `Backward` for every stepnet.
4. Sharing Variables
**Sharing variables**. As illustrated in the pictures, two operators share the same variable name W@GRAD, which will overwrite their shared input variable.
<p align="center">
<img src="./images/duplicate_op.png" width="50%" ><br/>
pic 1. Sharing variables in operators.
</p>
Sharing a variable between operators, or using the same input variable in multiple operators, leads to a duplicate gradient variable. As the demo above shows, we need to rename the gradient names recursively and add a generic add operator to replace the overwritten links (a schematic sketch of this renaming step follows at the end of this list).
<p align="center">
<img src="images/duplicate_op2.png" width="40%" ><br/>
pic 2. Replace the sharing variable's gradient with an `Add` operator.
</p>
Because our framework finds variables according to their names, we need to rename the output links. We add a numeric suffix to represent the position in clockwise order.
5. Part of the Gradient is Zero.
In the whole graph, there are cases where one operator's gradient is not needed, but its input's gradient is a dependency link of another operator; in that position we need to fill in a gradient matrix of the same shape. In our implementation, we insert a special `fillZeroLike` operator.
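Below is a small, self-contained sketch of the renaming step referred to in item 4: when several backward operators write the same gradient name, each duplicate is renamed with a numeric suffix and a generic `add` operator is appended to sum them back into the original name. The `SimpleOp` type and the suffix scheme are illustrative assumptions, not the framework's actual implementation.
```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified operator description (not the framework's OpDesc).
struct SimpleOp {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Rename duplicated gradient outputs (W@GRAD -> W@GRAD@0, W@GRAD@1, ...)
// and append a generic `add` operator that sums them back into the
// original name, replacing the overwritten link.
void ResolveSharedGradients(std::vector<SimpleOp>* net) {
  std::map<std::string, std::vector<std::string*>> writers;
  for (auto& op : *net) {
    for (auto& out : op.outputs) writers[out].push_back(&out);
  }
  std::vector<SimpleOp> add_ops;
  for (auto& kv : writers) {
    if (kv.second.size() < 2) continue;  // not shared, nothing to do
    SimpleOp add{"add", {}, {kv.first}};
    for (size_t i = 0; i < kv.second.size(); ++i) {
      std::string renamed = kv.first + "@" + std::to_string(i);
      *kv.second[i] = renamed;        // rewrite the producing op's output
      add.inputs.push_back(renamed);  // feed it into the merging add op
    }
    add_ops.push_back(std::move(add));
  }
  // Appending at the end keeps the earlier pointers valid during the loop.
  net->insert(net->end(), add_ops.begin(), add_ops.end());
}
```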
​ Then collect the sub graph `OutputGradients`/`InputGradients` as the NetOp's and return it. Follow these rules above, then collect the sub graph `OutputGradients`/`InputGradients` as the NetOp's and return it.
...@@ -283,5 +283,14 @@ std::ostream& operator<<(std::ostream& os, const DDim& ddim) { ...@@ -283,5 +283,14 @@ std::ostream& operator<<(std::ostream& os, const DDim& ddim) {
DDim::DDim(std::initializer_list<int64_t> init_list) { DDim::DDim(std::initializer_list<int64_t> init_list) {
*this = make_ddim(init_list); *this = make_ddim(init_list);
} }
DDim flatten_to_2d(const DDim& src, int num_col_dims) {
int rank = src.size();
return make_ddim({product(slice_ddim(src, 0, num_col_dims)),
product(slice_ddim(src, num_col_dims, rank))});
}
DDim flatten_to_1d(const DDim& src) { return make_ddim({product(src)}); }
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -115,6 +115,12 @@ int arity(const DDim& ddim); ...@@ -115,6 +115,12 @@ int arity(const DDim& ddim);
std::ostream& operator<<(std::ostream&, const DDim&); std::ostream& operator<<(std::ostream&, const DDim&);
// Reshape a tensor to a matrix. The matrix's first dimension(column length)
// will be the product of tensor's first `num_col_dims` dimensions.
DDim flatten_to_2d(const DDim& src, int num_col_dims);
DDim flatten_to_1d(const DDim& src);
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
......
...@@ -63,20 +63,35 @@ struct EigenTensor { ...@@ -63,20 +63,35 @@ struct EigenTensor {
template <typename T, int MajorType = Eigen::RowMajor, template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex> typename IndexType = Eigen::DenseIndex>
struct EigenMatrix : public EigenTensor<T, 2, MajorType, IndexType> {}; struct EigenMatrix : public EigenTensor<T, 2, MajorType, IndexType> {
static typename EigenMatrix::Type Reshape(Tensor& tensor, int num_col_dims) {
int rank = tensor.dims_.size();
PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
"`num_col_dims` must be between (0, rank_of_tensor).");
return EigenMatrix::From(tensor,
flatten_to_2d(tensor.dims(), num_col_dims));
}
static typename EigenMatrix::ConstType Reshape(const Tensor& tensor,
int num_col_dims) {
int rank = tensor.dims_.size();
PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
"`num_col_dims` must be between (0, rank_of_tensor).");
return EigenMatrix::From(tensor,
flatten_to_2d(tensor.dims(), num_col_dims));
}
};
template <typename T, int MajorType = Eigen::RowMajor, template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex> typename IndexType = Eigen::DenseIndex>
struct EigenVector : public EigenTensor<T, 1, MajorType, IndexType> { struct EigenVector : public EigenTensor<T, 1, MajorType, IndexType> {
// Flatten reshapes a Tensor into an EigenVector. // Flatten reshapes a Tensor into an EigenVector.
static typename EigenVector::Type Flatten(Tensor& tensor) { static typename EigenVector::Type Flatten(Tensor& tensor) {
return EigenVector::From( return EigenVector::From(tensor, {product(tensor.dims_)});
tensor, make_ddim({static_cast<int>(product(tensor.dims_))}));
} }
static typename EigenVector::ConstType Flatten(const Tensor& tensor) { static typename EigenVector::ConstType Flatten(const Tensor& tensor) {
return EigenVector::From( return EigenVector::From(tensor, {product(tensor.dims_)});
tensor, make_ddim({static_cast<int>(product(tensor.dims_))}));
} }
}; };
......
...@@ -108,5 +108,24 @@ TEST(Eigen, Matrix) { ...@@ -108,5 +108,24 @@ TEST(Eigen, Matrix) {
} }
} }
TEST(Eigen, MatrixReshape) {
Tensor t;
float* p = t.mutable_data<float>({2, 3, 6, 4}, platform::CPUPlace());
for (int i = 0; i < 2 * 3 * 6 * 4; ++i) {
p[i] = static_cast<float>(i);
}
EigenMatrix<float>::Type em = EigenMatrix<float>::Reshape(t, 2);
ASSERT_EQ(2 * 3, em.dimension(0));
ASSERT_EQ(6 * 4, em.dimension(1));
for (int i = 0; i < 2 * 3; i++) {
for (int j = 0; j < 6 * 4; j++) {
ASSERT_NEAR(i * 6 * 4 + j, em(i, j), 1e-6f);
}
}
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -18,8 +18,10 @@ ...@@ -18,8 +18,10 @@
#ifndef PADDLE_ONLY_CPU #ifndef PADDLE_ONLY_CPU
#include <thrust/device_vector.h> #include <thrust/device_vector.h>
#include <thrust/host_vector.h> #include <thrust/host_vector.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>
#endif #endif
#include <glog/logging.h>
#include "paddle/framework/ddim.h" #include "paddle/framework/ddim.h"
#include "paddle/framework/tensor.h" #include "paddle/framework/tensor.h"
#include "paddle/platform/enforce.h" #include "paddle/platform/enforce.h"
...@@ -32,7 +34,8 @@ template <typename T> ...@@ -32,7 +34,8 @@ template <typename T>
using Vector = std::vector<T>; using Vector = std::vector<T>;
#else #else
template <typename T> template <typename T>
using Vector = thrust::host_vector<T>; using Vector = thrust::host_vector<
T, thrust::system::cuda::experimental::pinned_allocator<T>>;
#endif #endif
using LoD = std::vector<Vector<size_t>>; using LoD = std::vector<Vector<size_t>>;
......
/*
Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#include <cuda.h>
#include <cuda_runtime.h>
#include "paddle/framework/lod_tensor.h"
#include "paddle/platform/assert.h"
#include <gtest/gtest.h>
__global__ void test(size_t* a, int size) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size;
i += blockDim.x * gridDim.x) {
a[i] *= 2;
}
}
TEST(LoDTensor, LoDInGPU) {
paddle::framework::Tensor tensor;
paddle::framework::LoDTensor lod_tensor;
paddle::platform::GPUPlace place(0);
paddle::framework::LoD src_lod;
src_lod.push_back(std::vector<size_t>{0, 2, 4, 6, 8, 10, 12, 14});
tensor.Resize({14, 16});
tensor.mutable_data<float>(place);
lod_tensor.set_lod(src_lod);
lod_tensor.set_tensor(&tensor);
CHECK_EQ(lod_tensor.lod_element(0, 2), 4);
CHECK_EQ(lod_tensor.lod_element(0, 4), 8);
auto lod = lod_tensor.lod();
test<<<1, 8>>>(lod[0].data(), lod[0].size());
cudaDeviceSynchronize();
for (size_t i = 0; i < src_lod[0].size(); ++i) {
CHECK_EQ(lod[0].data()[i], src_lod[0].data()[i] * 2);
}
}
...@@ -123,6 +123,15 @@ OperatorBase::OperatorBase(const std::string& type, ...@@ -123,6 +123,15 @@ OperatorBase::OperatorBase(const std::string& type,
CheckAllInputOutputSet(); CheckAllInputOutputSet();
} }
std::vector<std::string> OperatorBase::InputVars() const {
std::vector<std::string> ret_val;
for (auto& o : outputs_) {
ret_val.reserve(ret_val.size() + o.second.size());
ret_val.insert(ret_val.end(), o.second.begin(), o.second.end());
}
return ret_val;
}
std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const { std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const {
std::vector<std::string> ret_val; std::vector<std::string> ret_val;
if (has_intermediate) { if (has_intermediate) {
......
...@@ -94,11 +94,14 @@ class OperatorBase { ...@@ -94,11 +94,14 @@ class OperatorBase {
const VariableNameMap& Inputs() const { return inputs_; } const VariableNameMap& Inputs() const { return inputs_; }
const VariableNameMap& Outputs() const { return outputs_; } const VariableNameMap& Outputs() const { return outputs_; }
//! Get a input with argument's name described in `op_proto` //! Get a input with argument's name described in `op_proto`
std::string Input(const std::string& name) const; std::string Input(const std::string& name) const;
//! Get a input which has multiple variables. //! Get a input which has multiple variables.
const std::vector<std::string>& Inputs(const std::string& name) const; const std::vector<std::string>& Inputs(const std::string& name) const;
std::vector<std::string> InputVars() const;
//! Get a output with argument's name described in `op_proto` //! Get a output with argument's name described in `op_proto`
std::string Output(const std::string& name) const; std::string Output(const std::string& name) const;
//! Get an output which has multiple variables. //! Get an output which has multiple variables.
...@@ -311,9 +314,9 @@ class InferShapeContext { ...@@ -311,9 +314,9 @@ class InferShapeContext {
} }
template <typename T> template <typename T>
std::vector<const T*> MultiOutput(const std::string& name) const { std::vector<T*> MultiOutput(const std::string& name) const {
auto names = op_.Outputs(name); auto names = op_.Outputs(name);
std::vector<const T*> res; std::vector<T*> res;
res.reserve(names.size()); res.reserve(names.size());
std::transform(names.begin(), names.end(), std::back_inserter(res), std::transform(names.begin(), names.end(), std::back_inserter(res),
[&](const std::string& sub_name) { [&](const std::string& sub_name) {
......
...@@ -43,6 +43,9 @@ class Tensor { ...@@ -43,6 +43,9 @@ class Tensor {
template <typename T, size_t D, int MajorType, typename IndexType> template <typename T, size_t D, int MajorType, typename IndexType>
friend struct EigenTensor; friend struct EigenTensor;
template <typename T, int MajorType, typename IndexType>
friend struct EigenMatrix;
template <typename T, int MajorType, typename IndexType> template <typename T, int MajorType, typename IndexType>
friend struct EigenVector; friend struct EigenVector;
...@@ -78,6 +81,9 @@ class Tensor { ...@@ -78,6 +81,9 @@ class Tensor {
/*! Return the dimensions of the memory block. */ /*! Return the dimensions of the memory block. */
inline const DDim& dims() const; inline const DDim& dims() const;
/*! Return the numel of the memory block. */
inline int64_t numel() const;
/*! Resize the dimensions of the memory block. */ /*! Resize the dimensions of the memory block. */
inline Tensor& Resize(const DDim& dims); inline Tensor& Resize(const DDim& dims);
...@@ -159,6 +165,12 @@ class Tensor { ...@@ -159,6 +165,12 @@ class Tensor {
/*! points to dimensions of memory block. */ /*! points to dimensions of memory block. */
DDim dims_; DDim dims_;
/**
* A cache of the number of elements in a tensor.
* Would be 0 for an uninitialized tensor.
*/
int64_t numel_;
/** /**
* @brief A PlaceHolder may be shared by more than one tensor. * @brief A PlaceHolder may be shared by more than one tensor.
* *
......
...@@ -24,7 +24,7 @@ inline void Tensor::check_memory_size() const { ...@@ -24,7 +24,7 @@ inline void Tensor::check_memory_size() const {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE_NOT_NULL(
holder_, "Tenosr holds no memory. Call Tensor::mutable_data first."); holder_, "Tenosr holds no memory. Call Tensor::mutable_data first.");
PADDLE_ENFORCE_GE( PADDLE_ENFORCE_GE(
holder_->size(), product(dims_) * sizeof(T) + offset_, holder_->size(), numel() * sizeof(T) + offset_,
"Tensor's dims_ is out of bound. Call Tensor::mutable_data " "Tensor's dims_ is out of bound. Call Tensor::mutable_data "
"first to re-allocate memory.\n" "first to re-allocate memory.\n"
"or maybe the required data-type mismatches the data already stored."); "or maybe the required data-type mismatches the data already stored.");
...@@ -54,11 +54,11 @@ inline T* Tensor::mutable_data(DDim dims, platform::Place place) { ...@@ -54,11 +54,11 @@ inline T* Tensor::mutable_data(DDim dims, platform::Place place) {
template <typename T> template <typename T>
inline T* Tensor::mutable_data(platform::Place place) { inline T* Tensor::mutable_data(platform::Place place) {
static_assert(std::is_pod<T>::value, "T must be POD"); static_assert(std::is_pod<T>::value, "T must be POD");
PADDLE_ENFORCE_GT(product(dims_), 0, PADDLE_ENFORCE_GT(numel(), 0,
"Tensor's numel must be larger than zero to call " "Tensor's numel must be larger than zero to call "
"Tensor::mutable_data. Call Tensor::set_dim first."); "Tensor::mutable_data. Call Tensor::set_dim first.");
/* some versions of boost::variant don't have operator!= */ /* some versions of boost::variant don't have operator!= */
int64_t size = product(dims_) * sizeof(T); int64_t size = numel() * sizeof(T);
if (holder_ == nullptr || !(holder_->place() == place) || if (holder_ == nullptr || !(holder_->place() == place) ||
holder_->size() < size + offset_) { holder_->size() < size + offset_) {
if (platform::is_cpu_place(place)) { if (platform::is_cpu_place(place)) {
...@@ -97,7 +97,7 @@ inline void Tensor::CopyFrom(const Tensor& src, ...@@ -97,7 +97,7 @@ inline void Tensor::CopyFrom(const Tensor& src,
auto dst_ptr = static_cast<void*>(mutable_data<T>(dst_place)); auto dst_ptr = static_cast<void*>(mutable_data<T>(dst_place));
auto size = product(src.dims_) * sizeof(T); auto size = src.numel() * sizeof(T);
if (platform::is_cpu_place(src_place) && platform::is_cpu_place(dst_place)) { if (platform::is_cpu_place(src_place) && platform::is_cpu_place(dst_place)) {
memory::Copy(boost::get<platform::CPUPlace>(dst_place), dst_ptr, memory::Copy(boost::get<platform::CPUPlace>(dst_place), dst_ptr,
...@@ -131,7 +131,7 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { ...@@ -131,7 +131,7 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const {
PADDLE_ENFORCE_LT(begin_idx, end_idx, PADDLE_ENFORCE_LT(begin_idx, end_idx,
"Begin index must be less than end index."); "Begin index must be less than end index.");
PADDLE_ENFORCE_NE(dims_[0], 1, "Can not slice a tensor with dims_[0] = 1."); PADDLE_ENFORCE_NE(dims_[0], 1, "Can not slice a tensor with dims_[0] = 1.");
size_t base = product(dims_) / dims_[0]; size_t base = numel() / dims_[0];
Tensor dst; Tensor dst;
dst.holder_ = holder_; dst.holder_ = holder_;
DDim dst_dims = dims_; DDim dst_dims = dims_;
...@@ -143,10 +143,21 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { ...@@ -143,10 +143,21 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const {
inline Tensor& Tensor::Resize(const DDim& dims) { inline Tensor& Tensor::Resize(const DDim& dims) {
dims_ = dims; dims_ = dims;
numel_ = product(dims_);
return *this; return *this;
} }
inline const DDim& Tensor::dims() const { return dims_; } inline const DDim& Tensor::dims() const { return dims_; }
inline int64_t Tensor::numel() const { return numel_; }
template <typename T>
inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) {
Tensor res;
res.ShareDataWith<T>(src);
res.Resize(flatten_to_2d(src.dims(), num_col_dims));
return res;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -262,3 +262,16 @@ TEST(Tensor, CopyFrom) { ...@@ -262,3 +262,16 @@ TEST(Tensor, CopyFrom) {
} }
#endif #endif
} }
TEST(Tensor, ReshapeToMatrix) {
using namespace paddle::framework;
using namespace paddle::platform;
Tensor src;
int* src_ptr = src.mutable_data<int>({2, 3, 4, 9}, CPUPlace());
for (int i = 0; i < 2 * 3 * 4 * 9; ++i) {
src_ptr[i] = i;
}
Tensor res = ReshapeToMatrix<int>(src, 2);
ASSERT_EQ(res.dims()[0], 2 * 3);
ASSERT_EQ(res.dims()[1], 4 * 9);
}
\ No newline at end of file
...@@ -62,14 +62,18 @@ void BatchNormBaseLayer::calFeatureMapSize() { ...@@ -62,14 +62,18 @@ void BatchNormBaseLayer::calFeatureMapSize() {
const ImageConfig& conf = config_.inputs(0).image_conf(); const ImageConfig& conf = config_.inputs(0).image_conf();
imageH_ = inputLayers_[0]->getOutput().getFrameHeight(); imageH_ = inputLayers_[0]->getOutput().getFrameHeight();
imageW_ = inputLayers_[0]->getOutput().getFrameWidth(); imageW_ = inputLayers_[0]->getOutput().getFrameWidth();
imageD_ = inputLayers_[0]->getOutput().getFrameDepth();
if (0 == imageD_) imageD_ = conf.img_size_z();
if (imageH_ == 0 && imageW_ == 0) { if (imageH_ == 0 && imageW_ == 0) {
imageH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size(); imageH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
imageW_ = conf.img_size(); imageW_ = conf.img_size();
} else { } else {
getOutput().setFrameHeight(imageH_); getOutput().setFrameHeight(imageH_);
getOutput().setFrameWidth(imageW_); getOutput().setFrameWidth(imageW_);
getOutput().setFrameDepth(imageD_);
} }
imgPixels_ = imageH_ * imageW_; imgPixels_ = imageH_ * imageW_ * imageD_;
} }
} // namespace paddle } // namespace paddle
...@@ -80,6 +80,7 @@ protected: ...@@ -80,6 +80,7 @@ protected:
/// Height or width of input image feature. /// Height or width of input image feature.
/// Both of them are 1 if the input is fully-connected layer. /// Both of them are 1 if the input is fully-connected layer.
int imageD_;
int imageH_; int imageH_;
int imageW_; int imageW_;
/// Height * Width. /// Height * Width.
......
...@@ -37,7 +37,7 @@ bool CudnnBatchNormLayer::init(const LayerMap& layerMap, ...@@ -37,7 +37,7 @@ bool CudnnBatchNormLayer::init(const LayerMap& layerMap,
} }
void CudnnBatchNormLayer::reshape(int batchSize) { void CudnnBatchNormLayer::reshape(int batchSize) {
hl_tensor_reshape(ioDesc_, batchSize, channels_, imageH_, imageW_); hl_tensor_reshape(ioDesc_, batchSize, channels_, imageH_ * imageD_, imageW_);
} }
void CudnnBatchNormLayer::forward(PassType passType) { void CudnnBatchNormLayer::forward(PassType passType) {
...@@ -104,7 +104,7 @@ void CudnnBatchNormLayer::forward(PassType passType) { ...@@ -104,7 +104,7 @@ void CudnnBatchNormLayer::forward(PassType passType) {
EPS, EPS,
batchSize, batchSize,
channels_, channels_,
imageH_, imageH_ * imageD_,
imageW_); imageW_);
} }
} }
......
...@@ -53,27 +53,27 @@ bool DeConv3DLayer::init(const LayerMap &layerMap, ...@@ -53,27 +53,27 @@ bool DeConv3DLayer::init(const LayerMap &layerMap,
size_t DeConv3DLayer::getSize() { size_t DeConv3DLayer::getSize() {
CHECK_NE(inputLayers_.size(), 0UL); CHECK_NE(inputLayers_.size(), 0UL);
outputH_.clear(); imgSizeW_.clear();
outputW_.clear(); imgSizeH_.clear();
outputD_.clear(); imgSizeD_.clear();
N_.clear(); N_.clear();
NOut_.clear(); NOut_.clear();
size_t layerSize = 0; size_t layerSize = 0;
for (size_t i = 0; i < inputLayers_.size(); ++i) { for (size_t i = 0; i < inputLayers_.size(); ++i) {
outputW_.push_back( imgSizeW_.push_back(
imageSize(imgSizeW_[i], filterSize_[i], padding_[i], stride_[i], true)); imageSize(outputW_[i], filterSize_[i], padding_[i], stride_[i], true));
outputH_.push_back(imageSize( imgSizeH_.push_back(imageSize(
imgSizeH_[i], filterSizeY_[i], paddingY_[i], strideY_[i], true)); outputH_[i], filterSizeY_[i], paddingY_[i], strideY_[i], true));
outputD_.push_back(imageSize( imgSizeD_.push_back(imageSize(
imgSizeD_[i], filterSizeZ_[i], paddingZ_[i], strideZ_[i], true)); outputD_[i], filterSizeZ_[i], paddingZ_[i], strideZ_[i], true));
NOut_.push_back(outputD_[i] * outputH_[i] * outputW_[i]); NOut_.push_back(imgSizeD_[i] * imgSizeH_[i] * imgSizeW_[i]);
N_.push_back(imgSizeD_[i] * imgSizeH_[i] * imgSizeW_[i]); N_.push_back(outputD_[i] * outputH_[i] * outputW_[i]);
CHECK(layerSize == 0 || N_[i] * size_t(numFilters_) == layerSize); CHECK(layerSize == 0 || N_[i] * size_t(numFilters_) == layerSize);
layerSize += NOut_[i] * numFilters_; layerSize += NOut_[i] * numFilters_;
} }
getOutput().setFrameHeight(outputH_[0]); getOutput().setFrameHeight(imgSizeH_[0]);
getOutput().setFrameWidth(outputW_[0]); getOutput().setFrameWidth(imgSizeW_[0]);
getOutput().setFrameDepth(outputD_[0]); getOutput().setFrameDepth(imgSizeD_[0]);
return layerSize; return layerSize;
} }
...@@ -103,9 +103,9 @@ void DeConv3DLayer::forward(PassType passType) { ...@@ -103,9 +103,9 @@ void DeConv3DLayer::forward(PassType passType) {
} }
colBuf_->col2Vol(outMat->getData() + n * outMat->getStride(), colBuf_->col2Vol(outMat->getData() + n * outMat->getStride(),
numFilters_, numFilters_,
outputD_[i], imgSizeD_[i],
outputH_[i], imgSizeH_[i],
outputW_[i], imgSizeW_[i],
filterSizeZ_[i], filterSizeZ_[i],
filterSizeY_[i], filterSizeY_[i],
filterSize_[i], filterSize_[i],
...@@ -144,9 +144,9 @@ void DeConv3DLayer::backward(const UpdateCallback &callback) { ...@@ -144,9 +144,9 @@ void DeConv3DLayer::backward(const UpdateCallback &callback) {
colBuf_->vol2Col( colBuf_->vol2Col(
getOutputGrad()->getData() + n * getOutputGrad()->getStride(), getOutputGrad()->getData() + n * getOutputGrad()->getStride(),
numFilters_, numFilters_,
outputD_[i], imgSizeD_[i],
outputH_[i], imgSizeH_[i],
outputW_[i], imgSizeW_[i],
filterSizeZ_[i], filterSizeZ_[i],
filterSizeY_[i], filterSizeY_[i],
filterSize_[i], filterSize_[i],
......
...@@ -139,7 +139,13 @@ void DetectionOutputLayer::forward(PassType passType) { ...@@ -139,7 +139,13 @@ void DetectionOutputLayer::forward(PassType passType) {
allDecodedBBoxes, allDecodedBBoxes,
&allIndices); &allIndices);
resetOutput(numKept, 7); if (numKept > 0) {
resetOutput(numKept, 7);
} else {
MatrixPtr outV = getOutputValue();
outV = NULL;
return;
}
MatrixPtr outV = getOutputValue(); MatrixPtr outV = getOutputValue();
getDetectionOutput(confBuffer_->getData(), getDetectionOutput(confBuffer_->getData(),
numKept, numKept,
......
...@@ -469,7 +469,7 @@ size_t getDetectionIndices( ...@@ -469,7 +469,7 @@ size_t getDetectionIndices(
const size_t numClasses, const size_t numClasses,
const size_t backgroundId, const size_t backgroundId,
const size_t batchSize, const size_t batchSize,
const size_t confThreshold, const real confThreshold,
const size_t nmsTopK, const size_t nmsTopK,
const real nmsThreshold, const real nmsThreshold,
const size_t keepTopK, const size_t keepTopK,
......
...@@ -275,7 +275,7 @@ size_t getDetectionIndices( ...@@ -275,7 +275,7 @@ size_t getDetectionIndices(
const size_t numClasses, const size_t numClasses,
const size_t backgroundId, const size_t backgroundId,
const size_t batchSize, const size_t batchSize,
const size_t confThreshold, const real confThreshold,
const size_t nmsTopK, const size_t nmsTopK,
const real nmsThreshold, const real nmsThreshold,
const size_t keepTopK, const size_t keepTopK,
......
...@@ -49,6 +49,12 @@ struct LayerState { ...@@ -49,6 +49,12 @@ struct LayerState {
}; };
typedef std::shared_ptr<LayerState> LayerStatePtr; typedef std::shared_ptr<LayerState> LayerStatePtr;
/// Paddle device ID, MKLDNN is -2, CPU is -1
enum PADDLE_DEVICE_ID {
MKLDNN_DEVICE = -2,
CPU_DEVICE = -1,
};
/** /**
* @brief Base class for layer. * @brief Base class for layer.
* Define necessary variables and functions for every layer. * Define necessary variables and functions for every layer.
...@@ -59,11 +65,6 @@ protected: ...@@ -59,11 +65,6 @@ protected:
LayerConfig config_; LayerConfig config_;
/// whether to use GPU /// whether to use GPU
bool useGpu_; bool useGpu_;
/// Paddle device ID, MKLDNN is -2, CPU is -1
enum PADDLE_DEVICE_ID {
MKLDNN_DEVICE = -2,
CPU_DEVICE = -1,
};
/// Device Id. MKLDNN is -2, CPU is -1, and GPU is 0, 1, 2 ... /// Device Id. MKLDNN is -2, CPU is -1, and GPU is 0, 1, 2 ...
int deviceId_; int deviceId_;
/// Input layers /// Input layers
......
...@@ -14,7 +14,6 @@ limitations under the License. */ ...@@ -14,7 +14,6 @@ limitations under the License. */
#include "MKLDNNFcLayer.h" #include "MKLDNNFcLayer.h"
#include "paddle/utils/Logging.h" #include "paddle/utils/Logging.h"
#include "paddle/utils/Stat.h"
using namespace mkldnn; // NOLINT using namespace mkldnn; // NOLINT
typedef memory::format format; typedef memory::format format;
...@@ -40,6 +39,8 @@ bool MKLDNNFcLayer::init(const LayerMap& layerMap, ...@@ -40,6 +39,8 @@ bool MKLDNNFcLayer::init(const LayerMap& layerMap,
oc_ = getSize(); oc_ = getSize();
oh_ = 1; oh_ = 1;
ow_ = 1; ow_ = 1;
ih_ = 1;
iw_ = 1;
// input size can not change in FC // input size can not change in FC
iLayerSize_ = inputLayers_[0]->getSize(); iLayerSize_ = inputLayers_[0]->getSize();
...@@ -77,111 +78,86 @@ void MKLDNNFcLayer::convertWeightsToPaddle() { ...@@ -77,111 +78,86 @@ void MKLDNNFcLayer::convertWeightsToPaddle() {
wgtVal_->reorderDataTo(wgtVal_, dstFmt, targetDim); wgtVal_->reorderDataTo(wgtVal_, dstFmt, targetDim);
} }
void MKLDNNFcLayer::convertOutputToOtherDevice() { void MKLDNNFcLayer::reshape(
copyOutputInfoToOtherDevice(); int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) {
// find other cpu device and reorder output to cpu device reshapeInput(bs, ih, iw);
int cnt = 0;
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
if (outputOtherDevice_[i].deviceId == CPU_DEVICE) {
// fc cpu output value do not need convert
// just share point
outputOtherDevice_[i].value = output_.value;
++cnt;
}
}
if (cnt > 1) {
LOG(WARNING) << "should not have more than one CPU devie";
}
}
void MKLDNNFcLayer::reshape() {
const Argument& input = getInput(0, getPrev(0)->getDeviceId());
int batchSize = input.getBatchSize();
if (bs_ == batchSize) {
return;
}
bs_ = batchSize;
ih_ = input.getFrameHeight();
iw_ = input.getFrameWidth();
if (ih_ == 0) {
ih_ = 1;
}
if (iw_ == 0) {
iw_ = 1;
}
CHECK_EQ(iLayerSize_, inputLayers_[0]->getSize()); CHECK_EQ(iLayerSize_, inputLayers_[0]->getSize());
ic_ = iLayerSize_ / (ih_ * iw_); ic = iLayerSize_ / (ih * iw);
CHECK_EQ(size_t(ic_ * ih_ * iw_), iLayerSize_) << "not divisible"; CHECK_EQ(size_t(ic * ih * iw), iLayerSize_) << "not divisible";
CHECK_EQ(size_t(oc_), getSize()); CHECK_EQ(size_t(oc), getSize());
printSizeInfo();
// reset output reshapeOutput(oh, ow);
output_.setFrameHeight(oh_); resizeOutput(bs, oc);
output_.setFrameWidth(ow_);
resetOutput(bs_, oc_);
// reset mkldnn forward printSizeInfo();
resetFwd();
needResetBwd_ = true;
convertWeightsFromPaddle();
} }
void MKLDNNFcLayer::resetFwd() { void MKLDNNFcLayer::resetFwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) {
pipeline.clear();
bool hasBias = biases_ && biases_->getW(); bool hasBias = biases_ && biases_->getW();
const MatrixPtr& wgt = weight_->getW(); const MatrixPtr& wgtVal = weight_->getW();
const MatrixPtr& bias = hasBias ? biases_->getW() : nullptr; const MatrixPtr& biasVal = hasBias ? biases_->getW() : nullptr;
const MatrixPtr& out = output_.value; const MatrixPtr& outVal = output_.value;
if (inputIsOnlyMKLDNN()) { if (inputIsOnlyMKLDNN()) {
const MatrixPtr& in = getInputValue(0); const MatrixPtr& inVal = getInputValue(0);
inVal_ = std::dynamic_pointer_cast<MKLDNNMatrix>(in); in = std::dynamic_pointer_cast<MKLDNNMatrix>(inVal);
CHECK(inVal_) << "Input should be MKLDNNMatrix"; CHECK(in) << "Input should be MKLDNNMatrix";
} else { } else {
CHECK_EQ(getPrev(0)->getDeviceId(), CPU_DEVICE) << "Only support CPU yet"; CHECK_EQ(getPrev(0)->getDeviceId(), CPU_DEVICE) << "Only support CPU yet";
const MatrixPtr& in = getInputValue(0, CPU_DEVICE); const MatrixPtr& inVal = getInputValue(0, CPU_DEVICE);
inVal_ = MKLDNNMatrix::create( in = MKLDNNMatrix::create(
in, memory::dims{bs_, ic_, ih_, iw_}, format::nchw, engine_); inVal, memory::dims{bs_, ic_, ih_, iw_}, format::nchw, engine_);
} }
inVal_->downSpatial(); in->downSpatial();
wgtVal_ = MKLDNNMatrix::create( wgt = MKLDNNMatrix::create(
wgt, memory::dims{oc_, ic_, ih_, iw_}, format::oihw, engine_); wgtVal, memory::dims{oc_, ic_, ih_, iw_}, format::oihw, engine_);
wgtVal_->downSpatial(); wgt->downSpatial();
biasVal_ = bias = hasBias ? MKLDNNMatrix::create(biasVal, {oc_}, format::x, engine_)
hasBias ? MKLDNNMatrix::create(bias, {oc_}, format::x, engine_) : nullptr; : nullptr;
outVal_ = MKLDNNMatrix::create(out, {bs_, oc_}, format::nc, engine_); out = MKLDNNMatrix::create(outVal, {bs_, oc_}, format::nc, engine_);
// change original output value to mkldnn output value // change original output value to mkldnn output value
output_.value = std::dynamic_pointer_cast<Matrix>(outVal_); output_.value = std::dynamic_pointer_cast<Matrix>(out);
if (!outputIsOnlyMKLDNN()) { if (!outputIsOnlyMKLDNN()) {
convertOutputToOtherDevice(); // fc cpu output value do not need create convert
// just share point
getOutput(CPU_DEVICE).value->setData(output_.value->getData());
} }
// create forward handle // create forward handle
prop_kind pk = prop_kind::forward; prop_kind pk = prop_kind::forward;
fc_fwd::desc fwdDesc = hasBias ? fc_fwd::desc(pk, fc_fwd::desc fwdDesc = hasBias ? fc_fwd::desc(pk,
inVal_->getMemoryDesc(), in->getMemoryDesc(),
wgtVal_->getMemoryDesc(), wgt->getMemoryDesc(),
biasVal_->getMemoryDesc(), bias->getMemoryDesc(),
outVal_->getMemoryDesc()) out->getMemoryDesc())
: fc_fwd::desc(pk, : fc_fwd::desc(pk,
inVal_->getMemoryDesc(), in->getMemoryDesc(),
wgtVal_->getMemoryDesc(), wgt->getMemoryDesc(),
outVal_->getMemoryDesc()); out->getMemoryDesc());
fc_fwd::primitive_desc fwdPD = fc_fwd::primitive_desc(fwdDesc, engine_); fc_fwd::primitive_desc fwdPD = fc_fwd::primitive_desc(fwdDesc, engine_);
if (hasBias) { if (hasBias) {
fwd_.reset(new fc_fwd(fwdPD, *inVal_, *wgtVal_, *biasVal_, *outVal_)); fwd_.reset(new fc_fwd(fwdPD, *in, *wgt, *bias, *out));
} else { } else {
fwd_.reset(new fc_fwd(fwdPD, *inVal_, *wgtVal_, *outVal_)); fwd_.reset(new fc_fwd(fwdPD, *in, *wgt, *out));
} }
printValueFormatFlow(); printValueFormatFlow();
pipelineFwd_.clear(); pipeline.push_back(*fwd_);
pipelineFwd_.push_back(*fwd_);
} }
void MKLDNNFcLayer::resetBwd() { void MKLDNNFcLayer::resetBwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) {
pipeline.clear();
if (!needResetBwd_) { if (!needResetBwd_) {
return; return;
} }
...@@ -190,8 +166,8 @@ void MKLDNNFcLayer::resetBwd() { ...@@ -190,8 +166,8 @@ void MKLDNNFcLayer::resetBwd() {
/// backward weight /// backward weight
CHECK(inVal_) << "Should have input value"; CHECK(inVal_) << "Should have input value";
const MatrixPtr& wgt = weight_->getWGrad(); const MatrixPtr& wgtGrad = weight_->getWGrad();
const MatrixPtr& bias = hasBias ? biases_->getWGrad() : nullptr; const MatrixPtr& biasGrad = hasBias ? biases_->getWGrad() : nullptr;
// TODO(TJ): merge outgrad // TODO(TJ): merge outgrad
int device = outputIsOnlyMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE; int device = outputIsOnlyMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE;
...@@ -202,101 +178,66 @@ void MKLDNNFcLayer::resetBwd() { ...@@ -202,101 +178,66 @@ void MKLDNNFcLayer::resetBwd() {
// for CPU device: // for CPU device:
// fc do not need to convert from cpu device since output is always nc format // fc do not need to convert from cpu device since output is always nc format
// only need create from cpu device // only need create from cpu device
const MatrixPtr& out = getOutput(device).grad; const MatrixPtr& outGrad = getOutput(device).grad;
outGrad_ = MKLDNNMatrix::create(out, outVal_->getPrimitiveDesc()); out = MKLDNNMatrix::create(outGrad, outVal_->getPrimitiveDesc());
wgtGrad_ = MKLDNNMatrix::create(wgt, wgtVal_->getPrimitiveDesc()); wgt = MKLDNNMatrix::create(wgtGrad, wgtVal_->getPrimitiveDesc());
biasGrad_ = hasBias ? MKLDNNMatrix::create(bias, biasVal_->getPrimitiveDesc()) bias = hasBias ? MKLDNNMatrix::create(biasGrad, biasVal_->getPrimitiveDesc())
: nullptr; : nullptr;
// create memory primitive desc // create memory primitive desc
fc_fwd::desc fwdDesc = fc_fwd::desc(prop_kind::forward, fc_fwd::desc fwdDesc = fc_fwd::desc(prop_kind::forward,
inVal_->getMemoryDesc(), inVal_->getMemoryDesc(),
wgtGrad_->getMemoryDesc(), wgt->getMemoryDesc(),
outGrad_->getMemoryDesc()); out->getMemoryDesc());
fc_fwd::primitive_desc fwdPD = fc_fwd::primitive_desc(fwdDesc, engine_); fc_fwd::primitive_desc fwdPD = fc_fwd::primitive_desc(fwdDesc, engine_);
fc_bwdWgt::desc bwdWgtDesc = hasBias fc_bwdWgt::desc bwdWgtDesc = hasBias
? fc_bwdWgt::desc(inVal_->getMemoryDesc(), ? fc_bwdWgt::desc(inVal_->getMemoryDesc(),
wgtGrad_->getMemoryDesc(), wgt->getMemoryDesc(),
biasGrad_->getMemoryDesc(), bias->getMemoryDesc(),
outGrad_->getMemoryDesc()) out->getMemoryDesc())
: fc_bwdWgt::desc(inVal_->getMemoryDesc(), : fc_bwdWgt::desc(inVal_->getMemoryDesc(),
wgtGrad_->getMemoryDesc(), wgt->getMemoryDesc(),
outGrad_->getMemoryDesc()); out->getMemoryDesc());
fc_bwdWgt::primitive_desc bwdWgtPD = fc_bwdWgt::primitive_desc bwdWgtPD =
fc_bwdWgt::primitive_desc(bwdWgtDesc, engine_, fwdPD); fc_bwdWgt::primitive_desc(bwdWgtDesc, engine_, fwdPD);
if (hasBias) { if (hasBias) {
bwdWgt_.reset( bwdWgt_.reset(new fc_bwdWgt(bwdWgtPD, *inVal_, *out, *wgt, *bias));
new fc_bwdWgt(bwdWgtPD, *inVal_, *outGrad_, *wgtGrad_, *biasGrad_));
} else { } else {
bwdWgt_.reset(new fc_bwdWgt(bwdWgtPD, *inVal_, *outGrad_, *wgtGrad_)); bwdWgt_.reset(new fc_bwdWgt(bwdWgtPD, *inVal_, *out, *wgt));
} }
pipelineBwd_.clear(); pipeline.push_back(*bwdWgt_);
pipelineBwd_.push_back(*bwdWgt_);
/// backward data /// backward data
device = inputIsOnlyMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE; const MatrixPtr& inGrad = inputLayers_[0]->getOutput().grad;
const MatrixPtr& in = getInputGrad(0, device); if (inGrad == nullptr) {
if (in == nullptr) {
return; return;
} }
if (getInput(0, device).getAllCount() > 1) { if (getInput(0, MKLDNN_DEVICE).getAllCount() > 1) {
// TODO(TJ): use outputMaps_ ways when merge outgrad done // TODO(TJ): use outputMaps_ ways to get the inGrad_ when merge outgrad done
} else { } else {
inGrad_ = MKLDNNMatrix::create(in, inVal_->getPrimitiveDesc()); in = MKLDNNMatrix::create(inGrad, inVal_->getPrimitiveDesc());
} }
fc_bwdData::desc bwdDataDesc = fc_bwdData::desc(inVal_->getMemoryDesc(), fc_bwdData::desc bwdDataDesc = fc_bwdData::desc(
wgtGrad_->getMemoryDesc(), inVal_->getMemoryDesc(), wgt->getMemoryDesc(), out->getMemoryDesc());
outGrad_->getMemoryDesc());
fc_bwdData::primitive_desc bwdDataPD = fc_bwdData::primitive_desc bwdDataPD =
fc_bwdData::primitive_desc(bwdDataDesc, engine_, fwdPD); fc_bwdData::primitive_desc(bwdDataDesc, engine_, fwdPD);
CHECK(wgtVal_) << "Should have weight memory"; CHECK(wgtVal_) << "Should have weight memory";
bwdData_.reset(new fc_bwdData(bwdDataPD, *outGrad_, *wgtVal_, *inGrad_)); bwdData_.reset(new fc_bwdData(bwdDataPD, *out, *wgtVal_, *in));
printGradFormatFlow(); printGradFormatFlow();
pipelineBwd_.push_back(*bwdData_); pipeline.push_back(*bwdData_);
} }
void MKLDNNFcLayer::forward(PassType passType) { void MKLDNNFcLayer::updateInputData() {
Layer::forward(passType); inVal_->setData(getInputValue(0, CPU_DEVICE)->getData());
reshape();
{
REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
syncInputValue();
// just submit forward pipeline
stream_->submit(pipelineFwd_);
}
/* activation */ {
REGISTER_TIMER_INFO("FwActTimer", getName().c_str());
forwardActivation();
}
} }
void MKLDNNFcLayer::backward(const UpdateCallback& callback) { void MKLDNNFcLayer::updateWeights(const UpdateCallback& callback) {
/* Do derivation */ { weight_->getParameterPtr()->incUpdate(callback);
REGISTER_TIMER_INFO("BpActTimer", getName().c_str()); if (biases_ && biases_->getWGrad()) {
backwardActivation(); biases_->getParameterPtr()->incUpdate(callback);
}
{
REGISTER_TIMER_INFO("mkldnn_bwdTimer", getName().c_str());
resetBwd();
syncOutputGrad();
// just sumbmit backward pipeline
stream_->submit(pipelineBwd_);
}
{
REGISTER_TIMER_INFO("WeightUpdate", getName().c_str());
weight_->getParameterPtr()->incUpdate(callback);
if (biases_ && biases_->getWGrad()) {
biases_->getParameterPtr()->incUpdate(callback);
}
} }
} }
} // namespace paddle } // namespace paddle
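Note: the bwdWgt and bwdData primitives built above compute the standard fully connected gradients. A minimal standalone sketch of that math (plain loops, illustrative only — not the MKL-DNN API used in this diff):

```
#include <cstdio>
#include <vector>

// Gradients of a fully connected layer y = x * W^T + b
//   x : [N, IC]   W : [OC, IC]   b : [OC]   dy : [N, OC]
//   dW = dy^T * x,  db = sum_rows(dy),  dx = dy * W
void fcBackward(const std::vector<float>& x, const std::vector<float>& W,
                const std::vector<float>& dy, int N, int IC, int OC,
                std::vector<float>& dW, std::vector<float>& db,
                std::vector<float>& dx) {
  dW.assign(OC * IC, 0.f);
  db.assign(OC, 0.f);
  dx.assign(N * IC, 0.f);
  for (int n = 0; n < N; ++n) {
    for (int o = 0; o < OC; ++o) {
      const float g = dy[n * OC + o];
      db[o] += g;
      for (int i = 0; i < IC; ++i) {
        dW[o * IC + i] += g * x[n * IC + i];  // accumulated over the batch
        dx[n * IC + i] += g * W[o * IC + i];
      }
    }
  }
}

int main() {
  // one sample, 2 inputs, 1 output: y = 1*3 + 2*4 = 11
  std::vector<float> x{1, 2}, W{3, 4}, dy{1}, dW, db, dx;
  fcBackward(x, W, dy, 1, 2, 1, dW, db, dx);
  printf("dW = [%g %g], db = [%g], dx = [%g %g]\n",
         dW[0], dW[1], db[0], dx[0], dx[1]);
  // expected: dW = [1 2], db = [1], dx = [3 4]
  return 0;
}
```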
...@@ -45,35 +45,28 @@ public: ...@@ -45,35 +45,28 @@ public:
bool init(const LayerMap& layerMap, bool init(const LayerMap& layerMap,
const ParameterMap& parameterMap) override; const ParameterMap& parameterMap) override;
void convertWeightsFromPaddle() override; void reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) override;
void convertWeightsToPaddle() override; void resetFwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override;
void forward(PassType passType) override; void resetBwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override;
void backward(const UpdateCallback& callback) override; void updateInputData() override;
protected: void updateWeights(const UpdateCallback& callback) override;
/**
* reshape the input image sizes void convertWeightsFromPaddle() override;
* and reset output buffer size
* and reset mkldnn forward void convertWeightsToPaddle() override;
*/
void reshape();
/**
* reset the forward primitve and memory
* only would be called when input size changes
*/
void resetFwd();
/**
* reset the backward primitve and memory for mkldnn fc
* only would be called when needed
*/
void resetBwd();
void convertOutputToOtherDevice() override;
}; };
} // namespace paddle } // namespace paddle
...@@ -19,6 +19,7 @@ limitations under the License. */ ...@@ -19,6 +19,7 @@ limitations under the License. */
#include "MKLDNNBase.h" #include "MKLDNNBase.h"
#include "mkldnn.hpp" #include "mkldnn.hpp"
#include "paddle/math/MKLDNNMatrix.h" #include "paddle/math/MKLDNNMatrix.h"
#include "paddle/utils/Stat.h"
DECLARE_bool(use_mkldnn); DECLARE_bool(use_mkldnn);
...@@ -33,6 +34,8 @@ typedef std::shared_ptr<MKLDNNLayer> MKLDNNLayerPtr; ...@@ -33,6 +34,8 @@ typedef std::shared_ptr<MKLDNNLayer> MKLDNNLayerPtr;
*/ */
class MKLDNNLayer : public Layer { class MKLDNNLayer : public Layer {
protected: protected:
// input value element count
size_t inputElemenCnt_;
// batch size // batch size
int bs_; int bs_;
// input image channel, height and width // input image channel, height and width
...@@ -52,7 +55,7 @@ protected: ...@@ -52,7 +55,7 @@ protected:
std::vector<mkldnn::primitive> pipelineFwd_; std::vector<mkldnn::primitive> pipelineFwd_;
std::vector<mkldnn::primitive> pipelineBwd_; std::vector<mkldnn::primitive> pipelineBwd_;
// MKLDNNMatrixPtr // MKLDNNMatrixPtr with internal format
MKLDNNMatrixPtr inVal_; MKLDNNMatrixPtr inVal_;
MKLDNNMatrixPtr inGrad_; MKLDNNMatrixPtr inGrad_;
MKLDNNMatrixPtr outVal_; MKLDNNMatrixPtr outVal_;
...@@ -65,6 +68,7 @@ protected: ...@@ -65,6 +68,7 @@ protected:
public: public:
explicit MKLDNNLayer(const LayerConfig& config) explicit MKLDNNLayer(const LayerConfig& config)
: Layer(config), : Layer(config),
inputElemenCnt_(0),
bs_(0), bs_(0),
ic_(0), ic_(0),
ih_(0), ih_(0),
...@@ -95,12 +99,104 @@ public: ...@@ -95,12 +99,104 @@ public:
if (!Layer::init(layerMap, parameterMap)) { if (!Layer::init(layerMap, parameterMap)) {
return false; return false;
} }
checkCPUOutputsNumber();
stream_.reset(new MKLDNNStream()); stream_.reset(new MKLDNNStream());
engine_ = CPUEngine::Instance().getEngine(); engine_ = CPUEngine::Instance().getEngine();
return true; return true;
} }
void forward(PassType passType) override {
passType_ = passType;
{
REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
CHECK(!inputLayers_.empty());
copySeqInfoToOutputs();
size_t elemenCnt = inputLayers_[0]->getOutput().value->getElementCnt();
if (inputElemenCnt_ != elemenCnt) {
// reset when input total sizes changed, not only the batchsize
inputElemenCnt_ = elemenCnt;
reshape(bs_, ic_, ih_, iw_, oc_, oh_, ow_);
resetFwd(pipelineFwd_, inVal_, wgtVal_, biasVal_, outVal_);
convertWeightsFromPaddle();
needResetBwd_ = true;
}
if (inputLayers_[0]->getType() == "data") {
updateInputData();
}
stream_->submit(pipelineFwd_);
}
/* activation */ {
REGISTER_TIMER_INFO("FwActTimer", getName().c_str());
forwardActivation();
}
}
void backward(const UpdateCallback& callback) override {
/* Do derivation */ {
REGISTER_TIMER_INFO("BpActTimer", getName().c_str());
backwardActivation();
}
{
REGISTER_TIMER_INFO("mkldnn_bwdTimer", getName().c_str());
if (needResetBwd_) {
resetBwd(pipelineBwd_, inGrad_, wgtGrad_, biasGrad_, outGrad_);
needResetBwd_ = false;
}
stream_->submit(pipelineBwd_);
}
{
REGISTER_TIMER_INFO("WeightUpdate", getName().c_str());
updateWeights(callback);
}
}
/**
* reshape the input image sizes
* and reset output image and buffer size
* output channel can not be changed
*/
virtual void reshape(
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) = 0;
/**
* reset the mkldnn forward primitive and memory
* only called when the input size changes
*/
virtual void resetFwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) = 0;
/**
* reset the mkldnn backward primitive and memory for mkldnn fc
* only called when needed
*/
virtual void resetBwd(std::vector<mkldnn::primitive>& pipeline,
MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) = 0;
/**
* Update the input value data when the input layer is of "data" type,
* since the input data address might have changed.
*/
virtual void updateInputData() {}
/**
* Update weights and biases if necessary.
*/
virtual void updateWeights(const UpdateCallback& callback) {}
/** /**
* convert weight from paddle format to mkldnn format * convert weight from paddle format to mkldnn format
* weight_ will be overridden * weight_ will be overridden
...@@ -114,10 +210,38 @@ public: ...@@ -114,10 +210,38 @@ public:
virtual void convertWeightsToPaddle() {} virtual void convertWeightsToPaddle() {}
/** /**
* convert MKLDNN output to other device. * add this interface as public for unit test
* only support CPU device yet */
void addOutputArgument(int deviceId) { Layer::addOutputArgument(deviceId); }
protected:
/**
* reshape the input image sizes and input batchsize
*/ */
virtual void convertOutputToOtherDevice() {} virtual void reshapeInput(int& batchsize, int& height, int& width) {
const Argument& input = inputLayers_[0]->getOutput();
batchsize = input.getBatchSize();
int h = input.getFrameHeight();
int w = input.getFrameWidth();
if (h != 0) {
height = h;
}
if (w != 0) {
width = w;
}
}
/**
* reshape output image sizes
*/
virtual void reshapeOutput(size_t height, size_t width) {
output_.setFrameHeight(height);
output_.setFrameWidth(width);
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].setFrameHeight(height);
outputOtherDevice_[i].setFrameWidth(width);
}
}
/** /**
* print info about sizes * print info about sizes
...@@ -133,8 +257,8 @@ public: ...@@ -133,8 +257,8 @@ public:
*/ */
virtual void printValueFormatFlow() { virtual void printValueFormatFlow() {
if (inVal_ && outVal_) { if (inVal_ && outVal_) {
VLOG(MKLDNN_FMTS) << "value format flow --- " << inVal_->getFormat() VLOG(MKLDNN_FMTS) << inVal_->getFormat() << " >>> "
<< " >>> " << outVal_->getFormat(); << outVal_->getFormat();
} }
} }
...@@ -143,29 +267,12 @@ public: ...@@ -143,29 +267,12 @@ public:
*/ */
virtual void printGradFormatFlow() { virtual void printGradFormatFlow() {
if (inGrad_ && outGrad_) { if (inGrad_ && outGrad_) {
VLOG(MKLDNN_FMTS) << "grad format flow --- " << inGrad_->getFormat() VLOG(MKLDNN_FMTS) << inGrad_->getFormat() << " <<< "
<< " <<< " << outGrad_->getFormat(); << outGrad_->getFormat();
} }
} }
protected: protected:
/**
* copy image size and sequence info to other device
* @note: can not directly use Layer::copyOutputToOtherDevice since here only
* copy base info and do not copy data value
*/
void copyOutputInfoToOtherDevice() {
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].setFrameHeight(output_.getFrameHeight());
outputOtherDevice_[i].setFrameWidth(output_.getFrameWidth());
outputOtherDevice_[i].sequenceStartPositions =
output_.sequenceStartPositions;
outputOtherDevice_[i].subSequenceStartPositions =
output_.subSequenceStartPositions;
outputOtherDevice_[i].cpuSequenceDims = output_.cpuSequenceDims;
}
}
/** /**
* If input only has MKLDNN device. * If input only has MKLDNN device.
* Otherwise, only support the previous layer using CPU device. * Otherwise, only support the previous layer using CPU device.
...@@ -193,37 +300,12 @@ protected: ...@@ -193,37 +300,12 @@ protected:
return outputOtherDevice_.size() == 0; return outputOtherDevice_.size() == 0;
} }
/**
* Sync input value data
*/
void syncInputValue() {
if (inputIsOnlyMKLDNN()) {
return;
}
real* iData = getInputValue(0, CPU_DEVICE)->getData();
// update input data
// since it might be changed if this is after data layer
inVal_->updateData(iData);
}
/**
* Sync output grad data
*/
void syncOutputGrad() {
if (outputIsOnlyMKLDNN()) {
return;
}
// update diff
real* oDiff = getOutput(CPU_DEVICE).grad->getData();
outGrad_->updateData(oDiff);
}
/** /**
* Set deviceId of this layer. * Set deviceId of this layer.
*/ */
void setDevice(int id) { deviceId_ = id; } void setDevice(int id) { deviceId_ = id; }
private:
/** /**
* Set deviceId of the params used in this layer. * Set deviceId of the params used in this layer.
*/ */
...@@ -247,6 +329,42 @@ protected: ...@@ -247,6 +329,42 @@ protected:
parameter->setDevice(id); parameter->setDevice(id);
} }
} }
/**
* Check the number of CPU device outputs in outputOtherDevice_;
* there should be at most one.
*/
void checkCPUOutputsNumber(int max = 1) {
int cnt = 0;
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
if (outputOtherDevice_[i].deviceId == CPU_DEVICE) {
++cnt;
}
}
CHECK_LE(cnt, max) << "too many CPU devices";
}
/**
* copy SeqInfo from input layer to this output and other output devices.
* @note: do not use getInput(0) since it used this deviceId_,
* use "inputLayers_[0]->getOutput()" instead.
*/
void copySeqInfoToOutputs() {
if (inputLayers_.empty() || !needSequenceInfo_) {
return;
}
const Argument& input = inputLayers_[0]->getOutput();
output_.sequenceStartPositions = input.sequenceStartPositions;
output_.subSequenceStartPositions = input.subSequenceStartPositions;
output_.cpuSequenceDims = input.cpuSequenceDims;
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].sequenceStartPositions =
output_.sequenceStartPositions;
outputOtherDevice_[i].subSequenceStartPositions =
output_.subSequenceStartPositions;
outputOtherDevice_[i].cpuSequenceDims = output_.cpuSequenceDims;
}
}
}; };
} // namespace paddle } // namespace paddle
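The forward()/backward() pair above rebuilds its MKL-DNN pipelines only when the total input element count changes, and defers the backward reset until it is actually needed. A simplified sketch of that caching pattern, with hypothetical names and no MKL-DNN dependency:

```
#include <cstdio>
#include <vector>

// Rebuild an expensive "pipeline" only when the input size changes,
// mirroring the inputElemenCnt_ / needResetBwd_ bookkeeping above.
class CachedPipeline {
 public:
  void forward(const std::vector<float>& input) {
    if (input.size() != cachedElemCnt_) {  // reset on any total-size change,
      cachedElemCnt_ = input.size();       // not only on a batch-size change
      fwdSteps_.assign(3, 0);              // stand-in for rebuilding primitives
      needResetBwd_ = true;
    }
    printf("forward: %zu cached steps\n", fwdSteps_.size());
  }
  void backward() {
    if (needResetBwd_) {                   // deferred, like resetBwd() above
      bwdSteps_.assign(2, 0);
      needResetBwd_ = false;
      printf("backward pipeline rebuilt\n");
    }
    printf("backward: %zu cached steps\n", bwdSteps_.size());
  }

 private:
  size_t cachedElemCnt_ = 0;
  bool needResetBwd_ = false;
  std::vector<int> fwdSteps_, bwdSteps_;
};

int main() {
  CachedPipeline layer;
  std::vector<float> a(8), b(8), c(16);
  layer.forward(a);  // first call: rebuilds
  layer.forward(b);  // same element count: reuses the cached pipeline
  layer.backward();
  layer.forward(c);  // size changed: rebuilds and marks backward dirty
  layer.backward();
  return 0;
}
```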
...@@ -24,19 +24,21 @@ bool SwitchOrderLayer::init(const LayerMap& layerMap, ...@@ -24,19 +24,21 @@ bool SwitchOrderLayer::init(const LayerMap& layerMap,
/* Initialize the basic parent class */ /* Initialize the basic parent class */
Layer::init(layerMap, parameterMap); Layer::init(layerMap, parameterMap);
auto& img_conf = config_.inputs(0).image_conf(); auto& img_conf = config_.inputs(0).image_conf();
size_t inD = img_conf.img_size_z();
size_t inH = size_t inH =
img_conf.has_img_size_y() ? img_conf.img_size_y() : img_conf.img_size(); img_conf.has_img_size_y() ? img_conf.img_size_y() : img_conf.img_size();
size_t inW = img_conf.img_size(); size_t inW = img_conf.img_size();
size_t inC = img_conf.channels(); size_t inC = img_conf.channels();
inH = inH * inD;
inDims_ = TensorShape({0, inC, inH, inW}); inDims_ = TensorShape({0, inC, inH, inW});
outDims_ = TensorShape(4); outDims_ = TensorShape(4);
auto& reshape_conf = config_.reshape_conf(); auto& reshape_conf = config_.reshape_conf();
for (size_t i = 0; i < reshape_conf.heightaxis_size(); i++) { for (int i = 0; i < reshape_conf.height_axis_size(); i++) {
heightAxis_.push_back(reshape_conf.heightaxis(i)); heightAxis_.push_back(reshape_conf.height_axis(i));
} }
for (size_t i = 0; i < reshape_conf.widthaxis_size(); i++) { for (int i = 0; i < reshape_conf.width_axis_size(); i++) {
widthAxis_.push_back(reshape_conf.widthaxis(i)); widthAxis_.push_back(reshape_conf.width_axis(i));
} }
createFunction(nchw2nhwc_, "NCHW2NHWC", FuncConfig()); createFunction(nchw2nhwc_, "NCHW2NHWC", FuncConfig());
createFunction(nhwc2nchw_, "NHWC2NCHW", FuncConfig()); createFunction(nhwc2nchw_, "NHWC2NCHW", FuncConfig());
...@@ -64,9 +66,10 @@ void SwitchOrderLayer::setInDims() { ...@@ -64,9 +66,10 @@ void SwitchOrderLayer::setInDims() {
MatrixPtr input = inputLayers_[0]->getOutputValue(); MatrixPtr input = inputLayers_[0]->getOutputValue();
size_t batchSize = input->getHeight(); size_t batchSize = input->getHeight();
inDims_.setDim(0, batchSize); inDims_.setDim(0, batchSize);
int d = inputLayers_[0]->getOutput().getFrameDepth();
d = (d == 0 ? 1 : d);
int h = inputLayers_[0]->getOutput().getFrameHeight(); int h = inputLayers_[0]->getOutput().getFrameHeight();
if (h != 0) inDims_.setDim(2, h); if (h != 0) inDims_.setDim(2, h * d);
int w = inputLayers_[0]->getOutput().getFrameWidth(); int w = inputLayers_[0]->getOutput().getFrameWidth();
if (w != 0) inDims_.setDim(3, w); if (w != 0) inDims_.setDim(3, w);
int totalCount = input->getElementCnt(); int totalCount = input->getElementCnt();
...@@ -80,8 +83,7 @@ void SwitchOrderLayer::forward(PassType passType) { ...@@ -80,8 +83,7 @@ void SwitchOrderLayer::forward(PassType passType) {
setOutDims(); setOutDims();
resetOutput(outDims_[0], outDims_[1] * outDims_[2] * outDims_[3]); resetOutput(outDims_[0], outDims_[1] * outDims_[2] * outDims_[3]);
if (heightAxis_.size() > 0) { if (heightAxis_.size() > 0) {
getOutputValue()->reshape(reshapeHeight_, reshapeWidth_); resetOutput(reshapeHeight_, reshapeWidth_);
getOutputGrad()->reshape(reshapeHeight_, reshapeWidth_);
} }
// switch NCHW to NHWC // switch NCHW to NHWC
......
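SwitchOrderLayer delegates the actual reorder to the registered "NCHW2NHWC" function; a self-contained version of that index permutation (just the underlying layout math, not the Function implementation used here):

```
#include <cstdio>
#include <vector>

// Reorder a dense tensor from NCHW to NHWC layout.
std::vector<float> nchwToNhwc(const std::vector<float>& in,
                              int n, int c, int h, int w) {
  std::vector<float> out(in.size());
  for (int ni = 0; ni < n; ++ni)
    for (int ci = 0; ci < c; ++ci)
      for (int hi = 0; hi < h; ++hi)
        for (int wi = 0; wi < w; ++wi) {
          int src = ((ni * c + ci) * h + hi) * w + wi;
          int dst = ((ni * h + hi) * w + wi) * c + ci;
          out[dst] = in[src];
        }
  return out;
}

int main() {
  // 1 x 2 x 2 x 2 tensor: channel 0 holds 0..3, channel 1 holds 4..7
  std::vector<float> in{0, 1, 2, 3, 4, 5, 6, 7};
  std::vector<float> out = nchwToNhwc(in, 1, 2, 2, 2);
  for (float v : out) printf("%g ", v);  // 0 4 1 5 2 6 3 7
  printf("\n");
  return 0;
}
```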
...@@ -63,8 +63,12 @@ void MKLDNNTester::reset(const TestConfig& dnn, ...@@ -63,8 +63,12 @@ void MKLDNNTester::reset(const TestConfig& dnn,
initTestLayer( initTestLayer(
configs_[i], &(layerMaps_[i]), &(parameters_[i]), &(testLayers_[i])); configs_[i], &(layerMaps_[i]), &(parameters_[i]), &(testLayers_[i]));
} }
dnnLayer_ = testLayers_[DNN];
refLayer_ = testLayers_[REF]; refLayer_ = testLayers_[REF];
dnnLayer_ = std::dynamic_pointer_cast<MKLDNNLayer>(testLayers_[DNN]);
CHECK(dnnLayer_);
// for comparison with Paddle reference results,
// need manually add cpu device output for test
dnnLayer_->addOutputArgument(CPU_DEVICE);
EXPECT_EQ(dataLayers_[DNN].size(), dataLayers_[REF].size()); EXPECT_EQ(dataLayers_[DNN].size(), dataLayers_[REF].size());
EXPECT_EQ(parameters_[DNN].size(), parameters_[REF].size()); EXPECT_EQ(parameters_[DNN].size(), parameters_[REF].size());
...@@ -109,20 +113,22 @@ void MKLDNNTester::randomBotDatas() { ...@@ -109,20 +113,22 @@ void MKLDNNTester::randomBotDatas() {
void MKLDNNTester::randomTopDiffs() { void MKLDNNTester::randomTopDiffs() {
refLayer_->getOutputGrad()->randomizeUniform(); refLayer_->getOutputGrad()->randomizeUniform();
dnnLayer_->getOutputGrad()->copyFrom(*(refLayer_->getOutputGrad())); dnnLayer_->getOutput(CPU_DEVICE)
VLOG(lvl_) << "Random dom Backward Input, TopDiff: "; .grad->copyFrom(*(refLayer_->getOutputGrad()));
VLOG(lvl_) << "Random Backward Input, TopDiff: ";
printMatrix(refLayer_->getOutputGrad()); printMatrix(refLayer_->getOutputGrad());
} }
void MKLDNNTester::checkForward() { void MKLDNNTester::checkForward() {
printTopDatas();
double delta = compareMatrix(testLayers_[DNN]->getOutputValue(),
testLayers_[REF]->getOutputValue());
VLOG(MKLDNN_ALL) << "Check Forward"; VLOG(MKLDNN_ALL) << "Check Forward";
printTopDatas();
double delta = compareMatrix(dnnLayer_->getOutput(-1).value,
refLayer_->getOutputValue());
EXPECT_LE(fabs(delta), eps_); EXPECT_LE(fabs(delta), eps_);
} }
void MKLDNNTester::checkBackwardData() { void MKLDNNTester::checkBackwardData() {
VLOG(MKLDNN_ALL) << "Check Backward Data";
// TODO(TJ): uncomment me when batch norm ready // TODO(TJ): uncomment me when batch norm ready
// const bool isBN = dnnLayer_->getType() == "mkldnn_batch_norm"; // const bool isBN = dnnLayer_->getType() == "mkldnn_batch_norm";
for (size_t i = 0; i < dataLayers_[DNN].size(); ++i) { for (size_t i = 0; i < dataLayers_[DNN].size(); ++i) {
...@@ -144,14 +150,12 @@ void MKLDNNTester::checkBackwardData() { ...@@ -144,14 +150,12 @@ void MKLDNNTester::checkBackwardData() {
} }
void MKLDNNTester::checkBackwardWgts() { void MKLDNNTester::checkBackwardWgts() {
VLOG(MKLDNN_ALL) << "Check Backward Weight";
CHECK_EQ(parameters_[DNN].size(), parameters_[REF].size()); CHECK_EQ(parameters_[DNN].size(), parameters_[REF].size());
vector<VectorPtr> dnnWgts; // used to temporarily save mkldnn weights vector<VectorPtr> dnnWgts; // used to temporarily save mkldnn weights
saveWgt(parameters_[DNN], dnnWgts); saveWgt(parameters_[DNN], dnnWgts);
const MKLDNNLayerPtr dnnlayer = dnnLayer_->convertWeightsToPaddle();
std::dynamic_pointer_cast<MKLDNNLayer>(dnnLayer_);
CHECK(dnnlayer);
dnnlayer->convertWeightsToPaddle();
for (size_t i = 0; i < parameters_[DNN].size(); ++i) { for (size_t i = 0; i < parameters_[DNN].size(); ++i) {
const VectorPtr& dnn = parameters_[DNN][i]->getBuf(PARAMETER_VALUE); const VectorPtr& dnn = parameters_[DNN][i]->getBuf(PARAMETER_VALUE);
const VectorPtr& ref = parameters_[REF][i]->getBuf(PARAMETER_VALUE); const VectorPtr& ref = parameters_[REF][i]->getBuf(PARAMETER_VALUE);
...@@ -189,38 +193,38 @@ void MKLDNNTester::restoreWgt(const vector<VectorPtr>& from, ...@@ -189,38 +193,38 @@ void MKLDNNTester::restoreWgt(const vector<VectorPtr>& from,
} }
// clear parameters grad // clear parameters grad
void MKLDNNTester::clearWgtDiffs() { void MKLDNNTester::clearWgtDiffs(size_t id) {
CHECK_LE(id, parameters_.size());
for (size_t n = 0; n < parameters_.size(); ++n) { for (size_t n = 0; n < parameters_.size(); ++n) {
for (size_t i = 0; i < parameters_[n].size(); ++i) { if (id == n || id == parameters_.size()) {
const VectorPtr& grad = parameters_[n][i]->getBuf(PARAMETER_GRADIENT); for (size_t i = 0; i < parameters_[n].size(); ++i) {
if (grad) { const VectorPtr& grad = parameters_[n][i]->getBuf(PARAMETER_GRADIENT);
grad->zeroMem(); if (grad) {
grad->zeroMem();
}
} }
} }
} }
} }
void MKLDNNTester::clearBotDiffs() { void MKLDNNTester::clearBotDiffs(size_t id) {
// dnn and ref CHECK_LE(id, dataLayers_.size());
for (size_t n = 0; n < dataLayers_.size(); ++n) { for (size_t n = 0; n < dataLayers_.size(); ++n) {
// all inputs layers if (id == n || id == dataLayers_.size()) {
for (size_t i = 0; i < dataLayers_[n].size(); ++i) { // clear inputs layers of this specific layer
dataLayers_[n][i]->getOutputGrad()->zeroMem(); for (size_t i = 0; i < dataLayers_[n].size(); ++i) {
dataLayers_[n][i]->getOutputGrad()->zeroMem();
}
} }
} }
} }
void MKLDNNTester::clearBotDiffs(int n) { void MKLDNNTester::clearTopDatas(size_t id) {
CHECK_LT(n, NUM); CHECK_LE(id, testLayers_.size());
// all inputs layers
for (size_t i = 0; i < dataLayers_[n].size(); ++i) {
dataLayers_[n][i]->getOutputGrad()->zeroMem();
}
}
void MKLDNNTester::clearTopDatas() {
for (size_t i = 0; i < testLayers_.size(); ++i) { for (size_t i = 0; i < testLayers_.size(); ++i) {
testLayers_[i]->getOutputValue()->zeroMem(); if (id == i || id == testLayers_.size()) {
testLayers_[i]->getOutputValue()->zeroMem();
}
} }
} }
...@@ -300,16 +304,24 @@ void MKLDNNTester::runOnce() { ...@@ -300,16 +304,24 @@ void MKLDNNTester::runOnce() {
checkForward(); checkForward();
// test backward // test backward
// simple updater
UpdateCallback updateCallback = [](Parameter* para) {
auto& grad = para->getBuf(PARAMETER_GRADIENT);
auto& value = para->getBuf(PARAMETER_VALUE);
real lr = 1e-3;
value->add(*grad, lr);
};
randomTopDiffs(); randomTopDiffs();
dnnLayer_->backward(nullptr); dnnLayer_->backward(updateCallback);
refLayer_->backward(nullptr); refLayer_->backward(updateCallback);
checkBackwardData(); checkBackwardData();
checkBackwardWgts(); checkBackwardWgts();
// clear buffers // clear buffers
// ref code will addto the diff, dnn code will writeto it // ref code will addto the diff, dnn code will writeto it
// and clearTopDatas() and clearWgtDiffs() should be covered by test layers // and clearTopDatas(REF) should be covered by ref layers
clearBotDiffs(REF); clearBotDiffs(REF);
clearWgtDiffs(REF);
} }
void MKLDNNTester::run(const TestConfig& dnn, void MKLDNNTester::run(const TestConfig& dnn,
......
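checkForward() and checkBackwardData() reduce a pair of matrices to a single delta that is compared against eps_. One plausible metric for such a delta — an assumption for illustration, not necessarily the exact compareMatrix() formula — is a relative max-absolute difference:

```
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Relative difference of two buffers: max|a - b| / max(|a|, |b|, tiny).
double compareBuffers(const std::vector<float>& a, const std::vector<float>& b) {
  double maxDiff = 0.0, maxVal = 1e-10;
  for (size_t i = 0; i < a.size() && i < b.size(); ++i) {
    maxDiff = std::max(maxDiff, std::fabs(double(a[i]) - double(b[i])));
    maxVal = std::max({maxVal, std::fabs(double(a[i])), std::fabs(double(b[i]))});
  }
  return maxDiff / maxVal;
}

int main() {
  std::vector<float> dnn{1.0f, 2.0f, 3.0f}, ref{1.0f, 2.0f, 3.0001f};
  printf("delta = %g\n", compareBuffers(dnn, ref));  // small enough for an eps check
  return 0;
}
```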
...@@ -18,6 +18,7 @@ limitations under the License. */ ...@@ -18,6 +18,7 @@ limitations under the License. */
#include <vector> #include <vector>
#include "LayerGradUtil.h" #include "LayerGradUtil.h"
#include "paddle/gserver/layers/MKLDNNBase.h" #include "paddle/gserver/layers/MKLDNNBase.h"
#include "paddle/gserver/layers/MKLDNNLayer.h"
namespace paddle { namespace paddle {
...@@ -40,7 +41,8 @@ protected: ...@@ -40,7 +41,8 @@ protected:
vector<LayerMap> layerMaps_; vector<LayerMap> layerMaps_;
vector<vector<ParameterPtr>> parameters_; vector<vector<ParameterPtr>> parameters_;
vector<LayerPtr> testLayers_; vector<LayerPtr> testLayers_;
LayerPtr dnnLayer_, refLayer_; LayerPtr refLayer_;
MKLDNNLayerPtr dnnLayer_;
/// run some iterations, all the result should pass /// run some iterations, all the result should pass
size_t iter_; size_t iter_;
...@@ -88,10 +90,10 @@ private: ...@@ -88,10 +90,10 @@ private:
void checkBackwardData(); void checkBackwardData();
void checkBackwardWgts(); void checkBackwardWgts();
void clearWgtDiffs(); // clear specific layer, clear all when id equals NUM
void clearBotDiffs(); void clearWgtDiffs(size_t id = NUM);
void clearBotDiffs(int n); // clear specific layer void clearBotDiffs(size_t id = NUM);
void clearTopDatas(); void clearTopDatas(size_t id = NUM);
void printTopDatas(); void printTopDatas();
void printMatrix(const MatrixPtr& m); void printMatrix(const MatrixPtr& m);
......
...@@ -1703,6 +1703,55 @@ TEST(Layer, BatchNormalizationLayer) { ...@@ -1703,6 +1703,55 @@ TEST(Layer, BatchNormalizationLayer) {
#endif #endif
} }
void testBatchNorm3DLayer(const string& type, bool trans, bool useGpu) {
TestConfig config;
const int CHANNELS = 10;
const int IMG_SIZE = 16;
const int IMG_SIZE_Y = 8;
const int IMG_SIZE_Z = 8;
size_t size = CHANNELS * IMG_SIZE * IMG_SIZE_Y * IMG_SIZE_Z;
config.layerConfig.set_type(type);
config.layerConfig.set_size(size);
config.layerConfig.set_active_type("sigmoid");
config.biasSize = CHANNELS;
config.inputDefs.push_back({INPUT_DATA,
"layer_0",
/* dim= */ size,
/* paraSize= */ CHANNELS});
config.inputDefs.push_back({INPUT_DATA, "layer_1_running_mean", 1, CHANNELS});
config.inputDefs.back().isStatic = true;
config.inputDefs.push_back({INPUT_DATA, "layer_2_running_var", 1, CHANNELS});
config.inputDefs.back().isStatic = true;
LayerInputConfig* input = config.layerConfig.add_inputs();
config.layerConfig.add_inputs();
config.layerConfig.add_inputs();
ImageConfig* img_conf = input->mutable_image_conf();
img_conf->set_channels(CHANNELS);
img_conf->set_img_size(IMG_SIZE);
img_conf->set_img_size_y(IMG_SIZE_Y);
img_conf->set_img_size_z(IMG_SIZE_Z);
testLayerGrad(config,
"batch_norm",
64,
/* trans= */ trans,
useGpu,
/* useWeight */ true);
}
TEST(Layer, testBatchNorm3DLayer) {
testBatchNorm3DLayer("batch_norm", false, false);
#ifndef PADDLE_ONLY_CPU
testBatchNorm3DLayer("batch_norm", false, true);
if (hl_get_cudnn_lib_version() >= int(4000)) {
testBatchNorm3DLayer("cudnn_batch_norm", false, true);
}
#endif
}
void testConvOperator(bool isDeconv) { void testConvOperator(bool isDeconv) {
TestConfig config; TestConfig config;
const int NUM_FILTERS = 16; const int NUM_FILTERS = 16;
...@@ -2019,10 +2068,10 @@ TEST(Layer, SwitchOrderLayer) { ...@@ -2019,10 +2068,10 @@ TEST(Layer, SwitchOrderLayer) {
img->set_img_size_y(16); img->set_img_size_y(16);
ReshapeConfig* reshape = config.layerConfig.mutable_reshape_conf(); ReshapeConfig* reshape = config.layerConfig.mutable_reshape_conf();
reshape->add_heightaxis(0); reshape->add_height_axis(0);
reshape->add_heightaxis(1); reshape->add_height_axis(1);
reshape->add_heightaxis(2); reshape->add_height_axis(2);
reshape->add_widthaxis(3); reshape->add_width_axis(3);
// config softmax layer // config softmax layer
config.layerConfig.set_type("switch_order"); config.layerConfig.set_type("switch_order");
...@@ -2253,26 +2302,27 @@ void test3DDeConvLayer(const string& type, bool trans, bool useGpu) { ...@@ -2253,26 +2302,27 @@ void test3DDeConvLayer(const string& type, bool trans, bool useGpu) {
conv->set_stride(2); conv->set_stride(2);
conv->set_stride_y(2); conv->set_stride_y(2);
conv->set_stride_z(2); conv->set_stride_z(2);
conv->set_img_size(IMAGE_SIZE); conv->set_output_x(IMAGE_SIZE);
conv->set_img_size_y(IMAGE_SIZE_Y); conv->set_output_y(IMAGE_SIZE_Y);
conv->set_img_size_z(IMAGE_SIZE_Z); conv->set_output_z(IMAGE_SIZE_Z);
conv->set_output_x(imageSize(conv->img_size(),
conv->set_img_size(imageSize(conv->output_x(),
conv->filter_size(), conv->filter_size(),
conv->padding(), conv->padding(),
conv->stride(), conv->stride(),
true)); true));
conv->set_output_y(imageSize(conv->img_size_y(), conv->set_img_size_y(imageSize(conv->output_y(),
conv->filter_size_y(), conv->filter_size_y(),
conv->padding_y(), conv->padding_y(),
conv->stride_y(), conv->stride_y(),
true)); true));
conv->set_output_z(imageSize(conv->img_size_z(), conv->set_img_size_z(imageSize(conv->output_z(),
conv->filter_size_z(), conv->filter_size_z(),
conv->padding_z(), conv->padding_z(),
conv->stride_z(), conv->stride_z(),
true)); true));
config.layerConfig.set_size(conv->output_x() * conv->output_y() * config.layerConfig.set_size(conv->img_size() * conv->img_size_y() *
conv->output_z() * NUM_FILTERS); conv->img_size_z() * NUM_FILTERS);
conv->set_groups(1); conv->set_groups(1);
conv->set_filter_channels(conv->channels() / conv->groups()); conv->set_filter_channels(conv->channels() / conv->groups());
config.inputDefs.push_back( config.inputDefs.push_back(
......
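The 3-D deconv test now configures output_x/y/z directly and recovers img_size through imageSize(). The relation it relies on is the usual convolution size formula and its inverse; a quick standalone check (assuming the caffe-style rounding that the `true` flag suggests):

```
#include <cstdio>

// Standard convolution size relation used by the test:
//   output = (image - filter + 2 * padding) / stride + 1
// and its inverse, used to derive the deconv image size from output_x:
//   image  = (output - 1) * stride + filter - 2 * padding
int convOutputSize(int image, int filter, int padding, int stride) {
  return (image - filter + 2 * padding) / stride + 1;
}
int deconvImageSize(int output, int filter, int padding, int stride) {
  return (output - 1) * stride + filter - 2 * padding;
}

int main() {
  int output = 9, filter = 2, padding = 0, stride = 2;
  int image = deconvImageSize(output, filter, padding, stride);    // 18
  printf("image = %d, round-trip output = %d\n", image,
         convOutputSize(image, filter, padding, stride));          // 9
  return 0;
}
```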
...@@ -33,14 +33,12 @@ MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, memory::primitive_desc pd) { ...@@ -33,14 +33,12 @@ MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, memory::primitive_desc pd) {
size_t width = cnts / dims[0]; size_t width = cnts / dims[0];
m = Matrix::create(height, width, false, false); m = Matrix::create(height, width, false, false);
} }
CHECK(m) << " Matrix should not be empty"; CHECK(m) << " Matrix should not be empty";
CpuMatrixPtr cpuMatrix = std::dynamic_pointer_cast<CpuMatrix>(m); CpuMatrixPtr cpuMatrix = std::dynamic_pointer_cast<CpuMatrix>(m);
CHECK(cpuMatrix) << "Only support create from CPU matrix yet"; CHECK(cpuMatrix) << "Only support create from CPU matrix yet";
CHECK_EQ(cpuMatrix->getElementCnt(), cnts) << "Count size does not match";
CHECK_EQ(cnts, m->getElementCnt()) << "Count size does not match"; return std::make_shared<MKLDNNMatrix>(cpuMatrix, pd);
return std::make_shared<MKLDNNMatrix>(
m->getData(), m->getHeight(), m->getWidth(), pd);
} }
MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m,
...@@ -138,7 +136,7 @@ void MKLDNNMatrix::downSpatial() { ...@@ -138,7 +136,7 @@ void MKLDNNMatrix::downSpatial() {
mkldnn_primitive_create(&result, pd.get(), nullptr, nullptr), mkldnn_primitive_create(&result, pd.get(), nullptr, nullptr),
"could not create a memory primitive"); "could not create a memory primitive");
reset(result); reset(result);
set_data_handle(getData()); set_data_handle(data_);
} }
} // namespace paddle } // namespace paddle
...@@ -30,11 +30,10 @@ typedef std::shared_ptr<MKLDNNMatrix> MKLDNNMatrixPtr; ...@@ -30,11 +30,10 @@ typedef std::shared_ptr<MKLDNNMatrix> MKLDNNMatrixPtr;
*/ */
class MKLDNNMatrix : public CpuMatrix, public mkldnn::memory { class MKLDNNMatrix : public CpuMatrix, public mkldnn::memory {
public: public:
MKLDNNMatrix(real* data, MKLDNNMatrix(CpuMatrixPtr m, mkldnn::memory::primitive_desc pd)
size_t height, : CpuMatrix(m->getData(), m->getHeight(), m->getWidth(), false),
size_t width, mkldnn::memory(pd, m->getData()),
mkldnn::memory::primitive_desc pd) m_(m) {}
: CpuMatrix(data, height, width, false), mkldnn::memory(pd, data) {}
~MKLDNNMatrix() {} ~MKLDNNMatrix() {}
...@@ -81,11 +80,29 @@ public: ...@@ -81,11 +80,29 @@ public:
void downSpatial(); void downSpatial();
/** /**
* Update the memory data handle. * set the memory data handle.
* Caution: This will not check the buffer size of the data, * Caution: This will not check the buffer size of the data,
* it should be covered by the user. * it should be covered by the user.
*/ */
void updateData(void* data) { set_data_handle(data); } void setData(real* data) {
set_data_handle(data);
CpuMatrix::setData(data);
m_.reset();
}
/**
* override Matrix::getData
* check data before return
*/
real* getData() override {
CHECK_EQ((void*)data_, get_data_handle());
return data_;
}
const real* getData() const override {
CHECK_EQ((void*)data_, get_data_handle());
return data_;
}
/** /**
* Get primitive descriptor. * Get primitive descriptor.
...@@ -143,6 +160,10 @@ protected: ...@@ -143,6 +160,10 @@ protected:
memory::format srcFmt, memory::format srcFmt,
memory::format dstFmt, memory::format dstFmt,
memory::dims dm); memory::dims dm);
private:
// save the CpuMatrixPtr in case the buffer released outside
CpuMatrixPtr m_;
}; };
} // namespace paddle } // namespace paddle
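The new m_ member exists only to keep the wrapped CPU buffer alive after the caller's pointer goes away, and setData() drops it once an external buffer is bound. A small sketch of that ownership idea with hypothetical types (not the MKLDNNMatrix API):

```
#include <cstdio>
#include <memory>
#include <vector>

// A view that pins the buffer owner so the raw pointer it caches stays valid,
// mirroring the CpuMatrixPtr m_ kept inside MKLDNNMatrix.
class BufferView {
 public:
  explicit BufferView(std::shared_ptr<std::vector<float>> owner)
      : owner_(std::move(owner)), data_(owner_->data()) {}
  float* data() const { return data_; }
  // Rebinding to an external buffer drops the pinned owner (cf. setData()).
  void setData(float* external) { data_ = external; owner_.reset(); }

 private:
  std::shared_ptr<std::vector<float>> owner_;  // keeps the buffer alive
  float* data_;
};

int main() {
  BufferView view = [] {
    auto buf = std::make_shared<std::vector<float>>(4, 1.5f);
    return BufferView(buf);  // `buf` goes out of scope, but the view pins it
  }();
  printf("%g\n", view.data()[0]);  // still valid: prints 1.5
  return 0;
}
```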
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/concat_op.h"
#include <vector>
namespace paddle {
namespace operators {
using framework::Tensor;
class ConcatOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
auto ins = ctx.MultiInput<framework::Tensor>("X");
auto *out = ctx.Output<framework::Tensor>("Out");
size_t axis = static_cast<size_t>(ctx.Attr<int>("axis"));
size_t n = ins.size();
PADDLE_ENFORCE_GT(n, 1, "Input tensors count should > 1.");
auto out_dims = ins[0]->dims();
size_t in_zero_dims_size = out_dims.size();
for (size_t i = 1; i < n; i++) {
for (size_t j = 0; j < in_zero_dims_size; j++) {
if (j == axis) {
out_dims[axis] += ins[i]->dims()[j];
continue;
}
PADDLE_ENFORCE_EQ(out_dims[j], ins[i]->dims()[j],
"Input tensors should have the same "
"elements except the specify axis.")
}
}
out->Resize(out_dims);
}
};
class ConcatOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ConcatOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "the input tensors of concat operator.").AsDuplicable();
AddOutput("Out", "the output tensor of concat operator.");
AddComment(R"DOC(
Join the input tensors along with the axis.
Examples:
Input[0] = [[1,2],[3,4]]
Input[1] = [[5,6]]
axis = 0
Output = [[1,2],
[3,4],
[5,6]]
)DOC");
AddAttr<int>("axis", "The axis which the inputs will be joined with.")
.SetDefault(0);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(concat, ops::ConcatOp, ops::ConcatOpMaker)
REGISTER_OP_CPU_KERNEL(concat,
ops::ConcatKernel<paddle::platform::CPUPlace, float>)
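ConcatOp::InferShape sums the sizes along `axis` and requires every other dimension to match across inputs. The same rule as a tiny standalone function (illustrative, not the framework API):

```
#include <cassert>
#include <cstdio>
#include <vector>

// Output shape of concatenation along `axis`: that dimension is summed,
// every other dimension must agree across inputs.
std::vector<int> concatShape(const std::vector<std::vector<int>>& shapes,
                             size_t axis) {
  assert(shapes.size() > 1);
  std::vector<int> out = shapes[0];
  for (size_t i = 1; i < shapes.size(); ++i) {
    for (size_t j = 0; j < out.size(); ++j) {
      if (j == axis) {
        out[axis] += shapes[i][j];
      } else {
        assert(out[j] == shapes[i][j] && "non-axis dims must match");
      }
    }
  }
  return out;
}

int main() {
  // [[1,2],[3,4]] concatenated with [[5,6]] along axis 0, as in the DOC example
  std::vector<int> out = concatShape({{2, 2}, {1, 2}}, 0);
  printf("out dims = [%d, %d]\n", out[0], out[1]);  // [3, 2]
  return 0;
}
```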
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/concat_op.h"
namespace ops = paddle::operators;
// TODO(Yancey1989) Add GPU kernel
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
template <typename Place, typename T>
class ConcatKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto ins = ctx.MultiInput<framework::Tensor>("X");
auto* out = ctx.Output<framework::Tensor>("Out");
int64_t axis = static_cast<int64_t>(ctx.Attr<int>("axis"));
size_t n = ins.size();
size_t output_axis_dim = 0;
size_t before = 1, after = 1;
for (size_t i = 0; i < n; i++) {
output_axis_dim += ins[i]->dims()[axis];
}
auto& input_zero = ins[0];
for (int64_t i = 0; i < input_zero->dims().size(); i++) {
if (i == axis) {
continue;
}
if (i < axis) {
before *= input_zero->dims()[i];
} else {
after *= input_zero->dims()[i];
}
}
size_t output_offset = 0;
for (size_t i = 0; i < n; i++) {
auto& in = ins[i];
auto axis_dim = in->dims()[axis];
for (size_t j = 0; j < before; j++) {
size_t len = axis_dim * after * sizeof(T);
const T* src = in->data<T>() + axis_dim * after * j;
T* out_data = out->mutable_data<T>(platform::CPUPlace());
T* dest = out_data + output_offset + output_axis_dim * after * j;
memcpy(dest, src, len);
}
output_offset += axis_dim * after;
}
}
};
} // namespace operators
} // namespace paddle
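ConcatKernel walks each input with `before`/`after` products so that every contiguous `axis_dim * after` slice can be memcpy'd to the right output offset. A self-contained version of that indexing over flat row-major buffers:

```
#include <cstdio>
#include <cstring>
#include <vector>

struct SimpleTensor {        // minimal stand-in: shape + flat row-major data
  std::vector<int> dims;
  std::vector<float> data;
};

// Concatenate along `axis` using the same before/after slice arithmetic
// as the ConcatKernel above.
SimpleTensor concat(const std::vector<SimpleTensor>& ins, int axis) {
  SimpleTensor out;
  out.dims = ins[0].dims;
  out.dims[axis] = 0;
  size_t before = 1, after = 1;
  for (int i = 0; i < (int)ins[0].dims.size(); ++i) {
    if (i < axis) before *= ins[0].dims[i];
    if (i > axis) after *= ins[0].dims[i];
  }
  for (const auto& in : ins) out.dims[axis] += in.dims[axis];
  out.data.resize(before * out.dims[axis] * after);

  size_t output_offset = 0;
  for (const auto& in : ins) {
    size_t axis_dim = in.dims[axis];
    for (size_t j = 0; j < before; ++j) {
      const float* src = in.data.data() + axis_dim * after * j;
      float* dst = out.data.data() + output_offset + out.dims[axis] * after * j;
      std::memcpy(dst, src, axis_dim * after * sizeof(float));
    }
    output_offset += axis_dim * after;
  }
  return out;
}

int main() {
  SimpleTensor a{{2, 2}, {1, 2, 3, 4}}, b{{2, 1}, {5, 6}};
  SimpleTensor c = concat({a, b}, 1);  // rows become [1 2 5] and [3 4 6]
  for (float v : c.data) printf("%g ", v);
  printf("\n");
  return 0;
}
```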
...@@ -42,7 +42,7 @@ class CosSimKernel : public framework::OpKernel { ...@@ -42,7 +42,7 @@ class CosSimKernel : public framework::OpKernel {
output_y_norm->mutable_data<T>(context.GetPlace()); output_y_norm->mutable_data<T>(context.GetPlace());
auto dims = input_x->dims(); auto dims = input_x->dims();
int size = static_cast<int>(framework::product(dims)); int64_t size = input_x->numel();
auto new_dims = framework::make_ddim({dims[0], size / dims[0]}); auto new_dims = framework::make_ddim({dims[0], size / dims[0]});
auto x = EigenMatrix<T>::From(*input_x, new_dims); auto x = EigenMatrix<T>::From(*input_x, new_dims);
auto y = EigenMatrix<T>::From(*input_y, new_dims); auto y = EigenMatrix<T>::From(*input_y, new_dims);
...@@ -72,7 +72,7 @@ class CosSimGradKernel : public framework::OpKernel { ...@@ -72,7 +72,7 @@ class CosSimGradKernel : public framework::OpKernel {
auto* input_grad_z = context.Input<Tensor>(framework::GradVarName("Out")); auto* input_grad_z = context.Input<Tensor>(framework::GradVarName("Out"));
auto dims = input_x->dims(); auto dims = input_x->dims();
int size = static_cast<int>(framework::product(dims)); int64_t size = input_x->numel();
auto new_dims = framework::make_ddim({dims[0], size / dims[0]}); auto new_dims = framework::make_ddim({dims[0], size / dims[0]});
auto x = EigenMatrix<T>::From(*input_x, new_dims); auto x = EigenMatrix<T>::From(*input_x, new_dims);
auto y = EigenMatrix<T>::From(*input_y, new_dims); auto y = EigenMatrix<T>::From(*input_y, new_dims);
......
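For reference, the cos_sim kernel flattens each sample to a row of length numel/batch and applies the per-row cosine formula cos(x, y) = x·y / (|x| |y|); a quick standalone check of that formula:

```
#include <cmath>
#include <cstdio>
#include <vector>

// cos(x, y) = (x . y) / (|x| * |y|)
float cosineSimilarity(const std::vector<float>& x, const std::vector<float>& y) {
  float dot = 0.f, nx = 0.f, ny = 0.f;
  for (size_t i = 0; i < x.size(); ++i) {
    dot += x[i] * y[i];
    nx += x[i] * x[i];
    ny += y[i] * y[i];
  }
  return dot / (std::sqrt(nx) * std::sqrt(ny));
}

int main() {
  printf("%g\n", cosineSimilarity({1, 0}, {0, 1}));  // 0: orthogonal
  printf("%g\n", cosineSimilarity({1, 2}, {2, 4}));  // 1: parallel
  return 0;
}
```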
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/elementwise_mul_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
class ElementWiseMulOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) should not be null");
auto x_dim = ctx.Input<Tensor>("X")->dims();
auto y_dim = ctx.Input<Tensor>("Y")->dims();
PADDLE_ENFORCE_GE(x_dim.size(), y_dim.size(),
"Rank of first input must >= rank of second input.")
ctx.Output<Tensor>("Out")->Resize(x_dim);
}
};
class ElementWiseMulOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ElementWiseMulOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The first input of elementwise mul op");
AddInput("Y", "The second input of elementwise mul op");
AddAttr<int>("axis",
R"DOC(
When shape(Y) does not equal shape(X),Y will be broadcasted
to match the shape of X and axis should be dimension index Y in X
)DOC")
.SetDefault(-1)
.EqualGreaterThan(-1);
AddOutput("Out", "The output of elementwise mul op");
AddComment(R"DOC(
Limited elementwise multiply operator. The equation is: Out = X ⊙ Y.
1. The shape of Y should be same with X or
2. Y's shape is a subset of X.
Y will be broadcasted to match the shape of X and axis should be dimension index Y in X.
example:
shape(X) = (2, 3, 4, 5), shape(Y) = (,)
shape(X) = (2, 3, 4, 5), shape(Y) = (5,)
shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)
shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
)DOC");
}
};
class ElementWiseMulOpGrad : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null");
auto x_dims = ctx.Input<Tensor>("X")->dims();
auto y_dims = ctx.Input<Tensor>("Y")->dims();
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y"));
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Rank of first input must >= rank of second input.")
if (x_grad) {
x_grad->Resize(x_dims);
}
if (y_grad) {
y_grad->Resize(y_dims);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(elementwise_mul, ops::ElementWiseMulOp, ops::ElementWiseMulOpMaker,
elementwise_mul_grad, ops::ElementWiseMulOpGrad);
REGISTER_OP_CPU_KERNEL(
elementwise_mul,
ops::ElementWiseMulKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
elementwise_mul_grad,
ops::ElementWiseMulGradKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/elementwise_mul_op.h"
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(
elementwise_mul,
ops::ElementWiseMulKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(
elementwise_mul_grad,
ops::ElementWiseMulGradKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <iostream>
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h"
namespace paddle {
namespace operators {
/*
* Out = X ⊙ Y
* 1. shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
* pre=2, n=3*4, post=5
* 2. shape(X) = (2, 3, 4, 5), shape(Y) = (4,5)
* pre=2*3, n=4*5, post=1
*/
inline void get_mid_dims(const framework::DDim& x_dims,
const framework::DDim& y_dims, const int axis,
int& pre, int& n, int& post) {
pre = 1;
n = 1;
post = 1;
for (int i = 0; i < axis; ++i) {
pre *= x_dims[i];
}
for (int i = 0; i < y_dims.size(); ++i) {
PADDLE_ENFORCE_EQ(x_dims[i + axis], y_dims[i],
"Broadcast dimension mismatch.");
n *= y_dims[i];
}
for (int i = axis + y_dims.size(); i < x_dims.size(); ++i) {
post *= x_dims[i];
}
}
template <typename Place, typename T>
class ElementWiseMulKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
using Tensor = framework::Tensor;
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* z = ctx.Output<Tensor>("Out");
z->mutable_data<T>(ctx.GetPlace());
auto x_e = framework::EigenVector<T>::Flatten(*x);
auto y_e = framework::EigenVector<T>::Flatten(*y);
auto z_e = framework::EigenVector<T>::Flatten(*z);
auto x_dims = x->dims();
auto y_dims = y->dims();
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Rank of first input must >= rank of second input.")
if (x_dims == y_dims || product(y_dims) == 1) {
z_e.device(ctx.GetEigenDevice<Place>()) = x_e * y_e;
return;
}
int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
PADDLE_ENFORCE(axis >= 0 && axis < x_dims.size(),
"Axis should be in range [0, x_dims)");
int pre, n, post;
get_mid_dims(x_dims, y_dims, axis, pre, n, post);
if (post == 1) {
auto y_bcast = y_e.reshape(Eigen::DSizes<int, 2>(1, n))
.broadcast(Eigen::DSizes<int, 2>(pre, 1))
.reshape(Eigen::DSizes<int, 1>(x_e.size()));
z_e.device(ctx.GetEigenDevice<Place>()) = x_e * y_bcast;
return;
} else {
auto y_bcast = y_e.reshape(Eigen::DSizes<int, 3>(1, n, 1))
.broadcast(Eigen::DSizes<int, 3>(pre, 1, post))
.reshape(Eigen::DSizes<int, 1>(x_e.size()));
z_e.device(ctx.GetEigenDevice<Place>()) = x_e * y_bcast;
return;
}
}
};
template <typename Place, typename T>
class ElementWiseMulGradKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
using Tensor = framework::Tensor;
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto x_e = framework::EigenVector<T>::Flatten(*x);
auto y_e = framework::EigenVector<T>::Flatten(*y);
auto dout_e = framework::EigenVector<T>::Flatten(*dout);
auto x_dims = x->dims();
auto y_dims = y->dims();
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
}
if (x_dims == y_dims || product(y_dims) == 1) {
if (dx) {
auto dx_e = framework::EigenVector<T>::Flatten(*dx);
dx_e.device(ctx.GetEigenDevice<Place>()) = dout_e * y_e;
}
if (dy) {
auto dy_e = framework::EigenVector<T>::Flatten(*dy);
dy_e.device(ctx.GetEigenDevice<Place>()) = x_e * dout_e;
}
return;
}
int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
int pre, n, post;
get_mid_dims(x_dims, y_dims, axis, pre, n, post);
// TODO(gongweibao): wrap reshape to a function.
if (post == 1) {
auto y_e_bcast = y_e.reshape(Eigen::DSizes<int, 2>(1, n))
.broadcast(Eigen::DSizes<int, 2>(pre, 1))
.reshape(Eigen::DSizes<int, 1>(x_e.size()));
if (dx) {
auto dx_e = framework::EigenVector<T>::Flatten(*dx);
dx_e.device(ctx.GetEigenDevice<Place>()) = dout_e * y_e_bcast;
}
if (dy) {
auto dy_e = framework::EigenVector<T>::Flatten(*dy);
dy_e.device(ctx.GetEigenDevice<Place>()) =
(x_e * dout_e)
.reshape(Eigen::DSizes<int, 2>(pre, n))
.sum(Eigen::array<int, 1>{{0}});
}
return;
} else {
auto y_e_bcast = y_e.reshape(Eigen::DSizes<int, 3>(1, n, 1))
.broadcast(Eigen::DSizes<int, 3>(pre, 1, post))
.reshape(Eigen::DSizes<int, 1>(x_e.size()));
if (dx) {
auto dx_e = framework::EigenVector<T>::Flatten(*dx);
dx_e.device(ctx.GetEigenDevice<Place>()) = dout_e * y_e_bcast;
}
if (dy) {
auto dy_e = framework::EigenVector<T>::Flatten(*dy);
dy_e.device(ctx.GetEigenDevice<Place>()) =
(x_e * dout_e)
.reshape(Eigen::DSizes<int, 3>(pre, n, post))
.sum(Eigen::array<int, 2>{{0, 2}});
}
return;
}
}
};
} // namespace operators
} // namespace paddle
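The pre/n/post factorisation computed by get_mid_dims turns a broadcast over a contiguous block of Y's dimensions into three nested loops. A standalone sketch of the same idea over flat buffers:

```
#include <cstdio>
#include <vector>

// Broadcasted multiply z = x * y where y matches a contiguous block of x's
// dims. The flat index of x decomposes as (i * n + j) * post + k, with y
// indexed by j -- exactly the pre/n/post split computed by get_mid_dims.
void broadcastMul(const std::vector<float>& x, const std::vector<float>& y,
                  int pre, int n, int post, std::vector<float>& z) {
  z.resize(x.size());
  for (int i = 0; i < pre; ++i)
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < post; ++k) {
        int idx = (i * n + j) * post + k;
        z[idx] = x[idx] * y[j];
      }
}

int main() {
  // shape(X) = (2, 3), shape(Y) = (3,), axis = 1  =>  pre = 2, n = 3, post = 1
  std::vector<float> x{1, 2, 3, 4, 5, 6}, y{10, 100, 1000}, z;
  broadcastMul(x, y, 2, 3, 1, z);
  for (float v : z) printf("%g ", v);  // 10 200 3000 40 500 6000
  printf("\n");
  return 0;
}
```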
...@@ -31,7 +31,7 @@ class CPUGaussianRandomKernel : public framework::OpKernel { ...@@ -31,7 +31,7 @@ class CPUGaussianRandomKernel : public framework::OpKernel {
} }
engine.seed(seed); engine.seed(seed);
std::normal_distribution<T> dist(mean, std); std::normal_distribution<T> dist(mean, std);
int64_t size = framework::product(tensor->dims()); int64_t size = tensor->numel();
for (int64_t i = 0; i < size; ++i) { for (int64_t i = 0; i < size; ++i) {
data[i] = dist(engine); data[i] = dist(engine);
} }
......
...@@ -50,8 +50,8 @@ class GPUGaussianRandomKernel : public framework::OpKernel { ...@@ -50,8 +50,8 @@ class GPUGaussianRandomKernel : public framework::OpKernel {
T mean = static_cast<T>(context.Attr<float>("mean")); T mean = static_cast<T>(context.Attr<float>("mean"));
T std = static_cast<T>(context.Attr<float>("std")); T std = static_cast<T>(context.Attr<float>("std"));
thrust::counting_iterator<unsigned int> index_sequence_begin(0); thrust::counting_iterator<unsigned int> index_sequence_begin(0);
ssize_t N = framework::product(tensor->dims()); int64_t size = tensor->numel();
thrust::transform(index_sequence_begin, index_sequence_begin + N, thrust::transform(index_sequence_begin, index_sequence_begin + size,
thrust::device_ptr<T>(data), thrust::device_ptr<T>(data),
GaussianGenerator<T>(mean, std, seed)); GaussianGenerator<T>(mean, std, seed));
} }
......
...@@ -70,7 +70,7 @@ class LookupTableCUDAKernel : public framework::OpKernel { ...@@ -70,7 +70,7 @@ class LookupTableCUDAKernel : public framework::OpKernel {
size_t N = table_t->dims()[0]; size_t N = table_t->dims()[0];
size_t D = table_t->dims()[1]; size_t D = table_t->dims()[1];
size_t K = product(ids_t->dims()); size_t K = ids_t->numel();
auto ids = ids_t->data<int32_t>(); auto ids = ids_t->data<int32_t>();
auto table = table_t->data<T>(); auto table = table_t->data<T>();
auto output = output_t->mutable_data<T>(context.GetPlace()); auto output = output_t->mutable_data<T>(context.GetPlace());
...@@ -91,7 +91,7 @@ class LookupTableGradCUDAKernel : public framework::OpKernel { ...@@ -91,7 +91,7 @@ class LookupTableGradCUDAKernel : public framework::OpKernel {
int N = d_table_t->dims()[0]; int N = d_table_t->dims()[0];
int D = d_table_t->dims()[1]; int D = d_table_t->dims()[1];
int K = product(ids_t->dims()); int K = ids_t->numel();
const int32_t* ids = ids_t->data<int32_t>(); const int32_t* ids = ids_t->data<int32_t>();
const T* d_output = d_output_t->data<T>(); const T* d_output = d_output_t->data<T>();
T* d_table = d_table_t->mutable_data<T>(context.GetPlace()); T* d_table = d_table_t->mutable_data<T>(context.GetPlace());
......
...@@ -35,7 +35,7 @@ class LookupTableKernel : public framework::OpKernel { ...@@ -35,7 +35,7 @@ class LookupTableKernel : public framework::OpKernel {
auto ids = ids_t->data<int32_t>(); auto ids = ids_t->data<int32_t>();
auto table = table_t->data<T>(); auto table = table_t->data<T>();
auto output = output_t->mutable_data<T>(context.GetPlace()); auto output = output_t->mutable_data<T>(context.GetPlace());
for (ssize_t i = 0; i < product(ids_t->dims()); ++i) { for (int64_t i = 0; i < ids_t->numel(); ++i) {
PADDLE_ENFORCE_LT(ids[i], N); PADDLE_ENFORCE_LT(ids[i], N);
PADDLE_ENFORCE_GE(ids[i], 0); PADDLE_ENFORCE_GE(ids[i], 0);
memcpy(output + i * D, table + ids[i] * D, D * sizeof(T)); memcpy(output + i * D, table + ids[i] * D, D * sizeof(T));
...@@ -61,7 +61,7 @@ class LookupTableGradKernel : public framework::OpKernel { ...@@ -61,7 +61,7 @@ class LookupTableGradKernel : public framework::OpKernel {
t.device(context.GetEigenDevice<platform::CPUPlace>()) = t.device(context.GetEigenDevice<platform::CPUPlace>()) =
t.constant(static_cast<T>(0)); t.constant(static_cast<T>(0));
for (ssize_t i = 0; i < product(ids_t->dims()); ++i) { for (int64_t i = 0; i < ids_t->numel(); ++i) {
PADDLE_ENFORCE_LT(ids[i], N); PADDLE_ENFORCE_LT(ids[i], N);
PADDLE_ENFORCE_GE(ids[i], 0); PADDLE_ENFORCE_GE(ids[i], 0);
for (int j = 0; j < D; ++j) { for (int j = 0; j < D; ++j) {
......
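The lookup kernel is a row gather: each id selects one row of width D from the [N, D] table. A minimal CPU version mirroring the memcpy loop above:

```
#include <cstdio>
#include <cstring>
#include <vector>

// Gather rows of an [N, D] table at the given ids into a [K, D] output.
void lookupTable(const std::vector<float>& table, int N, int D,
                 const std::vector<int>& ids, std::vector<float>& out) {
  out.resize(ids.size() * D);
  for (size_t i = 0; i < ids.size(); ++i) {
    int id = ids[i];
    if (id < 0 || id >= N) continue;  // the real kernel enforces 0 <= id < N
    std::memcpy(out.data() + i * D, table.data() + id * D, D * sizeof(float));
  }
}

int main() {
  // 3 embeddings of width 2
  std::vector<float> table{0, 0, 1, 1, 2, 2}, out;
  lookupTable(table, 3, 2, {2, 0, 2}, out);
  for (float v : out) printf("%g ", v);  // 2 2 0 0 2 2
  printf("\n");
  return 0;
}
```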
...@@ -119,4 +119,4 @@ TEST(math, im2col) { ...@@ -119,4 +119,4 @@ TEST(math, im2col) {
#ifndef PADDLE_ONLY_CPU #ifndef PADDLE_ONLY_CPU
testIm2col<paddle::platform::GPUPlace>(); testIm2col<paddle::platform::GPUPlace>();
#endif #endif
} }
\ No newline at end of file
...@@ -49,12 +49,11 @@ class MeanGradKernel : public framework::OpKernel { ...@@ -49,12 +49,11 @@ class MeanGradKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto OG = context.Input<Tensor>(framework::GradVarName("Out")); auto OG = context.Input<Tensor>(framework::GradVarName("Out"));
PADDLE_ENFORCE(framework::product(OG->dims()) == 1, PADDLE_ENFORCE(OG->numel() == 1, "Mean Gradient should be scalar");
"Mean Gradient should be scalar");
auto IG = context.Output<Tensor>(framework::GradVarName("X")); auto IG = context.Output<Tensor>(framework::GradVarName("X"));
IG->mutable_data<T>(context.GetPlace()); IG->mutable_data<T>(context.GetPlace());
T ig_size = (T)framework::product(IG->dims()); T ig_size = static_cast<T>(IG->numel());
Eigen::DSizes<int, 1> bcast(ig_size); Eigen::DSizes<int, 1> bcast(ig_size);
EigenVector<T>::Flatten(*IG).device(context.GetEigenDevice<Place>()) = EigenVector<T>::Flatten(*IG).device(context.GetEigenDevice<Place>()) =
......
...@@ -31,8 +31,7 @@ class MinusOp : public framework::OperatorWithKernel { ...@@ -31,8 +31,7 @@ class MinusOp : public framework::OperatorWithKernel {
auto *right_tensor = ctx.Input<framework::Tensor>("Y"); auto *right_tensor = ctx.Input<framework::Tensor>("Y");
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
framework::product(left_tensor->dims()), left_tensor->numel(), right_tensor->numel(),
framework::product(right_tensor->dims()),
"Minus operator must take two tensor with same num of elements"); "Minus operator must take two tensor with same num of elements");
ctx.Output<framework::Tensor>("Out")->Resize(left_tensor->dims()); ctx.Output<framework::Tensor>("Out")->Resize(left_tensor->dims());
} }
......
...@@ -31,11 +31,10 @@ class ModifiedHuberLossOp : public framework::OperatorWithKernel { ...@@ -31,11 +31,10 @@ class ModifiedHuberLossOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_EQ(x->dims(), y->dims(), PADDLE_ENFORCE_EQ(x->dims(), y->dims(),
"Dimensions of X and Y must be the same."); "Dimensions of X and Y must be the same.");
PADDLE_ENFORCE_EQ(framework::arity(x->dims()), 2, PADDLE_ENFORCE_EQ(x->dims().size(), 2, "Tensor rank of X must be 2.");
"Tensor rank of X must be 2."); PADDLE_ENFORCE_EQ(x->dims()[1], 1, "2nd dimension of X must be 1.");
PADDLE_ENFORCE_EQ(x->dims()[1], 1, "Second dimension of X must be 1.");
context.Output<Tensor>("intermediate_val")->Resize(x->dims()); context.Output<Tensor>("IntermediateVal")->Resize(x->dims());
context.Output<Tensor>("Out")->Resize({x->dims()[0], 1}); context.Output<Tensor>("Out")->Resize({x->dims()[0], 1});
} }
}; };
...@@ -47,7 +46,7 @@ class ModifiedHuberLossOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -47,7 +46,7 @@ class ModifiedHuberLossOpMaker : public framework::OpProtoAndCheckerMaker {
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input value of ModifiedHuberLossOp."); AddInput("X", "Input value of ModifiedHuberLossOp.");
AddInput("Y", "Target labels of ModifiedHuberLossOp."); AddInput("Y", "Target labels of ModifiedHuberLossOp.");
AddOutput("intermediate_val", AddOutput("IntermediateVal",
"Variable to save intermediate result which will be reused in " "Variable to save intermediate result which will be reused in "
"backward processing.") "backward processing.")
.AsIntermediate(); .AsIntermediate();
...@@ -75,7 +74,7 @@ class ModifiedHuberLossGradOp : public framework::OperatorWithKernel { ...@@ -75,7 +74,7 @@ class ModifiedHuberLossGradOp : public framework::OperatorWithKernel {
void InferShape(const framework::InferShapeContext& context) const override { void InferShape(const framework::InferShapeContext& context) const override {
auto* x = context.Input<Tensor>("X"); auto* x = context.Input<Tensor>("X");
auto* y = context.Input<Tensor>("Y"); auto* y = context.Input<Tensor>("Y");
auto* intermediate_val = context.Input<Tensor>("intermediate_val"); auto* intermediate_val = context.Input<Tensor>("IntermediateVal");
auto* out_grad = context.Input<Tensor>(framework::GradVarName("Out")); auto* out_grad = context.Input<Tensor>(framework::GradVarName("Out"));
auto* x_grad = context.Output<Tensor>(framework::GradVarName("X")); auto* x_grad = context.Output<Tensor>(framework::GradVarName("X"));
......
...@@ -43,7 +43,7 @@ class ModifiedHuberLossGradGPUKernel : public framework::OpKernel { ...@@ -43,7 +43,7 @@ class ModifiedHuberLossGradGPUKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* in0 = context.Input<Tensor>("Y"); auto* in0 = context.Input<Tensor>("Y");
auto* in1 = context.Input<Tensor>("intermediate_val"); auto* in1 = context.Input<Tensor>("IntermediateVal");
auto* in2 = context.Input<Tensor>(framework::GradVarName("Out")); auto* in2 = context.Input<Tensor>(framework::GradVarName("Out"));
auto* out0 = context.Output<Tensor>(framework::GradVarName("X")); auto* out0 = context.Output<Tensor>(framework::GradVarName("X"));
......
...@@ -52,7 +52,7 @@ class ModifiedHuberLossKernel : public framework::OpKernel { ...@@ -52,7 +52,7 @@ class ModifiedHuberLossKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* in0 = context.Input<Tensor>("X"); auto* in0 = context.Input<Tensor>("X");
auto* in1 = context.Input<Tensor>("Y"); auto* in1 = context.Input<Tensor>("Y");
auto* out0 = context.Output<Tensor>("intermediate_val"); auto* out0 = context.Output<Tensor>("IntermediateVal");
auto* out1 = context.Output<Tensor>("Out"); auto* out1 = context.Output<Tensor>("Out");
out0->mutable_data<T>(context.GetPlace()); out0->mutable_data<T>(context.GetPlace());
...@@ -77,7 +77,7 @@ class ModifiedHuberLossGradCPUKernel : public framework::OpKernel { ...@@ -77,7 +77,7 @@ class ModifiedHuberLossGradCPUKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* in0 = context.Input<Tensor>("Y"); auto* in0 = context.Input<Tensor>("Y");
auto* in1 = context.Input<Tensor>("intermediate_val"); auto* in1 = context.Input<Tensor>("IntermediateVal");
auto* in2 = context.Input<Tensor>(framework::GradVarName("Out")); auto* in2 = context.Input<Tensor>(framework::GradVarName("Out"));
auto* out0 = context.Output<Tensor>(framework::GradVarName("X")); auto* out0 = context.Output<Tensor>(framework::GradVarName("X"));
......
...@@ -25,18 +25,27 @@ class MulOp : public framework::OperatorWithKernel { ...@@ -25,18 +25,27 @@ class MulOp : public framework::OperatorWithKernel {
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(const framework::InferShapeContext &ctx) const override {
auto dim0 = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx.Input<Tensor>("X")->dims();
auto dim1 = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx.Input<Tensor>("Y")->dims();
PADDLE_ENFORCE_EQ(dim0.size(), 2, int x_num_col_dims = Attr<int>("x_num_col_dims");
"input X(%s) should be a tensor with 2 dims, a matrix", int y_num_col_dims = Attr<int>("y_num_col_dims");
ctx.op().Input("X"));
PADDLE_ENFORCE_EQ(dim1.size(), 2, PADDLE_ENFORCE(x_dims.size() > x_num_col_dims,
"input Y(%s) should be a tensor with 2 dims, a matrix", "The rank of input tensor X(%s) should be larger than "
ctx.op().Input("Y")); "`mul_op`'s `x_num_col_dims`.",
ctx.op().Input("X"));
PADDLE_ENFORCE(y_dims.size() > y_num_col_dims,
"The rank of input tensor Y(%s) should be larger than "
"`mul_op`'s `y_num_col_dims`.",
ctx.op().Input("Y"));
auto x_mat_dims = framework::flatten_to_2d(x_dims, x_num_col_dims);
auto y_mat_dims = framework::flatten_to_2d(y_dims, y_num_col_dims);
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
dim0[1], dim1[0], x_mat_dims[1], y_mat_dims[0],
"First matrix's width must be equal with second matrix's height."); "First matrix's width must be equal with second matrix's height.");
ctx.Output<Tensor>("Out")->Resize({dim0[0], dim1[1]}); ctx.Output<Tensor>("Out")->Resize({x_mat_dims[0], y_mat_dims[1]});
} }
}; };
...@@ -47,6 +56,23 @@ class MulOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -47,6 +56,23 @@ class MulOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput("X", "The first input of mul op"); AddInput("X", "The first input of mul op");
AddInput("Y", "The second input of mul op"); AddInput("Y", "The second input of mul op");
AddOutput("Out", "The output of mul op"); AddOutput("Out", "The output of mul op");
AddAttr<int>(
"x_num_col_dims",
R"DOC(mul_op can take tensors with more than two dimensions as input `X`,
in that case, tensors will be reshaped to a matrix. The matrix's first
dimension(column length) will be the product of tensor's last
`num_col_dims` dimensions, and the matrix's second dimension(row length)
will be the product of tensor's first `rank - num_col_dims` dimensions.
)DOC")
.SetDefault(1)
.EqualGreaterThan(1);
AddAttr<int>(
"y_num_col_dims",
R"DOC(mul_op can take tensors with more than two dimensions as input `Y`,
in that case, tensors will be reshaped to a matrix. Just like input `X`.
)DOC")
.SetDefault(1)
.EqualGreaterThan(1);
AddComment(R"DOC( AddComment(R"DOC(
Two Element Mul Operator. Two Element Mul Operator.
...@@ -70,10 +96,20 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -70,10 +96,20 @@ class MulOpGrad : public framework::OperatorWithKernel {
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X")); auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y")); auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y"));
PADDLE_ENFORCE(x_dims[0] == out_dims[0],
"Out@GRAD M X N must equal to X dims 0, M "); auto x_mat_dims =
PADDLE_ENFORCE(y_dims[1] == out_dims[1], framework::flatten_to_2d(x_dims, Attr<int>("x_num_col_dims"));
"Out@GRAD M X N must equal to Y dims 1, N "); auto y_mat_dims =
framework::flatten_to_2d(y_dims, Attr<int>("y_num_col_dims"));
PADDLE_ENFORCE_EQ(
x_mat_dims[0], out_dims[0],
"The first dimension of Out@GRAD must equal to the first dimension of "
"the first operand.");
PADDLE_ENFORCE_EQ(
y_mat_dims[1], out_dims[1],
"The second dimension of Out@GRAD must equal to the second "
"dimension of the second operand.");
if (x_grad) x_grad->Resize(x_dims); if (x_grad) x_grad->Resize(x_dims);
if (y_grad) y_grad->Resize(y_dims); if (y_grad) y_grad->Resize(y_dims);
......
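
For intuition about the new `num_col_dims` attributes, here is a minimal NumPy sketch (hypothetical shapes; it assumes, as the `InferShape` check above does, that `flatten_to_2d` collapses the first `num_col_dims` dimensions into the matrix height and the remaining dimensions into the width):

```python
import numpy as np

# Hypothetical shapes: a 4-D input multiplied by a 2-D weight matrix.
x = np.random.rand(2, 3, 4, 5).astype("float32")
y = np.random.rand(20, 7).astype("float32")

x_num_col_dims = 2  # flatten [2, 3] into the height, [4, 5] into the width
h = int(np.prod(x.shape[:x_num_col_dims]))   # 6
w = int(np.prod(x.shape[x_num_col_dims:]))   # 20, must equal y's height
x_mat = x.reshape(h, w)

out = x_mat @ y                              # shape (6, 7)
```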
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may not use this file except in compliance with the License.
You may obtain a copy of the License at You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/licenses/LICENSE-2.0
...@@ -31,13 +31,25 @@ template <typename Place, typename T> ...@@ -31,13 +31,25 @@ template <typename Place, typename T>
class MulKernel : public framework::OpKernel { class MulKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* x = context.Input<Tensor>("X"); const Tensor* x = context.Input<Tensor>("X");
auto* y = context.Input<Tensor>("Y"); const Tensor* y = context.Input<Tensor>("Y");
auto* z = context.Output<Tensor>("Out"); Tensor* z = context.Output<Tensor>("Out");
const Tensor x_matrix =
x->dims().size() > 2
? framework::ReshapeToMatrix<T>(
*x, context.template Attr<int>("x_num_col_dims"))
: *x;
const Tensor y_matrix =
y->dims().size() > 2
? framework::ReshapeToMatrix<T>(
*y, context.template Attr<int>("y_num_col_dims"))
: *y;
z->mutable_data<T>(context.GetPlace()); z->mutable_data<T>(context.GetPlace());
auto* device_context = auto* device_context =
const_cast<platform::DeviceContext*>(context.device_context_); const_cast<platform::DeviceContext*>(context.device_context_);
math::matmul<Place, T>(*x, false, *y, false, 1, z, 0, device_context); math::matmul<Place, T>(x_matrix, false, y_matrix, false, 1, z, 0,
device_context);
} }
}; };
...@@ -45,23 +57,39 @@ template <typename Place, typename T> ...@@ -45,23 +57,39 @@ template <typename Place, typename T>
class MulGradKernel : public framework::OpKernel { class MulGradKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X"); int x_num_col_dims = ctx.template Attr<int>("x_num_col_dims");
auto* y = ctx.Input<Tensor>("Y"); int y_num_col_dims = ctx.template Attr<int>("y_num_col_dims");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out")); const Tensor* x = ctx.Input<Tensor>("X");
const Tensor* y = ctx.Input<Tensor>("Y");
const Tensor x_matrix =
x->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*x, x_num_col_dims)
: *x;
const Tensor y_matrix =
y->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*y, y_num_col_dims)
: *y;
const Tensor* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X")); Tensor* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y")); Tensor* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
auto* device_context = auto* device_context =
const_cast<platform::DeviceContext*>(ctx.device_context_); const_cast<platform::DeviceContext*>(ctx.device_context_);
if (dx) { if (dx) {
dx->mutable_data<T>(ctx.GetPlace()); dx->mutable_data<T>(ctx.GetPlace());
Tensor dx_matrix = dx->dims().size() > 2 ? framework::ReshapeToMatrix<T>(
*dx, x_num_col_dims)
: *dx;
// dx = dout * y'. dx: M x K, dout : M x N, y : K x N // dx = dout * y'. dx: M x K, dout : M x N, y : K x N
math::matmul<Place, T>(*dout, false, *y, true, 1, dx, 0, device_context); math::matmul<Place, T>(*dout, false, y_matrix, true, 1, &dx_matrix, 0,
device_context);
} }
if (dy) { if (dy) {
dy->mutable_data<T>(ctx.GetPlace()); dy->mutable_data<T>(ctx.GetPlace());
Tensor dy_matrix = dy->dims().size() > 2 ? framework::ReshapeToMatrix<T>(
*dy, y_num_col_dims)
: *dy;
// dy = x' * dout. dy K x N, dout : M x N, x : M x K // dy = x' * dout. dy K x N, dout : M x N, x : M x K
math::matmul<Place, T>(*x, true, *dout, false, 1, dy, 0, device_context); math::matmul<Place, T>(x_matrix, true, *dout, false, 1, &dy_matrix, 0,
device_context);
} }
} }
}; };
......
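
The reworked gradient kernel follows the usual matrix-multiplication gradients, only on the flattened views. A NumPy sketch of the shapes involved (hypothetical values, continuing the example above):

```python
import numpy as np

x_mat = np.random.rand(6, 20).astype("float32")   # flattened X
y_mat = np.random.rand(20, 7).astype("float32")   # flattened Y
d_out = np.random.rand(6, 7).astype("float32")    # Out@GRAD

d_x_mat = d_out @ y_mat.T   # dx = dout * y', shape (6, 20)
d_y_mat = x_mat.T @ d_out   # dy = x' * dout, shape (20, 7)

# The kernel writes these results through matrix views of dx/dy, so the
# gradients keep the original (unflattened) shapes of X and Y.
```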
## Operator's Parameter Name Convention
To make the operator documentation clearer, we recommend that operator names obey the following conventions.
### OpProtoMaker names
When defining an operator in Paddle, a corresponding [OpProtoMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L170) (TODO: OpProtoMaker Doc) needs to be defined. All the Inputs/Outputs and Attributes are written into the [OpProto](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L61), which is then used in the client language to create the operator.
- Input/Output.
- Input/Output names follow **CamelCase**, e.g. `X`, `Y`, `Matrix`, `LastAxisInMatrix`. Since Inputs/Outputs behave much like Variables, we prefer meaningful English words.
- If an operator's Inputs/Outputs are plain mathematical tensors that do not match any meaningful word, input names should start from `X` (e.g. `X`, `Y`) and output names should start from `Out` (e.g. `Out`). This rule is intended to unify operators that have only a few inputs/outputs.
- Attribute.
- Attribute names follow **camelCase**, e.g. `x`, `y`, `axis`, `rowwiseMatrix`. Again, meaningful English words are preferred.
- Comments.
- Input/Output/Attr comments follow the format **(type, default value) usage**, stating which type the argument can take and how it is used in the operator, e.g. the attribute `gamma` in Accumulate: `(float, default 1.0) Accumulation multiplier`.
- The operator comment uses the format `R"DOC(your comment here)DOC"`. Explain the operator's inputs/outputs first; if the operator performs a math calculation, write the equation in the comment, e.g. `Out = X + Y`.
- Order.
- Declare Inputs/Outputs first, then Attributes, then the Comment. See the example in the best practice below.
### Best Practice
Here are some examples showing how these rules are applied.
- An operator with one input and one output, e.g. `relu`: inputs: `X`, outputs: `Out`.
- An operator with two inputs and one output, e.g. `rowwise_add`: inputs: `X`, `Y`, outputs: `Out`.
- An operator with an attribute, e.g. `cosine`: inputs: `X`, attribute: `axis`, outputs: `Out`.
We give a full example of the Accumulate operator below.
```c++
class AccumulateOpMaker : public framework::OpProtoAndCheckerMaker {
public:
AccumulateOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor) The input tensor that has to be accumulated to the output tensor. If the output size is not the same as input size, the output tensor is first reshaped and initialized to zero, and only then, accumulation is done.");
AddOutput("Out", "(Tensor) Accumulated output tensor");
AddAttr<float>("gamma", "(float, default 1.0) Accumulation multiplier");
AddComment(R"DOC(
Accumulate operator accumulates the input tensor to the output tensor. If the
output tensor already has the right size, we add to it; otherwise, we first
initialize the output tensor to all zeros, and then do accumulation. Any
further calls to the operator, given that no one else fiddles with the output
in the interim, will do simple accumulations.
Accumulation is done as shown:
Out = 1*X + gamma*Out
where X is the input tensor, Out is the output tensor and gamma is the multiplier
argument.
)DOC");
}
};
```
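
The names declared in the maker are also what the Python binding refers to. Below is a minimal, illustrative sketch of running such an operator from Python; it assumes an `accumulate` operator registered with the maker above and uses the `paddle.v2.framework` test-style API, so treat the exact calls as an approximation rather than a reference:

```python
import numpy as np
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator

scope = core.Scope()
place = core.CPUPlace()

x = scope.new_var("X").get_tensor()
x.set_dims([2, 3])
x.set(np.ones((2, 3)).astype("float32"), place)
scope.new_var("Out")

# Input/Output/Attr names match the ones declared in AccumulateOpMaker.
op = Operator("accumulate", X="X", Out="Out", gamma=2.0)
op.infer_shape(scope)
op.run(scope, core.DeviceContext.create(place))
```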
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/reshape_op.h"
namespace paddle {
namespace operators {
class ReshapeOp : public framework::OperatorWithKernel {
public:
ReshapeOp(const std::string &type, const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: OperatorWithKernel(type, inputs, outputs, attrs) {}
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
// input check
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) shouldn't be null");
auto shape = ctx.Attr<std::vector<int>>("shape");
PADDLE_ENFORCE(shape.size() > 0, "Attr(shape) shouldn't be empty.");
for (auto dim : shape) {
PADDLE_ENFORCE(dim > 0, "Each dimension of shape must be positive.");
}
// capacity check
int64_t capacity =
std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>());
auto *in = ctx.Input<framework::Tensor>("X");
int64_t in_size = framework::product(in->dims());
PADDLE_ENFORCE_EQ(capacity, in_size,
"The size of Input(X) mismatches with Attr(shape).");
// resize output
std::vector<int64_t> shape_int64(shape.size(), 0);
std::transform(shape.begin(), shape.end(), shape_int64.begin(),
[](int a) { return static_cast<int64_t>(a); });
auto out_dims = framework::make_ddim(shape_int64);
ctx.Output<framework::Tensor>("Out")->Resize(out_dims);
}
};
class ReshapeOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ReshapeOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input tensor of reshape operator.");
AddOutput("Out", "The output tensor of reshape operator.");
AddAttr<std::vector<int>>("shape", "Target shape of reshape operator.");
AddComment(R"DOC(Reshape operator
Reshape Input(X) into the shape specified by Attr(shape).
An example:
Given a 2-D tensor X with 2 rows and 2 columns
[[1, 2], [3, 4]]
with target shape = [1, 4], the reshape operator will transform
the tensor X into a 1-D tensor:
[1, 2, 3, 4]
)DOC");
}
};
class ReshapeGradOp : public framework::OperatorWithKernel {
public:
ReshapeGradOp(const std::string &type,
const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: OperatorWithKernel(type, inputs, outputs, attrs) {}
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) shouldn't be null.");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
"Input(Out@GRAD) shouldn't be null.");
auto dims = ctx.Input<framework::Tensor>("X")->dims();
auto *d_in = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
d_in->Resize(dims);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(reshape, ops::ReshapeOp, ops::ReshapeOpMaker, reshape_grad,
ops::ReshapeGradOp);
REGISTER_OP_CPU_KERNEL(reshape,
ops::ReshapeKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
reshape_grad, ops::ReshapeGradKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/reshape_op.h"
REGISTER_OP_GPU_KERNEL(
reshape,
paddle::operators::ReshapeKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(
reshape_grad,
paddle::operators::ReshapeGradKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
template <typename Place, typename T>
class ReshapeKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const {
auto* out = ctx.Output<framework::Tensor>("Out");
auto* in = ctx.Input<framework::Tensor>("X");
out->mutable_data<T>(ctx.GetPlace());
auto shape = ctx.Attr<std::vector<int>>("shape");
std::vector<int64_t> shape_int64(shape.size(), 0);
std::transform(shape.begin(), shape.end(), shape_int64.begin(),
[](int a) { return static_cast<int64_t>(a); });
auto out_dims = framework::make_ddim(shape_int64);
out->CopyFrom<T>(*in, ctx.GetPlace());
out->Resize(out_dims);
}
};
template <typename Place, typename T>
class ReshapeGradKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const {
auto* d_out = ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto* d_x = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
d_x->mutable_data<T>(ctx.GetPlace());
auto in_dims = d_x->dims();
d_x->CopyFrom<T>(*d_out, ctx.GetPlace());
d_x->Resize(in_dims);
}
};
} // namespace operators
} // namespace paddle
...@@ -25,14 +25,19 @@ class RowwiseAddOp : public framework::OperatorWithKernel { ...@@ -25,14 +25,19 @@ class RowwiseAddOp : public framework::OperatorWithKernel {
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(const framework::InferShapeContext &ctx) const override {
auto dim0 = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx.Input<Tensor>("X")->dims();
auto dim1 = ctx.Input<Tensor>("b")->dims(); auto b_dims = ctx.Input<Tensor>("b")->dims();
PADDLE_ENFORCE_GT(
PADDLE_ENFORCE(dim0.size() == 2, "Input 0 must be matrix"); x_dims.size(), b_dims.size(),
PADDLE_ENFORCE(dim1.size() == 1, "The second input must be vector"); "The rank of input `X` must be larger than the one of input `b`.");
PADDLE_ENFORCE(dim0[1] == dim1[0], "The width of two input must be same");
PADDLE_ENFORCE(ctx.OutputSize("Out") == 1, "The output size must be 1"); int num_col_dims = x_dims.size() - b_dims.size();
ctx.Output<Tensor>("Out")->Resize(ctx.Input<Tensor>("X")->dims());
PADDLE_ENFORCE_EQ(
framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
"The width of two operands must be same");
PADDLE_ENFORCE_EQ(ctx.OutputSize("Out"), 1, "The output size must be 1");
ctx.Output<Tensor>("Out")->Resize(x_dims);
} }
}; };
...@@ -61,13 +66,20 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel { ...@@ -61,13 +66,20 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("b"), "b should not be null"); PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("b"), "b should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto dims0 = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx.Input<Tensor>("X")->dims();
auto dims1 = ctx.Input<Tensor>("b")->dims(); auto b_dims = ctx.Input<Tensor>("b")->dims();
PADDLE_ENFORCE_EQ(1, dims1.size(), "b dims should be 1") PADDLE_ENFORCE_GT(
x_dims.size(), b_dims.size(),
"The rank of input `X` must be larger than the one of input `b`.");
int num_col_dims = x_dims.size() - b_dims.size();
PADDLE_ENFORCE_EQ(
framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
"The width of two operands must be same");
auto *dx = ctx.Output<Tensor>(framework::GradVarName("X")); auto *dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto *db = ctx.Output<Tensor>(framework::GradVarName("b")); auto *db = ctx.Output<Tensor>(framework::GradVarName("b"));
if (dx) dx->Resize(dims0); if (dx) dx->Resize(x_dims);
if (db) db->Resize(dims1); if (db) db->Resize(b_dims);
} }
}; };
......
...@@ -33,10 +33,12 @@ class RowwiseAddKernel : public framework::OpKernel { ...@@ -33,10 +33,12 @@ class RowwiseAddKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto out = context.Output<Tensor>("Out"); auto out = context.Output<Tensor>("Out");
out->mutable_data<T>(context.GetPlace()); out->mutable_data<T>(context.GetPlace());
int num_col_dims = context.Input<Tensor>("X")->dims().size() -
auto input = EigenMatrix<T>::From(*context.Input<Tensor>("X")); context.Input<Tensor>("b")->dims().size();
auto bias = EigenVector<T>::From(*context.Input<Tensor>("b")); auto input =
auto output = EigenMatrix<T>::From(*out); EigenMatrix<T>::Reshape(*context.Input<Tensor>("X"), num_col_dims);
auto bias = EigenVector<T>::Flatten(*context.Input<Tensor>("b"));
auto output = EigenMatrix<T>::Reshape(*out, num_col_dims);
const int bias_size = bias.dimension(0); const int bias_size = bias.dimension(0);
const int rest_size = input.size() / bias_size; const int rest_size = input.size() / bias_size;
...@@ -54,12 +56,15 @@ class RowwiseAddGradKernel : public framework::OpKernel { ...@@ -54,12 +56,15 @@ class RowwiseAddGradKernel : public framework::OpKernel {
auto* dout = context.Input<Tensor>(framework::GradVarName("Out")); auto* dout = context.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = context.Output<Tensor>(framework::GradVarName("X")); auto* dx = context.Output<Tensor>(framework::GradVarName("X"));
auto* db = context.Output<Tensor>(framework::GradVarName("b")); auto* db = context.Output<Tensor>(framework::GradVarName("b"));
int num_col_dims = context.Input<Tensor>("X")->dims().size() -
context.Input<Tensor>("b")->dims().size();
auto out_grad = EigenMatrix<T>::From(*dout); auto out_grad = EigenMatrix<T>::Reshape(*dout, num_col_dims);
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
if (dx) { if (dx) {
dx->mutable_data<T>(context.GetPlace()); dx->mutable_data<T>(context.GetPlace());
EigenMatrix<T>::From(*dx).device(place) = out_grad; EigenMatrix<T>::Reshape(*dx, num_col_dims).device(place) = out_grad;
} }
if (db) { if (db) {
......
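
With these changes, `rowwise_add` effectively broadcasts `b` over the leading dimensions of `X`, much like NumPy broadcasting on trailing axes. A small sketch of the forward and backward shapes (hypothetical values):

```python
import numpy as np

x = np.random.rand(4, 2, 3).astype("float32")   # rank 3
b = np.random.rand(2, 3).astype("float32")      # equals the trailing dims of x
out = x + b                                     # broadcast over the leading axis

d_out = np.ones_like(out)
d_x = d_out                                                   # same shape as x
d_b = d_out.reshape(-1, b.size).sum(axis=0).reshape(b.shape)  # summed over leading dims
```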
...@@ -41,8 +41,7 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel { ...@@ -41,8 +41,7 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel {
int rank = framework::arity(x_dims); int rank = framework::arity(x_dims);
PADDLE_ENFORCE_GE(rank, 2, "Tensor rank should be at least equal to 2."); PADDLE_ENFORCE_GE(rank, 2, "Tensor rank should be at least equal to 2.");
PADDLE_ENFORCE_EQ(framework::product(x_dims) / x_dims[0], PADDLE_ENFORCE_EQ(x->numel() / x_dims[0], y->numel() / y_dims[0],
framework::product(y_dims) / y_dims[0],
"Product of dimensions expcet the first dimension of " "Product of dimensions expcet the first dimension of "
"input and target must be equal."); "input and target must be equal.");
PADDLE_ENFORCE(y_dims[0] == 1 || y_dims[0] == x_dims[0], PADDLE_ENFORCE(y_dims[0] == 1 || y_dims[0] == x_dims[0],
...@@ -50,8 +49,7 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel { ...@@ -50,8 +49,7 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel {
"or to 1."); "or to 1.");
ctx.Output<Tensor>("sub_result") ctx.Output<Tensor>("sub_result")
->Resize({static_cast<int>(x_dims[0]), ->Resize({x_dims[0], x->numel() / x_dims[0]});
static_cast<int>(framework::product(x_dims) / x_dims[0])});
ctx.Output<Tensor>("Out")->Resize({x_dims[0], 1}); ctx.Output<Tensor>("Out")->Resize({x_dims[0], 1});
} }
}; };
......
...@@ -39,7 +39,7 @@ class SquaredL2DistanceKernel : public framework::OpKernel { ...@@ -39,7 +39,7 @@ class SquaredL2DistanceKernel : public framework::OpKernel {
auto in0_dims = in0->dims(); auto in0_dims = in0->dims();
auto in1_dims = in1->dims(); auto in1_dims = in1->dims();
int cols = framework::product(in0_dims) / in0_dims[0]; int cols = in0->numel() / in0_dims[0];
// reduce dimensions except the first // reduce dimensions except the first
auto x = auto x =
EigenMatrix<T>::From(*in0, framework::make_ddim({in0_dims[0], cols})); EigenMatrix<T>::From(*in0, framework::make_ddim({in0_dims[0], cols}));
...@@ -82,7 +82,7 @@ class SquaredL2DistanceGradKernel : public framework::OpKernel { ...@@ -82,7 +82,7 @@ class SquaredL2DistanceGradKernel : public framework::OpKernel {
auto x_dims = x_g->dims(); auto x_dims = x_g->dims();
auto y_dims = y_g->dims(); auto y_dims = y_g->dims();
int cols = framework::product(x_dims) / x_dims[0]; int cols = x_g->numel() / x_dims[0];
// calculate gradient // calculate gradient
auto grad_mat = 2 * auto grad_mat = 2 *
(out_grad.broadcast(Eigen::array<int, 2>({{1, cols}}))) * (out_grad.broadcast(Eigen::array<int, 2>({{1, cols}}))) *
......
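
For reference, `numel() / dims[0]` above is just the per-sample feature size; the op computes a per-row squared L2 distance. A NumPy sketch (hypothetical shapes; `Y` may either match `X`'s batch size or have a single row):

```python
import numpy as np

x = np.random.rand(5, 3, 2).astype("float32")
y = np.random.rand(1, 3, 2).astype("float32")    # broadcast against every row of x

cols = x[0].size                                 # numel / dims[0]
sub = x.reshape(5, cols) - y.reshape(1, cols)    # "sub_result", shape (5, cols)
out = (sub ** 2).sum(axis=1, keepdims=True)      # "Out", shape (5, 1)

d_out = np.ones((5, 1), dtype="float32")
d_x = 2.0 * d_out * sub                          # matches grad_mat in the kernel
```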
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/sum_op.h"
#include <vector>
namespace paddle {
namespace operators {
using framework::Tensor;
class SumOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
auto ins = ctx.MultiInput<framework::Tensor>("X");
auto *out = ctx.Output<framework::Tensor>("Out");
int N = ins.size();
auto in_dim = ins[0]->dims();
PADDLE_ENFORCE_GT(N, 1, "Input tensors count should > 1.");
for (int i = 1; i < N; i++) {
auto dim = ins[i]->dims();
PADDLE_ENFORCE(in_dim == dim, "Input tensors must have same shape");
}
out->Resize(in_dim);
}
};
class SumOpMaker : public framework::OpProtoAndCheckerMaker {
public:
SumOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "the input tensors of sum operator.").AsDuplicable();
AddOutput("Out", "the output tensor of sum operator.");
AddComment(R"DOC(
Sum the input tensors.
)DOC");
}
};
class SumGradOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
auto outputs = ctx.MultiOutput<Tensor>(framework::GradVarName("X"));
auto dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
for (auto output : outputs) {
output->Resize(dims);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(sum, ops::SumOp, ops::SumOpMaker, sum_grad, ops::SumGradOp);
REGISTER_OP_CPU_KERNEL(sum, ops::SumKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(sum_grad,
ops::SumGradKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/sum_op.h"
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(sum, ops::SumKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(sum_grad,
ops::SumGradKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename Place, typename T>
class SumKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto ins = context.MultiInput<Tensor>("X");
auto* out = context.Output<Tensor>("Out");
out->mutable_data<T>(context.GetPlace());
auto place = context.GetEigenDevice<Place>();
auto result = EigenVector<T>::Flatten(*out);
int N = ins.size();
auto in = EigenVector<T>::Flatten(*(ins[0]));
result.device(place) = in;
for (int i = 1; i < N; i++) {
auto in = EigenVector<T>::Flatten(*(ins[i]));
result.device(place) = result + in;
}
}
};
template <typename Place, typename T>
class SumGradKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* input = context.Input<Tensor>(framework::GradVarName("Out"));
auto outs = context.MultiOutput<Tensor>(framework::GradVarName("X"));
for (auto out : outs) {
out->mutable_data<T>(context.GetPlace());
}
auto place = context.GetEigenDevice<Place>();
auto in = EigenVector<T>::Flatten(*input);
for (auto out : outs) {
auto result = EigenVector<T>::Flatten(*out);
result.device(place) = in;
}
}
};
} // namespace operators
} // namespace paddle
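
Because summation is linear, the gradient of every input is simply `Out@GRAD`, which is exactly what `SumGradKernel` copies into each output. A one-line NumPy sketch of both passes (hypothetical shapes):

```python
import numpy as np

xs = [np.random.rand(2, 3).astype("float32") for _ in range(4)]
out = sum(xs)                       # forward: element-wise sum of all inputs
d_out = np.ones_like(out)
d_xs = [d_out.copy() for _ in xs]   # backward: each input just receives d_out
```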
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/top_k_op.h"
namespace paddle {
namespace operators {
class TopkOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"),
"Input of TopkOP must be initialized.");
auto *input = ctx.Input<framework::Tensor>("X");
const int k = static_cast<int>(ctx.Attr<int>("k"));
PADDLE_ENFORCE_GE(k, 1, "k must >= 1");
PADDLE_ENFORCE_GE(input->dims().size(), 1, "input must have >= 1d shape");
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k,
"input must have >= k columns");
framework::DDim dims = input->dims();
dims[dims.size() - 1] = k;
ctx.Output<Tensor>("Out")->Resize(dims);
ctx.Output<Tensor>("Indices")->Resize(dims);
}
};
class TopkOpMaker : public framework::OpProtoAndCheckerMaker {
public:
TopkOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of Topk op");
AddOutput("Out", "The output tensor of Topk op");
AddOutput("Indices", "The indices of Topk elements of input");
AddComment(
R"DOC(If the input is a vector (1d tensor), finds the k largest entries in the vector and outputs their values and indices as vectors. Thus values[j] is the j-th largest entry in input, and its index is indices[j].
For matrices, computes the top k entries in each row. )DOC");
AddAttr<int>("k",
"Number of top elements to look for along the last "
"dimension (along each row for matrices).")
.SetDefault(1);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(top_k, ops::TopkOp, ops::TopkOpMaker);
REGISTER_OP_CPU_KERNEL(top_k,
ops::TopkKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/op_registry.h"
#include "paddle/platform/assert.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
struct Pair {
__device__ __forceinline__ Pair() {}
__device__ __forceinline__ Pair(T value, int id) : v(value), id(id) {}
__device__ __forceinline__ void set(T value, int id) {
v = value;
    this->id = id;  // assign to the member, not the shadowing parameter
}
__device__ __forceinline__ void operator=(const Pair<T>& in) {
v = in.v;
id = in.id;
}
__device__ __forceinline__ bool operator<(const T value) const {
return (v < value);
}
__device__ __forceinline__ bool operator<(const Pair<T>& in) const {
return (v < in.v) || ((v == in.v) && (id > in.id));
}
__device__ __forceinline__ bool operator>(const Pair<T>& in) const {
return (v > in.v) || ((v == in.v) && (id < in.id));
}
T v;
int id;
};
template <typename T>
__device__ __forceinline__ void AddTo(Pair<T> topk[], const Pair<T>& p,
int beam_size) {
for (int k = beam_size - 2; k >= 0; k--) {
if (topk[k] < p) {
topk[k + 1] = topk[k];
} else {
topk[k + 1] = p;
return;
}
}
topk[0] = p;
}
template <typename T, int beam_size>
__device__ __forceinline__ void AddTo(Pair<T> topk[], const Pair<T>& p) {
for (int k = beam_size - 2; k >= 0; k--) {
if (topk[k] < p) {
topk[k + 1] = topk[k];
} else {
topk[k + 1] = p;
return;
}
}
topk[0] = p;
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* src, int idx,
int dim, int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < src[idx]) {
Pair<T> tmp(src[idx], idx);
AddTo<T>(topk, tmp, beam_size);
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* src, int idx,
int dim, const Pair<T>& max,
int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < src[idx]) {
Pair<T> tmp(src[idx], idx);
if (tmp < max) {
AddTo<T>(topk, tmp, beam_size);
}
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* val, int* col,
int idx, int dim, int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < val[idx]) {
Pair<T> tmp(val[idx], col[idx]);
AddTo<T>(topk, tmp, beam_size);
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* val, int* col,
int idx, int dim, const Pair<T>& max,
int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < val[idx]) {
Pair<T> tmp(val[idx], col[idx]);
if (tmp < max) {
AddTo<T>(topk, tmp, beam_size);
}
}
idx += BlockSize;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam,
int beam_size, const T* src,
bool& firstStep, bool& is_empty,
Pair<T>& max, int dim,
const int tid) {
if (beam > 0) {
int length = beam < beam_size ? beam : beam_size;
if (firstStep) {
firstStep = false;
GetTopK<T, BlockSize>(topk, src, tid, dim, length);
} else {
for (int k = 0; k < MaxLength; k++) {
if (k < MaxLength - beam) {
topk[k] = topk[k + beam];
} else {
topk[k].set(-INFINITY, -1);
}
}
if (!is_empty) {
GetTopK<T, BlockSize>(topk + MaxLength - beam, src, tid, dim, max,
length);
}
}
max = topk[MaxLength - 1];
if (max.v == -1) is_empty = true;
beam = 0;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam,
int beam_size, const T* val,
int* col, bool& firstStep,
bool& is_empty, Pair<T>& max,
int dim, const int tid) {
if (beam > 0) {
int length = beam < beam_size ? beam : beam_size;
if (firstStep) {
firstStep = false;
GetTopK<T, BlockSize>(topk, val, col, tid, dim, length);
} else {
for (int k = 0; k < MaxLength; k++) {
if (k < MaxLength - beam) {
topk[k] = topk[k + beam];
} else {
topk[k].set(-INFINITY, -1);
}
}
if (!is_empty) {
GetTopK<T, BlockSize>(topk + MaxLength - beam, val, col, tid, dim, max,
length);
}
}
max = topk[MaxLength - 1];
if (max.v == -1) is_empty = true;
beam = 0;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void BlockReduce(Pair<T>* sh_topk, int* maxid,
Pair<T> topk[], T** topVal,
int** topIds, int& beam, int& k,
const int tid, const int warp) {
while (true) {
__syncthreads();
if (tid < BlockSize / 2) {
if (sh_topk[tid] < sh_topk[tid + BlockSize / 2]) {
maxid[tid] = tid + BlockSize / 2;
} else {
maxid[tid] = tid;
}
}
__syncthreads();
for (int stride = BlockSize / 4; stride > 0; stride = stride / 2) {
if (tid < stride) {
if (sh_topk[maxid[tid]] < sh_topk[maxid[tid + stride]]) {
maxid[tid] = maxid[tid + stride];
}
}
__syncthreads();
}
__syncthreads();
if (tid == 0) {
**topVal = sh_topk[maxid[0]].v;
**topIds = sh_topk[maxid[0]].id;
(*topVal)++;
(*topIds)++;
}
if (tid == maxid[0]) beam++;
if (--k == 0) break;
__syncthreads();
if (tid == maxid[0]) {
if (beam < MaxLength) {
sh_topk[tid] = topk[beam];
}
}
if (maxid[0] / 32 == warp) {
if (__shfl(beam, (maxid[0]) % 32, 32) == MaxLength) break;
}
}
}
/**
 * Each block computes one sample.
 * In a block:
 * 1. every thread gets its top MaxLength values;
 * 2. merge them into sh_topk, block-reduce and get the max value;
 * 3. go to the second step, until one thread's top-k values are exhausted;
 * 4. go back to the first step, until the final top k values are obtained.
*/
template <typename T, int MaxLength, int BlockSize>
__global__ void KeMatrixTopK(T* output, int output_stride, int* indices,
const T* src, int lds, int dim, int k) {
__shared__ Pair<T> sh_topk[BlockSize];
__shared__ int maxid[BlockSize / 2];
const int tid = threadIdx.x;
const int warp = threadIdx.x / 32;
output += blockIdx.x * output_stride;
indices += blockIdx.x * k;
Pair<T> topk[MaxLength];
int beam = MaxLength;
Pair<T> max;
bool is_empty = false;
bool firststep = true;
for (int k = 0; k < MaxLength; k++) {
topk[k].set(-INFINITY, -1);
}
while (k) {
ThreadGetTopK<T, MaxLength, BlockSize>(topk, beam, k,
src + blockIdx.x * lds, firststep,
is_empty, max, dim, tid);
sh_topk[tid] = topk[0];
BlockReduce<T, MaxLength, BlockSize>(sh_topk, maxid, topk, &output,
&indices, beam, k, tid, warp);
}
}
template <typename T>
class TopkOpCUDAKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace.");
auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
size_t k = static_cast<int>(ctx.Attr<int>("k"));
const T* input_data = input->data<T>();
T* output_data = output->mutable_data<T>(ctx.GetPlace());
// FIXME(typhoonzero): data is always converted to type T?
int* indices_data = indices->mutable_data<int>(ctx.GetPlace());
size_t input_height = input->dims()[0];
size_t input_width = input->dims()[1];
if (k > input_width) k = input_width;
// NOTE: pass lds and dim same to input width.
// NOTE: old matrix implementation of stride is different to eigen.
// TODO(typhoonzero): launch kernel on specified stream.
// TODO(typhoonzero): refine this kernel.
dim3 threads(256, 1);
dim3 grid(input_height, 1);
KeMatrixTopK<T, 5, 256><<<grid, threads>>>(
output_data, output->dims()[1], indices_data, input_data, input_width,
input_width, int(k));
}
};
} // namespace operators
} // namespace paddle
REGISTER_OP_GPU_KERNEL(top_k, paddle::operators::TopkOpCUDAKernel<float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <iostream>
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename Place, typename T>
class TopkKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
// Get the top k elements of each row of input tensor
// FIXME: only deal with matrix(2d tensor).
auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
// k is determined by Attr
const size_t k = static_cast<int>(ctx.Attr<int>("k"));
T* output_data = output->mutable_data<T>(ctx.GetPlace());
T* indices_data = indices->mutable_data<T>(ctx.GetPlace());
auto eg_input = EigenMatrix<T>::From(*input);
    // reshape the input to a flattened matrix (like flat_inner_dims)
framework::DDim inputdims = input->dims();
const size_t row = framework::product(
framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];
Eigen::DSizes<int, 2> flat2dims(row, col);
// NOTE: eigen shape doesn't affect paddle tensor.
eg_input.reshape(flat2dims);
for (size_t i = 0; i < row; i++) {
std::vector<std::pair<T, size_t>> vec;
for (size_t j = 0; j < col; j++) {
vec.push_back(std::pair<T, size_t>(eg_input(i, j), j));
}
std::partial_sort(
vec.begin(), vec.begin() + k, vec.end(),
[](const std::pair<T, size_t>& l, const std::pair<T, size_t>& r) {
return l.first > r.first;
});
for (size_t j = 0; j < k; j++) {
output_data[i * k + j] = vec[j].first;
indices_data[i * k + j] = vec[j].second;
}
}
}
};
} // namespace operators
} // namespace paddle
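
The CPU kernel above amounts to taking the k largest entries of each row of the flattened matrix, together with their column indices. An equivalent NumPy sketch (hypothetical shapes; tie-breaking may differ slightly):

```python
import numpy as np

x = np.random.rand(4, 10).astype("float32")
k = 3

indices = np.argsort(-x, axis=1)[:, :k]           # column index of each top value
values = np.take_along_axis(x, indices, axis=1)   # top-k values, largest first
```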
...@@ -35,7 +35,7 @@ class CPUUniformRandomKernel : public framework::OpKernel { ...@@ -35,7 +35,7 @@ class CPUUniformRandomKernel : public framework::OpKernel {
std::uniform_real_distribution<T> dist( std::uniform_real_distribution<T> dist(
static_cast<T>(context.Attr<float>("min")), static_cast<T>(context.Attr<float>("min")),
static_cast<T>(context.Attr<float>("max"))); static_cast<T>(context.Attr<float>("max")));
int64_t size = framework::product(tensor->dims()); int64_t size = tensor->numel();
for (int64_t i = 0; i < size; ++i) { for (int64_t i = 0; i < size; ++i) {
data[i] = dist(engine); data[i] = dist(engine);
} }
......
...@@ -53,8 +53,8 @@ class GPUUniformRandomKernel : public framework::OpKernel { ...@@ -53,8 +53,8 @@ class GPUUniformRandomKernel : public framework::OpKernel {
T min = static_cast<T>(context.Attr<float>("min")); T min = static_cast<T>(context.Attr<float>("min"));
T max = static_cast<T>(context.Attr<float>("max")); T max = static_cast<T>(context.Attr<float>("max"));
thrust::counting_iterator<unsigned int> index_sequence_begin(0); thrust::counting_iterator<unsigned int> index_sequence_begin(0);
ssize_t N = framework::product(tensor->dims()); int64_t size = tensor->numel();
thrust::transform(index_sequence_begin, index_sequence_begin + N, thrust::transform(index_sequence_begin, index_sequence_begin + size,
thrust::device_ptr<T>(data), thrust::device_ptr<T>(data),
UniformGenerator<T>(min, max, seed)); UniformGenerator<T>(min, max, seed));
} }
......
...@@ -25,10 +25,6 @@ limitations under the License. */ ...@@ -25,10 +25,6 @@ limitations under the License. */
#include "paddle/string/printf.h" #include "paddle/string/printf.h"
#include "paddle/string/to_string.h" #include "paddle/string/to_string.h"
#ifdef __GNUC__
#include <cxxabi.h> // for __cxa_demangle
#endif
#ifndef PADDLE_ONLY_CPU #ifndef PADDLE_ONLY_CPU
#include "paddle/platform/dynload/cublas.h" #include "paddle/platform/dynload/cublas.h"
...@@ -46,19 +42,6 @@ limitations under the License. */ ...@@ -46,19 +42,6 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace platform { namespace platform {
namespace {
#ifdef __GNUC__
inline std::string demangle(std::string name) {
int status = -4; // some arbitrary value to eliminate the compiler warning
std::unique_ptr<char, void (*)(void*)> res{
abi::__cxa_demangle(name.c_str(), NULL, NULL, &status), std::free};
return (status == 0) ? res.get() : name;
}
#else
inline std::string demangle(std::string name) { return name; }
#endif
}
struct EnforceNotMet : public std::exception { struct EnforceNotMet : public std::exception {
std::exception_ptr exp_; std::exception_ptr exp_;
std::string err_str_; std::string err_str_;
...@@ -79,7 +62,7 @@ struct EnforceNotMet : public std::exception { ...@@ -79,7 +62,7 @@ struct EnforceNotMet : public std::exception {
Dl_info info; Dl_info info;
for (int i = 0; i < size; ++i) { for (int i = 0; i < size; ++i) {
if (dladdr(call_stack[i], &info)) { if (dladdr(call_stack[i], &info)) {
auto demangled = demangle(info.dli_sname); auto demangled = info.dli_sname;
auto addr_offset = static_cast<char*>(call_stack[i]) - auto addr_offset = static_cast<char*>(call_stack[i]) -
static_cast<char*>(info.dli_saddr); static_cast<char*>(info.dli_saddr);
sout << string::Sprintf("%-3d %*0p %s + %zd\n", i, sout << string::Sprintf("%-3d %*0p %s + %zd\n", i,
......
...@@ -17,6 +17,7 @@ limitations under the License. */ ...@@ -17,6 +17,7 @@ limitations under the License. */
#include <vector> #include <vector>
#include "paddle/framework/backward.h" #include "paddle/framework/backward.h"
#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/net_op.h" #include "paddle/operators/net_op.h"
#include "paddle/operators/recurrent_op.h" #include "paddle/operators/recurrent_op.h"
...@@ -34,6 +35,7 @@ USE_OP(add); ...@@ -34,6 +35,7 @@ USE_OP(add);
USE_OP(onehot_cross_entropy); USE_OP(onehot_cross_entropy);
USE_OP(sgd); USE_OP(sgd);
USE_OP(mul); USE_OP(mul);
USE_OP(elementwise_mul);
USE_OP(mean); USE_OP(mean);
USE_OP(sigmoid); USE_OP(sigmoid);
USE_OP(softmax); USE_OP(softmax);
...@@ -49,13 +51,19 @@ USE_OP(minus); ...@@ -49,13 +51,19 @@ USE_OP(minus);
USE_OP(cos_sim); USE_OP(cos_sim);
USE_CPU_ONLY_OP(gather); USE_CPU_ONLY_OP(gather);
USE_CPU_ONLY_OP(scatter); USE_CPU_ONLY_OP(scatter);
USE_CPU_ONLY_OP(concat);
USE_OP(top_k);
USE_OP(squared_l2_distance); USE_OP(squared_l2_distance);
USE_OP(sum);
USE_OP(reshape);
USE_OP(modified_huber_loss); USE_OP(modified_huber_loss);
namespace paddle { namespace paddle {
namespace framework { namespace framework {
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
using LoD = framework::LoD;
static size_t UniqueIntegerGenerator() { static size_t UniqueIntegerGenerator() {
static std::atomic<size_t> generator; static std::atomic<size_t> generator;
...@@ -115,6 +123,60 @@ PYBIND11_PLUGIN(core) { ...@@ -115,6 +123,60 @@ PYBIND11_PLUGIN(core) {
return self.data<float>()[offset]; return self.data<float>()[offset];
}); });
  py::class_<LoDTensor>(m, "LoDTensor", R"DOC(LoD (Level of Details) Tensor.
The tensor and LoD info should be created before creating the LoDTensor, then
call the set_tensor and set_lod functions to set them.
)DOC")
.def("__init__",
[](LoDTensor &instance,
const std::vector<std::vector<size_t>> &lod,
Tensor *t) {
#ifdef PADDLE_ONLY_CPU
new (&instance) LoDTensor(lod, t);
#else
paddle::framework::LoD new_lod;
new_lod.reserve(lod.size());
std::copy(lod.begin(), lod.end(), std::back_inserter(new_lod));
new (&instance) LoDTensor(new_lod, t);
#endif
})
.def("set_tensor",
[](LoDTensor &self, Tensor *tensor) { self.set_tensor(tensor); })
.def("set_lod",
[](LoDTensor &self, const std::vector<std::vector<size_t>> &lod) {
#ifdef PADDLE_ONLY_CPU
self.set_lod(lod);
#else
paddle::framework::LoD new_lod;
new_lod.reserve(lod.size());
std::copy(lod.begin(), lod.end(), std::back_inserter(new_lod));
self.set_lod(new_lod);
#endif
})
.def("tensor",
[](LoDTensor &self) -> Tensor & { return self.tensor(); },
py::return_value_policy::reference)
.def("lod", [](LoDTensor &self) -> std::vector<std::vector<size_t>> {
#ifdef PADDLE_ONLY_CPU
return self.lod();
#else
auto lod = self.lod();
std::vector<std::vector<size_t>> new_lod;
new_lod.reserve(lod.size());
std::transform(lod.begin(), lod.end(), std::back_inserter(new_lod),
[](paddle::framework::Vector<size_t> item) ->
std::vector<size_t> {
std::vector<size_t> v;
v.reserve(item.size());
std::copy(item.begin(), item.end(), std::back_inserter(v));
return v;
});
return new_lod;
#endif
});
py::class_<Variable>(m, "Variable", R"DOC(Variable Class. py::class_<Variable>(m, "Variable", R"DOC(Variable Class.
All parameter, weight, gradient are variables in Paddle. All parameter, weight, gradient are variables in Paddle.
...@@ -126,6 +188,11 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -126,6 +188,11 @@ All parameter, weight, gradient are variables in Paddle.
.def("get_tensor", .def("get_tensor",
[](Variable &self) -> Tensor * { return self.GetMutable<Tensor>(); }, [](Variable &self) -> Tensor * { return self.GetMutable<Tensor>(); },
py::return_value_policy::reference) py::return_value_policy::reference)
.def("get_lod_tensor",
[](Variable &self) -> LoDTensor * {
return self.GetMutable<LoDTensor>();
},
py::return_value_policy::reference)
.def("get_net", .def("get_net",
[](Variable &self) -> operators::NetOp * { [](Variable &self) -> operators::NetOp * {
return self.GetMutable<operators::NetOp>(); return self.GetMutable<operators::NetOp>();
...@@ -216,7 +283,10 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -216,7 +283,10 @@ All parameter, weight, gradient are variables in Paddle.
-> std::map<std::string, std::vector<std::string>> { -> std::map<std::string, std::vector<std::string>> {
return op.Outputs(); return op.Outputs();
}) })
.def("output_vars",
[](const OperatorBase &op) { return op.OutputVars(true); })
.def("inputs", [](const OperatorBase &op) { return op.Inputs(); }) .def("inputs", [](const OperatorBase &op) { return op.Inputs(); })
.def("input_vars", [](const OperatorBase &op) { return op.InputVars(); })
.def("__str__", &OperatorBase::DebugString) .def("__str__", &OperatorBase::DebugString)
.def("no_intermediate_outputs", .def("no_intermediate_outputs",
[](const OperatorBase &op) { return op.OutputVars(false); }) [](const OperatorBase &op) { return op.OutputVars(false); })
......
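
A minimal sketch of exercising the new `LoDTensor` bindings from Python (assuming the extension module is importable as `paddle.v2.framework.core`; variable names are chosen purely for illustration):

```python
import numpy as np
import paddle.v2.framework.core as core

place = core.CPUPlace()
scope = core.Scope()

tensor = scope.new_var("words").get_tensor()
tensor.set_dims([5, 1])
tensor.set(np.arange(5).reshape(5, 1).astype("float32"), place)

# Two sequences: [0, 2) and [2, 5).
lod_tensor = scope.new_var("words_lod").get_lod_tensor()
lod_tensor.set_tensor(tensor)
lod_tensor.set_lod([[0, 2, 5]])
print(lod_tensor.lod())   # [[0, 2, 5]]
```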
...@@ -30,6 +30,8 @@ Configuring cmake in /paddle/build ... ...@@ -30,6 +30,8 @@ Configuring cmake in /paddle/build ...
-DCMAKE_BUILD_TYPE=Release -DCMAKE_BUILD_TYPE=Release
-DWITH_DOC=OFF -DWITH_DOC=OFF
-DWITH_GPU=${WITH_GPU:-OFF} -DWITH_GPU=${WITH_GPU:-OFF}
-DWITH_MKLDNN=${WITH_MKLDNN:-ON}
-DWITH_MKLML=${WITH_MKLML:-ON}
-DWITH_AVX=${WITH_AVX:-OFF} -DWITH_AVX=${WITH_AVX:-OFF}
-DWITH_GOLANG=${WITH_GOLANG:-ON} -DWITH_GOLANG=${WITH_GOLANG:-ON}
-DWITH_SWIG_PY=ON -DWITH_SWIG_PY=ON
...@@ -37,7 +39,7 @@ Configuring cmake in /paddle/build ... ...@@ -37,7 +39,7 @@ Configuring cmake in /paddle/build ...
-DWITH_PYTHON=${WITH_PYTHON:-ON} -DWITH_PYTHON=${WITH_PYTHON:-ON}
-DWITH_SWIG_PY=${WITH_SWIG_PY:-ON} -DWITH_SWIG_PY=${WITH_SWIG_PY:-ON}
-DCUDNN_ROOT=/usr/ -DCUDNN_ROOT=/usr/
-DWITH_STYLE_CHECK=${WITH_STYLE_CHECK:-OFF} -DWITH_STYLE_CHECK=${WITH_STYLE_CHECK:-ON}
-DWITH_TESTING=${WITH_TESTING:-ON} -DWITH_TESTING=${WITH_TESTING:-ON}
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
======================================== ========================================
...@@ -50,6 +52,8 @@ cmake .. \ ...@@ -50,6 +52,8 @@ cmake .. \
-DCMAKE_BUILD_TYPE=Release \ -DCMAKE_BUILD_TYPE=Release \
-DWITH_DOC=OFF \ -DWITH_DOC=OFF \
-DWITH_GPU=${WITH_GPU:-OFF} \ -DWITH_GPU=${WITH_GPU:-OFF} \
-DWITH_MKLDNN=${WITH_MKLDNN:-ON} \
-DWITH_MKLML=${WITH_MKLML:-ON} \
-DWITH_AVX=${WITH_AVX:-OFF} \ -DWITH_AVX=${WITH_AVX:-OFF} \
-DWITH_GOLANG=${WITH_GOLANG:-ON} \ -DWITH_GOLANG=${WITH_GOLANG:-ON} \
-DWITH_SWIG_PY=${WITH_SWIG_PY:-ON} \ -DWITH_SWIG_PY=${WITH_SWIG_PY:-ON} \
......
...@@ -320,6 +320,9 @@ void loadFileList(const std::string& fileListFileName, ...@@ -320,6 +320,9 @@ void loadFileList(const std::string& fileListFileName,
} }
double getMemoryUsage() { double getMemoryUsage() {
#if defined(__ANDROID__)
return 0.0;
#else
FILE* fp = fopen("/proc/meminfo", "r"); FILE* fp = fopen("/proc/meminfo", "r");
CHECK(fp) << "failed to fopen /proc/meminfo"; CHECK(fp) << "failed to fopen /proc/meminfo";
size_t bufsize = 256 * sizeof(char); size_t bufsize = 256 * sizeof(char);
...@@ -357,6 +360,7 @@ double getMemoryUsage() { ...@@ -357,6 +360,7 @@ double getMemoryUsage() {
delete[] buf; delete[] buf;
double usedMem = 1.0 - 1.0 * (freeMem + bufMem + cacheMem) / totalMem; double usedMem = 1.0 - 1.0 * (freeMem + bufMem + cacheMem) / totalMem;
return usedMem; return usedMem;
#endif
} }
SyncThreadPool* getGlobalSyncThreadPool() { SyncThreadPool* getGlobalSyncThreadPool() {
......
...@@ -33,6 +33,13 @@ limitations under the License. */ ...@@ -33,6 +33,13 @@ limitations under the License. */
#include "Flags.h" #include "Flags.h"
#include "hl_gpu.h" #include "hl_gpu.h"
#if defined(__ANDROID__) && (__ANDROID_API__ < 21)
inline int rand_r(unsigned int* seedp) {
(void)seedp;
return rand();
}
#endif
/** /**
* Loop over the elements in a container * Loop over the elements in a container
* TODO(yuyang18): It's this foreach useful? Why not use C++ 11 foreach, * TODO(yuyang18): It's this foreach useful? Why not use C++ 11 foreach,
......
...@@ -271,6 +271,7 @@ message ImageConfig { ...@@ -271,6 +271,7 @@ message ImageConfig {
// The size of input feature map. // The size of input feature map.
required uint32 img_size = 8; required uint32 img_size = 8;
optional uint32 img_size_y = 9; optional uint32 img_size_y = 9;
optional uint32 img_size_z = 10 [ default = 1 ];
} }
message PriorBoxConfig { message PriorBoxConfig {
...@@ -288,8 +289,8 @@ message PadConfig { ...@@ -288,8 +289,8 @@ message PadConfig {
} }
message ReshapeConfig { message ReshapeConfig {
repeated uint32 heightAxis = 1; repeated uint32 height_axis = 1;
repeated uint32 widthAxis = 2; repeated uint32 width_axis = 2;
} }
message MultiBoxLossConfig { message MultiBoxLossConfig {
...@@ -519,6 +520,7 @@ message LayerConfig { ...@@ -519,6 +520,7 @@ message LayerConfig {
// for HuberRegressionLoss // for HuberRegressionLoss
optional double delta = 57 [ default = 1.0 ]; optional double delta = 57 [ default = 1.0 ];
// for 3D data
optional uint64 depth = 58 [ default = 1 ]; optional uint64 depth = 58 [ default = 1 ];
// for switch order layer // for switch order layer
......
...@@ -1332,6 +1332,12 @@ def parse_image(image, input_layer_name, image_conf): ...@@ -1332,6 +1332,12 @@ def parse_image(image, input_layer_name, image_conf):
get_img_size(input_layer_name, image_conf.channels) get_img_size(input_layer_name, image_conf.channels)
def parse_image3d(image, input_layer_name, image_conf):
image_conf.channels = image.channels
image_conf.img_size, image_conf.img_size_y, image_conf.img_size_z = \
get_img3d_size(input_layer_name, image_conf.channels)
def parse_norm(norm, input_layer_name, norm_conf): def parse_norm(norm, input_layer_name, norm_conf):
norm_conf.norm_type = norm.norm_type norm_conf.norm_type = norm.norm_type
config_assert( config_assert(
...@@ -2028,6 +2034,7 @@ class ParameterReluLayer(LayerBase): ...@@ -2028,6 +2034,7 @@ class ParameterReluLayer(LayerBase):
config_assert(input_layer.size % partial_sum == 0, config_assert(input_layer.size % partial_sum == 0,
"a wrong setting for partial_sum") "a wrong setting for partial_sum")
self.set_layer_size(input_layer.size) self.set_layer_size(input_layer.size)
self.config.partial_sum = partial_sum
self.create_input_parameter(0, input_layer.size / partial_sum) self.create_input_parameter(0, input_layer.size / partial_sum)
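The `partial_sum` bookkeeping above is easy to summarize on its own. Below is a small standalone sketch (not part of the patch) of how the PReLU parameter size is derived from the layer size; the numbers mirror the expected protostr added further down in this diff (size 300 with `partial_sum=5` yields a 60-element parameter).

```python
# Standalone sketch of the PReLU parameter sizing logic (illustrative only).
def prelu_param_size(layer_size, partial_sum):
    # Mirrors the config_assert above: the layer size must be divisible.
    assert layer_size % partial_sum == 0, "a wrong setting for partial_sum"
    return layer_size // partial_sum

assert prelu_param_size(300, 1) == 300  # one weight per element
assert prelu_param_size(300, 5) == 60   # matches ___prelu_layer_2__.w0 below
```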
...@@ -2365,9 +2372,11 @@ class BatchNormLayer(LayerBase): ...@@ -2365,9 +2372,11 @@ class BatchNormLayer(LayerBase):
name, name,
inputs, inputs,
bias=True, bias=True,
img3D=False,
use_global_stats=True, use_global_stats=True,
moving_average_fraction=0.9, moving_average_fraction=0.9,
batch_norm_type=None, batch_norm_type=None,
mean_var_names=None,
**xargs): **xargs):
if inputs is None: if inputs is None:
inputs = [] inputs = []
...@@ -2409,24 +2418,69 @@ class BatchNormLayer(LayerBase): ...@@ -2409,24 +2418,69 @@ class BatchNormLayer(LayerBase):
input_layer = self.get_input_layer(0) input_layer = self.get_input_layer(0)
image_conf = self.config.inputs[0].image_conf image_conf = self.config.inputs[0].image_conf
parse_image(self.inputs[0].image, input_layer.name, image_conf) if img3D:
parse_image3d(self.inputs[0].image, input_layer.name, image_conf)
# Only pass the width and height of input to batch_norm layer # Only pass the width and height of input to batch_norm layer
# when either of it is non-zero. # when either of it is non-zero.
if input_layer.width != 0 or input_layer.height != 0: if input_layer.width != 0 or input_layer.height != 0:
self.set_cnn_layer(name, image_conf.img_size_y, image_conf.img_size, self.set_cnn_layer(
image_conf.channels, False) input_layer_name=name,
depth=image_conf.img_size_z,
height=image_conf.img_size_y,
width=image_conf.img_size,
channels=image_conf.channels,
is_print=True)
else:
self.set_layer_size(input_layer.size)
else: else:
self.set_layer_size(input_layer.size) parse_image(self.inputs[0].image, input_layer.name, image_conf)
# Only pass the width and height of input to batch_norm layer
# when either of it is non-zero.
if input_layer.width != 0 or input_layer.height != 0:
self.set_cnn_layer(
input_layer_name=name,
height=image_conf.img_size_y,
width=image_conf.img_size,
channels=image_conf.channels,
is_print=True)
else:
self.set_layer_size(input_layer.size)
psize = self.calc_parameter_size(image_conf) psize = self.calc_parameter_size(image_conf)
dims = [1, psize] dims = [1, psize]
if mean_var_names is not None:
assert len(mean_var_names) == 2
self.inputs[1].parameter_name = mean_var_names[0]
self.inputs[2].parameter_name = mean_var_names[1]
self.create_input_parameter(0, psize) self.create_input_parameter(0, psize)
self.create_input_parameter(1, psize, dims) self.create_input_parameter(1, psize, dims)
self.create_input_parameter(2, psize, dims) self.create_input_parameter(2, psize, dims)
self.create_bias_parameter(bias, psize) self.create_bias_parameter(bias, psize)
def set_cnn_layer(self,
input_layer_name,
depth=None,
height=None,
width=None,
channels=None,
is_print=True):
depthIsNone = False
if depth is None:
depth = 1
depthIsNone = True
size = depth * height * width * channels
self.set_layer_size(size)
self.set_layer_height_width(height, width)
self.set_layer_depth(depth)
if is_print and depthIsNone:
print("output for %s: c = %d, h = %d, w = %d, size = %d" %
(input_layer_name, channels, height, width, size))
elif is_print:
print("output for %s: c = %d, d = %d, h = %d, w = %d, size = %d" %
(input_layer_name, channels, depth, height, width, size))
def calc_parameter_size(self, image_conf): def calc_parameter_size(self, image_conf):
return image_conf.channels return image_conf.channels
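For reference, the size bookkeeping that the new `set_cnn_layer` performs reduces to a single product. The sketch below (not part of the patch) shows that with `depth` left unset the behaviour matches the old 2D path, and that the 3D test case added later in this diff (channels 1, depth 3, height 6, width 20) gives a layer size of 360.

```python
# Minimal sketch of the layer-size computation in set_cnn_layer (illustrative).
def cnn_layer_size(channels, height, width, depth=None):
    depth = 1 if depth is None else depth   # depth defaults to 1 (2D case)
    return depth * height * width * channels

assert cnn_layer_size(1, 6, 20) == 120           # 2D: same as before
assert cnn_layer_size(1, 6, 20, depth=3) == 360  # 3D batch norm test config
```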
...@@ -2688,9 +2742,20 @@ class AddToLayer(LayerBase): ...@@ -2688,9 +2742,20 @@ class AddToLayer(LayerBase):
super(AddToLayer, self).__init__( super(AddToLayer, self).__init__(
name, 'addto', 0, inputs=inputs, **xargs) name, 'addto', 0, inputs=inputs, **xargs)
config_assert(len(inputs) > 0, 'inputs cannot be empty for AddToLayer') config_assert(len(inputs) > 0, 'inputs cannot be empty for AddToLayer')
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index) if len(self.inputs) > 1:
self.set_layer_size(input_layer.size) for input_index in xrange(len(self.inputs)):
assert self.get_input_layer(0).height == self.get_input_layer(
input_index).height
assert self.get_input_layer(0).width == self.get_input_layer(
input_index).width
assert self.get_input_layer(0).depth == self.get_input_layer(
input_index).depth
self.set_layer_size(self.get_input_layer(0).size)
self.set_layer_height_width(self.get_input_layer(0).height, \
self.get_input_layer(0).width)
self.set_layer_depth(self.get_input_layer(0).depth)
self.create_bias_parameter(bias, self.config.size) self.create_bias_parameter(bias, self.config.size)
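The change above means that a multi-input `addto_layer` now requires all inputs to agree on height, width and depth, and the output inherits those dimensions from the first input. A hedged configuration sketch follows; the layer names and sizes are illustrative, not taken from the patch.

```python
# Sketch of the new constraint on addto_layer with multiple inputs.
from paddle.trainer_config_helpers import *

settings(batch_size=1000, learning_rate=1e-4)

a = data_layer(name='a', size=32 * 32, height=32, width=32)
b = data_layer(name='b', size=32 * 32, height=32, width=32)

# OK: both inputs share height/width/depth, so the output keeps 32 x 32.
s = addto_layer(input=[a, b])
outputs(s)
```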
...@@ -3370,11 +3435,20 @@ class ConcatenateLayer(LayerBase): ...@@ -3370,11 +3435,20 @@ class ConcatenateLayer(LayerBase):
name, 'concat', 0, inputs=inputs, **xargs) name, 'concat', 0, inputs=inputs, **xargs)
size = 0 size = 0
for input_index in xrange(len(self.inputs)): for input_index in xrange(len(self.inputs)):
assert self.get_input_layer(0).height == self.get_input_layer(
input_index).height
assert self.get_input_layer(0).width == self.get_input_layer(
input_index).width
assert self.get_input_layer(0).depth == self.get_input_layer(
input_index).depth
input_layer = self.get_input_layer(input_index) input_layer = self.get_input_layer(input_index)
input = self.inputs[input_index] input = self.inputs[input_index]
if self.config.size == 0: if self.config.size == 0:
size += input_layer.size size += input_layer.size
self.set_layer_height_width(self.get_input_layer(0).height, \
self.get_input_layer(0).width)
self.set_layer_depth(self.get_input_layer(0).depth)
self.set_layer_size(size) self.set_layer_size(size)
...@@ -3675,8 +3749,8 @@ class SwitchOrderLayer(LayerBase): ...@@ -3675,8 +3749,8 @@ class SwitchOrderLayer(LayerBase):
def __init__(self, name, inputs, reshape, **xargs): def __init__(self, name, inputs, reshape, **xargs):
super(SwitchOrderLayer, self).__init__( super(SwitchOrderLayer, self).__init__(
name, 'switch_order', 0, inputs=inputs, **xargs) name, 'switch_order', 0, inputs=inputs, **xargs)
self.config.reshape_conf.heightAxis.extend(reshape['height']) self.config.reshape_conf.height_axis.extend(reshape['height'])
self.config.reshape_conf.widthAxis.extend(reshape['width']) self.config.reshape_conf.width_axis.extend(reshape['width'])
# Deprecated, use a new layer specific class instead # Deprecated, use a new layer specific class instead
......
...@@ -354,6 +354,10 @@ class LayerOutput(object): ...@@ -354,6 +354,10 @@ class LayerOutput(object):
def height(self): def height(self):
return cp.g_layer_map[self.full_name].height return cp.g_layer_map[self.full_name].height
@property
def depth(self):
return cp.g_layer_map[self.full_name].depth
def set_input(self, input): def set_input(self, input):
""" """
Set the input for a memory layer. Can only be used for memory layer Set the input for a memory layer. Can only be used for memory layer
...@@ -943,7 +947,7 @@ def data_layer(name, size, depth=None, height=None, width=None, ...@@ -943,7 +947,7 @@ def data_layer(name, size, depth=None, height=None, width=None,
if height is not None and width is not None: if height is not None and width is not None:
num_filters = size / (width * height * depth) num_filters = size / (width * height * depth)
assert num_filters * width * height * depth == size, \ assert num_filters * width * height * depth == size, \
"size=%s width=%s height=%s depth=%s" % (size, width, height, depth) "size=%s width=%s height=%s depth=%s" % (size, width, height, depth)
return LayerOutput(name, LayerType.DATA, size=size, num_filters=num_filters) return LayerOutput(name, LayerType.DATA, size=size, num_filters=num_filters)
...@@ -1219,7 +1223,8 @@ def detection_output_layer(input_loc, ...@@ -1219,7 +1223,8 @@ def detection_output_layer(input_loc,
name=None): name=None):
""" """
    Apply NMS to the output of the network and compute the predicted bounding     Apply NMS to the output of the network and compute the predicted bounding
    box locations.     box locations. The output of this layer could be None if there is no valid
    bounding box.
:param name: The Layer Name. :param name: The Layer Name.
:type name: basestring :type name: basestring
...@@ -2953,13 +2958,15 @@ def img_cmrnorm_layer(input, ...@@ -2953,13 +2958,15 @@ def img_cmrnorm_layer(input,
def batch_norm_layer(input, def batch_norm_layer(input,
act=None, act=None,
name=None, name=None,
img3D=False,
num_channels=None, num_channels=None,
bias_attr=None, bias_attr=None,
param_attr=None, param_attr=None,
layer_attr=None, layer_attr=None,
batch_norm_type=None, batch_norm_type=None,
moving_average_fraction=0.9, moving_average_fraction=0.9,
use_global_stats=None): use_global_stats=None,
mean_var_names=None):
""" """
    Batch Normalization Layer. The notation of this layer is as follows.     Batch Normalization Layer. The notation of this layer is as follows.
...@@ -3026,6 +3033,8 @@ def batch_norm_layer(input, ...@@ -3026,6 +3033,8 @@ def batch_norm_layer(input,
:math:`runningMean = newMean*(1-factor) :math:`runningMean = newMean*(1-factor)
+ runningMean*factor` + runningMean*factor`
:type moving_average_fraction: float. :type moving_average_fraction: float.
:param mean_var_names: [mean name, variance name]
:type mean_var_names: string list
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
...@@ -3039,6 +3048,7 @@ def batch_norm_layer(input, ...@@ -3039,6 +3048,7 @@ def batch_norm_layer(input,
(batch_norm_type == "cudnn_batch_norm") (batch_norm_type == "cudnn_batch_norm")
l = Layer( l = Layer(
name=name, name=name,
img3D=img3D,
inputs=Input( inputs=Input(
input.name, image=Image(channels=num_channels), **param_attr.attr), input.name, image=Image(channels=num_channels), **param_attr.attr),
active_type=act.name, active_type=act.name,
...@@ -3047,6 +3057,7 @@ def batch_norm_layer(input, ...@@ -3047,6 +3057,7 @@ def batch_norm_layer(input,
bias=ParamAttr.to_bias(bias_attr), bias=ParamAttr.to_bias(bias_attr),
moving_average_fraction=moving_average_fraction, moving_average_fraction=moving_average_fraction,
use_global_stats=use_global_stats, use_global_stats=use_global_stats,
mean_var_names=mean_var_names,
**ExtraLayerAttribute.to_kwargs(layer_attr)) **ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput( return LayerOutput(
...@@ -6410,7 +6421,7 @@ def gated_unit_layer(input, ...@@ -6410,7 +6421,7 @@ def gated_unit_layer(input,
@wrap_name_default('switch_order') @wrap_name_default('switch_order')
def switch_order_layer(input, def switch_order_layer(input,
name=None, name=None,
reshape=None, reshape_axis=None,
act=None, act=None,
layer_attr=None): layer_attr=None):
""" """
...@@ -6421,8 +6432,9 @@ def switch_order_layer(input, ...@@ -6421,8 +6432,9 @@ def switch_order_layer(input,
The example usage is: The example usage is:
.. code-block:: python .. code-block:: python
reshape_axis = 3
switch = switch_order(input=layer, name='switch', reshape_axis=reshape_axis)
reshape = {'height':[ 0, 1, 2], 'width':[3]} reshape = {'height':[ 0, 1, 2], 'width':[3]}
switch = switch_order(input=layer, name='switch', reshape=reshape)
:param input: The input layer. :param input: The input layer.
:type input: LayerOutput :type input: LayerOutput
...@@ -6434,6 +6446,11 @@ def switch_order_layer(input, ...@@ -6434,6 +6446,11 @@ def switch_order_layer(input,
:rtype: LayerOutput :rtype: LayerOutput
""" """
assert isinstance(input, LayerOutput) assert isinstance(input, LayerOutput)
assert reshape_axis != None and (reshape_axis > 0 and reshape_axis < 4)
height = [ele for ele in xrange(reshape_axis)]
width = [ele for ele in range(reshape_axis, 4)]
reshape = {'height': height, 'width': width}
l = Layer( l = Layer(
name=name, name=name,
inputs=input.name, inputs=input.name,
...@@ -6444,6 +6461,7 @@ def switch_order_layer(input, ...@@ -6444,6 +6461,7 @@ def switch_order_layer(input,
return LayerOutput( return LayerOutput(
name=name, name=name,
layer_type=LayerType.SWITCH_ORDER_LAYER, layer_type=LayerType.SWITCH_ORDER_LAYER,
activation=act,
parents=input, parents=input,
size=l.config.size) size=l.config.size)
......
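The translation from the new `reshape_axis` argument to the old height/width axis lists is worth spelling out. Here is a tiny standalone sketch (not part of the patch) of that mapping:

```python
# How reshape_axis is expanded into the legacy reshape dict (illustrative).
def reshape_from_axis(reshape_axis):
    assert reshape_axis is not None and 0 < reshape_axis < 4
    return {
        'height': [axis for axis in range(reshape_axis)],
        'width': [axis for axis in range(reshape_axis, 4)],
    }

assert reshape_from_axis(3) == {'height': [0, 1, 2], 'width': [3]}
```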
...@@ -10,6 +10,6 @@ test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_la ...@@ -10,6 +10,6 @@ test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_la
test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer
test_kmax_seq_socre_layer test_sub_nested_seq_select_layer test_scale_shift_layer test_kmax_seq_socre_layer test_sub_nested_seq_select_layer test_scale_shift_layer
test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer
test_conv3d_layer test_deconv3d_layer) test_conv3d_layer test_deconv3d_layer test_BatchNorm3D)
export whole_configs=(test_split_datasource) export whole_configs=(test_split_datasource)
...@@ -62,6 +62,7 @@ layers { ...@@ -62,6 +62,7 @@ layers {
moving_average_fraction: 0.9 moving_average_fraction: 0.9
height: 227 height: 227
width: 227 width: 227
depth: 1
} }
layers { layers {
name: "__crmnorm_0__" name: "__crmnorm_0__"
......
...@@ -62,6 +62,7 @@ layers { ...@@ -62,6 +62,7 @@ layers {
moving_average_fraction: 0.9 moving_average_fraction: 0.9
height: 256 height: 256
width: 256 width: 256
depth: 1
} }
layers { layers {
name: "__crmnorm_0__" name: "__crmnorm_0__"
......
type: "nn"
layers {
name: "data3D"
type: "data"
size: 360
active_type: ""
height: 6
width: 20
depth: 3
}
layers {
name: "__batch_norm_0__"
type: "batch_norm"
size: 360
active_type: "relu"
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w0"
image_conf {
channels: 1
img_size: 20
img_size_y: 6
img_size_z: 3
}
}
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w1"
}
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w2"
}
bias_parameter_name: "___batch_norm_0__.wbias"
moving_average_fraction: 0.9
height: 6
width: 20
depth: 3
}
parameters {
name: "___batch_norm_0__.w0"
size: 1
initial_mean: 1.0
initial_std: 0.0
initial_strategy: 0
initial_smart: false
}
parameters {
name: "___batch_norm_0__.w1"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
is_static: true
is_shared: true
}
parameters {
name: "___batch_norm_0__.w2"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
is_static: true
is_shared: true
}
parameters {
name: "___batch_norm_0__.wbias"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
}
input_layer_names: "data3D"
output_layer_names: "__batch_norm_0__"
sub_models {
name: "root"
layer_names: "data3D"
layer_names: "__batch_norm_0__"
input_layer_names: "data3D"
output_layer_names: "__batch_norm_0__"
is_recurrent_layer_group: false
}
...@@ -74,6 +74,9 @@ layers { ...@@ -74,6 +74,9 @@ layers {
inputs { inputs {
input_layer_name: "__bidirectional_gru_0___bw" input_layer_name: "__bidirectional_gru_0___bw"
} }
height: 0
width: 0
depth: 1
} }
parameters { parameters {
name: "___bidirectional_gru_0___fw_transform.w0" name: "___bidirectional_gru_0___fw_transform.w0"
......
...@@ -14,6 +14,29 @@ layers { ...@@ -14,6 +14,29 @@ layers {
input_layer_name: "input" input_layer_name: "input"
input_parameter_name: "___prelu_layer_0__.w0" input_parameter_name: "___prelu_layer_0__.w0"
} }
partial_sum: 1
}
layers {
name: "__prelu_layer_1__"
type: "prelu"
size: 300
active_type: ""
inputs {
input_layer_name: "input"
input_parameter_name: "___prelu_layer_1__.w0"
}
partial_sum: 1
}
layers {
name: "__prelu_layer_2__"
type: "prelu"
size: 300
active_type: ""
inputs {
input_layer_name: "input"
input_parameter_name: "___prelu_layer_2__.w0"
}
partial_sum: 5
} }
parameters { parameters {
name: "___prelu_layer_0__.w0" name: "___prelu_layer_0__.w0"
...@@ -23,14 +46,32 @@ parameters { ...@@ -23,14 +46,32 @@ parameters {
initial_strategy: 0 initial_strategy: 0
initial_smart: true initial_smart: true
} }
parameters {
name: "___prelu_layer_1__.w0"
size: 300
initial_mean: 0.0
initial_std: 0.057735026919
initial_strategy: 0
initial_smart: true
}
parameters {
name: "___prelu_layer_2__.w0"
size: 60
initial_mean: 0.0
initial_std: 0.129099444874
initial_strategy: 0
initial_smart: true
}
input_layer_names: "input" input_layer_names: "input"
output_layer_names: "__prelu_layer_0__" output_layer_names: "__prelu_layer_2__"
sub_models { sub_models {
name: "root" name: "root"
layer_names: "input" layer_names: "input"
layer_names: "__prelu_layer_0__" layer_names: "__prelu_layer_0__"
layer_names: "__prelu_layer_1__"
layer_names: "__prelu_layer_2__"
input_layer_names: "input" input_layer_names: "input"
output_layer_names: "__prelu_layer_0__" output_layer_names: "__prelu_layer_2__"
is_recurrent_layer_group: false is_recurrent_layer_group: false
} }
...@@ -16,6 +16,9 @@ layers { ...@@ -16,6 +16,9 @@ layers {
inputs { inputs {
input_layer_name: "data" input_layer_name: "data"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_1__" name: "__addto_1__"
...@@ -28,6 +31,9 @@ layers { ...@@ -28,6 +31,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_0__" input_layer_name: "__addto_0__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_2__" name: "__addto_2__"
...@@ -40,6 +46,9 @@ layers { ...@@ -40,6 +46,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_1__" input_layer_name: "__addto_1__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_3__" name: "__addto_3__"
...@@ -52,6 +61,9 @@ layers { ...@@ -52,6 +61,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_2__" input_layer_name: "__addto_2__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_4__" name: "__addto_4__"
...@@ -64,6 +76,9 @@ layers { ...@@ -64,6 +76,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_3__" input_layer_name: "__addto_3__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_5__" name: "__addto_5__"
...@@ -76,6 +91,9 @@ layers { ...@@ -76,6 +91,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_4__" input_layer_name: "__addto_4__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_6__" name: "__addto_6__"
...@@ -88,6 +106,9 @@ layers { ...@@ -88,6 +106,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_5__" input_layer_name: "__addto_5__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_7__" name: "__addto_7__"
...@@ -100,6 +121,9 @@ layers { ...@@ -100,6 +121,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_6__" input_layer_name: "__addto_6__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_8__" name: "__addto_8__"
...@@ -112,6 +136,9 @@ layers { ...@@ -112,6 +136,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_7__" input_layer_name: "__addto_7__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_9__" name: "__addto_9__"
...@@ -124,6 +151,9 @@ layers { ...@@ -124,6 +151,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_8__" input_layer_name: "__addto_8__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_10__" name: "__addto_10__"
...@@ -136,6 +166,9 @@ layers { ...@@ -136,6 +166,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_9__" input_layer_name: "__addto_9__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_11__" name: "__addto_11__"
...@@ -148,6 +181,9 @@ layers { ...@@ -148,6 +181,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_10__" input_layer_name: "__addto_10__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_12__" name: "__addto_12__"
...@@ -160,6 +196,9 @@ layers { ...@@ -160,6 +196,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_11__" input_layer_name: "__addto_11__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_13__" name: "__addto_13__"
...@@ -172,6 +211,9 @@ layers { ...@@ -172,6 +211,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_12__" input_layer_name: "__addto_12__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_14__" name: "__addto_14__"
...@@ -184,6 +226,9 @@ layers { ...@@ -184,6 +226,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_13__" input_layer_name: "__addto_13__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_15__" name: "__addto_15__"
...@@ -196,6 +241,9 @@ layers { ...@@ -196,6 +241,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_14__" input_layer_name: "__addto_14__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_16__" name: "__addto_16__"
...@@ -208,6 +256,9 @@ layers { ...@@ -208,6 +256,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_15__" input_layer_name: "__addto_15__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_17__" name: "__addto_17__"
...@@ -220,6 +271,9 @@ layers { ...@@ -220,6 +271,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_16__" input_layer_name: "__addto_16__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_18__" name: "__addto_18__"
...@@ -232,6 +286,9 @@ layers { ...@@ -232,6 +286,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_17__" input_layer_name: "__addto_17__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_19__" name: "__addto_19__"
...@@ -244,6 +301,9 @@ layers { ...@@ -244,6 +301,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_18__" input_layer_name: "__addto_18__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_20__" name: "__addto_20__"
...@@ -256,6 +316,9 @@ layers { ...@@ -256,6 +316,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_19__" input_layer_name: "__addto_19__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_21__" name: "__addto_21__"
...@@ -268,6 +331,9 @@ layers { ...@@ -268,6 +331,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_20__" input_layer_name: "__addto_20__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_22__" name: "__addto_22__"
...@@ -280,6 +346,9 @@ layers { ...@@ -280,6 +346,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_21__" input_layer_name: "__addto_21__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_23__" name: "__addto_23__"
...@@ -292,6 +361,9 @@ layers { ...@@ -292,6 +361,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_22__" input_layer_name: "__addto_22__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_24__" name: "__addto_24__"
...@@ -304,6 +376,9 @@ layers { ...@@ -304,6 +376,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_23__" input_layer_name: "__addto_23__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_25__" name: "__addto_25__"
...@@ -316,6 +391,9 @@ layers { ...@@ -316,6 +391,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_24__" input_layer_name: "__addto_24__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_26__" name: "__addto_26__"
...@@ -328,6 +406,9 @@ layers { ...@@ -328,6 +406,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_25__" input_layer_name: "__addto_25__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_27__" name: "__addto_27__"
...@@ -340,6 +421,9 @@ layers { ...@@ -340,6 +421,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_26__" input_layer_name: "__addto_26__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_28__" name: "__addto_28__"
...@@ -352,6 +436,9 @@ layers { ...@@ -352,6 +436,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_27__" input_layer_name: "__addto_27__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_29__" name: "__addto_29__"
...@@ -364,6 +451,9 @@ layers { ...@@ -364,6 +451,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_28__" input_layer_name: "__addto_28__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_30__" name: "__addto_30__"
...@@ -376,6 +466,9 @@ layers { ...@@ -376,6 +466,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_29__" input_layer_name: "__addto_29__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_31__" name: "__addto_31__"
...@@ -388,6 +481,9 @@ layers { ...@@ -388,6 +481,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_30__" input_layer_name: "__addto_30__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__fc_layer_0__" name: "__fc_layer_0__"
......
...@@ -22,6 +22,9 @@ layers { ...@@ -22,6 +22,9 @@ layers {
inputs { inputs {
input_layer_name: "b" input_layer_name: "b"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__concat_0__" name: "__concat_0__"
...@@ -34,6 +37,9 @@ layers { ...@@ -34,6 +37,9 @@ layers {
inputs { inputs {
input_layer_name: "b" input_layer_name: "b"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__concat_1__" name: "__concat_1__"
......
from paddle.trainer_config_helpers import *
settings(batch_size=1000, learning_rate=1e-4)
#data = data_layer(name='data', size=180, width=30, height=6)
#batchNorm = batch_norm_layer(data, num_channels=1)
#outputs(batchNorm)
data3D = data_layer(name='data3D', size=120 * 3, width=20, height=6, depth=3)
batchNorm3D = batch_norm_layer(data3D, num_channels=1, img3D=True)
outputs(batchNorm3D)
...@@ -2,5 +2,7 @@ from paddle.trainer_config_helpers import * ...@@ -2,5 +2,7 @@ from paddle.trainer_config_helpers import *
data = data_layer(name='input', size=300) data = data_layer(name='input', size=300)
prelu = prelu_layer(input=data) prelu = prelu_layer(input=data)
prelu = prelu_layer(input=data, partial_sum=1)
prelu = prelu_layer(input=data, partial_sum=5)
outputs(prelu) outputs(prelu)
...@@ -53,10 +53,13 @@ class BeginPass(object): ...@@ -53,10 +53,13 @@ class BeginPass(object):
class EndPass(WithMetric): class EndPass(WithMetric):
""" """
Event On One Pass Training Complete. Event On One Pass Training Complete.
    To get the output of a specific layer, add "event.gm.getLayerOutputs('predict_layer')"
    in your event_handler callback.
""" """
def __init__(self, pass_id, evaluator): def __init__(self, pass_id, evaluator, gm):
self.pass_id = pass_id self.pass_id = pass_id
self.gm = gm
WithMetric.__init__(self, evaluator) WithMetric.__init__(self, evaluator)
...@@ -73,10 +76,13 @@ class BeginIteration(object): ...@@ -73,10 +76,13 @@ class BeginIteration(object):
class EndIteration(WithMetric): class EndIteration(WithMetric):
""" """
Event On One Batch Training Complete. Event On One Batch Training Complete.
    To get the output of a specific layer, add "event.gm.getLayerOutputs('predict_layer')"
    in your event_handler callback.
""" """
def __init__(self, pass_id, batch_id, cost, evaluator): def __init__(self, pass_id, batch_id, cost, evaluator, gm):
self.pass_id = pass_id self.pass_id = pass_id
self.batch_id = batch_id self.batch_id = batch_id
self.cost = cost self.cost = cost
self.gm = gm
WithMetric.__init__(self, evaluator) WithMetric.__init__(self, evaluator)
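With `gm` now carried on the events, an `event_handler` can pull layer activations directly. A hedged example follows; the layer name `'predict_layer'` is a placeholder for whatever output layer the topology actually defines.

```python
import paddle.v2 as paddle

def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        # The GradientMachine handle is available on the event itself.
        layer_outs = event.gm.getLayerOutputs('predict_layer')
        print("pass %d, batch %d, cost %f" %
              (event.pass_id, event.batch_id, event.cost))
    if isinstance(event, paddle.event.EndPass):
        layer_outs = event.gm.getLayerOutputs('predict_layer')
```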
...@@ -43,7 +43,6 @@ class OpDescCreationMethod(object): ...@@ -43,7 +43,6 @@ class OpDescCreationMethod(object):
if len(args) != 0: if len(args) != 0:
raise ValueError("Only keyword arguments are supported.") raise ValueError("Only keyword arguments are supported.")
op_desc = framework_pb2.OpDesc() op_desc = framework_pb2.OpDesc()
for input_parameter in self.__op_proto__.inputs: for input_parameter in self.__op_proto__.inputs:
input_arguments = kwargs.get(input_parameter.name, []) input_arguments = kwargs.get(input_parameter.name, [])
if is_str(input_arguments): if is_str(input_arguments):
...@@ -142,8 +141,8 @@ def create_op_creation_method(op_proto): ...@@ -142,8 +141,8 @@ def create_op_creation_method(op_proto):
return OpInfo( return OpInfo(
method=__impl__, method=__impl__,
name=op_proto.type, name=op_proto.type,
inputs=[var.name for var in op_proto.inputs], inputs=[(var.name, var.duplicable) for var in op_proto.inputs],
outputs=[var.name for var in op_proto.outputs], outputs=[(var.name, var.duplicable) for var in op_proto.outputs],
attrs=[attr.name for attr in op_proto.attrs]) attrs=[attr.name for attr in op_proto.attrs])
...@@ -180,9 +179,15 @@ class OperatorFactory(object): ...@@ -180,9 +179,15 @@ class OperatorFactory(object):
return self.op_methods.get(t) return self.op_methods.get(t)
def get_op_input_names(self, type): def get_op_input_names(self, type):
return map(lambda x: x[0], self.get_op_info(type).inputs)
def get_op_inputs(self, type):
return self.get_op_info(type).inputs return self.get_op_info(type).inputs
def get_op_output_names(self, type): def get_op_output_names(self, type):
return map(lambda x: x[0], self.get_op_info(type).outputs)
def get_op_outputs(self, type):
return self.get_op_info(type).outputs return self.get_op_info(type).outputs
def get_op_attr_names(self, type): def get_op_attr_names(self, type):
......
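To make the behavioural change concrete, here is a small self-contained sketch (with a made-up inputs list, not taken from the patch) of the difference between `get_op_input_names`, which still returns bare names, and the new `get_op_inputs`, which exposes `(name, duplicable)` pairs:

```python
# Illustrative op input metadata; real values come from the operator proto.
op_inputs = [("X", False), ("W", False), ("b", False)]

def get_op_input_names(inputs):
    return map(lambda x: x[0], inputs)  # names only, as before

def get_op_inputs(inputs):
    return inputs                       # (name, duplicable) pairs

assert list(get_op_input_names(op_inputs)) == ["X", "W", "b"]
assert get_op_inputs(op_inputs)[0] == ("X", False)
```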
py_test(test_net SRCS test_net.py) file(GLOB TEST_OPS RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}" "test_*.py")
string(REPLACE ".py" "" TEST_OPS "${TEST_OPS}")
py_test(test_scope SRCS test_scope.py) foreach(src ${TEST_OPS})
py_test(${src} SRCS ${src}.py)
py_test(test_tensor SRCS test_tensor.py) endforeach()
py_test(test_mul_op SRCS test_mul_op.py)
py_test(test_cos_sim_op SRCS test_cos_sim_op.py)
py_test(test_mean_op SRCS test_mean_op.py)
py_test(test_protobuf SRCS test_protobuf.py)
py_test(test_add_two_op SRCS test_add_two_op.py)
py_test(test_sigmoid_op SRCS test_sigmoid_op.py)
py_test(test_softmax_op SRCS test_softmax_op.py)
py_test(test_cross_entropy_op SRCS test_cross_entropy_op.py)
py_test(test_gather_op SRCS test_gather_op.py)
py_test(test_scatter_op SRCS test_scatter_op.py)
py_test(test_fill_zeros_like_op SRCS test_fill_zeros_like_op.py)
py_test(gradient_checker SRCS gradient_checker.py)
py_test(test_rowwise_add_op SRCS test_rowwise_add_op.py)
py_test(test_default_scope_funcs SRCS test_default_scope_funcs.py)
py_test(test_operator SRCS test_operator.py)
py_test(test_gaussian_random_op SRCS test_gaussian_random_op.py)
py_test(test_uniform_random_op SRCS test_uniform_random_op.py)
py_test(test_recurrent_op SRCS test_recurrent_op.py)
py_test(test_sgd_op SRCS test_sgd_op.py)
py_test(test_gradient_checker SRCS test_gradient_checker.py)
py_test(test_lookup_table SRCS test_lookup_table.py)
py_test(test_scale_and_identity_op SRCS test_scale_and_identity_op.py)
py_test(mnist SRCS mnist.py)
py_test(test_squared_l2_distance_op SRCS test_squared_l2_distance_op.py)
py_test(test_modified_huber_loss_op SRCS test_modified_huber_loss_op.py)
(The diffs of 8 additional files are collapsed and not shown.)