提交 6684b55b 编写于 作者: W wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into pad_op

Conflicts:
	paddle/operators/CMakeLists.txt
	paddle/pybind/CMakeLists.txt
...@@ -54,7 +54,8 @@ ExternalProject_Add( ...@@ -54,7 +54,8 @@ ExternalProject_Add(
${EXTERNAL_PROJECT_LOG_ARGS} ${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${MKLML_SOURCE_DIR} PREFIX ${MKLML_SOURCE_DIR}
DOWNLOAD_DIR ${MKLML_DOWNLOAD_DIR} DOWNLOAD_DIR ${MKLML_DOWNLOAD_DIR}
DOWNLOAD_COMMAND wget --no-check-certificate -qO- ${MKLML_URL} | tar xz -C ${MKLML_DOWNLOAD_DIR} DOWNLOAD_COMMAND wget --no-check-certificate ${MKLML_URL} -c -q -O ${MKLML_VER}.tgz
&& tar zxf ${MKLML_VER}.tgz
DOWNLOAD_NO_PROGRESS 1 DOWNLOAD_NO_PROGRESS 1
UPDATE_COMMAND "" UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${MKLML_INSTALL_ROOT} CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${MKLML_INSTALL_ROOT}
......
...@@ -25,7 +25,12 @@ IF(NOT ${CBLAS_FOUND}) ...@@ -25,7 +25,12 @@ IF(NOT ${CBLAS_FOUND})
"${CBLAS_INSTALL_DIR}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}openblas${CMAKE_STATIC_LIBRARY_SUFFIX}" "${CBLAS_INSTALL_DIR}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}openblas${CMAKE_STATIC_LIBRARY_SUFFIX}"
CACHE FILEPATH "openblas library." FORCE) CACHE FILEPATH "openblas library." FORCE)
IF(APPLE)
SET(OPENBLAS_CC "${CMAKE_C_COMPILER} -isysroot ${CMAKE_OSX_SYSROOT}")
SET(COMMON_ARGS CC=${OPENBLAS_CC} NO_SHARED=1 NO_LAPACK=1 libs)
ELSE()
SET(COMMON_ARGS CC=${CMAKE_C_COMPILER} NO_SHARED=1 NO_LAPACK=1 libs) SET(COMMON_ARGS CC=${CMAKE_C_COMPILER} NO_SHARED=1 NO_LAPACK=1 libs)
ENDIF()
IF(CMAKE_CROSSCOMPILING) IF(CMAKE_CROSSCOMPILING)
IF(ANDROID) IF(ANDROID)
...@@ -40,11 +45,11 @@ IF(NOT ${CBLAS_FOUND}) ...@@ -40,11 +45,11 @@ IF(NOT ${CBLAS_FOUND})
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=${TARGET} ARM_SOFTFP_ABI=1 USE_THREAD=0) SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=${TARGET} ARM_SOFTFP_ABI=1 USE_THREAD=0)
ELSEIF(RPI) ELSEIF(RPI)
# use hardfp # use hardfp
SET(OPENBLAS_COMMIT "v0.2.19") SET(OPENBLAS_COMMIT "v0.2.20")
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 USE_THREAD=0) SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 USE_THREAD=0)
ENDIF() ENDIF()
ELSE() ELSE()
SET(OPENBLAS_COMMIT "v0.2.19") SET(OPENBLAS_COMMIT "v0.2.20")
SET(OPTIONAL_ARGS "") SET(OPTIONAL_ARGS "")
IF(CMAKE_SYSTEM_PROCESSOR MATCHES "^x86(_64)?$") IF(CMAKE_SYSTEM_PROCESSOR MATCHES "^x86(_64)?$")
SET(OPTIONAL_ARGS DYNAMIC_ARCH=1 NUM_THREADS=64) SET(OPTIONAL_ARGS DYNAMIC_ARCH=1 NUM_THREADS=64)
......
...@@ -434,9 +434,9 @@ lambda_cost ...@@ -434,9 +434,9 @@ lambda_cost
.. autoclass:: paddle.v2.layer.lambda_cost .. autoclass:: paddle.v2.layer.lambda_cost
:noindex: :noindex:
mse_cost square_error_cost
-------- --------
.. autoclass:: paddle.v2.layer.mse_cost .. autoclass:: paddle.v2.layer.square_error_cost
:noindex: :noindex:
rank_cost rank_cost
......
# Design Doc: Functions, Operators, and Layers
In a DL system, we can compose one or more fine grained operators into a coarse grained one. For example, the FC layer can be composed of a multiplication operator and an add operator.
Historically, some fine grained operations are known as operators, and some coarse level ones are known as layers. But we need a well-defined separation.
In general, operators are those very fine grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions:
```c++
template <typename T> T add(T x, T y) { return x + y; }
template <typename T> T mul(T x, T y) { return x * y; }
```
Then we can wrap them into operators which are C++ classes and can be created from Python bindings by name. A C macro can do this. For example, the following macro invocation
```c++
#define MAKE_FUNCTION_OPERATOR(mul);
```
generates
```c++
template <typename T> class mulOp : public OperatorBase {...};
REGISTER_OP(mulOp<float32>, "mul");
```
so that in Python we can create operator mul by:
```python
X1 = Var()
X2 = Var()
Y = Var()
paddle.cpp.create_operator("mul", input=[X1, X2], output=Y)
```
Also, at the same time, we can compose a coarse level C++ operator class by composing functions `mul` and `add`:
```c++
template <typename T>
class FCOp : public OperatorBase {
public:
void Run(...) {
add(mul(Input<T>("X"), Input<T>("W")), Input<T>("b");
}
};
REGISTER_OP(FCOp, "fc");
```
We need to support such composition in Python as well. To do so, we need a higher level Python wrapping of operator creation than `paddle.cpp.create_operator`. This higher level operator API should be compatible with the layer API.
Let's explain using an example. Suppose that we are going to compose the FC using mul and add in Python, we'd like to have Python functions `mul` and `add` defined in module `operator`:
```python
def operator.mul(X1, X2):
O = Var()
paddle.cpp.create_operator("mul", input={X1, Y1], output=O)
return O
def operator.add(X1, X2):
O = Var()
paddle.cpp.create_operator("add", input={X1, X2], output=O)
return O
```
Above code snippets are automatically generated. Given them, users can define
```python
def layer.fc(X):
W = Var()
b = Var()
return operator.add(operator.mul(X, W), b)
```
If we don't have `operator.mul` and `operator.add`, the definiton of `layer.fc` would be complicated:
```python
def layer.fc(X):
W = Var()
b = Var()
O1 = Var()
paddle.cpp.create_operator("mul", input=[X, W], output=O1)
O2 = Var()
paddle.cpp.create_operator("add", input=[O1, b], output=O2)
return O2
```
We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in above illustrative example:
```
| C++ functions/functors | mul | add | | |
| C++ operator class | mulOp | addOp | FCOp | |
| Python binding | operator.mul | operator.add | operator.fc | |
| Python function | | | | layer.fc |
```
This is how we differentiate layer and operators in PaddlePaddle:
- those defined in C++ and have a lightweighted Python wrapper in module `operators` are operators; whereas
- those who don't have C++ implementations but a Python implementation that compose C++ operators are known as layers.
# Design Doc: Computations as Graphs
A primary goal of the refactorization of PaddlePaddle is a more flexible representation of deep learning computation, in particular, a graph of operators and variables, instead of sequences of layers as before.
This document explains that the construction of a graph as three steps:
- construct the forward part
- construct the backward part
- construct the optimization part
Let us take the problem of image classification as a simple example. The application program that trains the model looks like:
```python
x = layer.data("images")
l = layer.data("label")
y = layer.fc(x)
cost = layer.mse(y, l)
optimize(cost)
train(cost, reader=mnist.train())
```
### Forward Part
The first four lines of above program build the forward part of the graph.
![](images/graph_construction_example_forward_only.png)
In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x. `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b.
In this example, all operators are created as `OpDesc` protobuf messages, and all variables are `VarDesc`. These protobuf messages are saved in a `BlockDesc` protobuf message.
### Backward Part
The fifth line `optimize(cost)` calls two functions, `ConstructBackwardGraph` and `ConstructOptimizationGraph`.
`ConstructBackwardGraph` traverses the forward graph in the `BlockDesc` protobuf message and builds the backward part.
![](images/graph_construction_example_forward_backward.png)
According to the chain rule of gradient computation, `ConstructBackwardGraph` would
1. create a gradient operator G for each operator F,
1. make all inputs, outputs, and outputs' gradient of F as inputs of G,
1. create gradients for all inputs of F, except for those who don't have gradients, like x and l, and
1. make all these gradients as outputs of G.
### Optimization Part
For each parameter, like W and b created by `layer.fc`, marked as double circles in above graphs, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient. Here results in the complete graph:
![](images/graph_construction_example_all.png)
IfOp should have only one branch. An IfOp operator takes a `cond` variable whose value must be a vector of N boolean elements. Its return value has M (M<=N) instances, each corresponds to a true element in `cond`.
```python
import paddle as pd
x = var()
y = var()
cond = var()
b = pd.create_ifop(inputs=[x], output_num=1)
with b.true_block():
x = b.inputs(0)
z = operator.add(x, y)
b.set_output(0, operator.softmax(z))
out = b(cond)
```
If we want the output still has N instances, we can use IfElseOp with a default value, whose minibatch size must be N:
```python
import paddle as pd
x = var()
y = var()
cond = var()
default_value = var()
b = pd.create_ifelseop(inputs=[x], output_num=1)
with b.true_block():
x = b.inputs(0)
z = operator.add(x, y)
b.set_output(0, operator.softmax(z))
with b.false_block():
x = b.inputs(0)
z = layer.fc(x)
b.set_output(0, operator.softmax(z))
out = b(cond)
```
If only true_block is set in an IfElseOp, we can have a default value for false as:
```python
import paddle as pd
x = var()
y = var()
cond = var()
default_value = var()
b = pd.create_ifelseop(inputs=[x], output_num=1, default_value)
with b.true_block():
x = b.inputs(0)
z = operator.add(x, y)
b.set_output(0, operator.softmax(z))
out = b(cond)
```
where default_value is a list of vars for `cond` == False.
cat ./graph_construction_example.dot | \
sed 's/color=red/color=red, style=invis/g' | \
sed 's/color=green/color=green, style=invis/g' | \
dot -Tpng > graph_construction_example_forward_only.png
cat ./graph_construction_example.dot | \
sed 's/color=green/color=green, style=invis/g' | \
dot -Tpng > graph_construction_example_forward_backward.png
cat ./graph_construction_example.dot | \
dot -Tpng > graph_construction_example_all.png
digraph ImageClassificationGraph {
///////// The forward part /////////
FeedX [label="Feed", color=blue, shape=box];
FeedY [label="Feed", color=blue, shape=box];
FC [label="FC", color=blue, shape=box];
MSE [label="MSE", color=blue, shape=box];
x [label="x", color=blue, shape=oval];
l [label="l", color=blue, shape=oval];
y [label="y", color=blue, shape=oval];
W [label="W", color=blue, shape=doublecircle];
b [label="b", color=blue, shape=doublecircle];
cost [label="cost", color=blue, shape=oval];
FeedX -> x -> FC -> y -> MSE -> cost [color=blue];
FeedY -> l [color=blue];
W -> FC [color=blue];
b -> FC [color=blue];
l -> MSE [color=blue];
////////// The backward part /////////
MSE_Grad [label="MSE_grad", color=red, shape=box];
FC_Grad [label="FC_grad", color=red, shape=box];
d_cost [label="d cost", color=red, shape=oval];
d_y [label="d y", color=red, shape=oval];
d_b [label="d b", color=red, shape=oval];
d_W [label="d W", color=red, shape=oval];
cost -> MSE_Grad [color=red];
d_cost -> MSE_Grad [color=red];
x -> MSE_Grad [color=red];
l -> MSE_Grad [color=red];
y -> MSE_Grad -> d_y [color=red];
x -> FC_Grad [color=red];
y -> FC_Grad [color=red];
d_y -> FC_Grad [color=red];
W -> FC_Grad -> d_W [color=red];
b -> FC_Grad -> d_b [color=red];
////////// The optimizaiton part //////////
OPT_W [label="SGD", color=green, shape=box];
OPT_b [label="SGD", color=green, shape=box];
W -> OPT_W [color=green];
b -> OPT_b [color=green];
d_W -> OPT_W -> W [color=green];
d_b -> OPT_b -> b [color=green];
////////// Groupings //////////
subgraph clusterMSE {
style=invis;
MSE;
MSE_Grad;
}
subgraph clusterFC {
style=invis;
FC;
FC_Grad;
}
}
...@@ -55,7 +55,7 @@ PaddlePaddle是源于百度的一个深度学习平台。这份简短的介绍 ...@@ -55,7 +55,7 @@ PaddlePaddle是源于百度的一个深度学习平台。这份简短的介绍
# 线性计算网络层: ȳ = wx + b # 线性计算网络层: ȳ = wx + b
ȳ = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b')) ȳ = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b'))
# 计算误差函数,即 ȳ 和真实 y 之间的距离 # 计算误差函数,即 ȳ 和真实 y 之间的距离
cost = mse_cost(input= ȳ, label=y) cost = square_error_cost(input= ȳ, label=y)
outputs(cost) outputs(cost)
...@@ -69,7 +69,7 @@ PaddlePaddle是源于百度的一个深度学习平台。这份简短的介绍 ...@@ -69,7 +69,7 @@ PaddlePaddle是源于百度的一个深度学习平台。这份简短的介绍
- **数据层**:数据层 `data_layer` 是神经网络的入口,它读入数据并将它们传输到接下来的网络层。这里数据层有两个,分别对应于变量 `x` 和 `y`。 - **数据层**:数据层 `data_layer` 是神经网络的入口,它读入数据并将它们传输到接下来的网络层。这里数据层有两个,分别对应于变量 `x` 和 `y`。
- **全连接层**:全连接层 `fc_layer` 是基础的计算单元,这里利用它建模变量之间的线性关系。计算单元是神经网络的核心,PaddlePaddle支持大量的计算单元和任意深度的网络连接,从而可以拟合任意的函数来学习复杂的数据关系。 - **全连接层**:全连接层 `fc_layer` 是基础的计算单元,这里利用它建模变量之间的线性关系。计算单元是神经网络的核心,PaddlePaddle支持大量的计算单元和任意深度的网络连接,从而可以拟合任意的函数来学习复杂的数据关系。
- **回归误差代价层**:回归误差代价层 `mse_cost` 是众多误差代价函数层的一种,它们在训练过程作为网络的出口,用来计算模型的误差,是模型参数优化的目标函数。 - **回归误差代价层**:回归误差代价层 `square_error_cost` 是众多误差代价函数层的一种,它们在训练过程作为网络的出口,用来计算模型的误差,是模型参数优化的目标函数。
定义了网络结构并保存为 `trainer_config.py` 之后,运行以下训练命令: 定义了网络结构并保存为 `trainer_config.py` 之后,运行以下训练命令:
......
...@@ -49,7 +49,7 @@ To recover this relationship between ``X`` and ``Y``, we use a neural network wi ...@@ -49,7 +49,7 @@ To recover this relationship between ``X`` and ``Y``, we use a neural network wi
x = data_layer(name='x', size=1) x = data_layer(name='x', size=1)
y = data_layer(name='y', size=1) y = data_layer(name='y', size=1)
y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b')) y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b'))
cost = mse_cost(input=y_predict, label=y) cost = square_error_cost(input=y_predict, label=y)
outputs(cost) outputs(cost)
Some of the most fundamental usages of PaddlePaddle are demonstrated: Some of the most fundamental usages of PaddlePaddle are demonstrated:
......
...@@ -8,7 +8,7 @@ paddle.init(use_gpu=False) ...@@ -8,7 +8,7 @@ paddle.init(use_gpu=False)
x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(2)) x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(2))
y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear()) y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear())
y = paddle.layer.data(name='y', type=paddle.data_type.dense_vector(1)) y = paddle.layer.data(name='y', type=paddle.data_type.dense_vector(1))
cost = paddle.layer.mse_cost(input=y_predict, label=y) cost = paddle.layer.square_error_cost(input=y_predict, label=y)
# create parameters # create parameters
parameters = paddle.parameters.create(cost) parameters = paddle.parameters.create(cost)
......
...@@ -81,9 +81,9 @@ PaddlePaddle支持不同类型的输入数据,主要包括四种类型,和 ...@@ -81,9 +81,9 @@ PaddlePaddle支持不同类型的输入数据,主要包括四种类型,和
.. code-block:: bash .. code-block:: bash
y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear()) y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear())
cost = paddle.layer.mse_cost(input=y_predict, label=y) cost = paddle.layer.square_error_cost(input=y_predict, label=y)
其中,x与y为之前描述的输入层;而y_predict是接收x作为输入,接上一个全连接层;cost接收y_predict与y作为输入,接上方误差层。 其中,x与y为之前描述的输入层;而y_predict是接收x作为输入,接上一个全连接层;cost接收y_predict与y作为输入,接上方误差层。
最后一层cost中记录了神经网络的所有拓扑结构,通过组合不同的layer,我们即可完成神经网络的搭建。 最后一层cost中记录了神经网络的所有拓扑结构,通过组合不同的layer,我们即可完成神经网络的搭建。
......
...@@ -227,6 +227,12 @@ make mul_op ...@@ -227,6 +227,12 @@ make mul_op
USE_CPU_ONLY_OP(gather); USE_CPU_ONLY_OP(gather);
``` ```
如果OP不带Kernel,则使用`USE_NO_KENREL_OP`:
```
USE_NO_KENREL_OP(recurrent);
```
使用`USE_OP`告知编译器需要链接该Op的目标文件,具体解释参考[代码注释](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h#L81)。 使用`USE_OP`告知编译器需要链接该Op的目标文件,具体解释参考[代码注释](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/op_registry.h#L81)。
...@@ -280,28 +286,50 @@ class TestMulOp(unittest.TestCase): ...@@ -280,28 +286,50 @@ class TestMulOp(unittest.TestCase):
反向Op单测继承自`GradientChecker`,而`GradientChecker`集成自`unittest.TestCase`,所以反向单测函数需要`test_`开头。 反向Op单测继承自`GradientChecker`,而`GradientChecker`集成自`unittest.TestCase`,所以反向单测函数需要`test_`开头。
``` ```
class MulGradOpTest(GradientChecker): class TestMulGradOp(GradientChecker):
def test_mul(self): def setUp(self):
op = create_op("mul") self.op = create_op("mul")
inputs = { self.inputs = {
'X': np.random.random((32, 84)).astype("float32"), 'X': np.random.random((32, 84)).astype("float32"),
'Y': np.random.random((84, 100)).astype("float32") 'Y': np.random.random((84, 100)).astype("float32")
} }
self.compare_grad(op, inputs)
def test_cpu_gpu_compare(self):
self.compare_grad(self.op, self.inputs)
def test_normal(self):
# mul op will enlarge the relative error # mul op will enlarge the relative error
self.check_grad( self.check_grad(
op, inputs, set(["X", "Y"]), "Out", max_relative_error=0.5) self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5)
```
def test_ignore_x(self):
self.check_grad(
self.op,
self.inputs, ["Y"],
"Out",
max_relative_error=0.5,
no_grad_set={"X"})
def test_ignore_y(self):
self.check_grad(
self.op,
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
```
下面解释一些关键的地方:
- 调用`create_op("mul")`创建反向Op对应的前向Op。 - 调用`create_op("mul")`创建反向Op对应的前向Op。
- 定义输入`inputs`
- 调用`compare_grad`函数对比CPU、GPU计算结果。 - 调用`compare_grad`函数对比CPU、GPU计算结果。
- 调用`check_grad`检查梯度稳定性,这里采用数值法检测梯度正确性。 - `test_normal`调用`check_grad`检查梯度稳定性,这里采用数值法检测梯度正确性。
- 第一个参数`op` : 前向op。 - 第一个参数`self.op` : 前向Op。
- 第二个参数`inputs` : 输入词典,词典的Key和`ProtoMaker`定义保持一致。 - 第二个参数`self.inputs` : 输入词典,词典的Key和`ProtoMaker`定义保持一致。
- 第三个参数`set(["X", "Y"])` : 指定对输入变量`X``Y`做梯度检测。 - 第三个参数`["X", "Y"]` : 指定对输入变量`X``Y`做梯度检测。
- 第四个参数`"Out"` : 指定前向网络最终的输出目标变量`Out` - 第四个参数`"Out"` : 指定前向网络最终的输出目标变量`Out`
- `test_ignore_x``test_ignore_y`分支测试只需要计算一个输入梯度的情况。
### 编译和执行 ### 编译和执行
......
...@@ -213,7 +213,7 @@ I1116 09:10:17.123440 50 Util.cpp:130] Calling runInitFunctions ...@@ -213,7 +213,7 @@ I1116 09:10:17.123440 50 Util.cpp:130] Calling runInitFunctions
I1116 09:10:17.123764 50 Util.cpp:143] Call runInitFunctions done. I1116 09:10:17.123764 50 Util.cpp:143] Call runInitFunctions done.
[WARNING 2016-11-16 09:10:17,227 default_decorators.py:40] please use keyword arguments in paddle config. [WARNING 2016-11-16 09:10:17,227 default_decorators.py:40] please use keyword arguments in paddle config.
[INFO 2016-11-16 09:10:17,239 networks.py:1282] The input order is [movie_id, title, genres, user_id, gender, age, occupation, rating] [INFO 2016-11-16 09:10:17,239 networks.py:1282] The input order is [movie_id, title, genres, user_id, gender, age, occupation, rating]
[INFO 2016-11-16 09:10:17,239 networks.py:1289] The output order is [__mse_cost_0__] [INFO 2016-11-16 09:10:17,239 networks.py:1289] The output order is [__square_error_cost_0__]
I1116 09:10:17.392917 50 Trainer.cpp:170] trainer mode: Normal I1116 09:10:17.392917 50 Trainer.cpp:170] trainer mode: Normal
I1116 09:10:17.613910 50 PyDataProvider2.cpp:257] loading dataprovider dataprovider::process I1116 09:10:17.613910 50 PyDataProvider2.cpp:257] loading dataprovider dataprovider::process
I1116 09:10:17.680917 50 PyDataProvider2.cpp:257] loading dataprovider dataprovider::process I1116 09:10:17.680917 50 PyDataProvider2.cpp:257] loading dataprovider dataprovider::process
......
...@@ -173,6 +173,96 @@ extern void hl_avgpool_backward(const int frameCnt, ...@@ -173,6 +173,96 @@ extern void hl_avgpool_backward(const int frameCnt,
real* backGrad, real* backGrad,
const int outStride); const int outStride);
extern void hl_maxpool3D_forward(const int frameCnt,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real* tgtData,
real* maxPoolIdxData,
const int tgtStride);
extern void hl_maxpool3D_backward(const int frameCnt,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real scaleA,
real scaleB,
real* targetGrad,
real* maxPoolIdxData,
const int outStride);
extern void hl_avgpool3D_forward(const int frameCnt,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real* tgtData,
const int tgtStride);
extern void hl_avgpool3D_backward(const int frameCnt,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
int paddingD,
int paddingH,
int paddingW,
real scaleA,
real scaleB,
real* backGrad,
const int outStride);
/** /**
* @brief Bilinear interpolation forward. * @brief Bilinear interpolation forward.
* *
...@@ -275,4 +365,4 @@ extern void hl_maxout_backward(real* inGrad, ...@@ -275,4 +365,4 @@ extern void hl_maxout_backward(real* inGrad,
size_t featLen, size_t featLen,
size_t groups); size_t groups);
#endif /* HL_CNN_H_ */ #endif // HL_CNN_H_
...@@ -224,4 +224,80 @@ extern void hl_matrix_collect_shared_bias(real* B_d, ...@@ -224,4 +224,80 @@ extern void hl_matrix_collect_shared_bias(real* B_d,
extern void hl_matrix_rotate( extern void hl_matrix_rotate(
real* mat, real* matRot, int dimM, int dimN, bool clockWise); real* mat, real* matRot, int dimM, int dimN, bool clockWise);
/**
* @brief Matrix vol2Col: Convert 3D volume into col matrix
*
* @param[in] matSrc input matrix.
* @param[in] channel channel of matSrc.
* @param[in] depth depth of matSrc.
* @param[in] height height of matSrc.
* @param[in] width width of matSrc.
* @param[in] filterD depth of filter.
* @param[in] filterH height of filter.
* @param[in] filterW width of filter.
* @param[in] strideD stride in the depth.
* @param[in] strideH stride in the height.
* @param[in] strideW stride in the width.
* @param[in] paddingD padding in the depth.
* @param[in] paddingH padding in the height.
* @param[in] paddingW padding in the width.
* @param[out] dataDst output matrix.
*
*/
extern void hl_matrix_vol2Col(const real* dataSrc,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real* dataDst);
/**
* @brief Matrix col2Vol: Convert col matrix into 3D volume
*
* @param[out] matDst output matrix.
* @param[in] channel channel of matDst.
* @param[in] depth depth of matDst.
* @param[in] height height of matDst.
* @param[in] width width of matDst.
* @param[in] filterD depth of filter.
* @param[in] filterH height of filter.
* @param[in] filterW width of filter.
* @param[in] strideD stride in the depth.
* @param[in] strideH stride in the height.
* @param[in] strideW stride in the width.
* @param[in] paddingD padding in the depth.
* @param[in] paddingH padding in the height.
* @param[in] paddingW padding in the width.
* @param[in] matSrc input matrix.
* @param[in] beta input
* @param[in] alpha input
*
*/
extern void hl_matrix_col2Vol(real* dataDst,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
const real* dataSrc,
real alpha,
real beta);
#endif /* HL_MATRIX_H_ */ #endif /* HL_MATRIX_H_ */
...@@ -87,6 +87,96 @@ inline void hl_avgpool_backward(const int frameCnt, ...@@ -87,6 +87,96 @@ inline void hl_avgpool_backward(const int frameCnt,
real* backGrad, real* backGrad,
const int outStride) {} const int outStride) {}
inline void hl_maxpool3D_forward(const int frameCnt,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real* tgtData,
real* maxPoolIdxData,
const int tgtStride) {}
inline void hl_maxpool3D_backward(const int frameCnt,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real scaleA,
real scaleB,
real* targetGrad,
real* maxPoolIdxData,
const int outStride) {}
inline void hl_avgpool3D_forward(const int frameCnt,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real* tgtData,
const int tgtStride) {}
inline void hl_avgpool3D_backward(const int frameCnt,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real scaleA,
real scaleB,
real* backGrad,
const int outStride) {}
inline void hl_bilinear_forward(const real* inData, inline void hl_bilinear_forward(const real* inData,
const size_t inImgH, const size_t inImgH,
const size_t inImgW, const size_t inImgW,
......
...@@ -99,4 +99,38 @@ inline void hl_matrix_collect_shared_bias(real* B_d, ...@@ -99,4 +99,38 @@ inline void hl_matrix_collect_shared_bias(real* B_d,
inline void hl_matrix_rotate( inline void hl_matrix_rotate(
real* mat, real* matRot, int dimM, int dimN, bool clockWise) {} real* mat, real* matRot, int dimM, int dimN, bool clockWise) {}
inline void hl_matrix_vol2Col(const real* dataSrc,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real* dataDst) {}
inline void hl_matrix_col2Vol(real* dataDst,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
const real* dataSrc,
real alpha,
real beta) {}
#endif // HL_MATRIX_STUB_H_ #endif // HL_MATRIX_STUB_H_
...@@ -353,6 +353,433 @@ void hl_avgpool_backward(const int frameCnt, ...@@ -353,6 +353,433 @@ void hl_avgpool_backward(const int frameCnt,
CHECK_SYNC("hl_avgpool_backward failed"); CHECK_SYNC("hl_avgpool_backward failed");
} }
__global__ void KeMaxPool3DForward(const int nthreads,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int ksizeD,
const int ksizeH,
const int ksizeW,
const int strideD,
const int strideH,
const int strideW,
const int padD,
const int padH,
const int padW,
real* tgtData,
real* maxPoolIdxData,
const int tgtStride) {
for (int index = blockIdx.x * blockDim.x + threadIdx.x; index < (nthreads);
index += blockDim.x * gridDim.x) {
int pw = index % pooledW;
int ph = (index / pooledW) % pooledH;
int pd = (index / pooledW / pooledH) % pooledD;
int c = (index / pooledW / pooledH / pooledD) % channels;
int frameNum = index / pooledW / pooledH / pooledD / channels;
int dstart = pd * strideD - padD;
int hstart = ph * strideH - padH;
int wstart = pw * strideW - padW;
int dend = min(dstart + ksizeD, depth);
int hend = min(hstart + ksizeH, height);
int wend = min(wstart + ksizeW, width);
dstart = max(dstart, 0);
hstart = max(hstart, 0);
wstart = max(wstart, 0);
real maxval = -FLT_MAX;
int maxIdx = -1;
inputData += (frameNum * channels + c) * depth * height * width;
for (int d = dstart; d < dend; ++d) {
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
if (maxval < inputData[(d * height + h) * width + w]) {
maxval = inputData[(d * height + h) * width + w];
maxIdx = (d * height + h) * width + w;
}
}
}
}
int tgtIndex =
index % (pooledW * pooledH * pooledD * channels) + frameNum * tgtStride;
tgtData[tgtIndex] = maxval;
maxPoolIdxData[tgtIndex] = maxIdx;
}
}
void hl_maxpool3D_forward(const int frameCnt,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int padD,
const int padH,
const int padW,
real* tgtData,
real* maxPoolIdxData,
const int tgtStride) {
int num_kernels = pooledD * pooledH * pooledW * channels * frameCnt;
int blocks = (num_kernels + 1024 - 1) / 1024;
dim3 threads(1024, 1);
dim3 grid(blocks, 1);
KeMaxPool3DForward<<<grid, threads, 0, STREAM_DEFAULT>>>(num_kernels,
inputData,
channels,
depth,
height,
width,
pooledD,
pooledH,
pooledW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
padD,
padH,
padW,
tgtData,
maxPoolIdxData,
tgtStride);
CHECK_SYNC("hl_maxpool3D_forward failed");
}
__global__ void KeMaxPool3DBackward(const int nthreads,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int padD,
const int padH,
const int padW,
real scaleA,
real scaleB,
real* targetGrad,
real* maxPoolIdxData,
const int outStride) {
for (int index = blockIdx.x * blockDim.x + threadIdx.x; index < (nthreads);
index += blockDim.x * gridDim.x) {
int offsetW = index % width;
int offsetH = (index / width) % height;
int offsetD = (index / width / height) % depth;
int offsetC = (index / width / height / depth) % channels;
int frameNum = index / width / height / depth / channels;
int pdstart =
(offsetD + padD < sizeZ) ? 0 : (offsetD + padD - sizeZ) / strideD + 1;
int phstart =
(offsetH + padH < sizeY) ? 0 : (offsetH + padH - sizeY) / strideH + 1;
int pwstart =
(offsetW + padW < sizeX) ? 0 : (offsetW + padW - sizeX) / strideW + 1;
int pdend = min((offsetD + padD) / strideD + 1, pooledD);
int phend = min((offsetH + padH) / strideH + 1, pooledH);
int pwend = min((offsetW + padW) / strideW + 1, pooledW);
real gradient = 0;
outGrad += ((frameNum * channels + offsetC) * pooledD * pooledH * pooledW);
maxPoolIdxData +=
((frameNum * channels + offsetC) * pooledD * pooledH * pooledW);
for (int pd = pdstart; pd < pdend; ++pd) {
for (int ph = phstart; ph < phend; ++ph) {
for (int pw = pwstart; pw < pwend; ++pw) {
if (((offsetD * height + offsetH) * width + offsetW) ==
maxPoolIdxData[(pd * pooledH + ph) * pooledW + pw])
gradient += outGrad[(pd * pooledH + ph) * pooledW + pw];
}
}
}
targetGrad[index] = scaleA * gradient + scaleB * targetGrad[index];
}
}
void hl_maxpool3D_backward(const int frameCnt,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int outputD,
const int outputH,
const int outputW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real scaleA,
real scaleB,
real* targetGrad,
real* maxPoolIdxData,
const int outStride) {
int num_kernels = depth * height * width * channels * frameCnt;
int blocks = (num_kernels + 1024 - 1) / 1024;
KeMaxPool3DBackward<<<blocks, 1024, 0, STREAM_DEFAULT>>>(num_kernels,
outGrad,
channels,
depth,
height,
width,
outputD,
outputH,
outputW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
scaleA,
scaleB,
targetGrad,
maxPoolIdxData,
outStride);
CHECK_SYNC("hl_maxpool3D_backward");
}
__global__ void KeAvgPool3DForward(const int nthreads,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int padD,
const int padH,
const int padW,
real* tgtData,
const int tgtStride) {
for (int index = blockIdx.x * blockDim.x + threadIdx.x; index < (nthreads);
index += blockDim.x * gridDim.x) {
int pw = index % pooledW;
int ph = (index / pooledW) % pooledH;
int pd = (index / pooledW / pooledH) % pooledD;
int c = (index / pooledW / pooledH / pooledD) % channels;
int frameNum = index / pooledW / pooledH / pooledD / channels;
int dstart = pd * strideD - padD;
int hstart = ph * strideH - padH;
int wstart = pw * strideW - padW;
int dend = min(dstart + sizeZ, depth + padD);
int hend = min(hstart + sizeY, height + padH);
int wend = min(wstart + sizeX, width + padW);
int pool_size = (dend - dstart) * (hend - hstart) * (wend - wstart);
dstart = max(dstart, 0);
hstart = max(hstart, 0);
wstart = max(wstart, 0);
dend = min(dend, depth);
hend = min(hend, height);
wend = min(wend, width);
real aveval = 0;
inputData += (frameNum * channels + c) * depth * height * width;
for (int d = dstart; d < dend; ++d) {
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
aveval += inputData[(d * height + h) * width + w];
}
}
}
int tgtIndex =
index % (pooledW * pooledH * pooledD * channels) + frameNum * tgtStride;
tgtData[tgtIndex] = aveval / pool_size;
}
}
void hl_avgpool3D_forward(const int frameCnt,
const real* inputData,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int paddingD,
const int paddingH,
const int paddingW,
real* tgtData,
const int tgtStride) {
int num_kernels = pooledD * pooledH * pooledW * channels * frameCnt;
int blocks = (num_kernels + 1024 - 1) / 1024;
KeAvgPool3DForward<<<blocks, 1024, 0, STREAM_DEFAULT>>>(num_kernels,
inputData,
channels,
depth,
height,
width,
pooledD,
pooledH,
pooledW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
tgtData,
tgtStride);
CHECK_SYNC("hl_avgpool3D_forward failed");
}
__global__ void KeAvgPool3DBackward(const int nthreads,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int pooledD,
const int pooledH,
const int pooledW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
const int padD,
const int padH,
const int padW,
real scaleA,
real scaleB,
real* tgtGrad,
const int outStride) {
for (int index = blockIdx.x * blockDim.x + threadIdx.x; index < (nthreads);
index += blockDim.x * gridDim.x) {
int offsetW = index % width + padW;
int offsetH = (index / width) % height + padH;
int offsetD = (index / width / height) % depth + padD;
int offsetC = (index / width / height / depth) % channels;
int frameNum = index / width / height / depth / channels;
int pdstart = (offsetD < sizeZ) ? 0 : (offsetD - sizeZ) / strideD + 1;
int phstart = (offsetH < sizeY) ? 0 : (offsetH - sizeY) / strideH + 1;
int pwstart = (offsetW < sizeX) ? 0 : (offsetW - sizeX) / strideW + 1;
int pdend = min(offsetD / strideD + 1, pooledD);
int phend = min(offsetH / strideH + 1, pooledH);
int pwend = min(offsetW / strideW + 1, pooledW);
real gradient = 0;
outGrad += (frameNum * channels + offsetC) * pooledD * pooledH * pooledW;
for (int pd = pdstart; pd < pdend; ++pd) {
for (int ph = phstart; ph < phend; ++ph) {
for (int pw = pwstart; pw < pwend; ++pw) {
// figure out the pooling size
int dstart = pd * strideD - padD;
int hstart = ph * strideH - padH;
int wstart = pw * strideW - padW;
int dend = min(dstart + sizeZ, depth + padD);
int hend = min(hstart + sizeY, height + padH);
int wend = min(wstart + sizeX, width + padW);
int poolsize = (dend - dstart) * (hend - hstart) * (wend - wstart);
gradient += outGrad[(pd * pooledH + ph) * pooledW + pw] / poolsize;
}
}
}
tgtGrad[index] = scaleA * gradient + scaleB * tgtGrad[index];
}
}
void hl_avgpool3D_backward(const int frameCnt,
const real* outGrad,
const int channels,
const int depth,
const int height,
const int width,
const int outputD,
const int outputH,
const int outputW,
const int sizeZ,
const int sizeY,
const int sizeX,
const int strideD,
const int strideH,
const int strideW,
int paddingD,
int paddingH,
int paddingW,
real scaleA,
real scaleB,
real* backGrad,
const int outStride) {
int num_kernels = depth * height * width * channels * frameCnt;
int blocks = (num_kernels + 1024 - 1) / 1024;
KeAvgPool3DBackward<<<blocks, 1024, 0, STREAM_DEFAULT>>>(num_kernels,
outGrad,
channels,
depth,
height,
width,
outputD,
outputH,
outputW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
scaleA,
scaleB,
backGrad,
outStride);
CHECK_SYNC("hl_avgpool3D_backward failed");
}
__global__ void KeBilinearInterpFw(const real* in, __global__ void KeBilinearInterpFw(const real* in,
const size_t inImgH, const size_t inImgH,
const size_t inImgW, const size_t inImgW,
......
...@@ -592,3 +592,204 @@ void hl_matrix_rotate( ...@@ -592,3 +592,204 @@ void hl_matrix_rotate(
mat, matRot, dimM, dimN, clockWise); mat, matRot, dimM, dimN, clockWise);
CHECK_SYNC("hl_matrix_rotate failed"); CHECK_SYNC("hl_matrix_rotate failed");
} }
__global__ void keMatrixVol2Col(int num_kernels,
const real* dataSrc,
real* dataDst,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
int depth_col,
int height_col,
int width_col) {
for (int index = blockIdx.x * blockDim.x + threadIdx.x; index < num_kernels;
index += blockDim.x * gridDim.x) {
int w_out = index % width_col;
int h_out = (index / width_col) % height_col;
int d_out = (index / width_col / height_col) % depth_col;
int channel_in = index / width_col / height_col / depth_col;
int channel_out = channel_in * filterD * filterH * filterW;
int w_in = w_out * strideW - paddingW;
int h_in = h_out * strideH - paddingH;
int d_in = d_out * strideD - paddingD;
dataDst +=
((channel_out * depth_col + d_out) * height_col + h_out) * width_col +
w_out;
dataSrc += ((channel_in * depth + d_in) * height + h_in) * width + w_in;
for (int k = 0; k < filterD; ++k) {
for (int i = 0; i < filterH; ++i) {
for (int j = 0; j < filterW; ++j) {
int d = d_in + k;
int h = h_in + i;
int w = w_in + j;
*dataDst = (d >= 0 && d < depth && h >= 0 && h < height && w >= 0 &&
w < width)
? dataSrc[(k * height + i) * width + j]
: 0;
dataDst += depth_col * height_col * width_col;
}
}
}
}
}
void hl_matrix_vol2Col(const real* dataSrc,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real* dataDst) {
int depth_col = (depth + 2 * paddingD - filterD) / strideD + 1;
int height_col = (height + 2 * paddingH - filterH) / strideH + 1;
int width_col = (width + 2 * paddingW - filterW) / strideW + 1;
int num_kernels = channels * depth_col * height_col * width_col;
const int threads = 512;
const int blocks = DIVUP(num_kernels, threads);
keMatrixVol2Col<<<blocks, threads, 0, STREAM_DEFAULT>>>(num_kernels,
dataSrc,
dataDst,
depth,
height,
width,
filterD,
filterH,
filterW,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
depth_col,
height_col,
width_col);
CHECK_SYNC("hl_matrix_vol2Col failed");
}
__global__ void keMatrixCol2Vol(int num_kernels,
real* dataDst,
const real* dataSrc,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
int depth_col,
int height_col,
int width_col,
real alpha,
real beta) {
for (int index = blockIdx.x * blockDim.x + threadIdx.x; index < num_kernels;
index += blockDim.x * gridDim.x) {
real srcVal = 0;
real dstVal = dataDst[index];
int w = index % width + paddingW;
int h = (index / width) % height + paddingH;
int d = (index / width / height) % depth + paddingD;
int c = index / width / height / depth;
// compute the start and end of the output
int w_col_start = (w < filterW) ? 0 : (w - filterW) / strideW + 1;
int w_col_end = min(w / strideW + 1, width_col);
int h_col_start = (h < filterH) ? 0 : (h - filterH) / strideH + 1;
int h_col_end = min(h / strideH + 1, height_col);
int d_col_start = (d < filterD) ? 0 : (d - filterD) / strideD + 1;
int d_col_end = min(d / strideD + 1, depth_col);
int offset = (c * filterD * filterW * filterH + d * filterW * filterH +
h * filterW + w) *
depth_col * height_col * width_col;
int coeff_d_col =
(1 - strideD * filterW * filterH * depth_col) * height_col * width_col;
int coeff_h_col =
(1 - strideH * filterW * depth_col * height_col) * width_col;
int coeff_w_col = (1 - strideW * depth_col * height_col * width_col);
for (int d_col = d_col_start; d_col < d_col_end; ++d_col) {
for (int h_col = h_col_start; h_col < h_col_end; ++h_col) {
for (int w_col = w_col_start; w_col < w_col_end; ++w_col) {
srcVal += dataSrc[offset + d_col * coeff_d_col + h_col * coeff_h_col +
w_col * coeff_w_col];
}
}
}
dataDst[index] = alpha * srcVal + beta * dstVal;
}
}
void hl_matrix_col2Vol(real* dataDst,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
const real* dataSrc,
real alpha,
real beta) {
int depth_col = (depth + 2 * paddingD - filterD) / strideD + 1;
int height_col = (height + 2 * paddingH - filterH) / strideH + 1;
int width_col = (width + 2 * paddingW - filterW) / strideW + 1;
int num_kernels = channels * depth * height * width;
const int threads = 512;
const int blocks = DIVUP(num_kernels, threads);
keMatrixCol2Vol<<<blocks, threads, 0, STREAM_DEFAULT>>>(num_kernels,
dataDst,
dataSrc,
depth,
height,
width,
filterD,
filterH,
filterW,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
depth_col,
height_col,
width_col,
alpha,
beta);
CHECK_SYNC("hl_matrix_col2Vol failed");
}
...@@ -43,6 +43,10 @@ template <> ...@@ -43,6 +43,10 @@ template <>
AttrType AttrTypeID<std::vector<std::string>>() { AttrType AttrTypeID<std::vector<std::string>>() {
return STRINGS; return STRINGS;
} }
template <>
AttrType AttrTypeID<std::vector<std::pair<int, int>>>() {
return INT_PAIRS;
}
Attribute GetAttrValue(const OpDesc::Attr& attr_desc) { Attribute GetAttrValue(const OpDesc::Attr& attr_desc) {
switch (attr_desc.type()) { switch (attr_desc.type()) {
...@@ -76,6 +80,14 @@ Attribute GetAttrValue(const OpDesc::Attr& attr_desc) { ...@@ -76,6 +80,14 @@ Attribute GetAttrValue(const OpDesc::Attr& attr_desc) {
} }
return val; return val;
} }
case paddle::framework::AttrType::INT_PAIRS: {
std::vector<std::pair<int, int>> val(attr_desc.int_pairs_size());
for (int i = 0; i < attr_desc.int_pairs_size(); ++i) {
val[i].first = attr_desc.int_pairs(i).first();
val[i].second = attr_desc.int_pairs(i).second();
}
return val;
}
} }
PADDLE_ENFORCE(false, "Unknown OpDesc::AttrDesc::type !"); PADDLE_ENFORCE(false, "Unknown OpDesc::AttrDesc::type !");
return boost::blank(); return boost::blank();
......
...@@ -28,7 +28,8 @@ namespace paddle { ...@@ -28,7 +28,8 @@ namespace paddle {
namespace framework { namespace framework {
typedef boost::variant<boost::blank, int, float, std::string, std::vector<int>, typedef boost::variant<boost::blank, int, float, std::string, std::vector<int>,
std::vector<float>, std::vector<std::string>> std::vector<float>, std::vector<std::string>,
std::vector<std::pair<int, int>>>
Attribute; Attribute;
typedef std::unordered_map<std::string, Attribute> AttributeMap; typedef std::unordered_map<std::string, Attribute> AttributeMap;
......
...@@ -182,7 +182,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive( ...@@ -182,7 +182,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
}); });
// process recurrent gradient op as a special operator. // process recurrent gradient op as a special operator.
if (forwardOp.Type() == "recurrent_op") { if (forwardOp.Type() == "recurrent") {
// NOTE clean up cycle call somewhere (RNN's stepnet constains itself), or // NOTE clean up cycle call somewhere (RNN's stepnet constains itself), or
// this will result in infinite loop. // this will result in infinite loop.
const auto& rnnop = const auto& rnnop =
......
...@@ -127,8 +127,8 @@ class FillZeroOpMaker : public OpProtoAndCheckerMaker { ...@@ -127,8 +127,8 @@ class FillZeroOpMaker : public OpProtoAndCheckerMaker {
public: public:
FillZeroOpMaker(OpProto *proto, OpAttrChecker *op_checker) FillZeroOpMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("x", "x"); AddInput("Src", "x");
AddOutput("out", "out"); AddOutput("Dst", "out");
AddComment(""); AddComment("");
} }
}; };
...@@ -138,7 +138,7 @@ class AddOpMaker : public OpProtoAndCheckerMaker { ...@@ -138,7 +138,7 @@ class AddOpMaker : public OpProtoAndCheckerMaker {
AddOpMaker(OpProto *proto, OpAttrChecker *op_checker) AddOpMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "x").AsDuplicable(); AddInput("X", "x").AsDuplicable();
AddOutput("Y", "y"); AddOutput("Out", "out");
AddComment(""); AddComment("");
} }
}; };
......
...@@ -22,8 +22,14 @@ enum AttrType { ...@@ -22,8 +22,14 @@ enum AttrType {
INTS = 3; INTS = 3;
FLOATS = 4; FLOATS = 4;
STRINGS = 5; STRINGS = 5;
INT_PAIRS = 6;
} }
message IntPair {
required int32 first = 1;
required int32 second = 2;
};
// OpDesc describes an instance of a C++ framework::OperatorBase // OpDesc describes an instance of a C++ framework::OperatorBase
// derived class type. // derived class type.
message OpDesc { message OpDesc {
...@@ -37,6 +43,7 @@ message OpDesc { ...@@ -37,6 +43,7 @@ message OpDesc {
repeated int32 ints = 6; repeated int32 ints = 6;
repeated float floats = 7; repeated float floats = 7;
repeated string strings = 8; repeated string strings = 8;
repeated IntPair int_pairs = 9;
}; };
message Var { message Var {
......
...@@ -19,25 +19,24 @@ ...@@ -19,25 +19,24 @@
namespace paddle { namespace paddle {
namespace framework { namespace framework {
LODTensor::LOD LODTensor::LOD::SliceLevels(size_t level_begin, LOD SliceLevels(const LOD& in, size_t level_begin, size_t level_end) {
size_t level_end) const {
LOD new_lod; LOD new_lod;
new_lod.reserve(level_end - level_begin); new_lod.reserve(level_end - level_begin);
for (size_t i = level_begin; i < level_end; i++) { for (size_t i = level_begin; i < level_end; i++) {
new_lod.emplace_back(at(i)); new_lod.emplace_back(in.at(i));
} }
return new_lod; return new_lod;
} }
LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin, LOD SliceInLevel(const LOD& in, size_t level, size_t elem_begin,
size_t elem_end) const { size_t elem_end) {
// slice the lod. // slice the lod.
LOD new_lod; LOD new_lod;
new_lod.reserve(size() - level); new_lod.reserve(in.size() - level);
auto start = this->at(level)[elem_begin]; auto start = in.at(level)[elem_begin];
auto end = this->at(level)[elem_end]; auto end = in.at(level)[elem_end];
for (auto it = this->begin() + level; it != this->end(); it++) { for (auto it = in.begin() + level; it != in.end(); it++) {
auto it_begin = std::find(it->begin(), it->end(), start); auto it_begin = std::find(it->begin(), it->end(), start);
auto it_end = std::find(it_begin, it->end(), end); auto it_end = std::find(it_begin, it->end(), end);
PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info"); PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info");
...@@ -49,11 +48,11 @@ LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin, ...@@ -49,11 +48,11 @@ LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin,
[start](int v) { return v - start; }); [start](int v) { return v - start; });
PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LOD"); PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LOD");
} }
PADDLE_ENFORCE_LE(new_lod.size(), this->size()); PADDLE_ENFORCE_LE(new_lod.size(), in.size());
return new_lod; return new_lod;
} }
bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) { bool operator==(const LOD& a, const LOD& b) {
if (a.size() != b.size()) { if (a.size() != b.size()) {
return false; return false;
} }
...@@ -70,9 +69,27 @@ bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) { ...@@ -70,9 +69,27 @@ bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) {
} }
} }
} }
return true; return true;
} }
void LODTensor::SliceLevels(size_t level_begin, size_t level_end) {
auto new_lod = framework::SliceLevels(lod_, level_begin, level_end);
lod_ = new_lod;
}
void LODTensor::SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels());
PADDLE_ENFORCE(elem_begin < NumElements(level),
"element begin [%d] out of range [%d]", elem_begin,
NumElements(level));
PADDLE_ENFORCE(elem_end < NumElements(level) + 1,
"element end [%d] out of range [%d]", elem_end,
NumElements(level));
auto new_lod = framework::SliceInLevel(lod_, level, elem_begin, elem_end);
lod_ = new_lod;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
#pragma once #pragma once
#include <memory> #include <memory>
#if !defined(PADDLE_ONLY_CPU) #ifndef PADDLE_ONLY_CPU
#include <thrust/device_vector.h> #include <thrust/device_vector.h>
#include <thrust/host_vector.h> #include <thrust/host_vector.h>
#endif #endif
...@@ -27,33 +27,39 @@ ...@@ -27,33 +27,39 @@
namespace paddle { namespace paddle {
namespace framework { namespace framework {
#ifdef PADDLE_ONLY_CPU
template <typename T>
using Vector = std::vector<T>;
#else
template <typename T>
using Vector = thrust::host_vector<T>;
#endif
using LOD = std::vector<Vector<size_t>>;
LOD SliceLevels(const LOD& in, size_t level_begin, size_t level_end);
LOD SliceInLevel(const LOD& in, size_t level, size_t elem_begin,
size_t elem_end);
bool operator==(const LOD& a, const LOD& b);
/* /*
* LODTensor (Level of details Tensor) * LODTensor (Level of details Tensor)
* see https://en.wikipedia.org/wiki/Level_of_details for reference. * see https://en.wikipedia.org/wiki/Level_of_details for reference.
*/ */
class LODTensor : public Tensor { class LODTensor {
public:
// Level save offsets of each unit.
#ifdef PADDLE_ONLY_CPU
template <typename T>
using Vector = std::vector<T>;
#else
template <typename T>
using Vector = thrust::host_vector<T>;
#endif
// LoD stores offsets of each level of units, the largest units level first,
// then the smaller units level. Each Level stores the offsets of units in
// Tesor.
class LOD : public std::vector<Vector<size_t>> {
public: public:
LOD SliceLevels(size_t level_begin, size_t level_end) const;
LOD SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) const;
};
LODTensor() {} LODTensor() {}
explicit LODTensor(const LOD &lod) : lod_(lod) {} LODTensor(const LOD& lod, Tensor* t) : lod_(lod), tensor_(t) {}
void set_lod(const LOD& lod) { lod_ = lod; }
void set_tensor(Tensor* tensor) { tensor_ = tensor; }
Tensor& tensor() { return *tensor_; }
virtual Tensor *Clone() const { return new LODTensor(lod_); } LOD lod() { return lod_; }
/* /*
* Get a element from LOD. * Get a element from LOD.
...@@ -79,71 +85,23 @@ class LODTensor : public Tensor { ...@@ -79,71 +85,23 @@ class LODTensor : public Tensor {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level, PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels()); NumLevels());
// the last offset is the end of last element // the last offset is the end of last element
return lod_[level].size() - 1; return (lod_)[level].size() - 1;
} }
/* /*
* Slice of levels[level_begin:level_end], with tensor shared. * Slice of levels[level_begin:level_end]
*/ */
template <typename T> void SliceLevels(size_t level_begin, size_t level_end);
LODTensor SliceLevels(size_t level_begin, size_t level_end) const;
/* /*
* Slice of elements of a level, [elem_begin: elem_end], with tensor shared. * Slice of elements of a level, [elem_begin: elem_end]
* @note: low performance in slice lod_. * @note: low performance in slice lod_.
*/ */
template <typename T> void SliceInLevel(size_t level, size_t elem_begin, size_t elem_end);
LODTensor SliceInLevel(size_t level, size_t elem_begin,
size_t elem_end) const;
/*
* Copy other's lod_'s content, free to mutate.
*/
void CopyLOD(const LODTensor &other) { lod_ = other.lod_; }
/*
* Determine whether LODTensor has a valid LOD info.
*/
const LOD &lod() const { return lod_; }
LOD *mutable_lod() { return &lod_; }
virtual ~LODTensor() {}
private: private:
LOD lod_; LOD lod_;
Tensor* tensor_; // not owned
}; };
bool operator==(const LODTensor::LOD &a, const LODTensor::LOD &b);
template <typename T>
LODTensor LODTensor::SliceLevels(size_t level_begin, size_t level_end) const {
auto new_lod = lod_.SliceLevels(level_begin, level_end);
// slice levels just need to update LOD info, each level will contains the
// whole tensor_, so no need to modify tensor_.
LODTensor new_tensor(new_lod);
new_tensor.ShareDataWith<T>(*this);
return new_tensor;
}
template <typename T>
LODTensor LODTensor::SliceInLevel(size_t level, size_t elem_begin,
size_t elem_end) const {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels());
PADDLE_ENFORCE(elem_begin < NumElements(level),
"element begin [%d] out of range [%d]", elem_begin,
NumElements(level));
PADDLE_ENFORCE(elem_end < NumElements(level) + 1,
"element end [%d] out of range [%d]", elem_end,
NumElements(level));
auto new_lod = lod_.SliceInLevel(level, elem_begin, elem_end);
// slice elements just need to update LOD info, because offsets are not
// changed, so the original tensor_ can be reused.
LODTensor new_tensor(new_lod);
new_tensor.ShareDataWith<T>(*this);
return new_tensor;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
# Design Doc: LoD (Level-of-Detail) Tensor
PaddlePaddle's RNN doesn't require that all instances have the same length. To do so, we introduce an extension to Tensor, namely, LoD Tensor.
## Challenge of Variable-length Inputs
People usually represent a mini-batch by a Tensor. For example, a mini-batch of 32 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 10x32x32 Tensor.
Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths.
## LoD as a Solution
### Mini-Batch of variable-length sentenses
Let's imagine a mini-batch of 3 variable lengths sentences, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information:
```
3
3 1 2
||| | ||
```
Each `|` represents a D-dimensional word vectors. The number 3 on top indicate 3 sentences, and numbers 3, 1, and 2 on the second level represent the number of words in each sentence.
### Mini-Batch of variable-length videos
This approach generalizes to the case where elements are not words, but higher dimensional objects, like images. Suppose that a mini-batch contains videos of the same frame size 640x480. If a mini-batch contains 3 videos of 3, 1, and 2 frames respectively. The underlying tensor is of size (3+1+2)x640x480. The index information illustrates as:
```
3
3 1 2
口口口 口 口口
```
where each `口` represents an image.
### Mini-Batch of fixed-size images
Let's get back to a typical example, image classification, where each mini-batch has M fixed-sized images. The LoD Tensor representation is
```
M
1 1 1 1 1
口口口口 ... 口
```
The many 1's on the second level seem duplicated. For this particular case of 2 levels and the second level always have length 1, we can ignore the LoD index.
### Design and summarization
In summary, as long as that the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor:
- The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and
- the first dimension size L has an additon property -- a LoD index as a nested vector:
```c++
typedef std::vector<std::vector> > LoD;
```
- The LoD index can is not necessary when there are only two levels and all elements of the second level have length 1.
## Slicing of LoD Tensor
Consider that we have a network with three levels of RNN: the top level one handles articles, the second level one handles sentences, and the basic level one handles words. This network requires that mini-batches represented by 4 level LoD Tensor, for example,
```
3
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
```
To allow each level of RNN to handle its input, we define **the slicing of a LoD Tensor is defined as getting the j-th sequence on level i, or the <i,j>-slice**
For example, the <2,1>-slice of above slice is
```
2
||
```
and the <1,2>-slice of above example is
```
2
2 3
|| |||
```
Let's go on slicing this slice. Its <1,1>-slice is
```
3
|||
```
### The General Slicing Algorithm
The algorithm, with over-simplified data structure, is defined as
```c++
typedef vector<vector<int> > LoD;
struct LoDTensor {
LoD lod_;
float* tensor_;
};
LoDTensor Slice(const LoDTensor& lodt, int level, int sequence) {
}
```
### Slicing the Top Level
Please be aware that an RNN operator only slices the top level of a LoD Tensor to get the step inputs.
```c++
LoDTensor Slice(const LoDTensor& lodt, int sequence) {
}
```
...@@ -24,13 +24,12 @@ namespace framework { ...@@ -24,13 +24,12 @@ namespace framework {
class LODTensorTester : public ::testing::Test { class LODTensorTester : public ::testing::Test {
public: public:
virtual void SetUp() override { virtual void SetUp() override {
lod_tensor.reset(new LODTensor);
// tensor's batch_size: 30 // tensor's batch_size: 30
// 3 levels // 3 levels
// 0 10 20 // 0 10 20
// 0 5 10 15 20 // 0 5 10 15 20
// 0 2 5 7 10 12 15 20 // 0 2 5 7 10 12 15 20
LODTensor::LOD lod; LOD lod;
lod.push_back(std::vector<size_t>{0, 10, 20}); lod.push_back(std::vector<size_t>{0, 10, 20});
lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20}); lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20});
lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20}); lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20});
...@@ -41,75 +40,65 @@ class LODTensorTester : public ::testing::Test { ...@@ -41,75 +40,65 @@ class LODTensorTester : public ::testing::Test {
// malloc memory // malloc memory
tensor.mutable_data<float>(place); tensor.mutable_data<float>(place);
lod_tensor.reset(new LODTensor(lod)); lod_tensor.set_lod(lod);
lod_tensor->Resize({20 /*batch size*/, 128 /*dim*/}); lod_tensor.set_tensor(&tensor);
lod_tensor->ShareDataWith<float>(tensor);
// lod_tensor->ShareDataWith<Tensor>(tensor);
} }
protected: protected:
std::unique_ptr<LODTensor> lod_tensor;
platform::CPUPlace place; platform::CPUPlace place;
Tensor tensor; Tensor tensor;
LODTensor lod_tensor;
}; };
TEST_F(LODTensorTester, NumLevels) { ASSERT_EQ(lod_tensor->NumLevels(), 3UL); } TEST_F(LODTensorTester, NumLevels) { ASSERT_EQ(lod_tensor.NumLevels(), 3UL); }
TEST_F(LODTensorTester, NumElements) { TEST_F(LODTensorTester, NumElements) {
ASSERT_EQ(lod_tensor->NumElements(0), 2UL); ASSERT_EQ(lod_tensor.NumElements(0), 2UL);
ASSERT_EQ(lod_tensor->NumElements(1), 4UL); ASSERT_EQ(lod_tensor.NumElements(1), 4UL);
ASSERT_EQ(lod_tensor->NumElements(2), 8UL); ASSERT_EQ(lod_tensor.NumElements(2), 8UL);
} }
TEST_F(LODTensorTester, SliceLevels) { TEST_F(LODTensorTester, SliceLevels) {
// slice 1 level // slice 1 level
for (size_t level = 0; level < 3UL; ++level) { for (size_t level = 0; level < 3UL; ++level) {
auto new_lod_tensor = lod_tensor->SliceLevels<float>(level, level + 1); LODTensor new_lod_tensor = lod_tensor;
new_lod_tensor.SliceLevels(level, level + 1);
ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL);
ASSERT_EQ(new_lod_tensor.NumElements(0UL), lod_tensor->NumElements(level)); ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor.NumElements(level));
// ASSERT_EQ(new_lod_tensor, *lod_tensor); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
lod_tensor.tensor().data<float>());
} }
// slice 2 level // slice 2 level
for (size_t level = 0; level < 2UL; ++level) { for (size_t level = 0; level < 2UL; ++level) {
auto new_lod_tensor = lod_tensor->SliceLevels<float>(level, level + 2); LODTensor new_lod_tensor = lod_tensor;
new_lod_tensor.SliceLevels(level, level + 2);
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor->NumElements(level)); ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor.NumElements(level));
ASSERT_EQ(new_lod_tensor.NumElements(1), ASSERT_EQ(new_lod_tensor.NumElements(1), lod_tensor.NumElements(level + 1));
lod_tensor->NumElements(level + 1)); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor->data<float>()); lod_tensor.tensor().data<float>());
} }
} }
TEST_F(LODTensorTester, SliceInLevel) { TEST_F(LODTensorTester, SliceInLevel) {
size_t level = 0; size_t level = 0;
auto new_lod_tensor = lod_tensor->SliceInLevel<float>(level, 0, 2); LODTensor new_lod_tensor = lod_tensor;
new_lod_tensor.SliceInLevel(level, 0, 2);
EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL); EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL);
EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL); EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL);
EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL); EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL);
EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL); EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor->data<float>()); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
lod_tensor.tensor().data<float>());
level = 1; level = 1;
new_lod_tensor = lod_tensor->SliceInLevel<float>(level, 0, 2); new_lod_tensor = lod_tensor;
new_lod_tensor.SliceInLevel(level, 0, 2);
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL); ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL); ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor->data<float>()); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
} lod_tensor.tensor().data<float>());
TEST_F(LODTensorTester, ShareLOD) {
LODTensor new_lod_tensor;
new_lod_tensor.CopyLOD(*lod_tensor);
ASSERT_EQ(new_lod_tensor.lod(), lod_tensor->lod());
}
TEST_F(LODTensorTester, CopyLOD) {
LODTensor new_lod_tensor;
new_lod_tensor.CopyLOD(*lod_tensor);
bool equals = std::equal(lod_tensor->lod().begin(), lod_tensor->lod().end(),
new_lod_tensor.lod().begin());
ASSERT_TRUE(equals);
} }
} // namespace framework } // namespace framework
......
...@@ -80,9 +80,19 @@ class OpInfoMap { ...@@ -80,9 +80,19 @@ class OpInfoMap {
} }
const OpInfo& Get(const std::string& type) const { const OpInfo& Get(const std::string& type) const {
auto op_info_ptr = GetNullable(type);
PADDLE_ENFORCE_NOT_NULL(op_info_ptr, "Operator %s has not been registered",
type);
return *op_info_ptr;
}
const OpInfo* GetNullable(const std::string& type) const {
auto it = map_.find(type); auto it = map_.find(type);
PADDLE_ENFORCE(it != map_.end(), "Operator %s are not found", type); if (it == map_.end()) {
return it->second; return nullptr;
} else {
return &it->second;
}
} }
template <typename Callback> template <typename Callback>
......
...@@ -199,6 +199,8 @@ class OpKernelRegistrar : public Registrar { ...@@ -199,6 +199,8 @@ class OpKernelRegistrar : public Registrar {
USE_OP_DEVICE_KERNEL(op_type, GPU) USE_OP_DEVICE_KERNEL(op_type, GPU)
#endif #endif
#define USE_NO_KERNEL_OP(op_type) USE_OP_ITSELF(op_type);
#define USE_CPU_ONLY_OP(op_type) \ #define USE_CPU_ONLY_OP(op_type) \
USE_OP_ITSELF(op_type); \ USE_OP_ITSELF(op_type); \
USE_OP_DEVICE_KERNEL(op_type, CPU); USE_OP_DEVICE_KERNEL(op_type, CPU);
......
...@@ -33,12 +33,12 @@ ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const { ...@@ -33,12 +33,12 @@ ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const {
} }
#endif #endif
const std::string& OperatorBase::Input(const std::string& name) const { std::string OperatorBase::Input(const std::string& name) const {
auto& ins = Inputs(name); auto& ins = Inputs(name);
PADDLE_ENFORCE_EQ(ins.size(), 1UL, PADDLE_ENFORCE_LE(ins.size(), 1UL,
"Op %s input %s should contain only one variable", type_, "Op %s input %s should contain only one variable", type_,
name); name);
return ins[0]; return ins.empty() ? kEmptyVarName : ins[0];
} }
const std::vector<std::string>& OperatorBase::Inputs( const std::vector<std::string>& OperatorBase::Inputs(
...@@ -49,12 +49,12 @@ const std::vector<std::string>& OperatorBase::Inputs( ...@@ -49,12 +49,12 @@ const std::vector<std::string>& OperatorBase::Inputs(
return it->second; return it->second;
} }
const std::string& OperatorBase::Output(const std::string& name) const { std::string OperatorBase::Output(const std::string& name) const {
auto& outs = Outputs(name); auto& outs = Outputs(name);
PADDLE_ENFORCE_EQ(outs.size(), 1UL, PADDLE_ENFORCE_LE(outs.size(), 1UL,
"Op %s output %s should contain only one variable", type_, "Op %s output %s should contain only one variable", type_,
name); name);
return outs[0]; return outs.empty() ? kEmptyVarName : outs[0];
} }
const std::vector<std::string>& OperatorBase::Outputs( const std::vector<std::string>& OperatorBase::Outputs(
...@@ -119,16 +119,8 @@ OperatorBase::OperatorBase(const std::string& type, ...@@ -119,16 +119,8 @@ OperatorBase::OperatorBase(const std::string& type,
const VariableNameMap& outputs, const VariableNameMap& outputs,
const AttributeMap& attrs) const AttributeMap& attrs)
: type_(type), inputs_(inputs), outputs_(outputs), attrs_(attrs) { : type_(type), inputs_(inputs), outputs_(outputs), attrs_(attrs) {
static std::atomic<size_t> gUniqId(0UL); GenerateTemporaryNames();
for (auto& output : outputs_) { CheckAllInputOutputSet();
for (auto& output_name : output.second) {
if (output_name == kTempVarName) {
output_name += type_;
output_name += "@";
output_name += std::to_string(gUniqId.fetch_add(1));
}
}
}
} }
std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const { std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const {
...@@ -156,6 +148,35 @@ std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const { ...@@ -156,6 +148,35 @@ std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const {
return ret_val; return ret_val;
} }
void OperatorBase::CheckAllInputOutputSet() const {
auto& info_map = OpInfoMap::Instance();
auto* op_info = info_map.GetNullable(Type());
if (op_info == nullptr || op_info->proto_ == nullptr) return;
for (auto& in : op_info->Proto().inputs()) {
PADDLE_ENFORCE(inputs_.find(in.name()) != inputs_.end(),
"Type %s's input %s is not set", Type(), in.name());
}
for (auto& out : op_info->Proto().outputs()) {
PADDLE_ENFORCE(outputs_.find(out.name()) != outputs_.end(),
"Type %s's output %s is not set", Type(), out.name());
}
}
void OperatorBase::GenerateTemporaryNames() {
static std::atomic<size_t> gUniqId(0UL);
for (auto& output : outputs_) {
for (auto& output_name : output.second) {
if (output_name == kTempVarName) {
output_name += type_;
output_name += "@";
output_name += std::to_string(gUniqId.fetch_add(1));
}
}
}
}
void OpProtoAndCheckerMaker::Validate() { void OpProtoAndCheckerMaker::Validate() {
validated_ = true; validated_ = true;
CheckNoDuplicatedInOutAttrs(); CheckNoDuplicatedInOutAttrs();
......
...@@ -95,12 +95,12 @@ class OperatorBase { ...@@ -95,12 +95,12 @@ class OperatorBase {
const VariableNameMap& Inputs() const { return inputs_; } const VariableNameMap& Inputs() const { return inputs_; }
const VariableNameMap& Outputs() const { return outputs_; } const VariableNameMap& Outputs() const { return outputs_; }
//! Get a input with argument's name described in `op_proto` //! Get a input with argument's name described in `op_proto`
const std::string& Input(const std::string& name) const; std::string Input(const std::string& name) const;
//! Get a input which has multiple variables. //! Get a input which has multiple variables.
const std::vector<std::string>& Inputs(const std::string& name) const; const std::vector<std::string>& Inputs(const std::string& name) const;
//! Get a output with argument's name described in `op_proto` //! Get a output with argument's name described in `op_proto`
const std::string& Output(const std::string& name) const; std::string Output(const std::string& name) const;
//! Get an output which has multiple variables. //! Get an output which has multiple variables.
//! TODO add a vector_view to prevent memory copy. //! TODO add a vector_view to prevent memory copy.
const std::vector<std::string>& Outputs(const std::string& name) const; const std::vector<std::string>& Outputs(const std::string& name) const;
...@@ -127,6 +127,10 @@ class OperatorBase { ...@@ -127,6 +127,10 @@ class OperatorBase {
// IG (Inputs Gradients) // IG (Inputs Gradients)
VariableNameMap outputs_; VariableNameMap outputs_;
AttributeMap attrs_; AttributeMap attrs_;
private:
void GenerateTemporaryNames();
void CheckAllInputOutputSet() const;
}; };
// Macro for define a clone method. // Macro for define a clone method.
...@@ -229,6 +233,15 @@ class InferShapeContext { ...@@ -229,6 +233,15 @@ class InferShapeContext {
InferShapeContext(const OperatorBase& op, const Scope& scope) InferShapeContext(const OperatorBase& op, const Scope& scope)
: op_(op), scope_(scope) {} : op_(op), scope_(scope) {}
const OperatorBase& op() const { return op_; }
const Scope& scope() const { return scope_; }
template <typename T>
inline const T& GetAttr(const std::string& name) const {
return op_.GetAttr<T>(name);
}
size_t InputSize(const std::string& name) const { size_t InputSize(const std::string& name) const {
return op_.Inputs(name).size(); return op_.Inputs(name).size();
} }
...@@ -238,11 +251,13 @@ class InferShapeContext { ...@@ -238,11 +251,13 @@ class InferShapeContext {
} }
const Variable* InputVar(const std::string& name) const { const Variable* InputVar(const std::string& name) const {
return scope_.FindVar(op_.Input(name)); auto ipt = op_.Input(name);
return ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
} }
Variable* OutputVar(const std::string& name) const { Variable* OutputVar(const std::string& name) const {
return scope_.FindVar(op_.Output(name)); auto opt = op_.Output(name);
return opt == kEmptyVarName ? nullptr : scope_.FindVar(opt);
} }
const std::vector<const Variable*> MultiInputVar( const std::vector<const Variable*> MultiInputVar(
...@@ -250,9 +265,11 @@ class InferShapeContext { ...@@ -250,9 +265,11 @@ class InferShapeContext {
auto names = op_.Inputs(name); auto names = op_.Inputs(name);
std::vector<const Variable*> res; std::vector<const Variable*> res;
res.reserve(names.size()); res.reserve(names.size());
std::transform( std::transform(names.begin(), names.end(), std::back_inserter(res),
names.begin(), names.end(), std::back_inserter(res), [this](const std::string& name) {
[this](const std::string& name) { return scope_.FindVar(name); }); return name == kEmptyVarName ? nullptr
: scope_.FindVar(name);
});
return res; return res;
} }
...@@ -260,24 +277,24 @@ class InferShapeContext { ...@@ -260,24 +277,24 @@ class InferShapeContext {
auto names = op_.Outputs(name); auto names = op_.Outputs(name);
std::vector<const Variable*> res; std::vector<const Variable*> res;
res.reserve(names.size()); res.reserve(names.size());
std::transform( std::transform(names.begin(), names.end(), std::back_inserter(res),
names.begin(), names.end(), std::back_inserter(res), [this](const std::string& name) {
[this](const std::string& name) { return scope_.FindVar(name); }); return name == kEmptyVarName ? nullptr
: scope_.FindVar(name);
});
return res; return res;
} }
template <typename T> template <typename T>
const T* Input(const std::string& name) const { const T* Input(const std::string& name) const {
auto* var = InputVar(name); auto* var = InputVar(name);
PADDLE_ENFORCE_NOT_NULL(var, "Input(%s) should not be nullptr", name); return var == nullptr ? nullptr : &var->Get<T>();
return &var->Get<T>();
} }
template <typename T> template <typename T>
T* Output(const std::string& name) const { T* Output(const std::string& name) const {
auto var = OutputVar(name); auto var = OutputVar(name);
PADDLE_ENFORCE_NOT_NULL(var, "Output(%s) should not be nullptr", name); return var == nullptr ? nullptr : var->GetMutable<T>();
return var->GetMutable<T>();
} }
template <typename T> template <typename T>
...@@ -288,10 +305,7 @@ class InferShapeContext { ...@@ -288,10 +305,7 @@ class InferShapeContext {
std::transform(names.begin(), names.end(), std::back_inserter(res), std::transform(names.begin(), names.end(), std::back_inserter(res),
[&](const std::string& sub_name) { [&](const std::string& sub_name) {
auto var = scope_.FindVar(sub_name); auto var = scope_.FindVar(sub_name);
PADDLE_ENFORCE_NOT_NULL( return var == nullptr ? nullptr : &var->Get<T>();
var, "MultiInput(%s:%s) should not be nullptr", name,
sub_name);
return &var->Get<T>();
}); });
return res; return res;
} }
...@@ -304,14 +318,12 @@ class InferShapeContext { ...@@ -304,14 +318,12 @@ class InferShapeContext {
std::transform(names.begin(), names.end(), std::back_inserter(res), std::transform(names.begin(), names.end(), std::back_inserter(res),
[&](const std::string& sub_name) { [&](const std::string& sub_name) {
auto var = scope_.FindVar(sub_name); auto var = scope_.FindVar(sub_name);
PADDLE_ENFORCE_NOT_NULL( return var == nullptr ? nullptr : var->GetMutable<T>();
var, "MultiOutput(%s:%s) should not be nullptr.", name,
sub_name);
return var->GetMutable<T>();
}); });
return res; return res;
} }
private:
const OperatorBase& op_; const OperatorBase& op_;
const Scope& scope_; const Scope& scope_;
}; };
......
...@@ -122,10 +122,10 @@ class CPUKernelTest : public OpKernel { ...@@ -122,10 +122,10 @@ class CPUKernelTest : public OpKernel {
public: public:
void Compute(const ExecutionContext& ctx) const { void Compute(const ExecutionContext& ctx) const {
std::cout << "this is cpu kernel" << std::endl; std::cout << "this is cpu kernel" << std::endl;
std::cout << ctx.op_.DebugString() << std::endl; std::cout << ctx.op().DebugString() << std::endl;
cpu_kernel_run_num++; cpu_kernel_run_num++;
ASSERT_EQ(ctx.op_.Input("x"), "IN1"); ASSERT_EQ(ctx.op().Input("x"), "IN1");
ASSERT_EQ(ctx.op_.Output("y"), "OUT1"); ASSERT_EQ(ctx.op().Output("y"), "OUT1");
} }
}; };
...@@ -148,7 +148,7 @@ class OpKernelTestMultiInputsProtoAndCheckerMaker ...@@ -148,7 +148,7 @@ class OpKernelTestMultiInputsProtoAndCheckerMaker
class CPUKernalMultiInputsTest : public OpKernel { class CPUKernalMultiInputsTest : public OpKernel {
public: public:
void Compute(const ExecutionContext& ctx) const { void Compute(const ExecutionContext& ctx) const {
auto xs = ctx.op_.Inputs("xs"); auto xs = ctx.op().Inputs("xs");
ASSERT_EQ(xs.size(), 3UL); ASSERT_EQ(xs.size(), 3UL);
ASSERT_EQ(xs[0], "x0"); ASSERT_EQ(xs[0], "x0");
ASSERT_EQ(xs[1], "x1"); ASSERT_EQ(xs[1], "x1");
...@@ -172,10 +172,10 @@ class CPUKernalMultiInputsTest : public OpKernel { ...@@ -172,10 +172,10 @@ class CPUKernalMultiInputsTest : public OpKernel {
auto outTensor0 = ctx.MultiOutput<Tensor>("ys"); auto outTensor0 = ctx.MultiOutput<Tensor>("ys");
ASSERT_EQ(outTensor0.size(), 2U); ASSERT_EQ(outTensor0.size(), 2U);
auto k = ctx.op_.Input("k"); auto k = ctx.op().Input("k");
ASSERT_EQ(k, "k0"); ASSERT_EQ(k, "k0");
auto ys = ctx.op_.Outputs("ys"); auto ys = ctx.op().Outputs("ys");
ASSERT_EQ(ys.size(), 2UL); ASSERT_EQ(ys.size(), 2UL);
ASSERT_EQ(ys[0], "y0"); ASSERT_EQ(ys[0], "y0");
ASSERT_EQ(ys[1], "y1"); ASSERT_EQ(ys[1], "y1");
......
...@@ -117,6 +117,8 @@ inline void Tensor::CopyFrom(const Tensor& src, ...@@ -117,6 +117,8 @@ inline void Tensor::CopyFrom(const Tensor& src,
memory::Copy(boost::get<platform::GPUPlace>(dst_place), dst_ptr, memory::Copy(boost::get<platform::GPUPlace>(dst_place), dst_ptr,
boost::get<platform::GPUPlace>(src_place), src_ptr, size, 0); boost::get<platform::GPUPlace>(src_place), src_ptr, size, 0);
} }
PADDLE_ENFORCE(cudaStreamSynchronize(0),
"cudaStreamSynchronize failed in Tensor CopyFrom");
#endif #endif
} }
......
...@@ -14,18 +14,20 @@ limitations under the License. */ ...@@ -14,18 +14,20 @@ limitations under the License. */
#include "Evaluator.h" #include "Evaluator.h"
#include "paddle/gserver/gradientmachines/NeuralNetwork.h" #include "paddle/gserver/gradientmachines/NeuralNetwork.h"
#include "paddle/utils/StringUtil.h"
namespace paddle { namespace paddle {
/** /**
* calculate sequence-to-sequence edit distance * calculate sequence-to-sequence edit distance
*/ */
class CTCErrorEvaluator : public NotGetableEvaluator { class CTCErrorEvaluator : public Evaluator {
private: private:
MatrixPtr outActivations_; MatrixPtr outActivations_;
int numTimes_, numClasses_, numSequences_, blank_; int numTimes_, numClasses_, numSequences_, blank_;
real deletions_, insertions_, substitutions_; real deletions_, insertions_, substitutions_;
int seqClassficationError_; int seqClassficationError_;
mutable std::unordered_map<std::string, real> evalResults_;
std::vector<int> path2String(const std::vector<int>& path) { std::vector<int> path2String(const std::vector<int>& path) {
std::vector<int> str; std::vector<int> str;
...@@ -183,6 +185,18 @@ private: ...@@ -183,6 +185,18 @@ private:
return stringAlignment(gtStr, recogStr); return stringAlignment(gtStr, recogStr);
} }
void storeLocalValues() const {
evalResults_["error"] = numSequences_ ? totalScore_ / numSequences_ : 0;
evalResults_["deletion_error"] =
numSequences_ ? deletions_ / numSequences_ : 0;
evalResults_["insertion_error"] =
numSequences_ ? insertions_ / numSequences_ : 0;
evalResults_["substitution_error"] =
numSequences_ ? substitutions_ / numSequences_ : 0;
evalResults_["sequence_error"] =
(real)seqClassficationError_ / numSequences_;
}
public: public:
CTCErrorEvaluator() CTCErrorEvaluator()
: numTimes_(0), : numTimes_(0),
...@@ -245,16 +259,12 @@ public: ...@@ -245,16 +259,12 @@ public:
} }
virtual void printStats(std::ostream& os) const { virtual void printStats(std::ostream& os) const {
os << config_.name() << "=" storeLocalValues();
<< (numSequences_ ? totalScore_ / numSequences_ : 0); os << config_.name() << " error = " << evalResults_["error"];
os << " deletions error" os << " deletions error = " << evalResults_["deletion_error"];
<< "=" << (numSequences_ ? deletions_ / numSequences_ : 0); os << " insertions error = " << evalResults_["insertion_error"];
os << " insertions error" os << " substitution error = " << evalResults_["substitution_error"];
<< "=" << (numSequences_ ? insertions_ / numSequences_ : 0); os << " sequence error = " << evalResults_["sequence_error"];
os << " substitutions error"
<< "=" << (numSequences_ ? substitutions_ / numSequences_ : 0);
os << " sequences error"
<< "=" << (real)seqClassficationError_ / numSequences_;
} }
virtual void distributeEval(ParameterClient2* client) { virtual void distributeEval(ParameterClient2* client) {
...@@ -272,6 +282,37 @@ public: ...@@ -272,6 +282,37 @@ public:
seqClassficationError_ = (int)buf[4]; seqClassficationError_ = (int)buf[4];
numSequences_ = (int)buf[5]; numSequences_ = (int)buf[5];
} }
void getNames(std::vector<std::string>* names) {
storeLocalValues();
names->reserve(names->size() + evalResults_.size());
for (auto it = evalResults_.begin(); it != evalResults_.end(); ++it) {
names->push_back(config_.name() + "." + it->first);
}
}
real getValue(const std::string& name, Error* err) const {
storeLocalValues();
std::vector<std::string> buffers;
paddle::str::split(name, '.', &buffers);
auto it = evalResults_.find(buffers[buffers.size() - 1]);
if (it == evalResults_.end()) {
*err = Error("Evaluator does not have the key %s", name.c_str());
return 0.0f;
}
return it->second;
}
std::string getType(const std::string& name, Error* err) const {
this->getValue(name, err);
if (!err->isOK()) {
return "";
}
return "ctc_edit_distance";
}
}; };
REGISTER_EVALUATOR(ctc_edit_distance, CTCErrorEvaluator); REGISTER_EVALUATOR(ctc_edit_distance, CTCErrorEvaluator);
......
...@@ -268,7 +268,13 @@ public: ...@@ -268,7 +268,13 @@ public:
} }
// get type of evaluator // get type of evaluator
std::string getTypeImpl() const { return "chunk"; } std::string getType(const std::string& name, Error* err) const {
this->getValue(name, err);
if (!err->isOK()) {
return "";
}
return "chunk";
}
private: private:
void storeLocalValues() const { void storeLocalValues() const {
......
...@@ -211,6 +211,7 @@ public: ...@@ -211,6 +211,7 @@ public:
*err = Error("Not implemented"); *err = Error("Not implemented");
return .0f; return .0f;
} }
std::string getType(const std::string& name, Error* err) const { std::string getType(const std::string& name, Error* err) const {
*err = Error("Not implemented"); *err = Error("Not implemented");
return ""; return "";
...@@ -331,6 +332,7 @@ private: ...@@ -331,6 +332,7 @@ private:
protected: protected:
std::string getTypeImpl() const; std::string getTypeImpl() const;
}; };
/** /**
* @brief precision, recall and f1 score Evaluator * @brief precision, recall and f1 score Evaluator
* \f[ * \f[
...@@ -358,6 +360,12 @@ public: ...@@ -358,6 +360,12 @@ public:
virtual void distributeEval(ParameterClient2* client); virtual void distributeEval(ParameterClient2* client);
void getNames(std::vector<std::string>* names);
real getValue(const std::string& name, Error* err) const;
std::string getType(const std::string& name, Error* err) const;
struct StatsInfo { struct StatsInfo {
/// numbers of true positives /// numbers of true positives
double TP; double TP;
...@@ -428,11 +436,6 @@ private: ...@@ -428,11 +436,6 @@ private:
mutable std::unordered_map<std::string, real> values_; mutable std::unordered_map<std::string, real> values_;
void storeLocalValues() const; void storeLocalValues() const;
// Evaluator interface
public:
void getNames(std::vector<std::string>* names);
real getValue(const std::string& name, Error* err) const;
std::string getType(const std::string& name, Error* err) const;
}; };
/* /*
......
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "Conv3DLayer.h"
#include "paddle/utils/Logging.h"
#include "paddle/utils/Stat.h"
namespace paddle {
REGISTER_LAYER(conv3d, Conv3DLayer);
bool Conv3DLayer::init(const LayerMap &layerMap,
const ParameterMap &parameterMap) {
if (!ConvBaseLayer::init(layerMap, parameterMap)) return false;
int index = 0;
for (auto &inputConfig : config_.inputs()) {
const ConvConfig &conf = inputConfig.conv_conf();
M_.push_back(numFilters_ / conf.groups());
K_.push_back(filterPixels_[index] * filterChannels_[index]);
// create a new weight
size_t height, width;
width = filterPixels_[index] * filterChannels_[index];
height = numFilters_;
CHECK_EQ(parameters_[index]->getSize(), width * height);
Weight *w = new Weight(height, width, parameters_[index]);
weights_.emplace_back(w);
++index;
}
if (biasParameter_.get()) {
if (sharedBiases_) {
CHECK_EQ((size_t)numFilters_, biasParameter_->getSize());
biases_ =
std::unique_ptr<Weight>(new Weight(numFilters_, 1, biasParameter_));
} else {
biases_ =
std::unique_ptr<Weight>(new Weight(getSize(), 1, biasParameter_));
}
}
return true;
}
size_t Conv3DLayer::getSize() {
CHECK_NE(inputLayers_.size(), 0UL);
outputH_.clear();
outputW_.clear();
outputD_.clear();
N_.clear();
size_t layerSize = 0;
for (size_t i = 0; i < inputLayers_.size(); ++i) {
outputW_.push_back(outputSize(
imgSizeW_[i], filterSize_[i], padding_[i], stride_[i], true));
outputH_.push_back(outputSize(
imgSizeH_[i], filterSizeY_[i], paddingY_[i], strideY_[i], true));
outputD_.push_back(outputSize(
imgSizeD_[i], filterSizeZ_[i], paddingZ_[i], strideZ_[i], true));
N_.push_back(outputD_[i] * outputH_[i] * outputW_[i]);
CHECK(layerSize == 0 || N_[i] * size_t(numFilters_) == layerSize);
layerSize += N_[i] * numFilters_;
}
getOutput().setFrameHeight(outputH_[0]);
getOutput().setFrameWidth(outputW_[0]);
getOutput().setFrameDepth(outputD_[0]);
return layerSize;
}
void Conv3DLayer::forward(PassType passType) {
Layer::forward(passType);
int batchSize = inputLayers_[0]->getOutputValue()->getHeight();
int outWidth = getSize();
resetOutput(batchSize, outWidth);
for (size_t i = 0; i != inputLayers_.size(); ++i) {
REGISTER_TIMER_INFO("FwdConv3D", getName().c_str());
const MatrixPtr &inMat = getInputValue(i);
const MatrixPtr &outMat = getOutputValue();
int M = M_[i];
int N = N_[i];
int K = K_[i];
Matrix::resizeOrCreate(colBuf_, K * groups_[i], N, false, useGpu_);
MatrixPtr wMat = weights_[i]->getW();
for (int n = 0; n < batchSize; ++n) {
colBuf_->vol2Col(inMat->getData() + n * inMat->getStride(),
channels_[i],
imgSizeD_[i],
imgSizeH_[i],
imgSizeW_[i],
filterSizeZ_[i],
filterSizeY_[i],
filterSize_[i],
strideZ_[i],
strideY_[i],
stride_[i],
paddingZ_[i],
paddingY_[i],
padding_[i]);
real *outData = outMat->getData() + n * outMat->getStride();
MatrixPtr outMatSub =
Matrix::create(outData, groups_[i] * M, N, false, useGpu_);
for (int g = 0; g < groups_[i]; g++) {
MatrixPtr wMatSub = wMat->subMatrix(g * M, M);
MatrixPtr in = colBuf_->subMatrix(g * K, K);
MatrixPtr out = outMatSub->subMatrix(g * M, M);
out->mul(*wMatSub, *in, 1.0, 1.0);
}
}
}
if (nullptr != this->biasParameter_) {
REGISTER_TIMER_INFO("FwBiasTimer", getName().c_str());
this->addBias();
}
forwardActivation();
}
void Conv3DLayer::backward(const UpdateCallback &callback) {
backwardActivation();
if (biases_ && biases_->getWGrad()) {
bpropBiases();
biases_->getParameterPtr()->incUpdate(callback);
}
for (size_t i = 0; i != inputLayers_.size(); ++i) {
REGISTER_TIMER_INFO("BwdConv3D", getName().c_str());
if (weights_[i]->getWGrad()) {
bpropWeights(i);
}
if (getInputGrad(i)) {
bpropData(i);
}
REGISTER_TIMER_INFO("WeightUpdate", getName().c_str());
weights_[i]->getParameterPtr()->incUpdate(callback);
}
}
void Conv3DLayer::bpropWeights(int i) {
int M = M_[i];
int N = N_[i];
int K = K_[i];
const MatrixPtr &inMat = getInputValue(i);
Matrix::resizeOrCreate(colBuf_, K * groups_[i], N, false, useGpu_);
MatrixPtr wGradMat = weights_[i]->getWGrad();
int batchSize = inputLayers_[0]->getOutputValue()->getHeight();
for (int n = 0; n < batchSize; ++n) {
colBuf_->vol2Col(inMat->getData() + n * inMat->getStride(),
channels_[i],
imgSizeD_[i],
imgSizeH_[i],
imgSizeW_[i],
filterSizeZ_[i],
filterSizeY_[i],
filterSize_[i],
strideZ_[i],
strideY_[i],
stride_[i],
paddingZ_[i],
paddingY_[i],
padding_[i]);
real *outGradData =
getOutputGrad()->getData() + n * getOutputGrad()->getStride();
MatrixPtr outGradSub =
Matrix::create(outGradData, groups_[i] * M, N, false, useGpu_);
for (int g = 0; g < groups_[i]; ++g) {
MatrixPtr inMatSub = colBuf_->subMatrix(g * K, K);
MatrixPtr outG = outGradSub->subMatrix(g * M, M);
MatrixPtr wGradSub = wGradMat->subMatrix(g * M, M);
wGradSub->mul(*outG, *(inMatSub->getTranspose()), 1.0, 1.0);
}
}
}
void Conv3DLayer::bpropData(int i) {
int M = M_[i];
int N = N_[i];
int K = K_[i];
Matrix::resizeOrCreate(colBuf_, K * groups_[i], N, false, useGpu_);
MatrixPtr wMat = weights_[i]->getW();
int batchSize = inputLayers_[0]->getOutputValue()->getHeight();
for (int n = 0; n < batchSize; ++n) {
real *outGradData =
getOutputGrad()->getData() + n * getOutputGrad()->getStride();
real *preGradData =
getInputGrad(i)->getData() + n * getInputGrad(i)->getStride();
MatrixPtr outGradSub =
Matrix::create(outGradData, M * groups_[i], N, false, useGpu_);
for (int g = 0; g < groups_[i]; ++g) {
MatrixPtr wMatSub = wMat->subMatrix(g * M, M);
MatrixPtr outG = outGradSub->subMatrix(g * M, M);
MatrixPtr inGradMatSub = colBuf_->subMatrix(g * K, K);
inGradMatSub->mul(*(wMatSub->getTranspose()), *outG, 1.0, 0.0);
}
colBuf_->col2Vol(preGradData,
channels_[i],
imgSizeD_[i],
imgSizeH_[i],
imgSizeW_[i],
filterSizeZ_[i],
filterSizeY_[i],
filterSize_[i],
strideZ_[i],
strideY_[i],
stride_[i],
paddingZ_[i],
paddingY_[i],
padding_[i],
1.0,
1.0);
}
}
void Conv3DLayer::bpropBiases() {
MatrixPtr biases = Matrix::create(biases_->getWGrad()->getData(),
1,
biases_->getWGrad()->getElementCnt(),
false,
useGpu_);
MatrixPtr outGradMat = getOutputGrad();
if (this->sharedBiases_) {
biases->collectSharedBias(*outGradMat, 1.0f);
} else {
biases->collectBias(*outGradMat, 1.0f);
}
}
void Conv3DLayer::addBias() {
MatrixPtr outMat = getOutputValue();
MatrixPtr bias = Matrix::create(biases_->getW()->getData(),
1,
biases_->getW()->getElementCnt(),
false,
useGpu_);
if (this->sharedBiases_) {
outMat->addSharedBias(*(bias), 1.0f);
} else {
outMat->addBias(*(bias), 1.0f);
}
}
} // namespace paddle
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "ConvBaseLayer.h"
#include "paddle/math/MathUtils.h"
#include "paddle/math/Matrix.h"
namespace paddle {
/**
* @brief A subclass of convolution layer.
* This layer expands input and use matrix multiplication to
* calculate convolution operation.
*/
class Conv3DLayer : public ConvBaseLayer {
public:
explicit Conv3DLayer(const LayerConfig& config) : ConvBaseLayer(config) {}
~Conv3DLayer() {}
bool init(const LayerMap& layerMap, const ParameterMap& parameterMap);
void forward(PassType passType);
void addBias();
void backward(const UpdateCallback& callback);
void bpropBiases();
void bpropData(int i);
void bpropWeights(int i);
size_t getSize();
protected:
// Figure out the dimensions for individual gemms.
IntV M_; /// numFilters_ / filter_group_;
IntV N_; /// channels_ * filterSizeZ_ * filterSize_ * filterSizeY_
IntV K_; /// outputD_ * outputH_ * outputW_
MatrixPtr colBuf_;
};
} // namespace paddle
...@@ -38,7 +38,6 @@ bool ConvBaseLayer::init(const LayerMap& layerMap, ...@@ -38,7 +38,6 @@ bool ConvBaseLayer::init(const LayerMap& layerMap,
strideY_.push_back(conf.stride_y()); strideY_.push_back(conf.stride_y());
dilationY_.push_back(conf.dilation_y()); dilationY_.push_back(conf.dilation_y());
filterSizeY_.push_back(conf.filter_size_y()); filterSizeY_.push_back(conf.filter_size_y());
filterPixels_.push_back(filterSize_.back() * filterSizeY_.back());
channels_.push_back(conf.channels()); channels_.push_back(conf.channels());
imgSizeH_.push_back(conf.has_img_size_y() ? conf.img_size_y() imgSizeH_.push_back(conf.has_img_size_y() ? conf.img_size_y()
: conf.img_size()); : conf.img_size());
...@@ -47,31 +46,20 @@ bool ConvBaseLayer::init(const LayerMap& layerMap, ...@@ -47,31 +46,20 @@ bool ConvBaseLayer::init(const LayerMap& layerMap,
filterChannels_.push_back(conf.filter_channels()); filterChannels_.push_back(conf.filter_channels());
outputH_.push_back(conf.has_output_y() ? conf.output_y() : conf.output_x()); outputH_.push_back(conf.has_output_y() ? conf.output_y() : conf.output_x());
outputW_.push_back(conf.output_x()); outputW_.push_back(conf.output_x());
paddingZ_.push_back(conf.padding_z());
strideZ_.push_back(conf.stride_z());
filterSizeZ_.push_back(conf.filter_size_z());
imgSizeD_.push_back(conf.img_size_z());
outputD_.push_back(conf.output_z());
filterPixels_.push_back(filterSize_.back() * filterSizeY_.back() *
filterSizeZ_.back());
} }
CHECK(inputLayers_.size() == parameters_.size()); CHECK(inputLayers_.size() == parameters_.size());
for (size_t i = 0; i < inputLayers_.size(); i++) {
size_t height, width;
height = filterPixels_[i] * filterChannels_[i];
width = (!isDeconv_) ? numFilters_ : channels_[i];
// create a new weight
CHECK_EQ(parameters_[i]->getSize(), width * height);
Weight* w = new Weight(height, width, parameters_[i]);
weights_.emplace_back(w);
}
/* initialize the biases_ */ // create new weights_ in derived class
if (biasParameter_.get()) { // create new biases_ in derived class
if (sharedBiases_) {
CHECK_EQ((size_t)numFilters_, biasParameter_->getSize());
biases_ =
std::unique_ptr<Weight>(new Weight(numFilters_, 1, biasParameter_));
} else {
biases_ =
std::unique_ptr<Weight>(new Weight(getSize(), 1, biasParameter_));
}
}
// default caffe model // default caffe model
caffeMode_ = true; caffeMode_ = true;
......
...@@ -62,6 +62,13 @@ protected: ...@@ -62,6 +62,13 @@ protected:
IntV outputH_; IntV outputH_;
/// The spatial dimensions of output feature map width. /// The spatial dimensions of output feature map width.
IntV outputW_; IntV outputW_;
IntV outputD_;
IntV imgSizeD_;
IntV filterSizeZ_;
IntV strideZ_;
IntV paddingZ_;
/// Group size, refer to grouped convolution in /// Group size, refer to grouped convolution in
/// Alex Krizhevsky's paper: when group=2, the first half of the /// Alex Krizhevsky's paper: when group=2, the first half of the
/// filters are only connected to the first half of the input channels, /// filters are only connected to the first half of the input channels,
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "CrossEntropyOverBeam.h"
namespace paddle {
void CostForOneSequence::calValidExpandStep() {
validExpansionCount_ = 0;
goldAsExtraPath_ = true;
for (size_t i = 0; i < beams_->expansionCount; ++i) {
real gold = static_cast<real>(beams_->gold[i]);
if (i) {
real* start = beams_->candidateIds[i - 1]->getData();
goldRowIds_[i] = std::count_if(
start,
start + goldRowIds_[i - 1] * beamSize_ + goldColIds_[i - 1],
[](const real& val) { return val != -1.; });
} else {
goldRowIds_[i] = 0;
}
real* start =
beams_->candidateIds[i]->getData() + goldRowIds_[i] * beamSize_;
real* findEnd = std::find(start, start + beamSize_, gold);
validExpansionCount_++;
if (start + beamSize_ == findEnd) return;
goldColIds_[i] = findEnd - start;
}
if (goldColIds_[beams_->expansionCount - 1] != -1) goldAsExtraPath_ = false;
}
size_t CostForOneSequence::initLastExpansion() {
int beamId = validExpansionCount_ - 1;
const MatrixPtr candidates = beams_->candidateIds[beamId];
size_t height = candidates->getHeight();
/* initialization the last expansion. */
size_t pathCount = std::count_if(candidates->getData(),
candidates->getData() + height * beamSize_,
[](const real& val) { return val != -1; });
/*
* if the gold sequence falls off the beam during search, add the gold
* sequence as the last path into the all expanded candidates.
*/
if (goldAsExtraPath_) goldIdsInFinalExpansion_ = pathCount++;
pathRowIdsInEachBeam_.clear();
pathRowIdsInEachBeam_.resize(validExpansionCount_,
std::vector<int>(pathCount, 0));
parentIdsInBeam_.clear();
parentIdsInBeam_.resize(pathCount, 0);
if (goldAsExtraPath_) {
/* add gold sequence into the total expansion. */
pathRowIdsInEachBeam_[beamId].back() =
beams_->gold[beamId] +
getSeqStartPos(beamId, goldRowIds_[validExpansionCount_ - 1]);
parentIdsInBeam_.back() = goldRowIds_[validExpansionCount_ - 1];
} else {
size_t goldOffset = goldRowIds_[beamId] * beamSize_ + goldColIds_[beamId];
goldIdsInFinalExpansion_ =
std::count_if(candidates->getData(),
candidates->getData() + goldOffset,
[](const real& val) { return val != -1.; });
}
/*
* TODO(caoying): fix this, store the indices of selected candidate
* paths into Argument.ids
*/
real* ids = candidates->getData();
size_t curIdx = 0;
for (size_t i = 0; i < height; ++i) {
int basePos = getSeqStartPos(beamId, i);
for (size_t j = 0; j < beamSize_; ++j) {
int id = ids[i * beamSize_ + j];
if (id == -1) continue;
pathRowIdsInEachBeam_[beamId][curIdx] = id + basePos;
parentIdsInBeam_[curIdx++] = i;
}
}
return pathCount;
}
void CostForOneSequence::constructTotalExpansion() {
/*
* construct the entire expanded beam by begining with the last search
* in which gold falls off the beam.
*/
size_t totalPathCount = initLastExpansion();
for (int beamId = validExpansionCount_ - 2; beamId >= 0; --beamId) {
const MatrixPtr candidates = beams_->candidateIds[beamId];
real* ids = candidates->getData();
int lastParentIdInBeam = -1;
int basePos = -1;
for (size_t i = 0;
i < (goldAsExtraPath_ ? totalPathCount - 1 : totalPathCount);
++i) {
int id = ids[parentIdsInBeam_[i]];
int parentRowId = std::div(parentIdsInBeam_[i], beamSize_).quot;
if (parentIdsInBeam_[i] != lastParentIdInBeam)
basePos = getSeqStartPos(beamId, parentRowId);
pathRowIdsInEachBeam_[beamId][i] = id + basePos;
lastParentIdInBeam = parentIdsInBeam_[i];
parentIdsInBeam_[i] = parentRowId;
if (goldAsExtraPath_)
pathRowIdsInEachBeam_[beamId][totalPathCount - 1] =
beams_->gold[beamId] + getSeqStartPos(beamId, goldRowIds_[beamId]);
}
}
}
real CostForOneSequence::globallyNormalizedScore() {
expandedPathScores_.resize(validExpansionCount_);
Matrix::resizeOrCreate(
softmaxOut_, 1, pathRowIdsInEachBeam_[0].size(), false, false);
softmaxOut_->zeroMem();
MatrixPtr tmp = Matrix::create(
softmaxOut_->getData(), softmaxOut_->getWidth(), 1, false, false);
for (size_t i = 0; i < validExpansionCount_; ++i) {
Matrix::resizeOrCreate(expandedPathScores_[i],
pathRowIdsInEachBeam_[i].size(),
1,
false,
false);
expandedPathScores_[i]->zeroMem();
IVectorPtr rowIds = IVector::create(pathRowIdsInEachBeam_[i].data(),
pathRowIdsInEachBeam_[i].size(),
false);
expandedPathScores_[i]->selectRows(*(beams_->scores[i]), *rowIds);
tmp->add(*expandedPathScores_[i]);
}
softmaxOut_->softmax(*softmaxOut_);
return -std::log(softmaxOut_->getData()[goldIdsInFinalExpansion_]);
}
real CostForOneSequence::forward() {
calValidExpandStep();
constructTotalExpansion();
return globallyNormalizedScore();
}
void CostForOneSequence::backward() {
/*
* when softmax layer is the output layer, and it is combined with
* cross-entropy as cost. The derivate with regard to softmax's input
* is simply:
*
* grad_i = softmax_out_i - target_i,
*
* and here hard label is used.
*/
softmaxOut_->getData()[goldIdsInFinalExpansion_] -= 1.;
MatrixPtr tmp = Matrix::create(
softmaxOut_->getData(), softmaxOut_->getWidth(), 1, false, false);
for (size_t i = 0; i < validExpansionCount_; ++i) {
IVectorPtr rowIds = IVector::create(pathRowIdsInEachBeam_[i].data(),
pathRowIdsInEachBeam_[i].size(),
false);
/*
beams_->scoreGrad[i] has been intialized outside this class, this
class only keeps a pointer pointing to the original input gradients,
so here does not need to allocate or initalize the memory.
*/
tmp->addToRows(*beams_->scoreGrad[i], *rowIds);
}
}
REGISTER_LAYER(cross_entropy_over_beam, CrossEntropyOverBeam);
bool CrossEntropyOverBeam::init(const LayerMap& layerMap,
const ParameterMap& parameterMap) {
/* Initialize the basic parent class */
Layer::init(layerMap, parameterMap);
CHECK_EQ(0U, inputLayers_.size() % 3) << "Error input number.";
beamExpanCount_ = inputLayers_.size() / 3;
candidateScores_.resize(beamExpanCount_);
candidateScoreGrad_.resize(beamExpanCount_);
candidateInBeam_.resize(beamExpanCount_);
goldSequence_.resize(beamExpanCount_);
gradToInputs_.resize(beamExpanCount_);
setNeedSequenceInfo(false);
return true;
}
void CrossEntropyOverBeam::checkInputs() {
batchSize_ = 0;
for (size_t i = 0; i < beamExpanCount_; ++i) {
const Argument& scores = getInput(i * 3);
const Argument& selCandidates = getInput(i * 3 + 1);
const Argument& goldSeq = getInput(i * 3 + 2);
if (i) {
CHECK(scores.hasSubseq()) << "input " << i << " "
<< inputLayers_[i * 3]->getName()
<< " should be a nested sequence";
CHECK_EQ(getInputValue(i * 3 + 1)->getWidth(), beamSize_);
CHECK_EQ(batchSize_, static_cast<size_t>(scores.getNumSequences()));
CHECK_EQ(scores.getNumSubSequences(), selCandidates.getBatchSize());
} else {
CHECK(scores.hasSeq()) << "input " << i << " "
<< inputLayers_[i]->getName()
<< " should be a sequence";
batchSize_ = scores.getNumSequences();
beamSize_ = getInputValue(i * 3 + 1)->getWidth();
CHECK_EQ(batchSize_, static_cast<size_t>(selCandidates.getBatchSize()));
}
CHECK_EQ(1U, scores.value->getWidth());
CHECK_EQ(batchSize_, static_cast<size_t>(goldSeq.getBatchSize()));
}
}
void CrossEntropyOverBeam::copyInputsToCpu() {
auto copyValue = [](const MatrixPtr& src, MatrixPtr& trg) {
if (dynamic_cast<GpuMatrix*>(src.get())) {
Matrix::resizeOrCreate(
trg, src->getHeight(), src->getWidth(), false, false);
trg->copyFrom(*src);
} else {
trg = std::move(src);
}
};
auto copyIds = [](const IVectorPtr& src, IVectorPtr& trg) {
if (dynamic_cast<GpuIVector*>(src.get())) {
IVector::resizeOrCreate(trg, src->getSize(), false);
trg->copyFrom(*src);
} else {
trg = std::move(src);
}
};
beamSplitPos_.clear();
beamSplitPos_.resize(batchSize_, std::vector<int>(beamExpanCount_, 0));
for (size_t i = 0; i < beamExpanCount_; ++i) {
copyValue(getInputValue(i * 3), candidateScores_[i]);
copyValue(getInputValue(i * 3 + 1), candidateInBeam_[i]);
copyIds(getInput(i * 3 + 2).ids, goldSequence_[i]);
if (i) {
ICpuGpuVectorPtr seqInfo = getInput(i * 3).sequenceStartPositions;
const int* seqStarts = seqInfo->getMutableData(false);
ICpuGpuVectorPtr subSeqInfo = getInput(i * 3).subSequenceStartPositions;
const int* subSeqStarts = subSeqInfo->getMutableData(false);
size_t seqId = 1;
for (size_t subSeqId = 0; subSeqId < subSeqInfo->getSize() - 1;
++subSeqId) {
CHECK_LT(seqId, seqInfo->getSize());
if (subSeqStarts[subSeqId] == seqStarts[seqId]) {
beamSplitPos_[seqId][i] = beamSplitPos_[seqId - 1][i];
seqId++;
}
beamSplitPos_[seqId - 1][i]++;
}
} else {
for (size_t j = 0; j < batchSize_; ++j) beamSplitPos_[j][i] = j + 1;
}
}
}
void CrossEntropyOverBeam::splitBatchBeams() {
beamCosts_.resize(batchSize_);
beamPerSeq_.resize(batchSize_, BeamExpansion(beamExpanCount_));
for (size_t i = 0; i < beamExpanCount_; ++i) {
int* seqStarts =
getInput(i * 3).sequenceStartPositions->getMutableData(false);
int* subSeqStarts = nullptr;
int maxLen = 0;
if (i) {
subSeqStarts =
getInput(i * 3).subSequenceStartPositions->getMutableData(false);
maxLen = getInput(i * 3).subSequenceStartPositions->getSize() - 1;
} else {
maxLen = getInput(i).sequenceStartPositions->getSize() - 1;
}
for (size_t j = 0; j < batchSize_; ++j) {
beamPerSeq_[j].scores[i] =
Matrix::create(candidateScores_[i]->getData() + seqStarts[j],
seqStarts[j + 1] - seqStarts[j],
1,
false,
false);
beamPerSeq_[j].scoreGrad[i] =
Matrix::create(candidateScoreGrad_[i]->getData() + seqStarts[j],
seqStarts[j + 1] - seqStarts[j],
1,
false,
false);
int offset = j ? beamSplitPos_[j - 1][i] : 0;
int height = beamSplitPos_[j][i] - (j ? beamSplitPos_[j - 1][i] : 0);
CHECK_GE(maxLen, offset + height);
beamPerSeq_[j].seqInfo[i] = IVector::create(
(i ? subSeqStarts : seqStarts) + offset, height + 1, false);
beamPerSeq_[j].candidateIds[i] =
Matrix::create(candidateInBeam_[i]->getData() + offset * beamSize_,
height,
beamSize_,
false,
false);
beamPerSeq_[j].gold[i] = goldSequence_[i]->getData()[j];
CHECK_LE(beamPerSeq_[j].gold[i], seqStarts[j + 1] - seqStarts[j]);
}
}
}
void CrossEntropyOverBeam::resizeOutput() {
Matrix::resizeOrCreate(output_.value, batchSize_, 1, false, false);
output_.value->zeroMem();
for (size_t i = 0; i < beamExpanCount_; ++i) {
MatrixPtr inGrad = getInputGrad(i * 3);
if (dynamic_cast<GpuMatrix*>(inGrad.get())) {
Matrix::resizeOrCreate(candidateScoreGrad_[i],
inGrad->getHeight(),
inGrad->getWidth(),
false,
false);
} else {
candidateScoreGrad_[i] = std::move(inGrad);
}
candidateScoreGrad_[i]->zeroMem();
}
}
void CrossEntropyOverBeam::copyGradToGpu(size_t copyCount) {
for (size_t i = 0; i < beamExpanCount_; ++i) {
if (dynamic_cast<GpuMatrix*>(getInputGrad(i * 3).get()))
getInputGrad(i * 3)->copyFrom(*candidateScoreGrad_[i]);
if (i == copyCount - 1) break;
}
}
void CrossEntropyOverBeam::forward(PassType passType) {
Layer::forward(passType);
checkInputs();
copyInputsToCpu();
resizeOutput();
splitBatchBeams();
MatrixPtr outputValue = getOutputValue();
for (size_t i = 0; i < batchSize_; ++i) {
BeamExpansionPtr ptr = std::make_shared<BeamExpansion>(beamPerSeq_[i]);
beamCosts_[i].setData(std::move(ptr), beamSize_);
outputValue->getData()[i] = beamCosts_[i].forward();
}
}
void CrossEntropyOverBeam::backward(const UpdateCallback& callback) {
for (size_t i = 0; i < batchSize_; ++i) {
beamCosts_[i].backward();
copyGradToGpu(beamCosts_[i].getValidExpansionCount());
}
}
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "CrossEntropyOverBeam.h"
#include "Layer.h"
namespace paddle {
/* This struct stores the beams in all search steps for a single sequence. */
struct BeamExpansion {
std::vector<MatrixPtr> scores;
std::vector<IVectorPtr> seqInfo;
std::vector<MatrixPtr> candidateIds;
std::vector<int> gold;
std::vector<MatrixPtr> scoreGrad;
size_t expansionCount;
explicit BeamExpansion(int n) {
expansionCount = n;
scores.resize(expansionCount);
seqInfo.resize(expansionCount);
candidateIds.resize(expansionCount);
scoreGrad.resize(expansionCount);
gold.resize(expansionCount);
}
};
typedef std::shared_ptr<BeamExpansion> BeamExpansionPtr;
class CostForOneSequence {
public:
CostForOneSequence()
: beamSize_(0), validExpansionCount_(0), goldAsExtraPath_(false) {}
void setData(const BeamExpansionPtr bPtr, size_t beamSize) {
beams_ = bPtr;
beamSize_ = beamSize;
expandedPathScores_.clear();
expandedPathScores_.resize(beams_->expansionCount);
goldRowIds_.clear();
goldRowIds_.resize(beams_->expansionCount, 0);
goldColIds_.clear();
goldColIds_.resize(beams_->expansionCount, -1);
}
size_t getValidExpansionCount() { return validExpansionCount_; }
real forward();
void backward();
private:
void calValidExpandStep();
void constructTotalExpansion();
size_t initLastExpansion();
real globallyNormalizedScore();
int getSeqStartPos(size_t beamId, size_t rowId) {
CHECK_GT(beams_->seqInfo[beamId]->getSize() - 1, rowId);
int* starts = beams_->seqInfo[beamId]->getData();
return starts[rowId] - starts[0];
}
size_t beamSize_;
size_t validExpansionCount_;
bool goldAsExtraPath_;
std::vector<int> goldRowIds_;
std::vector<int> goldColIds_;
BeamExpansionPtr beams_;
std::vector<std::vector<int>> pathRowIdsInEachBeam_;
std::vector<int> parentIdsInBeam_;
size_t goldIdsInFinalExpansion_;
std::vector<MatrixPtr> expandedPathScores_;
MatrixPtr softmaxOut_;
};
class CrossEntropyOverBeam : public Layer {
public:
explicit CrossEntropyOverBeam(const LayerConfig& config) : Layer(config) {}
bool init(const LayerMap& layerMap,
const ParameterMap& parameterMap) override;
void forward(PassType passType) override;
void backward(const UpdateCallback& callback) override;
private:
void checkInputs();
void copyInputsToCpu();
void resizeOutput();
void copyGradToGpu(size_t copyCount);
void splitBatchBeams();
size_t beamExpanCount_;
size_t batchSize_;
size_t beamSize_;
/*
* the process of constructing beams is not friendly to GPU, currently, this
* layer only runs on CPU, if any of its inputs is on GPU memory, then copy
* it to CPU memory.
*/
std::vector<MatrixPtr> candidateScores_;
std::vector<MatrixPtr> candidateScoreGrad_;
std::vector<MatrixPtr> candidateInBeam_;
std::vector<MatrixPtr> gradToInputs_;
std::vector<IVectorPtr> goldSequence_;
std::vector<std::vector<int>> beamSplitPos_;
/*
* split entire bath of beams into beam per sequnence and store the result
* into this member.
*/
std::vector<BeamExpansion> beamPerSeq_;
/* beamCosts_ is used to propagate error in one sequence. */
std::vector<CostForOneSequence> beamCosts_;
};
} // namespace paddle
...@@ -46,8 +46,26 @@ bool CudnnConvBaseLayer::init(const LayerMap &layerMap, ...@@ -46,8 +46,26 @@ bool CudnnConvBaseLayer::init(const LayerMap &layerMap,
projConf_.emplace_back(conf); projConf_.emplace_back(conf);
projections_.emplace_back( projections_.emplace_back(
Projection::create(*projConf_[i], parameters_[i], useGpu_)); Projection::create(*projConf_[i], parameters_[i], useGpu_));
// create a new weight
size_t height, width;
height = filterPixels_[i] * filterChannels_[i];
width = (!isDeconv_) ? numFilters_ : channels_[i];
CHECK_EQ(parameters_[i]->getSize(), width * height);
Weight *w = new Weight(height, width, parameters_[i]);
weights_.emplace_back(w);
} }
if (biasParameter_.get()) {
if (sharedBiases_) {
CHECK_EQ((size_t)numFilters_, biasParameter_->getSize());
biases_ =
std::unique_ptr<Weight>(new Weight(numFilters_, 1, biasParameter_));
} else {
biases_ =
std::unique_ptr<Weight>(new Weight(getSize(), 1, biasParameter_));
}
}
if (biases_.get() && sharedBiases_) { if (biases_.get() && sharedBiases_) {
hl_create_tensor_descriptor(&biasDesc_); hl_create_tensor_descriptor(&biasDesc_);
hl_create_tensor_descriptor(&outputDesc_); hl_create_tensor_descriptor(&outputDesc_);
......
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "DeConv3DLayer.h"
#include "paddle/utils/Logging.h"
#include "paddle/utils/Stat.h"
namespace paddle {
REGISTER_LAYER(deconv3d, DeConv3DLayer);
bool DeConv3DLayer::init(const LayerMap &layerMap,
const ParameterMap &parameterMap) {
if (!ConvBaseLayer::init(layerMap, parameterMap)) return false;
// for Deconv, the dimension of Kernel is
// channel * output * depth * height * weigth
// Matrix storage format: (output * depth * height * weigth) x channel
for (int index = 0; index < config_.inputs().size(); ++index) {
M_.push_back(filterChannels_[index]);
K_.push_back(filterPixels_[index] * (numFilters_ / groups_[index]));
// create a new weight
size_t height, width;
height = filterPixels_[index] * numFilters_;
width = filterChannels_[index];
CHECK_EQ(parameters_[index]->getSize(), width * height);
Weight *w = new Weight(height, width, parameters_[index]);
weights_.emplace_back(w);
}
if (biasParameter_.get()) {
if (sharedBiases_) {
CHECK_EQ((size_t)numFilters_, biasParameter_->getSize());
biases_ =
std::unique_ptr<Weight>(new Weight(numFilters_, 1, biasParameter_));
} else {
biases_ =
std::unique_ptr<Weight>(new Weight(getSize(), 1, biasParameter_));
}
}
return true;
}
size_t DeConv3DLayer::getSize() {
CHECK_NE(inputLayers_.size(), 0UL);
outputH_.clear();
outputW_.clear();
outputD_.clear();
N_.clear();
NOut_.clear();
size_t layerSize = 0;
for (size_t i = 0; i < inputLayers_.size(); ++i) {
outputW_.push_back(
imageSize(imgSizeW_[i], filterSize_[i], padding_[i], stride_[i], true));
outputH_.push_back(imageSize(
imgSizeH_[i], filterSizeY_[i], paddingY_[i], strideY_[i], true));
outputD_.push_back(imageSize(
imgSizeD_[i], filterSizeZ_[i], paddingZ_[i], strideZ_[i], true));
NOut_.push_back(outputD_[i] * outputH_[i] * outputW_[i]);
N_.push_back(imgSizeD_[i] * imgSizeH_[i] * imgSizeW_[i]);
CHECK(layerSize == 0 || N_[i] * size_t(numFilters_) == layerSize);
layerSize += NOut_[i] * numFilters_;
}
getOutput().setFrameHeight(outputH_[0]);
getOutput().setFrameWidth(outputW_[0]);
getOutput().setFrameDepth(outputD_[0]);
return layerSize;
}
void DeConv3DLayer::forward(PassType passType) {
Layer::forward(passType);
int batchSize = inputLayers_[0]->getOutputValue()->getHeight();
int outWidth = getSize();
resetOutput(batchSize, outWidth);
const MatrixPtr outMat = getOutputValue();
for (size_t i = 0; i != inputLayers_.size(); ++i) {
REGISTER_TIMER_INFO("FwdDeConv3D", getName().c_str());
const MatrixPtr &inMat = getInputValue(i);
int M = M_[i];
int N = N_[i];
int K = K_[i];
MatrixPtr wMat = weights_[i]->getW();
Matrix::resizeOrCreate(colBuf_, K * groups_[i], N, false, useGpu_);
for (int n = 0; n < batchSize; ++n) {
real *inData = inMat->getData() + n * inMat->getStride();
for (int g = 0; g < groups_[i]; ++g) {
MatrixPtr inMatSub = Matrix::create(inData, M, N, false, useGpu_);
MatrixPtr wMatSub = wMat->subMatrix(g * K, K);
MatrixPtr colBufDataSub = colBuf_->subMatrix(g * K, K);
colBufDataSub->mul(*wMatSub, *inMatSub, 1.0, 0.0);
inData += M * N;
}
colBuf_->col2Vol(outMat->getData() + n * outMat->getStride(),
numFilters_,
outputD_[i],
outputH_[i],
outputW_[i],
filterSizeZ_[i],
filterSizeY_[i],
filterSize_[i],
strideZ_[i],
strideY_[i],
stride_[i],
paddingZ_[i],
paddingY_[i],
padding_[i],
1.0,
1.0);
}
}
if (nullptr != this->biasParameter_) {
REGISTER_TIMER_INFO("FwBiasTimer", getName().c_str());
this->addBias();
}
forwardActivation();
}
void DeConv3DLayer::backward(const UpdateCallback &callback) {
backwardActivation();
int batchSize = getOutputGrad()->getHeight();
if (biases_ && biases_->getWGrad()) {
bpropBiases();
biases_->getParameterPtr()->incUpdate(callback);
}
for (size_t i = 0; i < inputLayers_.size(); ++i) {
if (weights_[i]->getWGrad() || this->needGradient_) {
int M = M_[i];
int N = N_[i];
int K = K_[i];
REGISTER_TIMER_INFO("BwdDeConv3D", getName().c_str());
Matrix::resizeOrCreate(colBuf_, K * groups_[i], N, false, useGpu_);
const MatrixPtr &inMat = getInputValue(i);
for (int n = 0; n < batchSize; ++n) {
colBuf_->vol2Col(
getOutputGrad()->getData() + n * getOutputGrad()->getStride(),
numFilters_,
outputD_[i],
outputH_[i],
outputW_[i],
filterSizeZ_[i],
filterSizeY_[i],
filterSize_[i],
strideZ_[i],
strideY_[i],
stride_[i],
paddingZ_[i],
paddingY_[i],
padding_[i]);
if (weights_[i]->getWGrad()) {
real *inData = inMat->getData() + n * inMat->getStride();
for (int g = 0; g < groups_[i]; ++g) {
MatrixPtr colBufDataSub = colBuf_->subMatrix(g * K, K);
MatrixPtr wGradMatSub =
weights_[i]->getWGrad()->subMatrix(g * K, K);
MatrixPtr inMatSub = Matrix::create(inData, M, N, false, useGpu_);
wGradMatSub->mul(
*colBufDataSub, *(inMatSub->getTranspose()), 1.0, 1.0);
inData += M * N;
}
}
if (getInputGrad(i)) {
real *preGrad =
getInputGrad(i)->getData() + n * getInputGrad(i)->getStride();
for (int g = 0; g < groups_[i]; ++g) {
MatrixPtr w = weights_[i]->getW()->subMatrix(g * K, K);
MatrixPtr outGradMat = colBuf_->subMatrix(g * K, K);
MatrixPtr inGradMatSub =
Matrix::create(preGrad, M, N, false, useGpu_);
inGradMatSub->mul(*(w->getTranspose()), *outGradMat, 1.0, 1.0);
preGrad += M * N;
}
}
}
REGISTER_TIMER_INFO("WeightUpdate", getName().c_str());
weights_[i]->getParameterPtr()->incUpdate(callback);
}
}
}
void DeConv3DLayer::bpropWeights(int i) {}
void DeConv3DLayer::bpropData(int i) {}
void DeConv3DLayer::bpropBiases() {
MatrixPtr biases = Matrix::create(biases_->getWGrad()->getData(),
1,
biases_->getWGrad()->getElementCnt(),
false,
useGpu_);
const MatrixPtr &outGradMat = getOutputGrad();
if (this->sharedBiases_) {
biases->collectSharedBias(*outGradMat, 1.0f);
} else {
biases->collectBias(*outGradMat, 1.0f);
}
}
void DeConv3DLayer::addBias() {
MatrixPtr outMat = getOutputValue();
MatrixPtr bias = Matrix::create(biases_->getW()->getData(),
1,
biases_->getW()->getElementCnt(),
false,
useGpu_);
if (this->sharedBiases_) {
outMat->addSharedBias(*(bias), 1.0f);
} else {
outMat->addBias(*(bias), 1.0f);
}
}
} // namespace paddle
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "ConvBaseLayer.h"
#include "paddle/math/MathUtils.h"
#include "paddle/math/Matrix.h"
namespace paddle {
/**
* @brief A subclass of deconvolution3D layer.
* This layer expands input and use matrix multiplication to
* calculate deconvolution3D operation.
*/
class DeConv3DLayer : public ConvBaseLayer {
public:
explicit DeConv3DLayer(const LayerConfig& config) : ConvBaseLayer(config) {}
~DeConv3DLayer() {}
bool init(const LayerMap& layerMap, const ParameterMap& parameterMap);
void forward(PassType passType);
void addBias();
void backward(const UpdateCallback& callback);
void bpropBiases();
void bpropData(int i);
void bpropWeights(int i);
size_t getSize();
protected:
// Figure out the dimensions for individual gemms.
IntV M_; /// numFilters_ / filter_group_;
IntV N_; /// channels_ * filterSizeZ_ * filterSize_ * filterSizeY_
IntV K_; /// outputD_ * outputH_ * outputW_
IntV NOut_;
MatrixPtr colBuf_;
};
} // namespace paddle
...@@ -22,12 +22,31 @@ bool ExpandConvBaseLayer::init(const LayerMap &layerMap, ...@@ -22,12 +22,31 @@ bool ExpandConvBaseLayer::init(const LayerMap &layerMap,
/* Initialize the basic convolutional parent class */ /* Initialize the basic convolutional parent class */
ConvBaseLayer::init(layerMap, parameterMap); ConvBaseLayer::init(layerMap, parameterMap);
int index = 0;
for (auto &inputConfig : config_.inputs()) { for (auto &inputConfig : config_.inputs()) {
const ConvConfig &conf = inputConfig.conv_conf(); const ConvConfig &conf = inputConfig.conv_conf();
/* Consistent caffe mode for multiple input */ /* Consistent caffe mode for multiple input */
caffeMode_ = conf.caffe_mode(); caffeMode_ = conf.caffe_mode();
}
// create a new weight
size_t height, width;
height = filterPixels_[index] * filterChannels_[index];
width = (!isDeconv_) ? numFilters_ : channels_[index];
CHECK_EQ(parameters_[index]->getSize(), width * height);
Weight *w = new Weight(height, width, parameters_[index]);
weights_.emplace_back(w);
index++;
}
if (biasParameter_.get()) {
if (sharedBiases_) {
CHECK_EQ((size_t)numFilters_, biasParameter_->getSize());
biases_ =
std::unique_ptr<Weight>(new Weight(numFilters_, 1, biasParameter_));
} else {
biases_ =
std::unique_ptr<Weight>(new Weight(getSize(), 1, biasParameter_));
}
}
getOutputSize(); getOutputSize();
return true; return true;
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "Pool3DLayer.h"
#include "PoolProjectionLayer.h"
#include "paddle/utils/Logging.h"
namespace paddle {
REGISTER_LAYER(pool3d, Pool3DLayer);
bool Pool3DLayer::init(const LayerMap& layerMap,
const ParameterMap& parameterMap) {
Layer::init(layerMap, parameterMap);
/* the size of inputs for pool-layer is 1 */
CHECK_EQ(config_.inputs_size(), 1);
const PoolConfig& conf = config_.inputs(0).pool_conf();
poolType_ = conf.pool_type();
channels_ = conf.channels();
sizeX_ = conf.size_x();
sizeY_ = conf.size_y();
sizeZ_ = conf.size_z();
strideW_ = conf.stride();
strideH_ = conf.stride_y();
strideD_ = conf.stride_z();
imgSizeW_ = conf.img_size();
imgSizeH_ = conf.img_size_y();
imgSizeD_ = conf.img_size_z();
paddingW_ = conf.padding();
paddingH_ = conf.padding_y();
paddingD_ = conf.padding_z();
outputW_ = conf.output_x();
outputH_ = conf.output_y();
outputD_ = conf.output_z();
return true;
}
size_t Pool3DLayer::getSize() {
CHECK_EQ(inputLayers_.size(), 1UL);
size_t layerSize = 0;
outputD_ = outputSize(imgSizeD_, sizeZ_, paddingD_, strideD_, false);
outputH_ = outputSize(imgSizeH_, sizeY_, paddingH_, strideH_, false);
outputW_ = outputSize(imgSizeW_, sizeX_, paddingW_, strideW_, false);
layerSize = outputD_ * outputH_ * outputW_ * channels_;
getOutput().setFrameHeight(outputH_);
getOutput().setFrameWidth(outputW_);
getOutput().setFrameDepth(outputD_);
return layerSize;
}
void Pool3DLayer::forward(PassType passType) {
Layer::forward(passType);
const MatrixPtr& inMat = inputLayers_[0]->getOutputValue();
size_t batchSize = inMat->getHeight();
size_t outWidth = getSize();
resetOutput(batchSize, outWidth);
Matrix::resizeOrCreate(maxPoolIdx_, batchSize, outWidth, false, useGpu_);
const MatrixPtr outMat = getOutputValue();
if (poolType_ == "avg") {
outMat->avgPool3DForward(*inMat,
channels_,
imgSizeD_,
imgSizeH_,
imgSizeW_,
outputD_,
outputH_,
outputW_,
sizeZ_,
sizeY_,
sizeX_,
strideD_,
strideH_,
strideW_,
paddingD_,
paddingH_,
paddingW_);
} else if (poolType_ == "max") {
outMat->maxPool3DForward(*inMat,
*maxPoolIdx_,
channels_,
imgSizeD_,
imgSizeH_,
imgSizeW_,
outputD_,
outputH_,
outputW_,
sizeZ_,
sizeY_,
sizeX_,
strideD_,
strideH_,
strideW_,
paddingD_,
paddingH_,
paddingW_);
} else {
LOG(FATAL) << "Unknown pool type: " << poolType_;
}
forwardActivation();
}
void Pool3DLayer::backward(const UpdateCallback& callback) {
backwardActivation();
(void)callback;
if (NULL == getInputGrad(0)) return;
MatrixPtr inMat = inputLayers_[0]->getOutputValue();
MatrixPtr inGradMat = inputLayers_[0]->getOutputGrad();
MatrixPtr outMat = getOutputValue();
MatrixPtr outGradMat = getOutputGrad();
if (poolType_ == "avg") {
inGradMat->avgPool3DBackward(*outGradMat,
imgSizeD_,
imgSizeH_,
imgSizeW_,
outputD_,
outputH_,
outputW_,
sizeZ_,
sizeY_,
sizeZ_,
strideD_,
strideH_,
strideW_,
paddingD_,
paddingH_,
paddingW_,
1.0,
1.0);
} else if (poolType_ == "max") {
inGradMat->maxPool3DBackward(*outGradMat,
*maxPoolIdx_,
imgSizeD_,
imgSizeH_,
imgSizeW_,
outputD_,
outputH_,
outputW_,
sizeZ_,
sizeY_,
sizeZ_,
strideD_,
strideH_,
strideW_,
paddingD_,
paddingH_,
paddingW_,
1.0,
1.0);
} else {
LOG(FATAL) << "Unknown pool type: " << poolType_;
}
}
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "Layer.h"
#include "paddle/math/MathUtils.h"
#include "paddle/math/Matrix.h"
namespace paddle {
/**
* @brief Basic parent layer of pooling
* Pools the input within regions
*/
class Pool3DLayer : public Layer {
public:
explicit Pool3DLayer(const LayerConfig& config) : Layer(config) {}
~Pool3DLayer() {}
bool init(const LayerMap& layerMap,
const ParameterMap& parameterMap) override;
void forward(PassType passType) override;
void backward(const UpdateCallback& callback) override;
size_t getSize();
protected:
int channels_;
int sizeX_, sizeY_, sizeZ_;
int strideW_, strideH_, strideD_;
int paddingW_, paddingH_, paddingD_;
int imgSizeW_, imgSizeH_, imgSizeD_;
int outputW_, outputH_, outputD_;
std::string poolType_;
MatrixPtr maxPoolIdx_;
};
} // namespace paddle
...@@ -48,7 +48,16 @@ public: ...@@ -48,7 +48,16 @@ public:
<< inputLayers_.size() << ") at " << getName(); << inputLayers_.size() << ") at " << getName();
} }
s << format.substr(pos); s << format.substr(pos);
LOG(INFO) << s.str();
const std::string delimiter("\n");
std::string content = s.str();
std::string::size_type foundPos = 0;
std::string::size_type prevPos = 0;
while ((foundPos = content.find(delimiter, prevPos)) != std::string::npos) {
LOG(INFO) << content.substr(prevPos, foundPos - prevPos);
prevPos = foundPos + delimiter.size();
}
LOG(INFO) << content.substr(prevPos);
} }
void backward(const UpdateCallback& callback) override {} void backward(const UpdateCallback& callback) override {}
......
...@@ -34,6 +34,13 @@ add_unittest_without_exec(test_CRFLayerGrad ...@@ -34,6 +34,13 @@ add_unittest_without_exec(test_CRFLayerGrad
add_test(NAME test_CRFLayerGrad add_test(NAME test_CRFLayerGrad
COMMAND test_CRFLayerGrad) COMMAND test_CRFLayerGrad)
################ test_CrossEntropyOverBeam ####################
add_unittest_without_exec(test_CrossEntropyOverBeam
test_CrossEntropyOverBeamGrad.cpp
LayerGradUtil.cpp)
add_test(NAME test_CrossEntropyOverBeam
COMMAND test_CrossEntropyOverBeam)
################ test_SeqSliceLayerGrad #################### ################ test_SeqSliceLayerGrad ####################
add_unittest_without_exec(test_SeqSliceLayerGrad add_unittest_without_exec(test_SeqSliceLayerGrad
test_SeqSliceLayerGrad.cpp test_SeqSliceLayerGrad.cpp
......
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <random>
#include <sstream>
#include <gtest/gtest.h>
#include "ModelConfig.pb.h"
#include "paddle/gserver/layers/DataLayer.h"
#include "paddle/trainer/Trainer.h"
#include "LayerGradUtil.h"
#include "paddle/testing/TestUtil.h"
using namespace paddle; // NOLINT
DECLARE_int32(gpu_id);
DECLARE_bool(thread_local_rand_use_global_seed);
const size_t MAX_SEQ_NUM = 23;
const size_t MAX_SEQ_LEN = 50;
const size_t MAX_BEAM_SIZE = 27;
const size_t SEED = (size_t)(time(NULL));
struct SingleBeamExpansion {
vector<int> seqStartPos;
vector<int> subSeqStartPos;
vector<real> candidateScores;
// TODO(caoying): store this into Argument.ids
vector<real> selectedIndices;
vector<int> groundTruth;
vector<size_t> inBeam;
vector<int> rowIdxInBeam;
vector<int> colIdxInBeam;
void resetGroundTruth(size_t n) {
groundTruth.clear();
groundTruth.resize(n, -1);
inBeam.clear();
inBeam.resize(n, 0);
rowIdxInBeam.clear();
rowIdxInBeam.resize(n, -1);
colIdxInBeam.clear();
colIdxInBeam.resize(n, -1);
}
};
inline float randFloat() {
return static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
}
void genRand(real* numbers, size_t n) {
default_random_engine generator;
uniform_real_distribution<real> distribution(0.0, 1.0);
for (size_t i = 0; i < n; ++i) numbers[i] = distribution(generator);
}
vector<real> randSampling(real range, int n) {
CHECK_GE(range, n);
vector<real> num(range);
iota(begin(num), end(num), 0.);
if (range == n) return num;
random_shuffle(begin(num), end(num));
num.resize(n);
sort(begin(num), end(num));
return num;
}
void genCandidateScores(bool hasSubseq,
size_t beamSize,
SingleBeamExpansion& prevBeam,
SingleBeamExpansion& curBeam) {
vector<int>& seqStartPos = curBeam.seqStartPos;
seqStartPos.resize(1, 0);
vector<int>& subSeqStartPos = curBeam.subSeqStartPos;
subSeqStartPos.resize(1, 0);
srand(SEED);
if (prevBeam.selectedIndices.size()) {
if (prevBeam.subSeqStartPos.size() > 1) {
int seqIdx = 1;
// samples in previous beam are nested sequences.
for (size_t i = 1; i < prevBeam.subSeqStartPos.size(); ++i) {
for (size_t j = 0; j < beamSize; ++j) {
if (prevBeam.selectedIndices[(i - 1) * beamSize + j] == -1.) break;
subSeqStartPos.push_back(1 + (rand() % MAX_SEQ_LEN) +
subSeqStartPos.back());
}
if (prevBeam.seqStartPos[seqIdx] == prevBeam.subSeqStartPos[i]) {
seqStartPos.push_back(subSeqStartPos.back());
seqIdx++;
}
}
} else {
for (size_t i = 0; i <= prevBeam.selectedIndices.size(); ++i) {
if (i && i % beamSize == 0) {
seqStartPos.push_back(subSeqStartPos.back());
if (i == prevBeam.selectedIndices.size()) break;
}
if (prevBeam.selectedIndices[i] == -1.) continue;
subSeqStartPos.push_back(subSeqStartPos.back() +
(1 + (rand() % MAX_SEQ_LEN)));
}
}
} else {
// the first beam expansion
int seqNum = 1 + (rand() % MAX_SEQ_NUM);
for (int i = 0; i < seqNum; ++i) {
if (hasSubseq) {
for (size_t j = 0; j < 1 + (rand() % MAX_SEQ_NUM); ++j)
subSeqStartPos.push_back(subSeqStartPos.back() +
(1 + (rand() % MAX_SEQ_LEN)));
seqStartPos.push_back(subSeqStartPos.back());
} else {
seqStartPos.push_back(seqStartPos.back() +
(1 + (rand() % MAX_SEQ_LEN)));
}
}
}
size_t totalSeqNum = hasSubseq ? subSeqStartPos.back() : seqStartPos.back();
curBeam.candidateScores.resize(totalSeqNum, 0.);
genRand(curBeam.candidateScores.data(), totalSeqNum);
}
void genSelectedIndices(size_t beamSize,
vector<int>& seqStartPos,
vector<real>& selectedIndices) {
size_t selectedIdsCount = beamSize * (seqStartPos.size() - 1);
selectedIndices.resize(selectedIdsCount, -1.);
for (size_t i = 0; i < seqStartPos.size() - 1; ++i) {
int seqLen = seqStartPos[i + 1] - seqStartPos[i];
int n = min(seqLen, static_cast<int>(beamSize));
vector<real> ids = randSampling(seqLen, n);
memcpy(selectedIndices.data() + i * beamSize,
ids.data(),
sizeof(real) * ids.size());
}
}
void genGroundTruth(vector<SingleBeamExpansion>& beamExpansions,
size_t beamSize) {
SingleBeamExpansion& beam = beamExpansions[1];
size_t seqNum = beam.seqStartPos.size() - 1;
for (size_t i = 2; i < beamExpansions.size(); ++i)
CHECK_EQ(seqNum, beamExpansions[i].seqStartPos.size() - 1);
srand(SEED);
// initialize the first beam.
beam.resetGroundTruth(seqNum);
for (size_t i = 0; i < seqNum; ++i) {
if (randFloat() > 0.5) {
/*
* force the randomly generated label falls in the beam by chance 0.5.
* otherwise, when sequence length is relatively long and beam size is
* relatively small, the gold sequences falls off the beam at in the
* first search.
*/
real* begPos = beam.selectedIndices.data() + i * beamSize;
beam.colIdxInBeam[i] =
rand() % count_if(begPos, begPos + beamSize, [](const real& val) {
return val != -1.;
});
beam.groundTruth[i] =
beam.selectedIndices[i * beamSize + beam.colIdxInBeam[i]];
beam.inBeam[i] = 1;
} else {
int label = rand() % (beam.seqStartPos[i + 1] - beam.seqStartPos[i]);
beam.groundTruth[i] = label;
real* begPos = beam.selectedIndices.data() + i * beamSize;
real* endPos = begPos + beamSize;
real* lblPos = find(begPos, endPos, real(label));
if (lblPos != endPos) {
beam.inBeam[i] = 1;
beam.colIdxInBeam[i] = lblPos - begPos;
}
}
beam.rowIdxInBeam[i] = i;
}
// iterate over each beam expansions
for (size_t i = 2; i < beamExpansions.size(); ++i) {
SingleBeamExpansion& curBeam = beamExpansions[i];
SingleBeamExpansion& prevBeam = beamExpansions[i - 1];
curBeam.resetGroundTruth(seqNum);
// iterate over each sequence
for (size_t j = 0; j < seqNum; ++j) {
if (!prevBeam.inBeam[j]) continue;
// gold sequence falls in the beam in previous search.
real* begPos = prevBeam.selectedIndices.data();
int offset =
prevBeam.rowIdxInBeam[j] * beamSize + prevBeam.colIdxInBeam[j];
curBeam.rowIdxInBeam[j] = count_if(
begPos, begPos + offset, [](const real& val) { return val != -1.; });
if (randFloat() > 0.5) {
// force the randomly generated label falls in the beam by chance 0.5.
real* start =
curBeam.selectedIndices.data() + curBeam.rowIdxInBeam[j] * beamSize;
int n = rand() % count_if(start, start + beamSize, [](const real& val) {
return val != -1.;
});
curBeam.colIdxInBeam[j] = n;
curBeam.groundTruth[j] = *(start + n);
curBeam.inBeam[j] = 1;
} else {
CHECK_LE(curBeam.rowIdxInBeam[j] + 1,
curBeam.subSeqStartPos.size() - 1);
int start = curBeam.subSeqStartPos[curBeam.rowIdxInBeam[j]];
int end = curBeam.subSeqStartPos[curBeam.rowIdxInBeam[j] + 1];
CHECK_GT(size_t(end), size_t(start));
int label = rand() % (end - start);
curBeam.groundTruth[j] = label;
real* findBeg =
curBeam.selectedIndices.data() + curBeam.rowIdxInBeam[j] * beamSize;
real* lblPos =
find(findBeg, findBeg + beamSize, static_cast<real>(label));
if (lblPos != (findBeg + beamSize)) {
curBeam.inBeam[j] = 1;
curBeam.colIdxInBeam[j] = lblPos - findBeg;
}
}
}
}
}
void genOneBeam(size_t beamSize,
bool hasSubseq,
SingleBeamExpansion& prevBeam,
SingleBeamExpansion& curBeam) {
genCandidateScores(hasSubseq, beamSize, prevBeam, curBeam);
genSelectedIndices(beamSize,
hasSubseq ? curBeam.subSeqStartPos : curBeam.seqStartPos,
curBeam.selectedIndices);
}
void genRandomBeamExpansion(size_t expansionCount,
size_t beamSize,
vector<SingleBeamExpansion>& beamExpansions) {
beamExpansions.clear();
beamExpansions.resize(expansionCount + 1);
// beamExpansions[0] is reserved.
for (size_t i = 1; i <= expansionCount; ++i)
genOneBeam(beamSize, bool(i - 1), beamExpansions[i - 1], beamExpansions[i]);
genGroundTruth(beamExpansions, beamSize);
}
void testCrossEntropyOverBeam(bool useGpu,
size_t beamSize,
vector<SingleBeamExpansion>& beams) {
TestConfig config;
config.layerConfig.set_type("cross_entropy_over_beam");
size_t seqNum = 0;
for (size_t i = 1; i < beams.size(); ++i) {
const SingleBeamExpansion& beam = beams[i];
// create scores for all the candidates
MatrixPtr candidateScorePtr =
Matrix::create(beam.candidateScores.size(), 1, false, false);
candidateScorePtr->copyFrom(beam.candidateScores.data(),
beam.candidateScores.size());
ostringstream paramName;
paramName << "candidate_scores_" << i;
if (beam.subSeqStartPos.size() > 1) {
seqNum = beam.subSeqStartPos.size() - 1;
config.inputDefs.push_back({INPUT_SELF_DEFINE_DATA,
paramName.str(),
candidateScorePtr,
beam.seqStartPos,
beam.subSeqStartPos});
} else {
seqNum = beam.seqStartPos.size() - 1;
config.inputDefs.push_back({INPUT_SELF_DEFINE_DATA,
paramName.str(),
candidateScorePtr,
beam.seqStartPos});
}
config.layerConfig.add_inputs();
// create indices for the selected candidates
MatrixPtr selectedCandidates =
Matrix::create(seqNum, beamSize, false, false);
selectedCandidates->copyFrom(beam.selectedIndices.data(),
beam.selectedIndices.size());
paramName.clear();
paramName << "selected_candidates_" << i;
config.inputDefs.push_back(
{INPUT_SELF_DEFINE_DATA, paramName.str(), selectedCandidates});
config.layerConfig.add_inputs();
// create the ground truth
paramName.clear();
paramName << "label_" << i;
config.inputDefs.push_back(
{INPUT_SELF_DEFINE_DATA, paramName.str(), beam.groundTruth});
config.layerConfig.add_inputs();
}
testLayerGrad(
config, "cross_entropy_over_beam", seqNum, false, useGpu, false);
}
TEST(Layer, CrossEntropyOverBeam) {
LOG(INFO) << "SEED = " << SEED;
const size_t beamSize = 1 + rand() % MAX_BEAM_SIZE;
LOG(INFO) << "beamSize = " << beamSize;
// TODO(caoying): test with random beam expansions.
const size_t expansionCount = 3;
vector<SingleBeamExpansion> beams;
genRandomBeamExpansion(expansionCount, beamSize, beams);
for (bool useGpu : {false, true})
testCrossEntropyOverBeam(useGpu, beamSize, beams);
}
int main(int argc, char** argv) {
initMain(argc, argv);
hl_start();
hl_init(FLAGS_gpu_id);
FLAGS_thread_local_rand_use_global_seed = true;
srand(SEED);
testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
...@@ -1246,6 +1246,75 @@ TEST(Layer, PoolLayer) { ...@@ -1246,6 +1246,75 @@ TEST(Layer, PoolLayer) {
#endif #endif
} }
void setPool3DConfig(TestConfig* config,
PoolConfig* pool,
const string& poolType) {
// filter size
const int NUM_FILTERS = 16;
const int FILTER_SIZE = 3;
const int FILTER_SIZE_Y = 3;
const int FILTER_SIZE_Z = 3;
const int CHANNELS = 16;
(*config).biasSize = 0;
(*config).layerConfig.set_type("pool3d");
(*config).layerConfig.set_num_filters(NUM_FILTERS);
int kw = FILTER_SIZE, kh = FILTER_SIZE_Y, kd = FILTER_SIZE_Z;
int pw = 0, ph = 0, pd = 0;
int sw = 2, sh = 2, sd = 2;
pool->set_pool_type(poolType);
pool->set_pool_type("avg");
pool->set_channels(CHANNELS);
pool->set_size_x(kw);
pool->set_size_y(kh);
pool->set_size_z(kd);
pool->set_padding(0);
pool->set_padding_y(0);
pool->set_padding_z(0);
pool->set_stride(sw);
pool->set_stride_y(sh);
pool->set_stride_z(sd);
pool->set_start(0);
int ow = outputSize(pool->img_size(), kw, pw, sw, /* caffeMode */ false);
int oh = outputSize(pool->img_size_y(), kh, ph, sh, /* caffeMode */ false);
int od = outputSize(pool->img_size_z(), kd, pd, sd, /* caffeMode */ false);
pool->set_output_x(ow);
pool->set_output_y(oh);
pool->set_output_z(od);
}
void testPool3DLayer(const string& poolType, bool trans, bool useGpu) {
TestConfig config;
config.inputDefs.push_back({INPUT_DATA, "layer_0", 11664, 0});
LayerInputConfig* input = config.layerConfig.add_inputs();
PoolConfig* pool = input->mutable_pool_conf();
const int IMAGE_SIZE = 9;
const int IMAGE_SIZE_Y = 9;
const int IMAGE_SIZE_Z = 9;
pool->set_img_size(IMAGE_SIZE);
pool->set_img_size_y(IMAGE_SIZE_Y);
pool->set_img_size_z(IMAGE_SIZE_Z);
setPool3DConfig(&config, pool, poolType);
config.layerConfig.set_size(pool->output_x() * pool->output_y() *
pool->channels());
testLayerGrad(config, "pool3d", 100, trans, useGpu);
}
TEST(Layer, Pool3DLayer) {
testPool3DLayer("avg", /* trans= */ false, /* useGpu= */ false);
testPool3DLayer("max", /* trans= */ false, /* useGpu= */ false);
#ifndef PADDLE_ONLY_CPU
testPool3DLayer("avg", /* trans= */ false, /* useGpu= */ true);
testPool3DLayer("max", /* trans= */ false, /* useGpu= */ true);
#endif
}
void testSppLayer(const string& poolType, void testSppLayer(const string& poolType,
const int pyramidHeight, const int pyramidHeight,
bool trans, bool trans,
...@@ -2047,6 +2116,159 @@ TEST(Layer, RowL2NormLayer) { ...@@ -2047,6 +2116,159 @@ TEST(Layer, RowL2NormLayer) {
} }
} }
void test3DConvLayer(const string& type, bool trans, bool useGpu) {
// filter size
const int NUM_FILTERS = 6;
// const int CHANNELS = 3;
const int FILTER_SIZE = 3;
const int FILTER_SIZE_Y = 3;
const int FILTER_SIZE_Z = 3;
// input image
const int CHANNELS = 3;
const int IMAGE_SIZE = 9;
const int IMAGE_SIZE_Y = 9;
const int IMAGE_SIZE_Z = 9;
TestConfig config;
config.biasSize = NUM_FILTERS;
config.layerConfig.set_type(type);
config.layerConfig.set_num_filters(NUM_FILTERS);
config.layerConfig.set_partial_sum(1);
config.layerConfig.set_shared_biases(true);
// Setting up conv3D-trans layer
LayerInputConfig* input = config.layerConfig.add_inputs();
ConvConfig* conv = input->mutable_conv_conf();
conv->set_channels(CHANNELS);
conv->set_filter_size(FILTER_SIZE);
conv->set_filter_size_y(FILTER_SIZE_Y);
conv->set_filter_size_z(FILTER_SIZE_Z);
conv->set_padding(0);
conv->set_padding_y(0);
conv->set_padding_z(0);
conv->set_stride(2);
conv->set_stride_y(2);
conv->set_stride_z(2);
conv->set_img_size(IMAGE_SIZE);
conv->set_img_size_y(IMAGE_SIZE_Y);
conv->set_img_size_z(IMAGE_SIZE_Z);
conv->set_output_x(outputSize(conv->img_size(),
conv->filter_size(),
conv->padding(),
conv->stride(),
/* caffeMode */ true));
conv->set_output_y(outputSize(conv->img_size_y(),
conv->filter_size_y(),
conv->padding_y(),
conv->stride_y(),
/* caffeMode */ true));
conv->set_output_z(outputSize(conv->img_size_z(),
conv->filter_size_z(),
conv->padding_z(),
conv->stride_z(),
/* caffeMode */ true));
config.layerConfig.set_size(conv->output_x() * conv->output_y() *
conv->output_z() * NUM_FILTERS);
conv->set_groups(1);
conv->set_filter_channels(conv->channels() / conv->groups());
config.inputDefs.push_back(
{INPUT_DATA,
"layer_0",
CHANNELS * IMAGE_SIZE * IMAGE_SIZE_Y * IMAGE_SIZE_Z,
conv->filter_channels() * FILTER_SIZE * FILTER_SIZE_Y * FILTER_SIZE_Z *
NUM_FILTERS});
testLayerGrad(config, "conv3D", 10, trans, useGpu);
// Use small batch_size and useWeight=true to test biasGrad
testLayerGrad(config, "conv3D", 2, trans, useGpu, true, 0.02);
}
TEST(Layer, test3DConvLayer) {
test3DConvLayer("conv3d", /* trans= */ false, /* useGpu= */ false);
#ifndef PADDLE_ONLY_CPU
test3DConvLayer("conv3d", /* trans= */ false, /* useGpu= */ true);
#endif
}
void test3DDeConvLayer(const string& type, bool trans, bool useGpu) {
// filter size
const int NUM_FILTERS = 6;
// const int CHANNELS = 3;
const int FILTER_SIZE = 3;
const int FILTER_SIZE_Y = 3;
const int FILTER_SIZE_Z = 3;
// input image
const int CHANNELS = 3;
const int IMAGE_SIZE = 4;
const int IMAGE_SIZE_Y = 6;
const int IMAGE_SIZE_Z = 6;
// Setting up conv-trans layer
TestConfig config;
config.biasSize = NUM_FILTERS;
config.layerConfig.set_type("deconv3d");
config.layerConfig.set_num_filters(NUM_FILTERS);
config.layerConfig.set_partial_sum(1);
config.layerConfig.set_shared_biases(true);
LayerInputConfig* input = config.layerConfig.add_inputs();
ConvConfig* conv = input->mutable_conv_conf();
conv->set_channels(CHANNELS);
conv->set_filter_size(FILTER_SIZE);
conv->set_filter_size_y(FILTER_SIZE_Y);
conv->set_filter_size_z(FILTER_SIZE_Z);
conv->set_padding(0);
conv->set_padding_y(0);
conv->set_padding_z(0);
conv->set_stride(2);
conv->set_stride_y(2);
conv->set_stride_z(2);
conv->set_img_size(IMAGE_SIZE);
conv->set_img_size_y(IMAGE_SIZE_Y);
conv->set_img_size_z(IMAGE_SIZE_Z);
conv->set_output_x(imageSize(conv->img_size(),
conv->filter_size(),
conv->padding(),
conv->stride(),
true));
conv->set_output_y(imageSize(conv->img_size_y(),
conv->filter_size_y(),
conv->padding_y(),
conv->stride_y(),
true));
conv->set_output_z(imageSize(conv->img_size_z(),
conv->filter_size_z(),
conv->padding_z(),
conv->stride_z(),
true));
config.layerConfig.set_size(conv->output_x() * conv->output_y() *
conv->output_z() * NUM_FILTERS);
conv->set_groups(1);
conv->set_filter_channels(conv->channels() / conv->groups());
config.inputDefs.push_back(
{INPUT_DATA,
"layer_0",
CHANNELS * IMAGE_SIZE * IMAGE_SIZE_Y * IMAGE_SIZE_Z,
conv->filter_channels() * FILTER_SIZE * FILTER_SIZE_Y * FILTER_SIZE_Z *
NUM_FILTERS});
testLayerGrad(config, "deconv3D", 10, trans, useGpu);
// Use small batch_size and useWeight=true to test biasGrad
testLayerGrad(config, "deconv3D", 2, trans, useGpu, true, 0.02);
}
TEST(Layer, test3DDeConvLayer) {
test3DDeConvLayer("deconv3d", /* trans= */ false, /* useGpu= */ false);
#ifndef PADDLE_ONLY_CPU
test3DDeConvLayer("deconv3d", /* trans= */ false, /* useGpu= */ true);
#endif
}
TEST(Layer, ScaleShiftLayer) { TEST(Layer, ScaleShiftLayer) {
const size_t batchSize = 16; const size_t batchSize = 16;
const size_t size = 32; const size_t size = 32;
......
...@@ -1190,6 +1190,221 @@ void GpuMatrix::avgPoolBackward(Matrix& outGrad, ...@@ -1190,6 +1190,221 @@ void GpuMatrix::avgPoolBackward(Matrix& outGrad,
outGrad.getStride()); outGrad.getStride());
} }
void GpuMatrix::maxPool3DForward(Matrix& inputMat,
Matrix& maxPoolIdx,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW) {
CHECK(inputMat.useGpu_) << "Matrix type are not correct";
real* inputData = inputMat.getData();
real* maxPoolIdxData = maxPoolIdx.getData();
size_t num = inputMat.getHeight();
size_t width = imgSizeW;
size_t height = imgSizeH;
size_t depth = imgSizeD;
CHECK(depth * height * width * channels == inputMat.getWidth());
CHECK(height_ == inputMat.getHeight());
CHECK(width_ == outputD * outputH * outputW * channels);
hl_maxpool3D_forward(num,
inputData,
channels,
depth,
height,
width,
outputD,
outputH,
outputW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
getData(),
maxPoolIdxData,
getStride());
}
void GpuMatrix::maxPool3DBackward(Matrix& outGrad,
Matrix& maxPoolIdx,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput) {
CHECK(outGrad.useGpu_ && maxPoolIdx.useGpu_) << "Matrix type are not equal";
real* outDiff = outGrad.getData();
real* maxPoolIdxData = maxPoolIdx.getData();
size_t frameNum = getHeight();
size_t channels = outGrad.getWidth() / outputD / outputH / outputW;
size_t width = imgSizeW;
size_t height = imgSizeH;
size_t depth = imgSizeD;
CHECK(depth * height * width * channels == getWidth());
CHECK(width_ == depth * width * height * channels);
CHECK(outGrad.getHeight() == maxPoolIdx.getHeight() &&
outGrad.getWidth() == maxPoolIdx.getWidth());
hl_maxpool3D_backward(frameNum,
outDiff,
channels,
depth,
height,
width,
outputD,
outputH,
outputW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
scaleTargets,
scaleOutput,
getData(),
maxPoolIdxData,
outGrad.getStride());
}
void GpuMatrix::avgPool3DForward(Matrix& inputMat,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW) {
CHECK(inputMat.useGpu_) << "Matrix type are not equal";
real* inputData = inputMat.getData();
size_t frameNum = inputMat.getHeight();
size_t height = imgSizeH;
size_t width = imgSizeW;
size_t depth = imgSizeD;
CHECK(depth * height * width * channels == inputMat.getWidth());
CHECK(height_ == inputMat.getHeight());
CHECK(width_ == outputD * outputH * outputW * channels);
hl_avgpool3D_forward(frameNum,
inputData,
channels,
depth,
height,
width,
outputD,
outputH,
outputW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
getData(),
getStride());
}
void GpuMatrix::avgPool3DBackward(Matrix& outGrad,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput) {
CHECK(outGrad.useGpu_) << "Matrix type are not equal";
real* outDiff = outGrad.getData();
size_t frameNum = outGrad.getHeight();
size_t channels = outGrad.getWidth() / outputD / outputH / outputW;
size_t height = imgSizeH;
size_t width = imgSizeW;
size_t depth = imgSizeD;
CHECK(depth * height * width * channels == width_);
CHECK(height_ == outGrad.getHeight());
CHECK(outGrad.getWidth() == outputD * outputH * outputW * channels);
hl_avgpool3D_backward(frameNum,
outDiff,
channels,
depth,
height,
width,
outputD,
outputH,
outputW,
sizeZ,
sizeY,
sizeX,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
scaleTargets,
scaleOutput,
getData(),
outGrad.getStride());
}
void GpuMatrix::maxSequenceForward(Matrix& input, void GpuMatrix::maxSequenceForward(Matrix& input,
const IVector& sequence, const IVector& sequence,
IVector& index) { IVector& index) {
...@@ -1389,6 +1604,72 @@ void GpuMatrix::multiBinaryLabelCrossEntropyBp(Matrix& output, Matrix& label) { ...@@ -1389,6 +1604,72 @@ void GpuMatrix::multiBinaryLabelCrossEntropyBp(Matrix& output, Matrix& label) {
output_d, grad_d, mat_d, height_, width_); output_d, grad_d, mat_d, height_, width_);
} }
void GpuMatrix::vol2Col(real* dataSrc,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW) {
hl_matrix_vol2Col(dataSrc,
channels,
depth,
height,
width,
filterD,
filterH,
filterW,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
getData());
}
void GpuMatrix::col2Vol(real* dataDst,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real alpha,
real beta) {
hl_matrix_col2Vol(dataDst,
channels,
depth,
height,
width,
filterD,
filterH,
filterW,
strideD,
strideH,
strideW,
paddingD,
paddingH,
paddingW,
getData(),
alpha,
beta);
}
/** /**
* CpuMatrix * CpuMatrix
*/ */
...@@ -1930,6 +2211,276 @@ void CpuMatrix::avgPoolBackward(Matrix& input, ...@@ -1930,6 +2211,276 @@ void CpuMatrix::avgPoolBackward(Matrix& input,
} }
} }
void CpuMatrix::maxPool3DForward(Matrix& inputMat,
Matrix& maxPoolIdx,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW) {
real* inputData = inputMat.getData();
real* outData = getData();
real* maxPoolIdxData = maxPoolIdx.getData();
size_t num = inputMat.getHeight();
size_t inWidth = imgSizeW;
size_t inHeight = imgSizeH;
size_t inDepth = imgSizeD;
CHECK(inHeight * inWidth * inDepth == inputMat.getWidth() / channels);
CHECK_EQ(num, this->getHeight());
CHECK_EQ(channels * outputH * outputW * outputD, this->getWidth());
size_t outStride = getStride();
/* initialize the data_ */
for (size_t i = 0; i < height_; i++) {
for (size_t j = 0; j < width_; j++) {
outData[(i)*outStride + j] = -(real)FLT_MAX;
maxPoolIdxData[(i)*outStride + j] = -1;
}
}
/* pool max one by one */
for (size_t n = 0; n < num; ++n) { // frame by frame
if (!isContiguous()) {
outData = getData() + n * outStride;
maxPoolIdxData = maxPoolIdx.getData() + n * outStride;
}
for (size_t c = 0; c < channels; ++c) { // channel by channel
for (size_t pd = 0; pd < outputD; ++pd) {
for (size_t ph = 0; ph < outputH; ++ph) {
for (size_t pw = 0; pw < outputW; ++pw) {
int dstart = pd * strideD - paddingD;
int hstart = ph * strideH - paddingH;
int wstart = pw * strideW - paddingW;
int dend = std::min(dstart + sizeZ, inDepth);
int hend = std::min(hstart + sizeY, inHeight);
int wend = std::min(wstart + sizeX, inWidth);
dstart = std::max(dstart, 0);
hstart = std::max(hstart, 0);
wstart = std::max(wstart, 0);
int maxIdx = -1;
real maxOutData = outData[(pd * outputH + ph) * outputW + pw];
for (int d = dstart; d < dend; ++d) {
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
if (maxOutData <
inputData[(d * inHeight + h) * inWidth + w]) {
maxOutData = inputData[(d * inHeight + h) * inWidth + w];
maxIdx = (d * inHeight + h) * inWidth + w;
}
}
}
}
outData[(pd * outputH + ph) * outputW + pw] = maxOutData;
maxPoolIdxData[(pd * outputH + ph) * outputW + pw] = maxIdx;
}
}
}
// compute offset
inputData += inDepth * inHeight * inWidth;
outData += outputD * outputH * outputW;
maxPoolIdxData += outputD * outputH * outputW;
}
}
}
void CpuMatrix::maxPool3DBackward(Matrix& outGrad,
Matrix& maxPoolIdx,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput) {
size_t num = getHeight();
size_t channels = size_t(width_ / imgSizeD / imgSizeH / imgSizeW);
CHECK(maxPoolIdx.getHeight() == outGrad.getHeight() &&
maxPoolIdx.getWidth() == outGrad.getWidth());
real* tgtGrad = getData();
real* otGrad = outGrad.getData();
real* maxPoolIdxData = maxPoolIdx.getData();
size_t outStride = outGrad.getStride();
for (size_t n = 0; n < num; ++n) {
if (!outGrad.isContiguous()) {
otGrad = outGrad.getData() + n * outStride;
maxPoolIdxData = maxPoolIdx.getData() + n * outStride;
}
for (size_t c = 0; c < channels; ++c) {
for (size_t pd = 0; pd < outputD; ++pd) {
for (size_t ph = 0; ph < outputH; ++ph) {
for (size_t pw = 0; pw < outputW; ++pw) {
const size_t index = (pd * outputH + ph) * outputW + pw;
const size_t tgtIdx = static_cast<size_t>(maxPoolIdxData[index]);
tgtGrad[tgtIdx] =
scaleTargets * tgtGrad[tgtIdx] + scaleOutput * otGrad[index];
}
}
}
// offset
tgtGrad += imgSizeD * imgSizeH * imgSizeW;
otGrad += outputD * outputH * outputW;
maxPoolIdxData += outputD * outputH * outputW;
}
}
}
void CpuMatrix::avgPool3DForward(Matrix& input,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW) {
// The main loop
size_t num = input.getHeight();
size_t inDepth = imgSizeD;
size_t inHeight = imgSizeH;
size_t inWidth = imgSizeW;
CHECK(inDepth * inHeight * inWidth * channels == input.getWidth());
CHECK(outputD * outputH * outputW * channels * num == height_ * width_);
real* tgtData = getData();
real* inData = input.getData();
for (size_t n = 0; n < num; ++n) {
if (!isContiguous()) {
tgtData = data_ + n * getStride();
}
for (size_t c = 0; c < channels; ++c) {
for (size_t pd = 0; pd < outputD; ++pd) {
for (size_t ph = 0; ph < outputH; ++ph) {
for (size_t pw = 0; pw < outputW; ++pw) {
int dstart = pd * strideD - paddingD;
int hstart = ph * strideH - paddingH;
int wstart = pw * strideW - paddingW;
int dend = std::min(dstart + sizeZ, inDepth + paddingD);
int hend = std::min(hstart + sizeY, inHeight + paddingH);
int wend = std::min(wstart + sizeX, inWidth + paddingW);
int poolSize = (dend - dstart) * (hend - hstart) * (wend - wstart);
dstart = std::max(dstart, 0);
hstart = std::max(hstart, 0);
wstart = std::max(wstart, 0);
dend = std::min(dend, static_cast<int>(inDepth));
hend = std::min(hend, static_cast<int>(inHeight));
wend = std::min(wend, static_cast<int>(inWidth));
CHECK(poolSize);
tgtData[(pd * outputH + ph) * outputW + pw] = 0; // clear
for (int d = dstart; d < dend; ++d) {
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
tgtData[(pd * outputH + ph) * outputW + pw] +=
inData[(d * inHeight + h) * inWidth + w];
}
}
}
tgtData[(pd * outputH + ph) * outputW + pw] /= poolSize;
}
}
}
// compute offset
inData += inDepth * inHeight * inWidth;
tgtData += outputD * outputH * outputW;
}
}
}
void CpuMatrix::avgPool3DBackward(Matrix& input,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput) {
size_t num = input.getHeight();
size_t channels = input.getWidth() / outputD / outputH / outputW;
CHECK(imgSizeD * imgSizeH * imgSizeW * channels == getWidth());
real* inData = input.getData();
real* outData = getData();
for (size_t n = 0; n < num; ++n) {
if (!input.isContiguous()) {
inData = input.getData() + n * input.getStride();
}
for (size_t c = 0; c < channels; ++c) {
for (size_t pd = 0; pd < outputD; ++pd) {
for (size_t ph = 0; ph < outputH; ++ph) {
for (size_t pw = 0; pw < outputW; ++pw) {
int dstart = pd * strideD - paddingD;
int hstart = ph * strideH - paddingH;
int wstart = pw * strideW - paddingW;
int dend = std::min(dstart + sizeZ, imgSizeD + paddingD);
int hend = std::min(hstart + sizeY, imgSizeH + paddingH);
int wend = std::min(wstart + sizeX, imgSizeW + paddingW);
int poolSize = (dend - dstart) * (hend - hstart) * (wend - wstart);
dstart = std::max(dstart, 0);
hstart = std::max(hstart, 0);
wstart = std::max(wstart, 0);
dend = std::min(dend, static_cast<int>(imgSizeD));
hend = std::min(hend, static_cast<int>(imgSizeH));
wend = std::min(wend, static_cast<int>(imgSizeW));
CHECK(poolSize);
for (int d = dstart; d < dend; ++d) {
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
outData[(d * imgSizeH + h) * imgSizeW + w] +=
inData[(pd * outputH + ph) * outputW + pw] / poolSize;
}
}
}
}
}
}
// offset
outData += imgSizeD * imgSizeH * imgSizeW;
inData += outputD * outputH * outputW;
}
}
}
/** /**
* Input: one or more sequences. Each sequence contains some instances. * Input: one or more sequences. Each sequence contains some instances.
* Output: output size is the number of input sequences (NOT input instances). * Output: output size is the number of input sequences (NOT input instances).
...@@ -3975,6 +4526,95 @@ void CpuMatrix::bilinearBackward(const Matrix& out, ...@@ -3975,6 +4526,95 @@ void CpuMatrix::bilinearBackward(const Matrix& out,
} }
} }
void CpuMatrix::vol2Col(real* data,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW) {
real* outData = getData();
int outHeight = (height + 2 * paddingH - filterH) / strideH + 1;
int outWidth = (width + 2 * paddingW - filterW) / strideW + 1;
int outDepth = (depth + 2 * paddingD - filterD) / strideD + 1;
int channelsCol = channels * filterD * filterH * filterW;
for (int c = 0; c < channelsCol; ++c) {
int wOffset = c % filterW;
int hOffset = (c / filterW) % filterH;
int dOffset = (c / filterW / filterH) % filterD;
int cIn = c / filterW / filterH / filterD;
for (int d = 0; d < outDepth; ++d) {
for (int h = 0; h < outHeight; ++h) {
for (int w = 0; w < outWidth; ++w) {
int dPad = d * strideD - paddingD + dOffset;
int hPad = h * strideH - paddingH + hOffset;
int wPad = w * strideW - paddingW + wOffset;
if (hPad >= 0 && hPad < height && wPad >= 0 && wPad < width &&
dPad >= 0 && dPad < depth)
outData[((c * outDepth + d) * outHeight + h) * outWidth + w] =
data[((cIn * depth + dPad) * height + hPad) * width + wPad];
else
outData[((c * outDepth + d) * outHeight + h) * outWidth + w] = 0;
}
}
}
}
}
void CpuMatrix::col2Vol(real* trg,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real alpha,
real beta) {
real* src = getData();
int outDepth = (depth + 2 * paddingD - filterD) / strideD + 1;
int outHeight = (height + 2 * paddingH - filterH) / strideH + 1;
int outWidth = (width + 2 * paddingW - filterW) / strideW + 1;
int channelsCol = channels * filterD * filterH * filterW;
for (int c = 0; c < channelsCol; ++c) {
int wOffset = c % filterW;
int hOffset = (c / filterW) % filterH;
int dOffset = (c / filterW / filterH) % filterD;
int cIm = c / filterW / filterH / filterD;
for (int d = 0; d < outDepth; ++d) {
for (int h = 0; h < outHeight; ++h) {
for (int w = 0; w < outWidth; ++w) {
int dPad = d * strideD - paddingD + dOffset;
int hPad = h * strideH - paddingH + hOffset;
int wPad = w * strideW - paddingW + wOffset;
if (hPad >= 0 && hPad < height && wPad >= 0 && wPad < width &&
dPad >= 0 && dPad < depth)
trg[((cIm * depth + dPad) * height + hPad) * width + wPad] =
alpha *
src[((c * outDepth + d) * outHeight + h) * outWidth + w] +
beta *
trg[((cIm * depth + dPad) * height + hPad) * width + wPad];
}
}
}
}
}
//////////////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////////
// functions executed via cpu // // functions executed via cpu //
//////////////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////////
......
...@@ -928,6 +928,93 @@ public: ...@@ -928,6 +928,93 @@ public:
size_t paddingW) { size_t paddingW) {
LOG(FATAL) << "Not implemeted"; LOG(FATAL) << "Not implemeted";
} }
/**
* Pooling 3D forward operation, pick out the largest element
* in the sizeX of value
*/
virtual void maxPool3DForward(Matrix& inputMat,
Matrix& maxPoolIdx,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW) {
LOG(FATAL) << "Not implemeted";
}
virtual void maxPool3DBackward(Matrix& outGrad,
Matrix& maxPoolIdx,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput) {
LOG(FATAL) << "Not implemeted";
}
virtual void avgPool3DForward(Matrix& input,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW) {
LOG(FATAL) << "Not implemeted";
}
virtual void avgPool3DBackward(Matrix& input,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput) {
LOG(FATAL) << "Not implemeted";
}
/** /**
* Input: one or more sequences. Each sequence contains some instances. * Input: one or more sequences. Each sequence contains some instances.
...@@ -1039,6 +1126,42 @@ public: ...@@ -1039,6 +1126,42 @@ public:
LOG(FATAL) << "Not implemented"; LOG(FATAL) << "Not implemented";
} }
virtual void vol2Col(real* data,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW) {
LOG(FATAL) << "Not implemeted";
}
virtual void col2Vol(real* trg,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real alpha,
real beta) {
LOG(FATAL) << "Not implemeted";
}
virtual void bilinearForward(const Matrix& in, virtual void bilinearForward(const Matrix& in,
const size_t inImgH, const size_t inImgH,
const size_t inImgW, const size_t inImgW,
...@@ -1348,6 +1471,82 @@ public: ...@@ -1348,6 +1471,82 @@ public:
size_t paddingH, size_t paddingH,
size_t paddingW); size_t paddingW);
void maxPool3DForward(Matrix& inputMat,
Matrix& maxPoolIdx,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW);
void maxPool3DBackward(Matrix& outGrad,
Matrix& maxPoolIdx,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput);
void avgPool3DForward(Matrix& input,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW);
void avgPool3DBackward(Matrix& input,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput);
void maxSequenceForward(Matrix& input, void maxSequenceForward(Matrix& input,
const IVector& sequence, const IVector& sequence,
IVector& index); IVector& index);
...@@ -1374,6 +1573,38 @@ public: ...@@ -1374,6 +1573,38 @@ public:
const real ratioH, const real ratioH,
const real ratioW); const real ratioW);
void vol2Col(real* data,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW);
void col2Vol(real* trg,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real alpha,
real beta);
void multiBinaryLabelCrossEntropy(Matrix& output, Matrix& label); void multiBinaryLabelCrossEntropy(Matrix& output, Matrix& label);
void multiBinaryLabelCrossEntropyBp(Matrix& output, Matrix& label); void multiBinaryLabelCrossEntropyBp(Matrix& output, Matrix& label);
...@@ -1507,6 +1738,82 @@ public: ...@@ -1507,6 +1738,82 @@ public:
size_t paddingH, size_t paddingH,
size_t paddingW); size_t paddingW);
void maxPool3DForward(Matrix& inputMat,
Matrix& maxPoolIdx,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW);
void maxPool3DBackward(Matrix& outGrad,
Matrix& maxPoolIdx,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput);
void avgPool3DForward(Matrix& input,
size_t channels,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW);
void avgPool3DBackward(Matrix& input,
size_t imgSizeD,
size_t imgSizeH,
size_t imgSizeW,
size_t outputD,
size_t outputH,
size_t outputW,
size_t sizeZ,
size_t sizeY,
size_t sizeX,
size_t strideD,
size_t strideH,
size_t strideW,
size_t paddingD,
size_t paddingH,
size_t paddingW,
real scaleTargets,
real scaleOutput);
void maxSequenceForward(Matrix& input, void maxSequenceForward(Matrix& input,
const IVector& sequence, const IVector& sequence,
IVector& index); IVector& index);
...@@ -1715,6 +2022,38 @@ public: ...@@ -1715,6 +2022,38 @@ public:
const real ratioH, const real ratioH,
const real ratioW); const real ratioW);
void vol2Col(real* data,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW);
void col2Vol(real* trg,
int channels,
int depth,
int height,
int width,
int filterD,
int filterH,
int filterW,
int strideD,
int strideH,
int strideW,
int paddingD,
int paddingH,
int paddingW,
real alpha,
real beta);
template <typename ExpressionType> template <typename ExpressionType>
void operator=(const ExpressionType& expr) { void operator=(const ExpressionType& expr) {
TensorCpuApply<real>(*this, expr); TensorCpuApply<real>(*this, expr);
......
...@@ -18,6 +18,7 @@ limitations under the License. */ ...@@ -18,6 +18,7 @@ limitations under the License. */
#include <gtest/gtest.h> #include <gtest/gtest.h>
#include "TensorCheck.h" #include "TensorCheck.h"
#include "paddle/math/MathUtils.h"
#include "paddle/math/Matrix.h" #include "paddle/math/Matrix.h"
#include "paddle/math/SparseMatrix.h" #include "paddle/math/SparseMatrix.h"
#include "paddle/testing/TestUtil.h" #include "paddle/testing/TestUtil.h"
...@@ -1203,4 +1204,497 @@ TEST(Matrix, warpCTC) { ...@@ -1203,4 +1204,497 @@ TEST(Matrix, warpCTC) {
} }
} }
void testMaxPool3DFwdBwd(int numSamples,
int channels,
int imgSizeD,
int imgSizeH,
int imgSizeW,
int ksizeD,
int ksizeH,
int ksizeW,
int strideD,
int strideH,
int strideW,
int padD,
int padH,
int padW) {
int outD = outputSize(imgSizeD, ksizeD, padD, strideD, true);
int outH = outputSize(imgSizeH, ksizeH, padH, strideH, true);
int outW = outputSize(imgSizeW, ksizeW, padW, strideW, true);
int inWidth = channels * imgSizeD * imgSizeH * imgSizeW;
MatrixPtr input = CpuMatrix::create(numSamples, inWidth, false, false);
MatrixPtr inputGpu = GpuMatrix::create(numSamples, inWidth, false, true);
int outWidth = channels * outD * outH * outW;
MatrixPtr target = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpu = GpuMatrix::create(numSamples, outWidth, false, true);
MatrixPtr maxIdx = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr maxIdxGpu = GpuMatrix::create(numSamples, outWidth, false, true);
input->randomizeUniform();
target->randomizeUniform();
inputGpu->copyFrom(*input);
targetGpu->copyFrom(*target);
target->maxPool3DForward(*input,
*maxIdx,
channels,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW);
targetGpu->maxPool3DForward(*inputGpu,
*maxIdxGpu,
channels,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW);
MatrixPtr targetCheck = CpuMatrix::create(numSamples, outWidth, false, false);
targetCheck->copyFrom(*targetGpu);
checkMatrixEqual(target, targetCheck);
MatrixPtr inputGrad = CpuMatrix::create(numSamples, inWidth, false, false);
MatrixPtr inputGpuGrad = GpuMatrix::create(numSamples, inWidth, false, true);
MatrixPtr targetGrad = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpuGrad =
GpuMatrix::create(numSamples, outWidth, false, true);
inputGrad->randomizeUniform();
targetGrad->randomizeUniform();
inputGpuGrad->copyFrom(*inputGrad);
targetGpuGrad->copyFrom(*targetGrad);
inputGrad->maxPool3DBackward(*targetGrad,
*maxIdx,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW,
1.0,
1.0);
inputGpuGrad->maxPool3DBackward(*targetGpuGrad,
*maxIdxGpu,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW,
1.0,
1.0);
MatrixPtr targetBwdCheck =
CpuMatrix::create(numSamples, inWidth, false, false);
targetBwdCheck->copyFrom(*inputGpuGrad);
checkMatrixEqual(inputGrad, targetBwdCheck);
}
void testAvgPool3DFwdBwd(int numSamples,
int channels,
int imgSizeD,
int imgSizeH,
int imgSizeW,
int ksizeD,
int ksizeH,
int ksizeW,
int strideD,
int strideH,
int strideW,
int padD,
int padH,
int padW) {
int outD = outputSize(imgSizeD, ksizeD, padD, strideD, true);
int outH = outputSize(imgSizeH, ksizeH, padH, strideH, true);
int outW = outputSize(imgSizeW, ksizeW, padW, strideW, true);
int inWidth = imgSizeD * imgSizeH * imgSizeW * channels;
MatrixPtr input = CpuMatrix::create(numSamples, inWidth, false, false);
MatrixPtr inputGpu = GpuMatrix::create(numSamples, inWidth, false, true);
int outWidth = channels * outD * outH * outW;
MatrixPtr target = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpu = GpuMatrix::create(numSamples, outWidth, false, true);
input->randomizeUniform();
target->randomizeUniform();
inputGpu->copyFrom(*input);
targetGpu->copyFrom(*target);
target->avgPool3DForward(*input,
channels,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW);
targetGpu->avgPool3DForward(*inputGpu,
channels,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW);
TensorCheckErr(*target, *targetGpu);
MatrixPtr inputGrad = CpuMatrix::create(numSamples, inWidth, false, false);
MatrixPtr inputGpuGrad = GpuMatrix::create(numSamples, inWidth, false, true);
MatrixPtr targetGrad = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpuGrad =
GpuMatrix::create(numSamples, outWidth, false, true);
inputGrad->randomizeUniform();
targetGrad->randomizeUniform();
inputGpuGrad->copyFrom(*inputGrad);
targetGpuGrad->copyFrom(*targetGrad);
inputGrad->avgPool3DBackward(*targetGrad,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW,
1.0,
1.0);
inputGpuGrad->avgPool3DBackward(*targetGpuGrad,
imgSizeD,
imgSizeH,
imgSizeW,
outD,
outH,
outW,
ksizeD,
ksizeH,
ksizeW,
strideD,
strideH,
strideW,
padD,
padH,
padW,
1.0,
1.0);
TensorCheckErr(*inputGrad, *inputGpuGrad);
}
// TODO(yi): I noticed many such blindly combinatorial tests in this
// file. They are no help to locate defects at all.
TEST(Matrix, Pool3DFwdBwd) {
for (auto numSamples : {1, 3}) {
for (auto channels : {3}) {
for (auto imgSizeD : {9, 16}) {
for (auto imgSizeH : {9, 32}) {
for (auto imgSizeW : {9, 32}) {
for (auto sizeX : {3}) {
for (auto sizeY : {3}) {
for (auto sizeZ : {3}) {
for (auto sD : {2}) {
for (auto sH : {2}) {
for (auto sW : {2}) {
for (auto pD : {0, (sizeZ - 1) / 2}) {
for (auto pH : {0, (sizeY - 1) / 2}) {
for (auto pW : {0, (sizeX - 1) / 2}) {
VLOG(3) << " numSamples=" << numSamples
<< " channels=" << channels
<< " imgSizeD=" << imgSizeD
<< " imgSizeH=" << imgSizeH
<< " imgSizeW=" << imgSizeW
<< " sizeX=" << sizeX
<< " sizeY=" << sizeY
<< " sizeZ=" << sizeZ << " strideD=" << sD
<< " strideH=" << sH << " strideW=" << sW
<< " padingD=" << pD << " padingH=" << pH
<< " padingW=" << pW;
testMaxPool3DFwdBwd(numSamples,
channels,
imgSizeD,
imgSizeH,
imgSizeW,
sizeX,
sizeY,
sizeZ,
sD,
sH,
sW,
pD,
pH,
pW);
testAvgPool3DFwdBwd(numSamples,
channels,
imgSizeD,
imgSizeH,
imgSizeW,
sizeX,
sizeY,
sizeZ,
sD,
sH,
sW,
pD,
pH,
pW);
}
}
}
}
}
}
}
}
}
}
}
}
}
}
// for (auto numSamples : {1, 3}) {
// for (auto channels : {1, 3}) {
// for (auto imgSizeD : {9,16}) {
// for (auto imgSizeH : {9, 32}) {
// for (auto imgSizeW : {9, 32}) {
// for (auto sizeX : {2, 3}) {
// for (auto sizeY : {2, 3}) {
// for (auto sizeZ : {2,3}){
// for (auto sD : {1, 2}) {
// for (auto sH : {1, 2}) {
// for (auto sW : {1, 2}) {
// for (auto pD : {0, (sizeZ - 1) / 2}){
// for (auto pH : {0, (sizeY - 1) / 2}) {
// for (auto pW : {0, (sizeX - 1) / 2}) {
// VLOG(3) << " numSamples=" << numSamples
// << " channels=" << channels
// << " imgSizeD=" << imgSizeD
// << " imgSizeH=" << imgSizeH
// << " imgSizeW=" << imgSizeW
// << " sizeX=" << sizeX
// << " sizeY=" << sizeY
// << " sizeZ=" << sizeZ
// << " strideD=" << sD
// << " strideH=" << sH
// << " strideW=" << sW
// << " padingD=" << pD
// << " padingH=" << pH
// << " padingW=" << pW;
//
// testMaxPool3DFwdBwd(numSamples,
// channels,
// imgSizeD,
// imgSizeH,
// imgSizeW,
// sizeX,
// sizeY,
// sizeZ,
// sD,
// sH,
// sW,
// pD,
// pH,
// pW);
// testAvgPool3DFwdBwd(numSamples,
// channels,
// imgSizeD,
// imgSizeH,
// imgSizeW,
// sizeX,
// sizeY,
// sizeZ,
// sD,
// sH,
// sW,
// pD,
// pH,
// pW);
// }
// }
// }
// }
// }
// }
// }
// }
// }
// }
// }
// }
// }
// }
}
void testMatrixCol2Vol(int depth, int height, int width) {
int channel = 3;
int filterX = 3, filterY = 4, filterZ = 5;
int strideX = 2, strideY = 2, strideZ = 2;
int padX = 1, padY = 1, padZ = 1;
MatrixPtr cpuImage =
std::make_shared<CpuMatrix>(channel, depth * height * width);
MatrixPtr gpuImage =
std::make_shared<GpuMatrix>(channel, depth * height * width);
cpuImage->randomizeUniform();
gpuImage->copyFrom(*cpuImage);
int outD = outputSize(depth, filterZ, padZ, strideZ, true);
int outH = outputSize(height, filterY, padY, strideY, true);
int outW = outputSize(width, filterX, padX, strideX, true);
int colBufHeight = channel * filterZ * filterY * filterX;
int colBufWidth = outD * outH * outW;
MatrixPtr cpuColBuf = std::make_shared<CpuMatrix>(colBufHeight, colBufWidth);
MatrixPtr gpuColBuf = std::make_shared<GpuMatrix>(colBufHeight, colBufWidth);
cpuColBuf->vol2Col(cpuImage->getData(),
channel,
depth,
height,
width,
filterZ,
filterY,
filterX,
strideZ,
strideY,
strideX,
padZ,
padY,
padX);
gpuColBuf->vol2Col(gpuImage->getData(),
channel,
depth,
height,
width,
filterZ,
filterY,
filterX,
strideZ,
strideY,
strideX,
padZ,
padY,
padX);
TensorCheckEqual(*cpuColBuf, *gpuColBuf);
cpuColBuf->randomizeUniform();
gpuColBuf->copyFrom(*cpuColBuf);
cpuColBuf->col2Vol(cpuImage->getData(),
channel,
depth,
height,
width,
filterZ,
filterY,
filterX,
strideZ,
strideY,
strideX,
padZ,
padY,
padX,
1.0,
1.0);
gpuColBuf->col2Vol(gpuImage->getData(),
channel,
depth,
height,
width,
filterZ,
filterY,
filterX,
strideZ,
strideY,
strideX,
padZ,
padY,
padX,
1.0,
1.0);
TensorCheckErr(*cpuImage, *gpuImage);
}
TEST(Matrix, col2Vol) {
for (auto depth : {9, 16, 64}) {
for (auto height : {9, 11, 128}) {
for (auto width : {9, 32, 128}) {
VLOG(3) << "depth=" << depth << " height=" << height
<< " width=" << width;
testMatrixCol2Vol(depth, height, width);
}
}
}
}
#endif #endif
file(GLOB GENERAL_OPS RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}" "*_op.cc")
string(REPLACE ".cc" "" GENERAL_OPS "${GENERAL_OPS}")
function(op_library TARGET) function(op_library TARGET)
# op_library is a function to create op library. The interface is same as # op_library is a function to create op library. The interface is same as
# cc_library. But it handle split GPU/CPU code and link some common library # cc_library. But it handle split GPU/CPU code and link some common library
# for ops. # for ops.
set(OP_LIBRARY ${TARGET} ${OP_LIBRARY} PARENT_SCOPE)
set(cc_srcs) set(cc_srcs)
set(cu_srcs) set(cu_srcs)
set(op_common_deps operator op_registry) set(op_common_deps operator op_registry)
...@@ -43,34 +46,26 @@ endfunction() ...@@ -43,34 +46,26 @@ endfunction()
add_subdirectory(math) add_subdirectory(math)
cc_test(gather_test SRCS gather_test.cc DEPS tensor) list(REMOVE_ITEM GENERAL_OPS
op_library(gather_op SRCS gather_op.cc gather_op.cu) net_op
minus_op
cc_test(scatter_test SRCS scatter_test.cc DEPS tensor) mul_op
op_library(scatter_op SRCS scatter_op.cc scatter_op.cu) recurrent_op
scale_op)
cc_library(net_op SRCS net_op.cc DEPS op_registry)
cc_test(net_op_test SRCS net_op_test.cc DEPS net_op)
op_library(add_op SRCS add_op.cc add_op.cu)
op_library(mean_op SRCS mean_op.cc mean_op.cu)
op_library(net_op SRCS net_op.cc)
op_library(minus_op SRCS minus_op.cc minus_op.cu DEPS scale_op)
op_library(mul_op SRCS mul_op.cc mul_op.cu DEPS math_function) op_library(mul_op SRCS mul_op.cc mul_op.cu DEPS math_function)
op_library(rowwise_add_op SRCS rowwise_add_op.cu rowwise_add_op.cc) op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc
DEPS framework_proto tensor operator net_op)
op_library(scale_op SRCS scale_op.cc scale_op.cu DEPS net_op)
op_library(sigmoid_op SRCS sigmoid_op.cc sigmoid_op.cu) foreach(src ${GENERAL_OPS})
op_library(softmax_op SRCS softmax_op.cc softmax_op.cu) op_library(${src} SRCS ${src}.cc ${src}.cu)
op_library(gaussian_random_op SRCS gaussian_random_op.cc gaussian_random_op.cu) endforeach()
op_library(cross_entropy_op SRCS cross_entropy_op.cc cross_entropy_op.cu)
op_library(fill_zeros_like_op SRCS fill_zeros_like_op.cc fill_zeros_like_op.cu)
op_library(sgd_op SRCS sgd_op.cc sgd_op.cu) set(GLOB_OP_LIB ${OP_LIBRARY} CACHE INTERNAL "Global OP library")
op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc cc_test(gather_test SRCS gather_test.cc DEPS tensor)
DEPS framework_proto tensor op_registry operator net_op) cc_test(net_op_test SRCS net_op_test.cc DEPS net_op)
op_library(uniform_random_op SRCS uniform_random_op.cc uniform_random_op.cu) cc_test(scatter_test SRCS scatter_test.cc DEPS tensor)
op_library(lookup_table_op SRCS lookup_table_op.cc lookup_table_op.cu)
op_library(scale_op SRCS scale_op.cc scale_op.cu DEPS net_op)
op_library(minus_op SRCS minus_op.cc minus_op.cu DEPS scale_op)
op_library(pad_op SRCS pad_op.cc pad_op.cu)
...@@ -19,13 +19,12 @@ template <typename T> ...@@ -19,13 +19,12 @@ template <typename T>
class CPUGaussianRandomKernel : public framework::OpKernel { class CPUGaussianRandomKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
float mean = context.op_.GetAttr<float>("mean"); float mean = context.GetAttr<float>("mean");
float std = context.op_.GetAttr<float>("std"); float std = context.GetAttr<float>("std");
auto* tensor = context.Output<framework::Tensor>("Out"); auto* tensor = context.Output<framework::Tensor>("Out");
T* data = tensor->mutable_data<T>(context.GetPlace()); T* data = tensor->mutable_data<T>(context.GetPlace());
unsigned int seed = unsigned int seed = static_cast<unsigned int>(context.GetAttr<int>("seed"));
static_cast<unsigned int>(context.op_.GetAttr<int>("seed"));
std::minstd_rand engine; std::minstd_rand engine;
if (seed == 0) { if (seed == 0) {
seed = std::random_device()(); seed = std::random_device()();
......
...@@ -42,14 +42,13 @@ class GPUGaussianRandomKernel : public framework::OpKernel { ...@@ -42,14 +42,13 @@ class GPUGaussianRandomKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* tensor = context.Output<framework::Tensor>("Out"); auto* tensor = context.Output<framework::Tensor>("Out");
T* data = tensor->mutable_data<T>(context.GetPlace()); T* data = tensor->mutable_data<T>(context.GetPlace());
unsigned int seed = unsigned int seed = static_cast<unsigned int>(context.GetAttr<int>("seed"));
static_cast<unsigned int>(context.op_.GetAttr<int>("seed"));
if (seed == 0) { if (seed == 0) {
std::random_device rd; std::random_device rd;
seed = rd(); seed = rd();
} }
T mean = static_cast<T>(context.op_.GetAttr<float>("mean")); T mean = static_cast<T>(context.GetAttr<float>("mean"));
T std = static_cast<T>(context.op_.GetAttr<float>("std")); T std = static_cast<T>(context.GetAttr<float>("std"));
thrust::counting_iterator<unsigned int> index_sequence_begin(0); thrust::counting_iterator<unsigned int> index_sequence_begin(0);
ssize_t N = framework::product(tensor->dims()); ssize_t N = framework::product(tensor->dims());
thrust::transform(index_sequence_begin, index_sequence_begin + N, thrust::transform(index_sequence_begin, index_sequence_begin + N,
......
...@@ -30,12 +30,12 @@ class LookupTableKernel : public framework::OpKernel { ...@@ -30,12 +30,12 @@ class LookupTableKernel : public framework::OpKernel {
auto ids_t = context.Input<Tensor>("Ids"); // int tensor auto ids_t = context.Input<Tensor>("Ids"); // int tensor
auto output_t = context.Output<Tensor>("Out"); // float tensor auto output_t = context.Output<Tensor>("Out"); // float tensor
size_t N = table_t->dims()[0]; int N = table_t->dims()[0];
size_t D = table_t->dims()[1]; int D = table_t->dims()[1];
auto ids = ids_t->data<int32_t>(); auto ids = ids_t->data<int32_t>();
auto table = table_t->data<T>(); auto table = table_t->data<T>();
auto output = output_t->mutable_data<T>(context.GetPlace()); auto output = output_t->mutable_data<T>(context.GetPlace());
for (size_t i = 0; i < product(ids_t->dims()); ++i) { for (ssize_t i = 0; i < product(ids_t->dims()); ++i) {
PADDLE_ENFORCE_LT(ids[i], N); PADDLE_ENFORCE_LT(ids[i], N);
PADDLE_ENFORCE_GE(ids[i], 0); PADDLE_ENFORCE_GE(ids[i], 0);
memcpy(output + i * D, table + ids[i] * D, D * sizeof(T)); memcpy(output + i * D, table + ids[i] * D, D * sizeof(T));
...@@ -51,8 +51,8 @@ class LookupTableGradKernel : public framework::OpKernel { ...@@ -51,8 +51,8 @@ class LookupTableGradKernel : public framework::OpKernel {
auto d_output_t = context.Input<Tensor>(framework::GradVarName("Out")); auto d_output_t = context.Input<Tensor>(framework::GradVarName("Out"));
auto d_table_t = context.Output<Tensor>(framework::GradVarName("W")); auto d_table_t = context.Output<Tensor>(framework::GradVarName("W"));
size_t N = d_table_t->dims()[0]; int N = d_table_t->dims()[0];
size_t D = d_table_t->dims()[1]; int D = d_table_t->dims()[1];
auto ids = ids_t->data<int32_t>(); auto ids = ids_t->data<int32_t>();
const T* d_output = d_output_t->data<T>(); const T* d_output = d_output_t->data<T>();
T* d_table = d_table_t->mutable_data<T>(context.GetPlace()); T* d_table = d_table_t->mutable_data<T>(context.GetPlace());
...@@ -61,10 +61,10 @@ class LookupTableGradKernel : public framework::OpKernel { ...@@ -61,10 +61,10 @@ class LookupTableGradKernel : public framework::OpKernel {
t.device(context.GetEigenDevice<platform::CPUPlace>()) = t.device(context.GetEigenDevice<platform::CPUPlace>()) =
t.constant(static_cast<T>(0)); t.constant(static_cast<T>(0));
for (size_t i = 0; i < product(ids_t->dims()); ++i) { for (ssize_t i = 0; i < product(ids_t->dims()); ++i) {
PADDLE_ENFORCE_LT(ids[i], N); PADDLE_ENFORCE_LT(ids[i], N);
PADDLE_ENFORCE_GE(ids[i], 0); PADDLE_ENFORCE_GE(ids[i], 0);
for (size_t j = 0; j < D; ++j) { for (int j = 0; j < D; ++j) {
d_table[ids[i] * D + j] += d_output[i * D + j]; d_table[ids[i] * D + j] += d_output[i * D + j];
} }
} }
......
...@@ -79,7 +79,7 @@ class MinusGradOp : public NetOp { ...@@ -79,7 +79,7 @@ class MinusGradOp : public NetOp {
} // namespace paddle } // namespace paddle
USE_OP(scale); USE_OP(scale);
USE_OP_ITSELF(identity); USE_NO_KERNEL_OP(identity);
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(minus, ops::MinusOp, ops::MinusOpMaker, minus_grad, REGISTER_OP(minus, ops::MinusOp, ops::MinusOpMaker, minus_grad,
ops::MinusGradOp<float>); ops::MinusGradOp<float>);
......
...@@ -29,10 +29,10 @@ class MulOp : public framework::OperatorWithKernel { ...@@ -29,10 +29,10 @@ class MulOp : public framework::OperatorWithKernel {
auto dim1 = ctx.Input<Tensor>("Y")->dims(); auto dim1 = ctx.Input<Tensor>("Y")->dims();
PADDLE_ENFORCE_EQ(dim0.size(), 2, PADDLE_ENFORCE_EQ(dim0.size(), 2,
"input X(%s) should be a tensor with 2 dims, a matrix", "input X(%s) should be a tensor with 2 dims, a matrix",
ctx.op_.Input("X")); ctx.op().Input("X"));
PADDLE_ENFORCE_EQ(dim1.size(), 2, PADDLE_ENFORCE_EQ(dim1.size(), 2,
"input Y(%s) should be a tensor with 2 dims, a matrix", "input Y(%s) should be a tensor with 2 dims, a matrix",
ctx.op_.Input("Y")); ctx.op().Input("Y"));
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
dim0[1], dim1[0], dim0[1], dim1[0],
"First matrix's width must be equal with second matrix's height."); "First matrix's width must be equal with second matrix's height.");
...@@ -75,8 +75,8 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -75,8 +75,8 @@ class MulOpGrad : public framework::OperatorWithKernel {
PADDLE_ENFORCE(y_dims[1] == out_dims[1], PADDLE_ENFORCE(y_dims[1] == out_dims[1],
"Out@GRAD M X N must equal to Y dims 1, N "); "Out@GRAD M X N must equal to Y dims 1, N ");
x_grad->Resize(x_dims); if (x_grad) x_grad->Resize(x_dims);
y_grad->Resize(y_dims); if (y_grad) y_grad->Resize(y_dims);
} }
}; };
......
...@@ -31,13 +31,13 @@ template <typename Place, typename T> ...@@ -31,13 +31,13 @@ template <typename Place, typename T>
class MulKernel : public framework::OpKernel { class MulKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* X = context.Input<Tensor>("X"); auto* x = context.Input<Tensor>("X");
auto* Y = context.Input<Tensor>("Y"); auto* y = context.Input<Tensor>("Y");
auto* Z = context.Output<Tensor>("Out"); auto* z = context.Output<Tensor>("Out");
Z->mutable_data<T>(context.GetPlace()); z->mutable_data<T>(context.GetPlace());
auto* device_context = auto* device_context =
const_cast<platform::DeviceContext*>(context.device_context_); const_cast<platform::DeviceContext*>(context.device_context_);
math::matmul<Place, T>(*X, false, *Y, false, 1, Z, 0, device_context); math::matmul<Place, T>(*x, false, *y, false, 1, z, 0, device_context);
} }
}; };
...@@ -45,20 +45,24 @@ template <typename Place, typename T> ...@@ -45,20 +45,24 @@ template <typename Place, typename T>
class MulGradKernel : public framework::OpKernel { class MulGradKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* X = ctx.Input<Tensor>("X"); auto* x = ctx.Input<Tensor>("X");
auto* Y = ctx.Input<Tensor>("Y"); auto* y = ctx.Input<Tensor>("Y");
auto* dOut = ctx.Input<Tensor>(framework::GradVarName("Out")); auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dX = ctx.Output<Tensor>(framework::GradVarName("X")); auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dY = ctx.Output<Tensor>(framework::GradVarName("Y")); auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
dX->mutable_data<T>(ctx.GetPlace());
dY->mutable_data<T>(ctx.GetPlace());
auto* device_context = auto* device_context =
const_cast<platform::DeviceContext*>(ctx.device_context_); const_cast<platform::DeviceContext*>(ctx.device_context_);
// dX = dOut * Y'. dX: M x K, dOut : M x N, Y : K x N if (dx) {
math::matmul<Place, T>(*dOut, false, *Y, true, 1, dX, 0, device_context); dx->mutable_data<T>(ctx.GetPlace());
// dY = X' * dOut. dY: K x N, dOut : M x N, X : M x K // dx = dout * y'. dx: M x K, dout : M x N, y : K x N
math::matmul<Place, T>(*X, true, *dOut, false, 1, dY, 0, device_context); math::matmul<Place, T>(*dout, false, *y, true, 1, dx, 0, device_context);
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
// dy = x' * dout. dy K x N, dout : M x N, x : M x K
math::matmul<Place, T>(*x, true, *dout, false, 1, dy, 0, device_context);
}
} }
}; };
......
...@@ -235,5 +235,5 @@ RecurrentGradientOp::RecurrentGradientOp( ...@@ -235,5 +235,5 @@ RecurrentGradientOp::RecurrentGradientOp(
} // namespace paddle } // namespace paddle
REGISTER_OP_WITHOUT_GRADIENT( REGISTER_OP_WITHOUT_GRADIENT(
recurrent_op, paddle::operators::RecurrentOp, recurrent, paddle::operators::RecurrentOp,
paddle::operators::RecurrentAlgorithmProtoAndCheckerMaker); paddle::operators::RecurrentAlgorithmProtoAndCheckerMaker);
...@@ -64,8 +64,10 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel { ...@@ -64,8 +64,10 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel {
auto dims0 = ctx.Input<Tensor>("X")->dims(); auto dims0 = ctx.Input<Tensor>("X")->dims();
auto dims1 = ctx.Input<Tensor>("b")->dims(); auto dims1 = ctx.Input<Tensor>("b")->dims();
PADDLE_ENFORCE_EQ(1, dims1.size(), "b dims should be 1") PADDLE_ENFORCE_EQ(1, dims1.size(), "b dims should be 1")
ctx.Output<Tensor>(framework::GradVarName("X"))->Resize(dims0); auto *dx = ctx.Output<Tensor>(framework::GradVarName("X"));
ctx.Output<Tensor>(framework::GradVarName("b"))->Resize(dims1); auto *db = ctx.Output<Tensor>(framework::GradVarName("b"));
if (dx) dx->Resize(dims0);
if (db) db->Resize(dims1);
} }
}; };
......
...@@ -51,20 +51,24 @@ template <typename Place, typename T> ...@@ -51,20 +51,24 @@ template <typename Place, typename T>
class RowwiseAddGradKernel : public framework::OpKernel { class RowwiseAddGradKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* dOut = context.Input<Tensor>(framework::GradVarName("Out")); auto* dout = context.Input<Tensor>(framework::GradVarName("Out"));
auto* dX = context.Output<Tensor>(framework::GradVarName("X")); auto* dx = context.Output<Tensor>(framework::GradVarName("X"));
auto* db = context.Output<Tensor>(framework::GradVarName("b")); auto* db = context.Output<Tensor>(framework::GradVarName("b"));
dX->mutable_data<T>(context.GetPlace());
db->mutable_data<T>(context.GetPlace());
auto OutGrad = EigenMatrix<T>::From(*dOut); auto out_grad = EigenMatrix<T>::From(*dout);
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
EigenMatrix<T>::From(*dX).device(place) = OutGrad; if (dx) {
dx->mutable_data<T>(context.GetPlace());
EigenMatrix<T>::From(*dx).device(place) = out_grad;
}
if (db) {
db->mutable_data<T>(context.GetPlace());
// https://eigen.tuxfamily.org/dox/unsupported/TensorBase_8h_source.html // https://eigen.tuxfamily.org/dox/unsupported/TensorBase_8h_source.html
// colwise add // colwise add
Eigen::array<int, 1> dims{{0}}; /* dimension to reduce */ Eigen::array<int, 1> dims{{0}}; /* dimension to reduce */
EigenVector<T>::Flatten(*db).device(place) = OutGrad.sum(dims); EigenVector<T>::Flatten(*db).device(place) = out_grad.sum(dims);
}
} }
}; };
} // namespace operators } // namespace operators
......
...@@ -27,7 +27,7 @@ class ScaleKernel : public framework::OpKernel { ...@@ -27,7 +27,7 @@ class ScaleKernel : public framework::OpKernel {
auto* in = context.Input<framework::Tensor>("X"); auto* in = context.Input<framework::Tensor>("X");
tensor->mutable_data<T>(in->place()); tensor->mutable_data<T>(in->place());
auto scale = static_cast<T>(context.op_.GetAttr<AttrType>("scale")); auto scale = static_cast<T>(context.GetAttr<AttrType>("scale"));
auto eigen_out = framework::EigenVector<T>::Flatten(*tensor); auto eigen_out = framework::EigenVector<T>::Flatten(*tensor);
auto eigen_in = framework::EigenVector<T>::Flatten(*in); auto eigen_in = framework::EigenVector<T>::Flatten(*in);
......
...@@ -31,7 +31,7 @@ class SGDOpKernel : public framework::OpKernel { ...@@ -31,7 +31,7 @@ class SGDOpKernel : public framework::OpKernel {
auto param = ctx.Input<Tensor>("param"); auto param = ctx.Input<Tensor>("param");
auto grad = ctx.Input<Tensor>("grad"); auto grad = ctx.Input<Tensor>("grad");
auto param_out = ctx.Output<Tensor>("param_out"); auto param_out = ctx.Output<Tensor>("param_out");
float lr = ctx.op_.GetAttr<float>("learning_rate"); float lr = ctx.GetAttr<float>("learning_rate");
param_out->mutable_data<T>(ctx.GetPlace()); param_out->mutable_data<T>(ctx.GetPlace());
......
...@@ -26,16 +26,15 @@ class CPUUniformRandomKernel : public framework::OpKernel { ...@@ -26,16 +26,15 @@ class CPUUniformRandomKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* tensor = context.Output<framework::Tensor>("Out"); auto* tensor = context.Output<framework::Tensor>("Out");
T* data = tensor->mutable_data<T>(context.GetPlace()); T* data = tensor->mutable_data<T>(context.GetPlace());
unsigned int seed = unsigned int seed = static_cast<unsigned int>(context.GetAttr<int>("seed"));
static_cast<unsigned int>(context.op_.GetAttr<int>("seed"));
std::minstd_rand engine; std::minstd_rand engine;
if (seed == 0) { if (seed == 0) {
seed = std::random_device()(); seed = std::random_device()();
} }
engine.seed(seed); engine.seed(seed);
std::uniform_real_distribution<T> dist( std::uniform_real_distribution<T> dist(
static_cast<T>(context.op_.GetAttr<float>("min")), static_cast<T>(context.GetAttr<float>("min")),
static_cast<T>(context.op_.GetAttr<float>("max"))); static_cast<T>(context.GetAttr<float>("max")));
ssize_t size = framework::product(tensor->dims()); ssize_t size = framework::product(tensor->dims());
for (ssize_t i = 0; i < size; ++i) { for (ssize_t i = 0; i < size; ++i) {
data[i] = dist(engine); data[i] = dist(engine);
......
...@@ -45,14 +45,13 @@ class GPUUniformRandomKernel : public framework::OpKernel { ...@@ -45,14 +45,13 @@ class GPUUniformRandomKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* tensor = context.Output<framework::Tensor>("Out"); auto* tensor = context.Output<framework::Tensor>("Out");
T* data = tensor->mutable_data<T>(context.GetPlace()); T* data = tensor->mutable_data<T>(context.GetPlace());
unsigned int seed = unsigned int seed = static_cast<unsigned int>(context.GetAttr<int>("seed"));
static_cast<unsigned int>(context.op_.GetAttr<int>("seed"));
if (seed == 0) { if (seed == 0) {
std::random_device rd; std::random_device rd;
seed = rd(); seed = rd();
} }
T min = static_cast<T>(context.op_.GetAttr<float>("min")); T min = static_cast<T>(context.GetAttr<float>("min"));
T max = static_cast<T>(context.op_.GetAttr<float>("max")); T max = static_cast<T>(context.GetAttr<float>("max"));
thrust::counting_iterator<unsigned int> index_sequence_begin(0); thrust::counting_iterator<unsigned int> index_sequence_begin(0);
ssize_t N = framework::product(tensor->dims()); ssize_t N = framework::product(tensor->dims());
thrust::transform(index_sequence_begin, index_sequence_begin + N, thrust::transform(index_sequence_begin, index_sequence_begin + N,
......
...@@ -186,6 +186,7 @@ void Argument::resizeAndCopyFrom(const Argument& src, ...@@ -186,6 +186,7 @@ void Argument::resizeAndCopyFrom(const Argument& src,
resizeAndCopy(strs, src.strs, useGpu, stream); resizeAndCopy(strs, src.strs, useGpu, stream);
frameWidth = src.frameWidth; frameWidth = src.frameWidth;
frameHeight = src.frameHeight; frameHeight = src.frameHeight;
frameDepth = src.frameDepth;
} }
int32_t Argument::resizeAndCopyFrom(const Argument& src, int32_t Argument::resizeAndCopyFrom(const Argument& src,
...@@ -206,6 +207,7 @@ int32_t Argument::resizeAndCopyFrom(const Argument& src, ...@@ -206,6 +207,7 @@ int32_t Argument::resizeAndCopyFrom(const Argument& src,
dataId = src.dataId; dataId = src.dataId;
frameWidth = src.frameWidth; frameWidth = src.frameWidth;
frameHeight = src.frameHeight; frameHeight = src.frameHeight;
frameDepth = src.frameDepth;
if (!src.sequenceStartPositions) { if (!src.sequenceStartPositions) {
// non-sequence input, copy samples directly // non-sequence input, copy samples directly
...@@ -677,6 +679,7 @@ void Argument::reorganizeSeqInfo( ...@@ -677,6 +679,7 @@ void Argument::reorganizeSeqInfo(
const ICpuGpuVectorPtr subSeqStartPos, const ICpuGpuVectorPtr subSeqStartPos,
std::vector<std::vector<int>>& reorganizedSeqInfo) { std::vector<std::vector<int>>& reorganizedSeqInfo) {
CHECK(seqStartPos); CHECK(seqStartPos);
reorganizedSeqInfo.clear();
int seqNum = seqStartPos->getSize() - 1; int seqNum = seqStartPos->getSize() - 1;
int* seqStarts = seqStartPos->getMutableData(false); int* seqStarts = seqStartPos->getMutableData(false);
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. you may not use this file except in compliance with the License.
You may obtain a copy of the License at You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -35,6 +32,7 @@ struct Argument { ...@@ -35,6 +32,7 @@ struct Argument {
strs(nullptr), strs(nullptr),
frameHeight(0), frameHeight(0),
frameWidth(0), frameWidth(0),
frameDepth(0),
sequenceStartPositions(nullptr), sequenceStartPositions(nullptr),
subSequenceStartPositions(nullptr), subSequenceStartPositions(nullptr),
cpuSequenceDims(nullptr), cpuSequenceDims(nullptr),
...@@ -64,6 +62,7 @@ struct Argument { ...@@ -64,6 +62,7 @@ struct Argument {
allCount = argument.allCount; allCount = argument.allCount;
frameHeight = argument.frameHeight; frameHeight = argument.frameHeight;
frameWidth = argument.frameWidth; frameWidth = argument.frameWidth;
frameDepth = argument.frameDepth;
dataId = argument.dataId; dataId = argument.dataId;
} }
...@@ -76,6 +75,7 @@ struct Argument { ...@@ -76,6 +75,7 @@ struct Argument {
// A dataBatch includes batchSize frames, one frame maybe not only vector // A dataBatch includes batchSize frames, one frame maybe not only vector
size_t frameHeight; size_t frameHeight;
size_t frameWidth; size_t frameWidth;
size_t frameDepth;
// If NULL, each position is treated independently. // If NULL, each position is treated independently.
// Otherwise, its size should be #NumberOfSequences + 1. // Otherwise, its size should be #NumberOfSequences + 1.
...@@ -136,8 +136,10 @@ struct Argument { ...@@ -136,8 +136,10 @@ struct Argument {
} }
size_t getFrameHeight() const { return frameHeight; } size_t getFrameHeight() const { return frameHeight; }
size_t getFrameWidth() const { return frameWidth; } size_t getFrameWidth() const { return frameWidth; }
size_t getFrameDepth() const { return frameDepth; }
void setFrameHeight(size_t h) { frameHeight = h; } void setFrameHeight(size_t h) { frameHeight = h; }
void setFrameWidth(size_t w) { frameWidth = w; } void setFrameWidth(size_t w) { frameWidth = w; }
void setFrameDepth(size_t d) { frameDepth = d; }
int64_t getNumSequences() const { int64_t getNumSequences() const {
return sequenceStartPositions ? sequenceStartPositions->getSize() - 1 return sequenceStartPositions ? sequenceStartPositions->getSize() - 1
......
...@@ -22,3 +22,5 @@ ENDIF() ...@@ -22,3 +22,5 @@ ENDIF()
cc_library(device_context SRCS device_context.cc DEPS memory buddy_allocator cc_library(device_context SRCS device_context.cc DEPS memory buddy_allocator
system_allocator memory_block meta_data meta_cache place eigen3 ${GPU_CTX_DEPS}) system_allocator memory_block meta_data meta_cache place eigen3 ${GPU_CTX_DEPS})
nv_test(device_context_test SRCS device_context_test.cc DEPS device_context gpu_info) nv_test(device_context_test SRCS device_context_test.cc DEPS device_context gpu_info)
nv_test(cudnn_helper_test SRCS cudnn_helper_test.cc DEPS dynload_cuda)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/platform/dynload/cudnn.h"
#include "paddle/platform/enforce.h"
#include "paddle/platform/macros.h"
namespace paddle {
namespace platform {
enum class DataLayout {
kNHWC,
kNCHW,
kNCHW_VECT_C,
};
enum class PoolingMode {
kMaximum,
kAverage,
};
template <typename T>
class CudnnDataType;
template <>
class CudnnDataType<float> {
public:
static const cudnnDataType_t type = CUDNN_DATA_FLOAT;
};
template <>
class CudnnDataType<double> {
public:
static const cudnnDataType_t type = CUDNN_DATA_DOUBLE;
};
inline cudnnTensorFormat_t GetCudnnTensorFormat(const DataLayout& order) {
switch (order) {
case DataLayout::kNHWC:
return CUDNN_TENSOR_NHWC;
case DataLayout::kNCHW:
return CUDNN_TENSOR_NCHW;
default:
PADDLE_THROW("Unknown cudnn equivalent for order");
}
return CUDNN_TENSOR_NCHW;
}
class ScopedTensorDescriptor {
public:
ScopedTensorDescriptor() {
PADDLE_ENFORCE(dynload::cudnnCreateTensorDescriptor(&desc_));
}
~ScopedTensorDescriptor() {
PADDLE_ENFORCE(dynload::cudnnDestroyTensorDescriptor(desc_));
}
inline cudnnTensorDescriptor_t descriptor(const cudnnTensorFormat_t format,
const cudnnDataType_t type,
const std::vector<int>& dims) {
// the format is not used now, but it maybe useful feature
std::vector<int> strides(dims.size());
strides[dims.size() - 1] = 1;
for (int i = dims.size() - 2; i >= 0; i--) {
strides[i] = dims[i + 1] * strides[i + 1];
}
PADDLE_ENFORCE(dynload::cudnnSetTensorNdDescriptor(
desc_, type, dims.size(), dims.data(), strides.data()));
return desc_;
}
template <typename T>
inline cudnnTensorDescriptor_t descriptor(const DataLayout& order,
const std::vector<int>& dims) {
return descriptor(GetCudnnTensorFormat(order), CudnnDataType<T>::type,
dims);
}
private:
cudnnTensorDescriptor_t desc_;
DISABLE_COPY_AND_ASSIGN(ScopedTensorDescriptor);
};
class ScopedFilterDescriptor {
public:
ScopedFilterDescriptor() {
PADDLE_ENFORCE(dynload::cudnnCreateFilterDescriptor(&desc_));
}
~ScopedFilterDescriptor() {
PADDLE_ENFORCE(dynload::cudnnDestroyFilterDescriptor(desc_));
}
inline cudnnFilterDescriptor_t descriptor(const cudnnTensorFormat_t format,
const cudnnDataType_t type,
const std::vector<int>& kernel) {
// filter layout: output input spatial_dim_y spatial_dim_x
PADDLE_ENFORCE(dynload::cudnnSetFilterNdDescriptor(
desc_, type, format, kernel.size(), kernel.data()));
return desc_;
}
template <typename T>
inline cudnnFilterDescriptor_t descriptor(const DataLayout& order,
const std::vector<int>& kernel) {
return descriptor(GetCudnnTensorFormat(order), CudnnDataType<T>::type,
kernel);
}
private:
cudnnFilterDescriptor_t desc_;
DISABLE_COPY_AND_ASSIGN(ScopedFilterDescriptor);
};
class ScopedConvolutionDescriptor {
public:
ScopedConvolutionDescriptor() {
PADDLE_ENFORCE(dynload::cudnnCreateConvolutionDescriptor(&desc_));
}
~ScopedConvolutionDescriptor() {
PADDLE_ENFORCE(dynload::cudnnDestroyConvolutionDescriptor(desc_));
}
inline cudnnConvolutionDescriptor_t descriptor(
cudnnDataType_t type, const std::vector<int>& pads,
const std::vector<int>& strides, const std::vector<int>& dilations) {
PADDLE_ENFORCE_EQ(pads.size(), strides.size());
PADDLE_ENFORCE_EQ(pads.size(), dilations.size());
#if CUDNN_VERSION < 6000
// cudnn v5 does not support dilation conv, the argument is called upscale
// instead of dilations and it is must be one.
for (size_t i = 0; i < dilations.size(); ++i) {
PADDLE_ENFORCE_EQ(
dilations[i], 1,
"Dilations conv is not supported in this cuDNN version");
}
#endif
PADDLE_ENFORCE(dynload::cudnnSetConvolutionNdDescriptor(
desc_, pads.size(), pads.data(), strides.data(), dilations.data(),
CUDNN_CROSS_CORRELATION, type));
return desc_;
}
template <typename T>
inline cudnnConvolutionDescriptor_t descriptor(
const std::vector<int>& pads, const std::vector<int>& strides,
const std::vector<int>& dilations) {
return descriptor(CudnnDataType<T>::type, pads, strides, dilations);
}
private:
cudnnConvolutionDescriptor_t desc_;
DISABLE_COPY_AND_ASSIGN(ScopedConvolutionDescriptor);
};
class ScopedPoolingDescriptor {
public:
ScopedPoolingDescriptor() {
PADDLE_ENFORCE(dynload::cudnnCreatePoolingDescriptor(&desc_));
}
~ScopedPoolingDescriptor() {
PADDLE_ENFORCE(dynload::cudnnDestroyPoolingDescriptor(desc_));
}
inline cudnnPoolingDescriptor_t descriptor(const PoolingMode& mode,
const std::vector<int>& kernel,
const std::vector<int>& pads,
const std::vector<int>& strides) {
PADDLE_ENFORCE_EQ(kernel.size(), pads.size());
PADDLE_ENFORCE_EQ(kernel.size(), strides.size());
PADDLE_ENFORCE(dynload::cudnnSetPoolingNdDescriptor(
desc_, (mode == PoolingMode::kMaximum
? CUDNN_POOLING_MAX
: CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING),
CUDNN_PROPAGATE_NAN, // Always propagate nans.
kernel.size(), kernel.data(), pads.data(), strides.data()));
return desc_;
}
private:
cudnnPoolingDescriptor_t desc_;
DISABLE_COPY_AND_ASSIGN(ScopedPoolingDescriptor);
};
} // namespace platform
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/platform/cudnn_helper.h"
#include <gtest/gtest.h>
TEST(CudnnHelper, ScopedTensorDescriptor) {
using paddle::platform::ScopedTensorDescriptor;
using paddle::platform::DataLayout;
ScopedTensorDescriptor tensor_desc;
std::vector<int> shape = {2, 4, 6, 6};
auto desc = tensor_desc.descriptor<float>(DataLayout::kNCHW, shape);
cudnnDataType_t type;
int nd;
std::vector<int> dims(4);
std::vector<int> strides(4);
paddle::platform::dynload::cudnnGetTensorNdDescriptor(
desc, 4, &type, &nd, dims.data(), strides.data());
EXPECT_EQ(nd, 4);
for (size_t i = 0; i < dims.size(); ++i) {
EXPECT_EQ(dims[i], shape[i]);
}
EXPECT_EQ(strides[3], 1);
EXPECT_EQ(strides[2], 6);
EXPECT_EQ(strides[1], 36);
EXPECT_EQ(strides[0], 144);
}
TEST(CudnnHelper, ScopedFilterDescriptor) {
using paddle::platform::ScopedFilterDescriptor;
using paddle::platform::DataLayout;
ScopedFilterDescriptor filter_desc;
std::vector<int> shape = {2, 3, 3};
auto desc = filter_desc.descriptor<float>(DataLayout::kNCHW, shape);
cudnnDataType_t type;
int nd;
cudnnTensorFormat_t format;
std::vector<int> kernel(3);
paddle::platform::dynload::cudnnGetFilterNdDescriptor(desc, 3, &type, &format,
&nd, kernel.data());
EXPECT_EQ(GetCudnnTensorFormat(DataLayout::kNCHW), format);
EXPECT_EQ(nd, 3);
for (size_t i = 0; i < shape.size(); ++i) {
EXPECT_EQ(kernel[i], shape[i]);
}
}
TEST(CudnnHelper, ScopedConvolutionDescriptor) {
using paddle::platform::ScopedConvolutionDescriptor;
ScopedConvolutionDescriptor conv_desc;
std::vector<int> src_pads = {2, 2, 2};
std::vector<int> src_strides = {1, 1, 1};
std::vector<int> src_dilations = {1, 1, 1};
auto desc = conv_desc.descriptor<float>(src_pads, src_strides, src_dilations);
cudnnDataType_t type;
cudnnConvolutionMode_t mode;
int nd;
std::vector<int> pads(3);
std::vector<int> strides(3);
std::vector<int> dilations(3);
paddle::platform::dynload::cudnnGetConvolutionNdDescriptor(
desc, 3, &nd, pads.data(), strides.data(), dilations.data(), &mode,
&type);
EXPECT_EQ(nd, 3);
for (size_t i = 0; i < src_pads.size(); ++i) {
EXPECT_EQ(pads[i], src_pads[i]);
EXPECT_EQ(strides[i], src_strides[i]);
EXPECT_EQ(dilations[i], src_dilations[i]);
}
EXPECT_EQ(mode, CUDNN_CROSS_CORRELATION);
}
TEST(CudnnHelper, ScopedPoolingDescriptor) {
using paddle::platform::ScopedPoolingDescriptor;
using paddle::platform::PoolingMode;
ScopedPoolingDescriptor pool_desc;
std::vector<int> src_kernel = {2, 2, 5};
std::vector<int> src_pads = {1, 1, 2};
std::vector<int> src_strides = {2, 2, 3};
auto desc = pool_desc.descriptor(PoolingMode::kMaximum, src_kernel, src_pads,
src_strides);
cudnnPoolingMode_t mode;
cudnnNanPropagation_t nan_t = CUDNN_PROPAGATE_NAN;
int nd;
std::vector<int> kernel(3);
std::vector<int> pads(3);
std::vector<int> strides(3);
paddle::platform::dynload::cudnnGetPoolingNdDescriptor(
desc, 3, &mode, &nan_t, &nd, kernel.data(), pads.data(), strides.data());
EXPECT_EQ(nd, 3);
for (size_t i = 0; i < src_pads.size(); ++i) {
EXPECT_EQ(kernel[i], src_kernel[i]);
EXPECT_EQ(pads[i], src_pads[i]);
EXPECT_EQ(strides[i], src_strides[i]);
}
EXPECT_EQ(mode, CUDNN_POOLING_MAX);
}
cc_library(dynamic_loader SRCS dynamic_loader.cc DEPS glog gflags) cc_library(dynamic_loader SRCS dynamic_loader.cc DEPS glog gflags)
nv_library(dynload_cuda SRCS cublas.cc cudnn.cc curand.cc) nv_library(dynload_cuda SRCS cublas.cc cudnn.cc curand.cc DEPS dynamic_loader)
...@@ -62,19 +62,27 @@ extern void* cudnn_dso_handle; ...@@ -62,19 +62,27 @@ extern void* cudnn_dso_handle;
#define CUDNN_DNN_ROUTINE_EACH(__macro) \ #define CUDNN_DNN_ROUTINE_EACH(__macro) \
__macro(cudnnSetTensor4dDescriptor); \ __macro(cudnnSetTensor4dDescriptor); \
__macro(cudnnSetTensor4dDescriptorEx); \ __macro(cudnnSetTensor4dDescriptorEx); \
__macro(cudnnSetTensorNdDescriptor); \
__macro(cudnnGetTensorNdDescriptor); \
__macro(cudnnGetConvolutionNdForwardOutputDim); \ __macro(cudnnGetConvolutionNdForwardOutputDim); \
__macro(cudnnGetConvolutionForwardAlgorithm); \ __macro(cudnnGetConvolutionForwardAlgorithm); \
__macro(cudnnCreateTensorDescriptor); \ __macro(cudnnCreateTensorDescriptor); \
__macro(cudnnDestroyTensorDescriptor); \ __macro(cudnnDestroyTensorDescriptor); \
__macro(cudnnCreateFilterDescriptor); \ __macro(cudnnCreateFilterDescriptor); \
__macro(cudnnSetFilter4dDescriptor); \ __macro(cudnnSetFilter4dDescriptor); \
__macro(cudnnSetFilterNdDescriptor); \
__macro(cudnnGetFilterNdDescriptor); \
__macro(cudnnSetPooling2dDescriptor); \ __macro(cudnnSetPooling2dDescriptor); \
__macro(cudnnSetPoolingNdDescriptor); \
__macro(cudnnGetPoolingNdDescriptor); \
__macro(cudnnDestroyFilterDescriptor); \ __macro(cudnnDestroyFilterDescriptor); \
__macro(cudnnCreateConvolutionDescriptor); \ __macro(cudnnCreateConvolutionDescriptor); \
__macro(cudnnCreatePoolingDescriptor); \ __macro(cudnnCreatePoolingDescriptor); \
__macro(cudnnDestroyPoolingDescriptor); \ __macro(cudnnDestroyPoolingDescriptor); \
__macro(cudnnSetConvolution2dDescriptor); \ __macro(cudnnSetConvolution2dDescriptor); \
__macro(cudnnDestroyConvolutionDescriptor); \ __macro(cudnnDestroyConvolutionDescriptor); \
__macro(cudnnSetConvolutionNdDescriptor); \
__macro(cudnnGetConvolutionNdDescriptor); \
__macro(cudnnCreate); \ __macro(cudnnCreate); \
__macro(cudnnDestroy); \ __macro(cudnnDestroy); \
__macro(cudnnSetStream); \ __macro(cudnnSetStream); \
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
// Disable the copy and assignment operator for a class.
#ifndef DISABLE_COPY_AND_ASSIGN
#define DISABLE_COPY_AND_ASSIGN(classname) \
private: \
classname(const classname&) = delete; \
classname& operator=(const classname&) = delete
#endif
...@@ -22,7 +22,6 @@ limitations under the License. */ ...@@ -22,7 +22,6 @@ limitations under the License. */
#include <arpa/inet.h> #include <arpa/inet.h>
#include <net/if.h> #include <net/if.h>
#include <net/if_arp.h>
#include <sys/ioctl.h> #include <sys/ioctl.h>
#include <sstream> #include <sstream>
......
...@@ -2,22 +2,5 @@ if(WITH_PYTHON) ...@@ -2,22 +2,5 @@ if(WITH_PYTHON)
cc_library(paddle_pybind SHARED cc_library(paddle_pybind SHARED
SRCS pybind.cc SRCS pybind.cc
DEPS pybind python backward DEPS pybind python backward
sgd_op ${GLOB_OP_LIB})
gather_op
scatter_op
add_op
mul_op
rowwise_add_op
sigmoid_op
softmax_op
mean_op
cross_entropy_op
recurrent_op
uniform_random_op
gaussian_random_op
fill_zeros_like_op
lookup_table_op
scale_op
minus_op
pad_op)
endif(WITH_PYTHON) endif(WITH_PYTHON)
...@@ -39,12 +39,12 @@ USE_OP(sigmoid); ...@@ -39,12 +39,12 @@ USE_OP(sigmoid);
USE_OP(softmax); USE_OP(softmax);
USE_OP(rowwise_add); USE_OP(rowwise_add);
USE_OP(fill_zeros_like); USE_OP(fill_zeros_like);
USE_OP_ITSELF(recurrent_op); USE_NO_KERNEL_OP(recurrent);
USE_OP(gaussian_random); USE_OP(gaussian_random);
USE_OP(uniform_random); USE_OP(uniform_random);
USE_OP(lookup_table); USE_OP(lookup_table);
USE_OP(scale); USE_OP(scale);
USE_OP_ITSELF(identity); USE_NO_KERNEL_OP(identity);
USE_OP(minus); USE_OP(minus);
USE_CPU_ONLY_OP(gather); USE_CPU_ONLY_OP(gather);
USE_OP(pad); USE_OP(pad);
......
...@@ -85,6 +85,12 @@ message ConvConfig { ...@@ -85,6 +85,12 @@ message ConvConfig {
optional uint32 dilation = 15 [ default = 1 ]; optional uint32 dilation = 15 [ default = 1 ];
optional uint32 dilation_y = 16 [ default = 1 ]; optional uint32 dilation_y = 16 [ default = 1 ];
optional uint32 filter_size_z = 17 [ default = 1 ];
optional uint32 padding_z = 18 [ default = 1 ];
optional uint32 stride_z = 19 [ default = 1 ];
optional uint32 output_z = 20 [ default = 1 ];
optional uint32 img_size_z = 21 [ default = 1 ];
} }
message PoolConfig { message PoolConfig {
...@@ -127,6 +133,12 @@ message PoolConfig { ...@@ -127,6 +133,12 @@ message PoolConfig {
// if not set, use padding // if not set, use padding
optional uint32 padding_y = 13; optional uint32 padding_y = 13;
optional uint32 size_z = 14 [ default = 1 ];
optional uint32 stride_z = 15 [ default = 1 ];
optional uint32 output_z = 16 [ default = 1 ];
optional uint32 img_size_z = 17 [ default = 1 ];
optional uint32 padding_z = 18 [ default = 1 ];
} }
message SppConfig { message SppConfig {
...@@ -502,6 +514,8 @@ message LayerConfig { ...@@ -502,6 +514,8 @@ message LayerConfig {
// for HuberRegressionLoss // for HuberRegressionLoss
optional double delta = 57 [ default = 1.0 ]; optional double delta = 57 [ default = 1.0 ];
optional uint64 depth = 58 [ default = 1 ];
} }
message EvaluatorConfig { message EvaluatorConfig {
......
...@@ -886,6 +886,36 @@ class Conv(Cfg): ...@@ -886,6 +886,36 @@ class Conv(Cfg):
config_assert(output_x <= 0) config_assert(output_x <= 0)
# please refer to the comments in proto/ModelConfig.proto
@config_class
class Conv3D(Cfg):
def __init__(self,
filter_size,
channels,
padding=None,
stride=None,
groups=None,
filter_channels=None,
output_x=None,
img_size=None,
caffe_mode=True,
filter_size_y=None,
padding_y=None,
stride_y=None,
filter_size_z=None,
padding_z=None,
stride_z=None):
self.add_keys(locals())
self.filter_size_y = filter_size_y if filter_size_y else filter_size
self.filter_size_z = filter_size_z if filter_size_z else filter_size
self.padding_y = padding_y if padding_y else padding
self.padding_z = padding_z if padding_z else padding
self.stride_y = stride_y if stride_y else stride
self.stride_z = stride_z if stride_z else stride
if output_x is not None:
config_assert(output_x <= 0)
@config_class @config_class
class BilinearInterp(Cfg): class BilinearInterp(Cfg):
def __init__(self, out_size_x=None, out_size_y=None, channels=None): def __init__(self, out_size_x=None, out_size_y=None, channels=None):
...@@ -908,6 +938,31 @@ class Pool(Cfg): ...@@ -908,6 +938,31 @@ class Pool(Cfg):
self.add_keys(locals()) self.add_keys(locals())
@config_class
class Pool3d(Cfg):
def __init__(
self,
pool_type,
channels,
size_x,
size_y=None,
size_z=None,
start=None,
stride=None, # 1 by defalut in protobuf
stride_y=None,
stride_z=None,
padding=None, # 0 by defalut in protobuf
padding_y=None,
padding_z=None):
self.add_keys(locals())
self.filter_size_y = size_y if size_y else size_x
self.filter_size_z = size_z if size_z else size_x
self.padding_y = padding_y if padding_y else padding
self.padding_z = padding_z if padding_z else padding
self.stride_y = stride_y if stride_y else stride
self.stride_z = stride_z if stride_z else stride
@config_class @config_class
class SpatialPyramidPool(Cfg): class SpatialPyramidPool(Cfg):
def __init__(self, pool_type, pyramid_height, channels): def __init__(self, pool_type, pyramid_height, channels):
...@@ -1172,6 +1227,20 @@ def get_img_size(input_layer_name, channels): ...@@ -1172,6 +1227,20 @@ def get_img_size(input_layer_name, channels):
return img_size, img_size_y return img_size, img_size_y
def get_img3d_size(input_layer_name, channels):
input = g_layer_map[input_layer_name]
img_pixels = input.size / channels
img_size = input.width
img_size_y = input.height
img_size_z = input.depth
config_assert(
img_size * img_size_y * img_size_z == img_pixels,
"Input layer %s: Incorrect input image size %d * %d * %d for input image pixels %d"
% (input_layer_name, img_size, img_size_y, img_size_z, img_pixels))
return img_size, img_size_y, img_size_z
def parse_bilinear(bilinear, input_layer_name, bilinear_conf): def parse_bilinear(bilinear, input_layer_name, bilinear_conf):
parse_image(bilinear, input_layer_name, bilinear_conf.image_conf) parse_image(bilinear, input_layer_name, bilinear_conf.image_conf)
bilinear_conf.out_size_x = bilinear.out_size_x bilinear_conf.out_size_x = bilinear.out_size_x
...@@ -1209,6 +1278,45 @@ def parse_pool(pool, input_layer_name, pool_conf, ceil_mode): ...@@ -1209,6 +1278,45 @@ def parse_pool(pool, input_layer_name, pool_conf, ceil_mode):
pool_conf.stride_y, not ceil_mode) pool_conf.stride_y, not ceil_mode)
def parse_pool3d(pool, input_layer_name, pool_conf, ceil_mode):
pool_conf.pool_type = pool.pool_type
config_assert(pool.pool_type in ['max-projection', 'avg-projection'],
"pool-type %s is not in "
"['max-projection', 'avg-projection']" % pool.pool_type)
pool_conf.channels = pool.channels
pool_conf.size_x = pool.size_x
pool_conf.stride = pool.stride
pool_conf.padding = pool.padding
pool_conf.size_y = default(pool.size_y, pool_conf.size_x)
pool_conf.size_z = default(pool.size_z, pool_conf.size_x)
pool_conf.stride_y = default(pool.stride_y, pool_conf.stride)
pool_conf.stride_z = default(pool.stride_z, pool_conf.stride)
pool_conf.padding_y = default(pool.padding_y, pool_conf.padding)
pool_conf.padding_z = default(pool.padding_z, pool_conf.padding)
pool_conf.img_size, pool_conf.img_size_y, pool_conf.img_size_z = \
get_img3d_size(input_layer_name, pool.channels)
config_assert(not pool.start, "start is deprecated in pooling.")
if pool.padding is not None:
pool_conf.padding = pool.padding
pool_conf.padding_y = default(pool.padding_y, pool_conf.padding)
pool_conf.padding_z = default(pool.padding_z, pool_conf.padding)
pool_conf.output_x = cnn_output_size(pool_conf.img_size, pool_conf.size_x,
pool_conf.padding, pool_conf.stride,
not ceil_mode)
pool_conf.output_y = cnn_output_size(pool_conf.img_size_y, pool_conf.size_y,
pool_conf.padding_y,
pool_conf.stride_y, not ceil_mode)
pool_conf.output_z = cnn_output_size(pool_conf.img_size_z, pool_conf.size_z,
pool_conf.padding_z,
pool_conf.stride_z, not ceil_mode)
def parse_spp(spp, input_layer_name, spp_conf): def parse_spp(spp, input_layer_name, spp_conf):
parse_image(spp, input_layer_name, spp_conf.image_conf) parse_image(spp, input_layer_name, spp_conf.image_conf)
spp_conf.pool_type = spp.pool_type spp_conf.pool_type = spp.pool_type
...@@ -1282,6 +1390,50 @@ def parse_conv(conv, input_layer_name, conv_conf, num_filters, trans=False): ...@@ -1282,6 +1390,50 @@ def parse_conv(conv, input_layer_name, conv_conf, num_filters, trans=False):
conv_conf.stride_y, conv_conf.caffe_mode) conv_conf.stride_y, conv_conf.caffe_mode)
#caffe_mode: compute the output size using floor instead of ceil,
# which is consistent of caffe and CuDNN's convention.
def parse_conv3d(conv, input_layer_name, conv_conf, num_filters, trans=False):
conv_conf.filter_size = conv.filter_size
conv_conf.filter_size_y = conv.filter_size_y
conv_conf.filter_size_z = conv.filter_size_z
conv_conf.channels = conv.channels
conv_conf.padding = conv.padding
conv_conf.padding_y = conv.padding_y
conv_conf.padding_z = conv.padding_z
conv_conf.stride = conv.stride
conv_conf.stride_y = conv.stride_y
conv_conf.stride_z = conv.stride_z
conv_conf.groups = conv.groups
conv_conf.caffe_mode = conv.caffe_mode
if not trans:
conv_conf.filter_channels = conv.channels / conv.groups
conv_conf.img_size, conv_conf.img_size_y, conv_conf.img_size_z = \
get_img3d_size(input_layer_name, conv.channels)
conv_conf.output_x = cnn_output_size(
conv_conf.img_size, conv_conf.filter_size, conv_conf.padding,
conv_conf.stride, conv_conf.caffe_mode)
conv_conf.output_y = cnn_output_size(
conv_conf.img_size_y, conv_conf.filter_size_y, conv_conf.padding_y,
conv_conf.stride_y, conv_conf.caffe_mode)
conv_conf.output_z = cnn_output_size(
conv_conf.img_size_z, conv_conf.filter_size_z, conv_conf.padding_z,
conv_conf.stride_z, conv_conf.caffe_mode)
else:
conv_conf.filter_channels = num_filters / conv.groups
conv_conf.output_x, conv_conf.output_y, conv_conf.output_z = \
get_img3d_size(input_layer_name, conv.channels)
conv_conf.img_size = cnn_image_size(
conv_conf.output_x, conv_conf.filter_size, conv_conf.padding,
conv_conf.stride, conv_conf.caffe_mode)
conv_conf.img_size_y = cnn_image_size(
conv_conf.output_y, conv_conf.filter_size_y, conv_conf.padding_y,
conv_conf.stride_y, conv_conf.caffe_mode)
conv_conf.img_size_z = cnn_image_size(
conv_conf.output_z, conv_conf.filter_size_z, conv_conf.padding_z,
conv_conf.stride_z, conv_conf.caffe_mode)
def parse_block_expand(block_expand, input_layer_name, block_expand_conf): def parse_block_expand(block_expand, input_layer_name, block_expand_conf):
block_expand_conf.channels = block_expand.channels block_expand_conf.channels = block_expand.channels
block_expand_conf.stride_x = block_expand.stride_x block_expand_conf.stride_x = block_expand.stride_x
...@@ -1585,6 +1737,9 @@ class LayerBase(object): ...@@ -1585,6 +1737,9 @@ class LayerBase(object):
self.config.height = height self.config.height = height
self.config.width = width self.config.width = width
def set_layer_depth(self, depth):
self.config.depth = depth
def set_cnn_layer(self, def set_cnn_layer(self,
input_layer_name, input_layer_name,
height, height,
...@@ -1607,6 +1762,21 @@ class MultiClassCrossEntropySelfNormCostLayer(LayerBase): ...@@ -1607,6 +1762,21 @@ class MultiClassCrossEntropySelfNormCostLayer(LayerBase):
self.config.softmax_selfnorm_alpha = softmax_selfnorm_alpha self.config.softmax_selfnorm_alpha = softmax_selfnorm_alpha
@config_layer('cross_entropy_over_beam')
class CrossEntropyOverBeamLayer(LayerBase):
def __init__(self, name, inputs, **xargs):
config_assert(len(inputs) % 3 == 0, "Error input number.")
super(CrossEntropyOverBeamLayer, self).__init__(
name, 'cross_entropy_over_beam', 0, inputs, **xargs)
input_num = len(inputs) / 3
for i in range(input_num):
input_layer = self.get_input_layer(i * 3)
config_assert(input_layer.size == 1, (
"Inputs for this layer are made up of "
"several triples, in which the first one is scores over "
"all candidate paths, whose size should be equal to 1."))
@config_layer('fc') @config_layer('fc')
class FCLayer(LayerBase): class FCLayer(LayerBase):
layer_type = 'fc' layer_type = 'fc'
...@@ -1788,11 +1958,19 @@ class DetectionOutputLayer(LayerBase): ...@@ -1788,11 +1958,19 @@ class DetectionOutputLayer(LayerBase):
@config_layer('data') @config_layer('data')
class DataLayer(LayerBase): class DataLayer(LayerBase):
def __init__(self, name, size, height=None, width=None, device=None): def __init__(self,
name,
size,
depth=None,
height=None,
width=None,
device=None):
super(DataLayer, self).__init__( super(DataLayer, self).__init__(
name, 'data', size, inputs=[], device=device) name, 'data', size, inputs=[], device=device)
if height and width: if height and width:
self.set_layer_height_width(height, width) self.set_layer_height_width(height, width)
if depth:
self.set_layer_depth(depth)
''' '''
...@@ -1991,6 +2169,87 @@ class ConvTransLayer(ConvTransLayerBase): ...@@ -1991,6 +2169,87 @@ class ConvTransLayer(ConvTransLayerBase):
layer_type = 'cudnn_convt' layer_type = 'cudnn_convt'
@config_layer('conv_3d')
class Conv3DLayerBase(LayerBase):
def __init__(self,
name,
inputs=[],
bias=True,
num_filters=None,
shared_biases=True,
**xargs):
super(Conv3DLayerBase, self).__init__(
name, self.layer_type, 0, inputs=inputs, **xargs)
if num_filters is not None:
self.config.num_filters = num_filters
# need to specify layer in config
self.config.type = self.layer_type
trans = False
if self.config.type == "deconv3d":
trans = True
if shared_biases is not None:
self.config.shared_biases = shared_biases
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index)
conv_conf = self.config.inputs[input_index].conv_conf
parse_conv3d(
self.inputs[input_index].conv,
input_layer.name,
conv_conf,
num_filters,
trans=trans
) # for z-axis pad:0, strid:1, filter_size:1, img_size:1
psize = self.calc_parameter_size(conv_conf)
self.create_input_parameter(input_index, psize)
if trans:
self.set_cnn_layer(name, conv_conf.img_size_z,
conv_conf.img_size_y, conv_conf.img_size,
self.config.num_filters)
else:
self.set_cnn_layer(name, conv_conf.output_z, conv_conf.output_y,
conv_conf.output_x, self.config.num_filters)
psize = self.config.size
if shared_biases:
psize = self.config.num_filters
self.create_bias_parameter(bias, psize, [psize, 1])
def calc_parameter_size(self, conv_conf):
return self.config.num_filters * conv_conf.filter_channels \
* (conv_conf.filter_size * conv_conf.filter_size_y \
* conv_conf.filter_size_z)
def set_cnn_layer(self,
input_layer_name,
depth,
height,
width,
channels,
is_print=True):
size = depth * height * width * channels
self.set_layer_size(size)
self.set_layer_height_width(height, width)
self.set_layer_depth(depth)
if is_print:
print("output for %s: c = %d, d = %d, h = %d, w = %d, size = %d" %
(input_layer_name, channels, depth, height, width, size))
@config_layer('conv3d')
class Conv3DLayer(Conv3DLayerBase):
layer_type = 'conv3d'
@config_layer('deconv3d')
class Conv3DLayer(Conv3DLayerBase):
layer_type = 'deconv3d'
@config_layer('norm') @config_layer('norm')
class NormLayer(LayerBase): class NormLayer(LayerBase):
def __init__(self, name, inputs, **xargs): def __init__(self, name, inputs, **xargs):
...@@ -2020,6 +2279,35 @@ class PoolLayer(LayerBase): ...@@ -2020,6 +2279,35 @@ class PoolLayer(LayerBase):
pool_conf.channels) pool_conf.channels)
@config_layer('pool3d')
class Pool3DLayer(LayerBase):
def __init__(self, name, inputs, ceil_mode=True, **xargs):
super(Pool3DLayer, self).__init__(
name, 'pool3d', 0, inputs=inputs, **xargs)
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index)
pool_conf = self.config.inputs[input_index].pool_conf
parse_pool3d(self.inputs[input_index].pool, input_layer.name,
pool_conf, ceil_mode)
self.set_cnn_layer(name, pool_conf.output_z, pool_conf.output_y,
pool_conf.output_x, pool_conf.channels)
def set_cnn_layer(self,
input_layer_name,
depth,
height,
width,
channels,
is_print=True):
size = depth * height * width * channels
self.set_layer_size(size)
self.set_layer_height_width(height, width)
self.set_layer_depth(depth)
if is_print:
print("output for %s: c = %d, d = %d, h = %d, w = %d, size = %d" %
(input_layer_name, channels, depth, height, width, size))
@config_layer('spp') @config_layer('spp')
class SpatialPyramidPoolLayer(LayerBase): class SpatialPyramidPoolLayer(LayerBase):
def __init__(self, name, inputs, **xargs): def __init__(self, name, inputs, **xargs):
...@@ -2268,6 +2556,7 @@ def define_cost(class_name, cost_type): ...@@ -2268,6 +2556,7 @@ def define_cost(class_name, cost_type):
define_cost('MultiClassCrossEntropy', 'multi-class-cross-entropy') define_cost('MultiClassCrossEntropy', 'multi-class-cross-entropy')
define_cost('CrossEntropyOverBeamCostLayer', 'cross_entropy_over_beam')
define_cost('RankingCost', 'rank-cost') define_cost('RankingCost', 'rank-cost')
define_cost('AucValidation', 'auc-validation') define_cost('AucValidation', 'auc-validation')
define_cost('PnpairValidation', 'pnpair-validation') define_cost('PnpairValidation', 'pnpair-validation')
......
文件模式从 100755 更改为 100644
...@@ -11,7 +11,6 @@ ...@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import functools import functools
import collections import collections
import inspect import inspect
...@@ -54,7 +53,7 @@ __all__ = [ ...@@ -54,7 +53,7 @@ __all__ = [
'cos_sim', 'cos_sim',
'hsigmoid', 'hsigmoid',
'conv_projection', 'conv_projection',
'mse_cost', 'square_error_cost',
'regression_cost', 'regression_cost',
'classification_cost', 'classification_cost',
'LayerOutput', 'LayerOutput',
...@@ -106,6 +105,8 @@ __all__ = [ ...@@ -106,6 +105,8 @@ __all__ = [
'nce_layer', 'nce_layer',
'cross_entropy_with_selfnorm', 'cross_entropy_with_selfnorm',
'cross_entropy', 'cross_entropy',
'BeamInput',
'cross_entropy_over_beam',
'multi_binary_label_cross_entropy', 'multi_binary_label_cross_entropy',
'sum_cost', 'sum_cost',
'rank_cost', 'rank_cost',
...@@ -136,8 +137,10 @@ __all__ = [ ...@@ -136,8 +137,10 @@ __all__ = [
'clip_layer', 'clip_layer',
'slice_projection', 'slice_projection',
'seq_slice_layer', 'seq_slice_layer',
'kmax_sequence_score_layer', 'kmax_seq_score_layer',
'img_pool3d_layer',
'scale_shift_layer', 'scale_shift_layer',
'img_conv3d_layer',
] ]
...@@ -166,6 +169,7 @@ class LayerType(object): ...@@ -166,6 +169,7 @@ class LayerType(object):
EXCONVTRANS_LAYER = 'exconvt' EXCONVTRANS_LAYER = 'exconvt'
CUDNNCONV_LAYER = 'cudnn_conv' CUDNNCONV_LAYER = 'cudnn_conv'
POOL_LAYER = 'pool' POOL_LAYER = 'pool'
POOL3D_LAYER = 'pool3d'
BATCH_NORM_LAYER = 'batch_norm' BATCH_NORM_LAYER = 'batch_norm'
NORM_LAYER = 'norm' NORM_LAYER = 'norm'
SUM_TO_ONE_NORM_LAYER = 'sum_to_one_norm' SUM_TO_ONE_NORM_LAYER = 'sum_to_one_norm'
...@@ -219,12 +223,16 @@ class LayerType(object): ...@@ -219,12 +223,16 @@ class LayerType(object):
CRF_DECODING_LAYER = 'crf_decoding' CRF_DECODING_LAYER = 'crf_decoding'
NCE_LAYER = 'nce' NCE_LAYER = 'nce'
CONV3D_LAYER = 'conv3d'
DECONV3D_LAYER = 'deconv3d'
RANK_COST = 'rank-cost' RANK_COST = 'rank-cost'
LAMBDA_COST = 'lambda_cost' LAMBDA_COST = 'lambda_cost'
HUBER_REGRESSION = 'huber_regression' HUBER_REGRESSION = 'huber_regression'
HUBER_CLASSIFICATION = 'huber_classification' HUBER_CLASSIFICATION = 'huber_classification'
CROSS_ENTROPY = 'multi-class-cross-entropy' CROSS_ENTROPY = 'multi-class-cross-entropy'
CROSS_ENTROPY_WITH_SELFNORM = 'multi_class_cross_entropy_with_selfnorm' CROSS_ENTROPY_WITH_SELFNORM = 'multi_class_cross_entropy_with_selfnorm'
CROSS_ENTROPY_OVER_BEAM = 'cross_entropy_over_beam'
SOFT_BIN_CLASS_CROSS_ENTROPY = 'soft_binary_class_cross_entropy' SOFT_BIN_CLASS_CROSS_ENTROPY = 'soft_binary_class_cross_entropy'
MULTI_BIN_LABEL_CROSS_ENTROPY = 'multi_binary_label_cross_entropy' MULTI_BIN_LABEL_CROSS_ENTROPY = 'multi_binary_label_cross_entropy'
SUM_COST = 'sum_cost' SUM_COST = 'sum_cost'
...@@ -894,7 +902,8 @@ def mixed_layer(size=0, ...@@ -894,7 +902,8 @@ def mixed_layer(size=0,
@layer_support() @layer_support()
def data_layer(name, size, height=None, width=None, layer_attr=None): def data_layer(name, size, depth=None, height=None, width=None,
layer_attr=None):
""" """
Define DataLayer For NeuralNetwork. Define DataLayer For NeuralNetwork.
...@@ -921,15 +930,18 @@ def data_layer(name, size, height=None, width=None, layer_attr=None): ...@@ -921,15 +930,18 @@ def data_layer(name, size, height=None, width=None, layer_attr=None):
type=LayerType.DATA, type=LayerType.DATA,
name=name, name=name,
size=size, size=size,
depth=depth,
height=height, height=height,
width=width, width=width,
**ExtraLayerAttribute.to_kwargs(layer_attr)) **ExtraLayerAttribute.to_kwargs(layer_attr))
if depth is None:
depth = 1
num_filters = None num_filters = None
if height is not None and width is not None: if height is not None and width is not None:
num_filters = size / (width * height) num_filters = size / (width * height * depth)
assert num_filters * width * height == size, \ assert num_filters * width * height * depth == size, \
"size=%s width=%s height=%s" % (size, width, height) "size=%s width=%s height=%s depth=%s" % (size, width, height, depth)
return LayerOutput(name, LayerType.DATA, size=size, num_filters=num_filters) return LayerOutput(name, LayerType.DATA, size=size, num_filters=num_filters)
...@@ -2653,6 +2665,146 @@ def img_pool_layer(input, ...@@ -2653,6 +2665,146 @@ def img_pool_layer(input,
size=l.config.size) size=l.config.size)
@wrap_name_default("pool3d")
@layer_support()
def img_pool3d_layer(input,
pool_size,
name=None,
num_channels=None,
pool_type=None,
stride=1,
padding=0,
layer_attr=None,
pool_size_y=None,
stride_y=None,
padding_y=None,
pool_size_z=None,
stride_z=None,
padding_z=None,
ceil_mode=True):
"""
Image pooling Layer.
The details of pooling layer, please refer ufldl's pooling_ .
.. _pooling: http://ufldl.stanford.edu/tutorial/supervised/Pooling/
- ceil_mode=True:
.. math::
w = 1 + int(ceil(input\_width + 2 * padding - pool\_size) / float(stride))
h = 1 + int(ceil(input\_height + 2 * padding\_y - pool\_size\_y) / float(stride\_y))
d = 1 + int(ceil(input\_depth + 2 * padding\_z - pool\_size\_z) / float(stride\_z))
- ceil_mode=False:
.. math::
w = 1 + int(floor(input\_width + 2 * padding - pool\_size) / float(stride))
h = 1 + int(floor(input\_height + 2 * padding\_y - pool\_size\_y) / float(stride\_y))
d = 1 + int(floor(input\_depth + 2 * padding\_z - pool\_size\_z) / float(stride\_z))
The example usage is:
.. code-block:: python
maxpool = img_pool3d_layer(input=conv,
pool_size=3,
num_channels=8,
stride=1,
padding=1,
pool_type=MaxPooling())
:param padding: pooling padding width.
:type padding: int|tuple|list
:param name: name of pooling layer
:type name: basestring.
:param input: layer's input
:type input: LayerOutput
:param pool_size: pooling window width
:type pool_size: int|tuple|list
:param num_channels: number of input channel.
:type num_channels: int
:param pool_type: pooling type. MaxPooling or AvgPooling. Default is
MaxPooling.
:type pool_type: BasePoolingType
:param stride: stride width of pooling.
:type stride: int|tuple|list
:param layer_attr: Extra Layer attribute.
:type layer_attr: ExtraLayerAttribute
:param ceil_mode: Wether to use ceil mode to calculate output height and with.
Defalut is True. If set false, Otherwise use floor.
:type ceil_mode: bool
:return: LayerOutput object.
:rtype: LayerOutput
"""
if num_channels is None:
assert input.num_filters is not None
num_channels = input.num_filters
if pool_type is None:
pool_type = MaxPooling()
elif isinstance(pool_type, AvgPooling):
pool_type.name = 'avg'
type_name = pool_type.name + '-projection' \
if (
isinstance(pool_type, AvgPooling) or isinstance(pool_type, MaxPooling)) \
else pool_type.name
if isinstance(pool_size, collections.Sequence):
assert len(pool_size) == 3
pool_size, pool_size_y, pool_size_z = pool_size
else:
pool_size_y = pool_size
pool_size_z = pool_size
if isinstance(stride, collections.Sequence):
assert len(stride) == 3
stride, stride_y, stride_z = stride
else:
stride_y = stride
stride_z = stride
if isinstance(padding, collections.Sequence):
assert len(padding) == 3
padding, padding_y, padding_y = padding
else:
padding_y = padding
padding_z = padding
l = Layer(
name=name,
type=LayerType.POOL3D_LAYER,
inputs=[
Input(
input.name,
pool=Pool3d(
pool_type=type_name,
channels=num_channels,
size_x=pool_size,
start=None,
stride=stride,
padding=padding,
size_y=pool_size_y,
stride_y=stride_y,
padding_y=padding_y,
size_z=pool_size_z,
stride_z=stride_z,
padding_z=padding_z))
],
ceil_mode=ceil_mode,
**ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput(
name,
LayerType.POOL_LAYER,
parents=[input],
num_filters=num_channels,
size=l.config.size)
@wrap_name_default("spp") @wrap_name_default("spp")
@layer_support() @layer_support()
def spp_layer(input, def spp_layer(input,
...@@ -4071,8 +4223,12 @@ def __cost_input__(input, label, weight=None): ...@@ -4071,8 +4223,12 @@ def __cost_input__(input, label, weight=None):
""" """
inputs and parents for cost layers. inputs and parents for cost layers.
""" """
ipts = [Input(input.name), Input(label.name)] if isinstance(input, LayerOutput):
parents = [input, label] input = [input]
if isinstance(label, LayerOutput):
label = [label]
ipts = [Input(ipt.name) for ipt in (input + label)]
parents = [ipt for ipt in (input + label)]
if weight is not None: if weight is not None:
assert weight.size == 1 assert weight.size == 1
ipts.append(Input(weight.name)) ipts.append(Input(weight.name))
...@@ -4082,13 +4238,18 @@ def __cost_input__(input, label, weight=None): ...@@ -4082,13 +4238,18 @@ def __cost_input__(input, label, weight=None):
@wrap_name_default() @wrap_name_default()
@layer_support() @layer_support()
def mse_cost(input, label, weight=None, name=None, coeff=1.0, layer_attr=None): def square_error_cost(input,
label,
weight=None,
name=None,
coeff=1.0,
layer_attr=None):
""" """
mean squared error cost: sum of square error cost:
.. math:: .. math::
\\frac{1}{N}\sum_{i=1}^N(t_i-y_i)^2 cost = \\sum_{i=1}^N(t_i-y_i)^2
:param name: layer name. :param name: layer name.
:type name: basestring :type name: basestring
...@@ -4117,7 +4278,7 @@ def mse_cost(input, label, weight=None, name=None, coeff=1.0, layer_attr=None): ...@@ -4117,7 +4278,7 @@ def mse_cost(input, label, weight=None, name=None, coeff=1.0, layer_attr=None):
return LayerOutput(name, LayerType.COST, parents=parents, size=1) return LayerOutput(name, LayerType.COST, parents=parents, size=1)
regression_cost = mse_cost regression_cost = square_error_cost
@wrap_name_default("cost") @wrap_name_default("cost")
...@@ -5059,17 +5220,6 @@ def warp_ctc_layer(input, ...@@ -5059,17 +5220,6 @@ def warp_ctc_layer(input,
building process, PaddlePaddle will clone the source codes, build and building process, PaddlePaddle will clone the source codes, build and
install it to :code:`third_party/install/warpctc` directory. install it to :code:`third_party/install/warpctc` directory.
To use warp_ctc layer, you need to specify the path of :code:`libwarpctc.so`,
using following methods:
1. Set it in :code:`paddle.init` (python api) or :code:`paddle_init` (c api),
such as :code:`paddle.init(use_gpu=True,
warpctc_dir=your_paddle_source_dir/third_party/install/warpctc/lib)`.
2. Set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH
on Mac OS. For instance, :code:`export
LD_LIBRARY_PATH=your_paddle_source_dir/third_party/install/warpctc/lib:$LD_LIBRARY_PATH`.
More details of CTC can be found by referring to `Connectionist Temporal More details of CTC can be found by referring to `Connectionist Temporal
Classification: Labelling Unsegmented Sequence Data with Recurrent Classification: Labelling Unsegmented Sequence Data with Recurrent
Neural Networks <http://machinelearning.wustl.edu/mlpapers/paper_files/ Neural Networks <http://machinelearning.wustl.edu/mlpapers/paper_files/
...@@ -5777,10 +5927,10 @@ def multi_binary_label_cross_entropy(input, ...@@ -5777,10 +5927,10 @@ def multi_binary_label_cross_entropy(input,
if input.activation is None or \ if input.activation is None or \
not isinstance(input.activation, SigmoidActivation): not isinstance(input.activation, SigmoidActivation):
logger.log( logger.log(logging.WARN,
logging.WARN, ("%s is not a recommended activation for "
"%s is not recommend for multi_binary_label_cross_entropy's activation, " "multi_binary_label_cross_entropy, sigmoid is better") %
"maybe the sigmoid is better" % repr(input.activation)) repr(input.activation))
Layer( Layer(
name=name, name=name,
...@@ -5795,6 +5945,113 @@ def multi_binary_label_cross_entropy(input, ...@@ -5795,6 +5945,113 @@ def multi_binary_label_cross_entropy(input,
size=1) size=1)
class BeamInput(object):
"""
Define the input for cross_entropy_over_beam layer.
A beam is made up of a triple: the first one is scores over all
candidates; the second one is indices of top k selected candidates; the
third one is the index of ground truth, which is also always called
gold.
"""
def __init__(self, candidate_scores, selected_candidates, gold):
assert isinstance(candidate_scores, LayerOutput)
self.candidate_scores = candidate_scores
assert candidate_scores.size == 1
assert isinstance(selected_candidates, LayerOutput)
self.selected_candidates = selected_candidates
assert isinstance(gold, LayerOutput)
self.gold = gold
@wrap_name_default()
@layer_support()
def cross_entropy_over_beam(input, name=None):
"""
This layer is used in learning to search models, which is to solve complex
joint prediction problems based on learning to search through a
problem-defined search space.
Specifically, the learning to search process for this layer begins with
searching a target sequence from a nested sequence. In the first search
step, top beam size sequences with highest scores, indices of these top k
sequences in the original nested sequence, and the ground truth (also
called gold) altogether (a triple) make up of the first beam.
Then, several special positions, for example, start and end positions
that define meaningful segments are searched. In these searches, top k
positions with highest scores are selected, and then sequence, starting
from the selected starts till ends of the sequences (or a fixed position)
are taken to search next.
We call the possible top k results returned in one search the beam. This
search process can be repeated for pre-defined turns and leads to several
beam expansions.
Finally, the layer cross_entropy_over_beam takes all the beam expansions
which contain several candidate targets found along the multi-step search.
cross_entropy_over_beam calculates cross entropy over the expanded beams
which all the candidates in the beam as the normalized factor.
Note that, if gold falls off the beam at search step t, then the cost is
calculated over the beam at step t.
This cost layer always works together with kmax_seq_score_layer,
sub_nested_seq_layer, and sequence_slice_layer to trim the input to form a
sub-search space.
The example usage is:
.. code-block:: python
cost = cross_entropy_over_beam(input=[
BeamInput(
candidate_scores=beam1_candidates,
selected_candidates=beam1_topk,
gold=gold1),
BeamInput(
candidate_scores=beam2_candidates,
selected_candidates=beam2_topk,
gold=gold2),
])
:param input: input beams for this layer.
:type input: BeamInput
:param name: input beams for this layer.
:type name: basestring
:return: LayerOutput object.
:rtype: LayerOutput
"""
if isinstance(input, BeamInput):
input = [input]
else:
assert isinstance(input, list), (
'input for cross_entropy_over_beam shold be a python list '
'of BeamInput object.')
for ipt in input:
assert isinstance(ipt, BeamInput), (
'input for cross_entropy_over_beam '
'should be a BeamInput object.')
ipts = []
parents = []
for beam in input:
parents += [beam.candidate_scores, beam.selected_candidates, beam.gold]
ipts += [
beam.candidate_scores.name, beam.selected_candidates.name,
beam.gold.name
]
Layer(name=name, type=LayerType.CROSS_ENTROPY_OVER_BEAM, inputs=ipts)
return LayerOutput(name, LayerType.CROSS_ENTROPY, parents=parents, size=1)
@wrap_name_default() @wrap_name_default()
@layer_support() @layer_support()
def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None): def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None):
...@@ -6345,14 +6602,14 @@ def seq_slice_layer(input, starts, ends, name=None): ...@@ -6345,14 +6602,14 @@ def seq_slice_layer(input, starts, ends, name=None):
@wrap_name_default() @wrap_name_default()
@layer_support() @layer_support()
def kmax_sequence_score_layer(input, name=None, beam_size=1): def kmax_seq_score_layer(input, name=None, beam_size=1):
""" """
This layer accepts one input which are scores over a sequence or a nested This layer accepts one input which are scores over a sequence or a nested
sequence, and returns indices of beam_size sequences with highest scores. sequence, and returns indices of beam_size sequences with highest scores.
.. code-block:: python .. code-block:: python
kmax_indices = kmax_sequence_score_layer(input=input_layer, beam_size) kmax_indices = kmax_seq_score_layer(input=input_layer, beam_size)
:param name: The Layer Name. :param name: The Layer Name.
...@@ -6365,10 +6622,10 @@ def kmax_sequence_score_layer(input, name=None, beam_size=1): ...@@ -6365,10 +6622,10 @@ def kmax_sequence_score_layer(input, name=None, beam_size=1):
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
assert isinstance(input, LayerOutput), ("kmax_sequence_score_layer " assert isinstance(input, LayerOutput), ("kmax_seq_score_layer "
"accepts only one input.") "accepts only one input.")
assert input.size == 1, ( assert input.size == 1, (
"input of kmax_sequence_score_layer is a score" "input of kmax_seq_score_layer is a score "
"over a sequence or a nested sequence, so its width must be 1.") "over a sequence or a nested sequence, so its width must be 1.")
Layer( Layer(
...@@ -6381,6 +6638,149 @@ def kmax_sequence_score_layer(input, name=None, beam_size=1): ...@@ -6381,6 +6638,149 @@ def kmax_sequence_score_layer(input, name=None, beam_size=1):
name, LayerType.KMAX_SEQ_SCORE, parents=[input], size=input.size) name, LayerType.KMAX_SEQ_SCORE, parents=[input], size=input.size)
@wrap_name_default("conv3d")
@wrap_param_attr_default()
@wrap_bias_attr_default()
@wrap_act_default(act=ReluActivation())
@layer_support(DROPOUT)
def img_conv3d_layer(input,
filter_size,
num_filters,
name=None,
num_channels=None,
act=None,
groups=1,
stride=1,
padding=0,
bias_attr=None,
param_attr=None,
shared_biases=True,
layer_attr=None,
trans=False,
layer_type=None):
"""
The example usage is:
.. code-block:: python
conv = img_conv3d_layer(input=data, filter_size=1,
num_channels=8,
num_filters=16, stride=1,
bias_attr=False,
act=ReluActivation())
:param name: Layer name.
:type name: basestring
:param input: Layer Input.
:type input: LayerOutput
:param filter_size: The x dimension of a filter kernel. Or input a list.
:type filter_size: int|tuple|list
:param num_filters: Each filter group's number of filter
:param act: Activation type. Default is tanh
:type act: BaseActivation
:param groups: Group size of filters.
:type groups: int
:param stride: The x dimension of the stride. Or input a tuple for two image
dimension.
:type stride: int|tuple|list
:param padding: The x dimension of the padding. Or input a tuple for two
image dimension
:type padding: int|tuple|list
:param bias_attr: Convolution bias attribute. None means default bias.
False means no bias.
:type bias_attr: ParameterAttribute|False
:param num_channels: number of input channels. If None will be set
automatically from previous output.
:type num_channels: int
:param param_attr: Convolution param attribute. None means default attribute
:type param_attr: ParameterAttribute
:param shared_biases: Is biases will be shared between filters or not.
:type shared_biases: bool
:param layer_attr: Layer Extra Attribute.
:type layer_attr: ExtraLayerAttribute
:param trans: true if it is a convTransLayer, false if it is a convLayer
:type trans: bool
:param layer_type: specify the layer_type, default is None. If trans=True,
layer_type has to be "exconvt" or "cudnn_convt",
otherwise layer_type has to be either "exconv" or
"cudnn_conv"
:type layer_type: String
:return: LayerOutput object.
:rtype: LayerOutput
"""
if num_channels is None:
assert input.num_filters is not None
num_channels = input.num_filters
if isinstance(filter_size, collections.Sequence):
assert len(filter_size) == 3
filter_size, filter_size_y, filter_size_z = filter_size
else:
filter_size_y = filter_size
filter_size_z = filter_size
if isinstance(stride, collections.Sequence):
assert len(stride) == 3
stride, stride_y, stride_z = stride
else:
stride_y = stride
stride_z = stride
if isinstance(padding, collections.Sequence):
assert len(padding) == 3
padding, padding_y, padding_z = padding
else:
padding_y = padding
padding_z = padding
if param_attr.attr.get('initial_smart'):
# special initial for conv layers.
init_w = (2.0 / (filter_size**2 * num_channels))**0.5
param_attr.attr["initial_mean"] = 0.0
param_attr.attr["initial_std"] = init_w
param_attr.attr["initial_strategy"] = 0
param_attr.attr["initial_smart"] = False
if layer_type:
if trans:
assert layer_type in ["deconv3d"]
lt = layer_type
else:
lt = LayerType.DECONV3D_LAYER if trans else LayerType.CONV3D_LAYER
l = Layer(
name=name,
inputs=Input(
input.name,
conv=Conv3D(
filter_size=filter_size,
padding=padding,
stride=stride,
channels=num_channels,
groups=groups,
filter_size_y=filter_size_y,
padding_y=padding_y,
stride_y=stride_y,
filter_size_z=filter_size_z,
padding_z=padding_z,
stride_z=stride_z),
**param_attr.attr),
active_type=act.name,
num_filters=num_filters,
bias=ParamAttr.to_bias(bias_attr),
shared_biases=shared_biases,
type=lt,
**ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput(
name,
lt,
parents=[input],
activation=act,
num_filters=num_filters,
size=l.config.size)
@wrap_name_default("scale_shift") @wrap_name_default("scale_shift")
@wrap_param_attr_default() @wrap_param_attr_default()
@wrap_bias_attr_default() @wrap_bias_attr_default()
......
文件模式从 100755 更改为 100644
...@@ -8,7 +8,8 @@ test_spp_layer test_bilinear_interp test_maxout test_bi_grumemory math_ops ...@@ -8,7 +8,8 @@ test_spp_layer test_bilinear_interp test_maxout test_bi_grumemory math_ops
test_seq_concat_reshape test_pad test_smooth_l1 test_multiplex_layer test_seq_concat_reshape test_pad test_smooth_l1 test_multiplex_layer
test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_layer test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_layer
test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer
test_kmax_seq_socre_layer test_seq_select_layers test_scale_shift_layer test_kmax_seq_socre_layer test_sub_nested_seq_select_layer test_scale_shift_layer
test_seq_slice_layer) test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer
test_conv3d_layer test_deconv3d_layer)
export whole_configs=(test_split_datasource) export whole_configs=(test_split_datasource)
type: "nn"
layers {
name: "data"
type: "data"
size: 36288
active_type: ""
height: 48
width: 42
depth: 6
}
layers {
name: "conv3d_1"
type: "conv3d"
size: 24192
active_type: ""
inputs {
input_layer_name: "data"
input_parameter_name: "_conv3d_1.w0"
conv_conf {
filter_size: 3
channels: 3
stride: 2
padding: 1
groups: 1
filter_channels: 3
output_x: 21
img_size: 42
caffe_mode: true
filter_size_y: 3
padding_y: 1
stride_y: 2
output_y: 24
img_size_y: 48
filter_size_z: 3
padding_z: 1
stride_z: 2
output_z: 3
img_size_z: 6
}
}
bias_parameter_name: "_conv3d_1.wbias"
num_filters: 16
shared_biases: true
height: 24
width: 21
depth: 3
}
layers {
name: "conv3d_2"
type: "conv3d"
size: 24192
active_type: ""
inputs {
input_layer_name: "data"
input_parameter_name: "_conv3d_2.w0"
conv_conf {
filter_size: 3
channels: 3
stride: 2
padding: 1
groups: 1
filter_channels: 3
output_x: 21
img_size: 42
caffe_mode: true
filter_size_y: 3
padding_y: 1
stride_y: 2
output_y: 24
img_size_y: 48
filter_size_z: 3
padding_z: 1
stride_z: 2
output_z: 3
img_size_z: 6
}
}
bias_parameter_name: "_conv3d_2.wbias"
num_filters: 16
shared_biases: true
height: 24
width: 21
depth: 3
}
parameters {
name: "_conv3d_1.w0"
size: 1296
initial_mean: 0.0
initial_std: 0.272165526976
initial_strategy: 0
initial_smart: false
}
parameters {
name: "_conv3d_1.wbias"
size: 16
initial_mean: 0.0
initial_std: 0.0
dims: 16
dims: 1
initial_strategy: 0
initial_smart: false
}
parameters {
name: "_conv3d_2.w0"
size: 1296
initial_mean: 0.0
initial_std: 0.272165526976
initial_strategy: 0
initial_smart: false
}
parameters {
name: "_conv3d_2.wbias"
size: 16
initial_mean: 0.0
initial_std: 0.0
dims: 16
dims: 1
initial_strategy: 0
initial_smart: false
}
input_layer_names: "data"
output_layer_names: "conv3d_2"
sub_models {
name: "root"
layer_names: "data"
layer_names: "conv3d_1"
layer_names: "conv3d_2"
input_layer_names: "data"
output_layer_names: "conv3d_2"
is_recurrent_layer_group: false
}
...@@ -45,7 +45,7 @@ layers { ...@@ -45,7 +45,7 @@ layers {
coeff: 1.0 coeff: 1.0
} }
layers { layers {
name: "__mse_cost_0__" name: "__square_error_cost_0__"
type: "square_error" type: "square_error"
size: 1 size: 1
active_type: "" active_type: ""
...@@ -130,7 +130,7 @@ input_layer_names: "label" ...@@ -130,7 +130,7 @@ input_layer_names: "label"
input_layer_names: "weight" input_layer_names: "weight"
input_layer_names: "multi_class_label" input_layer_names: "multi_class_label"
output_layer_names: "__cost_0__" output_layer_names: "__cost_0__"
output_layer_names: "__mse_cost_0__" output_layer_names: "__square_error_cost_0__"
output_layer_names: "__nce_layer_0__" output_layer_names: "__nce_layer_0__"
evaluators { evaluators {
name: "classification_error_evaluator" name: "classification_error_evaluator"
...@@ -146,7 +146,7 @@ sub_models { ...@@ -146,7 +146,7 @@ sub_models {
layer_names: "weight" layer_names: "weight"
layer_names: "__fc_layer_0__" layer_names: "__fc_layer_0__"
layer_names: "__cost_0__" layer_names: "__cost_0__"
layer_names: "__mse_cost_0__" layer_names: "__square_error_cost_0__"
layer_names: "multi_class_label" layer_names: "multi_class_label"
layer_names: "__nce_layer_0__" layer_names: "__nce_layer_0__"
input_layer_names: "input" input_layer_names: "input"
...@@ -154,7 +154,7 @@ sub_models { ...@@ -154,7 +154,7 @@ sub_models {
input_layer_names: "weight" input_layer_names: "weight"
input_layer_names: "multi_class_label" input_layer_names: "multi_class_label"
output_layer_names: "__cost_0__" output_layer_names: "__cost_0__"
output_layer_names: "__mse_cost_0__" output_layer_names: "__square_error_cost_0__"
output_layer_names: "__nce_layer_0__" output_layer_names: "__nce_layer_0__"
evaluator_names: "classification_error_evaluator" evaluator_names: "classification_error_evaluator"
is_recurrent_layer_group: false is_recurrent_layer_group: false
......
type: "nn"
layers {
name: "sentence_states"
type: "data"
size: 32
active_type: ""
}
layers {
name: "sentence_scores"
type: "data"
size: 1
active_type: ""
}
layers {
name: "__kmax_seq_score_layer_0__"
type: "kmax_seq_score"
active_type: ""
inputs {
input_layer_name: "sentence_scores"
}
beam_size: 5
}
layers {
name: "__sub_nested_seq_layer_0__"
type: "sub_nested_seq"
size: 32
active_type: ""
inputs {
input_layer_name: "sentence_states"
}
inputs {
input_layer_name: "__kmax_seq_score_layer_0__"
}
}
layers {
name: "__fc_layer_0__"
type: "fc"
size: 1
active_type: ""
inputs {
input_layer_name: "__sub_nested_seq_layer_0__"
input_parameter_name: "___fc_layer_0__.w0"
}
bias_parameter_name: "___fc_layer_0__.wbias"
}
layers {
name: "__kmax_seq_score_layer_1__"
type: "kmax_seq_score"
active_type: ""
inputs {
input_layer_name: "sentence_scores"
}
beam_size: 5
}
layers {
name: "__seq_slice_layer_0__"
type: "seq_slice"
size: 32
active_type: ""
inputs {
input_layer_name: "__sub_nested_seq_layer_0__"
}
inputs {
input_layer_name: "__kmax_seq_score_layer_1__"
}
select_first: true
}
layers {
name: "__fc_layer_1__"
type: "fc"
size: 1
active_type: ""
inputs {
input_layer_name: "__seq_slice_layer_0__"
input_parameter_name: "___fc_layer_1__.w0"
}
bias_parameter_name: "___fc_layer_1__.wbias"
}
layers {
name: "__kmax_seq_score_layer_2__"
type: "kmax_seq_score"
active_type: ""
inputs {
input_layer_name: "__fc_layer_1__"
}
beam_size: 5
}
layers {
name: "sentences_ids"
type: "data"
size: 1
active_type: ""
}
layers {
name: "start_ids"
type: "data"
size: 1
active_type: ""
}
layers {
name: "end_ids"
type: "data"
size: 1
active_type: ""
}
layers {
name: "__cross_entropy_over_beam_0__"
type: "cross_entropy_over_beam"
active_type: ""
inputs {
input_layer_name: "sentence_scores"
}
inputs {
input_layer_name: "__kmax_seq_score_layer_0__"
}
inputs {
input_layer_name: "sentences_ids"
}
inputs {
input_layer_name: "__fc_layer_0__"
}
inputs {
input_layer_name: "__kmax_seq_score_layer_1__"
}
inputs {
input_layer_name: "start_ids"
}
inputs {
input_layer_name: "__fc_layer_1__"
}
inputs {
input_layer_name: "__kmax_seq_score_layer_2__"
}
inputs {
input_layer_name: "end_ids"
}
}
parameters {
name: "___fc_layer_0__.w0"
size: 32
initial_mean: 0.0
initial_std: 0.176776695297
dims: 32
dims: 1
initial_strategy: 0
initial_smart: true
}
parameters {
name: "___fc_layer_0__.wbias"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
}
parameters {
name: "___fc_layer_1__.w0"
size: 32
initial_mean: 0.0
initial_std: 0.176776695297
dims: 32
dims: 1
initial_strategy: 0
initial_smart: true
}
parameters {
name: "___fc_layer_1__.wbias"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
}
input_layer_names: "sentence_scores"
input_layer_names: "sentences_ids"
input_layer_names: "sentence_states"
input_layer_names: "start_ids"
input_layer_names: "end_ids"
output_layer_names: "__cross_entropy_over_beam_0__"
sub_models {
name: "root"
layer_names: "sentence_states"
layer_names: "sentence_scores"
layer_names: "__kmax_seq_score_layer_0__"
layer_names: "__sub_nested_seq_layer_0__"
layer_names: "__fc_layer_0__"
layer_names: "__kmax_seq_score_layer_1__"
layer_names: "__seq_slice_layer_0__"
layer_names: "__fc_layer_1__"
layer_names: "__kmax_seq_score_layer_2__"
layer_names: "sentences_ids"
layer_names: "start_ids"
layer_names: "end_ids"
layer_names: "__cross_entropy_over_beam_0__"
input_layer_names: "sentence_scores"
input_layer_names: "sentences_ids"
input_layer_names: "sentence_states"
input_layer_names: "start_ids"
input_layer_names: "end_ids"
output_layer_names: "__cross_entropy_over_beam_0__"
is_recurrent_layer_group: false
}
type: "nn"
layers {
name: "data"
type: "data"
size: 36288
active_type: ""
height: 48
width: 42
depth: 6
}
layers {
name: "deconv3d_1"
type: "deconv3d"
size: 1387760
active_type: ""
inputs {
input_layer_name: "data"
input_parameter_name: "_deconv3d_1.w0"
conv_conf {
filter_size: 3
channels: 3
stride: 2
padding: 1
groups: 1
filter_channels: 16
output_x: 42
img_size: 83
caffe_mode: true
filter_size_y: 3
padding_y: 1
stride_y: 2
output_y: 48
img_size_y: 95
filter_size_z: 3
padding_z: 1
stride_z: 2
output_z: 6
img_size_z: 11
}
}
bias_parameter_name: "_deconv3d_1.wbias"
num_filters: 16
shared_biases: true
height: 95
width: 83
depth: 11
}
layers {
name: "deconv3d_2"
type: "deconv3d"
size: 1387760
active_type: ""
inputs {
input_layer_name: "data"
input_parameter_name: "_deconv3d_2.w0"
conv_conf {
filter_size: 3
channels: 3
stride: 2
padding: 1
groups: 1
filter_channels: 16
output_x: 42
img_size: 83
caffe_mode: true
filter_size_y: 3
padding_y: 1
stride_y: 2
output_y: 48
img_size_y: 95
filter_size_z: 3
padding_z: 1
stride_z: 2
output_z: 6
img_size_z: 11
}
}
bias_parameter_name: "_deconv3d_2.wbias"
num_filters: 16
shared_biases: true
height: 95
width: 83
depth: 11
}
parameters {
name: "_deconv3d_1.w0"
size: 6912
initial_mean: 0.0
initial_std: 0.272165526976
initial_strategy: 0
initial_smart: false
}
parameters {
name: "_deconv3d_1.wbias"
size: 16
initial_mean: 0.0
initial_std: 0.0
dims: 16
dims: 1
initial_strategy: 0
initial_smart: false
}
parameters {
name: "_deconv3d_2.w0"
size: 6912
initial_mean: 0.0
initial_std: 0.272165526976
initial_strategy: 0
initial_smart: false
}
parameters {
name: "_deconv3d_2.wbias"
size: 16
initial_mean: 0.0
initial_std: 0.0
dims: 16
dims: 1
initial_strategy: 0
initial_smart: false
}
input_layer_names: "data"
output_layer_names: "deconv3d_2"
sub_models {
name: "root"
layer_names: "data"
layer_names: "deconv3d_1"
layer_names: "deconv3d_2"
input_layer_names: "data"
output_layer_names: "deconv3d_2"
is_recurrent_layer_group: false
}
...@@ -17,7 +17,7 @@ layers { ...@@ -17,7 +17,7 @@ layers {
bias_parameter_name: "___fc_layer_0__.wbias" bias_parameter_name: "___fc_layer_0__.wbias"
} }
layers { layers {
name: "__kmax_sequence_score_layer_0__" name: "__kmax_seq_score_layer_0__"
type: "kmax_seq_score" type: "kmax_seq_score"
active_type: "" active_type: ""
inputs { inputs {
...@@ -46,14 +46,14 @@ parameters { ...@@ -46,14 +46,14 @@ parameters {
initial_smart: false initial_smart: false
} }
input_layer_names: "input_seq" input_layer_names: "input_seq"
output_layer_names: "__kmax_sequence_score_layer_0__" output_layer_names: "__kmax_seq_score_layer_0__"
sub_models { sub_models {
name: "root" name: "root"
layer_names: "input_seq" layer_names: "input_seq"
layer_names: "__fc_layer_0__" layer_names: "__fc_layer_0__"
layer_names: "__kmax_sequence_score_layer_0__" layer_names: "__kmax_seq_score_layer_0__"
input_layer_names: "input_seq" input_layer_names: "input_seq"
output_layer_names: "__kmax_sequence_score_layer_0__" output_layer_names: "__kmax_seq_score_layer_0__"
is_recurrent_layer_group: false is_recurrent_layer_group: false
} }
type: "nn"
layers {
name: "data_2d"
type: "data"
size: 6000
active_type: ""
height: 20
width: 10
}
layers {
name: "pool___2d"
type: "pool"
size: 840
active_type: ""
inputs {
input_layer_name: "data_2d"
pool_conf {
pool_type: "avg-projection"
channels: 30
size_x: 5
stride: 3
output_x: 4
img_size: 10
padding: 1
size_y: 5
stride_y: 3
output_y: 7
img_size_y: 20
padding_y: 1
}
}
height: 7
width: 4
}
layers {
name: "data_3d_1"
type: "data"
size: 60000
active_type: ""
height: 20
width: 10
depth: 10
}
layers {
name: "pool_3d_1"
type: "pool3d"
size: 3360
active_type: ""
inputs {
input_layer_name: "data_3d_1"
pool_conf {
pool_type: "avg-projection"
channels: 30
size_x: 5
stride: 3
output_x: 4
img_size: 10
padding: 1
size_y: 5
stride_y: 3
output_y: 7
img_size_y: 20
padding_y: 1
size_z: 5
stride_z: 3
output_z: 4
img_size_z: 10
padding_z: 1
}
}
height: 7
width: 4
depth: 4
}
layers {
name: "pool_3d_2"
type: "pool3d"
size: 3360
active_type: ""
inputs {
input_layer_name: "data_3d_1"
pool_conf {
pool_type: "max-projection"
channels: 30
size_x: 5
stride: 3
output_x: 4
img_size: 10
padding: 1
size_y: 5
stride_y: 3
output_y: 7
img_size_y: 20
padding_y: 1
size_z: 5
stride_z: 3
output_z: 4
img_size_z: 10
padding_z: 1
}
}
height: 7
width: 4
depth: 4
}
input_layer_names: "data_2d"
output_layer_names: "pool___2d"
output_layer_names: "pool_3d_1"
output_layer_names: "pool_3d_2"
sub_models {
name: "root"
layer_names: "data_2d"
layer_names: "pool___2d"
layer_names: "data_3d_1"
layer_names: "pool_3d_1"
layer_names: "pool_3d_2"
input_layer_names: "data_2d"
output_layer_names: "pool___2d"
output_layer_names: "pool_3d_1"
output_layer_names: "pool_3d_2"
is_recurrent_layer_group: false
}
from paddle.trainer_config_helpers import *
settings(batch_size=1000, learning_rate=1e-5)
num_channels = 3
filter_size = 3
filter_size_y = 3
filter_size_z = 3
stride = 2
stride_y = 2
stride_z = 2
padding = 1
padding_y = 1
padding_z = 1
groups = 1
data = data_layer(
name='data', size=12096 * num_channels, height=48, width=42, depth=6)
# first
conv3d_1 = img_conv3d_layer(
input=data,
name='conv3d_1',
num_filters=16,
num_channels=num_channels,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=groups,
bias_attr=True,
shared_biases=True,
trans=False,
layer_type="conv3d",
act=LinearActivation())
# second
conv3d_2 = img_conv3d_layer(
input=data,
name='conv3d_2',
num_filters=16,
num_channels=num_channels,
filter_size=[filter_size, filter_size_y, filter_size_z],
stride=[stride, stride_y, stride_z],
padding=[padding, padding_y, padding_z],
groups=groups,
bias_attr=True,
shared_biases=True,
trans=False,
layer_type="conv3d",
act=LinearActivation())
outputs(conv3d_2)
...@@ -10,7 +10,7 @@ fc = fc_layer(input=data, size=10, act=SoftmaxActivation()) ...@@ -10,7 +10,7 @@ fc = fc_layer(input=data, size=10, act=SoftmaxActivation())
outputs( outputs(
classification_cost( classification_cost(
input=fc, label=lbl, weight=wt), input=fc, label=lbl, weight=wt),
mse_cost( square_error_cost(
input=fc, label=lbl, weight=wt), input=fc, label=lbl, weight=wt),
nce_layer( nce_layer(
input=fc, input=fc,
......
#!/usr/bin/env python
#coding=utf-8
from paddle.trainer_config_helpers import *
beam_size = 5
# the first beam expansion.
sentence_states = data_layer(name="sentence_states", size=32)
sentence_scores = data_layer(name="sentence_scores", size=1)
topk_sentence_ids = kmax_seq_score_layer(
input=sentence_scores, beam_size=beam_size)
# the second beam expansion.
topk_sen = sub_nested_seq_layer(
input=sentence_states, selected_indices=topk_sentence_ids)
start_pos_scores = fc_layer(input=topk_sen, size=1, act=LinearActivation())
topk_start_pos_ids = kmax_seq_score_layer(
input=sentence_scores, beam_size=beam_size)
# the final beam expansion.
topk_start_spans = seq_slice_layer(
input=topk_sen, starts=topk_start_pos_ids, ends=None)
end_pos_scores = fc_layer(
input=topk_start_spans, size=1, act=LinearActivation())
topk_end_pos_ids = kmax_seq_score_layer(
input=end_pos_scores, beam_size=beam_size)
# define the cost
sentence_idx = data_layer(name="sentences_ids", size=1)
start_idx = data_layer(name="start_ids", size=1)
end_idx = data_layer(name="end_ids", size=1)
cost = cross_entropy_over_beam(input=[
BeamInput(
candidate_scores=sentence_scores,
selected_candidates=topk_sentence_ids,
gold=sentence_idx), BeamInput(
candidate_scores=start_pos_scores,
selected_candidates=topk_start_pos_ids,
gold=start_idx), BeamInput(
candidate_scores=end_pos_scores,
selected_candidates=topk_end_pos_ids,
gold=end_idx)
])
outputs(cost)
from paddle.trainer_config_helpers import *
settings(batch_size=1000, learning_rate=1e-5)
num_channels = 3
filter_size = 3
filter_size_y = 3
filter_size_z = 3
stride = 2
stride_y = 2
stride_z = 2
padding = 1
padding_y = 1
padding_z = 1
groups = 1
data = data_layer(
name='data', size=12096 * num_channels, height=48, width=42, depth=6)
# first
deconv3d_1 = img_conv3d_layer(
input=data,
name='deconv3d_1',
num_filters=16,
num_channels=num_channels,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=groups,
bias_attr=True,
shared_biases=True,
trans=True,
layer_type="deconv3d",
act=LinearActivation())
# second
deconv3d_2 = img_conv3d_layer(
input=data,
name='deconv3d_2',
num_filters=16,
num_channels=num_channels,
filter_size=[filter_size, filter_size_y, filter_size_z],
stride=[stride, stride_y, stride_z],
padding=[padding, padding_y, padding_z],
groups=groups,
bias_attr=True,
shared_biases=True,
trans=True,
layer_type="deconv3d",
act=LinearActivation())
outputs(deconv3d_2)
...@@ -4,6 +4,6 @@ from paddle.trainer_config_helpers import * ...@@ -4,6 +4,6 @@ from paddle.trainer_config_helpers import *
data = data_layer(name="input_seq", size=128) data = data_layer(name="input_seq", size=128)
scores = fc_layer(input=data, size=1, act=ExpActivation()) scores = fc_layer(input=data, size=1, act=ExpActivation())
kmax_seq_id = kmax_sequence_score_layer(input=scores, beam_size=5) kmax_seq_id = kmax_seq_score_layer(input=scores, beam_size=5)
outputs(kmax_seq_id) outputs(kmax_seq_id)
from paddle.trainer_config_helpers import *
settings(batch_size=100, learning_rate=1e-5)
data_2d = data_layer(name='data_2d', size=6000, height=20, width=10)
pool_2d = img_pool_layer(
name="pool___2d",
input=data_2d,
num_channels=30,
pool_size=5,
stride=3,
padding=1,
pool_type=AvgPooling())
outputs(pool_2d)
data_3d = data_layer(
name='data_3d_1', size=60000, depth=10, height=20, width=10)
pool_3d_1 = img_pool3d_layer(
name="pool_3d_1",
input=data_3d,
num_channels=30,
pool_size=5,
stride=3,
padding=1,
pool_type=AvgPooling())
outputs(pool_3d_1)
pool_3d_2 = img_pool3d_layer(
name="pool_3d_2",
input=data_3d,
num_channels=30,
pool_size=[5, 5, 5],
stride=[3, 3, 3],
padding=[1, 1, 1],
pool_type=MaxPooling())
outputs(pool_3d_2)
...@@ -17,3 +17,4 @@ from paddle.trainer.config_parser import parse_config_and_serialize ...@@ -17,3 +17,4 @@ from paddle.trainer.config_parser import parse_config_and_serialize
if __name__ == '__main__': if __name__ == '__main__':
parse_config_and_serialize( parse_config_and_serialize(
'trainer_config_helpers/tests/layers_test_config.py', '') 'trainer_config_helpers/tests/layers_test_config.py', '')
# layers_test_config.py
...@@ -78,6 +78,8 @@ def init(**kwargs): ...@@ -78,6 +78,8 @@ def init(**kwargs):
if 'use_gpu' in kwargs: if 'use_gpu' in kwargs:
cp.g_command_config_args['use_gpu'] = kwargs['use_gpu'] cp.g_command_config_args['use_gpu'] = kwargs['use_gpu']
if 'use_mkldnn' in kwargs:
cp.g_command_config_args['use_mkldnn'] = kwargs['use_mkldnn']
assert 'parallel_nn' not in kwargs, ("currently 'parallel_nn' is not " assert 'parallel_nn' not in kwargs, ("currently 'parallel_nn' is not "
"supported in v2 APIs.") "supported in v2 APIs.")
......
...@@ -94,9 +94,14 @@ class OpDescCreationMethod(object): ...@@ -94,9 +94,14 @@ class OpDescCreationMethod(object):
new_attr.floats.extend(user_defined_attr) new_attr.floats.extend(user_defined_attr)
elif attr.type == framework_pb2.STRINGS: elif attr.type == framework_pb2.STRINGS:
new_attr.strings.extend(user_defined_attr) new_attr.strings.extend(user_defined_attr)
elif attr.type == framework_pb2.INT_PAIRS:
for p in user_defined_attr:
pair = new_attr.pairs.add()
pair.first = p[0]
pair.second = p[1]
else: else:
raise NotImplementedError("Not support attribute type " + raise NotImplementedError("Not support attribute type " +
attr.type) str(attr.type))
return op_desc return op_desc
...@@ -179,7 +184,7 @@ class OperatorFactory(object): ...@@ -179,7 +184,7 @@ class OperatorFactory(object):
class __RecurrentOp__(object): class __RecurrentOp__(object):
__proto__ = None __proto__ = None
type = 'recurrent_op' type = 'recurrent'
def __init__(self): def __init__(self):
# cache recurrent_op's proto # cache recurrent_op's proto
......
...@@ -286,6 +286,9 @@ class GradientChecker(unittest.TestCase): ...@@ -286,6 +286,9 @@ class GradientChecker(unittest.TestCase):
for no_grad in no_grad_set: for no_grad in no_grad_set:
if no_grad not in in_names: if no_grad not in in_names:
raise ValueError("no_grad should be in in_names") raise ValueError("no_grad should be in in_names")
if no_grad in inputs_to_check:
raise ValueError("no_grad should not be in inputs_to_check")
backward_op = core.Operator.backward(forward_op, no_grad_set) backward_op = core.Operator.backward(forward_op, no_grad_set)
places = [core.CPUPlace()] places = [core.CPUPlace()]
...@@ -301,7 +304,6 @@ class GradientChecker(unittest.TestCase): ...@@ -301,7 +304,6 @@ class GradientChecker(unittest.TestCase):
check_names = [grad_var_name(name) for name in inputs_to_check] check_names = [grad_var_name(name) for name in inputs_to_check]
for place in places: for place in places:
# get analytical gradients according to different device
analytic_grads = self.__get_gradient(forward_op, backward_op, analytic_grads = self.__get_gradient(forward_op, backward_op,
input_vars, check_names, place) input_vars, check_names, place)
self.__assert_is_close(numeric_grads, analytic_grads, check_names, self.__assert_is_close(numeric_grads, analytic_grads, check_names,
......
...@@ -16,16 +16,37 @@ class TestMulOp(unittest.TestCase): ...@@ -16,16 +16,37 @@ class TestMulOp(unittest.TestCase):
self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])} self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
class MulGradOpTest(GradientChecker): class TestMulGradOp(GradientChecker):
def test_mul(self): def setUp(self):
op = create_op("mul") self.op = create_op("mul")
inputs = { self.inputs = {
'X': np.random.random((32, 84)).astype("float32"), 'X': np.random.random((32, 84)).astype("float32"),
'Y': np.random.random((84, 100)).astype("float32") 'Y': np.random.random((84, 100)).astype("float32")
} }
def test_cpu_gpu_compare(self):
self.compare_grad(self.op, self.inputs)
def test_normal(self):
# mul op will enlarge the relative error # mul op will enlarge the relative error
self.check_grad( self.check_grad(
op, inputs, set(["X", "Y"]), "Out", max_relative_error=0.5) self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5)
def test_ignore_x(self):
self.check_grad(
self.op,
self.inputs, ["Y"],
"Out",
max_relative_error=0.5,
no_grad_set={"X"})
def test_ignore_y(self):
self.check_grad(
self.op,
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
# TODO(dzh,qijun) : mulgrad test case need transpose feature of blas library # TODO(dzh,qijun) : mulgrad test case need transpose feature of blas library
......
...@@ -16,14 +16,22 @@ class TestRowwiseAddOp(unittest.TestCase): ...@@ -16,14 +16,22 @@ class TestRowwiseAddOp(unittest.TestCase):
self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])} self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])}
class RowwiseAddGradOpTest(GradientChecker): class TestRowwiseAddGradOp(GradientChecker):
def test_rowwise_add(self): def setUp(self):
op = create_op("rowwise_add") self.op = create_op("rowwise_add")
inputs = { self.inputs = {
"X": np.random.uniform(0.1, 1, [5, 10]).astype("float32"), "X": np.random.uniform(0.1, 1, [5, 10]).astype("float32"),
"b": np.random.uniform(0.1, 1, [10]).astype("float32") "b": np.random.uniform(0.1, 1, [10]).astype("float32")
} }
self.check_grad(op, inputs, set(["X", "b"]), "Out")
def test_normal(self):
self.check_grad(self.op, self.inputs, ["X", "b"], "Out")
def test_ignore_b(self):
self.check_grad(self.op, self.inputs, ["X"], "Out", no_grad_set={"b"})
def test_ignore_x(self):
self.check_grad(self.op, self.inputs, ["b"], "Out", no_grad_set={"X"})
if __name__ == '__main__': if __name__ == '__main__':
......
...@@ -70,7 +70,7 @@ class Inference(object): ...@@ -70,7 +70,7 @@ class Inference(object):
item = [each_result[each_field] for each_field in field] item = [each_result[each_field] for each_field in field]
yield item yield item
def infer(self, input, field='value', **kwargs): def infer(self, input, field='value', flatten_result=True, **kwargs):
""" """
Infer a data by model. Infer a data by model.
:param input: input data batch. Should be python iterable object. :param input: input data batch. Should be python iterable object.
...@@ -83,7 +83,10 @@ class Inference(object): ...@@ -83,7 +83,10 @@ class Inference(object):
retv = [[] for i in xrange(len(result))] retv = [[] for i in xrange(len(result))]
for i, item in enumerate(result): for i, item in enumerate(result):
retv[i].append(item) retv[i].append(item)
if flatten_result:
retv = [numpy.concatenate(out) for out in retv] retv = [numpy.concatenate(out) for out in retv]
if len(retv) == 1: if len(retv) == 1:
return retv[0] return retv[0]
else: else:
......
...@@ -134,8 +134,9 @@ class CostLayerTest(unittest.TestCase): ...@@ -134,8 +134,9 @@ class CostLayerTest(unittest.TestCase):
cost3 = layer.cross_entropy_cost(input=inference, label=label) cost3 = layer.cross_entropy_cost(input=inference, label=label)
cost4 = layer.cross_entropy_with_selfnorm_cost( cost4 = layer.cross_entropy_with_selfnorm_cost(
input=inference, label=label) input=inference, label=label)
cost5 = layer.mse_cost(input=inference, label=label) cost5 = layer.square_error_cost(input=inference, label=label)
cost6 = layer.mse_cost(input=inference, label=label, weight=weight) cost6 = layer.square_error_cost(
input=inference, label=label, weight=weight)
cost7 = layer.multi_binary_label_cross_entropy_cost( cost7 = layer.multi_binary_label_cross_entropy_cost(
input=inference, label=label) input=inference, label=label)
cost8 = layer.rank_cost(left=score, right=score, label=score) cost8 = layer.rank_cost(left=score, right=score, label=score)
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册