Commit 2fc012c5, authored by Liu Yiqun

Merge branch 'develop' into build_android_clang

...@@ -25,7 +25,12 @@ IF(NOT ${CBLAS_FOUND}) ...@@ -25,7 +25,12 @@ IF(NOT ${CBLAS_FOUND})
"${CBLAS_INSTALL_DIR}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}openblas${CMAKE_STATIC_LIBRARY_SUFFIX}" "${CBLAS_INSTALL_DIR}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}openblas${CMAKE_STATIC_LIBRARY_SUFFIX}"
CACHE FILEPATH "openblas library." FORCE) CACHE FILEPATH "openblas library." FORCE)
IF(APPLE)
SET(OPENBLAS_CC "${CMAKE_C_COMPILER} -isysroot ${CMAKE_OSX_SYSROOT}")
SET(COMMON_ARGS CC=${OPENBLAS_CC} NO_SHARED=1 NO_LAPACK=1 libs)
ELSE()
SET(COMMON_ARGS CC=${CMAKE_C_COMPILER} NO_SHARED=1 NO_LAPACK=1 libs) SET(COMMON_ARGS CC=${CMAKE_C_COMPILER} NO_SHARED=1 NO_LAPACK=1 libs)
ENDIF()
IF(CMAKE_CROSSCOMPILING) IF(CMAKE_CROSSCOMPILING)
IF(ANDROID) IF(ANDROID)
...@@ -40,11 +45,11 @@ IF(NOT ${CBLAS_FOUND}) ...@@ -40,11 +45,11 @@ IF(NOT ${CBLAS_FOUND})
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=${TARGET} ARM_SOFTFP_ABI=1 USE_THREAD=0) SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=${TARGET} ARM_SOFTFP_ABI=1 USE_THREAD=0)
ELSEIF(RPI) ELSEIF(RPI)
# use hardfp # use hardfp
SET(OPENBLAS_COMMIT "v0.2.19") SET(OPENBLAS_COMMIT "v0.2.20")
SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 USE_THREAD=0) SET(OPTIONAL_ARGS HOSTCC=${HOST_C_COMPILER} TARGET=ARMV7 USE_THREAD=0)
ENDIF() ENDIF()
ELSE() ELSE()
SET(OPENBLAS_COMMIT "v0.2.19") SET(OPENBLAS_COMMIT "v0.2.20")
SET(OPTIONAL_ARGS "") SET(OPTIONAL_ARGS "")
IF(CMAKE_SYSTEM_PROCESSOR MATCHES "^x86(_64)?$") IF(CMAKE_SYSTEM_PROCESSOR MATCHES "^x86(_64)?$")
SET(OPTIONAL_ARGS DYNAMIC_ARCH=1 NUM_THREADS=64) SET(OPTIONAL_ARGS DYNAMIC_ARCH=1 NUM_THREADS=64)
......
# Design Doc: Functions, Operators, and Layers
In a DL system, we can compose one or more fine-grained operators into a coarse-grained one. For example, the FC layer can be composed of a multiplication operator and an add operator.
Historically, some fine-grained operations are known as operators, and some coarse-level ones are known as layers. But we need a well-defined separation.
In general, operators are the very fine-grained operations, e.g., mul and add. In the implementation, we can write them as C++ functions:
```c++
template <typename T> T add(T x, T y) { return x + y; }
template <typename T> T mul(T x, T y) { return x * y; }
```
Then we can wrap them into operators which are C++ classes and can be created from Python bindings by name. A C macro can do this. For example, the following macro invocation
```c++
MAKE_FUNCTION_OPERATOR(mul);
```
generates
```c++
template <typename T> class mulOp : public OperatorBase {...};
REGISTER_OP(mulOp<float32>, "mul");
```
so that in Python we can create operator mul by:
```python
X1 = Var()
X2 = Var()
Y = Var()
paddle.cpp.create_operator("mul", input=[X1, X2], output=Y)
```
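Neither the macro body nor the registry is spelled out in this doc. Purely for intuition, here is a self-contained toy sketch -- not PaddlePaddle's real `OperatorBase`, `REGISTER_OP`, or Python binding; the signatures and the registry are simplified assumptions -- in which such a macro generates the operator class and registers a factory for it under the operator's name:
```c++
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Toy sketch only: a plain C++ function is wrapped into an "operator" class
// and registered by name, so it can later be created by a string lookup.
template <typename T> T add(T x, T y) { return x + y; }
template <typename T> T mul(T x, T y) { return x * y; }

struct OperatorBase {
  virtual float Run(float x, float y) const = 0;  // simplified signature
  virtual ~OperatorBase() = default;
};

// A name -> factory registry standing in for REGISTER_OP.
std::map<std::string, std::function<std::unique_ptr<OperatorBase>()>>& Registry() {
  static std::map<std::string, std::function<std::unique_ptr<OperatorBase>()>> r;
  return r;
}

#define MAKE_FUNCTION_OPERATOR(func)                                      \
  struct func##Op : public OperatorBase {                                 \
    float Run(float x, float y) const override { return func(x, y); }     \
  };                                                                      \
  static const bool func##_registered = [] {                              \
    Registry()[#func] = [] { return std::make_unique<func##Op>(); };      \
    return true;                                                          \
  }()

MAKE_FUNCTION_OPERATOR(mul);
MAKE_FUNCTION_OPERATOR(add);

int main() {
  auto op = Registry()["mul"]();       // create operator "mul" by its name
  std::cout << op->Run(3, 4) << "\n";  // prints 12
}
```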
Also, at the same time, we can compose a coarse-level C++ operator class by composing the functions `mul` and `add`:
```c++
template <typename T>
class FCOp : public OperatorBase {
 public:
  void Run(...) {
    add(mul(Input<T>("X"), Input<T>("W")), Input<T>("b"));
  }
};
REGISTER_OP(FCOp, "fc");
```
We need to support such composition in Python as well. To do so, we need a higher-level Python wrapping of operator creation than `paddle.cpp.create_operator`. This higher-level operator API should be compatible with the layer API.
Let's explain this using an example. Suppose that we are going to compose the FC layer using mul and add in Python; we'd like to have Python functions `mul` and `add` defined in module `operator`:
```python
def operator.mul(X1, X2):
    O = Var()
    paddle.cpp.create_operator("mul", input=[X1, X2], output=O)
    return O

def operator.add(X1, X2):
    O = Var()
    paddle.cpp.create_operator("add", input=[X1, X2], output=O)
    return O
```
The above code snippets are automatically generated. Given them, users can define
```python
def layer.fc(X):
    W = Var()
    b = Var()
    return operator.add(operator.mul(X, W), b)
```
If we didn't have `operator.mul` and `operator.add`, the definition of `layer.fc` would be more complicated:
```python
def layer.fc(X):
    W = Var()
    b = Var()
    O1 = Var()
    paddle.cpp.create_operator("mul", input=[X, W], output=O1)
    O2 = Var()
    paddle.cpp.create_operator("add", input=[O1, b], output=O2)
    return O2
```
We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in the above illustrative example:
| C++ functions/functors | mul          | add          |             |          |
|------------------------|--------------|--------------|-------------|----------|
| C++ operator class     | mulOp        | addOp        | FCOp        |          |
| Python binding         | operator.mul | operator.add | operator.fc |          |
| Python function        |              |              |             | layer.fc |
This is how we differentiate layers and operators in PaddlePaddle:
- those defined in C++ that have a lightweight Python wrapper in module `operators` are operators; whereas
- those that don't have C++ implementations but are Python compositions of C++ operators are known as layers.
IfOp should have only one branch. An IfOp operator takes a `cond` variable whose value must be a vector of N boolean elements. Its return value has M (M <= N) instances, each corresponding to a true element in `cond`.
```python
import paddle as pd

x = var()
y = var()
cond = var()

b = pd.create_ifop(inputs=[x], output_num=1)
with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
If we want the output to still have N instances, we can use an IfElseOp with a default value, whose mini-batch size must be N:
```python
import paddle as pd

x = var()
y = var()
cond = var()
default_value = var()

b = pd.create_ifelseop(inputs=[x], output_num=1)
with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

with b.false_block():
    x = b.inputs(0)
    z = layer.fc(x)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
If only the true_block is set in an IfElseOp, we can provide a default value for the false branch:
```python
import paddle as pd

x = var()
y = var()
cond = var()
default_value = var()

b = pd.create_ifelseop(inputs=[x], output_num=1, default_value=default_value)
with b.true_block():
    x = b.inputs(0)
    z = operator.add(x, y)
    b.set_output(0, operator.softmax(z))

out = b(cond)
```
where `default_value` is a list of vars used for the instances where `cond` is False.
...@@ -178,13 +178,13 @@ class MulKernel : public framework::OpKernel { ...@@ -178,13 +178,13 @@ class MulKernel : public framework::OpKernel {
```c++ ```c++
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad); REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, ops::MulOpGrad);
REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUPlace, float>); REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(mul_grad, REGISTER_OP_CPU_KERNEL(mul_grad,
ops::MulGradKernel<paddle::platform::CPUPlace, float>); ops::MulGradKernel<paddle::platform::CPUPlace, float>);
``` ```
- `REGISTER_OP`: registers the class `ops::MulOp` with the type name `mul`, takes `ops::MulOpMaker` as its `ProtoMaker`, and registers `ops::MulOpGrad` as its backward (gradient) Op.
- `REGISTER_OP_WITHOUT_GRADIENT`: registers an Op that has no backward Op.
- `REGISTER_OP_CPU_KERNEL`: registers the `ops::MulKernel` class with its template parameters specialized to `paddle::platform::CPUPlace` and `float`; likewise, it registers the `ops::MulGradKernel` class.
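As a usage sketch of the last two macros (the operator name `my_op` and its classes are hypothetical; only the macros themselves come from the snippet above), an operator without a backward pass could be registered as:
```c++
namespace ops = paddle::operators;

// Hypothetical operator "my_op": register an Op that has no backward Op,
// then register its CPU kernel specialized for CPUPlace and float,
// mirroring the mul example above.
REGISTER_OP_WITHOUT_GRADIENT(my_op, ops::MyOp, ops::MyOpMaker);
REGISTER_OP_CPU_KERNEL(my_op,
                       ops::MyOpKernel<paddle::platform::CPUPlace, float>);
```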
......
...@@ -18,7 +18,7 @@ A backward network is built up with several backward operators. Backward operato ...@@ -18,7 +18,7 @@ A backward network is built up with several backward operators. Backward operato
For example, we have got a `mul_op`, and we can register its information and corresponding backward operator by the following macro:
```cpp ```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad); REGISTER_OP(mul, MulOp, MulOpMaker, MulOpGrad);
``` ```
`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class respectively.
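The gradient operator's type name is no longer passed to the macro; as the `OpRegistry` change later in this commit shows, it is derived as `<op_type>_grad`. Roughly, the registration above amounts to:
```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, MulOpGrad);
// roughly registers two operator types:
//   forward  type "mul"      -> MulOp      (proto described by MulOpMaker)
//   backward type "mul_grad" -> MulOpGrad  (registered with no proto/maker of its own)
```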
......
...@@ -127,8 +127,8 @@ class FillZeroOpMaker : public OpProtoAndCheckerMaker { ...@@ -127,8 +127,8 @@ class FillZeroOpMaker : public OpProtoAndCheckerMaker {
public: public:
FillZeroOpMaker(OpProto *proto, OpAttrChecker *op_checker) FillZeroOpMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("x", "x"); AddInput("Src", "x");
AddOutput("out", "out"); AddOutput("Dst", "out");
AddComment(""); AddComment("");
} }
}; };
...@@ -138,7 +138,7 @@ class AddOpMaker : public OpProtoAndCheckerMaker { ...@@ -138,7 +138,7 @@ class AddOpMaker : public OpProtoAndCheckerMaker {
AddOpMaker(OpProto *proto, OpAttrChecker *op_checker) AddOpMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) { : OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "x").AsDuplicable(); AddInput("X", "x").AsDuplicable();
AddOutput("Y", "y"); AddOutput("Out", "out");
AddComment(""); AddComment("");
} }
}; };
...@@ -148,16 +148,14 @@ class AddOpMaker : public OpProtoAndCheckerMaker { ...@@ -148,16 +148,14 @@ class AddOpMaker : public OpProtoAndCheckerMaker {
namespace f = paddle::framework; namespace f = paddle::framework;
namespace ops = paddle::operators; namespace ops = paddle::operators;
using EnforceNotMet = paddle::platform::EnforceNotMet; using EnforceNotMet = paddle::platform::EnforceNotMet;
REGISTER_OP(rowwise_add, f::NOP, f::RowWiseAddOpMaker, rowwise_add_grad, REGISTER_OP(rowwise_add, f::NOP, f::RowWiseAddOpMaker, f::NOP);
f::NOP); REGISTER_OP(mul, f::NOP, f::MulOpMaker, f::NOP);
REGISTER_OP(mul, f::NOP, f::MulOpMaker, mul_grad, f::NOP); REGISTER_OP(sigmoid, f::NOP, f::SigmoidOpMaker, f::NOP);
REGISTER_OP(sigmoid, f::NOP, f::SigmoidOpMaker, sigmoid_grad, f::NOP);
REGISTER_OP_WITHOUT_GRADIENT(nograd, f::NOP, f::NoGradOpMaker); REGISTER_OP_WITHOUT_GRADIENT(nograd, f::NOP, f::NoGradOpMaker);
REGISTER_OP_WITHOUT_GRADIENT(fill_zeros_like, f::NOP, f::FillZeroOpMaker); REGISTER_OP_WITHOUT_GRADIENT(fill_zeros_like, f::NOP, f::FillZeroOpMaker);
REGISTER_OP(add, f::NOP, f::AddOpMaker, add_grad, f::NOP); REGISTER_OP(add, f::NOP, f::AddOpMaker, f::NOP);
REGISTER_OP_WITHOUT_GRADIENT(fc, f::FcOp, f::FcOpMaker); REGISTER_OP_WITHOUT_GRADIENT(fc, f::FcOp, f::FcOpMaker);
REGISTER_OP(many_output_op, f::NOP, f::ManyOutputOpMaker, many_output_op_grad, REGISTER_OP(many_output_op, f::NOP, f::ManyOutputOpMaker, f::NOP);
f::NOP);
TEST(Backward, simple_op_grad) { TEST(Backward, simple_op_grad) {
auto fwd = f::OpRegistry::CreateOp( auto fwd = f::OpRegistry::CreateOp(
......
...@@ -54,8 +54,8 @@ TEST(GradOpBuilder, AddTwo) { ...@@ -54,8 +54,8 @@ TEST(GradOpBuilder, AddTwo) {
EXPECT_EQ(grad_add_op->Output(f::GradVarName("Y")), f::GradVarName("y")); EXPECT_EQ(grad_add_op->Output(f::GradVarName("Y")), f::GradVarName("y"));
} }
REGISTER_OP(mult_io, f::NOP, f::MutiInOutOpMaker, mult_io_grad, f::NOP); REGISTER_OP(mult_io, f::NOP, f::MutiInOutOpMaker, f::NOP);
REGISTER_OP(io_ignored, f::NOP, f::IOIgnoredOpMaker, io_ignored_grad, f::NOP); REGISTER_OP(io_ignored, f::NOP, f::IOIgnoredOpMaker, f::NOP);
TEST(GradOpBuilder, MutiInOut) { TEST(GradOpBuilder, MutiInOut) {
std::shared_ptr<f::OperatorBase> test_op(f::OpRegistry::CreateOp( std::shared_ptr<f::OperatorBase> test_op(f::OpRegistry::CreateOp(
......
...@@ -19,25 +19,24 @@ ...@@ -19,25 +19,24 @@
namespace paddle { namespace paddle {
namespace framework { namespace framework {
LODTensor::LOD LODTensor::LOD::SliceLevels(size_t level_begin, LOD SliceLevels(const LOD& in, size_t level_begin, size_t level_end) {
size_t level_end) const {
LOD new_lod; LOD new_lod;
new_lod.reserve(level_end - level_begin); new_lod.reserve(level_end - level_begin);
for (size_t i = level_begin; i < level_end; i++) { for (size_t i = level_begin; i < level_end; i++) {
new_lod.emplace_back(at(i)); new_lod.emplace_back(in.at(i));
} }
return new_lod; return new_lod;
} }
LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin, LOD SliceInLevel(const LOD& in, size_t level, size_t elem_begin,
size_t elem_end) const { size_t elem_end) {
// slice the lod. // slice the lod.
LOD new_lod; LOD new_lod;
new_lod.reserve(size() - level); new_lod.reserve(in.size() - level);
auto start = this->at(level)[elem_begin]; auto start = in.at(level)[elem_begin];
auto end = this->at(level)[elem_end]; auto end = in.at(level)[elem_end];
for (auto it = this->begin() + level; it != this->end(); it++) { for (auto it = in.begin() + level; it != in.end(); it++) {
auto it_begin = std::find(it->begin(), it->end(), start); auto it_begin = std::find(it->begin(), it->end(), start);
auto it_end = std::find(it_begin, it->end(), end); auto it_end = std::find(it_begin, it->end(), end);
PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info"); PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info");
...@@ -49,11 +48,11 @@ LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin, ...@@ -49,11 +48,11 @@ LODTensor::LOD LODTensor::LOD::SliceInLevel(size_t level, size_t elem_begin,
[start](int v) { return v - start; }); [start](int v) { return v - start; });
PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LOD"); PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LOD");
} }
PADDLE_ENFORCE_LE(new_lod.size(), this->size()); PADDLE_ENFORCE_LE(new_lod.size(), in.size());
return new_lod; return new_lod;
} }
bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) { bool operator==(const LOD& a, const LOD& b) {
if (a.size() != b.size()) { if (a.size() != b.size()) {
return false; return false;
} }
...@@ -70,9 +69,27 @@ bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) { ...@@ -70,9 +69,27 @@ bool operator==(const LODTensor::LOD& a, const LODTensor::LOD& b) {
} }
} }
} }
return true; return true;
} }
void LODTensor::SliceLevels(size_t level_begin, size_t level_end) {
auto new_lod = framework::SliceLevels(lod_, level_begin, level_end);
lod_ = new_lod;
}
void LODTensor::SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels());
PADDLE_ENFORCE(elem_begin < NumElements(level),
"element begin [%d] out of range [%d]", elem_begin,
NumElements(level));
PADDLE_ENFORCE(elem_end < NumElements(level) + 1,
"element end [%d] out of range [%d]", elem_end,
NumElements(level));
auto new_lod = framework::SliceInLevel(lod_, level, elem_begin, elem_end);
lod_ = new_lod;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
#pragma once #pragma once
#include <memory> #include <memory>
#if !defined(PADDLE_ONLY_CPU) #ifndef PADDLE_ONLY_CPU
#include <thrust/device_vector.h> #include <thrust/device_vector.h>
#include <thrust/host_vector.h> #include <thrust/host_vector.h>
#endif #endif
...@@ -27,33 +27,39 @@ ...@@ -27,33 +27,39 @@
namespace paddle { namespace paddle {
namespace framework { namespace framework {
#ifdef PADDLE_ONLY_CPU
template <typename T>
using Vector = std::vector<T>;
#else
template <typename T>
using Vector = thrust::host_vector<T>;
#endif
using LOD = std::vector<Vector<size_t>>;
LOD SliceLevels(const LOD& in, size_t level_begin, size_t level_end);
LOD SliceInLevel(const LOD& in, size_t level, size_t elem_begin,
size_t elem_end);
bool operator==(const LOD& a, const LOD& b);
/* /*
* LODTensor (Level of details Tensor) * LODTensor (Level of details Tensor)
* see https://en.wikipedia.org/wiki/Level_of_details for reference. * see https://en.wikipedia.org/wiki/Level_of_details for reference.
*/ */
class LODTensor : public Tensor { class LODTensor {
public:
// Level save offsets of each unit.
#ifdef PADDLE_ONLY_CPU
template <typename T>
using Vector = std::vector<T>;
#else
template <typename T>
using Vector = thrust::host_vector<T>;
#endif
// LoD stores offsets of each level of units, the largest units level first,
// then the smaller units level. Each Level stores the offsets of units in
// Tesor.
class LOD : public std::vector<Vector<size_t>> {
public: public:
LOD SliceLevels(size_t level_begin, size_t level_end) const;
LOD SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) const;
};
LODTensor() {} LODTensor() {}
explicit LODTensor(const LOD &lod) : lod_(lod) {} LODTensor(const LOD& lod, Tensor* t) : lod_(lod), tensor_(t) {}
void set_lod(const LOD& lod) { lod_ = lod; }
void set_tensor(Tensor* tensor) { tensor_ = tensor; }
Tensor& tensor() { return *tensor_; }
virtual Tensor *Clone() const { return new LODTensor(lod_); } LOD lod() { return lod_; }
/* /*
* Get a element from LOD. * Get a element from LOD.
...@@ -79,71 +85,23 @@ class LODTensor : public Tensor { ...@@ -79,71 +85,23 @@ class LODTensor : public Tensor {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level, PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels()); NumLevels());
// the last offset is the end of last element // the last offset is the end of last element
return lod_[level].size() - 1; return (lod_)[level].size() - 1;
} }
/* /*
* Slice of levels[level_begin:level_end], with tensor shared. * Slice of levels[level_begin:level_end]
*/ */
template <typename T> void SliceLevels(size_t level_begin, size_t level_end);
LODTensor SliceLevels(size_t level_begin, size_t level_end) const;
/* /*
* Slice of elements of a level, [elem_begin: elem_end], with tensor shared. * Slice of elements of a level, [elem_begin: elem_end]
* @note: low performance in slice lod_. * @note: low performance in slice lod_.
*/ */
template <typename T> void SliceInLevel(size_t level, size_t elem_begin, size_t elem_end);
LODTensor SliceInLevel(size_t level, size_t elem_begin,
size_t elem_end) const;
/*
* Copy other's lod_'s content, free to mutate.
*/
void CopyLOD(const LODTensor &other) { lod_ = other.lod_; }
/*
* Determine whether LODTensor has a valid LOD info.
*/
const LOD &lod() const { return lod_; }
LOD *mutable_lod() { return &lod_; }
virtual ~LODTensor() {}
private: private:
LOD lod_; LOD lod_;
Tensor* tensor_; // not owned
}; };
bool operator==(const LODTensor::LOD &a, const LODTensor::LOD &b);
template <typename T>
LODTensor LODTensor::SliceLevels(size_t level_begin, size_t level_end) const {
auto new_lod = lod_.SliceLevels(level_begin, level_end);
// slice levels just need to update LOD info, each level will contains the
// whole tensor_, so no need to modify tensor_.
LODTensor new_tensor(new_lod);
new_tensor.ShareDataWith<T>(*this);
return new_tensor;
}
template <typename T>
LODTensor LODTensor::SliceInLevel(size_t level, size_t elem_begin,
size_t elem_end) const {
PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
NumLevels());
PADDLE_ENFORCE(elem_begin < NumElements(level),
"element begin [%d] out of range [%d]", elem_begin,
NumElements(level));
PADDLE_ENFORCE(elem_end < NumElements(level) + 1,
"element end [%d] out of range [%d]", elem_end,
NumElements(level));
auto new_lod = lod_.SliceInLevel(level, elem_begin, elem_end);
// slice elements just need to update LOD info, because offsets are not
// changed, so the original tensor_ can be reused.
LODTensor new_tensor(new_lod);
new_tensor.ShareDataWith<T>(*this);
return new_tensor;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
# Design Doc: LoD (Level-of-Detail) Tensor
PaddlePaddle's RNN doesn't require that all instances in a mini-batch have the same length. To support this, we introduce an extension to Tensor, namely, the LoD Tensor.
## Challenge of Variable-length Inputs
People usually represent a mini-batch by a Tensor. For example, a mini-batch of 10 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation T of all images can be computed as a matrix multiplication of the 10x32x32 Tensor with a 32xO matrix T.
Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths.
## LoD as a Solution
### Mini-Batch of variable-length sentences
Let's imagine a mini-batch of 3 variable-length sentences, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information:
```
3
3 1 2
||| | ||
```
Each `|` represents a D-dimensional word vector. The number 3 on top indicates 3 sentences, and the numbers 3, 1, and 2 on the second level represent the number of words in each sentence.
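Elsewhere in this commit the C++ `LOD` stores offsets rather than lengths; under that (assumed) offset-based representation, the index for this mini-batch could be written as:
```c++
#include <cstddef>
#include <vector>

// One index level for the 3 sentences of 3, 1, and 2 words: the lengths
// 3, 1, 2 become the row offsets 0, 3, 4, 6 into the (3+1+2) x D tensor,
// so sentence i occupies rows [lod[0][i], lod[0][i+1]).
using LoD = std::vector<std::vector<std::size_t>>;

const LoD lod = {{0, 3, 4, 6}};
```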
### Mini-Batch of variable-length videos
This approach generalizes to the case where elements are not words but higher-dimensional objects, like images. Suppose that a mini-batch contains 3 videos of the same frame size 640x480, with 3, 1, and 2 frames respectively. The underlying tensor is of size (3+1+2)x640x480. The index information is illustrated as:
```
3
3 1 2
口口口 口 口口
```
where each `口` represents an image.
### Mini-Batch of fixed-size images
Let's get back to a typical example, image classification, where each mini-batch has M fixed-sized images. The LoD Tensor representation is
```
M
1 1 1 1 1
口口口口 ... 口
```
The many 1's on the second level seem duplicated. For this particular case of 2 levels, where every element on the second level has length 1, we can ignore the LoD index.
### Design and summarization
In summary, as long as the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor:
- The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and
- the first dimension size L has an additional property -- a LoD index as a nested vector:
```c++
typedef std::vector<std::vector<size_t>> LoD;
```
- The LoD index is not necessary when there are only two levels and all elements of the second level have length 1.
## Slicing of LoD Tensor
Consider that we have a network with three levels of RNN: the top-level one handles articles, the second-level one handles sentences, and the basic-level one handles words. This network requires that mini-batches be represented by a LoD Tensor with three levels of indices, for example,
```
3
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
```
To allow each level of RNN to handle its input, we define **the slicing of a LoD Tensor as getting the j-th sequence on level i, or the <i,j>-slice**.
For example, the <2,1>-slice of the above example is
```
2
||
```
and the <1,2>-slice of the above example is
```
2
2 3
|| |||
```
Let's go on slicing this slice. Its <1,1>-slice is
```
3
|||
```
### The General Slicing Algorithm
The algorithm, with an over-simplified data structure, is defined as
```c++
typedef vector<vector<int> > LoD;
struct LoDTensor {
  LoD lod_;
  float* tensor_;
};
LoDTensor Slice(const LoDTensor& lodt, int level, int sequence) {
}
```
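The body of `Slice` is intentionally left open above. Below is a hedged sketch of one possible implementation (restated self-contained), assuming that every level stores ascending row offsets into the underlying tensor -- as the offset-based `LOD` elsewhere in this commit does -- and that coarser sequences align with finer ones; the result keeps only the levels below the sliced one, rebased to start at row 0:
```c++
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<int> > LoD;

struct LoDTensor {
  LoD lod_;        // lod_[i]: ascending row offsets of level-i sequences
  float* tensor_;  // underlying rows, not owned
};

// Sketch: return the `sequence`-th sequence of level `level`.
LoDTensor Slice(const LoDTensor& lodt, int level, int sequence) {
  int begin = lodt.lod_[level][sequence];
  int end = lodt.lod_[level][sequence + 1];

  LoDTensor out;
  for (std::size_t l = static_cast<std::size_t>(level + 1); l < lodt.lod_.size(); ++l) {
    std::vector<int> new_level;
    for (int offset : lodt.lod_[l]) {
      // keep only offsets inside the chosen sequence, rebased to 0
      if (offset >= begin && offset <= end) new_level.push_back(offset - begin);
    }
    out.lod_.push_back(new_level);
  }
  // The slice's data is rows [begin, end) of lodt.tensor_; with this
  // over-simplified struct we only keep the base pointer.
  out.tensor_ = lodt.tensor_;
  return out;
}
```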
### Slicing the Top Level
Please be aware that an RNN operator only slices the top level of a LoD Tensor to get the step inputs.
```c++
LoDTensor Slice(const LoDTensor& lodt, int sequence) {
}
```
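Under the same assumptions, the top-level slice an RNN operator needs is just the general algorithm applied at level 0 (a sketch):
```c++
// Sketch: an RNN step input is the `sequence`-th top-level sequence,
// so the two-argument form delegates to the general Slice at level 0.
LoDTensor Slice(const LoDTensor& lodt, int sequence) {
  return Slice(lodt, /*level=*/0, sequence);
}
```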
...@@ -24,13 +24,12 @@ namespace framework { ...@@ -24,13 +24,12 @@ namespace framework {
class LODTensorTester : public ::testing::Test { class LODTensorTester : public ::testing::Test {
public: public:
virtual void SetUp() override { virtual void SetUp() override {
lod_tensor.reset(new LODTensor);
// tensor's batch_size: 30 // tensor's batch_size: 30
// 3 levels // 3 levels
// 0 10 20 // 0 10 20
// 0 5 10 15 20 // 0 5 10 15 20
// 0 2 5 7 10 12 15 20 // 0 2 5 7 10 12 15 20
LODTensor::LOD lod; LOD lod;
lod.push_back(std::vector<size_t>{0, 10, 20}); lod.push_back(std::vector<size_t>{0, 10, 20});
lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20}); lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20});
lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20}); lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20});
...@@ -41,75 +40,65 @@ class LODTensorTester : public ::testing::Test { ...@@ -41,75 +40,65 @@ class LODTensorTester : public ::testing::Test {
// malloc memory // malloc memory
tensor.mutable_data<float>(place); tensor.mutable_data<float>(place);
lod_tensor.reset(new LODTensor(lod)); lod_tensor.set_lod(lod);
lod_tensor->Resize({20 /*batch size*/, 128 /*dim*/}); lod_tensor.set_tensor(&tensor);
lod_tensor->ShareDataWith<float>(tensor);
// lod_tensor->ShareDataWith<Tensor>(tensor);
} }
protected: protected:
std::unique_ptr<LODTensor> lod_tensor;
platform::CPUPlace place; platform::CPUPlace place;
Tensor tensor; Tensor tensor;
LODTensor lod_tensor;
}; };
TEST_F(LODTensorTester, NumLevels) { ASSERT_EQ(lod_tensor->NumLevels(), 3UL); } TEST_F(LODTensorTester, NumLevels) { ASSERT_EQ(lod_tensor.NumLevels(), 3UL); }
TEST_F(LODTensorTester, NumElements) { TEST_F(LODTensorTester, NumElements) {
ASSERT_EQ(lod_tensor->NumElements(0), 2UL); ASSERT_EQ(lod_tensor.NumElements(0), 2UL);
ASSERT_EQ(lod_tensor->NumElements(1), 4UL); ASSERT_EQ(lod_tensor.NumElements(1), 4UL);
ASSERT_EQ(lod_tensor->NumElements(2), 8UL); ASSERT_EQ(lod_tensor.NumElements(2), 8UL);
} }
TEST_F(LODTensorTester, SliceLevels) { TEST_F(LODTensorTester, SliceLevels) {
// slice 1 level // slice 1 level
for (size_t level = 0; level < 3UL; ++level) { for (size_t level = 0; level < 3UL; ++level) {
auto new_lod_tensor = lod_tensor->SliceLevels<float>(level, level + 1); LODTensor new_lod_tensor = lod_tensor;
new_lod_tensor.SliceLevels(level, level + 1);
ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL);
ASSERT_EQ(new_lod_tensor.NumElements(0UL), lod_tensor->NumElements(level)); ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor.NumElements(level));
// ASSERT_EQ(new_lod_tensor, *lod_tensor); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
lod_tensor.tensor().data<float>());
} }
// slice 2 level // slice 2 level
for (size_t level = 0; level < 2UL; ++level) { for (size_t level = 0; level < 2UL; ++level) {
auto new_lod_tensor = lod_tensor->SliceLevels<float>(level, level + 2); LODTensor new_lod_tensor = lod_tensor;
new_lod_tensor.SliceLevels(level, level + 2);
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor->NumElements(level)); ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor.NumElements(level));
ASSERT_EQ(new_lod_tensor.NumElements(1), ASSERT_EQ(new_lod_tensor.NumElements(1), lod_tensor.NumElements(level + 1));
lod_tensor->NumElements(level + 1)); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor->data<float>()); lod_tensor.tensor().data<float>());
} }
} }
TEST_F(LODTensorTester, SliceInLevel) { TEST_F(LODTensorTester, SliceInLevel) {
size_t level = 0; size_t level = 0;
auto new_lod_tensor = lod_tensor->SliceInLevel<float>(level, 0, 2); LODTensor new_lod_tensor = lod_tensor;
new_lod_tensor.SliceInLevel(level, 0, 2);
EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL); EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL);
EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL); EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL);
EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL); EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL);
EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL); EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor->data<float>()); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
lod_tensor.tensor().data<float>());
level = 1; level = 1;
new_lod_tensor = lod_tensor->SliceInLevel<float>(level, 0, 2); new_lod_tensor = lod_tensor;
new_lod_tensor.SliceInLevel(level, 0, 2);
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL); ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL); ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor->data<float>()); ASSERT_EQ(new_lod_tensor.tensor().data<float>(),
} lod_tensor.tensor().data<float>());
TEST_F(LODTensorTester, ShareLOD) {
LODTensor new_lod_tensor;
new_lod_tensor.CopyLOD(*lod_tensor);
ASSERT_EQ(new_lod_tensor.lod(), lod_tensor->lod());
}
TEST_F(LODTensorTester, CopyLOD) {
LODTensor new_lod_tensor;
new_lod_tensor.CopyLOD(*lod_tensor);
bool equals = std::equal(lod_tensor->lod().begin(), lod_tensor->lod().end(),
new_lod_tensor.lod().begin());
ASSERT_TRUE(equals);
} }
} // namespace framework } // namespace framework
......
...@@ -80,9 +80,19 @@ class OpInfoMap { ...@@ -80,9 +80,19 @@ class OpInfoMap {
} }
const OpInfo& Get(const std::string& type) const { const OpInfo& Get(const std::string& type) const {
auto op_info_ptr = GetNullable(type);
PADDLE_ENFORCE_NOT_NULL(op_info_ptr, "Operator %s has not been registered",
type);
return *op_info_ptr;
}
const OpInfo* GetNullable(const std::string& type) const {
auto it = map_.find(type); auto it = map_.find(type);
PADDLE_ENFORCE(it != map_.end(), "Operator %s are not found", type); if (it == map_.end()) {
return it->second; return nullptr;
} else {
return &it->second;
}
} }
template <typename Callback> template <typename Callback>
......
...@@ -33,8 +33,7 @@ namespace framework { ...@@ -33,8 +33,7 @@ namespace framework {
class OpRegistry { class OpRegistry {
public: public:
template <typename OpType, typename ProtoMakerType, typename GradOpType> template <typename OpType, typename ProtoMakerType, typename GradOpType>
static void RegisterOp(const std::string& op_type, static void RegisterOp(const std::string& op_type) {
const std::string& grad_op_type) {
PADDLE_ENFORCE(!OpInfoMap::Instance().Has(op_type), PADDLE_ENFORCE(!OpInfoMap::Instance().Has(op_type),
"'%s' is registered more than once.", op_type); "'%s' is registered more than once.", op_type);
OpInfo op_info; OpInfo op_info;
...@@ -43,9 +42,9 @@ class OpRegistry { ...@@ -43,9 +42,9 @@ class OpRegistry {
const VariableNameMap& outputs, const AttributeMap& attrs) { const VariableNameMap& outputs, const AttributeMap& attrs) {
return new OpType(type, inputs, outputs, attrs); return new OpType(type, inputs, outputs, attrs);
}; };
op_info.grad_op_type_ = grad_op_type;
if (std::type_index(typeid(ProtoMakerType)) != if (std::type_index(typeid(ProtoMakerType)) !=
std::type_index(typeid(NOPMaker))) { std::type_index(typeid(NOPMaker))) {
op_info.grad_op_type_ = op_type + "_grad";
op_info.proto_ = new OpProto; op_info.proto_ = new OpProto;
op_info.checker_ = new OpAttrChecker; op_info.checker_ = new OpAttrChecker;
auto maker = ProtoMakerType(op_info.proto_, op_info.checker_); auto maker = ProtoMakerType(op_info.proto_, op_info.checker_);
...@@ -55,15 +54,14 @@ class OpRegistry { ...@@ -55,15 +54,14 @@ class OpRegistry {
op_info.proto_->IsInitialized(), op_info.proto_->IsInitialized(),
"Fail to initialize %s's OpProto, because %s is not initialized", "Fail to initialize %s's OpProto, because %s is not initialized",
op_type, op_info.proto_->InitializationErrorString()); op_type, op_info.proto_->InitializationErrorString());
// register gradient op
RegisterOp<GradOpType, NOPMaker, NOP>(op_info.grad_op_type_);
} else { } else {
op_info.grad_op_type_ = "";
op_info.proto_ = nullptr; op_info.proto_ = nullptr;
op_info.checker_ = nullptr; op_info.checker_ = nullptr;
} }
OpInfoMap::Instance().Insert(op_type, op_info); OpInfoMap::Instance().Insert(op_type, op_info);
// register gradient op
if (!grad_op_type.empty()) {
RegisterOp<GradOpType, NOPMaker, NOP>(grad_op_type, "");
}
} }
static std::unique_ptr<OperatorBase> CreateOp(const std::string& type, static std::unique_ptr<OperatorBase> CreateOp(const std::string& type,
...@@ -92,10 +90,8 @@ class Registrar { ...@@ -92,10 +90,8 @@ class Registrar {
template <typename OpType, typename ProtoMakerType, typename GradOpType> template <typename OpType, typename ProtoMakerType, typename GradOpType>
class OpRegistrar : public Registrar { class OpRegistrar : public Registrar {
public: public:
explicit OpRegistrar(const char* op_type) { OpRegistrar(op_type, ""); } explicit OpRegistrar(const char* op_type) {
OpRegistrar(const char* op_type, const char* grad_op_type) { OpRegistry::RegisterOp<OpType, ProtoMakerType, GradOpType>(op_type);
OpRegistry::RegisterOp<OpType, ProtoMakerType, GradOpType>(op_type,
grad_op_type);
} }
}; };
...@@ -121,8 +117,7 @@ class OpKernelRegistrar : public Registrar { ...@@ -121,8 +117,7 @@ class OpKernelRegistrar : public Registrar {
/** /**
* Macro to register Operator. * Macro to register Operator.
*/ */
#define REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, \ #define REGISTER_OP(op_type, op_class, op_maker_class, grad_op_class) \
grad_op_class) \
STATIC_ASSERT_GLOBAL_NAMESPACE( \ STATIC_ASSERT_GLOBAL_NAMESPACE( \
__reg_op__##op_type, "REGISTER_OP must be called in global namespace"); \ __reg_op__##op_type, "REGISTER_OP must be called in global namespace"); \
class _OpClass_##op_type##_ : public op_class { \ class _OpClass_##op_type##_ : public op_class { \
...@@ -137,14 +132,14 @@ class OpKernelRegistrar : public Registrar { ...@@ -137,14 +132,14 @@ class OpKernelRegistrar : public Registrar {
}; \ }; \
static ::paddle::framework::OpRegistrar< \ static ::paddle::framework::OpRegistrar< \
_OpClass_##op_type##_, op_maker_class, _OpGradClass_##op_type##_> \ _OpClass_##op_type##_, op_maker_class, _OpGradClass_##op_type##_> \
__op_registrar_##op_type##__(#op_type, #grad_op_type); \ __op_registrar_##op_type##__(#op_type); \
int TouchOpRegistrar_##op_type() { \ int TouchOpRegistrar_##op_type() { \
__op_registrar_##op_type##__.Touch(); \ __op_registrar_##op_type##__.Touch(); \
return 0; \ return 0; \
} }
#define REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) \ #define REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) \
REGISTER_OP(op_type, op_class, op_maker_class, , ::paddle::framework::NOP) REGISTER_OP(op_type, op_class, op_maker_class, ::paddle::framework::NOP)
/** /**
* Macro to register OperatorKernel. * Macro to register OperatorKernel.
......
...@@ -33,12 +33,12 @@ ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const { ...@@ -33,12 +33,12 @@ ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const {
} }
#endif #endif
const std::string& OperatorBase::Input(const std::string& name) const { std::string OperatorBase::Input(const std::string& name) const {
auto& ins = Inputs(name); auto& ins = Inputs(name);
PADDLE_ENFORCE_EQ(ins.size(), 1UL, PADDLE_ENFORCE_LE(ins.size(), 1UL,
"Op %s input %s should contain only one variable", type_, "Op %s input %s should contain only one variable", type_,
name); name);
return ins[0]; return ins.empty() ? kEmptyVarName : ins[0];
} }
const std::vector<std::string>& OperatorBase::Inputs( const std::vector<std::string>& OperatorBase::Inputs(
...@@ -49,12 +49,12 @@ const std::vector<std::string>& OperatorBase::Inputs( ...@@ -49,12 +49,12 @@ const std::vector<std::string>& OperatorBase::Inputs(
return it->second; return it->second;
} }
const std::string& OperatorBase::Output(const std::string& name) const { std::string OperatorBase::Output(const std::string& name) const {
auto& outs = Outputs(name); auto& outs = Outputs(name);
PADDLE_ENFORCE_EQ(outs.size(), 1UL, PADDLE_ENFORCE_LE(outs.size(), 1UL,
"Op %s output %s should contain only one variable", type_, "Op %s output %s should contain only one variable", type_,
name); name);
return outs[0]; return outs.empty() ? kEmptyVarName : outs[0];
} }
const std::vector<std::string>& OperatorBase::Outputs( const std::vector<std::string>& OperatorBase::Outputs(
...@@ -119,16 +119,8 @@ OperatorBase::OperatorBase(const std::string& type, ...@@ -119,16 +119,8 @@ OperatorBase::OperatorBase(const std::string& type,
const VariableNameMap& outputs, const VariableNameMap& outputs,
const AttributeMap& attrs) const AttributeMap& attrs)
: type_(type), inputs_(inputs), outputs_(outputs), attrs_(attrs) { : type_(type), inputs_(inputs), outputs_(outputs), attrs_(attrs) {
static std::atomic<size_t> gUniqId(0UL); GenerateTemporaryNames();
for (auto& output : outputs_) { CheckAllInputOutputSet();
for (auto& output_name : output.second) {
if (output_name == kTempVarName) {
output_name += type_;
output_name += "@";
output_name += std::to_string(gUniqId.fetch_add(1));
}
}
}
} }
std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const { std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const {
...@@ -156,6 +148,35 @@ std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const { ...@@ -156,6 +148,35 @@ std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const {
return ret_val; return ret_val;
} }
void OperatorBase::CheckAllInputOutputSet() const {
auto& info_map = OpInfoMap::Instance();
auto* op_info = info_map.GetNullable(Type());
if (op_info == nullptr || op_info->proto_ == nullptr) return;
for (auto& in : op_info->Proto().inputs()) {
PADDLE_ENFORCE(inputs_.find(in.name()) != inputs_.end(),
"Type %s's input %s is not set", Type(), in.name());
}
for (auto& out : op_info->Proto().outputs()) {
PADDLE_ENFORCE(outputs_.find(out.name()) != outputs_.end(),
"Type %s's output %s is not set", Type(), out.name());
}
}
void OperatorBase::GenerateTemporaryNames() {
static std::atomic<size_t> gUniqId(0UL);
for (auto& output : outputs_) {
for (auto& output_name : output.second) {
if (output_name == kTempVarName) {
output_name += type_;
output_name += "@";
output_name += std::to_string(gUniqId.fetch_add(1));
}
}
}
}
void OpProtoAndCheckerMaker::Validate() { void OpProtoAndCheckerMaker::Validate() {
validated_ = true; validated_ = true;
CheckNoDuplicatedInOutAttrs(); CheckNoDuplicatedInOutAttrs();
......
...@@ -95,12 +95,12 @@ class OperatorBase { ...@@ -95,12 +95,12 @@ class OperatorBase {
const VariableNameMap& Inputs() const { return inputs_; } const VariableNameMap& Inputs() const { return inputs_; }
const VariableNameMap& Outputs() const { return outputs_; } const VariableNameMap& Outputs() const { return outputs_; }
//! Get a input with argument's name described in `op_proto` //! Get a input with argument's name described in `op_proto`
const std::string& Input(const std::string& name) const; std::string Input(const std::string& name) const;
//! Get a input which has multiple variables. //! Get a input which has multiple variables.
const std::vector<std::string>& Inputs(const std::string& name) const; const std::vector<std::string>& Inputs(const std::string& name) const;
//! Get a output with argument's name described in `op_proto` //! Get a output with argument's name described in `op_proto`
const std::string& Output(const std::string& name) const; std::string Output(const std::string& name) const;
//! Get an output which has multiple variables. //! Get an output which has multiple variables.
//! TODO add a vector_view to prevent memory copy. //! TODO add a vector_view to prevent memory copy.
const std::vector<std::string>& Outputs(const std::string& name) const; const std::vector<std::string>& Outputs(const std::string& name) const;
...@@ -127,6 +127,10 @@ class OperatorBase { ...@@ -127,6 +127,10 @@ class OperatorBase {
// IG (Inputs Gradients) // IG (Inputs Gradients)
VariableNameMap outputs_; VariableNameMap outputs_;
AttributeMap attrs_; AttributeMap attrs_;
private:
void GenerateTemporaryNames();
void CheckAllInputOutputSet() const;
}; };
// Macro for define a clone method. // Macro for define a clone method.
...@@ -238,11 +242,13 @@ class InferShapeContext { ...@@ -238,11 +242,13 @@ class InferShapeContext {
} }
const Variable* InputVar(const std::string& name) const { const Variable* InputVar(const std::string& name) const {
return scope_.FindVar(op_.Input(name)); auto ipt = op_.Input(name);
return ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
} }
Variable* OutputVar(const std::string& name) const { Variable* OutputVar(const std::string& name) const {
return scope_.FindVar(op_.Output(name)); auto opt = op_.Output(name);
return opt == kEmptyVarName ? nullptr : scope_.FindVar(opt);
} }
const std::vector<const Variable*> MultiInputVar( const std::vector<const Variable*> MultiInputVar(
...@@ -250,9 +256,11 @@ class InferShapeContext { ...@@ -250,9 +256,11 @@ class InferShapeContext {
auto names = op_.Inputs(name); auto names = op_.Inputs(name);
std::vector<const Variable*> res; std::vector<const Variable*> res;
res.reserve(names.size()); res.reserve(names.size());
std::transform( std::transform(names.begin(), names.end(), std::back_inserter(res),
names.begin(), names.end(), std::back_inserter(res), [this](const std::string& name) {
[this](const std::string& name) { return scope_.FindVar(name); }); return name == kEmptyVarName ? nullptr
: scope_.FindVar(name);
});
return res; return res;
} }
...@@ -260,24 +268,24 @@ class InferShapeContext { ...@@ -260,24 +268,24 @@ class InferShapeContext {
auto names = op_.Outputs(name); auto names = op_.Outputs(name);
std::vector<const Variable*> res; std::vector<const Variable*> res;
res.reserve(names.size()); res.reserve(names.size());
std::transform( std::transform(names.begin(), names.end(), std::back_inserter(res),
names.begin(), names.end(), std::back_inserter(res), [this](const std::string& name) {
[this](const std::string& name) { return scope_.FindVar(name); }); return name == kEmptyVarName ? nullptr
: scope_.FindVar(name);
});
return res; return res;
} }
template <typename T> template <typename T>
const T* Input(const std::string& name) const { const T* Input(const std::string& name) const {
auto* var = InputVar(name); auto* var = InputVar(name);
PADDLE_ENFORCE_NOT_NULL(var, "Input(%s) should not be nullptr", name); return var == nullptr ? nullptr : &var->Get<T>();
return &var->Get<T>();
} }
template <typename T> template <typename T>
T* Output(const std::string& name) const { T* Output(const std::string& name) const {
auto var = OutputVar(name); auto var = OutputVar(name);
PADDLE_ENFORCE_NOT_NULL(var, "Output(%s) should not be nullptr", name); return var == nullptr ? nullptr : var->GetMutable<T>();
return var->GetMutable<T>();
} }
template <typename T> template <typename T>
...@@ -288,10 +296,7 @@ class InferShapeContext { ...@@ -288,10 +296,7 @@ class InferShapeContext {
std::transform(names.begin(), names.end(), std::back_inserter(res), std::transform(names.begin(), names.end(), std::back_inserter(res),
[&](const std::string& sub_name) { [&](const std::string& sub_name) {
auto var = scope_.FindVar(sub_name); auto var = scope_.FindVar(sub_name);
PADDLE_ENFORCE_NOT_NULL( return var == nullptr ? nullptr : &var->Get<T>();
var, "MultiInput(%s:%s) should not be nullptr", name,
sub_name);
return &var->Get<T>();
}); });
return res; return res;
} }
...@@ -304,10 +309,7 @@ class InferShapeContext { ...@@ -304,10 +309,7 @@ class InferShapeContext {
std::transform(names.begin(), names.end(), std::back_inserter(res), std::transform(names.begin(), names.end(), std::back_inserter(res),
[&](const std::string& sub_name) { [&](const std::string& sub_name) {
auto var = scope_.FindVar(sub_name); auto var = scope_.FindVar(sub_name);
PADDLE_ENFORCE_NOT_NULL( return var == nullptr ? nullptr : var->GetMutable<T>();
var, "MultiOutput(%s:%s) should not be nullptr.", name,
sub_name);
return var->GetMutable<T>();
}); });
return res; return res;
} }
......
...@@ -223,7 +223,7 @@ void CrossEntropyOverBeam::checkInputs() { ...@@ -223,7 +223,7 @@ void CrossEntropyOverBeam::checkInputs() {
<< inputLayers_[i * 3]->getName() << inputLayers_[i * 3]->getName()
<< " should be a nested sequence"; << " should be a nested sequence";
CHECK_EQ(getInputValue(i * 3 + 1)->getWidth(), beamSize_); CHECK_EQ(getInputValue(i * 3 + 1)->getWidth(), beamSize_);
CHECK_EQ(scores.getNumSequences(), batchSize_); CHECK_EQ(batchSize_, static_cast<size_t>(scores.getNumSequences()));
CHECK_EQ(scores.getNumSubSequences(), selCandidates.getBatchSize()); CHECK_EQ(scores.getNumSubSequences(), selCandidates.getBatchSize());
} else { } else {
CHECK(scores.hasSeq()) << "input " << i << " " CHECK(scores.hasSeq()) << "input " << i << " "
...@@ -231,10 +231,10 @@ void CrossEntropyOverBeam::checkInputs() { ...@@ -231,10 +231,10 @@ void CrossEntropyOverBeam::checkInputs() {
<< " should be a sequence"; << " should be a sequence";
batchSize_ = scores.getNumSequences(); batchSize_ = scores.getNumSequences();
beamSize_ = getInputValue(i * 3 + 1)->getWidth(); beamSize_ = getInputValue(i * 3 + 1)->getWidth();
CHECK_EQ(batchSize_, selCandidates.getBatchSize()); CHECK_EQ(batchSize_, static_cast<size_t>(selCandidates.getBatchSize()));
} }
CHECK_EQ(1U, scores.value->getWidth()); CHECK_EQ(1U, scores.value->getWidth());
CHECK_EQ(batchSize_, goldSeq.getBatchSize()); CHECK_EQ(batchSize_, static_cast<size_t>(goldSeq.getBatchSize()));
} }
} }
...@@ -377,8 +377,8 @@ void CrossEntropyOverBeam::forward(PassType passType) { ...@@ -377,8 +377,8 @@ void CrossEntropyOverBeam::forward(PassType passType) {
MatrixPtr outputValue = getOutputValue(); MatrixPtr outputValue = getOutputValue();
for (size_t i = 0; i < batchSize_; ++i) { for (size_t i = 0; i < batchSize_; ++i) {
beamCosts_[i].setData( BeamExpansionPtr ptr = std::make_shared<BeamExpansion>(beamPerSeq_[i]);
std::move(std::make_shared<BeamExpansion>(beamPerSeq_[i])), beamSize_); beamCosts_[i].setData(std::move(ptr), beamSize_);
outputValue->getData()[i] = beamCosts_[i].forward(); outputValue->getData()[i] = beamCosts_[i].forward();
} }
} }
......
...@@ -48,7 +48,16 @@ public: ...@@ -48,7 +48,16 @@ public:
<< inputLayers_.size() << ") at " << getName(); << inputLayers_.size() << ") at " << getName();
} }
s << format.substr(pos); s << format.substr(pos);
LOG(INFO) << s.str();
const std::string delimiter("\n");
std::string content = s.str();
std::string::size_type foundPos = 0;
std::string::size_type prevPos = 0;
while ((foundPos = content.find(delimiter, prevPos)) != std::string::npos) {
LOG(INFO) << content.substr(prevPos, foundPos - prevPos);
prevPos = foundPos + delimiter.size();
}
LOG(INFO) << content.substr(prevPos);
} }
void backward(const UpdateCallback& callback) override {} void backward(const UpdateCallback& callback) override {}
......
file(GLOB GENERAL_OPS RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}" "*_op.cc")
string(REPLACE ".cc" "" GENERAL_OPS "${GENERAL_OPS}")
function(op_library TARGET) function(op_library TARGET)
# op_library is a function to create op library. The interface is same as # op_library is a function to create op library. The interface is same as
# cc_library. But it handle split GPU/CPU code and link some common library # cc_library. But it handle split GPU/CPU code and link some common library
# for ops. # for ops.
set(OP_LIBRARY ${TARGET} ${OP_LIBRARY} PARENT_SCOPE)
set(cc_srcs) set(cc_srcs)
set(cu_srcs) set(cu_srcs)
set(op_common_deps operator op_registry) set(op_common_deps operator op_registry)
...@@ -43,33 +46,26 @@ endfunction() ...@@ -43,33 +46,26 @@ endfunction()
add_subdirectory(math) add_subdirectory(math)
cc_test(gather_test SRCS gather_test.cc DEPS tensor) list(REMOVE_ITEM GENERAL_OPS
op_library(gather_op SRCS gather_op.cc gather_op.cu) net_op
minus_op
cc_test(scatter_test SRCS scatter_test.cc DEPS tensor) mul_op
op_library(scatter_op SRCS scatter_op.cc scatter_op.cu) recurrent_op
scale_op)
cc_library(net_op SRCS net_op.cc DEPS op_registry)
cc_test(net_op_test SRCS net_op_test.cc DEPS net_op)
op_library(add_op SRCS add_op.cc add_op.cu)
op_library(mean_op SRCS mean_op.cc mean_op.cu)
op_library(net_op SRCS net_op.cc)
op_library(minus_op SRCS minus_op.cc minus_op.cu DEPS scale_op)
op_library(mul_op SRCS mul_op.cc mul_op.cu DEPS math_function) op_library(mul_op SRCS mul_op.cc mul_op.cu DEPS math_function)
op_library(rowwise_add_op SRCS rowwise_add_op.cu rowwise_add_op.cc) op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc
DEPS framework_proto tensor operator net_op)
op_library(scale_op SRCS scale_op.cc scale_op.cu DEPS net_op)
op_library(sigmoid_op SRCS sigmoid_op.cc sigmoid_op.cu) foreach(src ${GENERAL_OPS})
op_library(softmax_op SRCS softmax_op.cc softmax_op.cu) op_library(${src} SRCS ${src}.cc ${src}.cu)
op_library(gaussian_random_op SRCS gaussian_random_op.cc gaussian_random_op.cu) endforeach()
op_library(cross_entropy_op SRCS cross_entropy_op.cc cross_entropy_op.cu)
op_library(fill_zeros_like_op SRCS fill_zeros_like_op.cc fill_zeros_like_op.cu)
op_library(sgd_op SRCS sgd_op.cc sgd_op.cu) set(GLOB_OP_LIB ${OP_LIBRARY} CACHE INTERNAL "Global OP library")
op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc cc_test(gather_test SRCS gather_test.cc DEPS tensor)
DEPS framework_proto tensor op_registry operator net_op) cc_test(net_op_test SRCS net_op_test.cc DEPS net_op)
op_library(uniform_random_op SRCS uniform_random_op.cc uniform_random_op.cu) cc_test(scatter_test SRCS scatter_test.cc DEPS tensor)
op_library(lookup_table_op SRCS lookup_table_op.cc lookup_table_op.cu)
op_library(scale_op SRCS scale_op.cc scale_op.cu DEPS net_op)
op_library(minus_op SRCS minus_op.cc minus_op.cu DEPS scale_op)
...@@ -57,7 +57,7 @@ class AddOpGrad : public framework::OperatorWithKernel { ...@@ -57,7 +57,7 @@ class AddOpGrad : public framework::OperatorWithKernel {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(add_two, ops::AddOp, ops::AddOpMaker, add_two_grad, ops::AddOpGrad); REGISTER_OP(add_two, ops::AddOp, ops::AddOpMaker, ops::AddOpGrad);
REGISTER_OP_CPU_KERNEL(add_two, REGISTER_OP_CPU_KERNEL(add_two,
ops::AddKernel<paddle::platform::CPUPlace, float>); ops::AddKernel<paddle::platform::CPUPlace, float>);
...@@ -67,8 +67,7 @@ OnehotCrossEntropy Operator. ...@@ -67,8 +67,7 @@ OnehotCrossEntropy Operator.
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(onehot_cross_entropy, ops::OnehotCrossEntropyOp, REGISTER_OP(onehot_cross_entropy, ops::OnehotCrossEntropyOp,
ops::OnehotCrossEntropyOpMaker, onehot_cross_entropy_grad, ops::OnehotCrossEntropyOpMaker, ops::OnehotCrossEntropyGradientOp);
ops::OnehotCrossEntropyGradientOp);
REGISTER_OP_CPU_KERNEL(onehot_cross_entropy, REGISTER_OP_CPU_KERNEL(onehot_cross_entropy,
ops::OnehotCrossEntropyOpKernel<float>); ops::OnehotCrossEntropyOpKernel<float>);
REGISTER_OP_CPU_KERNEL(onehot_cross_entropy_grad, REGISTER_OP_CPU_KERNEL(onehot_cross_entropy_grad,
......
...@@ -63,8 +63,7 @@ Out = X[Index] ...@@ -63,8 +63,7 @@ Out = X[Index]
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(gather, ops::GatherOp, ops::GatherOpMaker, gather_grad, REGISTER_OP(gather, ops::GatherOp, ops::GatherOpMaker, ops::GatherGradOp);
ops::GatherGradOp);
REGISTER_OP_CPU_KERNEL(gather, REGISTER_OP_CPU_KERNEL(gather,
ops::GatherOpKernel<paddle::platform::CPUPlace, float>); ops::GatherOpKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
......
...@@ -66,7 +66,7 @@ class LookupTableOpGrad : public framework::OperatorWithKernel { ...@@ -66,7 +66,7 @@ class LookupTableOpGrad : public framework::OperatorWithKernel {
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(lookup_table, ops::LookupTableOp, ops::LookupTableOpMaker, REGISTER_OP(lookup_table, ops::LookupTableOp, ops::LookupTableOpMaker,
lookup_table_grad, ops::LookupTableOpGrad); ops::LookupTableOpGrad);
REGISTER_OP_CPU_KERNEL(lookup_table, ops::LookupTableKernel<float>); REGISTER_OP_CPU_KERNEL(lookup_table, ops::LookupTableKernel<float>);
REGISTER_OP_CPU_KERNEL(lookup_table_grad, ops::LookupTableGradKernel<float>); REGISTER_OP_CPU_KERNEL(lookup_table_grad, ops::LookupTableGradKernel<float>);
...@@ -54,7 +54,7 @@ class MeanGradOp : public framework::OperatorWithKernel { ...@@ -54,7 +54,7 @@ class MeanGradOp : public framework::OperatorWithKernel {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(mean, ops::MeanOp, ops::MeanOpMaker, mean_grad, ops::MeanGradOp); REGISTER_OP(mean, ops::MeanOp, ops::MeanOpMaker, ops::MeanGradOp);
REGISTER_OP_CPU_KERNEL(mean, REGISTER_OP_CPU_KERNEL(mean,
ops::MeanKernel<paddle::platform::CPUPlace, float>); ops::MeanKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(mean_grad, REGISTER_OP_CPU_KERNEL(mean_grad,
......
...@@ -81,7 +81,6 @@ class MinusGradOp : public NetOp { ...@@ -81,7 +81,6 @@ class MinusGradOp : public NetOp {
USE_OP(scale); USE_OP(scale);
USE_OP_ITSELF(identity); USE_OP_ITSELF(identity);
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(minus, ops::MinusOp, ops::MinusOpMaker, minus_grad, REGISTER_OP(minus, ops::MinusOp, ops::MinusOpMaker, ops::MinusGradOp<float>);
ops::MinusGradOp<float>);
REGISTER_OP_CPU_KERNEL(minus, REGISTER_OP_CPU_KERNEL(minus,
ops::MinusKernel<paddle::platform::CPUPlace, float>); ops::MinusKernel<paddle::platform::CPUPlace, float>);
...@@ -84,7 +84,7 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -84,7 +84,7 @@ class MulOpGrad : public framework::OperatorWithKernel {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, mul_grad, ops::MulOpGrad); REGISTER_OP(mul, ops::MulOp, ops::MulOpMaker, ops::MulOpGrad);
REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUPlace, float>); REGISTER_OP_CPU_KERNEL(mul, ops::MulKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(mul_grad, REGISTER_OP_CPU_KERNEL(mul_grad,
ops::MulGradKernel<paddle::platform::CPUPlace, float>); ops::MulGradKernel<paddle::platform::CPUPlace, float>);
...@@ -74,7 +74,7 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel { ...@@ -74,7 +74,7 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel {
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(rowwise_add, ops::RowwiseAddOp, ops::RowwiseAddOpMaker, REGISTER_OP(rowwise_add, ops::RowwiseAddOp, ops::RowwiseAddOpMaker,
rowwise_add_grad, ops::RowwiseAddGradOp); ops::RowwiseAddGradOp);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
rowwise_add, ops::RowwiseAddKernel<paddle::platform::CPUPlace, float>); rowwise_add, ops::RowwiseAddKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
......
...@@ -97,7 +97,7 @@ class IdentityOp : public NetOp { ...@@ -97,7 +97,7 @@ class IdentityOp : public NetOp {
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(scale, ops::ScaleOp, ops::ScaleOpMaker<float>, scale_grad, REGISTER_OP(scale, ops::ScaleOp, ops::ScaleOpMaker<float>,
ops::ScaleGradOp<float>); ops::ScaleGradOp<float>);
REGISTER_OP_CPU_KERNEL(scale, REGISTER_OP_CPU_KERNEL(scale,
ops::ScaleKernel<paddle::platform::CPUPlace, float>); ops::ScaleKernel<paddle::platform::CPUPlace, float>);
......
...@@ -77,8 +77,7 @@ Out[Index] = Ref[Index] + Updates ...@@ -77,8 +77,7 @@ Out[Index] = Ref[Index] + Updates
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(scatter, ops::ScatterOp, ops::ScatterOpMaker, scatter_grad, REGISTER_OP(scatter, ops::ScatterOp, ops::ScatterOpMaker, ops::ScatterGradOp);
ops::ScatterGradOp);
REGISTER_OP_CPU_KERNEL(scatter, REGISTER_OP_CPU_KERNEL(scatter,
ops::ScatterOpKernel<paddle::platform::CPUPlace, float>); ops::ScatterOpKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
......
...@@ -53,8 +53,7 @@ class SigmoidOpGrad : public framework::OperatorWithKernel { ...@@ -53,8 +53,7 @@ class SigmoidOpGrad : public framework::OperatorWithKernel {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(sigmoid, ops::SigmoidOp, ops::SigmoidOpMaker, sigmoid_grad, REGISTER_OP(sigmoid, ops::SigmoidOp, ops::SigmoidOpMaker, ops::SigmoidOpGrad);
ops::SigmoidOpGrad);
REGISTER_OP_CPU_KERNEL(sigmoid, REGISTER_OP_CPU_KERNEL(sigmoid,
ops::SigmoidKernel<paddle::platform::CPUPlace, float>); ops::SigmoidKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
......
...@@ -62,8 +62,7 @@ class SoftmaxOpGrad : public framework::OperatorWithKernel { ...@@ -62,8 +62,7 @@ class SoftmaxOpGrad : public framework::OperatorWithKernel {
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(softmax, ops::SoftmaxOp, ops::SoftmaxOpMaker, softmax_grad, REGISTER_OP(softmax, ops::SoftmaxOp, ops::SoftmaxOpMaker, ops::SoftmaxOpGrad);
ops::SoftmaxOpGrad);
REGISTER_OP_CPU_KERNEL(softmax, REGISTER_OP_CPU_KERNEL(softmax,
ops::SoftmaxKernel<paddle::platform::CPUPlace, float>); ops::SoftmaxKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
......
...@@ -2,21 +2,5 @@ if(WITH_PYTHON) ...@@ -2,21 +2,5 @@ if(WITH_PYTHON)
cc_library(paddle_pybind SHARED cc_library(paddle_pybind SHARED
SRCS pybind.cc SRCS pybind.cc
DEPS pybind python backward DEPS pybind python backward
sgd_op ${GLOB_OP_LIB})
gather_op
scatter_op
add_op
mul_op
rowwise_add_op
sigmoid_op
softmax_op
mean_op
cross_entropy_op
recurrent_op
uniform_random_op
gaussian_random_op
fill_zeros_like_op
lookup_table_op
scale_op
minus_op)
endif(WITH_PYTHON) endif(WITH_PYTHON)
...@@ -137,7 +137,7 @@ __all__ = [ ...@@ -137,7 +137,7 @@ __all__ = [
'clip_layer', 'clip_layer',
'slice_projection', 'slice_projection',
'seq_slice_layer', 'seq_slice_layer',
'kmax_sequence_score_layer', 'kmax_seq_score_layer',
'img_pool3d_layer', 'img_pool3d_layer',
'scale_shift_layer', 'scale_shift_layer',
'img_conv3d_layer', 'img_conv3d_layer',
...@@ -5994,7 +5994,7 @@ def cross_entropy_over_beam(input, name=None): ...@@ -5994,7 +5994,7 @@ def cross_entropy_over_beam(input, name=None):
Note that, if gold falls off the beam at search step t, then the cost is Note that, if gold falls off the beam at search step t, then the cost is
calculated over the beam at step t. calculated over the beam at step t.
This cost layer always works together with kmax_sequence_score_layer, This cost layer always works together with kmax_seq_score_layer,
sub_nested_seq_layer, and sequence_slice_layer to trim the input to form a sub_nested_seq_layer, and sequence_slice_layer to trim the input to form a
sub-search space. sub-search space.
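That trimming pipeline is exercised by the test config further down in this diff; as a condensed Python sketch of the same pattern (the layer sizes and beam_size here are illustrative, not part of this change):

```python
from paddle.trainer_config_helpers import *

beam_size = 5
sentence_states = data_layer(name="sentence_states", size=32)
sentence_scores = data_layer(name="sentence_scores", size=1)

# pick the indices of the beam_size highest-scoring sentences ...
topk_sentence_ids = kmax_seq_score_layer(
    input=sentence_scores, beam_size=beam_size)
# ... and trim the nested sequence down to that sub-search space
topk_sen = sub_nested_seq_layer(
    input=sentence_states, selected_indices=topk_sentence_ids)
```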
...@@ -6597,14 +6597,14 @@ def seq_slice_layer(input, starts, ends, name=None): ...@@ -6597,14 +6597,14 @@ def seq_slice_layer(input, starts, ends, name=None):
@wrap_name_default() @wrap_name_default()
@layer_support() @layer_support()
def kmax_sequence_score_layer(input, name=None, beam_size=1): def kmax_seq_score_layer(input, name=None, beam_size=1):
""" """
This layer accepts one input which are scores over a sequence or a nested This layer accepts one input which are scores over a sequence or a nested
sequence, and returns indices of beam_size sequences with highest scores. sequence, and returns indices of beam_size sequences with highest scores.
.. code-block:: python .. code-block:: python
kmax_indices = kmax_sequence_score_layer(input=input_layer, beam_size) kmax_indices = kmax_seq_score_layer(input=input_layer, beam_size)
:param name: The Layer Name. :param name: The Layer Name.
...@@ -6617,10 +6617,10 @@ def kmax_sequence_score_layer(input, name=None, beam_size=1): ...@@ -6617,10 +6617,10 @@ def kmax_sequence_score_layer(input, name=None, beam_size=1):
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
assert isinstance(input, LayerOutput), ("kmax_sequence_score_layer " assert isinstance(input, LayerOutput), ("kmax_seq_score_layer "
"accepts only one input.") "accepts only one input.")
assert input.size == 1, ( assert input.size == 1, (
"input of kmax_sequence_score_layer is a score" "input of kmax_seq_score_layer is a score "
"over a sequence or a nested sequence, so its width must be 1.") "over a sequence or a nested sequence, so its width must be 1.")
Layer( Layer(
......
...@@ -8,7 +8,7 @@ test_spp_layer test_bilinear_interp test_maxout test_bi_grumemory math_ops ...@@ -8,7 +8,7 @@ test_spp_layer test_bilinear_interp test_maxout test_bi_grumemory math_ops
test_seq_concat_reshape test_pad test_smooth_l1 test_multiplex_layer test_seq_concat_reshape test_pad test_smooth_l1 test_multiplex_layer
test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_layer test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_layer
test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer
test_kmax_seq_socre_layer test_seq_select_layers test_scale_shift_layer test_kmax_seq_socre_layer test_sub_nested_seq_select_layer test_scale_shift_layer
test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer
test_conv3d_layer test_deconv3d_layer) test_conv3d_layer test_deconv3d_layer)
......
...@@ -12,7 +12,7 @@ layers { ...@@ -12,7 +12,7 @@ layers {
active_type: "" active_type: ""
} }
layers { layers {
name: "__kmax_sequence_score_layer_0__" name: "__kmax_seq_score_layer_0__"
type: "kmax_seq_score" type: "kmax_seq_score"
active_type: "" active_type: ""
inputs { inputs {
...@@ -29,7 +29,7 @@ layers { ...@@ -29,7 +29,7 @@ layers {
input_layer_name: "sentence_states" input_layer_name: "sentence_states"
} }
inputs { inputs {
input_layer_name: "__kmax_sequence_score_layer_0__" input_layer_name: "__kmax_seq_score_layer_0__"
} }
} }
layers { layers {
...@@ -44,7 +44,7 @@ layers { ...@@ -44,7 +44,7 @@ layers {
bias_parameter_name: "___fc_layer_0__.wbias" bias_parameter_name: "___fc_layer_0__.wbias"
} }
layers { layers {
name: "__kmax_sequence_score_layer_1__" name: "__kmax_seq_score_layer_1__"
type: "kmax_seq_score" type: "kmax_seq_score"
active_type: "" active_type: ""
inputs { inputs {
...@@ -61,7 +61,7 @@ layers { ...@@ -61,7 +61,7 @@ layers {
input_layer_name: "__sub_nested_seq_layer_0__" input_layer_name: "__sub_nested_seq_layer_0__"
} }
inputs { inputs {
input_layer_name: "__kmax_sequence_score_layer_1__" input_layer_name: "__kmax_seq_score_layer_1__"
} }
select_first: true select_first: true
} }
...@@ -77,7 +77,7 @@ layers { ...@@ -77,7 +77,7 @@ layers {
bias_parameter_name: "___fc_layer_1__.wbias" bias_parameter_name: "___fc_layer_1__.wbias"
} }
layers { layers {
name: "__kmax_sequence_score_layer_2__" name: "__kmax_seq_score_layer_2__"
type: "kmax_seq_score" type: "kmax_seq_score"
active_type: "" active_type: ""
inputs { inputs {
...@@ -111,7 +111,7 @@ layers { ...@@ -111,7 +111,7 @@ layers {
input_layer_name: "sentence_scores" input_layer_name: "sentence_scores"
} }
inputs { inputs {
input_layer_name: "__kmax_sequence_score_layer_0__" input_layer_name: "__kmax_seq_score_layer_0__"
} }
inputs { inputs {
input_layer_name: "sentences_ids" input_layer_name: "sentences_ids"
...@@ -120,7 +120,7 @@ layers { ...@@ -120,7 +120,7 @@ layers {
input_layer_name: "__fc_layer_0__" input_layer_name: "__fc_layer_0__"
} }
inputs { inputs {
input_layer_name: "__kmax_sequence_score_layer_1__" input_layer_name: "__kmax_seq_score_layer_1__"
} }
inputs { inputs {
input_layer_name: "start_ids" input_layer_name: "start_ids"
...@@ -129,7 +129,7 @@ layers { ...@@ -129,7 +129,7 @@ layers {
input_layer_name: "__fc_layer_1__" input_layer_name: "__fc_layer_1__"
} }
inputs { inputs {
input_layer_name: "__kmax_sequence_score_layer_2__" input_layer_name: "__kmax_seq_score_layer_2__"
} }
inputs { inputs {
input_layer_name: "end_ids" input_layer_name: "end_ids"
...@@ -185,13 +185,13 @@ sub_models { ...@@ -185,13 +185,13 @@ sub_models {
name: "root" name: "root"
layer_names: "sentence_states" layer_names: "sentence_states"
layer_names: "sentence_scores" layer_names: "sentence_scores"
layer_names: "__kmax_sequence_score_layer_0__" layer_names: "__kmax_seq_score_layer_0__"
layer_names: "__sub_nested_seq_layer_0__" layer_names: "__sub_nested_seq_layer_0__"
layer_names: "__fc_layer_0__" layer_names: "__fc_layer_0__"
layer_names: "__kmax_sequence_score_layer_1__" layer_names: "__kmax_seq_score_layer_1__"
layer_names: "__seq_slice_layer_0__" layer_names: "__seq_slice_layer_0__"
layer_names: "__fc_layer_1__" layer_names: "__fc_layer_1__"
layer_names: "__kmax_sequence_score_layer_2__" layer_names: "__kmax_seq_score_layer_2__"
layer_names: "sentences_ids" layer_names: "sentences_ids"
layer_names: "start_ids" layer_names: "start_ids"
layer_names: "end_ids" layer_names: "end_ids"
......
...@@ -17,7 +17,7 @@ layers { ...@@ -17,7 +17,7 @@ layers {
bias_parameter_name: "___fc_layer_0__.wbias" bias_parameter_name: "___fc_layer_0__.wbias"
} }
layers { layers {
name: "__kmax_sequence_score_layer_0__" name: "__kmax_seq_score_layer_0__"
type: "kmax_seq_score" type: "kmax_seq_score"
active_type: "" active_type: ""
inputs { inputs {
...@@ -46,14 +46,14 @@ parameters { ...@@ -46,14 +46,14 @@ parameters {
initial_smart: false initial_smart: false
} }
input_layer_names: "input_seq" input_layer_names: "input_seq"
output_layer_names: "__kmax_sequence_score_layer_0__" output_layer_names: "__kmax_seq_score_layer_0__"
sub_models { sub_models {
name: "root" name: "root"
layer_names: "input_seq" layer_names: "input_seq"
layer_names: "__fc_layer_0__" layer_names: "__fc_layer_0__"
layer_names: "__kmax_sequence_score_layer_0__" layer_names: "__kmax_seq_score_layer_0__"
input_layer_names: "input_seq" input_layer_names: "input_seq"
output_layer_names: "__kmax_sequence_score_layer_0__" output_layer_names: "__kmax_seq_score_layer_0__"
is_recurrent_layer_group: false is_recurrent_layer_group: false
} }
...@@ -7,14 +7,14 @@ beam_size = 5 ...@@ -7,14 +7,14 @@ beam_size = 5
# the first beam expansion. # the first beam expansion.
sentence_states = data_layer(name="sentence_states", size=32) sentence_states = data_layer(name="sentence_states", size=32)
sentence_scores = data_layer(name="sentence_scores", size=1) sentence_scores = data_layer(name="sentence_scores", size=1)
topk_sentence_ids = kmax_sequence_score_layer( topk_sentence_ids = kmax_seq_score_layer(
input=sentence_scores, beam_size=beam_size) input=sentence_scores, beam_size=beam_size)
# the second beam expansion. # the second beam expansion.
topk_sen = sub_nested_seq_layer( topk_sen = sub_nested_seq_layer(
input=sentence_states, selected_indices=topk_sentence_ids) input=sentence_states, selected_indices=topk_sentence_ids)
start_pos_scores = fc_layer(input=topk_sen, size=1, act=LinearActivation()) start_pos_scores = fc_layer(input=topk_sen, size=1, act=LinearActivation())
topk_start_pos_ids = kmax_sequence_score_layer( topk_start_pos_ids = kmax_seq_score_layer(
input=sentence_scores, beam_size=beam_size) input=sentence_scores, beam_size=beam_size)
# the final beam expansion. # the final beam expansion.
...@@ -22,7 +22,7 @@ topk_start_spans = seq_slice_layer( ...@@ -22,7 +22,7 @@ topk_start_spans = seq_slice_layer(
input=topk_sen, starts=topk_start_pos_ids, ends=None) input=topk_sen, starts=topk_start_pos_ids, ends=None)
end_pos_scores = fc_layer( end_pos_scores = fc_layer(
input=topk_start_spans, size=1, act=LinearActivation()) input=topk_start_spans, size=1, act=LinearActivation())
topk_end_pos_ids = kmax_sequence_score_layer( topk_end_pos_ids = kmax_seq_score_layer(
input=end_pos_scores, beam_size=beam_size) input=end_pos_scores, beam_size=beam_size)
# define the cost # define the cost
......
...@@ -4,6 +4,6 @@ from paddle.trainer_config_helpers import * ...@@ -4,6 +4,6 @@ from paddle.trainer_config_helpers import *
data = data_layer(name="input_seq", size=128) data = data_layer(name="input_seq", size=128)
scores = fc_layer(input=data, size=1, act=ExpActivation()) scores = fc_layer(input=data, size=1, act=ExpActivation())
kmax_seq_id = kmax_sequence_score_layer(input=scores, beam_size=5) kmax_seq_id = kmax_seq_score_layer(input=scores, beam_size=5)
outputs(kmax_seq_id) outputs(kmax_seq_id)
...@@ -78,6 +78,8 @@ def init(**kwargs): ...@@ -78,6 +78,8 @@ def init(**kwargs):
if 'use_gpu' in kwargs: if 'use_gpu' in kwargs:
cp.g_command_config_args['use_gpu'] = kwargs['use_gpu'] cp.g_command_config_args['use_gpu'] = kwargs['use_gpu']
if 'use_mkldnn' in kwargs:
cp.g_command_config_args['use_mkldnn'] = kwargs['use_mkldnn']
assert 'parallel_nn' not in kwargs, ("currently 'parallel_nn' is not " assert 'parallel_nn' not in kwargs, ("currently 'parallel_nn' is not "
"supported in v2 APIs.") "supported in v2 APIs.")
......
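With the hunk above applied, the MKL-DNN switch is forwarded through the v2 initializer in the same way as use_gpu; a minimal, hypothetical call (flag values chosen purely for illustration):

```python
import paddle.v2 as paddle

# both keyword arguments are copied into g_command_config_args, per the hunk above
paddle.init(use_gpu=False, use_mkldnn=True)
```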