Commit 253b19ed authored by wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_warning

@@ -21,7 +21,6 @@ addons:
      - python
      - python-pip
      - python2.7-dev
-     - python-numpy
      - python-wheel
      - libboost-dev
      - curl
@@ -35,8 +34,8 @@ before_install:
  - if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi
  # Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python
  # protobuf version.
  - sudo pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt
  - sudo pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker
  - curl https://glide.sh/get | bash
  - eval "$(GIMME_GO_VERSION=1.8.3 gimme)"
  - go get -u github.com/alecthomas/gometalinter
......
@@ -65,8 +65,8 @@ if(NOT CMAKE_BUILD_TYPE)
endif()

if(ANDROID)
    if(${CMAKE_SYSTEM_VERSION} VERSION_LESS "16")
        message(FATAL_ERROR "Unsupport standalone toolchains with Android API level lower than 16")
    endif()

    set(WITH_GPU OFF CACHE STRING
......
@@ -86,12 +86,13 @@ def layer.fc(X):

We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in the above illustrative example:

```
| C++ functions/functors | mul          | add          |             |          |
|------------------------|--------------|--------------|-------------|----------|
| C++ operator class     | mulOp        | addOp        | FCOp        |          |
| Python binding         | operator.mul | operator.add | operator.fc |          |
| Python function        |              |              |             | layer.fc |
```

This is how we differentiate layer and operators in PaddlePaddle:
......
# Design Doc: Operation Graph Based Parameter Server

## Abstract

We propose an approach to implement the parameter server. In this
approach, there is no fundamental difference between the trainer and
the parameter server: they both run subgraphs, but subgraphs of
different purposes.

## Background

The previous implementations of the parameter server do not run a
subgraph. Parameter initialization, optimizer computation, network
communication and checkpointing are implemented twice on both the
trainer and the parameter server.

It would be great if we could write code once and use it on both the
trainer and the parameter server: it reduces code duplication and
improves extensibility. Given that after the current refactor we are
representing everything as a computing graph on the trainer,
representing everything as a computing graph on the parameter server
becomes a natural extension.
## Design

### Graph Converter

The *graph converter* converts the user-defined operation (OP) graph
into subgraphs to be scheduled on different nodes with the following
steps:

1. OP placement: the OPs will be placed on different nodes according
   to a heuristic that minimizes the estimated total computation
   time. Currently we will use a simple heuristic that puts parameter
   variables on parameter server workers and everything else on trainer
   workers.

1. Add communication OPs to enable the communication between nodes.
   We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*.
Below is an example of converting the user-defined graph to the
subgraphs for the trainer and the parameter server:

<img src="src/local-graph.png" width="300"/>

After converting:

<img src="src/dist-graph.png" width="700"/>
1. The parameter variable W and its optimizer subgraph are placed on the parameter server.
1. Operators are added to the subgraphs.
   - *Send* sends data to the connected *Recv* operator. The
     scheduler on the receiving node will only schedule the *Recv*
     operator to run when the *Send* operator has run (the *Send* OP
     will mark the *Recv* OP runnable automatically).
   - *Enqueue* enqueues the input variable; it can block until space
     becomes available in the queue.
   - *Dequeue* outputs a configurable number of tensors from the
     queue. It will block until the queue has the required number of
     tensors. A minimal sketch of these queue semantics follows this
     list.
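To make the blocking behavior concrete, here is a minimal, illustrative C++ sketch of a queue that *Enqueue* and *Dequeue* could be built on. It is not Paddle's implementation; the name `TensorQueue` and the drain-everything `Dequeue` policy are assumptions for illustration.

```cpp
#include <condition_variable>
#include <deque>
#include <iterator>
#include <mutex>
#include <vector>

// Hypothetical queue backing the Enqueue/Dequeue OPs (illustrative only).
template <typename TensorT>
class TensorQueue {
 public:
  explicit TensorQueue(size_t capacity) : capacity_(capacity) {}

  // Enqueue can block until space becomes available in the queue.
  void Enqueue(TensorT t) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return buf_.size() < capacity_; });
    buf_.push_back(std::move(t));
    not_empty_.notify_all();
  }

  // Dequeue blocks until the queue holds at least `min_count` tensors,
  // then drains them (one possible policy for "a configurable number").
  std::vector<TensorT> Dequeue(size_t min_count) {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock,
                    [this, min_count] { return buf_.size() >= min_count; });
    std::vector<TensorT> out(std::make_move_iterator(buf_.begin()),
                             std::make_move_iterator(buf_.end()));
    buf_.clear();
    not_full_.notify_all();
    return out;
  }

 private:
  const size_t capacity_;
  std::deque<TensorT> buf_;
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
};
```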
### Benefits

- Model parallelism becomes easier to implement: it is an extension to
  the trainer - parameter server approach. We already have the
  communication OPs, but need to extend the graph converter's
  placement functionality.

- User-defined optimizers are easier to add - a user can now express
  one as a subgraph.

- No more duplicated logic inside the trainer and the parameter
  server, as mentioned in the background section.
### Challenges

- It might be hard for the graph converter to cut a general graph
  (without any hint about which subgraph is the optimizer). We may need
  to label which subgraph inside the OP graph is the optimizer.

- It's important to balance the parameter shards across multiple
  parameter servers. If a single parameter is very big (e.g., some
  word-embedding, fully connected, or softmax layers), we need to
  automatically partition that single parameter onto different
  parameter servers when possible (this works only for element-wise
  optimizers, which depend on each element of the parameter variable
  independently).
### Discussion

- In the "Async SGD" figure, the "W" variable on the parameter server
  could be read and written concurrently; what is our locking strategy?
  E.g., should each variable have a lock cpp method to be invoked by
  every OP, or should there be a lock OP?

- Can the Enqueue OP be implemented under our current tensor design
  (putting the input tensor into the queue tensor)?

- The *Dequeue* OP will have a variable number of outputs (depending on
  the `min_count` attribute); does our current design support it? (A
  similar question holds for the *Add* OP.)

### References

[1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
@@ -45,7 +45,19 @@ class GreaterThanChecker {
 public:
  explicit GreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
  void operator()(T& value) const {
    PADDLE_ENFORCE(value > lower_bound_, "larger_than check fails.");
  }

 private:
  T lower_bound_;
};

template <typename T>
class EqualGreaterThanChecker {
 public:
  explicit EqualGreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
  void operator()(T& value) const {
    PADDLE_ENFORCE_GE(value, lower_bound_, "equal_larger_than check fails.");
  }

 private:
  T lower_bound_;
@@ -115,6 +127,11 @@ class TypedAttrChecker {
    return *this;
  }

  TypedAttrChecker& EqualGreaterThan(const T& lower_bound) {
    value_checkers_.push_back(EqualGreaterThanChecker<T>(lower_bound));
    return *this;
  }

  // we can add more common limits, like LessThan(), Between()...
  TypedAttrChecker& SetDefault(const T& default_value) {
......
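For context, an attribute declared in an op maker can chain the new checker together with `SetDefault`; this mirrors how `mul_op` uses it later in this commit:

```cpp
// Declare an int attribute that defaults to 1 and must satisfy value >= 1.
AddAttr<int>("x_num_col_dims",
             "The axis at which the input is flattened to a matrix.")
    .SetDefault(1)
    .EqualGreaterThan(1);
```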
@@ -2,20 +2,20 @@

## Motivation

In a neural network, the backpropagation algorithm follows the chain rule, so we need to compound the gradient operators/expressions together with the chain rule. Every forward network needs a backward network to construct the full computation graph; the operator/expression's backward pass will be generated with respect to the forward pass.

## Backward Operator Registry

A backward network is built up with several backward operators. Backward operators take forward operators' inputs, outputs, and output gradients and then calculate their input gradients.

|                        | forward operator | backward operator                |
| ---------------------- | ---------------- | -------------------------------- |
| **Operator::inputs_**  | Inputs           | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs          | InputGradients                   |

In most cases, there is a one-to-one correspondence between the forward and backward operators. These correspondences are recorded by a global hash map (`OpInfoMap`). To follow the philosophy of a minimum core and make operators pluggable, the registry mechanism is introduced.

For example, if we have a `mul_op`, we can register its information and corresponding backward operator with the following macro:

```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
```
@@ -27,17 +27,17 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);

## Backward Operator Creating

Given a certain forward operator, we can get its corresponding backward operator by calling:

```cpp
OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op);
```

The function `BuildGradOp` will sequentially execute the following processes:

1. Get the `type_` of the given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.

2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` to map `inputs`, except those that are not necessary for gradient computing.

3. Add forward inputs' gradient variables into map `output`, and forward outputs' gradient variables into map `input`. A condensed sketch of these steps follows.
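The three steps can be condensed into a sketch like the following. This is illustrative pseudo-C++ only: the helper `GradVarNames`, the exact `VarNameMap` shape, and the `OpInfoMap`/`OpRegistry` calls are assumptions, and the real `BuildGradOp` also handles attributes and no-gradient variables.

```cpp
// Illustrative sketch of BuildGradOp, following steps 1-3 above.
OperatorBase* BuildGradOp(const OperatorBase* fwd_op) {
  // 1. Look up the backward operator's type in OpInfoMap.
  const std::string& grad_type =
      OpInfoMap::Instance().Get(fwd_op->Type()).grad_op_type_;

  // 2. Forward inputs, forward outputs, and output gradients all become
  //    inputs of the backward operator (gradient-irrelevant ones skipped).
  VarNameMap inputs = fwd_op->Inputs();                         // Inputs
  for (const auto& out : fwd_op->Outputs()) {
    inputs[out.first] = out.second;                             // Outputs
    inputs[GradVarName(out.first)] =
        GradVarNames(out.second);                       // OutputGradients
  }

  // 3. The gradients of the forward inputs are the backward outputs.
  VarNameMap outputs;
  for (const auto& in : fwd_op->Inputs()) {
    outputs[GradVarName(in.first)] = GradVarNames(in.second);
  }
  return OpRegistry::CreateOp(grad_type, inputs, outputs, fwd_op->Attrs());
}
```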
@@ -49,31 +49,31 @@ A backward network is a series of backward operators. The main idea of building

In our design, the network itself is also a kind of operator, so the operators contained by a big network may themselves be small networks.

Given a forward network, it generates the backward network. We only care about the gradients—`OutputGradients`, `InputGradients`.

1. Op

   When the input forward network is an Op, return its gradient Operator immediately.

2. NetOp

   When the input forward network is a NetOp, it needs to call the sub NetOp/Operators' backward function recursively. During the process, we need to collect the `OutputGradients` names according to the forward NetOp.

   **Shared variable**. As illustrated in the pictures, two operators' `Output` `Gradient` will overwrite their shared input variable.

<p align="center">
<img src="./images/duplicate_op.png" width="50%" ><br/>

1. Shared variable in operators.

</p>

Sharing a variable between operators, or using the same input variable in multiple operators, leads to a duplicate gradient variable. As the demo above shows, we need to rename the gradient names recursively and add a generic add operator to replace the overwrite links; a sketch of this pass follows the figure below.

<p align="center">
<img src="images/duplicate_op2.png" width="50%" ><br/>

2. Replace shared variable's gradient with `Add` operator.

</p>
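Here is a minimal sketch of that renaming-plus-`Add` pass on a simplified operator representation. The flat `Op` struct and the `@RENAME@` suffix are assumptions for illustration, not Paddle's actual types.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative structures only; not Paddle's real Op/NetOp classes.
struct Op {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Rename gradient outputs that are written by more than one operator and
// append an `add` op that merges the renamed copies back into the shared name.
void InsertAddOps(std::vector<Op>* net) {
  std::map<std::string, std::vector<std::pair<size_t, size_t>>> writers;
  for (size_t i = 0; i < net->size(); ++i) {
    for (size_t j = 0; j < (*net)[i].outputs.size(); ++j) {
      writers[(*net)[i].outputs[j]].push_back({i, j});
    }
  }
  for (const auto& kv : writers) {
    if (kv.second.size() < 2) continue;  // not shared, nothing to do
    Op add{"add", {}, {kv.first}};
    int uid = 0;
    for (const auto& w : kv.second) {
      std::string renamed = kv.first + "@RENAME@" + std::to_string(uid++);
      (*net)[w.first].outputs[w.second] = renamed;
      add.inputs.push_back(renamed);
    }
    net->push_back(add);  // sums the renamed copies into the original name
  }
}
```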
......
@@ -283,5 +283,14 @@ std::ostream& operator<<(std::ostream& os, const DDim& ddim) {
DDim::DDim(std::initializer_list<int64_t> init_list) {
  *this = make_ddim(init_list);
}

DDim flatten_to_2d(const DDim& src, int num_col_dims) {
  int rank = src.size();
  return make_ddim({product(slice_ddim(src, 0, num_col_dims)),
                    product(slice_ddim(src, num_col_dims, rank))});
}

DDim flatten_to_1d(const DDim& src) { return make_ddim({product(src)}); }

}  // namespace framework
}  // namespace paddle
@@ -115,6 +115,12 @@ int arity(const DDim& ddim);

std::ostream& operator<<(std::ostream&, const DDim&);

// Reshape a tensor to a matrix. The matrix's first dimension (column length)
// will be the product of the tensor's first `num_col_dims` dimensions.
DDim flatten_to_2d(const DDim& src, int num_col_dims);

DDim flatten_to_1d(const DDim& src);

}  // namespace framework
}  // namespace paddle
......
@@ -63,20 +63,35 @@ struct EigenTensor {

template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
struct EigenMatrix : public EigenTensor<T, 2, MajorType, IndexType> {
  static typename EigenMatrix::Type Reshape(Tensor& tensor, int num_col_dims) {
    int rank = tensor.dims_.size();
    PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
                   "`num_col_dims` must be between (0, rank_of_tensor).");
    return EigenMatrix::From(tensor,
                             flatten_to_2d(tensor.dims(), num_col_dims));
  }

  static typename EigenMatrix::ConstType Reshape(const Tensor& tensor,
                                                 int num_col_dims) {
    int rank = tensor.dims_.size();
    PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
                   "`num_col_dims` must be between (0, rank_of_tensor).");
    return EigenMatrix::From(tensor,
                             flatten_to_2d(tensor.dims(), num_col_dims));
  }
};

template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
struct EigenVector : public EigenTensor<T, 1, MajorType, IndexType> {
  // Flatten reshapes a Tensor into an EigenVector.
  static typename EigenVector::Type Flatten(Tensor& tensor) {
    return EigenVector::From(tensor, {product(tensor.dims_)});
  }

  static typename EigenVector::ConstType Flatten(const Tensor& tensor) {
    return EigenVector::From(tensor, {product(tensor.dims_)});
  }
};
......
@@ -108,5 +108,24 @@ TEST(Eigen, Matrix) {
  }
}
TEST(Eigen, MatrixReshape) {
Tensor t;
float* p = t.mutable_data<float>({2, 3, 6, 4}, platform::CPUPlace());
for (int i = 0; i < 2 * 3 * 6 * 4; ++i) {
p[i] = static_cast<float>(i);
}
EigenMatrix<float>::Type em = EigenMatrix<float>::Reshape(t, 2);
ASSERT_EQ(2 * 3, em.dimension(0));
ASSERT_EQ(6 * 4, em.dimension(1));
for (int i = 0; i < 2 * 3; i++) {
for (int j = 0; j < 6 * 4; j++) {
ASSERT_NEAR(i * 6 * 4 + j, em(i, j), 1e-6f);
}
}
}
}  // namespace framework
}  // namespace paddle
@@ -43,6 +43,9 @@ class Tensor {
  template <typename T, size_t D, int MajorType, typename IndexType>
  friend struct EigenTensor;

  template <typename T, int MajorType, typename IndexType>
  friend struct EigenMatrix;

  template <typename T, int MajorType, typename IndexType>
  friend struct EigenVector;
......
@@ -148,5 +148,13 @@ inline Tensor& Tensor::Resize(const DDim& dims) {

inline const DDim& Tensor::dims() const { return dims_; }
template <typename T>
inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) {
Tensor res;
res.ShareDataWith<T>(src);
res.Resize(flatten_to_2d(src.dims(), num_col_dims));
return res;
}
}  // namespace framework
}  // namespace paddle
@@ -262,3 +262,16 @@ TEST(Tensor, CopyFrom) {
  }
#endif
}
TEST(Tensor, ReshapeToMatrix) {
using namespace paddle::framework;
using namespace paddle::platform;
Tensor src;
int* src_ptr = src.mutable_data<int>({2, 3, 4, 9}, CPUPlace());
for (int i = 0; i < 2 * 3 * 4 * 9; ++i) {
src_ptr[i] = i;
}
Tensor res = ReshapeToMatrix<int>(src, 2);
ASSERT_EQ(res.dims()[0], 2 * 3);
ASSERT_EQ(res.dims()[1], 4 * 9);
}
\ No newline at end of file
@@ -62,14 +62,18 @@ void BatchNormBaseLayer::calFeatureMapSize() {
  const ImageConfig& conf = config_.inputs(0).image_conf();
  imageH_ = inputLayers_[0]->getOutput().getFrameHeight();
  imageW_ = inputLayers_[0]->getOutput().getFrameWidth();
  imageD_ = inputLayers_[0]->getOutput().getFrameDepth();
  if (0 == imageD_) imageD_ = conf.img_size_z();

  if (imageH_ == 0 && imageW_ == 0) {
    imageH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
    imageW_ = conf.img_size();
  } else {
    getOutput().setFrameHeight(imageH_);
    getOutput().setFrameWidth(imageW_);
    getOutput().setFrameDepth(imageD_);
  }
  imgPixels_ = imageH_ * imageW_ * imageD_;
}

}  // namespace paddle
@@ -80,6 +80,7 @@ protected:
  /// Height or width of input image feature.
  /// Both of them are 1 if the input is fully-connected layer.
  int imageD_;
  int imageH_;
  int imageW_;
  /// Height * Width.
......
@@ -37,7 +37,7 @@ bool CudnnBatchNormLayer::init(const LayerMap& layerMap,
}

void CudnnBatchNormLayer::reshape(int batchSize) {
  hl_tensor_reshape(ioDesc_, batchSize, channels_, imageH_ * imageD_, imageW_);
}

void CudnnBatchNormLayer::forward(PassType passType) {
@@ -104,7 +104,7 @@ void CudnnBatchNormLayer::forward(PassType passType) {
        EPS,
        batchSize,
        channels_,
        imageH_ * imageD_,
        imageW_);
  }
}
......
@@ -24,10 +24,12 @@ bool SwitchOrderLayer::init(const LayerMap& layerMap,
  /* Initialize the basic parent class */
  Layer::init(layerMap, parameterMap);

  auto& img_conf = config_.inputs(0).image_conf();
  size_t inD = img_conf.img_size_z();
  size_t inH =
      img_conf.has_img_size_y() ? img_conf.img_size_y() : img_conf.img_size();
  size_t inW = img_conf.img_size();
  size_t inC = img_conf.channels();
  inH = inH * inD;
  inDims_ = TensorShape({0, inC, inH, inW});
  outDims_ = TensorShape(4);
@@ -64,9 +66,10 @@ void SwitchOrderLayer::setInDims() {
  MatrixPtr input = inputLayers_[0]->getOutputValue();
  size_t batchSize = input->getHeight();
  inDims_.setDim(0, batchSize);
  int d = inputLayers_[0]->getOutput().getFrameDepth();
  d = (d == 0 ? 1 : d);
  int h = inputLayers_[0]->getOutput().getFrameHeight();
  if (h != 0) inDims_.setDim(2, h * d);
  int w = inputLayers_[0]->getOutput().getFrameWidth();
  if (w != 0) inDims_.setDim(3, w);
  int totalCount = input->getElementCnt();
......
@@ -1703,6 +1703,55 @@ TEST(Layer, BatchNormalizationLayer) {
#endif
}
void testBatchNorm3DLayer(const string& type, bool trans, bool useGpu) {
TestConfig config;
const int CHANNELS = 10;
const int IMG_SIZE = 16;
const int IMG_SIZE_Y = 8;
const int IMG_SIZE_Z = 8;
size_t size = CHANNELS * IMG_SIZE * IMG_SIZE_Y * IMG_SIZE_Z;
config.layerConfig.set_type(type);
config.layerConfig.set_size(size);
config.layerConfig.set_active_type("sigmoid");
config.biasSize = CHANNELS;
config.inputDefs.push_back({INPUT_DATA,
"layer_0",
/* dim= */ size,
/* paraSize= */ CHANNELS});
config.inputDefs.push_back({INPUT_DATA, "layer_1_running_mean", 1, CHANNELS});
config.inputDefs.back().isStatic = true;
config.inputDefs.push_back({INPUT_DATA, "layer_2_running_var", 1, CHANNELS});
config.inputDefs.back().isStatic = true;
LayerInputConfig* input = config.layerConfig.add_inputs();
config.layerConfig.add_inputs();
config.layerConfig.add_inputs();
ImageConfig* img_conf = input->mutable_image_conf();
img_conf->set_channels(CHANNELS);
img_conf->set_img_size(IMG_SIZE);
img_conf->set_img_size_y(IMG_SIZE_Y);
img_conf->set_img_size_z(IMG_SIZE_Z);
testLayerGrad(config,
"batch_norm",
64,
/* trans= */ trans,
useGpu,
/* useWeight */ true);
}
TEST(Layer, testBatchNorm3DLayer) {
testBatchNorm3DLayer("batch_norm", false, false);
#ifndef PADDLE_ONLY_CPU
testBatchNorm3DLayer("batch_norm", false, true);
if (hl_get_cudnn_lib_version() >= int(4000)) {
testBatchNorm3DLayer("cudnn_batch_norm", false, true);
}
#endif
}
void testConvOperator(bool isDeconv) {
  TestConfig config;
  const int NUM_FILTERS = 16;
......
@@ -18,17 +18,20 @@
namespace paddle {
namespace operators {

// The identity operator is an alias of the scale operator. This is also an
// example for creating an alias for an existing operator.
template <typename AttrType>
class IdentityOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  IdentityOpMaker(framework::OpProto *proto,
                  framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "The input tensor of identity operator.");
    AddOutput("Out", "The output tensor of identity operator.");
    AddComment(R"DOC(
The identity operator is an alias of the scale operator
with the attribute scale fixed to 1.0.
)DOC");
  }
};
......
@@ -71,8 +71,12 @@ void testIm2col() {
    context =
        new paddle::platform::CPUDeviceContext(paddle::platform::CPUPlace());
  } else {
#ifndef PADDLE_ONLY_CPU
    context =
        new paddle::platform::CUDADeviceContext(paddle::platform::GPUPlace());
#else
    PADDLE_THROW("no GPU support");
#endif  // PADDLE_ONLY_CPU
  }
  im2col(input, output_cfo, stride, stride, padding, padding, context);
  im2col_ocf(input, output_ocf, stride, stride, padding, padding, context);
......
@@ -25,18 +25,27 @@ class MulOp : public framework::OperatorWithKernel {
 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
    auto x_dims = ctx.Input<Tensor>("X")->dims();
    auto y_dims = ctx.Input<Tensor>("Y")->dims();
    int x_num_col_dims = Attr<int>("x_num_col_dims");
    int y_num_col_dims = Attr<int>("y_num_col_dims");

    PADDLE_ENFORCE(x_dims.size() > x_num_col_dims,
                   "The rank of input tensor X(%s) should be larger than "
                   "`mul_op`'s `x_num_col_dims`.",
                   ctx.op().Input("X"));
    PADDLE_ENFORCE(y_dims.size() > y_num_col_dims,
                   "The rank of input tensor Y(%s) should be larger than "
                   "`mul_op`'s `y_num_col_dims`.",
                   ctx.op().Input("Y"));

    auto x_mat_dims = framework::flatten_to_2d(x_dims, x_num_col_dims);
    auto y_mat_dims = framework::flatten_to_2d(y_dims, y_num_col_dims);

    PADDLE_ENFORCE_EQ(
        x_mat_dims[1], y_mat_dims[0],
        "First matrix's width must be equal to second matrix's height.");
    ctx.Output<Tensor>("Out")->Resize({x_mat_dims[0], y_mat_dims[1]});
  }
};
@@ -47,6 +56,23 @@ class MulOpMaker : public framework::OpProtoAndCheckerMaker {
    AddInput("X", "The first input of mul op");
    AddInput("Y", "The second input of mul op");
    AddOutput("Out", "The output of mul op");
    AddAttr<int>(
        "x_num_col_dims",
        R"DOC(mul_op can take tensors with more than two dimensions as input `X`;
            in that case, the tensor will be reshaped to a matrix. The matrix's first
            dimension (column length) will be the product of the tensor's first
            `num_col_dims` dimensions, and the matrix's second dimension (row length)
            will be the product of the tensor's last `rank - num_col_dims` dimensions.
        )DOC")
        .SetDefault(1)
        .EqualGreaterThan(1);
    AddAttr<int>(
        "y_num_col_dims",
        R"DOC(mul_op can take tensors with more than two dimensions as input `Y`;
            in that case, the tensor will be reshaped to a matrix, just like input `X`.
        )DOC")
        .SetDefault(1)
        .EqualGreaterThan(1);
AddComment(R"DOC( AddComment(R"DOC(
Two Element Mul Operator. Two Element Mul Operator.
...@@ -70,10 +96,20 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -70,10 +96,20 @@ class MulOpGrad : public framework::OperatorWithKernel {
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X")); auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y")); auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y"));
PADDLE_ENFORCE(x_dims[0] == out_dims[0],
"Out@GRAD M X N must equal to X dims 0, M "); auto x_mat_dims =
PADDLE_ENFORCE(y_dims[1] == out_dims[1], framework::flatten_to_2d(x_dims, Attr<int>("x_num_col_dims"));
"Out@GRAD M X N must equal to Y dims 1, N "); auto y_mat_dims =
framework::flatten_to_2d(y_dims, Attr<int>("y_num_col_dims"));
PADDLE_ENFORCE_EQ(
x_mat_dims[0], out_dims[0],
"The first dimension of Out@GRAD must equal to the first dimension of "
"the first operand.");
PADDLE_ENFORCE_EQ(
y_mat_dims[1], out_dims[1],
"The second dimension of Out@GRAD must equal to the second "
"dimension of the second operand.");
if (x_grad) x_grad->Resize(x_dims); if (x_grad) x_grad->Resize(x_dims);
if (y_grad) y_grad->Resize(y_dims); if (y_grad) y_grad->Resize(y_dims);
......
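As a concrete illustration of the two attributes above, with hypothetical shapes: `X = [2, 3, 4]` with `x_num_col_dims = 2` flattens to a `6 x 4` matrix, `Y = [4, 5]` with `y_num_col_dims = 1` stays `4 x 5`, so `Out` is resized to `[6, 5]`:

```cpp
// Hypothetical shapes, using the helpers introduced in this commit.
auto x_mat = framework::flatten_to_2d(framework::make_ddim({2, 3, 4}), 2);
// x_mat == [2 * 3, 4] == [6, 4]
auto y_mat = framework::flatten_to_2d(framework::make_ddim({4, 5}), 1);
// y_mat == [4, 5]; mul resizes Out to {x_mat[0], y_mat[1]} == [6, 5].
```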
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0
@@ -31,13 +31,25 @@ template <typename Place, typename T>
class MulKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    const Tensor* x = context.Input<Tensor>("X");
    const Tensor* y = context.Input<Tensor>("Y");
    Tensor* z = context.Output<Tensor>("Out");
    const Tensor x_matrix =
        x->dims().size() > 2
            ? framework::ReshapeToMatrix<T>(
                  *x, context.template Attr<int>("x_num_col_dims"))
            : *x;
    const Tensor y_matrix =
        y->dims().size() > 2
            ? framework::ReshapeToMatrix<T>(
                  *y, context.template Attr<int>("y_num_col_dims"))
            : *y;

    z->mutable_data<T>(context.GetPlace());
    auto* device_context =
        const_cast<platform::DeviceContext*>(context.device_context_);
    math::matmul<Place, T>(x_matrix, false, y_matrix, false, 1, z, 0,
                           device_context);
  }
};
@@ -45,23 +57,39 @@ template <typename Place, typename T>
class MulGradKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    int x_num_col_dims = ctx.template Attr<int>("x_num_col_dims");
    int y_num_col_dims = ctx.template Attr<int>("y_num_col_dims");
    const Tensor* x = ctx.Input<Tensor>("X");
    const Tensor* y = ctx.Input<Tensor>("Y");
    const Tensor x_matrix =
        x->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*x, x_num_col_dims)
                             : *x;
    const Tensor y_matrix =
        y->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*y, y_num_col_dims)
                             : *y;
    const Tensor* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));

    Tensor* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
    Tensor* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
    auto* device_context =
        const_cast<platform::DeviceContext*>(ctx.device_context_);
    if (dx) {
      dx->mutable_data<T>(ctx.GetPlace());
      Tensor dx_matrix = dx->dims().size() > 2 ? framework::ReshapeToMatrix<T>(
                                                     *dx, x_num_col_dims)
                                               : *dx;
      // dx = dout * y'. dx: M x K, dout : M x N, y : K x N
      math::matmul<Place, T>(*dout, false, y_matrix, true, 1, &dx_matrix, 0,
                             device_context);
    }
    if (dy) {
      dy->mutable_data<T>(ctx.GetPlace());
      Tensor dy_matrix = dy->dims().size() > 2 ? framework::ReshapeToMatrix<T>(
                                                     *dy, y_num_col_dims)
                                               : *dy;
      // dy = x' * dout. dy K x N, dout : M x N, x : M x K
      math::matmul<Place, T>(x_matrix, true, *dout, false, 1, &dy_matrix, 0,
                             device_context);
    }
  }
};
......
@@ -25,14 +25,19 @@ class RowwiseAddOp : public framework::OperatorWithKernel {
 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
    auto x_dims = ctx.Input<Tensor>("X")->dims();
    auto b_dims = ctx.Input<Tensor>("b")->dims();
    PADDLE_ENFORCE_GT(
        x_dims.size(), b_dims.size(),
        "The rank of input `X` must be larger than the one of input `b`.");

    int num_col_dims = x_dims.size() - b_dims.size();

    PADDLE_ENFORCE_EQ(
        framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
        "The width of two operands must be the same.");
    PADDLE_ENFORCE_EQ(ctx.OutputSize("Out"), 1, "The output size must be 1");
    ctx.Output<Tensor>("Out")->Resize(x_dims);
  }
};
@@ -61,13 +66,20 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel {
    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("b"), "b should not be null");
    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
                            "Input(Out@GRAD) should not be null");
    auto x_dims = ctx.Input<Tensor>("X")->dims();
    auto b_dims = ctx.Input<Tensor>("b")->dims();
    PADDLE_ENFORCE_GT(
        x_dims.size(), b_dims.size(),
        "The rank of input `X` must be larger than the one of input `b`.");

    int num_col_dims = x_dims.size() - b_dims.size();
    PADDLE_ENFORCE_EQ(
        framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
        "The width of two operands must be the same.");

    auto *dx = ctx.Output<Tensor>(framework::GradVarName("X"));
    auto *db = ctx.Output<Tensor>(framework::GradVarName("b"));
    if (dx) dx->Resize(x_dims);
    if (db) db->Resize(b_dims);
  }
};
......
@@ -33,10 +33,12 @@ class RowwiseAddKernel : public framework::OpKernel {
  void Compute(const framework::ExecutionContext& context) const override {
    auto out = context.Output<Tensor>("Out");
    out->mutable_data<T>(context.GetPlace());
    int num_col_dims = context.Input<Tensor>("X")->dims().size() -
                       context.Input<Tensor>("b")->dims().size();
    auto input =
        EigenMatrix<T>::Reshape(*context.Input<Tensor>("X"), num_col_dims);
    auto bias = EigenVector<T>::Flatten(*context.Input<Tensor>("b"));
    auto output = EigenMatrix<T>::Reshape(*out, num_col_dims);

    const int bias_size = bias.dimension(0);
    const int rest_size = input.size() / bias_size;
@@ -54,12 +56,15 @@ class RowwiseAddGradKernel : public framework::OpKernel {
    auto* dout = context.Input<Tensor>(framework::GradVarName("Out"));
    auto* dx = context.Output<Tensor>(framework::GradVarName("X"));
    auto* db = context.Output<Tensor>(framework::GradVarName("b"));
    int num_col_dims = context.Input<Tensor>("X")->dims().size() -
                       context.Input<Tensor>("b")->dims().size();

    auto out_grad = EigenMatrix<T>::Reshape(*dout, num_col_dims);
    auto place = context.GetEigenDevice<Place>();
    if (dx) {
      dx->mutable_data<T>(context.GetPlace());
      EigenMatrix<T>::Reshape(*dx, num_col_dims).device(place) = out_grad;
    }
    if (db) {
......
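To see what the reshape does here, take hypothetical shapes `X = [2, 3, 4]` and `b = [3, 4]`: `num_col_dims` is `3 - 2 = 1`, so `X` is viewed as a `2 x 12` matrix and the flattened 12-element bias is added to each of its 2 rows:

```cpp
// Hypothetical shapes for rowwise_add: X = [2, 3, 4], b = [3, 4].
int num_col_dims = 3 - 2;  // rank(X) - rank(b)
auto x_mat = framework::flatten_to_2d(framework::make_ddim({2, 3, 4}),
                                      num_col_dims);  // [2, 12]
// The bias is flattened to length 3 * 4 == 12 and broadcast over the
// first dimension of x_mat.
```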
@@ -44,11 +44,13 @@ class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
The equation is: Out = scale*X
)DOC");
    AddAttr<AttrType>("scale", "The scaling factor of the scale operator.")
        .SetDefault(1.0);
  }
};

// The operator to calculate gradients of a scale operator is just the scale
// operator itself.
// Grad(Out=scale(X)) => Grad(X) = scale(Grad(Out))
template <typename AttrType>
class ScaleGradOp : public NetOp {
......
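The comment above is just the chain rule applied to an elementwise op; a one-line sketch of the derivation, with `c` the `scale` attribute:

```
Out = c * X  =>  dL/dX = (dOut/dX) * (dL/dOut) = c * dL/dOut
```

Hence the gradient of a scale op is another scale op with the same `scale` value.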
@@ -51,7 +51,7 @@ the other dimensions in the K-dimensional vector input. Then the ratio of the
exponential of the given dimension and the sum of exponential values of all
the other dimensions is the output of the softmax operator.

For each row `i` and each column `j` in input X, we have:

    Y[i, j] = exp(X[i, j]) / sum_j(exp(X[i, j]))

)DOC");
@@ -64,14 +64,15 @@ class SoftmaxOpGrad : public framework::OperatorWithKernel {
 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) should not be null.");
    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Y")),
                            "Input(Y@GRAD) should not be null.");
    PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("Y")->dims(),
                      ctx.Input<Tensor>(framework::GradVarName("Y"))->dims(),
                      "Input(Y) and its gradients should have the same shape.");

    ctx.Output<Tensor>(framework::GradVarName("X"))
        ->Resize(ctx.Input<Tensor>("X")->dims());
  }
};
......
@@ -28,12 +28,12 @@ template <typename Place, typename T>
class SoftmaxKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& context) const override {
    auto X = context.Input<Tensor>("X");
    auto Y = context.Output<Tensor>("Y");
    Y->mutable_data<T>(context.GetPlace());

    auto logits = EigenMatrix<T>::From(*X);
    auto softmax = EigenMatrix<T>::From(*Y);

    const int kBatchDim = 0;
    const int kClassDim = 1;
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/top_k_op.h"
namespace paddle {
namespace operators {
class TopkOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"),
"Input of TopkOP must be initialized.");
auto *input = ctx.Input<framework::Tensor>("X");
const int k = static_cast<int>(ctx.Attr<int>("k"));
PADDLE_ENFORCE_GE(k, 1, "k must >= 1");
PADDLE_ENFORCE_GE(input->dims().size(), 1, "input must have >= 1d shape");
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k,
"input must have >= k columns");
framework::DDim dims = input->dims();
dims[dims.size() - 1] = k;
ctx.Output<Tensor>("Out")->Resize(dims);
ctx.Output<Tensor>("Indices")->Resize(dims);
}
};
class TopkOpMaker : public framework::OpProtoAndCheckerMaker {
public:
TopkOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of Topk op");
AddOutput("Out", "The output tensor of Topk op");
AddOutput("Indices", "The indices of Topk elements of input");
AddComment(
R"DOC(If the input is a vector (1d tensor), finds the k largest entries in the vector and outputs their values and indices as vectors. Thus values[j] is the j-th largest entry in input, and its index is indices[j].
For matrices, computes the top k entries in each row. )DOC");
AddAttr<int>("k",
"Number of top elements to look for along the last "
"dimension (along each row for matrices).")
.SetDefault(1);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(top_k, ops::TopkOp, ops::TopkOpMaker);
REGISTER_OP_CPU_KERNEL(top_k,
ops::TopkKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/op_registry.h"
#include "paddle/platform/assert.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
struct Pair {
__device__ __forceinline__ Pair() {}
__device__ __forceinline__ Pair(T value, int id) : v(value), id(id) {}
  __device__ __forceinline__ void set(T value, int id) {
    v = value;
    this->id = id;  // disambiguate the member from the shadowing parameter
  }
__device__ __forceinline__ void operator=(const Pair<T>& in) {
v = in.v;
id = in.id;
}
__device__ __forceinline__ bool operator<(const T value) const {
return (v < value);
}
__device__ __forceinline__ bool operator<(const Pair<T>& in) const {
return (v < in.v) || ((v == in.v) && (id > in.id));
}
__device__ __forceinline__ bool operator>(const Pair<T>& in) const {
return (v > in.v) || ((v == in.v) && (id < in.id));
}
T v;
int id;
};
template <typename T>
__device__ __forceinline__ void AddTo(Pair<T> topk[], const Pair<T>& p,
int beam_size) {
for (int k = beam_size - 2; k >= 0; k--) {
if (topk[k] < p) {
topk[k + 1] = topk[k];
} else {
topk[k + 1] = p;
return;
}
}
topk[0] = p;
}
template <typename T, int beam_size>
__device__ __forceinline__ void AddTo(Pair<T> topk[], const Pair<T>& p) {
for (int k = beam_size - 2; k >= 0; k--) {
if (topk[k] < p) {
topk[k + 1] = topk[k];
} else {
topk[k + 1] = p;
return;
}
}
topk[0] = p;
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* src, int idx,
int dim, int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < src[idx]) {
Pair<T> tmp(src[idx], idx);
AddTo<T>(topk, tmp, beam_size);
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* src, int idx,
int dim, const Pair<T>& max,
int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < src[idx]) {
Pair<T> tmp(src[idx], idx);
if (tmp < max) {
AddTo<T>(topk, tmp, beam_size);
}
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* val, int* col,
int idx, int dim, int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < val[idx]) {
Pair<T> tmp(val[idx], col[idx]);
AddTo<T>(topk, tmp, beam_size);
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* val, int* col,
int idx, int dim, const Pair<T>& max,
int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < val[idx]) {
Pair<T> tmp(val[idx], col[idx]);
if (tmp < max) {
AddTo<T>(topk, tmp, beam_size);
}
}
idx += BlockSize;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam,
int beam_size, const T* src,
bool& firstStep, bool& is_empty,
Pair<T>& max, int dim,
const int tid) {
if (beam > 0) {
int length = beam < beam_size ? beam : beam_size;
if (firstStep) {
firstStep = false;
GetTopK<T, BlockSize>(topk, src, tid, dim, length);
} else {
for (int k = 0; k < MaxLength; k++) {
if (k < MaxLength - beam) {
topk[k] = topk[k + beam];
} else {
topk[k].set(-INFINITY, -1);
}
}
if (!is_empty) {
GetTopK<T, BlockSize>(topk + MaxLength - beam, src, tid, dim, max,
length);
}
}
max = topk[MaxLength - 1];
if (max.v == -1) is_empty = true;
beam = 0;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam,
int beam_size, const T* val,
int* col, bool& firstStep,
bool& is_empty, Pair<T>& max,
int dim, const int tid) {
if (beam > 0) {
int length = beam < beam_size ? beam : beam_size;
if (firstStep) {
firstStep = false;
GetTopK<T, BlockSize>(topk, val, col, tid, dim, length);
} else {
for (int k = 0; k < MaxLength; k++) {
if (k < MaxLength - beam) {
topk[k] = topk[k + beam];
} else {
topk[k].set(-INFINITY, -1);
}
}
if (!is_empty) {
GetTopK<T, BlockSize>(topk + MaxLength - beam, val, col, tid, dim, max,
length);
}
}
max = topk[MaxLength - 1];
if (max.v == -1) is_empty = true;
beam = 0;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void BlockReduce(Pair<T>* sh_topk, int* maxid,
Pair<T> topk[], T** topVal,
int** topIds, int& beam, int& k,
const int tid, const int warp) {
while (true) {
__syncthreads();
if (tid < BlockSize / 2) {
if (sh_topk[tid] < sh_topk[tid + BlockSize / 2]) {
maxid[tid] = tid + BlockSize / 2;
} else {
maxid[tid] = tid;
}
}
__syncthreads();
for (int stride = BlockSize / 4; stride > 0; stride = stride / 2) {
if (tid < stride) {
if (sh_topk[maxid[tid]] < sh_topk[maxid[tid + stride]]) {
maxid[tid] = maxid[tid + stride];
}
}
__syncthreads();
}
__syncthreads();
if (tid == 0) {
**topVal = sh_topk[maxid[0]].v;
**topIds = sh_topk[maxid[0]].id;
(*topVal)++;
(*topIds)++;
}
if (tid == maxid[0]) beam++;
if (--k == 0) break;
__syncthreads();
if (tid == maxid[0]) {
if (beam < MaxLength) {
sh_topk[tid] = topk[beam];
}
}
if (maxid[0] / 32 == warp) {
if (__shfl(beam, (maxid[0]) % 32, 32) == MaxLength) break;
}
}
}
/**
 * Each block computes one sample.
 * In a block:
 * 1. every thread gets the top MaxLength values;
 * 2. merge into sh_topk, block reduce and get the max value;
 * 3. go to the second step, until one thread's topk value is null;
 * 4. go to the first step, until the top k values are obtained.
 */
template <typename T, int MaxLength, int BlockSize>
__global__ void KeMatrixTopK(T* output, int output_stride, int* indices,
const T* src, int lds, int dim, int k) {
__shared__ Pair<T> sh_topk[BlockSize];
__shared__ int maxid[BlockSize / 2];
const int tid = threadIdx.x;
const int warp = threadIdx.x / 32;
output += blockIdx.x * output_stride;
indices += blockIdx.x * k;
Pair<T> topk[MaxLength];
int beam = MaxLength;
Pair<T> max;
bool is_empty = false;
bool firststep = true;
for (int k = 0; k < MaxLength; k++) {
topk[k].set(-INFINITY, -1);
}
while (k) {
ThreadGetTopK<T, MaxLength, BlockSize>(topk, beam, k,
src + blockIdx.x * lds, firststep,
is_empty, max, dim, tid);
sh_topk[tid] = topk[0];
BlockReduce<T, MaxLength, BlockSize>(sh_topk, maxid, topk, &output,
&indices, beam, k, tid, warp);
}
}
template <typename T>
class TopkOpCUDAKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace.");
auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
size_t k = static_cast<int>(ctx.Attr<int>("k"));
const T* input_data = input->data<T>();
T* output_data = output->mutable_data<T>(ctx.GetPlace());
// FIXME(typhoonzero): data is always converted to type T?
int* indices_data = indices->mutable_data<int>(ctx.GetPlace());
size_t input_height = input->dims()[0];
size_t input_width = input->dims()[1];
if (k > input_width) k = input_width;
// NOTE: pass lds and dim same to input width.
// NOTE: old matrix implementation of stride is different to eigen.
// TODO(typhoonzero): launch kernel on specified stream.
// TODO(typhoonzero): refine this kernel.
dim3 threads(256, 1);
dim3 grid(input_height, 1);
KeMatrixTopK<T, 5, 256><<<grid, threads>>>(
output_data, output->dims()[1], indices_data, input_data, input_width,
input_width, int(k));
}
};
} // namespace operators
} // namespace paddle
REGISTER_OP_GPU_KERNEL(top_k, paddle::operators::TopkOpCUDAKernel<float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <iostream>
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename Place, typename T>
class TopkKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
// Get the top k elements of each row of input tensor
// FIXME: only deals with a matrix (2-D tensor).
auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
// k is determined by Attr
const size_t k = static_cast<int>(ctx.Attr<int>("k"));
T* output_data = output->mutable_data<T>(ctx.GetPlace());
T* indices_data = indices->mutable_data<T>(ctx.GetPlace());
auto eg_input = EigenMatrix<T>::From(*input);
// reshape input to a flattened matrix (like flat_inner_dims)
framework::DDim inputdims = input->dims();
const size_t row = framework::product(
framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];
Eigen::DSizes<int, 2> flat2dims(row, col);
// NOTE: eigen shape doesn't affect paddle tensor.
eg_input.reshape(flat2dims);
for (size_t i = 0; i < row; i++) {
std::vector<std::pair<T, size_t>> vec;
for (size_t j = 0; j < col; j++) {
vec.push_back(std::pair<T, size_t>(eg_input(i, j), j));
}
std::partial_sort(
vec.begin(), vec.begin() + k, vec.end(),
[](const std::pair<T, size_t>& l, const std::pair<T, size_t>& r) {
return l.first > r.first;
});
for (size_t j = 0; j < k; j++) {
output_data[i * k + j] = vec[j].first;
indices_data[i * k + j] = vec[j].second;
}
}
}
};
} // namespace operators
} // namespace paddle
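The per-row loop above is just a partial sort in descending order. As a sanity check, a NumPy sketch of the same computation (a hypothetical reference helper, not part of this patch; tie-breaking may differ from std::partial_sort):

```
import numpy as np

def numpy_top_k(x, k):
    # Top-k values and their column indices per row, descending,
    # matching the "l.first > r.first" comparator above.
    indices = np.argsort(-x, axis=1)[:, :k]
    values = np.take_along_axis(x, indices, axis=1)
    return values, indices

x = np.random.random((32, 84)).astype("float32")
values, indices = numpy_top_k(x, k=1)
assert values.shape == (32, 1) and indices.shape == (32, 1)
```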
...@@ -49,6 +49,7 @@ USE_OP(minus);
USE_OP(cos_sim);
USE_CPU_ONLY_OP(gather);
USE_CPU_ONLY_OP(scatter);
USE_OP(top_k);
USE_OP(squared_l2_distance);

namespace paddle {
...
...@@ -37,7 +37,7 @@ Configuring cmake in /paddle/build ...
-DWITH_PYTHON=${WITH_PYTHON:-ON}
-DWITH_SWIG_PY=${WITH_SWIG_PY:-ON}
-DCUDNN_ROOT=/usr/
-DWITH_STYLE_CHECK=${WITH_STYLE_CHECK:-OFF} -DWITH_STYLE_CHECK=${WITH_STYLE_CHECK:-ON}
-DWITH_TESTING=${WITH_TESTING:-ON}
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON
========================================
...
...@@ -320,6 +320,9 @@ void loadFileList(const std::string& fileListFileName,
}

double getMemoryUsage() {
#if defined(__ANDROID__)
  return 0.0;
#else
  FILE* fp = fopen("/proc/meminfo", "r");
  CHECK(fp) << "failed to fopen /proc/meminfo";
  size_t bufsize = 256 * sizeof(char);
...@@ -357,6 +360,7 @@ double getMemoryUsage() {
  delete[] buf;
  double usedMem = 1.0 - 1.0 * (freeMem + bufMem + cacheMem) / totalMem;
  return usedMem;
#endif
}

SyncThreadPool* getGlobalSyncThreadPool() {
...
...@@ -33,6 +33,13 @@ limitations under the License. */
#include "Flags.h"
#include "hl_gpu.h"

#if defined(__ANDROID__) && (__ANDROID_API__ < 21)
inline int rand_r(unsigned int* seedp) {
  (void)seedp;
  return rand();
}
#endif

/**
 * Loop over the elements in a container
 * TODO(yuyang18): It's this foreach useful? Why not use C++ 11 foreach,
...
...@@ -271,6 +271,7 @@ message ImageConfig {
// The size of input feature map.
required uint32 img_size = 8;
optional uint32 img_size_y = 9;
optional uint32 img_size_z = 10 [ default = 1 ];
}

message PriorBoxConfig {
...@@ -519,6 +520,7 @@ message LayerConfig {
// for HuberRegressionLoss
optional double delta = 57 [ default = 1.0 ];

// for 3D data
optional uint64 depth = 58 [ default = 1 ];

// for switch order layer
...
...@@ -1332,6 +1332,12 @@ def parse_image(image, input_layer_name, image_conf):
        get_img_size(input_layer_name, image_conf.channels)


def parse_image3d(image, input_layer_name, image_conf):
    image_conf.channels = image.channels
    image_conf.img_size, image_conf.img_size_y, image_conf.img_size_z = \
        get_img3d_size(input_layer_name, image_conf.channels)


def parse_norm(norm, input_layer_name, norm_conf):
    norm_conf.norm_type = norm.norm_type
    config_assert(
...@@ -2365,9 +2371,11 @@ class BatchNormLayer(LayerBase):
                 name,
                 inputs,
                 bias=True,
                 img3D=False,
                 use_global_stats=True,
                 moving_average_fraction=0.9,
                 batch_norm_type=None,
                 mean_var_names=None,
                 **xargs):
        if inputs is None:
            inputs = []
...@@ -2409,24 +2417,69 @@ class BatchNormLayer(LayerBase):
        input_layer = self.get_input_layer(0)
        image_conf = self.config.inputs[0].image_conf
        if img3D:
            parse_image3d(self.inputs[0].image, input_layer.name, image_conf)
            # Only pass the width and height of input to batch_norm layer
            # when either of it is non-zero.
            if input_layer.width != 0 or input_layer.height != 0:
                self.set_cnn_layer(
                    input_layer_name=name,
                    depth=image_conf.img_size_z,
                    height=image_conf.img_size_y,
                    width=image_conf.img_size,
                    channels=image_conf.channels,
                    is_print=True)
            else:
                self.set_layer_size(input_layer.size)
        else:
            parse_image(self.inputs[0].image, input_layer.name, image_conf)
            # Only pass the width and height of input to batch_norm layer
            # when either of it is non-zero.
            if input_layer.width != 0 or input_layer.height != 0:
                self.set_cnn_layer(
                    input_layer_name=name,
                    height=image_conf.img_size_y,
                    width=image_conf.img_size,
                    channels=image_conf.channels,
                    is_print=True)
            else:
                self.set_layer_size(input_layer.size)

        psize = self.calc_parameter_size(image_conf)
        dims = [1, psize]
        if mean_var_names is not None:
            assert len(mean_var_names) == 2
            self.inputs[1].parameter_name = mean_var_names[0]
            self.inputs[2].parameter_name = mean_var_names[1]

        self.create_input_parameter(0, psize)
        self.create_input_parameter(1, psize, dims)
        self.create_input_parameter(2, psize, dims)
        self.create_bias_parameter(bias, psize)
    def set_cnn_layer(self,
                      input_layer_name,
                      depth=None,
                      height=None,
                      width=None,
                      channels=None,
                      is_print=True):
        depthIsNone = False
        if depth is None:
            depth = 1
            depthIsNone = True
        size = depth * height * width * channels
        self.set_layer_size(size)
        self.set_layer_height_width(height, width)
        self.set_layer_depth(depth)
        if is_print and depthIsNone:
            print("output for %s: c = %d, h = %d, w = %d, size = %d" %
                  (input_layer_name, channels, height, width, size))
        elif is_print:
            print("output for %s: c = %d, d = %d, h = %d, w = %d, size = %d" %
                  (input_layer_name, channels, depth, height, width, size))

    def calc_parameter_size(self, image_conf):
        return image_conf.channels
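The bookkeeping in `set_cnn_layer` reduces to `size = depth * height * width * channels`. A quick check against the 3-D batch-norm test configuration that appears later in this commit (channels=1, depth=3, height=6, width=20):

```
channels, depth, height, width = 1, 3, 6, 20
size = depth * height * width * channels
assert size == 360  # matches "size: 360" in the generated BatchNorm3D protobuf below
```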
...@@ -2688,9 +2741,20 @@ class AddToLayer(LayerBase):
        super(AddToLayer, self).__init__(
            name, 'addto', 0, inputs=inputs, **xargs)
        config_assert(len(inputs) > 0, 'inputs cannot be empty for AddToLayer')

        if len(self.inputs) > 1:
            for input_index in xrange(len(self.inputs)):
                assert self.get_input_layer(0).height == self.get_input_layer(
                    input_index).height
                assert self.get_input_layer(0).width == self.get_input_layer(
                    input_index).width
                assert self.get_input_layer(0).depth == self.get_input_layer(
                    input_index).depth

        self.set_layer_size(self.get_input_layer(0).size)
        self.set_layer_height_width(self.get_input_layer(0).height, \
                                    self.get_input_layer(0).width)
        self.set_layer_depth(self.get_input_layer(0).depth)
        self.create_bias_parameter(bias, self.config.size)
...@@ -3370,11 +3434,20 @@ class ConcatenateLayer(LayerBase):
            name, 'concat', 0, inputs=inputs, **xargs)
        size = 0
        for input_index in xrange(len(self.inputs)):
            assert self.get_input_layer(0).height == self.get_input_layer(
                input_index).height
            assert self.get_input_layer(0).width == self.get_input_layer(
                input_index).width
            assert self.get_input_layer(0).depth == self.get_input_layer(
                input_index).depth
            input_layer = self.get_input_layer(input_index)
            input = self.inputs[input_index]
            if self.config.size == 0:
                size += input_layer.size

        self.set_layer_height_width(self.get_input_layer(0).height, \
                                    self.get_input_layer(0).width)
        self.set_layer_depth(self.get_input_layer(0).depth)
        self.set_layer_size(size)
...
...@@ -354,6 +354,10 @@ class LayerOutput(object):
    def height(self):
        return cp.g_layer_map[self.full_name].height

    @property
    def depth(self):
        return cp.g_layer_map[self.full_name].depth

    def set_input(self, input):
        """
        Set the input for a memory layer. Can only be used for memory layer
...@@ -943,7 +947,7 @@ def data_layer(name, size, depth=None, height=None, width=None,
    if height is not None and width is not None:
        num_filters = size / (width * height * depth)
        assert num_filters * width * height * depth == size, \
            "size=%s width=%s height=%s depth=%s" % (size, width, height, depth)
    return LayerOutput(name, LayerType.DATA, size=size, num_filters=num_filters)
...@@ -2953,13 +2957,15 @@ def img_cmrnorm_layer(input,
def batch_norm_layer(input,
                     act=None,
                     name=None,
                     img3D=False,
                     num_channels=None,
                     bias_attr=None,
                     param_attr=None,
                     layer_attr=None,
                     batch_norm_type=None,
                     moving_average_fraction=0.9,
                     use_global_stats=None,
                     mean_var_names=None):
    """
    Batch Normalization Layer. The notation of this layer is as follows.
...@@ -3026,6 +3032,8 @@ def batch_norm_layer(input,
                          :math:`runningMean = newMean*(1-factor)
                                  + runningMean*factor`
    :type moving_average_fraction: float.
    :param mean_var_names: [mean name, variance name]
    :type mean_var_names: string list
    :return: LayerOutput object.
    :rtype: LayerOutput
    """
...@@ -3039,6 +3047,7 @@ def batch_norm_layer(input,
        (batch_norm_type == "cudnn_batch_norm")
    l = Layer(
        name=name,
        img3D=img3D,
        inputs=Input(
            input.name, image=Image(channels=num_channels), **param_attr.attr),
        active_type=act.name,
...@@ -3047,6 +3056,7 @@ def batch_norm_layer(input,
        bias=ParamAttr.to_bias(bias_attr),
        moving_average_fraction=moving_average_fraction,
        use_global_stats=use_global_stats,
        mean_var_names=mean_var_names,
        **ExtraLayerAttribute.to_kwargs(layer_attr))

    return LayerOutput(
...@@ -6410,7 +6420,7 @@ def gated_unit_layer(input,
@wrap_name_default('switch_order')
def switch_order_layer(input,
                       name=None,
                       reshape_axis=None,
                       act=None,
                       layer_attr=None):
    """
...@@ -6421,8 +6431,9 @@ def switch_order_layer(input,
    The example usage is:

    .. code-block:: python
       reshape_axis = 3
       switch = switch_order(input=layer, name='switch', reshape_axis=reshape_axis)

    :param input: The input layer.
    :type input: LayerOutput
...@@ -6434,6 +6445,11 @@ def switch_order_layer(input,
    :rtype: LayerOutput
    """
    assert isinstance(input, LayerOutput)
    assert reshape_axis != None and (reshape_axis > 0 and reshape_axis < 4)
    height = [ele for ele in xrange(reshape_axis)]
    width = [ele for ele in range(reshape_axis, 4)]
    reshape = {'height': height, 'width': width}

    l = Layer(
        name=name,
        inputs=input.name,
...
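The new `reshape_axis` argument replaces the hand-written `reshape` dict: axes `[0, reshape_axis)` form the height group and `[reshape_axis, 4)` the width group. A small sketch of the mapping computed above:

```
reshape_axis = 3  # must satisfy 0 < reshape_axis < 4
height = [axis for axis in range(reshape_axis)]    # [0, 1, 2]
width = [axis for axis in range(reshape_axis, 4)]  # [3]
reshape = {'height': height, 'width': width}
assert reshape == {'height': [0, 1, 2], 'width': [3]}
```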
...@@ -10,6 +10,6 @@ test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_la
test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer
test_kmax_seq_socre_layer test_sub_nested_seq_select_layer test_scale_shift_layer
test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer
test_conv3d_layer test_deconv3d_layer test_BatchNorm3D)

export whole_configs=(test_split_datasource)
...@@ -62,6 +62,7 @@ layers {
moving_average_fraction: 0.9
height: 227
width: 227
depth: 1
}
layers {
name: "__crmnorm_0__"
...
...@@ -62,6 +62,7 @@ layers {
moving_average_fraction: 0.9
height: 256
width: 256
depth: 1
}
layers {
name: "__crmnorm_0__"
...
type: "nn"
layers {
name: "data3D"
type: "data"
size: 360
active_type: ""
height: 6
width: 20
depth: 3
}
layers {
name: "__batch_norm_0__"
type: "batch_norm"
size: 360
active_type: "relu"
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w0"
image_conf {
channels: 1
img_size: 20
img_size_y: 6
img_size_z: 3
}
}
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w1"
}
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w2"
}
bias_parameter_name: "___batch_norm_0__.wbias"
moving_average_fraction: 0.9
height: 6
width: 20
depth: 3
}
parameters {
name: "___batch_norm_0__.w0"
size: 1
initial_mean: 1.0
initial_std: 0.0
initial_strategy: 0
initial_smart: false
}
parameters {
name: "___batch_norm_0__.w1"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
is_static: true
is_shared: true
}
parameters {
name: "___batch_norm_0__.w2"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
is_static: true
is_shared: true
}
parameters {
name: "___batch_norm_0__.wbias"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
}
input_layer_names: "data3D"
output_layer_names: "__batch_norm_0__"
sub_models {
name: "root"
layer_names: "data3D"
layer_names: "__batch_norm_0__"
input_layer_names: "data3D"
output_layer_names: "__batch_norm_0__"
is_recurrent_layer_group: false
}
...@@ -74,6 +74,9 @@ layers {
inputs {
input_layer_name: "__bidirectional_gru_0___bw"
}
height: 0
width: 0
depth: 1
}
parameters {
name: "___bidirectional_gru_0___fw_transform.w0"
...
...@@ -16,6 +16,9 @@ layers {
inputs {
input_layer_name: "data"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_1__"
...@@ -28,6 +31,9 @@ layers {
inputs {
input_layer_name: "__addto_0__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_2__"
...@@ -40,6 +46,9 @@ layers {
inputs {
input_layer_name: "__addto_1__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_3__"
...@@ -52,6 +61,9 @@ layers {
inputs {
input_layer_name: "__addto_2__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_4__"
...@@ -64,6 +76,9 @@ layers {
inputs {
input_layer_name: "__addto_3__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_5__"
...@@ -76,6 +91,9 @@ layers {
inputs {
input_layer_name: "__addto_4__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_6__"
...@@ -88,6 +106,9 @@ layers {
inputs {
input_layer_name: "__addto_5__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_7__"
...@@ -100,6 +121,9 @@ layers {
inputs {
input_layer_name: "__addto_6__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_8__"
...@@ -112,6 +136,9 @@ layers {
inputs {
input_layer_name: "__addto_7__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_9__"
...@@ -124,6 +151,9 @@ layers {
inputs {
input_layer_name: "__addto_8__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_10__"
...@@ -136,6 +166,9 @@ layers {
inputs {
input_layer_name: "__addto_9__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_11__"
...@@ -148,6 +181,9 @@ layers {
inputs {
input_layer_name: "__addto_10__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_12__"
...@@ -160,6 +196,9 @@ layers {
inputs {
input_layer_name: "__addto_11__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_13__"
...@@ -172,6 +211,9 @@ layers {
inputs {
input_layer_name: "__addto_12__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_14__"
...@@ -184,6 +226,9 @@ layers {
inputs {
input_layer_name: "__addto_13__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_15__"
...@@ -196,6 +241,9 @@ layers {
inputs {
input_layer_name: "__addto_14__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_16__"
...@@ -208,6 +256,9 @@ layers {
inputs {
input_layer_name: "__addto_15__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_17__"
...@@ -220,6 +271,9 @@ layers {
inputs {
input_layer_name: "__addto_16__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_18__"
...@@ -232,6 +286,9 @@ layers {
inputs {
input_layer_name: "__addto_17__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_19__"
...@@ -244,6 +301,9 @@ layers {
inputs {
input_layer_name: "__addto_18__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_20__"
...@@ -256,6 +316,9 @@ layers {
inputs {
input_layer_name: "__addto_19__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_21__"
...@@ -268,6 +331,9 @@ layers {
inputs {
input_layer_name: "__addto_20__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_22__"
...@@ -280,6 +346,9 @@ layers {
inputs {
input_layer_name: "__addto_21__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_23__"
...@@ -292,6 +361,9 @@ layers {
inputs {
input_layer_name: "__addto_22__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_24__"
...@@ -304,6 +376,9 @@ layers {
inputs {
input_layer_name: "__addto_23__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_25__"
...@@ -316,6 +391,9 @@ layers {
inputs {
input_layer_name: "__addto_24__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_26__"
...@@ -328,6 +406,9 @@ layers {
inputs {
input_layer_name: "__addto_25__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_27__"
...@@ -340,6 +421,9 @@ layers {
inputs {
input_layer_name: "__addto_26__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_28__"
...@@ -352,6 +436,9 @@ layers {
inputs {
input_layer_name: "__addto_27__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_29__"
...@@ -364,6 +451,9 @@ layers {
inputs {
input_layer_name: "__addto_28__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_30__"
...@@ -376,6 +466,9 @@ layers {
inputs {
input_layer_name: "__addto_29__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__addto_31__"
...@@ -388,6 +481,9 @@ layers {
inputs {
input_layer_name: "__addto_30__"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__fc_layer_0__"
...
...@@ -22,6 +22,9 @@ layers {
inputs {
input_layer_name: "b"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__concat_0__"
...@@ -34,6 +37,9 @@ layers {
inputs {
input_layer_name: "b"
}
height: 0
width: 0
depth: 1
}
layers {
name: "__concat_1__"
...
from paddle.trainer_config_helpers import *
settings(batch_size=1000, learning_rate=1e-4)
#data = data_layer(name='data', size=180, width=30, height=6)
#batchNorm = batch_norm_layer(data, num_channels=1)
#outputs(batchNorm)
data3D = data_layer(name='data3D', size=120 * 3, width=20, height=6, depth=3)
batchNorm3D = batch_norm_layer(data3D, num_channels=1, img3D=True)
outputs(batchNorm3D)
...@@ -4,8 +4,8 @@ import paddle.v2.framework.proto.framework_pb2 as framework_pb2
def get_all_op_protos():
    """
    Get all registered op proto from PaddlePaddle C++ end.

    :return: A list of registered OpProto.
    """
    protostrs = core.get_all_op_protos()
    ret_values = []
...@@ -21,8 +21,8 @@ def is_str(s):
class OpDescCreationMethod(object):
    """
    Convert the user's input(only keyword arguments are supported) to OpDesc
    based on the OpProto.

    :param op_proto: The OpProto object.
    :type op_proto: op_proto_pb2.OpProto
...@@ -30,17 +30,18 @@ class OpDescCreationMethod(object):
    def __init__(self, op_proto):
        if not isinstance(op_proto, framework_pb2.OpProto):
            raise TypeError(
                "Type of op_proto should be OpProto in PaddlePaddle.")
        self.__op_proto__ = op_proto

    def __call__(self, *args, **kwargs):
        """
        Convert user's input to OpDesc. Only keyword arguments are supported.

        :return: The OpDesc based on user input.
        :rtype: op_desc_pb2.OpDesc
        """
        if len(args) != 0:
            raise ValueError("Only keyword arguments are supported.")
        op_desc = framework_pb2.OpDesc()

        for input_parameter in self.__op_proto__.inputs:
...@@ -49,8 +50,9 @@ class OpDescCreationMethod(object):
                input_arguments = [input_arguments]

            if not input_parameter.duplicable and len(input_arguments) > 1:
                raise ValueError(
                    "Input %s expects only one input, but %d are given." %
                    (input_parameter.name, len(input_arguments)))

            ipt = op_desc.inputs.add()
            ipt.parameter = input_parameter.name
...@@ -63,7 +65,7 @@ class OpDescCreationMethod(object):
            if not output_parameter.duplicable and len(output_arguments) > 1:
                raise ValueError(
                    "Output %s expects only one output, but %d are given." %
                    (output_parameter.name, len(output_arguments)))

            out = op_desc.outputs.add()
...@@ -100,15 +102,17 @@ class OpDescCreationMethod(object):
                        pair.first = p[0]
                        pair.second = p[1]
                else:
                    raise NotImplementedError(
                        "A not supported attribute type: %s." % (
                            str(attr.type)))

        return op_desc

    @staticmethod
    def any_is_true(generator):
        """
        Reduce a boolean array to a single boolean parameter. If any element in
        the array is True, this function will return True, otherwise False.
        """
        for flag in generator:
            if flag:
...@@ -127,7 +131,7 @@ class OpInfo(object):
def create_op_creation_method(op_proto):
    """
    Generate op creation method for an OpProto.
    """
    method = OpDescCreationMethod(op_proto)
...@@ -146,20 +150,23 @@ def create_op_creation_method(op_proto):
class OperatorFactory(object):
    def __init__(self):
        self.op_methods = dict()

        for op_proto in get_all_op_protos():
            method = create_op_creation_method(op_proto)
            self.op_methods[method.name] = method

    def __call__(self, *args, **kwargs):
        if "type" in kwargs:
            if len(args) != 0:
                raise ValueError(
                    "Except the argument \"type\","
                    "all of the other arguments should be keyword arguments.")
            t = kwargs.pop("type")
        else:
            if len(args) != 1:
                raise ValueError(
                    "Except the argument \"type\","
                    "all of the other arguments should be keyword arguments.")
            t = args[0]

        return self.get_op_info(t).method(**kwargs)
...@@ -169,7 +176,7 @@ class OperatorFactory(object):
    def get_op_info(self, t):
        if t not in self.op_methods:
            raise ValueError("The operator: %s is not registered." % t)
        return self.op_methods.get(t)

    def get_op_input_names(self, type):
...@@ -184,7 +191,7 @@ class OperatorFactory(object):
class __RecurrentOp__(object):
    __proto__ = None
    type = "recurrent"

    def __init__(self):
        # cache recurrent_op's proto
...@@ -194,8 +201,8 @@ class __RecurrentOp__(object):
            self.__proto__ = op_proto

    def __call__(self, *args, **kwargs):
        if self.type not in args and "type" not in kwargs:
            kwargs["type"] = self.type
        # create proto
        create_method = OpDescCreationMethod(self.__proto__)
        proto = create_method(*args, **kwargs)
...@@ -203,5 +210,5 @@ class __RecurrentOp__(object):
        return core.RecurrentOp.create(proto.SerializeToString())


Operator = OperatorFactory()  # The default global factory
RecurrentOp = __RecurrentOp__()
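Putting the factory together: an operator is built with a single call, where the type is the only argument allowed to be positional (the same `add` spelling appears in the numeric-gradient test below):

```
from paddle.v2.framework.op import Operator

# Equivalent spellings; everything except the type must be a keyword argument.
add_op = Operator("add", X="X", Y="Y", Out="Z")
same_op = Operator(type="add", X="X", Y="Y", Out="Z")
```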
...@@ -17,6 +17,7 @@ py_test(test_cross_entropy_op SRCS test_cross_entropy_op.py)
py_test(test_gather_op SRCS test_gather_op.py)
py_test(test_scatter_op SRCS test_scatter_op.py)
py_test(test_fill_zeros_like_op SRCS test_fill_zeros_like_op.py)
py_test(test_top_k_op SRCS test_top_k_op.py)

py_test(gradient_checker SRCS gradient_checker.py)
...
...@@ -38,9 +38,9 @@ def feed_data(name, data):
    assert isinstance(data, numpy.ndarray)
    tensor = scope.find_var(name).get_tensor()
    tensor.set_dims(data.shape)
    if data.dtype == numpy.dtype("int32"):
        tensor.alloc_int(place)
    elif data.dtype == numpy.dtype("float32"):
        tensor.alloc_float(place)
    else:
        raise ValueError("data type not supported")
...@@ -74,22 +74,25 @@ def init_param(net, param_name, dims):

# fc_layer
def fc_layer(net, input, size, act="softmax", bias=True, param=None, name=None):
    """
    The fully connected layer.

    :param input: The name of input variable.
    :type input: str
    :param size: The size of fully connected layer.
    :param act: The name of activation.
    :param param: The attribute of learnable parameter which can be used to
                  modify initialization mean and std of the parameter.
    :param bias: The attribute of bias. If set False, this layer does not have
                 a bias.
    :param name: The name of this layer. If it is not set explicitly, a name
                 will be generated automatically.
    :return: The name of the output variable.
    """

    if name is None:
        name = "fc_%d" % uniq_id()
    if not isinstance(name, str):
        raise ValueError("The name of a layer should be a string.")

    input_dims = scope.find_var(input).get_tensor().get_dims()
...@@ -123,7 +126,7 @@ def fc_layer(net, input, size, act="softmax", bias=True, param=None, name=None):

def cross_entropy_layer(net, input, label):
    cost_name = "cross_entropy_%d" % uniq_id()
    cross_entropy_op = Operator(
        "onehot_cross_entropy", X=input, label=label, Y=cost_name)
    net.append_op(cross_entropy_op)
...@@ -177,8 +180,8 @@ def error_rate(predict, label):
    return error_num / float(len(label))


images = data_layer(name="pixel", dims=[BATCH_SIZE, 784])
labels = data_layer(name="label", dims=[BATCH_SIZE])
fc1 = fc_layer(net=forward_net, input=images, size=100, act="sigmoid")
fc2 = fc_layer(net=forward_net, input=fc1, size=100, act="sigmoid")
predict = fc_layer(net=forward_net, input=fc2, size=10, act="softmax")
...
...@@ -7,11 +7,11 @@ from gradient_checker import get_numeric_gradient

class GetNumericGradientTest(unittest.TestCase):
    def test_add_op(self):
        add_op = Operator("add", X="X", Y="Y", Out="Z")
        x = numpy.random.random((10, 1)).astype("float32")
        y = numpy.random.random((10, 1)).astype("float32")

        arr = get_numeric_gradient(add_op, {"X": x, "Y": y}, "Z", "X")
        self.assertAlmostEqual(arr.mean(), 1.0, delta=1e-4)

    def test_softmax_op(self):
...@@ -35,9 +35,9 @@ class GetNumericGradientTest(unittest.TestCase):
        dY = numpy.ones(Y.shape)
        dX = label_softmax_grad(Y, dY)

        arr = get_numeric_gradient(softmax_op, {"X": X}, "Y", "X")
        numpy.testing.assert_almost_equal(arr, dX, decimal=1e-2)


if __name__ == "__main__":
    unittest.main()
...@@ -4,7 +4,7 @@ from op_test_util import OpTestMeta
from gradient_checker import GradientChecker, create_op


class TestLookupTableOp(unittest.TestCase):
    __metaclass__ = OpTestMeta

    def setUp(self):
...@@ -15,7 +15,7 @@ class TestSigmoidOp(unittest.TestCase):
        self.outputs = {'Out': table[ids]}


class TestLookupTableGradOp(GradientChecker):
    def test_grad(self):
        op = create_op('lookup_table')
        table = np.random.random((17, 31)).astype('float32')
...
...@@ -2,6 +2,7 @@ import unittest
import numpy as np
from gradient_checker import GradientChecker, create_op
from op_test_util import OpTestMeta
from paddle.v2.framework.op import Operator


class TestMulOp(unittest.TestCase):
...@@ -16,6 +17,22 @@ class TestMulOp(unittest.TestCase):
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
class TestMulOp2(unittest.TestCase):
__metaclass__ = OpTestMeta
def setUp(self):
self.type = "mul"
self.inputs = {
'X': np.random.random((15, 4, 12, 10)).astype("float32"),
'Y': np.random.random((4, 30, 8, 2, 9)).astype("float32")
}
self.attrs = {'x_num_col_dims': 2, 'y_num_col_dims': 2}
self.outputs = {
'Out': np.dot(self.inputs['X'].reshape(15 * 4, 12 * 10),
self.inputs['Y'].reshape(4 * 30, 8 * 2 * 9))
}
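With `x_num_col_dims=2` and `y_num_col_dims=2`, `mul` flattens the first two axes of each operand into matrix rows and the remaining axes into columns before multiplying; the shapes in `TestMulOp2` work out as follows:

```
import numpy as np

X = np.random.random((15, 4, 12, 10)).astype("float32")
Y = np.random.random((4, 30, 8, 2, 9)).astype("float32")
# x_num_col_dims=2: X becomes (15*4, 12*10) = (60, 120);
# y_num_col_dims=2: Y becomes (4*30, 8*2*9) = (120, 144).
out = np.dot(X.reshape(15 * 4, 12 * 10), Y.reshape(4 * 30, 8 * 2 * 9))
assert out.shape == (60, 144)
```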
class TestMulGradOp(GradientChecker):
    def setUp(self):
        self.op = create_op("mul")
...@@ -49,7 +66,38 @@ class TestMulGradOp(GradientChecker):
            no_grad_set={"Y"})


class TestMulGradTest2(GradientChecker):
def setUp(self):
self.op = Operator(
"mul", X="X", Y="Y", Out="Out", x_num_col_dims=2, y_num_col_dims=2)
self.inputs = {
"X": np.random.random((15, 4, 12, 10)).astype("float32"),
"Y": np.random.random((4, 30, 8, 2, 9)).astype("float32")
}
def test_cpu_gpu_compare(self):
self.compare_grad(self.op, self.inputs)
def test_normal(self):
self.check_grad(
self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5)
def test_ignore_x(self):
self.check_grad(
self.op,
self.inputs, ["Y"],
"Out",
max_relative_error=0.5,
no_grad_set={"X"})
def test_ignore_y(self):
self.check_grad(
self.op,
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
if __name__ == '__main__':
    unittest.main()
...@@ -16,6 +16,18 @@ class TestRowwiseAddOp(unittest.TestCase):
        self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])}
class TestRowwiseAddOp2(unittest.TestCase):
__metaclass__ = OpTestMeta
def setUp(self):
self.type = "rowwise_add"
self.inputs = {
'X': np.random.random((13, 6, 7, 8)).astype("float32"),
'b': np.random.random((7, 8)).astype("float32")
}
self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])}
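`rowwise_add` with a non-vector `b` relies on trailing-axis broadcasting: `b` of shape `(7, 8)` is added to every leading `(13, 6)` slice of `X`. In NumPy terms:

```
import numpy as np

X = np.random.random((13, 6, 7, 8)).astype("float32")
b = np.random.random((7, 8)).astype("float32")
out = X + b  # b broadcasts over the leading (13, 6) axes
assert out.shape == (13, 6, 7, 8)
```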
class TestRowwiseAddGradOp(GradientChecker):
    def setUp(self):
        self.op = create_op("rowwise_add")
...@@ -34,5 +46,23 @@ class TestRowwiseAddGradOp(GradientChecker):
        self.check_grad(self.op, self.inputs, ["b"], "Out", no_grad_set={"X"})
class TestRowwiseAddGradOp2(GradientChecker):
def setUp(self):
self.op = create_op("rowwise_add")
self.inputs = {
"X": np.random.uniform(0.1, 1, [2, 3, 2, 5]).astype("float32"),
"b": np.random.uniform(0.1, 1, [2, 5]).astype("float32")
}
def test_normal(self):
self.check_grad(self.op, self.inputs, ["X", "b"], "Out")
def test_ignore_b(self):
self.check_grad(self.op, self.inputs, ["X"], "Out", no_grad_set={"b"})
def test_ignore_x(self):
self.check_grad(self.op, self.inputs, ["b"], "Out", no_grad_set={"X"})
if __name__ == '__main__':
    unittest.main()
...@@ -18,18 +18,22 @@ class TestSoftmaxOp(unittest.TestCase):
    def setUp(self):
        self.type = "softmax"
        self.inputs = {"X": np.random.random((10, 10)).astype("float32")}
        self.outputs = {
            "Y": np.apply_along_axis(stable_softmax, 1, self.inputs["X"])
        }


class TestSoftmaxGradOp(GradientChecker):
    def setUp(self):
        self.op = create_op("softmax")
        self.inputs = {
            "X": np.random.uniform(0.1, 1, [10, 10]).astype("float32")
        }

    def test_softmax_grad(self):
        self.check_grad(self.op, self.inputs, ["X"], "Y")


if __name__ == "__main__":
    unittest.main()
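`stable_softmax` comes from the shared test utilities and its body is not shown in this diff; a minimal equivalent (an assumption about that helper) is the usual max-shifted formulation, applied per row by `np.apply_along_axis`:

```
import numpy as np

def stable_softmax(x):
    # Shift by the max for numerical stability before exponentiating.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)
```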
import unittest
import numpy as np
from gradient_checker import GradientChecker, create_op
from op_test_util import OpTestMeta
class TestTopkOp(unittest.TestCase):
__metaclass__ = OpTestMeta
def setUp(self):
self.type = "top_k"
k = 1
input = np.random.random((32, 84)).astype("float32")
output = np.ndarray((32, k))
indices = np.ndarray((32, k))
self.inputs = {'X': input}
self.attrs = {'k': k}
for rowid in xrange(32):
row = input[rowid]
output[rowid] = np.sort(row)[-k:]
indices[rowid] = row.argsort()[-k:]
self.outputs = {'Out': output, 'Indices': indices}
class TestTopkOp3d(unittest.TestCase):
__metaclass__ = OpTestMeta
def setUp(self):
self.type = "top_k"
k = 1
input = np.random.random((32, 2, 84)).astype("float32")
input_flat_2d = input.reshape(64, 84)
output = np.ndarray((64, k))
indices = np.ndarray((64, k)).astype("int")
# FIXME: should use 'X': input for a 3d input
self.inputs = {'X': input_flat_2d}
self.attrs = {'k': k}
for rowid in xrange(64):
row = input_flat_2d[rowid]
output[rowid] = np.sort(row)[-k:]
indices[rowid] = row.argsort()[-k:]
self.outputs = {'Out': output, 'Indices': indices}
if __name__ == '__main__':
unittest.main()