Commit dbe05987, authored by qingqing01

update to develop branch and resolve conflicts.

...@@ -21,7 +21,6 @@ addons: ...@@ -21,7 +21,6 @@ addons:
- python - python
- python-pip - python-pip
- python2.7-dev - python2.7-dev
- python-numpy
- python-wheel - python-wheel
- libboost-dev - libboost-dev
- curl - curl
...@@ -35,8 +34,8 @@ before_install: ...@@ -35,8 +34,8 @@ before_install:
- if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi - if [[ "$JOB" == "check_style" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi
# Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python # Paddle is using protobuf 3.1 currently. Protobuf 3.2 breaks the compatibility. So we specify the python
# protobuf version. # protobuf version.
- pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt - sudo pip install -r $TRAVIS_BUILD_DIR/python/requirements.txt
- pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker - sudo pip install wheel sphinx==1.5.6 recommonmark sphinx-rtd-theme==0.1.9 virtualenv pre-commit LinkChecker
- curl https://glide.sh/get | bash - curl https://glide.sh/get | bash
- eval "$(GIMME_GO_VERSION=1.8.3 gimme)" - eval "$(GIMME_GO_VERSION=1.8.3 gimme)"
- go get -u github.com/alecthomas/gometalinter - go get -u github.com/alecthomas/gometalinter
......
...@@ -65,8 +65,8 @@ if(NOT CMAKE_BUILD_TYPE) ...@@ -65,8 +65,8 @@ if(NOT CMAKE_BUILD_TYPE)
endif() endif()
if(ANDROID) if(ANDROID)
if(${CMAKE_SYSTEM_VERSION} VERSION_LESS "21") if(${CMAKE_SYSTEM_VERSION} VERSION_LESS "16")
message(FATAL_ERROR "Unsupport standalone toolchains with Android API level lower than 21") message(FATAL_ERROR "Unsupport standalone toolchains with Android API level lower than 16")
endif() endif()
set(WITH_GPU OFF CACHE STRING set(WITH_GPU OFF CACHE STRING
......
...@@ -86,12 +86,13 @@ def layer.fc(X): ...@@ -86,12 +86,13 @@ def layer.fc(X):
We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in above illustrative example: We'd like to have Python bindings to operators in package `paddle.operator`, and Python compositions of operators in package `paddle.layer`. So we have the following concepts in above illustrative example:
```
| C++ functions/functors | mul | add | | | | C++ functions/functors | mul | add | | |
|------------------------|--------------|--------------|-------------|----------|
| C++ operator class | mulOp | addOp | FCOp | | | C++ operator class | mulOp | addOp | FCOp | |
| Python binding | operator.mul | operator.add | operator.fc | | | Python binding | operator.mul | operator.add | operator.fc | |
| Python function | | | | layer.fc | | Python function | | | | layer.fc |
```
This is how we differentiate layer and operators in PaddlePaddle: This is how we differentiate layer and operators in PaddlePaddle:
......
# Design Doc: Operation Graph Based Parameter Server
## Abstract
We propose an approach to implement the parameter server. In this
approach, there is no fundamental difference between the trainer and
the parameter server: they both run subgraphs, but subgraphs of
different purposes.
## Background
The previous implementations of the parameter server do not run a
subgraph. Parameter initialization, optimizer computation, network
communication and checkpointing are implemented twice, once on the
trainer and once on the parameter server.
It would be great if we could write the code once and use it on both the
trainer and the parameter server, which reduces code duplication and
improves extensibility. After the current refactor, we represent
everything as a computing graph on the trainer, so representing
everything as a computing graph on the parameter server becomes a
natural extension.
## Design
### Graph Converter
The *graph converter* converts the user-defined operation (OP) graph
into subgraphs to be scheduled on different nodes with the following
steps:
1. OP placement: the OPs will be placed on different nodes according
to a heuristic that minimizes the estimated total computation
time. Currently we will use a simple heuristic that puts parameter
variables on parameter server workers and everything else on trainer
workers.
1. Add communication OPs to enable the communication between nodes.
We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*.
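The two steps above can be sketched as follows. This is only an illustration under stated assumptions, not the actual graph converter: the `Op` struct, the `is_optimizer` label, and the `Convert` function are hypothetical names, and the placement rule is the simple heuristic described above (optimizer OPs that own parameter variables go to the parameter server, everything else goes to the trainer).

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified OP description (not PaddlePaddle's real OpDesc).
struct Op {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
  bool is_optimizer;  // assumed label: true for OPs in the optimizer subgraph
};

enum class Node { kTrainer, kParameterServer };

struct SubGraphs {
  std::vector<Op> trainer;
  std::vector<Op> pserver;
};

SubGraphs Convert(const std::vector<Op>& graph) {
  // Step 1: OP placement with the simple heuristic.
  std::vector<Node> placement;
  std::map<std::string, Node> producer;  // variable name -> producing node
  for (const Op& op : graph) {
    Node n = op.is_optimizer ? Node::kParameterServer : Node::kTrainer;
    placement.push_back(n);
    for (const std::string& out : op.outputs) producer[out] = n;
  }

  // Step 2: add a Send/Recv pair for every variable that crosses nodes.
  SubGraphs result;
  auto sub = [&result](Node n) -> std::vector<Op>& {
    return n == Node::kTrainer ? result.trainer : result.pserver;
  };
  std::set<std::string> already_sent;
  for (size_t i = 0; i < graph.size(); ++i) {
    Node here = placement[i];
    for (const std::string& in : graph[i].inputs) {
      auto it = producer.find(in);
      if (it != producer.end() && it->second != here &&
          already_sent.insert(in).second) {
        sub(it->second).push_back(Op{"Send", {in}, {}, false});
        sub(here).push_back(Op{"Recv", {}, {in}, false});
      }
    }
    sub(here).push_back(graph[i]);
  }
  return result;
}
```

For a graph of `mul`, `mul_grad` and an optimizer OP that updates `W`, this sketch gives the trainer a *Recv* for `W` and a *Send* for its gradient, and gives the parameter server the matching *Send* and *Recv*. The *Enqueue* and *Dequeue* OPs are left out of the sketch.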
Below is an example of converting the user-defined graph to the
subgraphs for the trainer and the parameter server:
<img src="src/local-graph.png" width="300"/>
After converting:
<img src="src/dist-graph.png" width="700"/>
1. The parameter variable W and its optimizer subgraph are placed on the parameter server.
1. Operators are added to the subgraphs.
- *Send* sends data to the connected *Recv* operator. The
scheduler on the receiving node will only schedule the *Recv* operator
to run when the *Send* operator has run (the *Send* OP will mark
the *Recv* OP runnable automatically).
- *Enqueue* enqueues the input variable; it can block until space
becomes available in the queue.
- *Dequeue* outputs a configurable number of tensors from the
queue. It will block until the queue has the required number of
tensors.
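The blocking behavior of *Enqueue* and *Dequeue* can be sketched with a bounded queue guarded by a mutex and a condition variable. This is a minimal illustration, not the proposed OP implementation: the `BoundedQueue` class, its capacity parameter, and the `Size` helper are assumptions made for the example, and the queue stores plain values rather than tensors.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical bounded queue illustrating the Enqueue/Dequeue semantics.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(size_t capacity) : capacity_(capacity) {}

  // Enqueue: blocks while the queue is full.
  void Enqueue(T value) {
    std::unique_lock<std::mutex> lock(mu_);
    changed_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push_back(std::move(value));
    changed_.notify_all();
  }

  // Dequeue: blocks until `min_count` items are available, then pops them.
  // Note: min_count must not exceed the capacity, or this waits forever.
  std::vector<T> Dequeue(size_t min_count) {
    std::unique_lock<std::mutex> lock(mu_);
    changed_.wait(lock, [this, min_count] { return items_.size() >= min_count; });
    std::vector<T> out;
    out.reserve(min_count);
    for (size_t i = 0; i < min_count; ++i) {
      out.push_back(std::move(items_.front()));
      items_.pop_front();
    }
    changed_.notify_all();
    return out;
  }

  size_t Size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return items_.size();
  }

 private:
  const size_t capacity_;
  mutable std::mutex mu_;
  std::condition_variable changed_;
  std::deque<T> items_;
};
```

A single condition variable with `notify_all` keeps the sketch short; a real implementation would probably use separate "not full" and "not empty" conditions to avoid waking threads unnecessarily.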
### Benefits
- Model parallelism becomes easier to implement: it's an extension to
the trainer - parameter server approach. We already have the
communication OPs, but need to extend the graph converter's
placement functionality.
- A user-defined optimizer is easier to add: the user can now express it as
a subgraph.
- No more duplicated logic inside the trainer and the parameter
server, as mentioned in the background section.
### Challenges
- It might be hard for the graph converter to cut a general graph
(without any hint for which subgraph is the optimizer). We may need
to label which subgraph inside the OP graph is the optimizer.
- It's important to balance the parameter shards across multiple
parameter servers. If a single parameter is very big (e.g., some
word-embedding, fully connected, or softmax layers), we need to
automatically partition the single parameter onto different
parameter servers when possible (that is, when only an element-wise
optimizer depends on the parameter variable).
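One possible way to balance a large parameter is to split its rows into contiguous ranges of nearly equal size, one per parameter server. The sketch below only illustrates that idea; `ShardRows` is a hypothetical helper, and row-wise splitting is an assumption that is valid only when the optimizer is element-wise, as noted above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Splits `num_rows` rows into `num_servers` contiguous [begin, end) ranges
// whose sizes differ by at most one row.
std::vector<std::pair<size_t, size_t>> ShardRows(size_t num_rows,
                                                 size_t num_servers) {
  std::vector<std::pair<size_t, size_t>> shards;
  if (num_servers == 0) return shards;
  const size_t base = num_rows / num_servers;
  const size_t extra = num_rows % num_servers;  // first `extra` shards get +1
  size_t begin = 0;
  for (size_t i = 0; i < num_servers; ++i) {
    const size_t len = base + (i < extra ? 1 : 0);
    shards.emplace_back(begin, begin + len);
    begin += len;
  }
  return shards;
}
```

For example, 10 rows over 3 servers yields the ranges [0, 4), [4, 7) and [7, 10).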
### Discussion
- In the "Async SGD" figure, the "W" variable on the parameter server
could be read and written concurrently. What is our locking strategy?
E.g., each variable has a lock C++ method to be invoked by every
OP, or we have a lock OP.
- Can the Enqueue OP be implemented under our current tensor design
(putting the input tensor into the queue tensor)?
- The *Dequeue* OP will have a variable number of outputs (depending on the
`min_count` attribute). Does our current design support it? (A similar
question applies to the *Add* OP.)
### References:
[1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
...@@ -9,6 +9,7 @@ cc_test(eigen_test SRCS eigen_test.cc DEPS tensor) ...@@ -9,6 +9,7 @@ cc_test(eigen_test SRCS eigen_test.cc DEPS tensor)
cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor) cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor)
cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor) cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor)
nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor)
cc_test(variable_test SRCS variable_test.cc) cc_test(variable_test SRCS variable_test.cc)
......
...@@ -45,7 +45,19 @@ class GreaterThanChecker { ...@@ -45,7 +45,19 @@ class GreaterThanChecker {
public: public:
explicit GreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {} explicit GreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
void operator()(T& value) const { void operator()(T& value) const {
PADDLE_ENFORCE(value > lower_bound_, "larger_than check fail"); PADDLE_ENFORCE(value > lower_bound_, "larger_than check fails.");
}
private:
T lower_bound_;
};
template <typename T>
class EqualGreaterThanChecker {
public:
explicit EqualGreaterThanChecker(T lower_bound) : lower_bound_(lower_bound) {}
void operator()(T& value) const {
PADDLE_ENFORCE_GE(value, lower_bound_, "equal_larger_than check fails.");
} }
private: private:
...@@ -115,6 +127,11 @@ class TypedAttrChecker { ...@@ -115,6 +127,11 @@ class TypedAttrChecker {
return *this; return *this;
} }
TypedAttrChecker& EqualGreaterThan(const T& lower_bound) {
value_checkers_.push_back(EqualGreaterThanChecker<T>(lower_bound));
return *this;
}
// we can add more common limits, like LessThan(), Between()... // we can add more common limits, like LessThan(), Between()...
TypedAttrChecker& SetDefault(const T& default_value) { TypedAttrChecker& SetDefault(const T& default_value) {
......
...@@ -2,20 +2,20 @@ ...@@ -2,20 +2,20 @@
## Motivation ## Motivation
In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the fundmental gradient operators/expressions together with chain rule . Every forward network need a backward network to construct the full computation graph, the operator/expression's backward pass will be generated respect to forward pass. In Neural Network, the backpropagation algorithm follows the chain rule, so we need to compound the gradient operators/expressions together with the chain rule. Every forward network needs a backward network to construct the full computation graph, the operator/expression's backward pass will be generated respect to forward pass.
## Backward Operator Registry ## Backward Operator Registry
A backward network is built up with several backward operators. Backward operators take forward operators' inputs, outputs and output gradients and then calculate its input gradients. A backward network is built up with several backward operators. Backward operators take forward operators' inputs outputs, and output gradients and then calculate its input gradients.
| | forward operator | backward operator | | forward operator | backward operator
| ---------------------- | ---------------- |------------------------- | | ---------------------- | ---------------- |------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients | | **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients | | **Operator::outputs_** | Outputs | InputGradients |
In most cases, there is a one-to-one correspondence between forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced. In most cases, there is a one-to-one correspondence between the forward and backward operators. These correspondences are recorded by a global hash map(`OpInfoMap`). To follow the philosophy of minimum core and make operators pluggable, the registry mechanism is introduced.
For example, we have got a `mul_op`, and we can register it's information and corresponding backward operator by the following macro: For example, we have got a `mul_op`, and we can register its information and corresponding backward operator by the following macro:
```cpp ```cpp
REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad); REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
...@@ -27,17 +27,17 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad); ...@@ -27,17 +27,17 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
## Backward Operator Creating ## Backward Operator Creating
Given a certain forward operator, we can get its corresponding backward opeartor by calling: Given a certain forward operator, we can get its corresponding backward operator by calling:
```cpp ```cpp
OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op); OperatorBase* bwd_op = BuildGradOp(const OperatorBase* fwd_op);
``` ```
The function `BuildGradOp` will sequentially execute following processes: The function `BuildGradOp` will sequentially execute following processes:
1. Get the `type_` of given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`. 1. Get the `type_` of given forward operator, and then get the corresponding backward operator's type by looking up the `OpInfoMap`.
2. Build two maps named `inputs` and `outputs` to temporary storage backward operator's inputs and outputs. Copy forward operator's `inputs_` and `outputs_` to map `inputs`, except these are not necessary for gradient computing. 2. Build two maps named `inputs` and `outputs` to temporary storage backward operator's inputs and outputs. Copy forward operator's `inputs_` and `outputs_` to map `inputs`, except these, are not necessary for gradient computing.
3. Add forward inputs' gradient variables into map `output`, adding forward outputs' gradient variables into map `input`. 3. Add forward inputs' gradient variables into map `output`, adding forward outputs' gradient variables into map `input`.
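The three steps can be sketched on plain name maps as follows. This is a simplified illustration rather than the real `BuildGradOp`: the `OpSketch` struct, the `grad_type_of` lookup table standing in for `OpInfoMap`, and the `@GRAD` suffix used to name gradient variables are assumptions for the example, and the sketch does not drop the inputs that are unnecessary for gradient computing.

```cpp
#include <map>
#include <string>
#include <vector>

using VarNameMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical stand-in for an operator: a type plus named inputs/outputs.
struct OpSketch {
  std::string type;
  VarNameMap inputs;
  VarNameMap outputs;
};

// Assumed naming convention for gradient variables.
const char kGradSuffix[] = "@GRAD";

std::vector<std::string> GradNames(const std::vector<std::string>& names) {
  std::vector<std::string> grads;
  grads.reserve(names.size());
  for (const std::string& name : names) grads.push_back(name + kGradSuffix);
  return grads;
}

OpSketch BuildGradOpSketch(
    const OpSketch& fwd,
    const std::map<std::string, std::string>& grad_type_of) {
  OpSketch bwd;
  // Step 1: look up the backward operator's type.
  bwd.type = grad_type_of.at(fwd.type);
  for (const auto& kv : fwd.inputs) {
    // Step 2: forward inputs become backward inputs.
    bwd.inputs[kv.first] = kv.second;
    // Step 3: gradients of forward inputs become backward outputs.
    bwd.outputs[kv.first + kGradSuffix] = GradNames(kv.second);
  }
  for (const auto& kv : fwd.outputs) {
    // Step 2: forward outputs become backward inputs.
    bwd.inputs[kv.first] = kv.second;
    // Step 3: gradients of forward outputs become backward inputs.
    bwd.inputs[kv.first + kGradSuffix] = GradNames(kv.second);
  }
  return bwd;
}
```

With a forward `mul` that has inputs `X`, `Y` and output `Out`, the sketch produces a `mul_grad` with four input slots (`X`, `Y`, `Out`, `Out@GRAD`) and two output slots (`X@GRAD`, `Y@GRAD`), matching the table in the registry section.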
...@@ -49,31 +49,31 @@ A backward network is a series of backward operators. The main idea of building ...@@ -49,31 +49,31 @@ A backward network is a series of backward operators. The main idea of building
In our design, the network itself is also a kind of operator. So the operators contained by a big network may be some small network. In our design, the network itself is also a kind of operator. So the operators contained by a big network may be some small network.
given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`,`InputGradients`. given a forward network, it generates the backward network. We only care about the Gradients—`OutputGradients`, `InputGradients`.
1. Op 1. Op
when the input forward network is a Op, return its gradient Operator Immediately. when the input forward network is an Op, return its gradient Operator Immediately.
2. NetOp 2. NetOp
when the input forward network is a NetOp, it need to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to forward NetOp. when the input forward network is a NetOp, it needs to call the sub NetOp/Operators backward function recursively. During the process, we need to collect the `OutputGradients` name according to the forward NetOp.
**shared variable**. As illustrated in the pictures, two operator's `Output` `Gradient` will overwirte their shared input variable. **shared variable**. As illustrated in the pictures, two operator's `Output` `Gradient` will overwrite their shared input variable.
<p align="center"> <p align="center">
<img src="./images/duplicate_op.png" width="70%" ><br/> <img src="./images/duplicate_op.png" width="50%" ><br/>
1. shared variable in two operators. 1. Shared variable in operators.
</p> </p>
Share variable between operators or same input variable used in multiple operators lead to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively, and add a generic add operator replace the overwirte links. Share variable between operators or same input variable used in multiple operators leads to a duplicate gradient variable. As demo show above, we need to rename gradient name recursively and add a generic add operator replace the overwrite links.
<p align="center"> <p align="center">
<img src="images/duplicate_op2.png" width="90%" ><br/> <img src="images/duplicate_op2.png" width="50%" ><br/>
2. replace shared variable gradient with `Add` Operator 2. Replace shared variable's gradient with `Add` operator.
</p> </p>
......
...@@ -283,5 +283,14 @@ std::ostream& operator<<(std::ostream& os, const DDim& ddim) { ...@@ -283,5 +283,14 @@ std::ostream& operator<<(std::ostream& os, const DDim& ddim) {
DDim::DDim(std::initializer_list<int64_t> init_list) { DDim::DDim(std::initializer_list<int64_t> init_list) {
*this = make_ddim(init_list); *this = make_ddim(init_list);
} }
DDim flatten_to_2d(const DDim& src, int num_col_dims) {
int rank = src.size();
return make_ddim({product(slice_ddim(src, 0, num_col_dims)),
product(slice_ddim(src, num_col_dims, rank))});
}
DDim flatten_to_1d(const DDim& src) { return make_ddim({product(src)}); }
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -115,6 +115,12 @@ int arity(const DDim& ddim); ...@@ -115,6 +115,12 @@ int arity(const DDim& ddim);
std::ostream& operator<<(std::ostream&, const DDim&); std::ostream& operator<<(std::ostream&, const DDim&);
// Reshape a tensor to a matrix. The matrix's first dimension(column length)
// will be the product of tensor's first `num_col_dims` dimensions.
DDim flatten_to_2d(const DDim& src, int num_col_dims);
DDim flatten_to_1d(const DDim& src);
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
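The flattening rule described in the comment above can be tried out independently of `DDim`. The sketch below is a stand-alone illustration that uses `std::vector<int64_t>` in place of `DDim`; the names `Dims`, `Product` and `FlattenTo2D` are made up for the example and are not part of the framework.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

using Dims = std::vector<int64_t>;

// Product of the dimensions in [first, last); 1 for an empty range.
int64_t Product(Dims::const_iterator first, Dims::const_iterator last) {
  return std::accumulate(first, last, int64_t{1},
                         std::multiplies<int64_t>());
}

// The first output dimension is the product of the first `num_col_dims`
// dimensions of `src`; the second is the product of the remaining ones.
Dims FlattenTo2D(const Dims& src, int num_col_dims) {
  assert(num_col_dims > 0 && num_col_dims < static_cast<int>(src.size()));
  return {Product(src.begin(), src.begin() + num_col_dims),
          Product(src.begin() + num_col_dims, src.end())};
}
```

For a shape of {2, 3, 6, 4} and `num_col_dims` of 2, this gives {6, 24}, the same shape that the `MatrixReshape` test in this commit expects.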
......
...@@ -63,20 +63,35 @@ struct EigenTensor { ...@@ -63,20 +63,35 @@ struct EigenTensor {
template <typename T, int MajorType = Eigen::RowMajor, template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex> typename IndexType = Eigen::DenseIndex>
struct EigenMatrix : public EigenTensor<T, 2, MajorType, IndexType> {}; struct EigenMatrix : public EigenTensor<T, 2, MajorType, IndexType> {
static typename EigenMatrix::Type Reshape(Tensor& tensor, int num_col_dims) {
int rank = tensor.dims_.size();
PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
"`num_col_dims` must be between (0, rank_of_tensor).");
return EigenMatrix::From(tensor,
flatten_to_2d(tensor.dims(), num_col_dims));
}
static typename EigenMatrix::ConstType Reshape(const Tensor& tensor,
int num_col_dims) {
int rank = tensor.dims_.size();
PADDLE_ENFORCE(num_col_dims > 0 && num_col_dims < rank,
"`num_col_dims` must be between (0, rank_of_tensor).");
return EigenMatrix::From(tensor,
flatten_to_2d(tensor.dims(), num_col_dims));
}
};
template <typename T, int MajorType = Eigen::RowMajor, template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex> typename IndexType = Eigen::DenseIndex>
struct EigenVector : public EigenTensor<T, 1, MajorType, IndexType> { struct EigenVector : public EigenTensor<T, 1, MajorType, IndexType> {
// Flatten reshapes a Tensor into an EigenVector. // Flatten reshapes a Tensor into an EigenVector.
static typename EigenVector::Type Flatten(Tensor& tensor) { static typename EigenVector::Type Flatten(Tensor& tensor) {
return EigenVector::From( return EigenVector::From(tensor, {product(tensor.dims_)});
tensor, make_ddim({static_cast<int>(product(tensor.dims_))}));
} }
static typename EigenVector::ConstType Flatten(const Tensor& tensor) { static typename EigenVector::ConstType Flatten(const Tensor& tensor) {
return EigenVector::From( return EigenVector::From(tensor, {product(tensor.dims_)});
tensor, make_ddim({static_cast<int>(product(tensor.dims_))}));
} }
}; };
......
...@@ -108,5 +108,24 @@ TEST(Eigen, Matrix) { ...@@ -108,5 +108,24 @@ TEST(Eigen, Matrix) {
} }
} }
TEST(Eigen, MatrixReshape) {
Tensor t;
float* p = t.mutable_data<float>({2, 3, 6, 4}, platform::CPUPlace());
for (int i = 0; i < 2 * 3 * 6 * 4; ++i) {
p[i] = static_cast<float>(i);
}
EigenMatrix<float>::Type em = EigenMatrix<float>::Reshape(t, 2);
ASSERT_EQ(2 * 3, em.dimension(0));
ASSERT_EQ(6 * 4, em.dimension(1));
for (int i = 0; i < 2 * 3; i++) {
for (int j = 0; j < 6 * 4; j++) {
ASSERT_NEAR(i * 6 * 4 + j, em(i, j), 1e-6f);
}
}
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -18,8 +18,10 @@ ...@@ -18,8 +18,10 @@
#ifndef PADDLE_ONLY_CPU #ifndef PADDLE_ONLY_CPU
#include <thrust/device_vector.h> #include <thrust/device_vector.h>
#include <thrust/host_vector.h> #include <thrust/host_vector.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>
#endif #endif
#include <glog/logging.h>
#include "paddle/framework/ddim.h" #include "paddle/framework/ddim.h"
#include "paddle/framework/tensor.h" #include "paddle/framework/tensor.h"
#include "paddle/platform/enforce.h" #include "paddle/platform/enforce.h"
...@@ -32,7 +34,8 @@ template <typename T> ...@@ -32,7 +34,8 @@ template <typename T>
using Vector = std::vector<T>; using Vector = std::vector<T>;
#else #else
template <typename T> template <typename T>
using Vector = thrust::host_vector<T>; using Vector = thrust::host_vector<
T, thrust::system::cuda::experimental::pinned_allocator<T>>;
#endif #endif
using LoD = std::vector<Vector<size_t>>; using LoD = std::vector<Vector<size_t>>;
......
/*
Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#include <cuda.h>
#include <cuda_runtime.h>
#include "paddle/framework/lod_tensor.h"
#include "paddle/platform/assert.h"
#include <gtest/gtest.h>
__global__ void test(size_t* a, int size) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size;
i += blockDim.x * gridDim.x) {
a[i] *= 2;
}
}
TEST(LoDTensor, LoDInGPU) {
paddle::framework::Tensor tensor;
paddle::framework::LoDTensor lod_tensor;
paddle::platform::GPUPlace place(0);
paddle::framework::LoD src_lod;
src_lod.push_back(std::vector<size_t>{0, 2, 4, 6, 8, 10, 12, 14});
tensor.Resize({14, 16});
tensor.mutable_data<float>(place);
lod_tensor.set_lod(src_lod);
lod_tensor.set_tensor(&tensor);
CHECK_EQ(lod_tensor.lod_element(0, 2), 4);
CHECK_EQ(lod_tensor.lod_element(0, 4), 8);
auto lod = lod_tensor.lod();
test<<<1, 8>>>(lod[0].data(), lod[0].size());
cudaDeviceSynchronize();
for (size_t i = 0; i < src_lod[0].size(); ++i) {
CHECK_EQ(lod[0].data()[i], src_lod[0].data()[i] * 2);
}
}
...@@ -123,6 +123,15 @@ OperatorBase::OperatorBase(const std::string& type, ...@@ -123,6 +123,15 @@ OperatorBase::OperatorBase(const std::string& type,
CheckAllInputOutputSet(); CheckAllInputOutputSet();
} }
std::vector<std::string> OperatorBase::InputVars() const {
std::vector<std::string> ret_val;
for (auto& o : inputs_) {
ret_val.reserve(ret_val.size() + o.second.size());
ret_val.insert(ret_val.end(), o.second.begin(), o.second.end());
}
return ret_val;
}
std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const { std::vector<std::string> OperatorBase::OutputVars(bool has_intermediate) const {
std::vector<std::string> ret_val; std::vector<std::string> ret_val;
if (has_intermediate) { if (has_intermediate) {
......
...@@ -94,11 +94,14 @@ class OperatorBase { ...@@ -94,11 +94,14 @@ class OperatorBase {
const VariableNameMap& Inputs() const { return inputs_; } const VariableNameMap& Inputs() const { return inputs_; }
const VariableNameMap& Outputs() const { return outputs_; } const VariableNameMap& Outputs() const { return outputs_; }
//! Get a input with argument's name described in `op_proto` //! Get a input with argument's name described in `op_proto`
std::string Input(const std::string& name) const; std::string Input(const std::string& name) const;
//! Get a input which has multiple variables. //! Get a input which has multiple variables.
const std::vector<std::string>& Inputs(const std::string& name) const; const std::vector<std::string>& Inputs(const std::string& name) const;
std::vector<std::string> InputVars() const;
//! Get a output with argument's name described in `op_proto` //! Get a output with argument's name described in `op_proto`
std::string Output(const std::string& name) const; std::string Output(const std::string& name) const;
//! Get an output which has multiple variables. //! Get an output which has multiple variables.
...@@ -311,9 +314,9 @@ class InferShapeContext { ...@@ -311,9 +314,9 @@ class InferShapeContext {
} }
template <typename T> template <typename T>
std::vector<const T*> MultiOutput(const std::string& name) const { std::vector<T*> MultiOutput(const std::string& name) const {
auto names = op_.Outputs(name); auto names = op_.Outputs(name);
std::vector<const T*> res; std::vector<T*> res;
res.reserve(names.size()); res.reserve(names.size());
std::transform(names.begin(), names.end(), std::back_inserter(res), std::transform(names.begin(), names.end(), std::back_inserter(res),
[&](const std::string& sub_name) { [&](const std::string& sub_name) {
......
...@@ -43,6 +43,9 @@ class Tensor { ...@@ -43,6 +43,9 @@ class Tensor {
template <typename T, size_t D, int MajorType, typename IndexType> template <typename T, size_t D, int MajorType, typename IndexType>
friend struct EigenTensor; friend struct EigenTensor;
template <typename T, int MajorType, typename IndexType>
friend struct EigenMatrix;
template <typename T, int MajorType, typename IndexType> template <typename T, int MajorType, typename IndexType>
friend struct EigenVector; friend struct EigenVector;
......
...@@ -151,5 +151,13 @@ inline const DDim& Tensor::dims() const { return dims_; } ...@@ -151,5 +151,13 @@ inline const DDim& Tensor::dims() const { return dims_; }
inline int64_t Tensor::numel() const { return numel_; } inline int64_t Tensor::numel() const { return numel_; }
template <typename T>
inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) {
Tensor res;
res.ShareDataWith<T>(src);
res.Resize(flatten_to_2d(src.dims(), num_col_dims));
return res;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -262,3 +262,16 @@ TEST(Tensor, CopyFrom) { ...@@ -262,3 +262,16 @@ TEST(Tensor, CopyFrom) {
} }
#endif #endif
} }
TEST(Tensor, ReshapeToMatrix) {
using namespace paddle::framework;
using namespace paddle::platform;
Tensor src;
int* src_ptr = src.mutable_data<int>({2, 3, 4, 9}, CPUPlace());
for (int i = 0; i < 2 * 3 * 4 * 9; ++i) {
src_ptr[i] = i;
}
Tensor res = ReshapeToMatrix<int>(src, 2);
ASSERT_EQ(res.dims()[0], 2 * 3);
ASSERT_EQ(res.dims()[1], 4 * 9);
}
\ No newline at end of file
...@@ -62,14 +62,18 @@ void BatchNormBaseLayer::calFeatureMapSize() { ...@@ -62,14 +62,18 @@ void BatchNormBaseLayer::calFeatureMapSize() {
const ImageConfig& conf = config_.inputs(0).image_conf(); const ImageConfig& conf = config_.inputs(0).image_conf();
imageH_ = inputLayers_[0]->getOutput().getFrameHeight(); imageH_ = inputLayers_[0]->getOutput().getFrameHeight();
imageW_ = inputLayers_[0]->getOutput().getFrameWidth(); imageW_ = inputLayers_[0]->getOutput().getFrameWidth();
imageD_ = inputLayers_[0]->getOutput().getFrameDepth();
if (0 == imageD_) imageD_ = conf.img_size_z();
if (imageH_ == 0 && imageW_ == 0) { if (imageH_ == 0 && imageW_ == 0) {
imageH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size(); imageH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
imageW_ = conf.img_size(); imageW_ = conf.img_size();
} else { } else {
getOutput().setFrameHeight(imageH_); getOutput().setFrameHeight(imageH_);
getOutput().setFrameWidth(imageW_); getOutput().setFrameWidth(imageW_);
getOutput().setFrameDepth(imageD_);
} }
imgPixels_ = imageH_ * imageW_; imgPixels_ = imageH_ * imageW_ * imageD_;
} }
} // namespace paddle } // namespace paddle
...@@ -80,6 +80,7 @@ protected: ...@@ -80,6 +80,7 @@ protected:
/// Height or width of input image feature. /// Height or width of input image feature.
/// Both of them are 1 if the input is fully-connected layer. /// Both of them are 1 if the input is fully-connected layer.
int imageD_;
int imageH_; int imageH_;
int imageW_; int imageW_;
/// Height * Width. /// Height * Width.
......
...@@ -37,7 +37,7 @@ bool CudnnBatchNormLayer::init(const LayerMap& layerMap, ...@@ -37,7 +37,7 @@ bool CudnnBatchNormLayer::init(const LayerMap& layerMap,
} }
void CudnnBatchNormLayer::reshape(int batchSize) { void CudnnBatchNormLayer::reshape(int batchSize) {
hl_tensor_reshape(ioDesc_, batchSize, channels_, imageH_, imageW_); hl_tensor_reshape(ioDesc_, batchSize, channels_, imageH_ * imageD_, imageW_);
} }
void CudnnBatchNormLayer::forward(PassType passType) { void CudnnBatchNormLayer::forward(PassType passType) {
...@@ -104,7 +104,7 @@ void CudnnBatchNormLayer::forward(PassType passType) { ...@@ -104,7 +104,7 @@ void CudnnBatchNormLayer::forward(PassType passType) {
EPS, EPS,
batchSize, batchSize,
channels_, channels_,
imageH_, imageH_ * imageD_,
imageW_); imageW_);
} }
} }
......
...@@ -53,27 +53,27 @@ bool DeConv3DLayer::init(const LayerMap &layerMap, ...@@ -53,27 +53,27 @@ bool DeConv3DLayer::init(const LayerMap &layerMap,
size_t DeConv3DLayer::getSize() { size_t DeConv3DLayer::getSize() {
CHECK_NE(inputLayers_.size(), 0UL); CHECK_NE(inputLayers_.size(), 0UL);
outputH_.clear(); imgSizeW_.clear();
outputW_.clear(); imgSizeH_.clear();
outputD_.clear(); imgSizeD_.clear();
N_.clear(); N_.clear();
NOut_.clear(); NOut_.clear();
size_t layerSize = 0; size_t layerSize = 0;
for (size_t i = 0; i < inputLayers_.size(); ++i) { for (size_t i = 0; i < inputLayers_.size(); ++i) {
outputW_.push_back( imgSizeW_.push_back(
imageSize(imgSizeW_[i], filterSize_[i], padding_[i], stride_[i], true)); imageSize(outputW_[i], filterSize_[i], padding_[i], stride_[i], true));
outputH_.push_back(imageSize( imgSizeH_.push_back(imageSize(
imgSizeH_[i], filterSizeY_[i], paddingY_[i], strideY_[i], true)); outputH_[i], filterSizeY_[i], paddingY_[i], strideY_[i], true));
outputD_.push_back(imageSize( imgSizeD_.push_back(imageSize(
imgSizeD_[i], filterSizeZ_[i], paddingZ_[i], strideZ_[i], true)); outputD_[i], filterSizeZ_[i], paddingZ_[i], strideZ_[i], true));
NOut_.push_back(outputD_[i] * outputH_[i] * outputW_[i]); NOut_.push_back(imgSizeD_[i] * imgSizeH_[i] * imgSizeW_[i]);
N_.push_back(imgSizeD_[i] * imgSizeH_[i] * imgSizeW_[i]); N_.push_back(outputD_[i] * outputH_[i] * outputW_[i]);
CHECK(layerSize == 0 || N_[i] * size_t(numFilters_) == layerSize); CHECK(layerSize == 0 || N_[i] * size_t(numFilters_) == layerSize);
layerSize += NOut_[i] * numFilters_; layerSize += NOut_[i] * numFilters_;
} }
getOutput().setFrameHeight(outputH_[0]); getOutput().setFrameHeight(imgSizeH_[0]);
getOutput().setFrameWidth(outputW_[0]); getOutput().setFrameWidth(imgSizeW_[0]);
getOutput().setFrameDepth(outputD_[0]); getOutput().setFrameDepth(imgSizeD_[0]);
return layerSize; return layerSize;
} }
...@@ -103,9 +103,9 @@ void DeConv3DLayer::forward(PassType passType) { ...@@ -103,9 +103,9 @@ void DeConv3DLayer::forward(PassType passType) {
} }
colBuf_->col2Vol(outMat->getData() + n * outMat->getStride(), colBuf_->col2Vol(outMat->getData() + n * outMat->getStride(),
numFilters_, numFilters_,
outputD_[i], imgSizeD_[i],
outputH_[i], imgSizeH_[i],
outputW_[i], imgSizeW_[i],
filterSizeZ_[i], filterSizeZ_[i],
filterSizeY_[i], filterSizeY_[i],
filterSize_[i], filterSize_[i],
...@@ -144,9 +144,9 @@ void DeConv3DLayer::backward(const UpdateCallback &callback) { ...@@ -144,9 +144,9 @@ void DeConv3DLayer::backward(const UpdateCallback &callback) {
colBuf_->vol2Col( colBuf_->vol2Col(
getOutputGrad()->getData() + n * getOutputGrad()->getStride(), getOutputGrad()->getData() + n * getOutputGrad()->getStride(),
numFilters_, numFilters_,
outputD_[i], imgSizeD_[i],
outputH_[i], imgSizeH_[i],
outputW_[i], imgSizeW_[i],
filterSizeZ_[i], filterSizeZ_[i],
filterSizeY_[i], filterSizeY_[i],
filterSize_[i], filterSize_[i],
......
...@@ -139,7 +139,13 @@ void DetectionOutputLayer::forward(PassType passType) { ...@@ -139,7 +139,13 @@ void DetectionOutputLayer::forward(PassType passType) {
allDecodedBBoxes, allDecodedBBoxes,
&allIndices); &allIndices);
resetOutput(numKept, 7); if (numKept > 0) {
resetOutput(numKept, 7);
} else {
MatrixPtr outV = getOutputValue();
outV = NULL;
return;
}
MatrixPtr outV = getOutputValue(); MatrixPtr outV = getOutputValue();
getDetectionOutput(confBuffer_->getData(), getDetectionOutput(confBuffer_->getData(),
numKept, numKept,
......
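A note on the `else` branch in the hunk above: `outV` is a local `MatrixPtr`, so assigning `NULL` to it may only clear that local copy. The sketch below illustrates this under the assumption that `MatrixPtr` is a `std::shared_ptr` and that `getOutputValue()` hands back the layer's own pointer. `Matrix`, `Layer`, and both helper functions are hypothetical stand-ins, not Paddle's classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only.
struct Matrix { int rows; };
typedef std::shared_ptr<Matrix> MatrixPtr;

struct Layer {
  MatrixPtr value;
  const MatrixPtr& getOutputValue() const { return value; }
};

// Mirrors the pattern in the else branch: copy the pointer, then null the copy.
bool ownerStillHoldsAfterLocalReset(const Layer& layer) {
  MatrixPtr outV = layer.getOutputValue();  // copies the shared_ptr
  outV = nullptr;                           // clears only the local copy
  return layer.value != nullptr;            // the layer's pointer is unchanged
}

// Clearing the owning member is what actually drops the layer's output.
void clearOutput(Layer& layer) { layer.value.reset(); }
```

If the intent is for downstream code to observe an empty output, the owning pointer (or the matrix size) would have to change, not the local variable.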
...@@ -469,7 +469,7 @@ size_t getDetectionIndices( ...@@ -469,7 +469,7 @@ size_t getDetectionIndices(
const size_t numClasses, const size_t numClasses,
const size_t backgroundId, const size_t backgroundId,
const size_t batchSize, const size_t batchSize,
const size_t confThreshold, const real confThreshold,
const size_t nmsTopK, const size_t nmsTopK,
const real nmsThreshold, const real nmsThreshold,
const size_t keepTopK, const size_t keepTopK,
......
...@@ -275,7 +275,7 @@ size_t getDetectionIndices( ...@@ -275,7 +275,7 @@ size_t getDetectionIndices(
const size_t numClasses, const size_t numClasses,
const size_t backgroundId, const size_t backgroundId,
const size_t batchSize, const size_t batchSize,
const size_t confThreshold, const real confThreshold,
const size_t nmsTopK, const size_t nmsTopK,
const real nmsThreshold, const real nmsThreshold,
const size_t keepTopK, const size_t keepTopK,
......
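The two hunks above change `confThreshold` from `size_t` to `real`. The sketch below shows why the type matters for a fractional threshold. It assumes `real` is `float`, which may differ per build, and uses hypothetical filter functions that are not part of `getDetectionIndices`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: `real` is float here; it may be double in other builds.
typedef float real;

// Hypothetical filter using the old parameter type.
size_t countAboveIntegral(const std::vector<real>& scores,
                          const size_t confThreshold) {
  size_t kept = 0;
  for (size_t i = 0; i < scores.size(); ++i) {
    if (scores[i] > static_cast<real>(confThreshold)) ++kept;
  }
  return kept;
}

// Hypothetical filter using the new parameter type.
size_t countAboveReal(const std::vector<real>& scores,
                      const real confThreshold) {
  size_t kept = 0;
  for (size_t i = 0; i < scores.size(); ++i) {
    if (scores[i] > confThreshold) ++kept;
  }
  return kept;
}
```

With an integral parameter, a threshold such as 0.5 is truncated to 0 before the comparison, so every positive score passes the filter.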
...@@ -77,24 +77,6 @@ void MKLDNNFcLayer::convertWeightsToPaddle() { ...@@ -77,24 +77,6 @@ void MKLDNNFcLayer::convertWeightsToPaddle() {
wgtVal_->reorderDataTo(wgtVal_, dstFmt, targetDim); wgtVal_->reorderDataTo(wgtVal_, dstFmt, targetDim);
} }
void MKLDNNFcLayer::convertOutputToOtherDevice() {
copyOutputInfoToOtherDevice();
// find other cpu device and reorder output to cpu device
int cnt = 0;
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
if (outputOtherDevice_[i].deviceId == CPU_DEVICE) {
// fc cpu output value do not need convert
// just share point
outputOtherDevice_[i].value = output_.value;
++cnt;
}
}
if (cnt > 1) {
LOG(WARNING) << "should not have more than one CPU devie";
}
}
void MKLDNNFcLayer::reshape() { void MKLDNNFcLayer::reshape() {
const Argument& input = getInput(0, getPrev(0)->getDeviceId()); const Argument& input = getInput(0, getPrev(0)->getDeviceId());
int batchSize = input.getBatchSize(); int batchSize = input.getBatchSize();
...@@ -155,7 +137,10 @@ void MKLDNNFcLayer::resetFwd() { ...@@ -155,7 +137,10 @@ void MKLDNNFcLayer::resetFwd() {
// change original output value to mkldnn output value // change original output value to mkldnn output value
output_.value = std::dynamic_pointer_cast<Matrix>(outVal_); output_.value = std::dynamic_pointer_cast<Matrix>(outVal_);
if (!outputIsOnlyMKLDNN()) { if (!outputIsOnlyMKLDNN()) {
convertOutputToOtherDevice(); copyOutputInfoToOtherDevice();
// the fc CPU output value does not need conversion,
// just share the pointer
getOutput(CPU_DEVICE).value->setData(output_.value->getData());
} }
// create forward handle // create forward handle
...@@ -235,13 +220,12 @@ void MKLDNNFcLayer::resetBwd() { ...@@ -235,13 +220,12 @@ void MKLDNNFcLayer::resetBwd() {
pipelineBwd_.push_back(*bwdWgt_); pipelineBwd_.push_back(*bwdWgt_);
/// backward data /// backward data
device = inputIsOnlyMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE; const MatrixPtr& in = inputLayers_[0]->getOutput().grad;
const MatrixPtr& in = getInputGrad(0, device);
if (in == nullptr) { if (in == nullptr) {
return; return;
} }
if (getInput(0, device).getAllCount() > 1) { if (getInput(0, MKLDNN_DEVICE).getAllCount() > 1) {
// TODO(TJ): use outputMaps_ ways when merge outgrad done // TODO(TJ): use outputMaps_ ways to get the inGrad_ when merge outgrad done
} else { } else {
inGrad_ = MKLDNNMatrix::create(in, inVal_->getPrimitiveDesc()); inGrad_ = MKLDNNMatrix::create(in, inVal_->getPrimitiveDesc());
} }
...@@ -258,13 +242,21 @@ void MKLDNNFcLayer::resetBwd() { ...@@ -258,13 +242,21 @@ void MKLDNNFcLayer::resetBwd() {
pipelineBwd_.push_back(*bwdData_); pipelineBwd_.push_back(*bwdData_);
} }
void MKLDNNFcLayer::updateInputData() {
if (inputLayers_[0]->getType() != "data") {
return;
}
real* iData = getInputValue(0, CPU_DEVICE)->getData();
inVal_->setData(iData);
}
void MKLDNNFcLayer::forward(PassType passType) { void MKLDNNFcLayer::forward(PassType passType) {
Layer::forward(passType); Layer::forward(passType);
reshape(); reshape();
{ {
REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str()); REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
syncInputValue(); updateInputData();
// just submit forward pipeline // just submit forward pipeline
stream_->submit(pipelineFwd_); stream_->submit(pipelineFwd_);
...@@ -286,7 +278,6 @@ void MKLDNNFcLayer::backward(const UpdateCallback& callback) { ...@@ -286,7 +278,6 @@ void MKLDNNFcLayer::backward(const UpdateCallback& callback) {
REGISTER_TIMER_INFO("mkldnn_bwdTimer", getName().c_str()); REGISTER_TIMER_INFO("mkldnn_bwdTimer", getName().c_str());
resetBwd(); resetBwd();
syncOutputGrad();
// just submit backward pipeline // just submit backward pipeline
stream_->submit(pipelineBwd_); stream_->submit(pipelineBwd_);
} }
......
...@@ -53,6 +53,8 @@ public: ...@@ -53,6 +53,8 @@ public:
void backward(const UpdateCallback& callback) override; void backward(const UpdateCallback& callback) override;
void updateInputData() override;
protected: protected:
/** /**
* reshape the input image sizes * reshape the input image sizes
...@@ -72,8 +74,6 @@ protected: ...@@ -72,8 +74,6 @@ protected:
* only would be called when needed * only would be called when needed
*/ */
void resetBwd(); void resetBwd();
void convertOutputToOtherDevice() override;
}; };
} // namespace paddle } // namespace paddle
...@@ -114,10 +114,10 @@ public: ...@@ -114,10 +114,10 @@ public:
virtual void convertWeightsToPaddle() {} virtual void convertWeightsToPaddle() {}
/** /**
* convert MKLDNN output to other device. * Update input value data when input layer is "data" type.
* only support CPU device yet * Since the input value data address might be changed.
*/ */
virtual void convertOutputToOtherDevice() {} virtual void updateInputData() {}
/** /**
* print info about sizes * print info about sizes
...@@ -155,6 +155,7 @@ protected: ...@@ -155,6 +155,7 @@ protected:
* copy base info and do not copy data value * copy base info and do not copy data value
*/ */
void copyOutputInfoToOtherDevice() { void copyOutputInfoToOtherDevice() {
int cnt = 0;
for (size_t i = 0; i < outputOtherDevice_.size(); i++) { for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].setFrameHeight(output_.getFrameHeight()); outputOtherDevice_[i].setFrameHeight(output_.getFrameHeight());
outputOtherDevice_[i].setFrameWidth(output_.getFrameWidth()); outputOtherDevice_[i].setFrameWidth(output_.getFrameWidth());
...@@ -163,6 +164,12 @@ protected: ...@@ -163,6 +164,12 @@ protected:
outputOtherDevice_[i].subSequenceStartPositions = outputOtherDevice_[i].subSequenceStartPositions =
output_.subSequenceStartPositions; output_.subSequenceStartPositions;
outputOtherDevice_[i].cpuSequenceDims = output_.cpuSequenceDims; outputOtherDevice_[i].cpuSequenceDims = output_.cpuSequenceDims;
if (outputOtherDevice_[i].deviceId == CPU_DEVICE) {
++cnt;
}
}
if (cnt > 1) {
LOG(WARNING) << "should not have more than one CPU device";
} }
} }
...@@ -193,32 +200,6 @@ protected: ...@@ -193,32 +200,6 @@ protected:
return outputOtherDevice_.size() == 0; return outputOtherDevice_.size() == 0;
} }
/**
* Sync input value data
*/
void syncInputValue() {
if (inputIsOnlyMKLDNN()) {
return;
}
real* iData = getInputValue(0, CPU_DEVICE)->getData();
// update input data
// since it might be changed if this is after data layer
inVal_->updateData(iData);
}
/**
* Sync output grad data
*/
void syncOutputGrad() {
if (outputIsOnlyMKLDNN()) {
return;
}
// update diff
real* oDiff = getOutput(CPU_DEVICE).grad->getData();
outGrad_->updateData(oDiff);
}
/** /**
* Set deviceId of this layer. * Set deviceId of this layer.
*/ */
......
...@@ -24,10 +24,12 @@ bool SwitchOrderLayer::init(const LayerMap& layerMap, ...@@ -24,10 +24,12 @@ bool SwitchOrderLayer::init(const LayerMap& layerMap,
/* Initialize the basic parent class */ /* Initialize the basic parent class */
Layer::init(layerMap, parameterMap); Layer::init(layerMap, parameterMap);
auto& img_conf = config_.inputs(0).image_conf(); auto& img_conf = config_.inputs(0).image_conf();
size_t inD = img_conf.img_size_z();
size_t inH = size_t inH =
img_conf.has_img_size_y() ? img_conf.img_size_y() : img_conf.img_size(); img_conf.has_img_size_y() ? img_conf.img_size_y() : img_conf.img_size();
size_t inW = img_conf.img_size(); size_t inW = img_conf.img_size();
size_t inC = img_conf.channels(); size_t inC = img_conf.channels();
inH = inH * inD;
inDims_ = TensorShape({0, inC, inH, inW}); inDims_ = TensorShape({0, inC, inH, inW});
outDims_ = TensorShape(4); outDims_ = TensorShape(4);
...@@ -64,9 +66,10 @@ void SwitchOrderLayer::setInDims() { ...@@ -64,9 +66,10 @@ void SwitchOrderLayer::setInDims() {
MatrixPtr input = inputLayers_[0]->getOutputValue(); MatrixPtr input = inputLayers_[0]->getOutputValue();
size_t batchSize = input->getHeight(); size_t batchSize = input->getHeight();
inDims_.setDim(0, batchSize); inDims_.setDim(0, batchSize);
int d = inputLayers_[0]->getOutput().getFrameDepth();
d = (d == 0 ? 1 : d);
int h = inputLayers_[0]->getOutput().getFrameHeight(); int h = inputLayers_[0]->getOutput().getFrameHeight();
if (h != 0) inDims_.setDim(2, h); if (h != 0) inDims_.setDim(2, h * d);
int w = inputLayers_[0]->getOutput().getFrameWidth(); int w = inputLayers_[0]->getOutput().getFrameWidth();
if (w != 0) inDims_.setDim(3, w); if (w != 0) inDims_.setDim(3, w);
int totalCount = input->getElementCnt(); int totalCount = input->getElementCnt();
......
...@@ -1703,6 +1703,55 @@ TEST(Layer, BatchNormalizationLayer) { ...@@ -1703,6 +1703,55 @@ TEST(Layer, BatchNormalizationLayer) {
#endif #endif
} }
void testBatchNorm3DLayer(const string& type, bool trans, bool useGpu) {
TestConfig config;
const int CHANNELS = 10;
const int IMG_SIZE = 16;
const int IMG_SIZE_Y = 8;
const int IMG_SIZE_Z = 8;
size_t size = CHANNELS * IMG_SIZE * IMG_SIZE_Y * IMG_SIZE_Z;
config.layerConfig.set_type(type);
config.layerConfig.set_size(size);
config.layerConfig.set_active_type("sigmoid");
config.biasSize = CHANNELS;
config.inputDefs.push_back({INPUT_DATA,
"layer_0",
/* dim= */ size,
/* paraSize= */ CHANNELS});
config.inputDefs.push_back({INPUT_DATA, "layer_1_running_mean", 1, CHANNELS});
config.inputDefs.back().isStatic = true;
config.inputDefs.push_back({INPUT_DATA, "layer_2_running_var", 1, CHANNELS});
config.inputDefs.back().isStatic = true;
LayerInputConfig* input = config.layerConfig.add_inputs();
config.layerConfig.add_inputs();
config.layerConfig.add_inputs();
ImageConfig* img_conf = input->mutable_image_conf();
img_conf->set_channels(CHANNELS);
img_conf->set_img_size(IMG_SIZE);
img_conf->set_img_size_y(IMG_SIZE_Y);
img_conf->set_img_size_z(IMG_SIZE_Z);
testLayerGrad(config,
"batch_norm",
64,
/* trans= */ trans,
useGpu,
/* useWeight */ true);
}
TEST(Layer, testBatchNorm3DLayer) {
testBatchNorm3DLayer("batch_norm", false, false);
#ifndef PADDLE_ONLY_CPU
testBatchNorm3DLayer("batch_norm", false, true);
if (hl_get_cudnn_lib_version() >= int(4000)) {
testBatchNorm3DLayer("cudnn_batch_norm", false, true);
}
#endif
}
void testConvOperator(bool isDeconv) { void testConvOperator(bool isDeconv) {
TestConfig config; TestConfig config;
const int NUM_FILTERS = 16; const int NUM_FILTERS = 16;
...@@ -2253,26 +2302,27 @@ void test3DDeConvLayer(const string& type, bool trans, bool useGpu) { ...@@ -2253,26 +2302,27 @@ void test3DDeConvLayer(const string& type, bool trans, bool useGpu) {
conv->set_stride(2); conv->set_stride(2);
conv->set_stride_y(2); conv->set_stride_y(2);
conv->set_stride_z(2); conv->set_stride_z(2);
conv->set_img_size(IMAGE_SIZE); conv->set_output_x(IMAGE_SIZE);
conv->set_img_size_y(IMAGE_SIZE_Y); conv->set_output_y(IMAGE_SIZE_Y);
conv->set_img_size_z(IMAGE_SIZE_Z); conv->set_output_z(IMAGE_SIZE_Z);
conv->set_output_x(imageSize(conv->img_size(),
conv->set_img_size(imageSize(conv->output_x(),
conv->filter_size(), conv->filter_size(),
conv->padding(), conv->padding(),
conv->stride(), conv->stride(),
true)); true));
conv->set_output_y(imageSize(conv->img_size_y(), conv->set_img_size_y(imageSize(conv->output_y(),
conv->filter_size_y(), conv->filter_size_y(),
conv->padding_y(), conv->padding_y(),
conv->stride_y(), conv->stride_y(),
true)); true));
conv->set_output_z(imageSize(conv->img_size_z(), conv->set_img_size_z(imageSize(conv->output_z(),
conv->filter_size_z(), conv->filter_size_z(),
conv->padding_z(), conv->padding_z(),
conv->stride_z(), conv->stride_z(),
true)); true));
config.layerConfig.set_size(conv->output_x() * conv->output_y() * config.layerConfig.set_size(conv->img_size() * conv->img_size_y() *
conv->output_z() * NUM_FILTERS); conv->img_size_z() * NUM_FILTERS);
conv->set_groups(1); conv->set_groups(1);
conv->set_filter_channels(conv->channels() / conv->groups()); conv->set_filter_channels(conv->channels() / conv->groups());
config.inputDefs.push_back( config.inputDefs.push_back(
......
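In the updated test above, the configured `output_*` fields become the deconvolution's input side and `img_size*` is derived from them through `imageSize(...)`. The definition of `imageSize` is not shown in this diff. The sketch below uses the usual transposed-convolution size relation as an assumption about what the call computes when its last argument is `true`. Both function names are hypothetical.

```cpp
#include <cassert>

// Assumed relation (not confirmed by this diff): size produced by a
// transposed convolution, given the size on its input side.
int deconvImageSize(int outputSize, int filterSize, int padding, int stride) {
  return (outputSize - 1) * stride + filterSize - 2 * padding;
}

// Forward convolution size, used here only to check the round trip.
int convOutputSize(int imageSize, int filterSize, int padding, int stride) {
  return (imageSize - filterSize + 2 * padding) / stride + 1;
}
```

Under this assumption the layer size set in the test is the product of the three derived image sizes and the number of filters, which matches the `set_size` call on the new side of the diff.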
...@@ -33,14 +33,12 @@ MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, memory::primitive_desc pd) { ...@@ -33,14 +33,12 @@ MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, memory::primitive_desc pd) {
size_t width = cnts / dims[0]; size_t width = cnts / dims[0];
m = Matrix::create(height, width, false, false); m = Matrix::create(height, width, false, false);
} }
CHECK(m) << " Matrix should not be empty"; CHECK(m) << " Matrix should not be empty";
CpuMatrixPtr cpuMatrix = std::dynamic_pointer_cast<CpuMatrix>(m); CpuMatrixPtr cpuMatrix = std::dynamic_pointer_cast<CpuMatrix>(m);
CHECK(cpuMatrix) << "Only support create from CPU matrix yet"; CHECK(cpuMatrix) << "Only support create from CPU matrix yet";
CHECK_EQ(cpuMatrix->getElementCnt(), cnts) << "Count size does not match";
CHECK_EQ(cnts, m->getElementCnt()) << "Count size does not match"; return std::make_shared<MKLDNNMatrix>(cpuMatrix, pd);
return std::make_shared<MKLDNNMatrix>(
m->getData(), m->getHeight(), m->getWidth(), pd);
} }
MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m,
...@@ -138,7 +136,7 @@ void MKLDNNMatrix::downSpatial() { ...@@ -138,7 +136,7 @@ void MKLDNNMatrix::downSpatial() {
mkldnn_primitive_create(&result, pd.get(), nullptr, nullptr), mkldnn_primitive_create(&result, pd.get(), nullptr, nullptr),
"could not create a memory primitive"); "could not create a memory primitive");
reset(result); reset(result);
set_data_handle(getData()); set_data_handle(data_);
} }
} // namespace paddle } // namespace paddle
...@@ -30,11 +30,10 @@ typedef std::shared_ptr<MKLDNNMatrix> MKLDNNMatrixPtr; ...@@ -30,11 +30,10 @@ typedef std::shared_ptr<MKLDNNMatrix> MKLDNNMatrixPtr;
*/ */
class MKLDNNMatrix : public CpuMatrix, public mkldnn::memory { class MKLDNNMatrix : public CpuMatrix, public mkldnn::memory {
public: public:
MKLDNNMatrix(real* data, MKLDNNMatrix(CpuMatrixPtr m, mkldnn::memory::primitive_desc pd)
size_t height, : CpuMatrix(m->getData(), m->getHeight(), m->getWidth(), false),
size_t width, mkldnn::memory(pd, m->getData()),
mkldnn::memory::primitive_desc pd) m_(m) {}
: CpuMatrix(data, height, width, false), mkldnn::memory(pd, data) {}
~MKLDNNMatrix() {} ~MKLDNNMatrix() {}
...@@ -81,11 +80,29 @@ public: ...@@ -81,11 +80,29 @@ public:
void downSpatial(); void downSpatial();
/** /**
* Update the memory data handle. * set the memory data handle.
* Caution: This will not check the buffer size of the data, * Caution: This will not check the buffer size of the data,
* it should be covered by the user. * it should be covered by the user.
*/ */
void updateData(void* data) { set_data_handle(data); } void setData(real* data) {
set_data_handle(data);
CpuMatrix::setData(data);
m_.reset();
}
/**
* override Matrix::getData
* check data before return
*/
real* getData() override {
CHECK_EQ((void*)data_, get_data_handle());
return data_;
}
const real* getData() const override {
CHECK_EQ((void*)data_, get_data_handle());
return data_;
}
/** /**
* Get primitive descriptor. * Get primitive descriptor.
...@@ -143,6 +160,10 @@ protected: ...@@ -143,6 +160,10 @@ protected:
memory::format srcFmt, memory::format srcFmt,
memory::format dstFmt, memory::format dstFmt,
memory::dims dm); memory::dims dm);
private:
// save the CpuMatrixPtr in case the buffer released outside
CpuMatrixPtr m_;
}; };
} // namespace paddle } // namespace paddle
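The new `MKLDNNMatrix` constructor keeps a `CpuMatrixPtr` member so the wrapped buffer is not released while the wrapper still points at it, and `setData` drops that reference when an external buffer takes over. A minimal sketch of this ownership pattern follows. `Buffer` and `BufferView` are hypothetical stand-ins and do not reproduce the MKLDNN memory handle.

```cpp
#include <cassert>
#include <memory>
#include <vector>

typedef std::vector<float> Buffer;  // hypothetical stand-in for CpuMatrix storage

class BufferView {
public:
  // Keep a shared reference so the buffer outlives external owners.
  explicit BufferView(std::shared_ptr<Buffer> owner)
      : data_(owner->data()), owner_(owner) {}

  // Point at caller-managed memory and release the retained buffer.
  void setData(float* data) {
    data_ = data;
    owner_.reset();
  }

  float* getData() { return data_; }

private:
  float* data_;                    // raw view used for computation
  std::shared_ptr<Buffer> owner_;  // keeps the original buffer alive
};
```

After `setData`, the caller is responsible for the lifetime and size of the new buffer, which matches the caution in the header comment.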
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/concat_op.h"
#include <vector>
namespace paddle {
namespace operators {
using framework::Tensor;
class ConcatOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
auto ins = ctx.MultiInput<framework::Tensor>("X");
auto *out = ctx.Output<framework::Tensor>("Out");
size_t axis = static_cast<size_t>(ctx.Attr<int>("axis"));
size_t n = ins.size();
PADDLE_ENFORCE_GT(n, 1, "The number of input tensors should be greater than 1.");
auto out_dims = ins[0]->dims();
size_t in_zero_dims_size = out_dims.size();
for (size_t i = 1; i < n; i++) {
for (size_t j = 0; j < in_zero_dims_size; j++) {
if (j == axis) {
out_dims[axis] += ins[i]->dims()[j];
continue;
}
PADDLE_ENFORCE_EQ(out_dims[j], ins[i]->dims()[j],
"Input tensors should have the same "
"dimensions except along the specified axis.");
}
}
out->Resize(out_dims);
}
};
class ConcatOpMaker : public framework::OpProtoAndCheckerMaker {
public:
ConcatOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "the input tensors of concat operator.").AsDuplicable();
AddOutput("Out", "the output tensor of concat operator.");
AddComment(R"DOC(
Join the input tensors along the given axis.
Examples:
Input[0] = [[1,2],[3,4]]
Input[1] = [[5,6]]
axis = 0
Output = [[1,2],
[3,4],
[5,6]]
)DOC");
AddAttr<int>("axis", "The axis along which the inputs will be joined.")
.SetDefault(0);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(concat, ops::ConcatOp, ops::ConcatOpMaker)
REGISTER_OP_CPU_KERNEL(concat,
ops::ConcatKernel<paddle::platform::CPUPlace, float>)
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/concat_op.h"
namespace ops = paddle::operators;
// TODO(Yancey1989) Add GPU kernel
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
template <typename Place, typename T>
class ConcatKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto ins = ctx.MultiInput<framework::Tensor>("X");
auto* out = ctx.Output<framework::Tensor>("Out");
int64_t axis = static_cast<int64_t>(ctx.Attr<int>("axis"));
size_t n = ins.size();
size_t output_axis_dim = 0;
size_t before = 1, after = 1;
for (size_t i = 0; i < n; i++) {
output_axis_dim += ins[i]->dims()[axis];
}
auto& input_zero = ins[0];
for (int64_t i = 0; i < input_zero->dims().size(); i++) {
if (i == axis) {
continue;
}
if (i < axis) {
before *= input_zero->dims()[i];
} else {
after *= input_zero->dims()[i];
}
}
size_t output_offset = 0;
for (size_t i = 0; i < n; i++) {
auto& in = ins[i];
auto axis_dim = in->dims()[axis];
for (size_t j = 0; j < before; j++) {
size_t len = axis_dim * after * sizeof(T);
const T* src = in->data<T>() + axis_dim * after * j;
T* out_data = out->mutable_data<T>(platform::CPUPlace());
T* dest = out_data + output_offset + output_axis_dim * after * j;
memcpy(dest, src, len);
}
output_offset += axis_dim * after;
}
}
};
} // namespace operators
} // namespace paddle
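The `ConcatKernel` above copies, for each input, `before` contiguous chunks of `axis_dim * after` elements into the output at a running offset. The same indexing can be sketched on plain row-major vectors. `SimpleTensor` and `concatAlongAxis` are hypothetical names, and the sketch assumes all inputs already agree on every dimension except `axis`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct SimpleTensor {        // hypothetical stand-in for framework::Tensor
  std::vector<size_t> dims;
  std::vector<float> data;   // row-major
};

SimpleTensor concatAlongAxis(const std::vector<SimpleTensor>& ins, size_t axis) {
  const std::vector<size_t>& d0 = ins[0].dims;
  size_t before = 1, after = 1, outAxisDim = 0;
  for (size_t i = 0; i < d0.size(); ++i) {
    if (i < axis) before *= d0[i];        // product of dims left of axis
    else if (i > axis) after *= d0[i];    // product of dims right of axis
  }
  for (size_t i = 0; i < ins.size(); ++i) outAxisDim += ins[i].dims[axis];

  SimpleTensor out;
  out.dims = d0;
  out.dims[axis] = outAxisDim;
  out.data.resize(before * outAxisDim * after);

  size_t outputOffset = 0;
  for (size_t i = 0; i < ins.size(); ++i) {
    const SimpleTensor& in = ins[i];
    const size_t chunk = in.dims[axis] * after;  // elements per outer index
    for (size_t j = 0; j < before; ++j) {
      const float* src = in.data.data() + chunk * j;
      float* dest = out.data.data() + outputOffset + outAxisDim * after * j;
      std::memcpy(dest, src, chunk * sizeof(float));
    }
    outputOffset += chunk;
  }
  return out;
}
```

For `axis = 0` this reduces to appending the inputs one after another. For inner axes the chunks from each input interleave once per outer index.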
...@@ -119,4 +119,4 @@ TEST(math, im2col) { ...@@ -119,4 +119,4 @@ TEST(math, im2col) {
#ifndef PADDLE_ONLY_CPU #ifndef PADDLE_ONLY_CPU
testIm2col<paddle::platform::GPUPlace>(); testIm2col<paddle::platform::GPUPlace>();
#endif #endif
} }
\ No newline at end of file
...@@ -25,18 +25,27 @@ class MulOp : public framework::OperatorWithKernel { ...@@ -25,18 +25,27 @@ class MulOp : public framework::OperatorWithKernel {
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(const framework::InferShapeContext &ctx) const override {
auto dim0 = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx.Input<Tensor>("X")->dims();
auto dim1 = ctx.Input<Tensor>("Y")->dims(); auto y_dims = ctx.Input<Tensor>("Y")->dims();
PADDLE_ENFORCE_EQ(dim0.size(), 2, int x_num_col_dims = Attr<int>("x_num_col_dims");
"input X(%s) should be a tensor with 2 dims, a matrix", int y_num_col_dims = Attr<int>("y_num_col_dims");
ctx.op().Input("X"));
PADDLE_ENFORCE_EQ(dim1.size(), 2, PADDLE_ENFORCE(x_dims.size() > x_num_col_dims,
"input Y(%s) should be a tensor with 2 dims, a matrix", "The rank of input tensor X(%s) should be larger than "
ctx.op().Input("Y")); "`mul_op`'s `x_num_col_dims`.",
ctx.op().Input("X"));
PADDLE_ENFORCE(y_dims.size() > y_num_col_dims,
"The rank of input tensor Y(%s) should be larger than "
"`mul_op`'s `y_num_col_dims`.",
ctx.op().Input("Y"));
auto x_mat_dims = framework::flatten_to_2d(x_dims, x_num_col_dims);
auto y_mat_dims = framework::flatten_to_2d(y_dims, y_num_col_dims);
PADDLE_ENFORCE_EQ( PADDLE_ENFORCE_EQ(
dim0[1], dim1[0], x_mat_dims[1], y_mat_dims[0],
"First matrix's width must be equal with second matrix's height."); "First matrix's width must be equal with second matrix's height.");
ctx.Output<Tensor>("Out")->Resize({dim0[0], dim1[1]}); ctx.Output<Tensor>("Out")->Resize({x_mat_dims[0], y_mat_dims[1]});
} }
}; };
...@@ -47,6 +56,23 @@ class MulOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -47,6 +56,23 @@ class MulOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput("X", "The first input of mul op"); AddInput("X", "The first input of mul op");
AddInput("Y", "The second input of mul op"); AddInput("Y", "The second input of mul op");
AddOutput("Out", "The output of mul op"); AddOutput("Out", "The output of mul op");
AddAttr<int>(
"x_num_col_dims",
R"DOC(mul_op can take tensors with more than two dimensions as input `X`.
In that case, the tensor will be reshaped to a matrix. The matrix's first
dimension(column length) will be the product of tensor's first
`num_col_dims` dimensions, and the matrix's second dimension(row length)
will be the product of tensor's last `rank - num_col_dims` dimensions.
)DOC")
.SetDefault(1)
.EqualGreaterThan(1);
AddAttr<int>(
"y_num_col_dims",
R"DOC(mul_op can take tensors with more than two dimensions as input `Y`,
in that case, tensors will be reshaped to a matrix. Just like input `X`.
)DOC")
.SetDefault(1)
.EqualGreaterThan(1);
AddComment(R"DOC( AddComment(R"DOC(
Two Element Mul Operator. Two Element Mul Operator.
...@@ -70,10 +96,20 @@ class MulOpGrad : public framework::OperatorWithKernel { ...@@ -70,10 +96,20 @@ class MulOpGrad : public framework::OperatorWithKernel {
auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims(); auto out_dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X")); auto *x_grad = ctx.Output<Tensor>(framework::GradVarName("X"));
auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y")); auto *y_grad = ctx.Output<Tensor>(framework::GradVarName("Y"));
PADDLE_ENFORCE(x_dims[0] == out_dims[0],
"Out@GRAD M X N must equal to X dims 0, M "); auto x_mat_dims =
PADDLE_ENFORCE(y_dims[1] == out_dims[1], framework::flatten_to_2d(x_dims, Attr<int>("x_num_col_dims"));
"Out@GRAD M X N must equal to Y dims 1, N "); auto y_mat_dims =
framework::flatten_to_2d(y_dims, Attr<int>("y_num_col_dims"));
PADDLE_ENFORCE_EQ(
x_mat_dims[0], out_dims[0],
"The first dimension of Out@GRAD must equal to the first dimension of "
"the first operand.");
PADDLE_ENFORCE_EQ(
y_mat_dims[1], out_dims[1],
"The second dimension of Out@GRAD must equal to the second "
"dimension of the second operand.");
if (x_grad) x_grad->Resize(x_dims); if (x_grad) x_grad->Resize(x_dims);
if (y_grad) y_grad->Resize(y_dims); if (y_grad) y_grad->Resize(y_dims);
......
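The shape inference in `MulOp` above flattens both operands to matrices before checking the inner dimension. The definition of `framework::flatten_to_2d` is not part of this diff. The sketch below assumes it multiplies the first `num_col_dims` dimensions into the matrix height and the remaining ones into the width, which is consistent with how `rowwise_add` slices the trailing dimensions later in this commit. `flattenTo2d` and `inferMulShape` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed behavior of flatten_to_2d: {prod(dims[0..n)), prod(dims[n..rank))}.
std::pair<int64_t, int64_t> flattenTo2d(const std::vector<int64_t>& dims,
                                        int numColDims) {
  int64_t rows = 1, cols = 1;
  for (size_t i = 0; i < dims.size(); ++i) {
    if (static_cast<int>(i) < numColDims) rows *= dims[i];
    else cols *= dims[i];
  }
  return std::make_pair(rows, cols);
}

// Returns false when the flattened inner dimensions disagree.
bool inferMulShape(const std::vector<int64_t>& xDims, int xNumColDims,
                   const std::vector<int64_t>& yDims, int yNumColDims,
                   std::pair<int64_t, int64_t>* out) {
  std::pair<int64_t, int64_t> x = flattenTo2d(xDims, xNumColDims);
  std::pair<int64_t, int64_t> y = flattenTo2d(yDims, yNumColDims);
  if (x.second != y.first) return false;
  *out = std::make_pair(x.first, y.second);
  return true;
}
```

For example, an `X` of shape `{2, 3, 4}` with `x_num_col_dims = 1` is treated as a 2 x 12 matrix, so it can be multiplied by a `Y` of shape `{12, 5}`.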
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may not use this file except in compliance with the License.
You may obtain a copy of the License at You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/licenses/LICENSE-2.0
...@@ -31,13 +31,25 @@ template <typename Place, typename T> ...@@ -31,13 +31,25 @@ template <typename Place, typename T>
class MulKernel : public framework::OpKernel { class MulKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* x = context.Input<Tensor>("X"); const Tensor* x = context.Input<Tensor>("X");
auto* y = context.Input<Tensor>("Y"); const Tensor* y = context.Input<Tensor>("Y");
auto* z = context.Output<Tensor>("Out"); Tensor* z = context.Output<Tensor>("Out");
const Tensor x_matrix =
x->dims().size() > 2
? framework::ReshapeToMatrix<T>(
*x, context.template Attr<int>("x_num_col_dims"))
: *x;
const Tensor y_matrix =
y->dims().size() > 2
? framework::ReshapeToMatrix<T>(
*y, context.template Attr<int>("y_num_col_dims"))
: *y;
z->mutable_data<T>(context.GetPlace()); z->mutable_data<T>(context.GetPlace());
auto* device_context = auto* device_context =
const_cast<platform::DeviceContext*>(context.device_context_); const_cast<platform::DeviceContext*>(context.device_context_);
math::matmul<Place, T>(*x, false, *y, false, 1, z, 0, device_context); math::matmul<Place, T>(x_matrix, false, y_matrix, false, 1, z, 0,
device_context);
} }
}; };
...@@ -45,23 +57,39 @@ template <typename Place, typename T> ...@@ -45,23 +57,39 @@ template <typename Place, typename T>
class MulGradKernel : public framework::OpKernel { class MulGradKernel : public framework::OpKernel {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X"); int x_num_col_dims = ctx.template Attr<int>("x_num_col_dims");
auto* y = ctx.Input<Tensor>("Y"); int y_num_col_dims = ctx.template Attr<int>("y_num_col_dims");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out")); const Tensor* x = ctx.Input<Tensor>("X");
const Tensor* y = ctx.Input<Tensor>("Y");
const Tensor x_matrix =
x->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*x, x_num_col_dims)
: *x;
const Tensor y_matrix =
y->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*y, y_num_col_dims)
: *y;
const Tensor* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X")); Tensor* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y")); Tensor* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
auto* device_context = auto* device_context =
const_cast<platform::DeviceContext*>(ctx.device_context_); const_cast<platform::DeviceContext*>(ctx.device_context_);
if (dx) { if (dx) {
dx->mutable_data<T>(ctx.GetPlace()); dx->mutable_data<T>(ctx.GetPlace());
Tensor dx_matrix = dx->dims().size() > 2 ? framework::ReshapeToMatrix<T>(
*dx, x_num_col_dims)
: *dx;
// dx = dout * y'. dx: M x K, dout : M x N, y : K x N // dx = dout * y'. dx: M x K, dout : M x N, y : K x N
math::matmul<Place, T>(*dout, false, *y, true, 1, dx, 0, device_context); math::matmul<Place, T>(*dout, false, y_matrix, true, 1, &dx_matrix, 0,
device_context);
} }
if (dy) { if (dy) {
dy->mutable_data<T>(ctx.GetPlace()); dy->mutable_data<T>(ctx.GetPlace());
Tensor dy_matrix = dy->dims().size() > 2 ? framework::ReshapeToMatrix<T>(
*dy, y_num_col_dims)
: *dy;
// dy = x' * dout. dy K x N, dout : M x N, x : M x K // dy = x' * dout. dy K x N, dout : M x N, x : M x K
math::matmul<Place, T>(*x, true, *dout, false, 1, dy, 0, device_context); math::matmul<Place, T>(x_matrix, true, *dout, false, 1, &dy_matrix, 0,
device_context);
} }
} }
}; };
......
...@@ -25,14 +25,19 @@ class RowwiseAddOp : public framework::OperatorWithKernel { ...@@ -25,14 +25,19 @@ class RowwiseAddOp : public framework::OperatorWithKernel {
protected: protected:
void InferShape(const framework::InferShapeContext &ctx) const override { void InferShape(const framework::InferShapeContext &ctx) const override {
auto dim0 = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx.Input<Tensor>("X")->dims();
auto dim1 = ctx.Input<Tensor>("b")->dims(); auto b_dims = ctx.Input<Tensor>("b")->dims();
PADDLE_ENFORCE_GT(
PADDLE_ENFORCE(dim0.size() == 2, "Input 0 must be matrix"); x_dims.size(), b_dims.size(),
PADDLE_ENFORCE(dim1.size() == 1, "The second input must be vector"); "The rank of input `X` must be larger than the one of input `b`.");
PADDLE_ENFORCE(dim0[1] == dim1[0], "The width of two input must be same");
PADDLE_ENFORCE(ctx.OutputSize("Out") == 1, "The output size must be 1"); int num_col_dims = x_dims.size() - b_dims.size();
ctx.Output<Tensor>("Out")->Resize(ctx.Input<Tensor>("X")->dims());
PADDLE_ENFORCE_EQ(
framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
"The width of two operands must be the same");
PADDLE_ENFORCE_EQ(ctx.OutputSize("Out"), 1, "The output size must be 1");
ctx.Output<Tensor>("Out")->Resize(x_dims);
} }
}; };
...@@ -61,13 +66,20 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel { ...@@ -61,13 +66,20 @@ class RowwiseAddGradOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("b"), "b should not be null"); PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("b"), "b should not be null");
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")), PADDLE_ENFORCE_NOT_NULL(ctx.InputVar(framework::GradVarName("Out")),
"Input(Out@GRAD) should not be null"); "Input(Out@GRAD) should not be null");
auto dims0 = ctx.Input<Tensor>("X")->dims(); auto x_dims = ctx.Input<Tensor>("X")->dims();
auto dims1 = ctx.Input<Tensor>("b")->dims(); auto b_dims = ctx.Input<Tensor>("b")->dims();
PADDLE_ENFORCE_EQ(1, dims1.size(), "b dims should be 1") PADDLE_ENFORCE_GT(
x_dims.size(), b_dims.size(),
"The rank of input `X` must be larger than the one of input `b`.");
int num_col_dims = x_dims.size() - b_dims.size();
PADDLE_ENFORCE_EQ(
framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims,
"The width of two operands must be same");
auto *dx = ctx.Output<Tensor>(framework::GradVarName("X")); auto *dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto *db = ctx.Output<Tensor>(framework::GradVarName("b")); auto *db = ctx.Output<Tensor>(framework::GradVarName("b"));
if (dx) dx->Resize(dims0); if (dx) dx->Resize(x_dims);
if (db) db->Resize(dims1); if (db) db->Resize(b_dims);
} }
}; };
......
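Review note: the new `InferShape` checks above replace the fixed "matrix plus vector" rule with a rank-based one: `X` must have a larger rank than `b`, and the trailing dimensions of `X` must equal the dimensions of `b`. A minimal sketch of that rule follows, assuming a plain `std::vector<int64_t>` as a stand-in for `framework::DDim`; the names `Dims`, `NumColDims`, and `TrailingDimsMatch` are hypothetical and not part of Paddle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for framework::DDim: a plain list of extents.
using Dims = std::vector<int64_t>;

// Number of leading dimensions of x that are not covered by b.
// Mirrors `x_dims.size() - b_dims.size()` in the diff.
std::size_t NumColDims(const Dims& x, const Dims& b) {
  assert(x.size() > b.size() && "rank(X) must be larger than rank(b)");
  return x.size() - b.size();
}

// True when the trailing dimensions of x are exactly the dimensions of b,
// i.e. slice(x, num_col_dims, rank(x)) == b.
bool TrailingDimsMatch(const Dims& x, const Dims& b) {
  const std::size_t num_col_dims = NumColDims(x, b);
  for (std::size_t i = 0; i < b.size(); ++i) {
    if (x[num_col_dims + i] != b[i]) return false;
  }
  return true;
}
```

With `X` of shape `[2, 3, 4]`, a bias of shape `[3, 4]` passes the check with one leading dimension left over, while a bias of shape `[4, 4]` is rejected.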
...@@ -33,10 +33,12 @@ class RowwiseAddKernel : public framework::OpKernel { ...@@ -33,10 +33,12 @@ class RowwiseAddKernel : public framework::OpKernel {
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto out = context.Output<Tensor>("Out"); auto out = context.Output<Tensor>("Out");
out->mutable_data<T>(context.GetPlace()); out->mutable_data<T>(context.GetPlace());
int num_col_dims = context.Input<Tensor>("X")->dims().size() -
auto input = EigenMatrix<T>::From(*context.Input<Tensor>("X")); context.Input<Tensor>("b")->dims().size();
auto bias = EigenVector<T>::From(*context.Input<Tensor>("b")); auto input =
auto output = EigenMatrix<T>::From(*out); EigenMatrix<T>::Reshape(*context.Input<Tensor>("X"), num_col_dims);
auto bias = EigenVector<T>::Flatten(*context.Input<Tensor>("b"));
auto output = EigenMatrix<T>::Reshape(*out, num_col_dims);
const int bias_size = bias.dimension(0); const int bias_size = bias.dimension(0);
const int rest_size = input.size() / bias_size; const int rest_size = input.size() / bias_size;
...@@ -54,12 +56,15 @@ class RowwiseAddGradKernel : public framework::OpKernel { ...@@ -54,12 +56,15 @@ class RowwiseAddGradKernel : public framework::OpKernel {
auto* dout = context.Input<Tensor>(framework::GradVarName("Out")); auto* dout = context.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = context.Output<Tensor>(framework::GradVarName("X")); auto* dx = context.Output<Tensor>(framework::GradVarName("X"));
auto* db = context.Output<Tensor>(framework::GradVarName("b")); auto* db = context.Output<Tensor>(framework::GradVarName("b"));
int num_col_dims = context.Input<Tensor>("X")->dims().size() -
context.Input<Tensor>("b")->dims().size();
auto out_grad = EigenMatrix<T>::From(*dout); auto out_grad = EigenMatrix<T>::Reshape(*dout, num_col_dims);
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
if (dx) { if (dx) {
dx->mutable_data<T>(context.GetPlace()); dx->mutable_data<T>(context.GetPlace());
EigenMatrix<T>::From(*dx).device(place) = out_grad; EigenMatrix<T>::Reshape(*dx, num_col_dims).device(place) = out_grad;
} }
if (db) { if (db) {
......
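Review note: the kernels above treat `X` as a matrix with `rest_size` rows and `bias_size` columns and add the flattened `b` to every row. The sketch below shows that computation on flat `std::vector<float>` buffers, without Eigen. The function names are hypothetical. The `db` branch is elided in the diff, so `RowwiseAddGradB` is an assumption based on the math alone: for `out = x + b` broadcast over rows, the gradient of `b` is the column-wise sum of the upstream gradient.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward pass: out[r][c] = x[r][c] + b[c], with x stored row-major.
std::vector<float> RowwiseAdd(const std::vector<float>& x,
                              const std::vector<float>& b) {
  assert(!b.empty() && x.size() % b.size() == 0);
  const std::size_t bias_size = b.size();
  const std::size_t rest_size = x.size() / bias_size;
  std::vector<float> out(x.size());
  for (std::size_t r = 0; r < rest_size; ++r) {
    for (std::size_t c = 0; c < bias_size; ++c) {
      out[r * bias_size + c] = x[r * bias_size + c] + b[c];
    }
  }
  return out;
}

// Gradient w.r.t. b (assumed, see lead-in): sum of dout over all rows.
// The gradient w.r.t. x is simply a copy of dout, as in the diff.
std::vector<float> RowwiseAddGradB(const std::vector<float>& dout,
                                   std::size_t bias_size) {
  assert(bias_size > 0 && dout.size() % bias_size == 0);
  std::vector<float> db(bias_size, 0.0f);
  for (std::size_t i = 0; i < dout.size(); ++i) {
    db[i % bias_size] += dout[i];
  }
  return db;
}
```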
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/sum_op.h"
#include <vector>
namespace paddle {
namespace operators {
using framework::Tensor;
class SumOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
auto ins = ctx.MultiInput<framework::Tensor>("X");
auto *out = ctx.Output<framework::Tensor>("Out");
int N = ins.size();
auto in_dim = ins[0]->dims();
PADDLE_ENFORCE_GT(N, 1, "Input tensors count should > 1.");
for (int i = 1; i < N; i++) {
auto dim = ins[i]->dims();
PADDLE_ENFORCE(in_dim == dim, "Input tensors must have same shape");
}
out->Resize(in_dim);
}
};
class SumOpMaker : public framework::OpProtoAndCheckerMaker {
public:
SumOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "the input tensors of sum operator.").AsDuplicable();
AddOutput("Out", "the output tensor of sum operator.");
AddComment(R"DOC(
Sum the input tensors.
)DOC");
}
};
class SumGradOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
auto outputs = ctx.MultiOutput<Tensor>(framework::GradVarName("X"));
auto dims = ctx.Input<Tensor>(framework::GradVarName("Out"))->dims();
for (auto output : outputs) {
output->Resize(dims);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP(sum, ops::SumOp, ops::SumOpMaker, sum_grad, ops::SumGradOp);
REGISTER_OP_CPU_KERNEL(sum, ops::SumKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(sum_grad,
ops::SumGradKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#define EIGEN_USE_GPU
#include "paddle/operators/sum_op.h"
namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(sum, ops::SumKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(sum_grad,
ops::SumGradKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename Place, typename T>
class SumKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto ins = context.MultiInput<Tensor>("X");
auto* out = context.Output<Tensor>("Out");
out->mutable_data<T>(context.GetPlace());
auto place = context.GetEigenDevice<Place>();
auto result = EigenVector<T>::Flatten(*out);
int N = ins.size();
auto in = EigenVector<T>::Flatten(*(ins[0]));
result.device(place) = in;
for (int i = 1; i < N; i++) {
auto in = EigenVector<T>::Flatten(*(ins[i]));
result.device(place) = result + in;
}
}
};
template <typename Place, typename T>
class SumGradKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* input = context.Input<Tensor>(framework::GradVarName("Out"));
auto outs = context.MultiOutput<Tensor>(framework::GradVarName("X"));
for (auto out : outs) {
out->mutable_data<T>(context.GetPlace());
}
auto place = context.GetEigenDevice<Place>();
auto in = EigenVector<T>::Flatten(*input);
for (auto out : outs) {
auto result = EigenVector<T>::Flatten(*out);
result.device(place) = in;
}
}
};
} // namespace operators
} // namespace paddle
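Review note: `SumKernel` adds all inputs element-wise, and `SumGradKernel` copies the upstream gradient to every input gradient, since each input contributes to the output with coefficient 1. A minimal sketch on flat `std::vector<float>` buffers, assuming same-sized inputs as `InferShape` enforces; `SumForward` and `SumBackward` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum of N > 1 equally sized inputs.
std::vector<float> SumForward(const std::vector<std::vector<float>>& ins) {
  assert(ins.size() > 1);
  std::vector<float> out = ins[0];
  for (std::size_t i = 1; i < ins.size(); ++i) {
    assert(ins[i].size() == out.size());
    for (std::size_t j = 0; j < out.size(); ++j) {
      out[j] += ins[i][j];
    }
  }
  return out;
}

// Every input receives an unchanged copy of the output gradient.
std::vector<std::vector<float>> SumBackward(const std::vector<float>& dout,
                                            std::size_t num_inputs) {
  return std::vector<std::vector<float>>(num_inputs, dout);
}
```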
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/top_k_op.h"
namespace paddle {
namespace operators {
class TopkOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
protected:
void InferShape(const framework::InferShapeContext &ctx) const override {
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"),
"Input of TopkOP must be initialized.");
auto *input = ctx.Input<framework::Tensor>("X");
const int k = static_cast<int>(ctx.Attr<int>("k"));
PADDLE_ENFORCE_GE(k, 1, "k must >= 1");
PADDLE_ENFORCE_GE(input->dims().size(), 1, "input must have >= 1d shape");
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k,
"input must have >= k columns");
framework::DDim dims = input->dims();
dims[dims.size() - 1] = k;
ctx.Output<Tensor>("Out")->Resize(dims);
ctx.Output<Tensor>("Indices")->Resize(dims);
}
};
class TopkOpMaker : public framework::OpProtoAndCheckerMaker {
public:
TopkOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of Topk op");
AddOutput("Out", "The output tensor of Topk op");
AddOutput("Indices", "The indices of Topk elements of input");
AddComment(
R"DOC(If the input is a vector (1d tensor), finds the k largest entries in the vector and outputs their values and indices as vectors. Thus values[j] is the j-th largest entry in input, and its index is indices[j].
For matrices, computes the top k entries in each row. )DOC");
AddAttr<int>("k",
"Number of top elements to look for along the last "
"dimension (along each row for matrices).")
.SetDefault(1);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(top_k, ops::TopkOp, ops::TopkOpMaker);
REGISTER_OP_CPU_KERNEL(top_k,
ops::TopkKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/op_registry.h"
#include "paddle/platform/assert.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
struct Pair {
__device__ __forceinline__ Pair() {}
__device__ __forceinline__ Pair(T value, int id) : v(value), id(id) {}
__device__ __forceinline__ void set(T value, int id) {
v = value;
this->id = id;
}
__device__ __forceinline__ void operator=(const Pair<T>& in) {
v = in.v;
id = in.id;
}
__device__ __forceinline__ bool operator<(const T value) const {
return (v < value);
}
__device__ __forceinline__ bool operator<(const Pair<T>& in) const {
return (v < in.v) || ((v == in.v) && (id > in.id));
}
__device__ __forceinline__ bool operator>(const Pair<T>& in) const {
return (v > in.v) || ((v == in.v) && (id < in.id));
}
T v;
int id;
};
template <typename T>
__device__ __forceinline__ void AddTo(Pair<T> topk[], const Pair<T>& p,
int beam_size) {
for (int k = beam_size - 2; k >= 0; k--) {
if (topk[k] < p) {
topk[k + 1] = topk[k];
} else {
topk[k + 1] = p;
return;
}
}
topk[0] = p;
}
template <typename T, int beam_size>
__device__ __forceinline__ void AddTo(Pair<T> topk[], const Pair<T>& p) {
for (int k = beam_size - 2; k >= 0; k--) {
if (topk[k] < p) {
topk[k + 1] = topk[k];
} else {
topk[k + 1] = p;
return;
}
}
topk[0] = p;
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* src, int idx,
int dim, int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < src[idx]) {
Pair<T> tmp(src[idx], idx);
AddTo<T>(topk, tmp, beam_size);
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* src, int idx,
int dim, const Pair<T>& max,
int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < src[idx]) {
Pair<T> tmp(src[idx], idx);
if (tmp < max) {
AddTo<T>(topk, tmp, beam_size);
}
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* val, int* col,
int idx, int dim, int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < val[idx]) {
Pair<T> tmp(val[idx], col[idx]);
AddTo<T>(topk, tmp, beam_size);
}
idx += BlockSize;
}
}
template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* val, int* col,
int idx, int dim, const Pair<T>& max,
int beam_size) {
while (idx < dim) {
if (topk[beam_size - 1] < val[idx]) {
Pair<T> tmp(val[idx], col[idx]);
if (tmp < max) {
AddTo<T>(topk, tmp, beam_size);
}
}
idx += BlockSize;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam,
int beam_size, const T* src,
bool& firstStep, bool& is_empty,
Pair<T>& max, int dim,
const int tid) {
if (beam > 0) {
int length = beam < beam_size ? beam : beam_size;
if (firstStep) {
firstStep = false;
GetTopK<T, BlockSize>(topk, src, tid, dim, length);
} else {
for (int k = 0; k < MaxLength; k++) {
if (k < MaxLength - beam) {
topk[k] = topk[k + beam];
} else {
topk[k].set(-INFINITY, -1);
}
}
if (!is_empty) {
GetTopK<T, BlockSize>(topk + MaxLength - beam, src, tid, dim, max,
length);
}
}
max = topk[MaxLength - 1];
if (max.v == -1) is_empty = true;
beam = 0;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void ThreadGetTopK(Pair<T> topk[], int& beam,
int beam_size, const T* val,
int* col, bool& firstStep,
bool& is_empty, Pair<T>& max,
int dim, const int tid) {
if (beam > 0) {
int length = beam < beam_size ? beam : beam_size;
if (firstStep) {
firstStep = false;
GetTopK<T, BlockSize>(topk, val, col, tid, dim, length);
} else {
for (int k = 0; k < MaxLength; k++) {
if (k < MaxLength - beam) {
topk[k] = topk[k + beam];
} else {
topk[k].set(-INFINITY, -1);
}
}
if (!is_empty) {
GetTopK<T, BlockSize>(topk + MaxLength - beam, val, col, tid, dim, max,
length);
}
}
max = topk[MaxLength - 1];
if (max.v == -1) is_empty = true;
beam = 0;
}
}
template <typename T, int MaxLength, int BlockSize>
__device__ __forceinline__ void BlockReduce(Pair<T>* sh_topk, int* maxid,
Pair<T> topk[], T** topVal,
int** topIds, int& beam, int& k,
const int tid, const int warp) {
while (true) {
__syncthreads();
if (tid < BlockSize / 2) {
if (sh_topk[tid] < sh_topk[tid + BlockSize / 2]) {
maxid[tid] = tid + BlockSize / 2;
} else {
maxid[tid] = tid;
}
}
__syncthreads();
for (int stride = BlockSize / 4; stride > 0; stride = stride / 2) {
if (tid < stride) {
if (sh_topk[maxid[tid]] < sh_topk[maxid[tid + stride]]) {
maxid[tid] = maxid[tid + stride];
}
}
__syncthreads();
}
__syncthreads();
if (tid == 0) {
**topVal = sh_topk[maxid[0]].v;
**topIds = sh_topk[maxid[0]].id;
(*topVal)++;
(*topIds)++;
}
if (tid == maxid[0]) beam++;
if (--k == 0) break;
__syncthreads();
if (tid == maxid[0]) {
if (beam < MaxLength) {
sh_topk[tid] = topk[beam];
}
}
if (maxid[0] / 32 == warp) {
if (__shfl(beam, (maxid[0]) % 32, 32) == MaxLength) break;
}
}
}
/**
* Each block computes one sample.
* In a block:
* 1. every thread gets its top MaxLength values;
* 2. merge to sh_topk, block-reduce, and get the max value;
* 3. go to the second step, until one thread's topk value is null;
* 4. go to the first step, until the topk values are obtained.
*/
template <typename T, int MaxLength, int BlockSize>
__global__ void KeMatrixTopK(T* output, int output_stride, int* indices,
const T* src, int lds, int dim, int k) {
__shared__ Pair<T> sh_topk[BlockSize];
__shared__ int maxid[BlockSize / 2];
const int tid = threadIdx.x;
const int warp = threadIdx.x / 32;
output += blockIdx.x * output_stride;
indices += blockIdx.x * k;
Pair<T> topk[MaxLength];
int beam = MaxLength;
Pair<T> max;
bool is_empty = false;
bool firststep = true;
for (int k = 0; k < MaxLength; k++) {
topk[k].set(-INFINITY, -1);
}
while (k) {
ThreadGetTopK<T, MaxLength, BlockSize>(topk, beam, k,
src + blockIdx.x * lds, firststep,
is_empty, max, dim, tid);
sh_topk[tid] = topk[0];
BlockReduce<T, MaxLength, BlockSize>(sh_topk, maxid, topk, &output,
&indices, beam, k, tid, warp);
}
}
template <typename T>
class TopkOpCUDAKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace.");
auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
size_t k = static_cast<int>(ctx.Attr<int>("k"));
const T* input_data = input->data<T>();
T* output_data = output->mutable_data<T>(ctx.GetPlace());
// FIXME(typhoonzero): data is always converted to type T?
int* indices_data = indices->mutable_data<int>(ctx.GetPlace());
size_t input_height = input->dims()[0];
size_t input_width = input->dims()[1];
if (k > input_width) k = input_width;
// NOTE: pass lds and dim same to input width.
// NOTE: old matrix implementation of stride is different to eigen.
// TODO(typhoonzero): launch kernel on specified stream.
// TODO(typhoonzero): refine this kernel.
dim3 threads(256, 1);
dim3 grid(input_height, 1);
KeMatrixTopK<T, 5, 256><<<grid, threads>>>(
output_data, output->dims()[1], indices_data, input_data, input_width,
input_width, int(k));
}
};
} // namespace operators
} // namespace paddle
REGISTER_OP_GPU_KERNEL(top_k, paddle::operators::TopkOpCUDAKernel<float>);
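Review note: each CUDA thread keeps a small buffer sorted in descending order and inserts candidates with `AddTo`, which shifts smaller entries down by one slot. `Pair::operator<` breaks ties by treating the larger index as the smaller element. The host-side sketch below illustrates that insertion logic only; it is not the device code. `Less`, `Offer`, and the `std::vector` buffer are hypothetical, and the guard in `Offer` mirrors the `topk[beam_size - 1] < src[idx]` check that `GetTopK` performs before calling `AddTo`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Pair {
  float v;
  int id;
};

// Same ordering as Pair::operator< in the diff: on equal values,
// the entry with the larger id counts as smaller.
bool Less(const Pair& a, const Pair& b) {
  return (a.v < b.v) || (a.v == b.v && a.id > b.id);
}

// Insert p into a descending buffer, dropping the last entry.
// Callers must first check that p beats the last entry.
void AddTo(std::vector<Pair>& topk, const Pair& p) {
  assert(!topk.empty());
  for (int k = static_cast<int>(topk.size()) - 2; k >= 0; --k) {
    if (Less(topk[k], p)) {
      topk[k + 1] = topk[k];  // shift the smaller entry down
    } else {
      topk[k + 1] = p;
      return;
    }
  }
  topk[0] = p;
}

// Guarded insertion: only values above the current minimum enter.
void Offer(std::vector<Pair>& topk, float value, int id) {
  if (topk.back().v < value) {
    AddTo(topk, Pair{value, id});
  }
}

// Buffer of the given length, pre-filled like the kernel's set(-INFINITY, -1).
std::vector<Pair> MakeBuffer(std::size_t length) {
  const Pair empty = {-std::numeric_limits<float>::infinity(), -1};
  return std::vector<Pair>(length, empty);
}
```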
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <iostream>
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename Place, typename T>
class TopkKernel : public framework::OpKernel {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
// Get the top k elements of each row of input tensor
// FIXME: only deal with matrix(2d tensor).
auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
// k is determined by Attr
const size_t k = static_cast<int>(ctx.Attr<int>("k"));
T* output_data = output->mutable_data<T>(ctx.GetPlace());
T* indices_data = indices->mutable_data<T>(ctx.GetPlace());
auto eg_input = EigenMatrix<T>::From(*input);
// reshape input to a flattened matrix (like flat_inner_dims)
framework::DDim inputdims = input->dims();
const size_t row = framework::product(
framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];
Eigen::DSizes<int, 2> flat2dims(row, col);
// NOTE: eigen shape doesn't affect paddle tensor.
eg_input.reshape(flat2dims);
for (size_t i = 0; i < row; i++) {
std::vector<std::pair<T, size_t>> vec;
for (size_t j = 0; j < col; j++) {
vec.push_back(std::pair<T, size_t>(eg_input(i, j), j));
}
std::partial_sort(
vec.begin(), vec.begin() + k, vec.end(),
[](const std::pair<T, size_t>& l, const std::pair<T, size_t>& r) {
return l.first > r.first;
});
for (size_t j = 0; j < k; j++) {
output_data[i * k + j] = vec[j].first;
indices_data[i * k + j] = vec[j].second;
}
}
}
};
} // namespace operators
} // namespace paddle
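Review note: the CPU `TopkKernel` views the input as `row x col`, pairs each element of a row with its column index, and uses `std::partial_sort` to bring the `k` largest pairs to the front. A self-contained sketch of the same approach on a flat row-major buffer; `TopKResult` and `TopKPerRow` are hypothetical names, and indices are stored as `std::size_t` here rather than in a tensor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct TopKResult {
  std::vector<float> values;         // rows * k entries, row-major
  std::vector<std::size_t> indices;  // column index of each value
};

TopKResult TopKPerRow(const std::vector<float>& data, std::size_t rows,
                      std::size_t cols, std::size_t k) {
  assert(data.size() == rows * cols);
  assert(k >= 1 && k <= cols);
  typedef std::pair<float, std::size_t> Entry;
  TopKResult result;
  result.values.reserve(rows * k);
  result.indices.reserve(rows * k);
  std::vector<Entry> row(cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      row[j] = Entry(data[i * cols + j], j);
    }
    // Only the first k positions need to be ordered.
    std::partial_sort(row.begin(),
                      row.begin() + static_cast<std::ptrdiff_t>(k), row.end(),
                      [](const Entry& l, const Entry& r) {
                        return l.first > r.first;
                      });
    for (std::size_t j = 0; j < k; ++j) {
      result.values.push_back(row[j].first);
      result.indices.push_back(row[j].second);
    }
  }
  return result;
}
```

Note that `std::partial_sort` is not stable, so the relative order of equal values is unspecified in this sketch.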
...@@ -25,10 +25,6 @@ limitations under the License. */ ...@@ -25,10 +25,6 @@ limitations under the License. */
#include "paddle/string/printf.h" #include "paddle/string/printf.h"
#include "paddle/string/to_string.h" #include "paddle/string/to_string.h"
#ifdef __GNUC__
#include <cxxabi.h> // for __cxa_demangle
#endif
#ifndef PADDLE_ONLY_CPU #ifndef PADDLE_ONLY_CPU
#include "paddle/platform/dynload/cublas.h" #include "paddle/platform/dynload/cublas.h"
...@@ -46,19 +42,6 @@ limitations under the License. */ ...@@ -46,19 +42,6 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace platform { namespace platform {
namespace {
#ifdef __GNUC__
inline std::string demangle(std::string name) {
int status = -4; // some arbitrary value to eliminate the compiler warning
std::unique_ptr<char, void (*)(void*)> res{
abi::__cxa_demangle(name.c_str(), NULL, NULL, &status), std::free};
return (status == 0) ? res.get() : name;
}
#else
inline std::string demangle(std::string name) { return name; }
#endif
}
struct EnforceNotMet : public std::exception { struct EnforceNotMet : public std::exception {
std::exception_ptr exp_; std::exception_ptr exp_;
std::string err_str_; std::string err_str_;
...@@ -79,7 +62,7 @@ struct EnforceNotMet : public std::exception { ...@@ -79,7 +62,7 @@ struct EnforceNotMet : public std::exception {
Dl_info info; Dl_info info;
for (int i = 0; i < size; ++i) { for (int i = 0; i < size; ++i) {
if (dladdr(call_stack[i], &info)) { if (dladdr(call_stack[i], &info)) {
auto demangled = demangle(info.dli_sname); auto demangled = info.dli_sname;
auto addr_offset = static_cast<char*>(call_stack[i]) - auto addr_offset = static_cast<char*>(call_stack[i]) -
static_cast<char*>(info.dli_saddr); static_cast<char*>(info.dli_saddr);
sout << string::Sprintf("%-3d %*0p %s + %zd\n", i, sout << string::Sprintf("%-3d %*0p %s + %zd\n", i,
......
...@@ -17,6 +17,7 @@ limitations under the License. */ ...@@ -17,6 +17,7 @@ limitations under the License. */
#include <vector> #include <vector>
#include "paddle/framework/backward.h" #include "paddle/framework/backward.h"
#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/net_op.h" #include "paddle/operators/net_op.h"
#include "paddle/operators/recurrent_op.h" #include "paddle/operators/recurrent_op.h"
...@@ -49,12 +50,17 @@ USE_OP(minus); ...@@ -49,12 +50,17 @@ USE_OP(minus);
USE_OP(cos_sim); USE_OP(cos_sim);
USE_CPU_ONLY_OP(gather); USE_CPU_ONLY_OP(gather);
USE_CPU_ONLY_OP(scatter); USE_CPU_ONLY_OP(scatter);
USE_CPU_ONLY_OP(concat);
USE_OP(top_k);
USE_OP(squared_l2_distance); USE_OP(squared_l2_distance);
USE_OP(sum);
namespace paddle { namespace paddle {
namespace framework { namespace framework {
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
using LoD = framework::LoD;
static size_t UniqueIntegerGenerator() { static size_t UniqueIntegerGenerator() {
static std::atomic<size_t> generator; static std::atomic<size_t> generator;
...@@ -114,6 +120,60 @@ PYBIND11_PLUGIN(core) { ...@@ -114,6 +120,60 @@ PYBIND11_PLUGIN(core) {
return self.data<float>()[offset]; return self.data<float>()[offset];
}); });
py::class_<LoDTensor>(m, "LoDTensor", R"DOC(LoD (Level of Details) Tensor.
The tensor and LoD info should be created before creating the LoDTensor, then
call the set_tensor and set_lod functions to set them.
)DOC")
.def("__init__",
[](LoDTensor &instance,
const std::vector<std::vector<size_t>> &lod,
Tensor *t) {
#ifdef PADDLE_ONLY_CPU
new (&instance) LoDTensor(lod, t);
#else
paddle::framework::LoD new_lod;
new_lod.reserve(lod.size());
std::copy(lod.begin(), lod.end(), std::back_inserter(new_lod));
new (&instance) LoDTensor(new_lod, t);
#endif
})
.def("set_tensor",
[](LoDTensor &self, Tensor *tensor) { self.set_tensor(tensor); })
.def("set_lod",
[](LoDTensor &self, const std::vector<std::vector<size_t>> &lod) {
#ifdef PADDLE_ONLY_CPU
self.set_lod(lod);
#else
paddle::framework::LoD new_lod;
new_lod.reserve(lod.size());
std::copy(lod.begin(), lod.end(), std::back_inserter(new_lod));
self.set_lod(new_lod);
#endif
})
.def("tensor",
[](LoDTensor &self) -> Tensor & { return self.tensor(); },
py::return_value_policy::reference)
.def("lod", [](LoDTensor &self) -> std::vector<std::vector<size_t>> {
#ifdef PADDLE_ONLY_CPU
return self.lod();
#else
auto lod = self.lod();
std::vector<std::vector<size_t>> new_lod;
new_lod.reserve(lod.size());
std::transform(lod.begin(), lod.end(), std::back_inserter(new_lod),
[](paddle::framework::Vector<size_t> item) ->
std::vector<size_t> {
std::vector<size_t> v;
v.reserve(item.size());
std::copy(item.begin(), item.end(), std::back_inserter(v));
return v;
});
return new_lod;
#endif
});
py::class_<Variable>(m, "Variable", R"DOC(Variable Class. py::class_<Variable>(m, "Variable", R"DOC(Variable Class.
All parameter, weight, gradient are variables in Paddle. All parameter, weight, gradient are variables in Paddle.
...@@ -125,6 +185,11 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -125,6 +185,11 @@ All parameter, weight, gradient are variables in Paddle.
.def("get_tensor", .def("get_tensor",
[](Variable &self) -> Tensor * { return self.GetMutable<Tensor>(); }, [](Variable &self) -> Tensor * { return self.GetMutable<Tensor>(); },
py::return_value_policy::reference) py::return_value_policy::reference)
.def("get_lod_tensor",
[](Variable &self) -> LoDTensor * {
return self.GetMutable<LoDTensor>();
},
py::return_value_policy::reference)
.def("get_net", .def("get_net",
[](Variable &self) -> operators::NetOp * { [](Variable &self) -> operators::NetOp * {
return self.GetMutable<operators::NetOp>(); return self.GetMutable<operators::NetOp>();
...@@ -215,7 +280,10 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -215,7 +280,10 @@ All parameter, weight, gradient are variables in Paddle.
-> std::map<std::string, std::vector<std::string>> { -> std::map<std::string, std::vector<std::string>> {
return op.Outputs(); return op.Outputs();
}) })
.def("output_vars",
[](const OperatorBase &op) { return op.OutputVars(true); })
.def("inputs", [](const OperatorBase &op) { return op.Inputs(); }) .def("inputs", [](const OperatorBase &op) { return op.Inputs(); })
.def("input_vars", [](const OperatorBase &op) { return op.InputVars(); })
.def("__str__", &OperatorBase::DebugString) .def("__str__", &OperatorBase::DebugString)
.def("no_intermediate_outputs", .def("no_intermediate_outputs",
[](const OperatorBase &op) { return op.OutputVars(false); }) [](const OperatorBase &op) { return op.OutputVars(false); })
......
...@@ -30,6 +30,8 @@ Configuring cmake in /paddle/build ... ...@@ -30,6 +30,8 @@ Configuring cmake in /paddle/build ...
-DCMAKE_BUILD_TYPE=Release -DCMAKE_BUILD_TYPE=Release
-DWITH_DOC=OFF -DWITH_DOC=OFF
-DWITH_GPU=${WITH_GPU:-OFF} -DWITH_GPU=${WITH_GPU:-OFF}
-DWITH_MKLDNN=${WITH_MKLDNN:-ON}
-DWITH_MKLML=${WITH_MKLML:-ON}
-DWITH_AVX=${WITH_AVX:-OFF} -DWITH_AVX=${WITH_AVX:-OFF}
-DWITH_GOLANG=${WITH_GOLANG:-ON} -DWITH_GOLANG=${WITH_GOLANG:-ON}
-DWITH_SWIG_PY=ON -DWITH_SWIG_PY=ON
...@@ -37,7 +39,7 @@ Configuring cmake in /paddle/build ... ...@@ -37,7 +39,7 @@ Configuring cmake in /paddle/build ...
-DWITH_PYTHON=${WITH_PYTHON:-ON} -DWITH_PYTHON=${WITH_PYTHON:-ON}
-DWITH_SWIG_PY=${WITH_SWIG_PY:-ON} -DWITH_SWIG_PY=${WITH_SWIG_PY:-ON}
-DCUDNN_ROOT=/usr/ -DCUDNN_ROOT=/usr/
-DWITH_STYLE_CHECK=${WITH_STYLE_CHECK:-OFF} -DWITH_STYLE_CHECK=${WITH_STYLE_CHECK:-ON}
-DWITH_TESTING=${WITH_TESTING:-ON} -DWITH_TESTING=${WITH_TESTING:-ON}
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
======================================== ========================================
...@@ -50,6 +52,8 @@ cmake .. \ ...@@ -50,6 +52,8 @@ cmake .. \
-DCMAKE_BUILD_TYPE=Release \ -DCMAKE_BUILD_TYPE=Release \
-DWITH_DOC=OFF \ -DWITH_DOC=OFF \
-DWITH_GPU=${WITH_GPU:-OFF} \ -DWITH_GPU=${WITH_GPU:-OFF} \
-DWITH_MKLDNN=${WITH_MKLDNN:-ON} \
-DWITH_MKLML=${WITH_MKLML:-ON} \
-DWITH_AVX=${WITH_AVX:-OFF} \ -DWITH_AVX=${WITH_AVX:-OFF} \
-DWITH_GOLANG=${WITH_GOLANG:-ON} \ -DWITH_GOLANG=${WITH_GOLANG:-ON} \
-DWITH_SWIG_PY=${WITH_SWIG_PY:-ON} \ -DWITH_SWIG_PY=${WITH_SWIG_PY:-ON} \
......
...@@ -320,6 +320,9 @@ void loadFileList(const std::string& fileListFileName, ...@@ -320,6 +320,9 @@ void loadFileList(const std::string& fileListFileName,
} }
double getMemoryUsage() { double getMemoryUsage() {
#if defined(__ANDROID__)
return 0.0;
#else
FILE* fp = fopen("/proc/meminfo", "r"); FILE* fp = fopen("/proc/meminfo", "r");
CHECK(fp) << "failed to fopen /proc/meminfo"; CHECK(fp) << "failed to fopen /proc/meminfo";
size_t bufsize = 256 * sizeof(char); size_t bufsize = 256 * sizeof(char);
...@@ -357,6 +360,7 @@ double getMemoryUsage() { ...@@ -357,6 +360,7 @@ double getMemoryUsage() {
delete[] buf; delete[] buf;
double usedMem = 1.0 - 1.0 * (freeMem + bufMem + cacheMem) / totalMem; double usedMem = 1.0 - 1.0 * (freeMem + bufMem + cacheMem) / totalMem;
return usedMem; return usedMem;
#endif
} }
SyncThreadPool* getGlobalSyncThreadPool() { SyncThreadPool* getGlobalSyncThreadPool() {
......
...@@ -33,6 +33,13 @@ limitations under the License. */ ...@@ -33,6 +33,13 @@ limitations under the License. */
#include "Flags.h" #include "Flags.h"
#include "hl_gpu.h" #include "hl_gpu.h"
#if defined(__ANDROID__) && (__ANDROID_API__ < 21)
inline int rand_r(unsigned int* seedp) {
(void)seedp;
return rand();
}
#endif
/** /**
* Loop over the elements in a container * Loop over the elements in a container
* TODO(yuyang18): Is this foreach useful? Why not use C++ 11 foreach, * TODO(yuyang18): Is this foreach useful? Why not use C++ 11 foreach,
......
...@@ -271,6 +271,7 @@ message ImageConfig { ...@@ -271,6 +271,7 @@ message ImageConfig {
// The size of input feature map. // The size of input feature map.
required uint32 img_size = 8; required uint32 img_size = 8;
optional uint32 img_size_y = 9; optional uint32 img_size_y = 9;
optional uint32 img_size_z = 10 [ default = 1 ];
} }
message PriorBoxConfig { message PriorBoxConfig {
...@@ -519,6 +520,7 @@ message LayerConfig { ...@@ -519,6 +520,7 @@ message LayerConfig {
// for HuberRegressionLoss // for HuberRegressionLoss
optional double delta = 57 [ default = 1.0 ]; optional double delta = 57 [ default = 1.0 ];
// for 3D data
optional uint64 depth = 58 [ default = 1 ]; optional uint64 depth = 58 [ default = 1 ];
// for switch order layer // for switch order layer
......
...@@ -1332,6 +1332,12 @@ def parse_image(image, input_layer_name, image_conf): ...@@ -1332,6 +1332,12 @@ def parse_image(image, input_layer_name, image_conf):
get_img_size(input_layer_name, image_conf.channels) get_img_size(input_layer_name, image_conf.channels)
def parse_image3d(image, input_layer_name, image_conf):
image_conf.channels = image.channels
image_conf.img_size, image_conf.img_size_y, image_conf.img_size_z = \
get_img3d_size(input_layer_name, image_conf.channels)
def parse_norm(norm, input_layer_name, norm_conf): def parse_norm(norm, input_layer_name, norm_conf):
norm_conf.norm_type = norm.norm_type norm_conf.norm_type = norm.norm_type
config_assert( config_assert(
...@@ -2365,9 +2371,11 @@ class BatchNormLayer(LayerBase): ...@@ -2365,9 +2371,11 @@ class BatchNormLayer(LayerBase):
name, name,
inputs, inputs,
bias=True, bias=True,
img3D=False,
use_global_stats=True, use_global_stats=True,
moving_average_fraction=0.9, moving_average_fraction=0.9,
batch_norm_type=None, batch_norm_type=None,
mean_var_names=None,
**xargs): **xargs):
if inputs is None: if inputs is None:
inputs = [] inputs = []
...@@ -2409,24 +2417,69 @@ class BatchNormLayer(LayerBase): ...@@ -2409,24 +2417,69 @@ class BatchNormLayer(LayerBase):
input_layer = self.get_input_layer(0) input_layer = self.get_input_layer(0)
image_conf = self.config.inputs[0].image_conf image_conf = self.config.inputs[0].image_conf
parse_image(self.inputs[0].image, input_layer.name, image_conf) if img3D:
parse_image3d(self.inputs[0].image, input_layer.name, image_conf)
# Only pass the width and height of input to batch_norm layer # Only pass the width and height of input to batch_norm layer
# when either of it is non-zero. # when either of it is non-zero.
if input_layer.width != 0 or input_layer.height != 0: if input_layer.width != 0 or input_layer.height != 0:
self.set_cnn_layer(name, image_conf.img_size_y, image_conf.img_size, self.set_cnn_layer(
image_conf.channels, False) input_layer_name=name,
depth=image_conf.img_size_z,
height=image_conf.img_size_y,
width=image_conf.img_size,
channels=image_conf.channels,
is_print=True)
else:
self.set_layer_size(input_layer.size)
else: else:
self.set_layer_size(input_layer.size) parse_image(self.inputs[0].image, input_layer.name, image_conf)
# Only pass the width and height of input to batch_norm layer
# when either of it is non-zero.
if input_layer.width != 0 or input_layer.height != 0:
self.set_cnn_layer(
input_layer_name=name,
height=image_conf.img_size_y,
width=image_conf.img_size,
channels=image_conf.channels,
is_print=True)
else:
self.set_layer_size(input_layer.size)
psize = self.calc_parameter_size(image_conf) psize = self.calc_parameter_size(image_conf)
dims = [1, psize] dims = [1, psize]
if mean_var_names is not None:
assert len(mean_var_names) == 2
self.inputs[1].parameter_name = mean_var_names[0]
self.inputs[2].parameter_name = mean_var_names[1]
self.create_input_parameter(0, psize) self.create_input_parameter(0, psize)
self.create_input_parameter(1, psize, dims) self.create_input_parameter(1, psize, dims)
self.create_input_parameter(2, psize, dims) self.create_input_parameter(2, psize, dims)
self.create_bias_parameter(bias, psize) self.create_bias_parameter(bias, psize)
def set_cnn_layer(self,
input_layer_name,
depth=None,
height=None,
width=None,
channels=None,
is_print=True):
depthIsNone = False
if depth is None:
depth = 1
depthIsNone = True
size = depth * height * width * channels
self.set_layer_size(size)
self.set_layer_height_width(height, width)
self.set_layer_depth(depth)
if is_print and depthIsNone:
print("output for %s: c = %d, h = %d, w = %d, size = %d" %
(input_layer_name, channels, height, width, size))
elif is_print:
print("output for %s: c = %d, d = %d, h = %d, w = %d, size = %d" %
(input_layer_name, channels, depth, height, width, size))
def calc_parameter_size(self, image_conf): def calc_parameter_size(self, image_conf):
return image_conf.channels return image_conf.channels
...@@ -2688,9 +2741,20 @@ class AddToLayer(LayerBase): ...@@ -2688,9 +2741,20 @@ class AddToLayer(LayerBase):
super(AddToLayer, self).__init__( super(AddToLayer, self).__init__(
name, 'addto', 0, inputs=inputs, **xargs) name, 'addto', 0, inputs=inputs, **xargs)
config_assert(len(inputs) > 0, 'inputs cannot be empty for AddToLayer') config_assert(len(inputs) > 0, 'inputs cannot be empty for AddToLayer')
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index) if len(self.inputs) > 1:
self.set_layer_size(input_layer.size) for input_index in xrange(len(self.inputs)):
assert self.get_input_layer(0).height == self.get_input_layer(
input_index).height
assert self.get_input_layer(0).width == self.get_input_layer(
input_index).width
assert self.get_input_layer(0).depth == self.get_input_layer(
input_index).depth
self.set_layer_size(self.get_input_layer(0).size)
self.set_layer_height_width(self.get_input_layer(0).height, \
self.get_input_layer(0).width)
self.set_layer_depth(self.get_input_layer(0).depth)
self.create_bias_parameter(bias, self.config.size) self.create_bias_parameter(bias, self.config.size)
...@@ -3370,11 +3434,20 @@ class ConcatenateLayer(LayerBase): ...@@ -3370,11 +3434,20 @@ class ConcatenateLayer(LayerBase):
name, 'concat', 0, inputs=inputs, **xargs) name, 'concat', 0, inputs=inputs, **xargs)
size = 0 size = 0
for input_index in xrange(len(self.inputs)): for input_index in xrange(len(self.inputs)):
assert self.get_input_layer(0).height == self.get_input_layer(
input_index).height
assert self.get_input_layer(0).width == self.get_input_layer(
input_index).width
assert self.get_input_layer(0).depth == self.get_input_layer(
input_index).depth
input_layer = self.get_input_layer(input_index) input_layer = self.get_input_layer(input_index)
input = self.inputs[input_index] input = self.inputs[input_index]
if self.config.size == 0: if self.config.size == 0:
size += input_layer.size size += input_layer.size
self.set_layer_height_width(self.get_input_layer(0).height, \
self.get_input_layer(0).width)
self.set_layer_depth(self.get_input_layer(0).depth)
self.set_layer_size(size) self.set_layer_size(size)
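The AddToLayer and ConcatenateLayer hunks above both assert that every input shares the height, width, and depth of the first input before copying those dimensions to the output. A minimal standalone sketch of that check for the concat case follows. The `Shape` tuple and `concat_shape` function are hypothetical names used only for illustration, and the sketch ignores the `self.config.size == 0` condition that guards the size accumulation in the real code.

```python
from collections import namedtuple

# Hypothetical stand-in for the layer attributes the hunks read.
Shape = namedtuple('Shape', ['size', 'height', 'width', 'depth'])


def concat_shape(inputs):
    """Check that all inputs agree on (height, width, depth), then sum sizes."""
    first = inputs[0]
    for layer in inputs:
        same = (layer.height, layer.width, layer.depth) == \
            (first.height, first.width, first.depth)
        if not same:
            raise ValueError("inputs disagree on height/width/depth")
    total = sum(layer.size for layer in inputs)
    return Shape(total, first.height, first.width, first.depth)


# Two inputs with 1 and 2 channels over the same 6 x 20 x 1 volume.
out = concat_shape([Shape(120, 6, 20, 1), Shape(240, 6, 20, 1)])
print(out.size, out.height, out.width, out.depth)  # 360 6 20 1
```

For an addto-style layer the same check applies, but the output size would be taken from the first input instead of being summed.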
...@@ -3675,8 +3748,8 @@ class SwitchOrderLayer(LayerBase): ...@@ -3675,8 +3748,8 @@ class SwitchOrderLayer(LayerBase):
def __init__(self, name, inputs, reshape, **xargs): def __init__(self, name, inputs, reshape, **xargs):
super(SwitchOrderLayer, self).__init__( super(SwitchOrderLayer, self).__init__(
name, 'switch_order', 0, inputs=inputs, **xargs) name, 'switch_order', 0, inputs=inputs, **xargs)
self.config.reshape_conf.heightAxis.extend(reshape['height']) self.config.reshape_conf.height_axis.extend(reshape['height'])
self.config.reshape_conf.widthAxis.extend(reshape['width']) self.config.reshape_conf.width_axis.extend(reshape['width'])
# Deprecated, use a new layer specific class instead # Deprecated, use a new layer specific class instead
......
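The new `set_cnn_layer` in the hunk above computes the layer size as `depth * height * width * channels` and treats a missing depth as 1. The arithmetic can be sketched on its own as below. The helper name `cnn_layer_size` is an assumption made for this example and is not part of the commit.

```python
def cnn_layer_size(height, width, channels, depth=None):
    """Mirror the size arithmetic of set_cnn_layer: a missing depth counts as 1."""
    if depth is None:
        depth = 1
    return depth * height * width * channels


# Dimensions taken from the test_BatchNorm3D config added in this commit.
print(cnn_layer_size(height=6, width=20, channels=1, depth=3))  # 360
# The 2D case from the commented-out lines of the same config.
print(cnn_layer_size(height=6, width=30, channels=1))  # 180
```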
...@@ -354,6 +354,10 @@ class LayerOutput(object): ...@@ -354,6 +354,10 @@ class LayerOutput(object):
def height(self): def height(self):
return cp.g_layer_map[self.full_name].height return cp.g_layer_map[self.full_name].height
@property
def depth(self):
return cp.g_layer_map[self.full_name].depth
def set_input(self, input): def set_input(self, input):
""" """
Set the input for a memory layer. Can only be used for memory layer Set the input for a memory layer. Can only be used for memory layer
...@@ -943,7 +947,7 @@ def data_layer(name, size, depth=None, height=None, width=None, ...@@ -943,7 +947,7 @@ def data_layer(name, size, depth=None, height=None, width=None,
if height is not None and width is not None: if height is not None and width is not None:
num_filters = size / (width * height * depth) num_filters = size / (width * height * depth)
assert num_filters * width * height * depth == size, \ assert num_filters * width * height * depth == size, \
"size=%s width=%s height=%s depth=%s" % (size, width, height, depth) "size=%s width=%s height=%s depth=%s" % (size, width, height, depth)
return LayerOutput(name, LayerType.DATA, size=size, num_filters=num_filters) return LayerOutput(name, LayerType.DATA, size=size, num_filters=num_filters)
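The `data_layer` hunk above derives `num_filters` from the flat size and checks that the size is an exact multiple of `width * height * depth`. A small sketch of that check follows, assuming depth defaults to 1 when not given. The function name `infer_num_filters` is hypothetical, and the sketch uses `//` so the division stays integral on Python 3, whereas the original uses `/` under Python 2.

```python
def infer_num_filters(size, height, width, depth=1):
    """Return size / (width * height * depth), requiring an exact division."""
    volume = width * height * depth
    num_filters = size // volume
    assert num_filters * volume == size, \
        "size=%s width=%s height=%s depth=%s" % (size, width, height, depth)
    return num_filters


# The data3D layer from the new test config: 360 values over 20 x 6 x 3.
print(infer_num_filters(size=360, height=6, width=20, depth=3))  # 1
```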
...@@ -1219,7 +1223,8 @@ def detection_output_layer(input_loc, ...@@ -1219,7 +1223,8 @@ def detection_output_layer(input_loc,
name=None): name=None):
""" """
Apply the NMS to the output of network and compute the predict bounding Apply the NMS to the output of network and compute the predict bounding
box location. box location. The output of this layer could be None if there is no valid
bounding box.
:param name: The Layer Name. :param name: The Layer Name.
:type name: basestring :type name: basestring
...@@ -2953,13 +2958,15 @@ def img_cmrnorm_layer(input, ...@@ -2953,13 +2958,15 @@ def img_cmrnorm_layer(input,
def batch_norm_layer(input, def batch_norm_layer(input,
act=None, act=None,
name=None, name=None,
img3D=False,
num_channels=None, num_channels=None,
bias_attr=None, bias_attr=None,
param_attr=None, param_attr=None,
layer_attr=None, layer_attr=None,
batch_norm_type=None, batch_norm_type=None,
moving_average_fraction=0.9, moving_average_fraction=0.9,
use_global_stats=None): use_global_stats=None,
mean_var_names=None):
""" """
Batch Normalization Layer. The notation of this layer is as follows. Batch Normalization Layer. The notation of this layer is as follows.
...@@ -3026,6 +3033,8 @@ def batch_norm_layer(input, ...@@ -3026,6 +3033,8 @@ def batch_norm_layer(input,
:math:`runningMean = newMean*(1-factor) :math:`runningMean = newMean*(1-factor)
+ runningMean*factor` + runningMean*factor`
:type moving_average_fraction: float. :type moving_average_fraction: float.
:param mean_var_names: [mean name, variance name]
:type mean_var_names: string list
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
...@@ -3039,6 +3048,7 @@ def batch_norm_layer(input, ...@@ -3039,6 +3048,7 @@ def batch_norm_layer(input,
(batch_norm_type == "cudnn_batch_norm") (batch_norm_type == "cudnn_batch_norm")
l = Layer( l = Layer(
name=name, name=name,
img3D=img3D,
inputs=Input( inputs=Input(
input.name, image=Image(channels=num_channels), **param_attr.attr), input.name, image=Image(channels=num_channels), **param_attr.attr),
active_type=act.name, active_type=act.name,
...@@ -3047,6 +3057,7 @@ def batch_norm_layer(input, ...@@ -3047,6 +3057,7 @@ def batch_norm_layer(input,
bias=ParamAttr.to_bias(bias_attr), bias=ParamAttr.to_bias(bias_attr),
moving_average_fraction=moving_average_fraction, moving_average_fraction=moving_average_fraction,
use_global_stats=use_global_stats, use_global_stats=use_global_stats,
mean_var_names=mean_var_names,
**ExtraLayerAttribute.to_kwargs(layer_attr)) **ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput( return LayerOutput(
...@@ -6410,7 +6421,7 @@ def gated_unit_layer(input, ...@@ -6410,7 +6421,7 @@ def gated_unit_layer(input,
@wrap_name_default('switch_order') @wrap_name_default('switch_order')
def switch_order_layer(input, def switch_order_layer(input,
name=None, name=None,
reshape=None, reshape_axis=None,
act=None, act=None,
layer_attr=None): layer_attr=None):
""" """
...@@ -6421,8 +6432,9 @@ def switch_order_layer(input, ...@@ -6421,8 +6432,9 @@ def switch_order_layer(input,
The example usage is: The example usage is:
.. code-block:: python .. code-block:: python
reshape_axis = 3
switch = switch_order(input=layer, name='switch', reshape_axis=reshape_axis)
reshape = {'height':[ 0, 1, 2], 'width':[3]} reshape = {'height':[ 0, 1, 2], 'width':[3]}
switch = switch_order(input=layer, name='switch', reshape=reshape)
:param input: The input layer. :param input: The input layer.
:type input: LayerOutput :type input: LayerOutput
...@@ -6434,6 +6446,11 @@ def switch_order_layer(input, ...@@ -6434,6 +6446,11 @@ def switch_order_layer(input,
:rtype: LayerOutput :rtype: LayerOutput
""" """
assert isinstance(input, LayerOutput) assert isinstance(input, LayerOutput)
assert reshape_axis is not None and (reshape_axis > 0 and reshape_axis < 4)
height = [ele for ele in xrange(reshape_axis)]
width = [ele for ele in range(reshape_axis, 4)]
reshape = {'height': height, 'width': width}
l = Layer( l = Layer(
name=name, name=name,
inputs=input.name, inputs=input.name,
...@@ -6444,6 +6461,7 @@ def switch_order_layer(input, ...@@ -6444,6 +6461,7 @@ def switch_order_layer(input,
return LayerOutput( return LayerOutput(
name=name, name=name,
layer_type=LayerType.SWITCH_ORDER_LAYER, layer_type=LayerType.SWITCH_ORDER_LAYER,
activation=act,
parents=input, parents=input,
size=l.config.size) size=l.config.size)
......
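In the `switch_order_layer` hunk above, the single `reshape_axis` argument replaces the explicit `reshape` dictionary: axes below `reshape_axis` become the height axes and the remaining axes up to 4 become the width axes. A minimal sketch of that split follows. The function name `split_axes` and the `rank` parameter are assumptions made for illustration.

```python
def split_axes(reshape_axis, rank=4):
    """Split axes 0..rank-1 into height axes and width axes at reshape_axis."""
    if reshape_axis is None or not (0 < reshape_axis < rank):
        raise ValueError("reshape_axis must satisfy 0 < reshape_axis < %d" % rank)
    height = list(range(reshape_axis))
    width = list(range(reshape_axis, rank))
    return {'height': height, 'width': width}


axes = split_axes(3)
print(axes['height'], axes['width'])  # [0, 1, 2] [3]
```

With `reshape_axis = 3` the result matches the `reshape` dictionary shown in the docstring example.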
...@@ -10,6 +10,6 @@ test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_la ...@@ -10,6 +10,6 @@ test_prelu_layer test_row_conv test_detection_output_layer test_multibox_loss_la
test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer test_recursive_topology test_gated_unit_layer test_clip_layer test_row_l2_norm_layer
test_kmax_seq_socre_layer test_sub_nested_seq_select_layer test_scale_shift_layer test_kmax_seq_socre_layer test_sub_nested_seq_select_layer test_scale_shift_layer
test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer test_seq_slice_layer test_cross_entropy_over_beam test_pooling3D_layer
test_conv3d_layer test_deconv3d_layer) test_conv3d_layer test_deconv3d_layer test_BatchNorm3D)
export whole_configs=(test_split_datasource) export whole_configs=(test_split_datasource)
...@@ -62,6 +62,7 @@ layers { ...@@ -62,6 +62,7 @@ layers {
moving_average_fraction: 0.9 moving_average_fraction: 0.9
height: 227 height: 227
width: 227 width: 227
depth: 1
} }
layers { layers {
name: "__crmnorm_0__" name: "__crmnorm_0__"
......
...@@ -62,6 +62,7 @@ layers { ...@@ -62,6 +62,7 @@ layers {
moving_average_fraction: 0.9 moving_average_fraction: 0.9
height: 256 height: 256
width: 256 width: 256
depth: 1
} }
layers { layers {
name: "__crmnorm_0__" name: "__crmnorm_0__"
......
type: "nn"
layers {
name: "data3D"
type: "data"
size: 360
active_type: ""
height: 6
width: 20
depth: 3
}
layers {
name: "__batch_norm_0__"
type: "batch_norm"
size: 360
active_type: "relu"
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w0"
image_conf {
channels: 1
img_size: 20
img_size_y: 6
img_size_z: 3
}
}
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w1"
}
inputs {
input_layer_name: "data3D"
input_parameter_name: "___batch_norm_0__.w2"
}
bias_parameter_name: "___batch_norm_0__.wbias"
moving_average_fraction: 0.9
height: 6
width: 20
depth: 3
}
parameters {
name: "___batch_norm_0__.w0"
size: 1
initial_mean: 1.0
initial_std: 0.0
initial_strategy: 0
initial_smart: false
}
parameters {
name: "___batch_norm_0__.w1"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
is_static: true
is_shared: true
}
parameters {
name: "___batch_norm_0__.w2"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
is_static: true
is_shared: true
}
parameters {
name: "___batch_norm_0__.wbias"
size: 1
initial_mean: 0.0
initial_std: 0.0
dims: 1
dims: 1
initial_strategy: 0
initial_smart: false
}
input_layer_names: "data3D"
output_layer_names: "__batch_norm_0__"
sub_models {
name: "root"
layer_names: "data3D"
layer_names: "__batch_norm_0__"
input_layer_names: "data3D"
output_layer_names: "__batch_norm_0__"
is_recurrent_layer_group: false
}
...@@ -74,6 +74,9 @@ layers { ...@@ -74,6 +74,9 @@ layers {
inputs { inputs {
input_layer_name: "__bidirectional_gru_0___bw" input_layer_name: "__bidirectional_gru_0___bw"
} }
height: 0
width: 0
depth: 1
} }
parameters { parameters {
name: "___bidirectional_gru_0___fw_transform.w0" name: "___bidirectional_gru_0___fw_transform.w0"
......
...@@ -16,6 +16,9 @@ layers { ...@@ -16,6 +16,9 @@ layers {
inputs { inputs {
input_layer_name: "data" input_layer_name: "data"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_1__" name: "__addto_1__"
...@@ -28,6 +31,9 @@ layers { ...@@ -28,6 +31,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_0__" input_layer_name: "__addto_0__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_2__" name: "__addto_2__"
...@@ -40,6 +46,9 @@ layers { ...@@ -40,6 +46,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_1__" input_layer_name: "__addto_1__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_3__" name: "__addto_3__"
...@@ -52,6 +61,9 @@ layers { ...@@ -52,6 +61,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_2__" input_layer_name: "__addto_2__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_4__" name: "__addto_4__"
...@@ -64,6 +76,9 @@ layers { ...@@ -64,6 +76,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_3__" input_layer_name: "__addto_3__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_5__" name: "__addto_5__"
...@@ -76,6 +91,9 @@ layers { ...@@ -76,6 +91,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_4__" input_layer_name: "__addto_4__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_6__" name: "__addto_6__"
...@@ -88,6 +106,9 @@ layers { ...@@ -88,6 +106,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_5__" input_layer_name: "__addto_5__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_7__" name: "__addto_7__"
...@@ -100,6 +121,9 @@ layers { ...@@ -100,6 +121,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_6__" input_layer_name: "__addto_6__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_8__" name: "__addto_8__"
...@@ -112,6 +136,9 @@ layers { ...@@ -112,6 +136,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_7__" input_layer_name: "__addto_7__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_9__" name: "__addto_9__"
...@@ -124,6 +151,9 @@ layers { ...@@ -124,6 +151,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_8__" input_layer_name: "__addto_8__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_10__" name: "__addto_10__"
...@@ -136,6 +166,9 @@ layers { ...@@ -136,6 +166,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_9__" input_layer_name: "__addto_9__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_11__" name: "__addto_11__"
...@@ -148,6 +181,9 @@ layers { ...@@ -148,6 +181,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_10__" input_layer_name: "__addto_10__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_12__" name: "__addto_12__"
...@@ -160,6 +196,9 @@ layers { ...@@ -160,6 +196,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_11__" input_layer_name: "__addto_11__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_13__" name: "__addto_13__"
...@@ -172,6 +211,9 @@ layers { ...@@ -172,6 +211,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_12__" input_layer_name: "__addto_12__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_14__" name: "__addto_14__"
...@@ -184,6 +226,9 @@ layers { ...@@ -184,6 +226,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_13__" input_layer_name: "__addto_13__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_15__" name: "__addto_15__"
...@@ -196,6 +241,9 @@ layers { ...@@ -196,6 +241,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_14__" input_layer_name: "__addto_14__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_16__" name: "__addto_16__"
...@@ -208,6 +256,9 @@ layers { ...@@ -208,6 +256,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_15__" input_layer_name: "__addto_15__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_17__" name: "__addto_17__"
...@@ -220,6 +271,9 @@ layers { ...@@ -220,6 +271,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_16__" input_layer_name: "__addto_16__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_18__" name: "__addto_18__"
...@@ -232,6 +286,9 @@ layers { ...@@ -232,6 +286,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_17__" input_layer_name: "__addto_17__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_19__" name: "__addto_19__"
...@@ -244,6 +301,9 @@ layers { ...@@ -244,6 +301,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_18__" input_layer_name: "__addto_18__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_20__" name: "__addto_20__"
...@@ -256,6 +316,9 @@ layers { ...@@ -256,6 +316,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_19__" input_layer_name: "__addto_19__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_21__" name: "__addto_21__"
...@@ -268,6 +331,9 @@ layers { ...@@ -268,6 +331,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_20__" input_layer_name: "__addto_20__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_22__" name: "__addto_22__"
...@@ -280,6 +346,9 @@ layers { ...@@ -280,6 +346,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_21__" input_layer_name: "__addto_21__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_23__" name: "__addto_23__"
...@@ -292,6 +361,9 @@ layers { ...@@ -292,6 +361,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_22__" input_layer_name: "__addto_22__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_24__" name: "__addto_24__"
...@@ -304,6 +376,9 @@ layers { ...@@ -304,6 +376,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_23__" input_layer_name: "__addto_23__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_25__" name: "__addto_25__"
...@@ -316,6 +391,9 @@ layers { ...@@ -316,6 +391,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_24__" input_layer_name: "__addto_24__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_26__" name: "__addto_26__"
...@@ -328,6 +406,9 @@ layers { ...@@ -328,6 +406,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_25__" input_layer_name: "__addto_25__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_27__" name: "__addto_27__"
...@@ -340,6 +421,9 @@ layers { ...@@ -340,6 +421,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_26__" input_layer_name: "__addto_26__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_28__" name: "__addto_28__"
...@@ -352,6 +436,9 @@ layers { ...@@ -352,6 +436,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_27__" input_layer_name: "__addto_27__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_29__" name: "__addto_29__"
...@@ -364,6 +451,9 @@ layers { ...@@ -364,6 +451,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_28__" input_layer_name: "__addto_28__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_30__" name: "__addto_30__"
...@@ -376,6 +466,9 @@ layers { ...@@ -376,6 +466,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_29__" input_layer_name: "__addto_29__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__addto_31__" name: "__addto_31__"
...@@ -388,6 +481,9 @@ layers { ...@@ -388,6 +481,9 @@ layers {
inputs { inputs {
input_layer_name: "__addto_30__" input_layer_name: "__addto_30__"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__fc_layer_0__" name: "__fc_layer_0__"
......
...@@ -22,6 +22,9 @@ layers { ...@@ -22,6 +22,9 @@ layers {
inputs { inputs {
input_layer_name: "b" input_layer_name: "b"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__concat_0__" name: "__concat_0__"
...@@ -34,6 +37,9 @@ layers { ...@@ -34,6 +37,9 @@ layers {
inputs { inputs {
input_layer_name: "b" input_layer_name: "b"
} }
height: 0
width: 0
depth: 1
} }
layers { layers {
name: "__concat_1__" name: "__concat_1__"
......
from paddle.trainer_config_helpers import *
settings(batch_size=1000, learning_rate=1e-4)
#data = data_layer(name='data', size=180, width=30, height=6)
#batchNorm = batch_norm_layer(data, num_channels=1)
#outputs(batchNorm)
data3D = data_layer(name='data3D', size=120 * 3, width=20, height=6, depth=3)
batchNorm3D = batch_norm_layer(data3D, num_channels=1, img3D=True)
outputs(batchNorm3D)
...@@ -53,10 +53,13 @@ class BeginPass(object): ...@@ -53,10 +53,13 @@ class BeginPass(object):
class EndPass(WithMetric): class EndPass(WithMetric):
""" """
Event On One Pass Training Complete. Event On One Pass Training Complete.
To get the output of a specific layer, call "event.gm.getLayerOutputs('predict_layer')"
in your event_handler callback.
""" """
def __init__(self, pass_id, evaluator): def __init__(self, pass_id, evaluator, gm):
self.pass_id = pass_id self.pass_id = pass_id
self.gm = gm
WithMetric.__init__(self, evaluator) WithMetric.__init__(self, evaluator)
...@@ -73,10 +76,13 @@ class BeginIteration(object): ...@@ -73,10 +76,13 @@ class BeginIteration(object):
class EndIteration(WithMetric): class EndIteration(WithMetric):
""" """
Event On One Batch Training Complete. Event On One Batch Training Complete.
To get the output of a specific layer, call "event.gm.getLayerOutputs('predict_layer')"
in your event_handler callback.
""" """
def __init__(self, pass_id, batch_id, cost, evaluator): def __init__(self, pass_id, batch_id, cost, evaluator, gm):
self.pass_id = pass_id self.pass_id = pass_id
self.batch_id = batch_id self.batch_id = batch_id
self.cost = cost self.cost = cost
self.gm = gm
WithMetric.__init__(self, evaluator) WithMetric.__init__(self, evaluator)
...@@ -43,7 +43,6 @@ class OpDescCreationMethod(object): ...@@ -43,7 +43,6 @@ class OpDescCreationMethod(object):
if len(args) != 0: if len(args) != 0:
raise ValueError("Only keyword arguments are supported.") raise ValueError("Only keyword arguments are supported.")
op_desc = framework_pb2.OpDesc() op_desc = framework_pb2.OpDesc()
for input_parameter in self.__op_proto__.inputs: for input_parameter in self.__op_proto__.inputs:
input_arguments = kwargs.get(input_parameter.name, []) input_arguments = kwargs.get(input_parameter.name, [])
if is_str(input_arguments): if is_str(input_arguments):
...@@ -142,8 +141,8 @@ def create_op_creation_method(op_proto): ...@@ -142,8 +141,8 @@ def create_op_creation_method(op_proto):
return OpInfo( return OpInfo(
method=__impl__, method=__impl__,
name=op_proto.type, name=op_proto.type,
inputs=[var.name for var in op_proto.inputs], inputs=[(var.name, var.duplicable) for var in op_proto.inputs],
outputs=[var.name for var in op_proto.outputs], outputs=[(var.name, var.duplicable) for var in op_proto.outputs],
attrs=[attr.name for attr in op_proto.attrs]) attrs=[attr.name for attr in op_proto.attrs])
...@@ -180,9 +179,15 @@ class OperatorFactory(object): ...@@ -180,9 +179,15 @@ class OperatorFactory(object):
return self.op_methods.get(t) return self.op_methods.get(t)
def get_op_input_names(self, type): def get_op_input_names(self, type):
return map(lambda x: x[0], self.get_op_info(type).inputs)
def get_op_inputs(self, type):
return self.get_op_info(type).inputs return self.get_op_info(type).inputs
def get_op_output_names(self, type): def get_op_output_names(self, type):
return map(lambda x: x[0], self.get_op_info(type).outputs)
def get_op_outputs(self, type):
return self.get_op_info(type).outputs return self.get_op_info(type).outputs
def get_op_attr_names(self, type): def get_op_attr_names(self, type):
......
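After the op.py hunks above, `OpInfo.inputs` and `OpInfo.outputs` hold `(name, duplicable)` pairs instead of bare names, and the `*_names` helpers project out the first element. The sketch below shows that shape with a stand-in tuple. The `OpInfo` definition here is a simplified assumption, and the `'X'` and `'Out'` entries are made-up illustrative values, not taken from a real operator proto.

```python
from collections import namedtuple

# Simplified stand-in for the OpInfo record built in create_op_creation_method.
OpInfo = namedtuple('OpInfo', ['name', 'inputs', 'outputs'])

info = OpInfo(name='example_op',
              inputs=[('X', True)],
              outputs=[('Out', False)])


def input_names(op_info):
    """Project the names out of the (name, duplicable) pairs."""
    return [name for name, _duplicable in op_info.inputs]


def duplicable_inputs(op_info):
    """Names of inputs that may be bound to a list of variables."""
    return [name for name, duplicable in op_info.inputs if duplicable]


print(input_names(info))        # ['X']
print(duplicable_inputs(info))  # ['X']
```

A list comprehension is used here so the helper returns a list under both Python 2 and Python 3. The `map(lambda x: x[0], ...)` form in the hunk returns a list only under Python 2.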
...@@ -17,8 +17,7 @@ py_test(test_cross_entropy_op SRCS test_cross_entropy_op.py) ...@@ -17,8 +17,7 @@ py_test(test_cross_entropy_op SRCS test_cross_entropy_op.py)
py_test(test_gather_op SRCS test_gather_op.py) py_test(test_gather_op SRCS test_gather_op.py)
py_test(test_scatter_op SRCS test_scatter_op.py) py_test(test_scatter_op SRCS test_scatter_op.py)
py_test(test_fill_zeros_like_op SRCS test_fill_zeros_like_op.py) py_test(test_fill_zeros_like_op SRCS test_fill_zeros_like_op.py)
py_test(test_top_k_op SRCS test_top_k_op.py)
py_test(gradient_checker SRCS gradient_checker.py)
py_test(test_rowwise_add_op SRCS test_rowwise_add_op.py) py_test(test_rowwise_add_op SRCS test_rowwise_add_op.py)
...@@ -32,5 +31,7 @@ py_test(test_sgd_op SRCS test_sgd_op.py) ...@@ -32,5 +31,7 @@ py_test(test_sgd_op SRCS test_sgd_op.py)
py_test(test_gradient_checker SRCS test_gradient_checker.py) py_test(test_gradient_checker SRCS test_gradient_checker.py)
py_test(test_lookup_table SRCS test_lookup_table.py) py_test(test_lookup_table SRCS test_lookup_table.py)
py_test(test_scale_and_identity_op SRCS test_scale_and_identity_op.py) py_test(test_scale_and_identity_op SRCS test_scale_and_identity_op.py)
py_test(test_sum_op SRCS test_sum_op.py)
py_test(mnist SRCS mnist.py) py_test(mnist SRCS mnist.py)
py_test(test_concat_op SRCS test_concat_op.py)
py_test(test_squared_l2_distance_op SRCS test_squared_l2_distance_op.py) py_test(test_squared_l2_distance_op SRCS test_squared_l2_distance_op.py)
import unittest
import numpy
import itertools
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator

__all__ = ['get_numeric_gradient']


def create_op(op_type):
    # TODO need to set attrs
    kwargs = dict()
    for in_name in Operator.get_op_input_names(op_type):
        kwargs[in_name] = in_name
    for out_name in Operator.get_op_output_names(op_type):
        kwargs[out_name] = out_name
    return Operator(op_type, **kwargs)


def grad_var_name(var_name):
    return var_name + "@GRAD"


def empty_var_name():
    return "@EMPTY@"


def get_numeric_gradient(op,
                         input_values,
                         output_name,
                         input_to_check,
                         delta=0.005,
                         local_scope=None,
                         in_place=False):
    """
    Get the numeric gradient for one input of an operator.
    :param op: C++ operator instance; it may also be a network.
    :param input_values: The input variables. Should be a dictionary whose
        keys are variable names and whose values are numpy arrays.
    :param output_name: The final output variable name.
    :param input_to_check: The name of the input variable whose gradient is
        computed.
    :param delta: The perturbation value for the numeric gradient method. A
        smaller delta generally gives a more accurate result, but if delta is
        too small it can cause numerical stability problems.
    :param local_scope: The local scope used for get_numeric_gradient.
    :param in_place: If True, the inputs are restored from input_values
        before each run of the operator.
    :return: The gradient array in numpy format.
    """
    if local_scope is None:
        local_scope = core.Scope()
    # Create all input variables in local_scope.
    for var_name in input_values:
        var = local_scope.new_var(var_name)
        tensor = var.get_tensor()
        tensor.set_dims(input_values[var_name].shape)
        tensor.alloc_float(core.CPUPlace())
        tensor.set(input_values[var_name], core.CPUPlace())
    # Create all output variables in local_scope.
    opts = op.outputs()
    for key in opts:
        for output in opts[key]:
            if local_scope.find_var(output) is None:
                local_scope.new_var(output).get_tensor()
    op.infer_shape(local_scope)
    # Allocate output memory.
    for key in opts:
        for output in opts[key]:
            local_scope.find_var(output).get_tensor().alloc_float(core.CPUPlace(
            ))
    cpu_ctx = core.DeviceContext.create(core.CPUPlace())

    def get_output():
        op.run(local_scope, cpu_ctx)
        return numpy.array(local_scope.find_var(output_name).get_tensor()).sum()

    def product(dim):
        return reduce(lambda a, b: a * b, dim, 1)

    def restore_inputs():
        for var_name in input_values:
            tensor_ = local_scope.find_var(var_name).get_tensor()
            tensor_.set(numpy.copy(input_values[var_name]), core.CPUPlace())

    # Get the input tensor whose numeric gradient we want to compute.
    tensor_to_check = local_scope.find_var(input_to_check).get_tensor()
    tensor_size = product(tensor_to_check.get_dims())
    # Prepare a numpy array to store the gradient.
    gradient_flat = numpy.zeros(shape=(tensor_size, ), dtype='float32')
    # We compute the gradient of one element at a time, looping over every
    # element of the tensor.
    for i in xrange(tensor_size):
        if in_place:
            restore_inputs()
        # Get one input element through its index i.
        origin = tensor_to_check.get_float_element(i)
        # Add delta to it, run op and then get the sum of the result tensor.
        x_pos = origin + delta
        tensor_to_check.set_float_element(i, x_pos)
        y_pos = get_output()
        # Subtract delta from this element, run op and get the sum of the
        # result tensor.
        if in_place:
            restore_inputs()
        x_neg = origin - delta
        tensor_to_check.set_float_element(i, x_neg)
        y_neg = get_output()
        # Restore the old value.
        tensor_to_check.set_float_element(i, origin)
        # Compute the gradient of this element and store it into a numpy array.
        gradient_flat[i] = (y_pos - y_neg) / delta / 2
    # Reshape the gradient result to the shape of the source tensor.
    return gradient_flat.reshape(tensor_to_check.get_dims())
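The loop in `get_numeric_gradient` is a central difference: each element is perturbed by `+delta` and `-delta`, the summed output is evaluated both times, and the slope is `(y_pos - y_neg) / delta / 2`. The sketch below reproduces only that step on a plain Python list, without Paddle scopes or tensors. The names `numeric_gradient` and `sum_of_squares` are hypothetical and exist only for this example.

```python
def numeric_gradient(f, x, delta=0.005):
    """Central-difference gradient of a scalar function f at the list x."""
    grad = []
    for i in range(len(x)):
        origin = x[i]
        # Evaluate with the element shifted up by delta.
        x[i] = origin + delta
        y_pos = f(x)
        # Evaluate with the element shifted down by delta.
        x[i] = origin - delta
        y_neg = f(x)
        # Restore the old value before moving to the next element.
        x[i] = origin
        grad.append((y_pos - y_neg) / delta / 2)
    return grad


def sum_of_squares(values):
    return sum(v * v for v in values)


# The analytic gradient of sum(v^2) is 2 * v, so the result should be
# approximately [2.0, -4.0, 6.0].
print(numeric_gradient(sum_of_squares, [1.0, -2.0, 3.0]))
```

For a quadratic function the central difference has no truncation error, so any deviation from the analytic gradient in this example comes from floating-point rounding only.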
class GradientChecker(unittest.TestCase):
def __get_gradient(self, forward_op, backward_op, input_value, grad_names,
place):
"""Get the input gradients after running forward and backward operators
on the given places.
:param forward_op: forward operator
:type forward_op: Operator
:param backward_op: backward operator
:type backward_op: Operator
:param input_value: input values.
:type input_value: dict{string:numpy.array}
:param grad_names: the names of returned input gradients.
:type grad_names: a list of string
:param place: the device type.
:type place: CPUPlace or GPUPlace
:return: the input gradients of given grad_names.
:rtype: a list of numpy.array
"""
scope = core.Scope()
ctx = core.DeviceContext.create(place)
inputs = forward_op.inputs()
in_names = [item for k in inputs for item in inputs[k]]
outputs = forward_op.outputs()
out_names = [item for k in outputs for item in outputs[k]]
# create input var and set value
for name, value in input_value.iteritems():
if name not in in_names:
raise ValueError(name + " does not exist in Op's inputs.")
var = scope.new_var(name).get_tensor()
var.set_dims(value.shape)
var.set(value, place)
# run forward op
for out_name in out_names:
scope.new_var(out_name)
forward_op.infer_shape(scope)
forward_op.run(scope, ctx)
# set output var's shape
# set output grad to ones
for name in out_names:
out_tensor = scope.find_var(name).get_tensor()
grad_tensor = scope.new_var(grad_var_name(name)).get_tensor()
grad_tensor.set_dims(out_tensor.shape())
data = numpy.ones(out_tensor.shape(), dtype=numpy.float32)
grad_tensor.set(data, place)
# run backward op
backward_outs = backward_op.outputs()
backward_names = [
item for key in backward_outs for item in backward_outs[key]
]
for name in backward_names:
scope.new_var(name)
backward_op.infer_shape(scope)
backward_op.run(scope, ctx)
outs = [
numpy.array(scope.find_var(name).get_tensor())
for name in grad_names
]
return outs
def compare_grad(self, forward_op, input_value, no_grad_set=None):
""" Compare the input gradients between CPU and GPU for the given forward
operator.
:param forward_op: forward operator
:type forward_op: Operator
:param input_value: input values.
:type input_value: dict{string:numpy.array}
:param no_grad_set: the set of variables names without gradients.
:type no_grad_set: a set of string
:raises: AssertionError, if the CPU and GPU gradient values differ.
"""
if no_grad_set is None:
no_grad_set = set()
backward_op = core.Operator.backward(forward_op, no_grad_set)
# return if not compiled with GPU or the GPU kernel is not implemented
if not (core.is_compile_gpu() and backward_op.support_gpu()):
return
outputs = backward_op.outputs()
out_names = [item for k in outputs for item in outputs[k]]
out_names = filter(lambda x: x != empty_var_name(), out_names)
cpu_grads = self.__get_gradient(forward_op, backward_op, input_value,
out_names, core.CPUPlace())
gpu_grads = self.__get_gradient(forward_op, backward_op, input_value,
out_names, core.GPUPlace(0))
for c_grad, g_grad, name in itertools.izip(cpu_grads, gpu_grads,
out_names):
self.assertTrue(
numpy.allclose(
c_grad, g_grad, atol=1e-4),
"output name: " + name + " has diff")
def __assert_is_close(self, numeric_grads, analytic_grads, names,
max_relative_error, msg_prefix):
"""Use relative error for the comparison.
:param numeric_grads: the numerical gradients.
:type numeric_grads: a list of numpy.array
:param analytic_grads: the analytical gradients.
:type analytic_grads: a list of numpy.array
:param names: the names of gradients, used to print for debug.
:type names: a list of string
:param msg_prefix: string info, used to print for debug.
:type msg_prefix: string
"""
for a, b, name in itertools.izip(numeric_grads, analytic_grads, names):
abs_a = numpy.abs(a)
# if abs_a is nearly zero, then use abs error for a, not relative
# error.
abs_a[abs_a < 1e-3] = 1
diff_mat = numpy.abs(a - b) / abs_a
max_diff = numpy.max(diff_mat)
def err_msg():
offset = numpy.argmax(diff_mat > max_relative_error)
return "%s Variable %s max gradient diff %f over limit %f, the first " \
"error element is %d" % (
msg_prefix, name, max_diff, max_relative_error, offset)
self.assertLessEqual(max_diff, max_relative_error, err_msg())
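The comparison in `__assert_is_close` divides the absolute difference by the magnitude of the numeric gradient, except where that magnitude is below 1e-3, in which case it falls back to the absolute difference. A rough sketch of that rule, assuming flat Python lists rather than numpy arrays; `max_relative_diff` is a hypothetical helper, not part of the test framework:

```python
def max_relative_diff(numeric, analytic, zero_threshold=1e-3):
    """Largest element-wise relative error between two flat sequences."""
    worst = 0.0
    for a, b in zip(numeric, analytic):
        denom = abs(a)
        # Near-zero references would blow up the ratio, so use the
        # absolute error (denominator 1) for them instead.
        if denom < zero_threshold:
            denom = 1.0
        worst = max(worst, abs(a - b) / denom)
    return worst


diff = max_relative_diff([2.0, 0.0], [2.01, 0.0005])
```

Here the first pair contributes a relative error of about 0.005 and the second pair an absolute error of 0.0005, so the first pair dominates.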
def check_grad(self,
forward_op,
input_vars,
inputs_to_check,
output_name,
no_grad_set=None,
only_cpu=False,
in_place=False,
max_relative_error=0.005):
"""
:param forward_op: used to create backward_op
:param input_vars: numpy value of input variable. The following
computation will use these variables.
:param inputs_to_check: names of the input variables whose gradients should be checked.
:param output_name: the output variable name of forward network.
:param max_relative_error: The relative tolerance parameter.
:param no_grad_set: used when creating backward ops
:param only_cpu: only compute and check gradient on cpu kernel.
:return:
"""
if no_grad_set is None:
no_grad_set = set()
no_tmp_out = forward_op.no_intermediate_outputs()
if len(no_tmp_out) != 1:
raise ValueError("the number of non-temporary outputs should be 1")
inputs = forward_op.inputs()
in_names = [item for k in inputs for item in inputs[k]]
for no_grad in no_grad_set:
if no_grad not in in_names:
raise ValueError("no_grad should be in in_names")
if no_grad in inputs_to_check:
raise ValueError("no_grad should not be in inputs_to_check")
backward_op = core.Operator.backward(forward_op, no_grad_set)
places = [core.CPUPlace()]
if not only_cpu and core.is_compile_gpu() and backward_op.support_gpu():
places.append(core.GPUPlace(0))
# get numerical gradients
numeric_grads = [
get_numeric_gradient(
forward_op, input_vars, output_name, name, in_place=in_place)
for name in inputs_to_check
]
check_names = [grad_var_name(name) for name in inputs_to_check]
for place in places:
analytic_grads = self.__get_gradient(forward_op, backward_op,
input_vars, check_names, place)
self.__assert_is_close(numeric_grads, analytic_grads, check_names,
max_relative_error,
"Gradient Check On %s" % str(place))
import unittest
import numpy as np
import itertools
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator
def grad_var_name(var_name):
return var_name + "@GRAD"
def create_op(scope, op_type, inputs, outputs, attrs):
kwargs = dict()
for in_name, in_dup in Operator.get_op_inputs(op_type):
if in_name in inputs:
kwargs[in_name] = []
if in_dup:
sub_in = inputs[in_name]
for sub_in_name, _ in sub_in:
var = scope.new_var(sub_in_name)
kwargs[in_name].append(sub_in_name)
else:
var = scope.new_var(in_name)
kwargs[in_name].append(in_name)
for out_name, out_dup in Operator.get_op_outputs(op_type):
if out_name in outputs:
kwargs[out_name] = []
if out_dup:
sub_in = outputs[out_name]
for sub_in_name, _ in sub_in:
var = scope.new_var(sub_in_name)
kwargs[out_name].append(sub_in_name)
else:
var = scope.new_var(out_name)
kwargs[out_name].append(out_name)
for attr_name in Operator.get_op_attr_names(op_type):
if attr_name in attrs:
kwargs[attr_name] = attrs[attr_name]
return Operator(op_type, **kwargs)
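`create_op` maps each input or output slot to a list of variable names: a duplicable slot holds a list of (name, array) pairs and contributes every sub-name, while a plain slot contributes its own name. A minimal sketch of just that name-collection step, assuming the slot descriptions are given as (slot_name, is_duplicable) tuples; `collect_var_names` is a hypothetical helper and no Paddle scope or operator is created:

```python
def collect_var_names(slots, values):
    """Build a slot -> variable-name-list mapping.

    slots:  list of (slot_name, is_duplicable) tuples.
    values: dict of slot_name -> array, or a list of (name, array)
            pairs when the slot is duplicable.
    """
    kwargs = {}
    for slot_name, is_dup in slots:
        if slot_name not in values:
            continue
        if is_dup:
            # Each sub-variable keeps its own name.
            kwargs[slot_name] = [sub_name for sub_name, _ in values[slot_name]]
        else:
            # A plain slot is represented by a variable of the same name.
            kwargs[slot_name] = [slot_name]
    return kwargs


names = collect_var_names(
    [("X", True), ("Out", False)],
    {"X": [("x0", 1), ("x1", 2)], "Out": 3})
```

This mirrors how the concat test passes `{'X': [('x0', x0), ('x1', x1), ('x2', x2)]}` for a duplicable input.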
def set_input(scope, op, inputs, place):
for in_name, in_dup in Operator.get_op_inputs(op.type()):
if in_name in inputs:
if in_dup:
sub_in = inputs[in_name]
for sub_in_name, sub_in_array in sub_in:
var = scope.find_var(sub_in_name)
tensor = var.get_tensor()
tensor.set_dims(sub_in_array.shape)
tensor.set(sub_in_array, place)
else:
var = scope.find_var(in_name)
tensor = var.get_tensor()
arr = inputs[in_name]
tensor.set_dims(arr.shape)
tensor.set(arr, place)
def set_output_grad(scope, op, outputs, place):
for out_name, out_dup in Operator.get_op_outputs(op.type()):
if out_name in outputs:
if out_dup:
sub_out = outputs[out_name]
for sub_out_name, _ in sub_out:
out_tensor = scope.find_var(sub_out_name).get_tensor()
grad_tensor = scope.new_var(grad_var_name(
sub_out_name)).get_tensor()
grad_tensor.set_dims(out_tensor.shape())
data = np.ones(out_tensor.shape(), dtype=np.float32)
grad_tensor.set(data, place)
else:
out_tensor = scope.find_var(out_name).get_tensor()
grad_tensor = scope.new_var(grad_var_name(out_name)).get_tensor(
)
grad_tensor.set_dims(out_tensor.shape())
data = np.ones(out_tensor.shape(), dtype=np.float32)
grad_tensor.set(data, place)
def get_numeric_gradient(scope,
op,
inputs,
input_to_check,
output_name,
delta=0.005,
in_place=False):
set_input(scope, op, inputs, core.CPUPlace())
op.infer_shape(scope)
tensor_to_check = scope.find_var(input_to_check).get_tensor()
def product(dim):
return reduce(lambda a, b: a * b, dim, 1)
ctx = core.DeviceContext.create(core.CPUPlace())
def get_output():
op.run(scope, ctx)
return np.array(scope.find_var(output_name).get_tensor()).sum()
tensor_to_check = scope.find_var(input_to_check).get_tensor()
tensor_size = product(tensor_to_check.get_dims())
gradient_flat = np.zeros(shape=(tensor_size, ), dtype='float32')
# we only compute gradient of one element each time.
# we use a for loop to compute the gradient of every element.
for i in xrange(tensor_size):
if in_place:
set_input(scope, op, inputs, core.CPUPlace())
# get one input element through its index i.
origin = tensor_to_check.get_float_element(i)
# add delta to it, run op and then get the sum of the result tensor.
x_pos = origin + delta
tensor_to_check.set_float_element(i, x_pos)
y_pos = get_output()
if in_place:
set_input(scope, op, inputs, core.CPUPlace())
x_neg = origin - delta
tensor_to_check.set_float_element(i, x_neg)
y_neg = get_output()
tensor_to_check.set_float_element(i, origin)
gradient_flat[i] = (y_pos - y_neg) / delta / 2
return gradient_flat.reshape(tensor_to_check.get_dims())
def get_backward_op(scope, op, no_grad_set):
backward_op = core.Operator.backward(op, no_grad_set)
for input in backward_op.input_vars():
var = scope.new_var(input)
var.get_tensor()
for output in backward_op.output_vars():
var = scope.new_var(output)
var.get_tensor()
return backward_op
def get_gradient(scope, op, inputs, outputs, grad_name, place,
no_grad_set=None):
ctx = core.DeviceContext.create(place)
set_input(scope, op, inputs, place)
op.infer_shape(scope)
op.run(scope, ctx)
if no_grad_set is None:
no_grad_set = set()
backward_op = get_backward_op(scope, op, no_grad_set)
set_output_grad(scope, op, outputs, place)
backward_op.infer_shape(scope)
backward_op.run(scope, ctx)
out = np.array(scope.find_var(grad_name).get_tensor())
return out
class OpTest(unittest.TestCase):
def check_output_with_place(self, place):
self.scope = core.Scope()
op_inputs = self.inputs if hasattr(self, "inputs") else dict()
op_attrs = self.attrs if hasattr(self, "attrs") else dict()
self.op = create_op(self.scope, self.op_type, op_inputs, self.outputs,
op_attrs)
if isinstance(place, core.GPUPlace) and not self.op.support_gpu():
return
set_input(self.scope, self.op, self.inputs, place)
self.op.infer_shape(self.scope)
ctx = core.DeviceContext.create(place)
self.op.run(self.scope, ctx)
for out_name, out_dup in Operator.get_op_outputs(self.op.type()):
if out_dup:
sub_out = self.outputs[out_name]
for sub_out_name in sub_out:
actual = np.array(
self.scope.find_var(sub_out_name).get_tensor())
expect = sub_out[sub_out_name]
self.assertTrue(
np.allclose(
actual, expect, atol=1e-05),
"output name: " + out_name + " has diff")
else:
actual = np.array(self.scope.find_var(out_name).get_tensor())
expect = self.outputs[out_name]
self.assertTrue(
np.allclose(
actual, expect, atol=1e-05),
"output name: " + out_name + " has diff")
def check_output(self):
places = [core.CPUPlace()]
if core.is_compile_gpu():
places.append(core.GPUPlace(0))
for place in places:
self.check_output_with_place(place)
def __assert_is_close(self, numeric_grads, analytic_grads, names,
max_relative_error, msg_prefix):
for a, b, name in itertools.izip(numeric_grads, analytic_grads, names):
abs_a = np.abs(a)
abs_a[abs_a < 1e-3] = 1
diff_mat = np.abs(a - b) / abs_a
max_diff = np.max(diff_mat)
def err_msg():
offset = np.argmax(diff_mat > max_relative_error)
return "%s Variable %s max gradient diff %f over limit %f, the first " \
"error element is %d" % (
msg_prefix, name, max_diff, max_relative_error, offset)
self.assertLessEqual(max_diff, max_relative_error, err_msg())
def check_grad(self,
inputs_to_check,
output_name,
no_grad_set=None,
in_place=False,
max_relative_error=0.005):
self.scope = core.Scope()
op_inputs = self.inputs if hasattr(self, "inputs") else dict()
op_attrs = self.attrs if hasattr(self, "attrs") else dict()
self.op = create_op(self.scope, self.op_type, op_inputs, self.outputs,
op_attrs)
if no_grad_set is None:
no_grad_set = set()
numeric_grads = [
get_numeric_gradient(
self.scope,
self.op,
self.inputs,
input_to_check,
output_name,
in_place=in_place) for input_to_check in inputs_to_check
]
grad_names = [
grad_var_name(input_to_check) for input_to_check in inputs_to_check
]
cpu_place = core.CPUPlace()
cpu_analytic_grads = [
get_gradient(self.scope, self.op, self.inputs, self.outputs,
grad_name, cpu_place, no_grad_set)
for grad_name in grad_names
]
self.__assert_is_close(numeric_grads, cpu_analytic_grads, grad_names,
max_relative_error,
"Gradient Check On %s" % str(cpu_place))
if core.is_compile_gpu() and self.op.support_gpu():
gpu_place = core.GPUPlace(0)
gpu_analytic_grads = [
get_gradient(self.scope, self.op, self.inputs, self.outputs,
grad_name, gpu_place, no_grad_set)
for grad_name in grad_names
]
self.__assert_is_close(numeric_grads, gpu_analytic_grads,
grad_names, max_relative_error,
"Gradient Check On %s" % str(gpu_place))
for c_grad, g_grad, name in itertools.izip(
cpu_analytic_grads, gpu_analytic_grads, grad_names):
self.assertTrue(
np.allclose(
c_grad, g_grad, atol=1e-4),
"output name: " + name + " has diff")
import numpy
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator
class OpTestMeta(type):
"""
Operator Test ClassMeta.
It injects a `test_all` method into the user's OperatorTest class so that the
Python unittest module runs that method.
`test_all` reads the values stored in `self` and uses them to
create and run an operator, then checks whether that op is OK.
See `test_add_two_op` for example usage.
"""
def __new__(cls, name, bases, attrs):
obj = super(OpTestMeta, cls).__new__(cls, name, bases, attrs)
def test_all(self):
scope = core.Scope()
kwargs = dict()
places = [core.CPUPlace()]
if core.is_compile_gpu():
places.append(core.GPUPlace(0))
for place in places:
for in_name in Operator.get_op_input_names(self.type):
if hasattr(self, "inputs") and in_name in self.inputs:
kwargs[in_name] = in_name
var = scope.new_var(in_name).get_tensor()
arr = self.inputs[in_name]
var.set_dims(arr.shape)
var.set(arr, place)
else:
kwargs[in_name] = "@EMPTY@"
for out_name in Operator.get_op_output_names(self.type):
if not hasattr(self, "outputs"):
raise ValueError(
"The test op must set self.outputs dict.")
if out_name not in self.outputs:
raise ValueError("The %s is not in self.outputs dict." %
(out_name))
kwargs[out_name] = out_name
scope.new_var(out_name).get_tensor()
for attr_name in Operator.get_op_attr_names(self.type):
if hasattr(self, "attrs") and attr_name in self.attrs:
kwargs[attr_name] = self.attrs[attr_name]
op = Operator(self.type, **kwargs)
if isinstance(place, core.GPUPlace) and not op.support_gpu():
return
op.infer_shape(scope)
ctx = core.DeviceContext.create(place)
op.run(scope, ctx)
for out_name in Operator.get_op_output_names(self.type):
actual = numpy.array(scope.find_var(out_name).get_tensor())
expect = self.outputs[out_name]
self.assertTrue(
numpy.allclose(
actual, expect, atol=1e-05),
"output name: " + out_name + " has diff")
obj.test_all = test_all
return obj
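The metaclass above works by attaching a `test_all` function to the class it builds, so unittest discovers it like any hand-written test method. A stripped-down sketch of that injection pattern, assuming a toy check instead of a Paddle operator; `InjectTestMeta`, `AddCase`, `expected`, and `compute` are hypothetical names:

```python
import unittest


class InjectTestMeta(type):
    """Hypothetical metaclass that adds `test_all` to each class it builds."""

    def __new__(cls, name, bases, attrs):
        obj = super(InjectTestMeta, cls).__new__(cls, name, bases, attrs)

        def test_all(self):
            # Compare the declared expectation with the computed value.
            self.assertEqual(self.expected, self.compute())

        obj.test_all = test_all
        return obj


# Calling the metaclass directly works on both Python 2 and Python 3,
# avoiding the version-specific `__metaclass__` / `metaclass=` syntax.
AddCase = InjectTestMeta(
    "AddCase", (unittest.TestCase,),
    {"expected": 3, "compute": lambda self: 1 + 2})

suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddCase)
result = unittest.TestResult()
suite.run(result)
```

The `OpTest` base class introduced in this change replaces this pattern with ordinary inheritance, so each test calls `check_output()` and `check_grad()` explicitly.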
import unittest import unittest
import numpy as np
from op_test import OpTest
import numpy
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator
from op_test_util import OpTestMeta
class TestAddOp(unittest.TestCase):
__metaclass__ = OpTestMeta
class TestAddOp(OpTest):
def setUp(self): def setUp(self):
self.type = "add" self.op_type = "add"
self.inputs = { self.inputs = {
'X': numpy.random.random((102, 105)).astype("float32"), 'X': np.random.random((102, 105)).astype("float32"),
'Y': numpy.random.random((102, 105)).astype("float32") 'Y': np.random.random((102, 105)).astype("float32")
} }
self.outputs = {'Out': self.inputs['X'] + self.inputs['Y']} self.outputs = {'Out': self.inputs['X'] + self.inputs['Y']}
def test_check_output(self):
self.check_output()
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest
import numpy as np
from op_test import OpTest
class TestConcatOp(OpTest):
def setUp(self):
self.op_type = "concat"
x0 = np.random.random((2, 3, 2, 5)).astype('float32')
x1 = np.random.random((2, 3, 3, 5)).astype('float32')
x2 = np.random.random((2, 3, 4, 5)).astype('float32')
axis = 2
self.inputs = {'X': [('x0', x0), ('x1', x1), ('x2', x2)]}
self.attrs = {'axis': axis}
self.outputs = {'Out': np.concatenate((x0, x1, x2), axis=axis)}
def test_check_output(self):
self.check_output()
if __name__ == '__main__':
unittest.main()
import unittest import unittest
import numpy as np import numpy as np
from gradient_checker import GradientChecker, create_op from op_test import OpTest
from op_test_util import OpTestMeta
class TestCosSimOp(unittest.TestCase): class TestCosSimOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "cos_sim" self.op_type = "cos_sim"
self.inputs = { self.inputs = {
'X': np.random.random((32, 64)).astype("float32"), 'X': np.random.random((10, 5)).astype("float32"),
'Y': np.random.random((32, 64)).astype("float32") 'Y': np.random.random((10, 5)).astype("float32")
} }
expect_x_norm = np.linalg.norm(self.inputs['X'], axis=1) expect_x_norm = np.linalg.norm(self.inputs['X'], axis=1)
expect_y_norm = np.linalg.norm(self.inputs['Y'], axis=1) expect_y_norm = np.linalg.norm(self.inputs['Y'], axis=1)
...@@ -23,38 +20,20 @@ class TestCosSimOp(unittest.TestCase): ...@@ -23,38 +20,20 @@ class TestCosSimOp(unittest.TestCase):
'Out': np.expand_dims(expect_out, 1) 'Out': np.expand_dims(expect_out, 1)
} }
def test_check_output(self):
self.check_output()
class TestCosSimGradOp(GradientChecker): def test_check_grad_normal(self):
def setUp(self): self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.05)
self.op = create_op("cos_sim")
self.inputs = {
'X': np.random.random((10, 5)).astype("float32"),
'Y': np.random.random((10, 5)).astype("float32")
}
def test_cpu_gpu_compare(self):
self.compare_grad(self.op, self.inputs)
def test_normal(self):
self.check_grad(
self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.05)
def test_ignore_x(self): def test_check_grad_ignore_x(self):
self.check_grad( self.check_grad(
self.op, ['Y'], 'Out', max_relative_error=0.05, no_grad_set=set('X'))
self.inputs, ["Y"],
"Out",
max_relative_error=0.05,
no_grad_set={"X"})
def test_ignore_y(self): def test_check_grad_ignore_y(self):
self.check_grad( self.check_grad(
self.op, ['X'], 'Out', max_relative_error=0.05, no_grad_set=set('Y'))
self.inputs, ["X"],
"Out",
max_relative_error=0.05,
no_grad_set={"Y"})
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
import numpy import numpy
from op_test_util import OpTestMeta from op_test import OpTest
from gradient_checker import GradientChecker, create_op
class TestCrossEntropy(unittest.TestCase): class TestCrossEntropy(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "onehot_cross_entropy" self.op_type = "onehot_cross_entropy"
batch_size = 30 batch_size = 30
class_num = 10 class_num = 10
X = numpy.random.random((batch_size, class_num)).astype("float32") X = numpy.random.uniform(0.1, 1.0,
label = 5 * numpy.ones(batch_size).astype("int32") [batch_size, class_num]).astype("float32")
label = (class_num / 2) * numpy.ones(batch_size).astype("int32")
self.inputs = {'X': X, 'label': label} self.inputs = {'X': X, 'label': label}
Y = [] Y = []
for i in range(0, batch_size): for i in range(0, batch_size):
Y.append(-numpy.log(X[i][label[i]])) Y.append(-numpy.log(X[i][label[i]]))
self.outputs = {'Y': numpy.array(Y).astype("float32")} self.outputs = {'Y': numpy.array(Y).astype("float32")}
def test_check_output(self):
self.check_output()
class CrossEntropyGradOpTest(GradientChecker):
def test_check_grad(self): def test_check_grad(self):
op = create_op("onehot_cross_entropy") self.check_grad(['X'], 'Y')
batch_size = 30
class_num = 10
inputs = {
"X": numpy.random.uniform(
0.1, 1.0, [batch_size, class_num]).astype("float32"),
"label": (class_num / 2) * numpy.ones(batch_size).astype("int32")
}
self.check_grad(op, inputs, set("X"), "Y")
if __name__ == "__main__": if __name__ == "__main__":
......
import unittest import unittest
from op_test_util import OpTestMeta import numpy as np
import numpy from op_test import OpTest
class TestFillZerosLikeOp(unittest.TestCase): class TestFillZerosLikeOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "fill_zeros_like" self.op_type = "fill_zeros_like"
self.inputs = {'Src': numpy.random.random((219, 232)).astype("float32")} self.inputs = {'Src': np.random.random((219, 232)).astype("float32")}
self.outputs = {'Dst': numpy.zeros_like(self.inputs['Src'])} self.outputs = {'Dst': np.zeros_like(self.inputs["Src"])}
def test_check_output(self):
self.check_output()
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
from op_test_util import OpTestMeta import numpy as np
from gradient_checker import GradientChecker, create_op from op_test import OpTest
import numpy
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator
class TestGatherOp(unittest.TestCase): class TestGatherOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "gather" self.op_type = "gather"
xnp = numpy.random.random((10, 20)).astype("float32") xnp = np.random.random((10, 20)).astype("float32")
self.inputs = { self.inputs = {'X': xnp, 'Index': np.array([1, 3, 5]).astype("int32")}
'X': xnp, self.outputs = {'Out': self.inputs["X"][self.inputs["Index"]]}
'Index': numpy.array([1, 3, 5]).astype("int32")
}
self.outputs = {'Out': self.inputs['X'][self.inputs['Index']]}
def test_check_output(self):
self.check_output()
class TestGatherGradOp(GradientChecker): def test_check_grad(self):
def test_gather_grad(self): self.check_grad(['X'], 'Out')
op = create_op("gather")
xnp = numpy.random.random((10, 20)).astype("float32")
inputs = {'X': xnp, 'Index': numpy.array([1, 3, 5]).astype("int32")}
self.check_grad(op, inputs, set("X"), "Out")
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -14,11 +14,11 @@ class GaussianRandomTest(unittest.TestCase): ...@@ -14,11 +14,11 @@ class GaussianRandomTest(unittest.TestCase):
def gaussian_random_test(self, place): def gaussian_random_test(self, place):
scope = core.Scope() scope = core.Scope()
scope.new_var("Out").get_tensor() scope.new_var('Out').get_tensor()
op = Operator( op = Operator(
"gaussian_random", "gaussian_random",
Out="Out", Out='Out',
dims=[1000, 784], dims=[1000, 784],
mean=.0, mean=.0,
std=1., std=1.,
...@@ -27,10 +27,10 @@ class GaussianRandomTest(unittest.TestCase): ...@@ -27,10 +27,10 @@ class GaussianRandomTest(unittest.TestCase):
op.infer_shape(scope) op.infer_shape(scope)
context = core.DeviceContext.create(place) context = core.DeviceContext.create(place)
op.run(scope, context) op.run(scope, context)
tensor = numpy.array(scope.find_var("Out").get_tensor()) tensor = numpy.array(scope.find_var('Out').get_tensor())
self.assertAlmostEqual(numpy.mean(tensor), .0, delta=0.1) self.assertAlmostEqual(numpy.mean(tensor), .0, delta=0.1)
self.assertAlmostEqual(numpy.std(tensor), 1., delta=0.1) self.assertAlmostEqual(numpy.std(tensor), 1., delta=0.1)
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
import numpy import numpy as np
from paddle.v2.framework.op import Operator import paddle.v2.framework.core as core
from gradient_checker import GradientChecker from op_test import get_numeric_gradient
from gradient_checker import get_numeric_gradient from op_test import create_op
class GetNumericGradientTest(unittest.TestCase): class GetNumericGradientTest(unittest.TestCase):
def test_add_op(self): def test_add_op(self):
add_op = Operator("add", X="X", Y="Y", Out="Z") x = np.random.random((10, 1)).astype("float32")
x = numpy.random.random((10, 1)).astype("float32") y = np.random.random((10, 1)).astype("float32")
y = numpy.random.random((10, 1)).astype("float32") z = x + y
scope = core.Scope()
arr = get_numeric_gradient(add_op, {"X": x, "Y": y}, "Z", "X") add_op = create_op(scope, "add", {'X': x, 'Y': y}, {'Out': z}, dict())
arr = get_numeric_gradient(scope, add_op, {'X': x, 'Y': y}, 'X', 'Out')
self.assertAlmostEqual(arr.mean(), 1.0, delta=1e-4) self.assertAlmostEqual(arr.mean(), 1.0, delta=1e-4)
def test_softmax_op(self): def test_softmax_op(self):
def stable_softmax(x): def stable_softmax(x):
"""Compute the softmax of vector x in a numerically stable way.""" """Compute the softmax of vector x in a numerically stable way."""
shiftx = x - numpy.max(x) shiftx = x - np.max(x)
exps = numpy.exp(shiftx) exps = np.exp(shiftx)
return exps / numpy.sum(exps) return exps / np.sum(exps)
def label_softmax_grad(Y, dY): def label_softmax_grad(Y, dY):
dX = Y * 0.0 dX = Y * 0.0
for i in range(Y.shape[0]): for i in range(Y.shape[0]):
d = numpy.dot(Y[i, :], dY[i, :]) d = np.dot(Y[i, :], dY[i, :])
dX[i, :] = Y[i, :] * (dY[i, :] - d) dX[i, :] = Y[i, :] * (dY[i, :] - d)
return dX return dX
softmax_op = Operator("softmax", X="X", Y="Y") X = np.random.random((2, 2)).astype("float32")
Y = np.apply_along_axis(stable_softmax, 1, X)
X = numpy.random.random((2, 2)).astype("float32") dY = np.ones(Y.shape)
Y = numpy.apply_along_axis(stable_softmax, 1, X)
dY = numpy.ones(Y.shape)
dX = label_softmax_grad(Y, dY) dX = label_softmax_grad(Y, dY)
arr = get_numeric_gradient(softmax_op, {"X": X}, "Y", "X") scope = core.Scope()
numpy.testing.assert_almost_equal(arr, dX, decimal=1e-2) softmax_op = create_op(scope, "softmax", {"X": X}, {"Y": Y}, dict())
arr = get_numeric_gradient(scope, softmax_op, {"X": X}, "X", "Y")
np.testing.assert_almost_equal(arr, dX, decimal=1e-2)
if __name__ == "__main__": if __name__ == "__main__":
......
import unittest import unittest
import numpy as np import numpy as np
from op_test_util import OpTestMeta from op_test import OpTest
from gradient_checker import GradientChecker, create_op
class TestSigmoidOp(unittest.TestCase): class TestLookupTableOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = 'lookup_table' self.op_type = "lookup_table"
table = np.random.random((17, 31)).astype('float32') table = np.random.random((17, 31)).astype("float32")
ids = np.random.randint(0, 17, 4).astype('int32') ids = np.random.randint(0, 17, 4).astype("int32")
self.inputs = {'W': table, 'Ids': ids} self.inputs = {'W': table, 'Ids': ids}
self.outputs = {'Out': table[ids]} self.outputs = {'Out': table[ids]}
def test_check_output(self):
self.check_output()
class TestSigmoidGradOp(GradientChecker): def test_check_grad(self):
def test_grad(self): self.check_grad(['W'], 'Out', no_grad_set=set(['Ids']))
op = create_op('lookup_table')
table = np.random.random((17, 31)).astype('float32')
ids = np.random.randint(0, 17, 4).astype('int32')
inputs = {'W': table, 'Ids': ids}
# compare gradients
self.compare_grad(op, inputs, set(['Ids']))
# check gradients
self.check_grad(op, inputs, set('W'), 'Out')
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
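Several tests in this change build `no_grad_set` by calling `set()` on a string such as `set('X')`. That produces the intended one-element set only because the name is a single character, since `set()` iterates over the characters of a string. A short illustration using only built-ins:

```python
# set() over a string yields its characters, not the whole name.
single_char = set('X')       # one element: 'X'
multi_char = set('Ids')      # three elements: 'I', 'd', 's'

# Wrapping the name in a list (or using a set literal) keeps it intact.
wrapped = set(['Ids'])
literal = {'Ids'}
```

For variable names longer than one character, the list-wrapped or literal form is the safer spelling.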
import unittest import unittest
from op_test_util import OpTestMeta
from gradient_checker import GradientChecker, create_op
import numpy as np import numpy as np
from op_test import OpTest
class TestMeanOp(unittest.TestCase): class TestMeanOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "mean" self.op_type = "mean"
self.inputs = {'X': np.random.random((32, 784)).astype("float32")} self.inputs = {'X': np.random.random((10, 10)).astype("float32")}
self.outputs = {'Out': np.mean(self.inputs['X'])} self.outputs = {'Out': np.mean(self.inputs["X"])}
def test_check_output(self):
self.check_output()
class MeanGradOpTest(GradientChecker): def test_check_grad(self):
def test_normal(self): self.check_grad(['X'], 'Out')
op = create_op("mean")
inputs = {"X": np.random.random((10, 10)).astype("float32")}
self.check_grad(op, inputs, set("X"), "Out")
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
import numpy as np import numpy as np
from gradient_checker import GradientChecker, create_op from op_test import OpTest
from op_test_util import OpTestMeta
class MinusOpTest(unittest.TestCase): class MinusOpTest(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "minus" self.op_type = "minus"
self.inputs = { self.inputs = {
'X': np.random.random((32, 84)).astype("float32"), 'X': np.random.random((32, 84)).astype("float32"),
'Y': np.random.random((32, 84)).astype("float32") 'Y': np.random.random((32, 84)).astype("float32")
} }
self.outputs = {'Out': (self.inputs['X'] - self.inputs['Y'])} self.outputs = {'Out': (self.inputs['X'] - self.inputs['Y'])}
def test_check_output(self):
self.check_output()
class MinusGradTest(GradientChecker): def test_check_grad(self):
def test_left(self): self.check_grad(['X', 'Y'], 'Out')
op = create_op("minus")
inputs = {
"X": np.random.random((10, 10)).astype("float32"),
"Y": np.random.random((10, 10)).astype("float32")
}
self.check_grad(op, inputs, ["X", 'Y'], "Out")
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
import numpy as np import numpy as np
from gradient_checker import GradientChecker, create_op from op_test import OpTest
from op_test_util import OpTestMeta
class TestMulOp(unittest.TestCase): class TestMulOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "mul" self.op_type = "mul"
self.inputs = { self.inputs = {
'X': np.random.random((32, 84)).astype("float32"), 'X': np.random.random((32, 84)).astype("float32"),
'Y': np.random.random((84, 100)).astype("float32") 'Y': np.random.random((84, 100)).astype("float32")
} }
self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])} self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
def test_check_output(self):
self.check_output()
def test_check_grad_normal(self):
self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
def test_check_grad_ignore_x(self):
self.check_grad(
['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
def test_check_grad_ignore_y(self):
self.check_grad(
['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
class TestMulGradOp(GradientChecker): class TestMulOp2(OpTest):
def setUp(self): def setUp(self):
self.op = create_op("mul") self.op_type = "mul"
self.inputs = { self.inputs = {
'X': np.random.random((32, 84)).astype("float32"), 'X': np.random.random((15, 4, 12, 10)).astype("float32"),
'Y': np.random.random((84, 100)).astype("float32") 'Y': np.random.random((4, 30, 8, 2, 9)).astype("float32")
}
self.attrs = {'x_num_col_dims': 2, 'y_num_col_dims': 2}
self.outputs = {
'Out': np.dot(self.inputs['X'].reshape(15 * 4, 12 * 10),
self.inputs['Y'].reshape(4 * 30, 8 * 2 * 9))
} }
def test_cpu_gpu_compare(self): def test_check_output(self):
self.compare_grad(self.op, self.inputs) self.check_output()
def test_normal(self): def test_check_grad_normal(self):
# mul op will enlarge the relative error self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
self.check_grad(
self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5)
def test_ignore_x(self): def test_check_grad_ignore_x(self):
self.check_grad( self.check_grad(
self.op, ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set('X'))
self.inputs, ["Y"],
"Out",
max_relative_error=0.5,
no_grad_set={"X"})
def test_ignore_y(self): def test_check_grad_ignore_y(self):
self.check_grad( self.check_grad(
self.op, ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
# TODO(dzh,qijun): the mul grad test case needs the transpose feature of the BLAS library
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
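The expected output in `TestMulOp2` above suggests that `x_num_col_dims` and `y_num_col_dims` mark where each input's axes are split into rows and columns before an ordinary matrix product. A minimal NumPy sketch under that assumption follows; `flatten_to_2d` is a hypothetical helper for illustration, not part of the Paddle API.

```python
import numpy as np


def flatten_to_2d(a, num_col_dims):
    """Hypothetical helper: collapse the first `num_col_dims` axes into rows
    and the remaining axes into columns."""
    rows = int(np.prod(a.shape[:num_col_dims]))
    cols = int(np.prod(a.shape[num_col_dims:]))
    return a.reshape(rows, cols)


# Same shapes as TestMulOp2 in the diff.
x = np.random.random((15, 4, 12, 10)).astype("float32")
y = np.random.random((4, 30, 8, 2, 9)).astype("float32")

x2d = flatten_to_2d(x, 2)  # (15 * 4, 12 * 10)
y2d = flatten_to_2d(y, 2)  # (4 * 30, 8 * 2 * 9)
out = np.dot(x2d, y2d)     # inner sizes match: 120 == 120
```

The inner dimensions agree only because 12 * 10 equals 4 * 30; with other split points the product would not be defined.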
...@@ -35,5 +35,5 @@ Op(plain_net), inputs:{all[W, X, Y]}, outputs:{all[Out, fc.out, pre_activation]} ...@@ -35,5 +35,5 @@ Op(plain_net), inputs:{all[W, X, Y]}, outputs:{all[Out, fc.out, pre_activation]}
self.assertEqual(expected, "\n" + str(net)) self.assertEqual(expected, "\n" + str(net))
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
import numpy as np import numpy as np
from op_test_util import OpTestMeta from op_test import OpTest
from gradient_checker import GradientChecker, create_op
class TestRowwiseAddOp(unittest.TestCase): class TestRowwiseAddOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "rowwise_add" self.op_type = "rowwise_add"
self.inputs = { self.inputs = {
'X': np.random.random((32, 84)).astype("float32"), 'X': np.random.uniform(0.1, 1, [5, 10]).astype("float32"),
'b': np.random.random(84).astype("float32") 'b': np.random.uniform(0.1, 1, [10]).astype("float32")
} }
self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])} self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])}
def test_check_output(self):
self.check_output()
def test_check_grad_normal(self):
self.check_grad(['X', 'b'], 'Out')
def test_check_grad_ignore_b(self):
self.check_grad(['X'], 'Out', no_grad_set=set('b'))
def test_check_grad_ignore_x(self):
self.check_grad(['b'], 'Out', no_grad_set=set('X'))
class TestRowwiseAddGradOp(GradientChecker):
class TestRowwiseAddOp2(OpTest):
def setUp(self): def setUp(self):
self.op = create_op("rowwise_add") self.op_type = "rowwise_add"
self.inputs = { self.inputs = {
"X": np.random.uniform(0.1, 1, [5, 10]).astype("float32"), 'X': np.random.uniform(0.1, 1, [2, 3, 2, 5]).astype("float32"),
"b": np.random.uniform(0.1, 1, [10]).astype("float32") 'b': np.random.uniform(0.1, 1, [2, 5]).astype("float32")
} }
self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])}
def test_check_output(self):
self.check_output()
def test_normal(self): def test_check_grad_normal(self):
self.check_grad(self.op, self.inputs, ["X", "b"], "Out") self.check_grad(['X', 'b'], 'Out')
def test_ignore_b(self): def test_check_grad_ignore_b(self):
self.check_grad(self.op, self.inputs, ["X"], "Out", no_grad_set={"b"}) self.check_grad(['X'], 'Out', no_grad_set=set('b'))
def test_ignore_x(self): def test_check_grad_ignore_x(self):
self.check_grad(self.op, self.inputs, ["b"], "Out", no_grad_set={"X"}) self.check_grad(['b'], 'Out', no_grad_set=set('X'))
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
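`TestRowwiseAddOp2` relies on `np.add` broadcasting a `(2, 5)` bias over the trailing axes of a `(2, 3, 2, 5)` input. The sketch below spells that out with an explicit loop and shows the matching gradient for `b` under an all-ones upstream gradient. It only illustrates the NumPy reference computation; how the operator itself computes the result is not shown in this diff.

```python
import numpy as np

x = np.random.uniform(0.1, 1, [2, 3, 2, 5]).astype("float32")
b = np.random.uniform(0.1, 1, [2, 5]).astype("float32")

# Reference output, as in the test: b is broadcast over the two leading axes.
out = np.add(x, b)

# The same computation written as an explicit loop over the leading axes.
expected = np.empty_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        expected[i, j] = x[i, j] + b

# For broadcast addition, the gradient of b is the upstream gradient summed
# over every axis that b was broadcast across (here axes 0 and 1).
d_out = np.ones_like(out)
d_b = d_out.sum(axis=(0, 1))
```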
import unittest import unittest
from op_test_util import OpTestMeta
from gradient_checker import GradientChecker, create_op
import numpy as np import numpy as np
from paddle.v2.framework.op import Operator from op_test import OpTest
class IdentityTest(unittest.TestCase): class IdentityTest(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "identity" self.op_type = "identity"
self.inputs = {'X': np.random.random((32, 784)).astype("float32")} self.inputs = {'X': np.random.random((10, 10)).astype("float32")}
self.outputs = {'Out': self.inputs['X']} self.outputs = {'Out': self.inputs['X']}
def test_check_output(self):
self.check_output()
class IdentityGradOpTest(GradientChecker): def test_check_grad(self):
def test_normal(self): self.check_grad(['X'], 'Out')
op = create_op("identity")
inputs = {"X": np.random.random((10, 10)).astype("float32")}
self.check_grad(op, inputs, set("X"), "Out")
class ScaleTest(unittest.TestCase):
__metaclass__ = OpTestMeta
class ScaleTest(OpTest):
def setUp(self): def setUp(self):
self.type = "scale" self.op_type = "scale"
self.inputs = {'X': np.random.random((32, 784)).astype("float32")} self.inputs = {'X': np.random.random((10, 10)).astype("float32")}
self.attrs = {'scale': -2.3} self.attrs = {'scale': -2.3}
self.outputs = {'Out': self.inputs['X'] * self.attrs['scale']} self.outputs = {'Out': self.inputs['X'] * self.attrs['scale']}
def test_check_output(self):
self.check_output()
class ScaleGradTest(GradientChecker): def test_check_grad(self):
def test_normal(self): self.check_grad(['X'], 'Out')
op = Operator("scale", X="X", Out="Out", scale=3.2)
self.check_grad(op,
{"X": np.random.random((10, 10)).astype("float32")},
set("X"), "Out")
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
from op_test_util import OpTestMeta import numpy as np
from gradient_checker import GradientChecker, create_op from op_test import OpTest
import numpy
import paddle.v2.framework.core as core
from paddle.v2.framework.op import Operator
class TestScatterOp(unittest.TestCase): class TestScatterOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "scatter" self.op_type = "scatter"
ref_np = numpy.ones((3, 3)).astype("float32") ref_np = np.ones((3, 3)).astype("float32")
index_np = numpy.array([1, 2]).astype("int32") index_np = np.array([1, 2]).astype("int32")
updates_np = numpy.random.random((2, 3)).astype("float32") updates_np = np.random.random((2, 3)).astype("float32")
output_np = numpy.copy(ref_np) output_np = np.copy(ref_np)
output_np[index_np] += updates_np output_np[index_np] += updates_np
self.inputs = {'Ref': ref_np, 'Index': index_np, 'Updates': updates_np} self.inputs = {'Ref': ref_np, 'Index': index_np, 'Updates': updates_np}
self.outputs = {'Out': output_np} self.outputs = {'Out': output_np}
def test_check_output(self):
self.check_output()
class TestScatterGradOp(GradientChecker): def test_check_grad(self):
def test_scatter_grad(self): self.check_grad(['Updates', 'Ref'], 'Out', in_place=True)
op = create_op("scatter")
# test data setup
ref_np = numpy.ones((3, 10)).astype("float32")
index_np = numpy.array([1, 2]).astype("int32")
updates_np = numpy.random.random((2, 10)).astype("float32")
output_np = numpy.copy(ref_np)
output_np[index_np] += updates_np
inputs = {'Ref': ref_np, 'Index': index_np, 'Updates': updates_np}
self.check_grad(
op, inputs, set(["Updates", "Ref"]), "Out", in_place=True)
if __name__ == "__main__": if __name__ == "__main__":
......
import unittest import unittest
import numpy import numpy as np
from op_test_util import OpTestMeta from op_test import OpTest
class TestSGD(unittest.TestCase): class TestSGD(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "sgd" self.op_type = "sgd"
w = numpy.random.random((102, 105)).astype("float32") w = np.random.random((102, 105)).astype("float32")
g = numpy.random.random((102, 105)).astype("float32") g = np.random.random((102, 105)).astype("float32")
lr = 0.1 lr = 0.1
self.inputs = {'param': w, 'grad': g} self.inputs = {'param': w, 'grad': g}
self.attrs = {'learning_rate': lr} self.attrs = {'learning_rate': lr}
self.outputs = {'param_out': w - lr * g} self.outputs = {'param_out': w - lr * g}
def test_check_output(self):
self.check_output()
if __name__ == "__main__": if __name__ == "__main__":
unittest.main() unittest.main()
import unittest import unittest
import numpy as np import numpy as np
from op_test_util import OpTestMeta from op_test import OpTest
from gradient_checker import GradientChecker, create_op
class TestSigmoidOp(unittest.TestCase): class TestSigmoid(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "sigmoid" self.op_type = "sigmoid"
self.inputs = {'X': np.random.random((15, 31)).astype("float32")} self.inputs = {
'X': np.random.uniform(0.1, 1, [11, 17]).astype("float32")
}
self.outputs = {'Y': 1 / (1 + np.exp(-self.inputs['X']))} self.outputs = {'Y': 1 / (1 + np.exp(-self.inputs['X']))}
def test_check_output(self):
self.check_output()
class TestSigmoidGradOp(GradientChecker): def test_check_grad(self):
def test_grad(self): self.check_grad(["X"], "Y", max_relative_error=0.007)
op = create_op("sigmoid")
inputs = {"X": np.random.uniform(0.1, 1, [11, 17]).astype("float32")}
# compare gpu and cpu results for backward op.
# this test will be skipped if only the CPU version is compiled.
self.compare_grad(op, inputs)
# check gradients
self.check_grad(op, inputs, set("X"), "Y", max_relative_error=0.007)
if __name__ == '__main__': if __name__ == '__main__':
......
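The `check_grad(["X"], "Y", max_relative_error=0.007)` call above compares an operator's gradient with a numerical estimate. The implementation inside `op_test` is not part of this diff, so the following is only a sketch of the general technique (central differences against the closed-form sigmoid derivative); `numeric_grad` is a hypothetical name.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def numeric_grad(f, x, delta=1e-3):
    """Hypothetical helper: central-difference gradient of sum(f(x))."""
    grad = np.zeros_like(x)
    flat_x = x.reshape(-1)     # view into x (x is C-contiguous)
    flat_g = grad.reshape(-1)  # view into grad
    for i in range(flat_x.size):
        orig = flat_x[i]
        flat_x[i] = orig + delta
        plus = f(x).sum()
        flat_x[i] = orig - delta
        minus = f(x).sum()
        flat_x[i] = orig       # restore the original value
        flat_g[i] = (plus - minus) / (2 * delta)
    return grad


# float64 is used here to keep rounding noise small; the test uses float32.
x = np.random.uniform(0.1, 1, [11, 17])
analytic = sigmoid(x) * (1 - sigmoid(x))
numeric = numeric_grad(sigmoid, x)

max_rel_err = np.max(
    np.abs(analytic - numeric) / np.maximum(np.abs(analytic), 1e-8))
```

A looser threshold, such as the `0.5` used by the `mul` tests, leaves room for float32 rounding in larger reductions.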
import unittest import unittest
import numpy as np import numpy as np
from op_test import OpTest
from gradient_checker import GradientChecker, create_op
from op_test_util import OpTestMeta
def stable_softmax(x): def stable_softmax(x):
...@@ -13,26 +10,21 @@ def stable_softmax(x): ...@@ -13,26 +10,21 @@ def stable_softmax(x):
return exps / np.sum(exps) return exps / np.sum(exps)
class TestSoftmaxOp(unittest.TestCase): class TestSoftmaxOp(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = "softmax" self.op_type = "softmax"
self.inputs = {"X": np.random.random((10, 10)).astype("float32")} self.inputs = {
'X': np.random.uniform(0.1, 1, [10, 10]).astype("float32")
}
self.outputs = { self.outputs = {
"Y": np.apply_along_axis(stable_softmax, 1, self.inputs["X"]) 'Y': np.apply_along_axis(stable_softmax, 1, self.inputs['X'])
} }
def test_check_output(self):
self.check_output()
class TestSoftmaxGradOp(GradientChecker): def test_check_grad(self):
def setUp(self): self.check_grad(['X'], 'Y')
self.op = create_op("softmax")
self.inputs = {
"X": np.random.uniform(0.1, 1, [10, 10]).astype("float32")
}
def test_softmax_grad(self):
self.check_grad(self.op, self.inputs, ["X"], "Y")
if __name__ == "__main__": if __name__ == "__main__":
......
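Most of the body of `stable_softmax` is elided from this diff; only its final `return exps / np.sum(exps)` line is visible. The sketch below shows one common way to make softmax numerically stable, subtracting the maximum before exponentiating. It is an assumption about the general technique, not a reconstruction of the file, hence the different name `stable_softmax_sketch`.

```python
import numpy as np


def stable_softmax_sketch(x):
    """Softmax of a 1-D vector, shifted by its maximum to avoid overflow."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


# Row-wise application, mirroring the np.apply_along_axis call in the test.
x = np.random.uniform(0.1, 1, [10, 10]).astype("float32")
y = np.apply_along_axis(stable_softmax_sketch, 1, x)

# Without the shift, np.exp(1000.0) would overflow to inf.
large = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax_sketch(large)
```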
import unittest import unittest
from op_test_util import OpTestMeta
from gradient_checker import GradientChecker, create_op
import numpy as np import numpy as np
from op_test import OpTest
class TestSquaredL2DistanceOp_f0(unittest.TestCase): class TestSquaredL2DistanceOp_f0(OpTest):
__metaclass__ = OpTestMeta
def setUp(self): def setUp(self):
self.type = 'squared_l2_distance' self.op_type = "squared_l2_distance"
self.inputs = { self.inputs = {
'X': np.random.uniform(0.1, 1., (32, 64)).astype('float32'), 'X': np.random.uniform(0.1, 0.6, (2, 3)).astype("float32"),
'Y': np.random.uniform(0.1, 1., (32, 64)).astype('float32') 'Y': np.random.uniform(0.1, 0.6, (2, 3)).astype("float32")
} }
sub_res = self.inputs['X'] - self.inputs['Y'] sub_res = self.inputs['X'] - self.inputs['Y']
output = sub_res * sub_res output = sub_res * sub_res
...@@ -20,15 +17,19 @@ class TestSquaredL2DistanceOp_f0(unittest.TestCase): ...@@ -20,15 +17,19 @@ class TestSquaredL2DistanceOp_f0(unittest.TestCase):
'Out': np.expand_dims(output.sum(1), 1) 'Out': np.expand_dims(output.sum(1), 1)
} }
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['X', 'Y'], 'Out')
class TestSquaredL2DistanceOp_f1(unittest.TestCase):
__metaclass__ = OpTestMeta
class TestSquaredL2DistanceOp_f1(OpTest):
def setUp(self): def setUp(self):
self.type = 'squared_l2_distance' self.op_type = "squared_l2_distance"
self.inputs = { self.inputs = {
'X': np.random.uniform(0.1, 1., (32, 64)).astype('float32'), 'X': np.random.uniform(0.1, 0.6, (2, 3)).astype("float32"),
'Y': np.random.uniform(0.1, 1., (1, 64)).astype('float32') 'Y': np.random.uniform(0.1, 0.6, (1, 3)).astype("float32")
} }
sub_res = self.inputs['X'] - self.inputs['Y'] sub_res = self.inputs['X'] - self.inputs['Y']
output = sub_res * sub_res output = sub_res * sub_res
...@@ -37,53 +38,34 @@ class TestSquaredL2DistanceOp_f1(unittest.TestCase): ...@@ -37,53 +38,34 @@ class TestSquaredL2DistanceOp_f1(unittest.TestCase):
'Out': np.expand_dims(output.sum(1), 1) 'Out': np.expand_dims(output.sum(1), 1)
} }
def test_check_output(self):
self.check_output()
class TestSquaredL2DistanceOp_f2(unittest.TestCase): def test_check_grad(self):
__metaclass__ = OpTestMeta self.check_grad(['X', 'Y'], 'Out')
class TestSquaredL2DistanceOp_f2(OpTest):
def setUp(self): def setUp(self):
self.type = 'squared_l2_distance' self.op_type = "squared_l2_distance"
self.inputs = { self.inputs = {
'X': np.random.uniform(0.1, 1., (32, 64, 128)).astype('float32'), 'X': np.random.uniform(0.1, 0.6, (2, 3, 4)).astype("float32"),
'Y': np.random.uniform(0.1, 1., (1, 64, 128)).astype('float32') 'Y': np.random.uniform(0.1, 0.6, (1, 3, 4)).astype("float32")
} }
sub_res = self.inputs['X'] - self.inputs['Y'] sub_res = self.inputs['X'] - self.inputs['Y']
sub_res = sub_res.reshape((32, 64 * 128)) sub_res = sub_res.reshape((2, 3 * 4))
output = sub_res * sub_res output = sub_res * sub_res
self.outputs = { self.outputs = {
'sub_result': sub_res, 'sub_result': sub_res,
'Out': np.expand_dims(output.sum(1), 1) 'Out': np.expand_dims(output.sum(1), 1)
} }
def test_check_output(self):
self.check_output()
class TestSquaredL2DistanceGradOp(GradientChecker): def test_check_grad(self):
def test_squared_l2_distance_b0(self): self.check_grad(['X', 'Y'], 'Out')
op = create_op("squared_l2_distance")
inputs = {
'X': np.random.uniform(0.1, .6, (2, 3)).astype('float32'),
'Y': np.random.uniform(0.1, .6, (2, 3)).astype('float32')
}
self.compare_grad(op, inputs)
self.check_grad(op, inputs, set(["X", "Y"]), "Out")
def test_squared_l2_distance_b1(self):
op = create_op("squared_l2_distance")
inputs = {
'X': np.random.uniform(0.1, .6, (2, 3)).astype('float32'),
'Y': np.random.uniform(0.1, .6, (1, 3)).astype('float32')
}
self.compare_grad(op, inputs)
self.check_grad(op, inputs, set(["X", "Y"]), "Out")
def test_squared_l2_distance_b2(self):
op = create_op("squared_l2_distance")
inputs = {
'X': np.random.uniform(0.1, .6, (2, 3, 4)).astype('float32'),
'Y': np.random.uniform(0.1, .6, (1, 3, 4)).astype('float32')
}
self.compare_grad(op, inputs)
self.check_grad(op, inputs, set(["X", "Y"]), "Out")
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
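The three `squared_l2_distance` cases differ only in how `Y` is broadcast against `X`. As a sketch of what `check_grad(['X', 'Y'], 'Out')` has to agree with, the closed-form gradients for the broadcast case can be written directly in NumPy. The scalar loss `out.sum()` is an assumption made for illustration; the diff does not show which loss `OpTest` uses.

```python
import numpy as np

# Shapes from TestSquaredL2DistanceOp_f2; float64 keeps the check precise.
x = np.random.uniform(0.1, 0.6, (2, 3, 4))
y = np.random.uniform(0.1, 0.6, (1, 3, 4))

# Forward pass, as in the test's reference computation.
sub_res = (x - y).reshape((2, 3 * 4))
out = np.expand_dims((sub_res * sub_res).sum(1), 1)  # shape (2, 1)

# Gradients of loss = out.sum():
#   d loss / d x = 2 * (x - y)
#   d loss / d y = -2 * (x - y), summed over the axis y was broadcast along
d_x = (2.0 * sub_res).reshape(x.shape)
d_y = -(2.0 * sub_res).sum(axis=0).reshape(y.shape)
```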
import unittest
import numpy as np
from op_test import OpTest
class TestSumOp(OpTest):
def setUp(self):
self.op_type = "sum"
x0 = np.random.random((3, 4)).astype('float32')
x1 = np.random.random((3, 4)).astype('float32')
x2 = np.random.random((3, 4)).astype('float32')
self.inputs = {"X": [("x0", x0), ("x1", x1), ("x2", x2)]}
y = x0 + x1 + x2
self.outputs = {'Out': y}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['x0'], 'Out')
if __name__ == "__main__":
unittest.main()
...@@ -3,7 +3,7 @@ import unittest ...@@ -3,7 +3,7 @@ import unittest
import numpy import numpy
class TestScope(unittest.TestCase): class TestTensor(unittest.TestCase):
def test_int_tensor(self): def test_int_tensor(self):
scope = core.Scope() scope = core.Scope()
var = scope.new_var("test_tensor") var = scope.new_var("test_tensor")
...@@ -20,8 +20,8 @@ class TestScope(unittest.TestCase): ...@@ -20,8 +20,8 @@ class TestScope(unittest.TestCase):
tensor.set(tensor_array, place) tensor.set(tensor_array, place)
tensor_array_2 = numpy.array(tensor) tensor_array_2 = numpy.array(tensor)
self.assertEqual(1.0, tensor_array_2[3, 9]) self.assertEqual(1, tensor_array_2[3, 9])
self.assertEqual(2.0, tensor_array_2[19, 11]) self.assertEqual(2, tensor_array_2[19, 11])
def test_float_tensor(self): def test_float_tensor(self):
scope = core.Scope() scope = core.Scope()
...@@ -43,6 +43,84 @@ class TestScope(unittest.TestCase): ...@@ -43,6 +43,84 @@ class TestScope(unittest.TestCase):
self.assertAlmostEqual(1.0, tensor_array_2[3, 9]) self.assertAlmostEqual(1.0, tensor_array_2[3, 9])
self.assertAlmostEqual(2.0, tensor_array_2[19, 11]) self.assertAlmostEqual(2.0, tensor_array_2[19, 11])
def test_int_lod_tensor(self):
places = [core.CPUPlace(), core.GPUPlace(0)]
for place in places:
scope = core.Scope()
var = scope.new_var("test_tensor")
var_lod = scope.new_var("test_lod_tensor")
tensor = var.get_tensor()
lod_tensor = var_lod.get_lod_tensor()
tensor.set_dims([4, 4, 6])
tensor.alloc_int(place)
array = numpy.array(tensor)
array[0, 0, 0] = 3
array[3, 3, 5] = 10
tensor.set(array, place)
lod_tensor.set_tensor(tensor)
lod_tensor.set_lod([[0, 2, 4]])
lod_v = numpy.array(lod_tensor.tensor())
self.assertTrue(numpy.alltrue(array == lod_v))
lod = lod_tensor.lod()
self.assertEqual(0, lod[0][0])
self.assertEqual(2, lod[0][1])
self.assertEqual(4, lod[0][2])
def test_float_lod_tensor(self):
places = [core.CPUPlace(), core.GPUPlace(0)]
for place in places:
scope = core.Scope()
var = scope.new_var("test_tensor")
var_lod = scope.new_var("test_lod_tensor")
tensor = var.get_tensor()
lod_tensor = var_lod.get_lod_tensor()
tensor.set_dims([5, 2, 3, 4])
tensor.alloc_float(place)
tensor_array = numpy.array(tensor)
self.assertEqual((5, 2, 3, 4), tensor_array.shape)
tensor_array[0, 0, 0, 0] = 1.0
tensor_array[0, 0, 0, 1] = 2.0
tensor.set(tensor_array, place)
lod_tensor.set_tensor(tensor)
lod_v = numpy.array(lod_tensor.tensor())
self.assertAlmostEqual(1.0, lod_v[0, 0, 0, 0])
self.assertAlmostEqual(2.0, lod_v[0, 0, 0, 1])
self.assertEqual(len(lod_tensor.lod()), 0)
lod_py = [[0, 2, 5], [0, 2, 4, 5]]
lod_tensor.set_lod(lod_py)
lod = lod_tensor.lod()
self.assertListEqual(lod_py, lod)
def test_lod_tensor_init(self):
scope = core.Scope()
var = scope.new_var("test_tensor")
place = core.CPUPlace()
tensor = var.get_tensor()
tensor.set_dims([5, 2, 3, 4])
tensor.alloc_float(place)
tensor_array = numpy.array(tensor)
tensor_array[0, 0, 0, 0] = 1.0
tensor_array[0, 0, 0, 1] = 2.0
tensor.set(tensor_array, place)
lod_py = [[0, 2, 5], [0, 2, 4, 5]]
lod_tensor = core.LoDTensor(lod_py, tensor)
lod_v = numpy.array(lod_tensor.tensor())
self.assertAlmostEqual(1.0, lod_v[0, 0, 0, 0])
self.assertAlmostEqual(2.0, lod_v[0, 0, 0, 1])
self.assertListEqual(lod_py, lod_tensor.lod())
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
import unittest
import numpy as np
from op_test import OpTest
class TestTopkOp(OpTest):
def setUp(self):
self.op_type = "top_k"
k = 1
input = np.random.random((32, 84)).astype("float32")
output = np.ndarray((32, k))
indices = np.ndarray((32, k))
self.inputs = {'X': input}
self.attrs = {'k': k}
for rowid in xrange(32):
row = input[rowid]
output[rowid] = np.sort(row)[-k:]
indices[rowid] = row.argsort()[-k:]
self.outputs = {'Out': output, 'Indices': indices}
class TestTopkOp3d(OpTest):
def setUp(self):
self.op_type = "top_k"
k = 1
input = np.random.random((32, 2, 84)).astype("float32")
input_flat_2d = input.reshape(64, 84)
output = np.ndarray((64, k))
indices = np.ndarray((64, k)).astype("int")
# FIXME: should use 'X': input for a 3d input
self.inputs = {'X': input_flat_2d}
self.attrs = {'k': k}
for rowid in xrange(64):
row = input_flat_2d[rowid]
output[rowid] = np.sort(row)[-k:]
indices[rowid] = row.argsort()[-k:]
self.outputs = {'Out': output, 'Indices': indices}
if __name__ == "__main__":
unittest.main()
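Both top-k tests use `k = 1`, where `np.sort(row)[-k:]` and `row.argsort()[-k:]` are unambiguous. For `k > 1` those slices return values in ascending order. The sketch below is a vectorized reference that returns them largest first; whether the `top_k` operator orders its results that way is an assumption, and `top_k_rows` is a hypothetical helper.

```python
import numpy as np


def top_k_rows(x, k):
    """Hypothetical helper: per-row top-k values and indices, largest first."""
    order = np.argsort(-x, axis=1)[:, :k]
    values = np.take_along_axis(x, order, axis=1)
    return values, order


x = np.array([[1.0, 3.0, 2.0],
              [9.0, 7.0, 8.0]], dtype="float32")
values, indices = top_k_rows(x, 2)
# values  -> [[3., 2.], [9., 8.]]
# indices -> [[1, 2], [0, 2]]
```

With `k = 1` this agrees with the per-row loop in the tests above.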
...@@ -14,11 +14,11 @@ class UniformRandomTest(unittest.TestCase): ...@@ -14,11 +14,11 @@ class UniformRandomTest(unittest.TestCase):
def uniform_random_test(self, place): def uniform_random_test(self, place):
scope = core.Scope() scope = core.Scope()
scope.new_var("X").get_tensor() scope.new_var('X').get_tensor()
op = Operator( op = Operator(
"uniform_random", "uniform_random",
Out="X", Out='X',
dims=[1000, 784], dims=[1000, 784],
min=-5.0, min=-5.0,
max=10.0, max=10.0,
...@@ -27,9 +27,9 @@ class UniformRandomTest(unittest.TestCase): ...@@ -27,9 +27,9 @@ class UniformRandomTest(unittest.TestCase):
op.infer_shape(scope) op.infer_shape(scope)
ctx = core.DeviceContext.create(place) ctx = core.DeviceContext.create(place)
op.run(scope, ctx) op.run(scope, ctx)
tensor = numpy.array(scope.find_var("X").get_tensor()) tensor = numpy.array(scope.find_var('X').get_tensor())
self.assertAlmostEqual(tensor.mean(), 2.5, delta=0.1) self.assertAlmostEqual(tensor.mean(), 2.5, delta=0.1)
if __name__ == '__main__': if __name__ == "__main__":
unittest.main() unittest.main()
...@@ -2,6 +2,7 @@ import numpy ...@@ -2,6 +2,7 @@ import numpy
import collections import collections
import topology import topology
import minibatch import minibatch
import cPickle
__all__ = ['infer', 'Inference'] __all__ = ['infer', 'Inference']
...@@ -25,11 +26,23 @@ class Inference(object): ...@@ -25,11 +26,23 @@ class Inference(object):
:type parameters: paddle.v2.parameters.Parameters :type parameters: paddle.v2.parameters.Parameters
""" """
def __init__(self, output_layer, parameters): def __init__(self, parameters, output_layer=None, fileobj=None):
import py_paddle.swig_paddle as api import py_paddle.swig_paddle as api
topo = topology.Topology(output_layer)
gm = api.GradientMachine.createFromConfigProto( if output_layer is not None:
topo.proto(), api.CREATE_MODE_TESTING, [api.PARAMETER_VALUE]) topo = topology.Topology(output_layer)
gm = api.GradientMachine.createFromConfigProto(
topo.proto(), api.CREATE_MODE_TESTING, [api.PARAMETER_VALUE])
self.__data_types__ = topo.data_type()
elif fileobj is not None:
tmp = cPickle.load(fileobj)
gm = api.GradientMachine.createByConfigProtoStr(
tmp['protobin'], api.CREATE_MODE_TESTING,
[api.PARAMETER_VALUE])
self.__data_types__ = tmp['data_type']
else:
raise ValueError("Either output_layer or fileobj must be set")
for param in gm.getParameters(): for param in gm.getParameters():
val = param.getBuf(api.PARAMETER_VALUE) val = param.getBuf(api.PARAMETER_VALUE)
name = param.getName() name = param.getName()
...@@ -43,7 +56,6 @@ class Inference(object): ...@@ -43,7 +56,6 @@ class Inference(object):
# called here, but it's better to call this function in one place. # called here, but it's better to call this function in one place.
param.setValueUpdated() param.setValueUpdated()
self.__gradient_machine__ = gm self.__gradient_machine__ = gm
self.__data_types__ = topo.data_type()
def iter_infer(self, input, feeding=None): def iter_infer(self, input, feeding=None):
from data_feeder import DataFeeder from data_feeder import DataFeeder
......
...@@ -18,6 +18,7 @@ from paddle.proto.ModelConfig_pb2 import ModelConfig ...@@ -18,6 +18,7 @@ from paddle.proto.ModelConfig_pb2 import ModelConfig
import paddle.trainer_config_helpers as conf_helps import paddle.trainer_config_helpers as conf_helps
import layer as v2_layer import layer as v2_layer
import config_base import config_base
import cPickle
__all__ = ['Topology'] __all__ = ['Topology']
...@@ -100,6 +101,14 @@ class Topology(object): ...@@ -100,6 +101,14 @@ class Topology(object):
return layer return layer
return None return None
def serialize_for_inference(self, stream):
protobin = self.proto().SerializeToString()
data_type = self.data_type()
cPickle.dump({
'protobin': protobin,
'data_type': data_type
}, stream, cPickle.HIGHEST_PROTOCOL)
def __check_layer_type__(layer): def __check_layer_type__(layer):
if not isinstance(layer, config_base.Layer): if not isinstance(layer, config_base.Layer):
......
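`Topology.serialize_for_inference` and the new `fileobj` branch of `Inference.__init__` form a pickle round trip over a dict with the keys `protobin` and `data_type`. The sketch below reproduces only that round trip with the standard library. The payload values are placeholders standing in for `self.proto().SerializeToString()` and `self.data_type()`, whose real contents are not shown in this diff.

```python
import io

try:
    import cPickle as pickle  # Python 2, as used in the diff
except ImportError:
    import pickle             # Python 3 fallback

# Placeholder payload; the real values come from the Topology object.
payload = {
    'protobin': b'\x08\x01',
    'data_type': [('image', 'placeholder-type')],
}

# Writer side: mirrors the cPickle.dump call in serialize_for_inference.
stream = io.BytesIO()
pickle.dump(payload, stream, pickle.HIGHEST_PROTOCOL)

# Reader side: mirrors the cPickle.load(fileobj) call in Inference.__init__.
stream.seek(0)
restored = pickle.load(stream)
```

Because `HIGHEST_PROTOCOL` is a binary protocol, a real file would need to be opened in binary mode (`'wb'` to write, `'rb'` to read), and pickled files should only be loaded from trusted sources.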
This diff is collapsed.