提交 cf2608e3 编写于 作者: D dangqingqing

update to the develop branch.

...@@ -22,7 +22,7 @@ COPY ./paddle/scripts/docker/root/ /root/ ...@@ -22,7 +22,7 @@ COPY ./paddle/scripts/docker/root/ /root/
RUN apt-get update && \ RUN apt-get update && \
apt-get install -y \ apt-get install -y \
git python-pip python-dev openssh-server bison \ git python-pip python-dev openssh-server bison libnccl-dev \
wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \ wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \
curl sed grep graphviz libjpeg-dev zlib1g-dev \ curl sed grep graphviz libjpeg-dev zlib1g-dev \
python-matplotlib gcc-4.8 g++-4.8 \ python-matplotlib gcc-4.8 g++-4.8 \
......
...@@ -189,7 +189,7 @@ OpDesc { ...@@ -189,7 +189,7 @@ OpDesc {
inputs = {0} // the index of x in vars of BlockDesc above inputs = {0} // the index of x in vars of BlockDesc above
outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above
attrs { attrs {
"memories" : {1} // the index of h "states" : {1} // the index of h
"step_net" : <above step net> "step_net" : <above step net>
} }
}; };
......
...@@ -3,17 +3,17 @@ ...@@ -3,17 +3,17 @@
## The Problem Posed ## The Problem Posed
Currently, for each C++ operator class definition, there registers a *gradient operator creator* function, which takes a C++ operator instance and returns the corresponding gradient operator instance. Currently, for each C++ operator class definition, a *gradient operator creator* function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance.
However, we noticed two problems with the current deisgn: However, we noticed two problems with the current design:
1. As we decided to separate the *compilation* and *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and inserts corresponding `OpDesc` messages into the `ProgramDesc` message. 1. As we decided to separate the *compilation* and the *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and inserts corresponding `OpDesc` messages into the `ProgramDesc` message.
1. Some operator's gradient computation requires more than one gradient operators. For example, the gradient of *minus* consists of two operators -- an identity operaotr and a scale operator. So we need to make the registration mechanism to support the mapping from an operator to a set of operators for gradient computation. 1. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of *minus* operator consists of two operators -- an *identity* operator followed by a *scale* operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation.
## The Current Implementation ## The Current Implementation
The C++ class `OpInfos` store in a association map which key is the operator type. The `grad_op_type` indicate associated gradient operator type. Operator can create gradient operator by `OpInfo::creator_` of gradient. The pseudo code is Instances of the C++ class `OpInfo` are stored an associative map whose key is the operator type. The `grad_op_type` indicates the associated gradient operator type. An operator can create the gradient operator by invoking `OpInfo::creator_` of the gradient operator. The pseudo code is as follows
```cpp ```cpp
struct OpInfo { struct OpInfo {
...@@ -31,16 +31,16 @@ OperatorBase* CreateGradientOperator(const OperatorBase& op) { ...@@ -31,16 +31,16 @@ OperatorBase* CreateGradientOperator(const OperatorBase& op) {
## Proposed Solution ## Proposed Solution
The mapping relationship between an operator and its gradient operators is a function. The interface of that function is: The mapping relationship between an operator and its gradient operators is a function. The interface of this function is:
```cpp ```cpp
// (OpDesc) --> vector<OpDesc> // (OpDesc) --> vector<OpDesc>
std::function<std::vector<OpDescBind>(const OpDescBind&)>; std::function<std::vector<OpDescBind>(const OpDescBind&)>;
``` ```
The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for protobuf message `OpDesc` to manipulate `OpDesc` fast. The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for the protobuf message `OpDesc` for rapid manipulation of `OpDesc`.
The `GradOpDescMaker` will be registered in `OpInfo`, to replace `grad_op_type_` field. The `OpInfo` should be The `GradOpDescMaker` will be registered in `OpInfo` and will replace the `grad_op_type_` field. The `OpInfo` should look like
```cpp ```cpp
struct OpInfo { struct OpInfo {
...@@ -49,7 +49,7 @@ struct OpInfo { ...@@ -49,7 +49,7 @@ struct OpInfo {
}; };
``` ```
The `grad_op_maker_ ` is `nullptr` if the operator does not have associated gradient operators. The `grad_op_maker_ ` is a `nullptr` if the operator does not have any associated gradient operators.
We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is
...@@ -74,7 +74,7 @@ func = [] (const OpDescBind& fwd_op) { ...@@ -74,7 +74,7 @@ func = [] (const OpDescBind& fwd_op) {
We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forwarding operator. We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forwarding operator.
We should chagne register macros at the same time. In the current solution, there is no difference between forwarding operators and backward operators. So `REGISTER_OP` just register one operator. If the `REGISTER_OPERATOR ` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro contains `__VA_ARGS__`. We should change register macros at the same time. In the current solution, there is no difference between forwarding operators and backward operators. So `REGISTER_OP` just register one operator. If the `REGISTER_OPERATOR ` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro contains `__VA_ARGS__`.
The user interface should be The user interface should be
......
...@@ -174,7 +174,7 @@ decoder_inputs = paddle.layer.fc( ...@@ -174,7 +174,7 @@ decoder_inputs = paddle.layer.fc(
1. 两者都是对梯度的截断,但截断时机不同,前者在 :code:`optimzier` 更新网络参数时应用;后者在激活函数反向计算时被调用; 1. 两者都是对梯度的截断,但截断时机不同,前者在 :code:`optimzier` 更新网络参数时应用;后者在激活函数反向计算时被调用;
2. 截断对象不同:前者截断可学习参数的梯度,后者截断回传给前层的梯度; 2. 截断对象不同:前者截断可学习参数的梯度,后者截断回传给前层的梯度;
除此之外,还可以通过减小学习或者对数据进行归一化处理来解决这类问题。 除此之外,还可以通过减小学习或者对数据进行归一化处理来解决这类问题。
5. 如何调用 infer 接口输出多个layer的预测结果 5. 如何调用 infer 接口输出多个layer的预测结果
----------------------------------------------- -----------------------------------------------
......
...@@ -28,9 +28,9 @@ add_style_check_target(paddle_capi ${CAPI_SOURCES} ${CAPI_HEADER} ...@@ -28,9 +28,9 @@ add_style_check_target(paddle_capi ${CAPI_SOURCES} ${CAPI_HEADER}
add_dependencies(paddle_capi paddle_proto) add_dependencies(paddle_capi paddle_proto)
# combine all paddle static libraries together, into libpaddle_capi_whole.a # TODO: paddle_capi_whole will be removed.
# user should use PaddleCAPI as -lpaddle_capi_whole if(MOBILE_INFERENCE)
set(PADDLE_CAPI_INFER_LIBS set(PADDLE_CAPI_INFER_LIBS
paddle_utils paddle_utils
paddle_parameter paddle_parameter
paddle_math paddle_math
...@@ -38,13 +38,27 @@ set(PADDLE_CAPI_INFER_LIBS ...@@ -38,13 +38,27 @@ set(PADDLE_CAPI_INFER_LIBS
paddle_function paddle_function
paddle_gserver paddle_gserver
paddle_proto) paddle_proto)
else()
set(PADDLE_CAPI_INFER_LIBS
paddle_utils
paddle_parameter
paddle_math
paddle_cuda
paddle_function
paddle_gserver
paddle_proto
paddle_pserver
paddle_network)
endif()
cc_library(paddle_capi_whole DEPS paddle_capi ${PADDLE_CAPI_INFER_LIBS}) cc_library(paddle_capi_whole DEPS paddle_capi ${PADDLE_CAPI_INFER_LIBS})
# No shared library for iOS # Link the static library for inference
cc_library(paddle_capi_engine DEPS paddle_capi paddle_utils paddle_parameter paddle_math paddle_cuda paddle_proto)
cc_library(paddle_capi_layers DEPS paddle_function paddle_gserver)
# Link the shared library for inference
if(NOT IOS) if(NOT IOS)
set(LINK_FLAGS " -Wl,--retain-symbols-file ${CMAKE_CURRENT_SOURCE_DIR}/export.sym -Wl,--version-script ${CMAKE_CURRENT_SOURCE_DIR}/export.map") set(LINK_FLAGS "-Wl,--version-script ${CMAKE_CURRENT_SOURCE_DIR}/paddle_capi.map")
# TODO: merge mkl into paddle_capi_shared
add_library(paddle_capi_shared SHARED ${CAPI_SOURCES}) add_library(paddle_capi_shared SHARED ${CAPI_SOURCES})
set_target_properties(paddle_capi_shared PROPERTIES LINK_FLAGS "${LINK_FLAGS}") set_target_properties(paddle_capi_shared PROPERTIES LINK_FLAGS "${LINK_FLAGS}")
target_include_directories(paddle_capi_shared PUBLIC ${CMAKE_CURRENT_BINARY_DIR}) target_include_directories(paddle_capi_shared PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
...@@ -53,9 +67,10 @@ endif() ...@@ -53,9 +67,10 @@ endif()
# install library & headers. # install library & headers.
install(FILES ${CAPI_HEADERS} DESTINATION include/paddle) install(FILES ${CAPI_HEADERS} DESTINATION include/paddle)
install(FILES paddle_capi.map DESTINATION include/paddle)
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/config.h DESTINATION include/paddle) install(FILES ${CMAKE_CURRENT_BINARY_DIR}/config.h DESTINATION include/paddle)
if(ANDROID) if(ANDROID)
install(TARGETS paddle_capi_whole paddle_capi_shared install(TARGETS paddle_capi_whole paddle_capi_engine paddle_capi_layers paddle_capi_shared
ARCHIVE DESTINATION lib/${ANDROID_ABI} ARCHIVE DESTINATION lib/${ANDROID_ABI}
LIBRARY DESTINATION lib/${ANDROID_ABI}) LIBRARY DESTINATION lib/${ANDROID_ABI})
execute_process( execute_process(
...@@ -80,7 +95,7 @@ if(ANDROID) ...@@ -80,7 +95,7 @@ if(ANDROID)
)" )"
) )
else(ANDROID) else(ANDROID)
install(TARGETS paddle_capi_whole ARCHIVE DESTINATION lib) install(TARGETS paddle_capi_whole paddle_capi_engine paddle_capi_layers ARCHIVE DESTINATION lib)
if(NOT IOS) if(NOT IOS)
install(TARGETS paddle_capi_shared DESTINATION lib) install(TARGETS paddle_capi_shared DESTINATION lib)
endif() endif()
......
...@@ -19,16 +19,15 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope) ...@@ -19,16 +19,15 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope)
proto_library(framework_proto SRCS framework.proto) proto_library(framework_proto SRCS framework.proto)
cc_library(attribute SRCS attribute.cc DEPS framework_proto) cc_library(attribute SRCS attribute.cc DEPS framework_proto)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute ddim op_info) cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc)
cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc
device_context)
cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute) cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute)
cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker) cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto) cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope proto_desc glog) cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog)
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry) cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute ddim op_info operator)
cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog) cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc)
cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry) cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry)
py_proto_compile(framework_py_proto SRCS framework.proto) py_proto_compile(framework_py_proto SRCS framework.proto)
...@@ -44,7 +43,7 @@ add_custom_command(TARGET framework_py_proto POST_BUILD ...@@ -44,7 +43,7 @@ add_custom_command(TARGET framework_py_proto POST_BUILD
cc_library(backward SRCS backward.cc DEPS net_op) cc_library(backward SRCS backward.cc DEPS net_op)
cc_test(backward_test SRCS backward_test.cc DEPS backward recurrent_op device_context) cc_test(backward_test SRCS backward_test.cc DEPS backward recurrent_op device_context)
cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward) cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto backward glog)
cc_library(prune SRCS prune.cc DEPS framework_proto) cc_library(prune SRCS prune.cc DEPS framework_proto)
cc_test(prune_test SRCS prune_test.cc DEPS op_info prune recurrent_op device_context) cc_test(prune_test SRCS prune_test.cc DEPS op_info prune recurrent_op device_context)
......
...@@ -21,6 +21,7 @@ ...@@ -21,6 +21,7 @@
#include "paddle/framework/block_desc.h" #include "paddle/framework/block_desc.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/dynamic_recurrent_op.h"
#include "paddle/operators/net_op.h" #include "paddle/operators/net_op.h"
#include "paddle/operators/recurrent_op.h" #include "paddle/operators/recurrent_op.h"
...@@ -220,8 +221,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive( ...@@ -220,8 +221,7 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
// process recurrent gradient op as a special operator. // process recurrent gradient op as a special operator.
if (forwardOp.Type() == "recurrent") { if (forwardOp.Type() == "recurrent") {
// NOTE clean up cycle call somewhere (RNN's stepnet constains itself), // NOTE clean up cycle call somewhere (RNN's stepnet constains itself),
// or // or this will result in infinite loop.
// this will result in infinite loop.
const auto& rnnop = const auto& rnnop =
*static_cast<const operators::RecurrentOp*>(&forwardOp); *static_cast<const operators::RecurrentOp*>(&forwardOp);
auto rnn_grad_op = auto rnn_grad_op =
...@@ -231,6 +231,18 @@ static std::unique_ptr<OperatorBase> BackwardRecursive( ...@@ -231,6 +231,18 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
// create stepnet's gradient op // create stepnet's gradient op
rnn_grad_op->set_stepnet( rnn_grad_op->set_stepnet(
BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id)); BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id));
} else if (forwardOp.Type() == "dynamic_recurrent") {
// NOTE clean up cycle call somewhere (RNN's stepnet constains itself),
// or this will result in infinite loop.
const auto& rnnop =
*static_cast<const operators::DynamicRecurrentOp*>(&forwardOp);
auto rnn_grad_op =
static_cast<operators::DynamicRecurrentGradientOp*>(grad_op.get());
const auto& stepnet_op =
*static_cast<const OperatorBase*>(&rnnop.rnn.GetStepUnit());
// create stepnet's gradient op
rnn_grad_op->rnn.SetStepUnit(
BackwardRecursive(stepnet_op, no_grad_names, grad_to_var, uniq_id));
} }
if (net->ops_.empty()) { // Current no aux op is added to network if (net->ops_.empty()) { // Current no aux op is added to network
......
...@@ -41,6 +41,19 @@ bool BlockDescBind::HasVar(const std::string &name) const { ...@@ -41,6 +41,19 @@ bool BlockDescBind::HasVar(const std::string &name) const {
return vars_.find(name) != vars_.end(); return vars_.find(name) != vars_.end();
} }
VarDescBind *BlockDescBind::FindVarRecursive(const std::string &name) const {
auto it = vars_.find(name);
if (it == vars_.end()) {
return Parent() == kNoneBlockIndex ? nullptr
: ParentBlock()->FindVarRecursive(name);
}
return it->second.get();
}
bool BlockDescBind::HasVarRecursive(const std::string &name) const {
return FindVarRecursive(name) != nullptr;
}
std::vector<VarDescBind *> BlockDescBind::AllVars() const { std::vector<VarDescBind *> BlockDescBind::AllVars() const {
std::vector<VarDescBind *> res; std::vector<VarDescBind *> res;
for (const auto &p : vars_) { for (const auto &p : vars_) {
...@@ -97,7 +110,7 @@ void BlockDescBind::Flush() { ...@@ -97,7 +110,7 @@ void BlockDescBind::Flush() {
} }
BlockDescBind *BlockDescBind::ParentBlock() const { BlockDescBind *BlockDescBind::ParentBlock() const {
if (this->desc_->parent_idx() == -1) { if (this->desc_->parent_idx() == kNoneBlockIndex) {
return nullptr; return nullptr;
} }
return prog_->Block(static_cast<size_t>(this->desc_->parent_idx())); return prog_->Block(static_cast<size_t>(this->desc_->parent_idx()));
......
...@@ -21,6 +21,7 @@ limitations under the License. */ ...@@ -21,6 +21,7 @@ limitations under the License. */
#include <vector> #include <vector>
#include "paddle/framework/op_desc.h" #include "paddle/framework/op_desc.h"
#include "paddle/framework/proto_desc.h"
#include "paddle/framework/var_desc.h" #include "paddle/framework/var_desc.h"
#include "paddle/platform/macros.h" #include "paddle/platform/macros.h"
...@@ -56,6 +57,10 @@ class BlockDescBind { ...@@ -56,6 +57,10 @@ class BlockDescBind {
bool HasVar(const std::string &var_name) const; bool HasVar(const std::string &var_name) const;
VarDescBind *FindVarRecursive(const std::string &name_bytes) const;
bool HasVarRecursive(const std::string &var_name) const;
std::set<std::string> LocalVarNames() const { std::set<std::string> LocalVarNames() const {
std::set<std::string> var_names; std::set<std::string> var_names;
for (auto &var : vars_) { for (auto &var : vars_) {
......
...@@ -26,6 +26,8 @@ inline DataType ToDataType(std::type_index type) { ...@@ -26,6 +26,8 @@ inline DataType ToDataType(std::type_index type) {
return DataType::FP64; return DataType::FP64;
} else if (typeid(int).hash_code() == type.hash_code()) { } else if (typeid(int).hash_code() == type.hash_code()) {
return DataType::INT32; return DataType::INT32;
} else if (typeid(int64_t).hash_code() == type.hash_code()) {
return DataType::INT64;
} else { } else {
PADDLE_THROW("Not supported"); PADDLE_THROW("Not supported");
} }
......
...@@ -68,9 +68,13 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) { ...@@ -68,9 +68,13 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
for (auto& var : block.vars()) { for (auto& var : block.vars()) {
if (var.persistable()) { if (var.persistable()) {
scope->Var(var.name()); auto* ptr = scope->Var(var.name());
VLOG(3) << "Create Variable " << var.name()
<< " global, which pointer is " << ptr;
} else { } else {
local_scope.Var(var.name()); auto* ptr = local_scope.Var(var.name());
VLOG(3) << "Create Variable " << var.name()
<< " locally, which pointer is " << ptr;
} }
} }
...@@ -80,8 +84,7 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) { ...@@ -80,8 +84,7 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id) {
op->Run(local_scope, *device); op->Run(local_scope, *device);
} }
// TODO(tonyyang-svail): scope->DeleteScope(&local_scope);
// - Destroy local_scope
} }
} // namespace framework } // namespace framework
......
...@@ -13,37 +13,45 @@ See the License for the specific language governing permissions and ...@@ -13,37 +13,45 @@ See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#pragma once #pragma once
#include "glog/logging.h"
#include "paddle/framework/feed_fetch_type.h"
#include "paddle/framework/scope.h" #include "paddle/framework/scope.h"
#include "paddle/framework/variable.h" #include "paddle/framework/variable.h"
namespace paddle { namespace paddle {
namespace framework { namespace framework {
template <typename T> void SetFeedVariable(Scope* scope, const LoDTensor& input,
void SetFeedVariable(const LoDTensor& input, const std::string& var_name, const std::string& var_name, size_t index) {
size_t index) {
// If var_name Variable is not found in GlobalScope, a new variable will // If var_name Variable is not found in GlobalScope, a new variable will
// be created. // be created.
Variable* g_feed_value = GetGlobalScope().Var(var_name); VLOG(3) << "SetFeedVariable name=" << var_name << " index=" << index;
Variable* g_feed_value = scope->Var(var_name);
auto& feed_inputs = auto& feed_inputs =
*(g_feed_value->GetMutable<std::vector<paddle::framework::LoDTensor>>()); *(g_feed_value->GetMutable<std::vector<paddle::framework::LoDTensor>>());
if (index >= feed_inputs.size()) { if (index >= feed_inputs.size()) {
feed_inputs.resize(index + 1); feed_inputs.resize(index + 1);
} }
// shared data with input tensor // shared data with input tensor
feed_inputs[index].ShareDataWith<T>(input); feed_inputs[index].ShareDataWith(input);
// set lod // set lod
feed_inputs[index].set_lod(input.lod()); feed_inputs[index].set_lod(input.lod());
} }
LoDTensor& GetFetchVariable(const std::string& var_name, size_t index) { LoDTensor& GetFetchVariable(const Scope& scope, const std::string& var_name,
size_t index) {
// Since we want to fetch LodTensor from a variable, the variable must // Since we want to fetch LodTensor from a variable, the variable must
// be created alreadly. // be created alreadly.
Variable* g_fetch_value = GetGlobalScope().FindVar(var_name); Variable* g_fetch_value = scope.FindVar(var_name);
auto& fetch_outputs = PADDLE_ENFORCE(g_fetch_value->IsType<FeedFetchList>(),
*(g_fetch_value->GetMutable<std::vector<paddle::framework::LoDTensor>>()); "Only %s can be invoked by GetFetchVariable",
typeid(FeedFetchList).name());
auto& fetch_outputs = *g_fetch_value->GetMutable<FeedFetchList>();
auto& tensor = fetch_outputs[index];
VLOG(3) << "Fetch " << var_name << " with index " << index
<< " shape= " << tensor.dims();
PADDLE_ENFORCE_LT(index, fetch_outputs.size()); PADDLE_ENFORCE_LT(index, fetch_outputs.size());
return fetch_outputs[index]; return tensor;
} }
} // namespace framework } // namespace framework
......
...@@ -68,6 +68,7 @@ message OpProto { ...@@ -68,6 +68,7 @@ message OpProto {
optional bool duplicable = 3 [ default = false ]; optional bool duplicable = 3 [ default = false ];
optional bool intermediate = 4 [ default = false ]; optional bool intermediate = 4 [ default = false ];
optional bool dispensable = 5 [ default = false ];
} }
// AttrProto describes the C++ type Attribute. // AttrProto describes the C++ type Attribute.
...@@ -112,6 +113,8 @@ message VarDesc { ...@@ -112,6 +113,8 @@ message VarDesc {
enum VarType { enum VarType {
LOD_TENSOR = 1; LOD_TENSOR = 1;
SELECTED_ROWS = 2; SELECTED_ROWS = 2;
FEED_MINIBATCH = 3;
FETCH_LIST = 4;
} }
required string name = 1; required string name = 1;
required VarType type = 2; required VarType type = 2;
......
...@@ -25,31 +25,50 @@ LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end) { ...@@ -25,31 +25,50 @@ LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end) {
for (size_t i = level_begin; i < level_end; i++) { for (size_t i = level_begin; i < level_end; i++) {
new_lod.emplace_back(in.at(i)); new_lod.emplace_back(in.at(i));
} }
// transform the lowest level to absolute offset.
LoD abs_offset_lod = ToAbsOffset(in);
new_lod.back() = abs_offset_lod[level_end - 1];
return new_lod; return new_lod;
} }
LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin, LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
size_t elem_end) { size_t elem_end) {
// slice the lod. PADDLE_ENFORCE_LT(level, in.size());
LoD new_lod; PADDLE_ENFORCE_LT(elem_end, in[level].size());
new_lod.reserve(in.size() - level);
auto start = in.at(level)[elem_begin]; LoD res;
auto end = in.at(level)[elem_end]; res.resize(in.size() - level);
// copy the first level
for (auto it = in.begin() + level; it != in.end(); it++) { res[0].assign(in[level].begin() + elem_begin,
auto it_begin = std::find(it->begin(), it->end(), start); in[level].begin() + elem_end + 1);
auto it_end = std::find(it_begin, it->end(), end); for (size_t lvl = 1; lvl < res.size(); lvl++) {
PADDLE_ENFORCE(it_begin != it->end(), "error in parsing lod info"); const auto& in_level = in[level + lvl];
PADDLE_ENFORCE(it_end != it->end(), "error in parsing lod info"); const auto& above_level = res[lvl - 1];
new_lod.emplace_back(it_begin, it_end + 1); auto& out_level = res[lvl];
// reset offset if tensor is copyed and sliced. out_level.assign(in_level.begin() + above_level.front(),
std::transform(new_lod.back().begin(), new_lod.back().end(), in_level.begin() + above_level.back() + 1);
new_lod.back().begin(),
[start](int v) { return v - start; });
PADDLE_ENFORCE_EQ(new_lod.back().front(), 0, "error in slice LoD");
} }
PADDLE_ENFORCE_LE(new_lod.size(), in.size()); for (size_t lvl = 0; lvl < res.size(); lvl++) {
return new_lod; // to make the first offset equals 0, all the elements minus the first
// element
size_t front = res[lvl].front();
for (auto& ele : res[lvl]) {
ele -= front;
}
}
return res;
}
LoD ToAbsOffset(const LoD& in) {
// the lowest level stores relative offsets
if (in.empty() || in.size() == 1) return in;
LoD result = in;
for (int level = result.size() - 2; level >= 0; level--) {
for (auto& ele : result[level]) {
ele = result[level + 1][ele];
}
}
return result;
} }
bool operator==(const LoD& a, const LoD& b) { bool operator==(const LoD& a, const LoD& b) {
...@@ -75,17 +94,7 @@ bool operator==(const LoD& a, const LoD& b) { ...@@ -75,17 +94,7 @@ bool operator==(const LoD& a, const LoD& b) {
size_t LoDTensor::NumElements(size_t level, size_t idx) const { size_t LoDTensor::NumElements(size_t level, size_t idx) const {
PADDLE_ENFORCE_LT(level, NumLevels()); PADDLE_ENFORCE_LT(level, NumLevels());
PADDLE_ENFORCE_LT(idx, NumElements(level)); PADDLE_ENFORCE_LT(idx, NumElements(level));
// the last level of LoD, just return number of records in Tensor
if (level == NumLevels() - 1) {
return lod_[level][idx + 1] - lod_[level][idx]; return lod_[level][idx + 1] - lod_[level][idx];
}
// high level of LoD, and there is another lower level, return number of
// lower-level elements
auto tmp = SliceInLevel(lod_, level, idx, idx + 1);
PADDLE_ENFORCE_GE(tmp.size(), 2);
// there is a 0 as a placeholder stored in LoD, so the number of elements
// equals lod.size() - 1
return tmp[1].size() - 1;
} }
void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) { void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) {
......
...@@ -39,23 +39,36 @@ using Vector = thrust::host_vector< ...@@ -39,23 +39,36 @@ using Vector = thrust::host_vector<
#endif #endif
/* /*
* 3-level LoD stores * LoD is short for Level of Details.
*
* 0 10 20
* 0 5 10 15 20
* 0 2 5 7 10 12 15 20
* *
* - in a level, each element indicates offset in the underlying Tensor * - in a level, each element indicates relative offset of the lower level
* - the first element should be 0 and that indicates that this sequence start * - the first element should be 0 and that indicates that this sequence start
* from 0 * from 0
* - each sequence's begin and end(no-inclusive) is level[id, id+1] * - each sequence's begin and end(no-inclusive) is level[id, id+1]
*
* For example:
* 3-level LoD stores
*
* 0 2 3
* 0 2 4 7
* 0 2 5 7 10 12 15 20
*/ */
using LoD = std::vector<Vector<size_t>>; using LoD = std::vector<Vector<size_t>>;
/*
* Slice levels from a LoD.
* NOTE the lowest level should always be the absolute offsets of the underlying
* tensor instances. So if higher layers are sliced without the lowest level,
* the lower level of the sliced LoD will be transformed to the absolute offset.
*/
LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end); LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end);
LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin, LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
size_t elem_end); size_t elem_end);
/*
* Transform an LoD from relative offsets to absolute offsets.
*/
LoD ToAbsOffset(const LoD& in);
bool operator==(const LoD& a, const LoD& b); bool operator==(const LoD& a, const LoD& b);
......
...@@ -30,8 +30,8 @@ class LoDTensorTester : public ::testing::Test { ...@@ -30,8 +30,8 @@ class LoDTensorTester : public ::testing::Test {
// 0 5 10 15 20 // 0 5 10 15 20
// 0 2 5 7 10 12 15 20 // 0 2 5 7 10 12 15 20
LoD lod; LoD lod;
lod.push_back(std::vector<size_t>{0, 10, 20}); lod.push_back(std::vector<size_t>{0, 2, 3});
lod.push_back(std::vector<size_t>{0, 5, 10, 15, 20}); lod.push_back(std::vector<size_t>{0, 2, 5, 8});
lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20}); lod.push_back(std::vector<size_t>{0, 2, 5, 7, 10, 12, 15, 17, 20});
ASSERT_EQ(lod.size(), 3UL); ASSERT_EQ(lod.size(), 3UL);
...@@ -52,14 +52,14 @@ TEST_F(LoDTensorTester, NumLevels) { ASSERT_EQ(lod_tensor_.NumLevels(), 3UL); } ...@@ -52,14 +52,14 @@ TEST_F(LoDTensorTester, NumLevels) { ASSERT_EQ(lod_tensor_.NumLevels(), 3UL); }
TEST_F(LoDTensorTester, NumElements) { TEST_F(LoDTensorTester, NumElements) {
ASSERT_EQ(lod_tensor_.NumElements(0), 2UL); ASSERT_EQ(lod_tensor_.NumElements(0), 2UL);
ASSERT_EQ(lod_tensor_.NumElements(1), 4UL); ASSERT_EQ(lod_tensor_.NumElements(1), 3UL);
ASSERT_EQ(lod_tensor_.NumElements(2), 8UL); ASSERT_EQ(lod_tensor_.NumElements(2), 8UL);
} }
TEST_F(LoDTensorTester, NumElements2) { TEST_F(LoDTensorTester, NumElements2) {
ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL); ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL);
ASSERT_EQ(lod_tensor_.NumElements(0, 1), 2UL); ASSERT_EQ(lod_tensor_.NumElements(0, 1), 1UL);
ASSERT_EQ(lod_tensor_.NumElements(1, 1), 2UL); ASSERT_EQ(lod_tensor_.NumElements(1, 1), 3UL);
} }
TEST_F(LoDTensorTester, ShrinkLevels) { TEST_F(LoDTensorTester, ShrinkLevels) {
...@@ -68,17 +68,16 @@ TEST_F(LoDTensorTester, ShrinkLevels) { ...@@ -68,17 +68,16 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
LoDTensor new_lod_tensor = lod_tensor_; LoDTensor new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkLevels(level, level + 1); new_lod_tensor.ShrinkLevels(level, level + 1);
ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
} }
// shrink 2 level // shrink 2 level
for (size_t level = 0; level < 2UL; ++level) { for (size_t level = 0; level < 2UL; ++level) {
LoDTensor new_lod_tensor = lod_tensor_; LoDTensor new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkLevels(level, level + 2); new_lod_tensor.ShrinkLevels(level, level + 2);
// the lowest level's last element should be the tensor's batch_size.
ASSERT_EQ(new_lod_tensor.lod().back().back(),
lod_tensor_.lod().back().back());
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
ASSERT_EQ(new_lod_tensor.NumElements(1),
lod_tensor_.NumElements(level + 1));
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
} }
} }
...@@ -86,19 +85,19 @@ TEST_F(LoDTensorTester, ShrinkLevels) { ...@@ -86,19 +85,19 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
TEST_F(LoDTensorTester, ShrinkInLevel) { TEST_F(LoDTensorTester, ShrinkInLevel) {
size_t level = 0; size_t level = 0;
LoDTensor new_lod_tensor = lod_tensor_; LoDTensor new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkInLevel(level, 0, 2); new_lod_tensor.ShrinkInLevel(level, 0, 1);
EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL); EXPECT_EQ(new_lod_tensor.NumLevels(), 3UL);
EXPECT_EQ(new_lod_tensor.NumElements(0), 2UL); EXPECT_EQ(new_lod_tensor.NumElements(0), 1UL);
EXPECT_EQ(new_lod_tensor.NumElements(1), 4UL); EXPECT_EQ(new_lod_tensor.NumElements(1), 2UL);
EXPECT_EQ(new_lod_tensor.NumElements(2), 8UL); EXPECT_EQ(new_lod_tensor.NumElements(2), 5UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
level = 1; level = 1;
new_lod_tensor = lod_tensor_; new_lod_tensor = lod_tensor_;
new_lod_tensor.ShrinkInLevel(level, 0, 2); new_lod_tensor.ShrinkInLevel(level, 1, 2);
ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL); ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
ASSERT_EQ(new_lod_tensor.NumElements(0), 2UL); ASSERT_EQ(new_lod_tensor.NumElements(0), 1UL);
ASSERT_EQ(new_lod_tensor.NumElements(1), 4UL); ASSERT_EQ(new_lod_tensor.NumElements(1), 3UL);
ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>()); ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
} }
......
...@@ -44,6 +44,11 @@ class OpProtoAndCheckerMaker { ...@@ -44,6 +44,11 @@ class OpProtoAndCheckerMaker {
var_->set_intermediate(true); var_->set_intermediate(true);
return *this; return *this;
} }
VariableBuilder& AsDispensable() {
var_->set_dispensable(true);
return *this;
}
}; };
VariableBuilder AddInput(const std::string& name, const std::string& comment); VariableBuilder AddInput(const std::string& name, const std::string& comment);
......
...@@ -252,5 +252,20 @@ std::ostream& operator<<(std::ostream& os, ...@@ -252,5 +252,20 @@ std::ostream& operator<<(std::ostream& os,
return os; return os;
} }
bool OpSupportGPU(const std::string& op_type) {
auto& all_kernels = OperatorWithKernel::AllOpKernels();
auto it = all_kernels.find(op_type);
if (it == all_kernels.end()) {
// All control operator must support GPU
return true;
}
for (auto& kern_pair : it->second) {
if (platform::is_gpu_place(kern_pair.first.place_)) {
return true;
}
}
return false;
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -327,37 +327,47 @@ class CompileTimeInferShapeContext : public InferShapeContext { ...@@ -327,37 +327,47 @@ class CompileTimeInferShapeContext : public InferShapeContext {
bool HasInput(const std::string& name) const override { bool HasInput(const std::string& name) const override {
const std::vector<std::string>& input_names = op_.Input(name); const std::vector<std::string>& input_names = op_.Input(name);
auto length = input_names.size(); auto length = input_names.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, PADDLE_ENFORCE_EQ(length, 1UL,
"Input(%s) should have only one value, " "Input(%s) should have only one value, "
"but it have %d now", "but it have %d now",
name, length); name, length);
return block_.HasVar(input_names[0]); return block_.HasVarRecursive(input_names[0]);
} }
bool HasOutput(const std::string& name) const override { bool HasOutput(const std::string& name) const override {
const std::vector<std::string>& output_names = op_.Output(name); const std::vector<std::string>& output_names = op_.Output(name);
auto length = output_names.size(); auto length = output_names.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, PADDLE_ENFORCE_EQ(length, 1UL,
"Output(%s) should have only one value, " "Output(%s) should have only one value, "
"but it have %d now", "but it have %d now",
name, length); name, length);
return block_.HasVar(output_names[0]); return block_.HasVarRecursive(output_names[0]);
} }
bool HasInputs(const std::string& name) const override { bool HasInputs(const std::string& name) const override {
const std::vector<std::string>& input_names = op_.Input(name); const std::vector<std::string>& input_names = op_.Input(name);
PADDLE_ENFORCE(!input_names.empty(), "Inputs(%s) length is 0", name); if (input_names.empty()) {
return false;
}
for (auto& input : input_names) { for (auto& input : input_names) {
if (!block_.HasVar(input)) return false; if (!block_.HasVarRecursive(input)) return false;
} }
return true; return true;
} }
bool HasOutputs(const std::string& name) const override { bool HasOutputs(const std::string& name) const override {
const std::vector<std::string>& output_names = op_.Output(name); const std::vector<std::string>& output_names = op_.Output(name);
PADDLE_ENFORCE(!output_names.empty(), "Inputs(%s) length is 0", name); if (output_names.empty()) {
return false;
}
for (auto& output : output_names) { for (auto& output : output_names) {
if (!block_.HasVar(output)) return false; if (!block_.HasVarRecursive(output)) return false;
} }
return true; return true;
} }
...@@ -404,11 +414,11 @@ class CompileTimeInferShapeContext : public InferShapeContext { ...@@ -404,11 +414,11 @@ class CompileTimeInferShapeContext : public InferShapeContext {
private: private:
DDim GetDim(const std::string& name) const override { DDim GetDim(const std::string& name) const override {
return framework::make_ddim(block_.FindVar(name)->Shape()); return framework::make_ddim(block_.FindVarRecursive(name)->Shape());
} }
void SetDim(const std::string& name, const DDim& dim) override { void SetDim(const std::string& name, const DDim& dim) override {
block_.FindVar(name)->SetShape(framework::vectorize(dim)); block_.FindVarRecursive(name)->SetShape(framework::vectorize(dim));
} }
const OpDescBind& op_; const OpDescBind& op_;
...@@ -421,13 +431,27 @@ class RuntimeInferShapeContext : public InferShapeContext { ...@@ -421,13 +431,27 @@ class RuntimeInferShapeContext : public InferShapeContext {
: op_(op), scope_(scope) {} : op_(op), scope_(scope) {}
bool HasInput(const std::string& name) const override { bool HasInput(const std::string& name) const override {
auto ipt = op_.Input(name); auto& ins = Inputs(name);
size_t length = ins.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, "Input %s should have more than one inputs",
name);
auto ipt = ins[0];
auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt); auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
return var != nullptr; return var != nullptr;
} }
bool HasOutput(const std::string& name) const override { bool HasOutput(const std::string& name) const override {
auto ipt = op_.Output(name); auto& outs = Outputs(name);
size_t length = outs.size();
if (length == 0) {
return false;
}
PADDLE_ENFORCE_EQ(length, 1UL, "Output %s should have more than one inputs",
name);
auto ipt = outs[0];
auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt); auto* var = ipt == kEmptyVarName ? nullptr : scope_.FindVar(ipt);
return var != nullptr; return var != nullptr;
} }
...@@ -649,5 +673,7 @@ class OperatorWithKernel : public OperatorBase { ...@@ -649,5 +673,7 @@ class OperatorWithKernel : public OperatorBase {
std::ostream& operator<<(std::ostream& os, std::ostream& operator<<(std::ostream& os,
const OperatorWithKernel::OpKernelKey& kernel_key); const OperatorWithKernel::OpKernelKey& kernel_key);
extern bool OpSupportGPU(const std::string& op_type);
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -35,8 +35,8 @@ ProgramDesc *ProgramDescBind::Proto() { ...@@ -35,8 +35,8 @@ ProgramDesc *ProgramDescBind::Proto() {
ProgramDescBind::ProgramDescBind() { ProgramDescBind::ProgramDescBind() {
auto *block = prog_.mutable_blocks()->Add(); auto *block = prog_.mutable_blocks()->Add();
block->set_idx(0); block->set_idx(kRootBlockIndex);
block->set_parent_idx(-1); block->set_parent_idx(kNoneBlockIndex);
blocks_.emplace_back(new BlockDescBind(this, block)); blocks_.emplace_back(new BlockDescBind(this, block));
} }
......
...@@ -17,6 +17,7 @@ limitations under the License. */ ...@@ -17,6 +17,7 @@ limitations under the License. */
#include <memory> #include <memory>
#include <vector> #include <vector>
#include "paddle/framework/framework.pb.h" #include "paddle/framework/framework.pb.h"
#include "paddle/framework/proto_desc.h"
#include "paddle/platform/macros.h" #include "paddle/platform/macros.h"
namespace paddle { namespace paddle {
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
namespace paddle {
namespace framework {
// The Index of first Block in Program. also called root block.
constexpr int kRootBlockIndex = 0;
// The Parent Index of root Block, this block does not exist.
constexpr int kNoneBlockIndex = -1;
} // namespace framework
} // namespace paddle
...@@ -65,12 +65,11 @@ void Scope::DropKids() { ...@@ -65,12 +65,11 @@ void Scope::DropKids() {
kids_.clear(); kids_.clear();
} }
framework::Scope& GetGlobalScope() { void Scope::DeleteScope(Scope* scope) {
static framework::Scope* g_scope = nullptr; auto it = std::find(this->kids_.begin(), this->kids_.end(), scope);
if (g_scope == nullptr) { PADDLE_ENFORCE(it != this->kids_.end(), "Cannot find %p as kid scope", scope);
g_scope = new framework::Scope(); this->kids_.erase(it);
} delete scope;
return *g_scope;
} }
} // namespace framework } // namespace framework
......
...@@ -59,6 +59,8 @@ class Scope { ...@@ -59,6 +59,8 @@ class Scope {
/// Find the scope or an ancestor scope that contains the given variable. /// Find the scope or an ancestor scope that contains the given variable.
const Scope* FindScope(const Variable* var) const; const Scope* FindScope(const Variable* var) const;
void DeleteScope(Scope* scope);
/// Drop all kids scopes belonged to this scope. /// Drop all kids scopes belonged to this scope.
void DropKids(); void DropKids();
...@@ -72,8 +74,5 @@ class Scope { ...@@ -72,8 +74,5 @@ class Scope {
DISABLE_COPY_AND_ASSIGN(Scope); DISABLE_COPY_AND_ASSIGN(Scope);
}; };
framework::Scope& GetGlobalScope();
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -60,6 +60,10 @@ class Tensor { ...@@ -60,6 +60,10 @@ class Tensor {
template <typename T> template <typename T>
inline T* mutable_data(platform::Place place); inline T* mutable_data(platform::Place place);
inline void* mutable_data(platform::Place place, std::type_index type);
inline void* mutable_data(platform::Place place);
/** /**
* @brief Return a pointer to mutable memory block. * @brief Return a pointer to mutable memory block.
* *
...@@ -81,7 +85,6 @@ class Tensor { ...@@ -81,7 +85,6 @@ class Tensor {
inline Tensor& Resize(const DDim& dims); inline Tensor& Resize(const DDim& dims);
/*! The internal of two tensors share the same memory block. */ /*! The internal of two tensors share the same memory block. */
template <typename T>
inline Tensor& ShareDataWith(const Tensor& src); inline Tensor& ShareDataWith(const Tensor& src);
/** /**
...@@ -96,26 +99,9 @@ class Tensor { ...@@ -96,26 +99,9 @@ class Tensor {
// TODO(qijun): https://github.com/PaddlePaddle/Paddle/issues/4647 // TODO(qijun): https://github.com/PaddlePaddle/Paddle/issues/4647
// Remove `CopyFrom` and `CopyFromVector` from Tensor interface // Remove `CopyFrom` and `CopyFromVector` from Tensor interface
// and make them global functions // and make them global functions
template <typename T>
inline void CopyFrom(const Tensor& src, const platform::Place& dst_place, inline void CopyFrom(const Tensor& src, const platform::Place& dst_place,
const platform::DeviceContext& ctx); const platform::DeviceContext& ctx);
// FIXME(yuyang18): CopyFrom should without template T, use the replace
// `CopyFrom` with `CopyFromTensor`
inline void CopyFromTensor(const Tensor& src,
const platform::Place& dst_place,
const platform::DeviceContext& ctx) {
// NOLINTNEXTLINES_8 cpplint.py will recognize below lines as functions.
// That is a bug of cpplint.py. Just ignore lint these lines.
if (src.type() == std::type_index(typeid(double))) {
CopyFrom<double>(src, dst_place, ctx);
} else if (src.type() == std::type_index(typeid(float))) {
CopyFrom<float>(src, dst_place, ctx);
} else if (src.type() == std::type_index(typeid(int))) {
CopyFrom<int>(src, dst_place, ctx);
}
}
/** /**
* @brief Copy the content of an external vector to a tensor. * @brief Copy the content of an external vector to a tensor.
* *
...@@ -135,7 +121,6 @@ class Tensor { ...@@ -135,7 +121,6 @@ class Tensor {
* @param[in] begin_idx The begin index of the slice. * @param[in] begin_idx The begin index of the slice.
* @param[in] end_idx The end index of the slice. * @param[in] end_idx The end index of the slice.
*/ */
template <typename T>
inline Tensor Slice(const int& begin_idx, const int& end_idx) const; inline Tensor Slice(const int& begin_idx, const int& end_idx) const;
platform::Place place() const { platform::Place place() const {
...@@ -146,7 +131,6 @@ class Tensor { ...@@ -146,7 +131,6 @@ class Tensor {
std::type_index type() const { return holder_->type(); } std::type_index type() const { return holder_->type(); }
private: private:
template <typename T>
inline void check_memory_size() const; inline void check_memory_size() const;
private: private:
...@@ -155,20 +139,22 @@ class Tensor { ...@@ -155,20 +139,22 @@ class Tensor {
* parameter of Variable. * parameter of Variable.
*/ */
struct Placeholder { struct Placeholder {
virtual ~Placeholder() {} virtual ~Placeholder() = default;
virtual void* ptr() const = 0; virtual void* ptr() const = 0;
virtual size_t size() const = 0; virtual size_t size() const = 0;
virtual std::type_index type() const = 0; virtual std::type_index type() const = 0;
virtual platform::Place place() const = 0; virtual platform::Place place() const = 0;
virtual void set_type(std::type_index type) = 0;
}; };
template <typename T, typename Place> template <typename Place>
struct PlaceholderImpl : public Placeholder { struct PlaceholderImpl : public Placeholder {
PlaceholderImpl(Place place, size_t size) PlaceholderImpl(Place place, size_t size, std::type_index type)
: ptr_(static_cast<T*>(memory::Alloc(place, size)), : ptr_(static_cast<uint8_t*>(memory::Alloc(place, size)),
memory::PODDeleter<T, Place>(place)), memory::PODDeleter<uint8_t, Place>(place)),
place_(place), place_(place),
size_(size) { size_(size),
type_(type) {
PADDLE_ENFORCE_NOT_NULL(ptr_, "Insufficient %s memory to allocation.", PADDLE_ENFORCE_NOT_NULL(ptr_, "Insufficient %s memory to allocation.",
(is_cpu_place(place_) ? "CPU" : "GPU")); (is_cpu_place(place_) ? "CPU" : "GPU"));
} }
...@@ -176,16 +162,20 @@ class Tensor { ...@@ -176,16 +162,20 @@ class Tensor {
virtual size_t size() const { return size_; } virtual size_t size() const { return size_; }
virtual platform::Place place() const { return place_; } virtual platform::Place place() const { return place_; }
virtual void* ptr() const { return static_cast<void*>(ptr_.get()); } virtual void* ptr() const { return static_cast<void*>(ptr_.get()); }
virtual std::type_index type() const { return std::type_index(typeid(T)); } virtual std::type_index type() const { return type_; }
virtual void set_type(std::type_index type) { type_ = type; }
/*! the pointer of memory block. */ /*! the pointer of memory block. */
std::unique_ptr<T, memory::PODDeleter<T, Place>> ptr_; std::unique_ptr<uint8_t, memory::PODDeleter<uint8_t, Place>> ptr_;
/*! the place of memory block. */ /*! the place of memory block. */
platform::Place place_; platform::Place place_;
/*! the size of memory block. */ /*! the size of memory block. */
size_t size_; size_t size_;
/* the current type of memory */
std::type_index type_;
}; };
/*! holds the memory block if allocated. */ /*! holds the memory block if allocated. */
......
...@@ -106,7 +106,7 @@ void TensorArray::Write(size_t index, const LoDTensor& value) { ...@@ -106,7 +106,7 @@ void TensorArray::Write(size_t index, const LoDTensor& value) {
values_[index].Resize(value.dims()); values_[index].Resize(value.dims());
values_[index].mutable_data<value_type>(platform::CPUPlace()); values_[index].mutable_data<value_type>(platform::CPUPlace());
values_[index].CopyFrom<value_type>(value, platform::CPUPlace(), values_[index].CopyFrom(value, platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
...@@ -116,7 +116,7 @@ void TensorArray::WriteShared(size_t index, const LoDTensor& value) { ...@@ -116,7 +116,7 @@ void TensorArray::WriteShared(size_t index, const LoDTensor& value) {
values_.resize(index + 1); values_.resize(index + 1);
} }
values_[index].ShareDataWith<value_type>(value); values_[index].ShareDataWith(value);
} }
LoDTensor TensorArray::Pack(size_t level, const std::vector<DySeqMeta>& meta, LoDTensor TensorArray::Pack(size_t level, const std::vector<DySeqMeta>& meta,
...@@ -163,8 +163,8 @@ LoDTensor TensorArray::Stack() const { ...@@ -163,8 +163,8 @@ LoDTensor TensorArray::Stack() const {
result.mutable_data<value_type>(platform::CPUPlace()); result.mutable_data<value_type>(platform::CPUPlace());
for (size_t idx = 0; idx < size(); idx++) { for (size_t idx = 0; idx < size(); idx++) {
result.Slice<value_type>(idx, idx + 1) result.Slice(idx, idx + 1)
.CopyFrom<value_type>(Read(idx), platform::CPUPlace(), .CopyFrom(Read(idx), platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
return result; return result;
...@@ -191,12 +191,11 @@ void TensorArray::Unstack(const LoDTensor& source, bool data_shared) const { ...@@ -191,12 +191,11 @@ void TensorArray::Unstack(const LoDTensor& source, bool data_shared) const {
auto& value = values_[elem]; auto& value = values_[elem];
if (data_shared) { if (data_shared) {
// share memory // share memory
value.ShareDataWith<value_type>(source.Slice<value_type>(elem, elem + 1)); value.ShareDataWith(source.Slice(elem, elem + 1));
} else { } else {
// copy // copy
value.Resize(value_dims); value.Resize(value_dims);
value.CopyFrom<value_type>(source.Slice<value_type>(elem, elem + 1), value.CopyFrom(source.Slice(elem, elem + 1), platform::CPUPlace(),
platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
} }
...@@ -242,11 +241,10 @@ LoDTensor DynamicBatchUnpacker::GetBatch(size_t index) { ...@@ -242,11 +241,10 @@ LoDTensor DynamicBatchUnpacker::GetBatch(size_t index) {
for (size_t i = 0; i < indice.size(); i++) { for (size_t i = 0; i < indice.size(); i++) {
auto index = indice[i]; auto index = indice[i];
auto target = result.Slice<value_type>(i, i + 1); auto target = result.Slice(i, i + 1);
auto slice = source->Slice<value_type>(index, index + 1); auto slice = source->Slice(index, index + 1);
target.CopyFrom<value_type>(slice, platform::CPUPlace(), target.CopyFrom(slice, platform::CPUPlace(), platform::CPUDeviceContext());
platform::CPUDeviceContext());
} }
return result; return result;
...@@ -277,9 +275,9 @@ LoDTensor PackDynamicBatch(const std::vector<LoDTensor>& source, ...@@ -277,9 +275,9 @@ LoDTensor PackDynamicBatch(const std::vector<LoDTensor>& source,
// target is result[index] // target is result[index]
auto index = seq_meta.begin + batch_id; auto index = seq_meta.begin + batch_id;
if (index >= seq_meta.end) break; if (index >= seq_meta.end) break;
auto source_ = source[batch_id].Slice<float>(seq_id, seq_id + 1); auto source_ = source[batch_id].Slice(seq_id, seq_id + 1);
auto target = result.Slice<float>(index, index + 1); auto target = result.Slice(index, index + 1);
target.CopyFrom<float>(source_, platform::CPUPlace(), target.CopyFrom(source_, platform::CPUPlace(),
platform::CPUDeviceContext()); platform::CPUDeviceContext());
} }
} }
......
...@@ -91,7 +91,7 @@ class TensorArrayPackTester : public ::testing::Test { ...@@ -91,7 +91,7 @@ class TensorArrayPackTester : public ::testing::Test {
size_t begin = level[i]; size_t begin = level[i];
size_t end = level[i + 1]; size_t end = level[i + 1];
for (size_t j = begin; j < end; j++) { for (size_t j = begin; j < end; j++) {
auto record = source.Slice<int>(j, j + 1); auto record = source.Slice(j, j + 1);
for (int dim = 0; dim < 128; dim++) { for (int dim = 0; dim < 128; dim++) {
record.mutable_data<int>(platform::CPUPlace())[dim] = j - begin; record.mutable_data<int>(platform::CPUPlace())[dim] = j - begin;
} }
......
...@@ -19,12 +19,50 @@ limitations under the License. */ ...@@ -19,12 +19,50 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace framework { namespace framework {
template <typename... T>
struct SizeOfTypeFunctor;
template <typename T> template <typename T>
struct SizeOfTypeFunctor<T> {
size_t operator()(std::type_index type) const {
if (typeid(T).hash_code() == type.hash_code()) {
return sizeof(T);
} else {
return 0UL;
}
}
};
template <>
struct SizeOfTypeFunctor<> {
size_t operator()(std::type_index type) const { return 0UL; }
};
template <typename HEAD, typename... TAIL>
struct SizeOfTypeFunctor<HEAD, TAIL...> {
size_t operator()(std::type_index type) const {
SizeOfTypeFunctor<HEAD> head;
size_t head_size = head(type);
if (head_size != 0) {
return head_size;
}
SizeOfTypeFunctor<TAIL...> tail;
return tail(type);
}
};
static inline size_t SizeOfType(std::type_index type) {
SizeOfTypeFunctor<int, float, double, int16_t, int64_t> functor;
size_t size = functor(type);
PADDLE_ENFORCE(size != 0UL, "Cannot get size of type %s", type.name());
return size;
}
inline void Tensor::check_memory_size() const { inline void Tensor::check_memory_size() const {
PADDLE_ENFORCE_NOT_NULL( PADDLE_ENFORCE_NOT_NULL(
holder_, "Tensor holds no memory. Call Tensor::mutable_data first."); holder_, "Tensor holds no memory. Call Tensor::mutable_data first.");
PADDLE_ENFORCE_GE( PADDLE_ENFORCE_GE(
holder_->size(), numel() * sizeof(T) + offset_, holder_->size(), numel() * SizeOfType(type()) + offset_,
"Tensor's dims_ is out of bound. Call Tensor::mutable_data " "Tensor's dims_ is out of bound. Call Tensor::mutable_data "
"first to re-allocate memory.\n" "first to re-allocate memory.\n"
"or maybe the required data-type mismatches the data already stored."); "or maybe the required data-type mismatches the data already stored.");
...@@ -32,14 +70,23 @@ inline void Tensor::check_memory_size() const { ...@@ -32,14 +70,23 @@ inline void Tensor::check_memory_size() const {
template <typename T> template <typename T>
inline const T* Tensor::data() const { inline const T* Tensor::data() const {
check_memory_size<T>(); check_memory_size();
PADDLE_ENFORCE(std::is_same<T, void>::value ||
holder_->type().hash_code() == typeid(T).hash_code(),
"Tensor holds the wrong type, it holds %s",
this->holder_->type().name());
return reinterpret_cast<const T*>( return reinterpret_cast<const T*>(
reinterpret_cast<uintptr_t>(holder_->ptr()) + offset_); reinterpret_cast<uintptr_t>(holder_->ptr()) + offset_);
} }
template <typename T> template <typename T>
inline T* Tensor::data() { inline T* Tensor::data() {
check_memory_size<T>(); check_memory_size();
PADDLE_ENFORCE(std::is_same<T, void>::value ||
holder_->type().hash_code() == typeid(T).hash_code(),
"Tensor holds the wrong type, it holds %s",
this->holder_->type().name());
return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(holder_->ptr()) + return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(holder_->ptr()) +
offset_); offset_);
} }
...@@ -54,51 +101,62 @@ inline T* Tensor::mutable_data(DDim dims, platform::Place place) { ...@@ -54,51 +101,62 @@ inline T* Tensor::mutable_data(DDim dims, platform::Place place) {
template <typename T> template <typename T>
inline T* Tensor::mutable_data(platform::Place place) { inline T* Tensor::mutable_data(platform::Place place) {
static_assert(std::is_pod<T>::value, "T must be POD"); static_assert(std::is_pod<T>::value, "T must be POD");
return reinterpret_cast<T*>(mutable_data(place, typeid(T)));
}
inline void* Tensor::mutable_data(platform::Place place, std::type_index type) {
if (holder_ != nullptr) {
holder_->set_type(type);
}
PADDLE_ENFORCE_GT(numel(), 0, PADDLE_ENFORCE_GT(numel(), 0,
"Tensor's numel must be larger than zero to call " "Tensor's numel must be larger than zero to call "
"Tensor::mutable_data. Call Tensor::set_dim first."); "Tensor::mutable_data. Call Tensor::set_dim first.");
int64_t size = numel() * SizeOfType(type);
/* some versions of boost::variant don't have operator!= */ /* some versions of boost::variant don't have operator!= */
int64_t size = numel() * sizeof(T);
if (holder_ == nullptr || !(holder_->place() == place) || if (holder_ == nullptr || !(holder_->place() == place) ||
holder_->size() < size + offset_) { holder_->size() < size + offset_) {
if (platform::is_cpu_place(place)) { if (platform::is_cpu_place(place)) {
holder_.reset(new PlaceholderImpl<T, platform::CPUPlace>( holder_.reset(new PlaceholderImpl<platform::CPUPlace>(
boost::get<platform::CPUPlace>(place), size)); boost::get<platform::CPUPlace>(place), size, type));
} else if (platform::is_gpu_place(place)) { } else if (platform::is_gpu_place(place)) {
#ifndef PADDLE_WITH_CUDA #ifndef PADDLE_WITH_CUDA
PADDLE_THROW("'GPUPlace' is not supported in CPU only device."); PADDLE_THROW("'GPUPlace' is not supported in CPU only device.");
} }
#else #else
holder_.reset(new PlaceholderImpl<T, platform::GPUPlace>( holder_.reset(new PlaceholderImpl<platform::GPUPlace>(
boost::get<platform::GPUPlace>(place), size)); boost::get<platform::GPUPlace>(place), size, type));
} }
#endif #endif
offset_ = 0; offset_ = 0;
} }
return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(holder_->ptr()) + return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(holder_->ptr()) +
offset_); offset_);
} }
template <typename T> inline void* Tensor::mutable_data(platform::Place place) {
PADDLE_ENFORCE(this->holder_ != nullptr,
"Cannot invoke mutable data if current hold nothing");
return mutable_data(place, holder_->type());
}
inline Tensor& Tensor::ShareDataWith(const Tensor& src) { inline Tensor& Tensor::ShareDataWith(const Tensor& src) {
src.check_memory_size<T>(); src.check_memory_size();
*this = src; *this = src;
return *this; return *this;
} }
template <typename T>
inline void Tensor::CopyFrom(const Tensor& src, inline void Tensor::CopyFrom(const Tensor& src,
const platform::Place& dst_place, const platform::Place& dst_place,
const platform::DeviceContext& ctx) { const platform::DeviceContext& ctx) {
src.check_memory_size<T>(); src.check_memory_size();
Resize(src.dims()); Resize(src.dims());
auto src_place = src.holder_->place(); auto src_place = src.holder_->place();
auto src_ptr = static_cast<const void*>(src.data<T>()); auto src_ptr = src.data<void>();
auto dst_ptr = static_cast<void*>(mutable_data<T>(dst_place)); auto dst_ptr = mutable_data(dst_place, src.type());
auto size = src.numel() * sizeof(T); auto size = src.numel() * SizeOfType(src.type());
if (platform::is_cpu_place(src_place) && platform::is_cpu_place(dst_place)) { if (platform::is_cpu_place(src_place) && platform::is_cpu_place(dst_place)) {
memory::Copy(boost::get<platform::CPUPlace>(dst_place), dst_ptr, memory::Copy(boost::get<platform::CPUPlace>(dst_place), dst_ptr,
...@@ -165,9 +223,8 @@ inline void Tensor::CopyFromVector(const std::vector<T>& src, ...@@ -165,9 +223,8 @@ inline void Tensor::CopyFromVector(const std::vector<T>& src,
#endif #endif
} }
template <typename T>
inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const {
check_memory_size<T>(); check_memory_size();
PADDLE_ENFORCE_GE(begin_idx, 0, "Slice begin index is less than zero."); PADDLE_ENFORCE_GE(begin_idx, 0, "Slice begin index is less than zero.");
PADDLE_ENFORCE_LE(end_idx, dims_[0], "Slice end index is out of bound."); PADDLE_ENFORCE_LE(end_idx, dims_[0], "Slice end index is out of bound.");
PADDLE_ENFORCE_LT(begin_idx, end_idx, PADDLE_ENFORCE_LT(begin_idx, end_idx,
...@@ -182,7 +239,7 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const { ...@@ -182,7 +239,7 @@ inline Tensor Tensor::Slice(const int& begin_idx, const int& end_idx) const {
DDim dst_dims = dims_; DDim dst_dims = dims_;
dst_dims[0] = end_idx - begin_idx; dst_dims[0] = end_idx - begin_idx;
dst.Resize(dst_dims); dst.Resize(dst_dims);
dst.offset_ = offset_ + begin_idx * base * sizeof(T); dst.offset_ = offset_ + begin_idx * base * SizeOfType(type());
return dst; return dst;
} }
} }
...@@ -196,10 +253,9 @@ inline const DDim& Tensor::dims() const { return dims_; } ...@@ -196,10 +253,9 @@ inline const DDim& Tensor::dims() const { return dims_; }
inline int64_t Tensor::numel() const { return product(dims_); } inline int64_t Tensor::numel() const { return product(dims_); }
template <typename T>
inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) { inline Tensor ReshapeToMatrix(const Tensor& src, int num_col_dims) {
Tensor res; Tensor res;
res.ShareDataWith<T>(src); res.ShareDataWith(src);
res.Resize(flatten_to_2d(src.dims(), num_col_dims)); res.Resize(flatten_to_2d(src.dims(), num_col_dims));
return res; return res;
} }
......
...@@ -108,7 +108,7 @@ TEST(Tensor, ShareDataWith) { ...@@ -108,7 +108,7 @@ TEST(Tensor, ShareDataWith) {
// Try to share data form uninitialized tensor // Try to share data form uninitialized tensor
bool caught = false; bool caught = false;
try { try {
dst_tensor.ShareDataWith<float>(src_tensor); dst_tensor.ShareDataWith(src_tensor);
} catch (paddle::platform::EnforceNotMet err) { } catch (paddle::platform::EnforceNotMet err) {
caught = true; caught = true;
std::string msg = std::string msg =
...@@ -122,7 +122,7 @@ TEST(Tensor, ShareDataWith) { ...@@ -122,7 +122,7 @@ TEST(Tensor, ShareDataWith) {
ASSERT_TRUE(caught); ASSERT_TRUE(caught);
src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), CPUPlace()); src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), CPUPlace());
dst_tensor.ShareDataWith<int>(src_tensor); dst_tensor.ShareDataWith(src_tensor);
ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>()); ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>());
} }
...@@ -131,7 +131,7 @@ TEST(Tensor, ShareDataWith) { ...@@ -131,7 +131,7 @@ TEST(Tensor, ShareDataWith) {
Tensor src_tensor; Tensor src_tensor;
Tensor dst_tensor; Tensor dst_tensor;
src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), GPUPlace()); src_tensor.mutable_data<int>(make_ddim({2, 3, 4}), GPUPlace());
dst_tensor.ShareDataWith<int>(src_tensor); dst_tensor.ShareDataWith(src_tensor);
ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>()); ASSERT_EQ(src_tensor.data<int>(), dst_tensor.data<int>());
} }
#endif #endif
...@@ -143,7 +143,7 @@ TEST(Tensor, Slice) { ...@@ -143,7 +143,7 @@ TEST(Tensor, Slice) {
{ {
Tensor src_tensor; Tensor src_tensor;
src_tensor.mutable_data<int>(make_ddim({5, 3, 4}), CPUPlace()); src_tensor.mutable_data<int>(make_ddim({5, 3, 4}), CPUPlace());
Tensor slice_tensor = src_tensor.Slice<int>(1, 3); Tensor slice_tensor = src_tensor.Slice(1, 3);
DDim slice_dims = slice_tensor.dims(); DDim slice_dims = slice_tensor.dims();
ASSERT_EQ(arity(slice_dims), 3); ASSERT_EQ(arity(slice_dims), 3);
EXPECT_EQ(slice_dims[0], 2); EXPECT_EQ(slice_dims[0], 2);
...@@ -167,7 +167,7 @@ TEST(Tensor, Slice) { ...@@ -167,7 +167,7 @@ TEST(Tensor, Slice) {
{ {
Tensor src_tensor; Tensor src_tensor;
src_tensor.mutable_data<double>(make_ddim({6, 9}), GPUPlace()); src_tensor.mutable_data<double>(make_ddim({6, 9}), GPUPlace());
Tensor slice_tensor = src_tensor.Slice<double>(2, 6); Tensor slice_tensor = src_tensor.Slice(2, 6);
DDim slice_dims = slice_tensor.dims(); DDim slice_dims = slice_tensor.dims();
ASSERT_EQ(arity(slice_dims), 2); ASSERT_EQ(arity(slice_dims), 2);
EXPECT_EQ(slice_dims[0], 4); EXPECT_EQ(slice_dims[0], 4);
...@@ -202,7 +202,7 @@ TEST(Tensor, CopyFrom) { ...@@ -202,7 +202,7 @@ TEST(Tensor, CopyFrom) {
memcpy(src_ptr, arr, 9 * sizeof(int)); memcpy(src_ptr, arr, 9 * sizeof(int));
auto cpu_place = new paddle::platform::CPUPlace(); auto cpu_place = new paddle::platform::CPUPlace();
dst_tensor.CopyFrom<int>(src_tensor, *cpu_place, cpu_ctx); dst_tensor.CopyFrom(src_tensor, *cpu_place, cpu_ctx);
const int* dst_ptr = dst_tensor.data<int>(); const int* dst_ptr = dst_tensor.data<int>();
ASSERT_NE(src_ptr, dst_ptr); ASSERT_NE(src_ptr, dst_ptr);
...@@ -210,8 +210,8 @@ TEST(Tensor, CopyFrom) { ...@@ -210,8 +210,8 @@ TEST(Tensor, CopyFrom) {
EXPECT_EQ(src_ptr[i], dst_ptr[i]); EXPECT_EQ(src_ptr[i], dst_ptr[i]);
} }
Tensor slice_tensor = src_tensor.Slice<int>(1, 2); Tensor slice_tensor = src_tensor.Slice(1, 2);
dst_tensor.CopyFrom<int>(slice_tensor, *cpu_place, cpu_ctx); dst_tensor.CopyFrom(slice_tensor, *cpu_place, cpu_ctx);
const int* slice_ptr = slice_tensor.data<int>(); const int* slice_ptr = slice_tensor.data<int>();
dst_ptr = dst_tensor.data<int>(); dst_ptr = dst_tensor.data<int>();
ASSERT_NE(dst_ptr, slice_ptr); ASSERT_NE(dst_ptr, slice_ptr);
...@@ -233,11 +233,11 @@ TEST(Tensor, CopyFrom) { ...@@ -233,11 +233,11 @@ TEST(Tensor, CopyFrom) {
// CPU Tensor to GPU Tensor // CPU Tensor to GPU Tensor
auto gpu_place = new paddle::platform::GPUPlace(0); auto gpu_place = new paddle::platform::GPUPlace(0);
CUDADeviceContext gpu_ctx(*gpu_place); CUDADeviceContext gpu_ctx(*gpu_place);
gpu_tensor.CopyFrom<int>(src_tensor, *gpu_place, gpu_ctx); gpu_tensor.CopyFrom(src_tensor, *gpu_place, gpu_ctx);
// GPU Tensor to CPU Tensor // GPU Tensor to CPU Tensor
auto cpu_place = new paddle::platform::CPUPlace(); auto cpu_place = new paddle::platform::CPUPlace();
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Tensors // Sync before Compare Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -247,13 +247,13 @@ TEST(Tensor, CopyFrom) { ...@@ -247,13 +247,13 @@ TEST(Tensor, CopyFrom) {
EXPECT_EQ(src_ptr[i], dst_ptr[i]); EXPECT_EQ(src_ptr[i], dst_ptr[i]);
} }
Tensor slice_tensor = src_tensor.Slice<int>(1, 2); Tensor slice_tensor = src_tensor.Slice(1, 2);
// CPU Slice Tensor to GPU Tensor // CPU Slice Tensor to GPU Tensor
gpu_tensor.CopyFrom<int>(slice_tensor, *gpu_place, gpu_ctx); gpu_tensor.CopyFrom(slice_tensor, *gpu_place, gpu_ctx);
// GPU Tensor to CPU Tensor // GPU Tensor to CPU Tensor
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Slice Tensors // Sync before Compare Slice Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -320,7 +320,7 @@ TEST(Tensor, CopyFromVector) { ...@@ -320,7 +320,7 @@ TEST(Tensor, CopyFromVector) {
CUDADeviceContext gpu_ctx(*gpu_place); CUDADeviceContext gpu_ctx(*gpu_place);
gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx); gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx);
// Copy from GPU to CPU tensor for comparison // Copy from GPU to CPU tensor for comparison
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Tensors // Sync before Compare Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -340,7 +340,7 @@ TEST(Tensor, CopyFromVector) { ...@@ -340,7 +340,7 @@ TEST(Tensor, CopyFromVector) {
cpu_tensor.CopyFromVector<int>(src_vec, cpu_ctx); cpu_tensor.CopyFromVector<int>(src_vec, cpu_ctx);
gpu_tensor.Resize(make_ddim({2, 2})); gpu_tensor.Resize(make_ddim({2, 2}));
gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx); gpu_tensor.CopyFromVector<int>(src_vec, gpu_ctx);
dst_tensor.CopyFrom<int>(gpu_tensor, *cpu_place, gpu_ctx); dst_tensor.CopyFrom(gpu_tensor, *cpu_place, gpu_ctx);
// Sync before Compare Tensors // Sync before Compare Tensors
gpu_ctx.Wait(); gpu_ctx.Wait();
...@@ -368,7 +368,7 @@ TEST(Tensor, ReshapeToMatrix) { ...@@ -368,7 +368,7 @@ TEST(Tensor, ReshapeToMatrix) {
for (int i = 0; i < 2 * 3 * 4 * 9; ++i) { for (int i = 0; i < 2 * 3 * 4 * 9; ++i) {
src_ptr[i] = i; src_ptr[i] = i;
} }
Tensor res = ReshapeToMatrix<int>(src, 2); Tensor res = ReshapeToMatrix(src, 2);
ASSERT_EQ(res.dims()[0], 2 * 3); ASSERT_EQ(res.dims()[0], 2 * 3);
ASSERT_EQ(res.dims()[1], 4 * 9); ASSERT_EQ(res.dims()[1], 4 * 9);
} }
...@@ -25,7 +25,10 @@ class Variable { ...@@ -25,7 +25,10 @@ class Variable {
public: public:
template <typename T> template <typename T>
const T& Get() const { const T& Get() const {
PADDLE_ENFORCE(IsType<T>(), "Variable must be type %s", typeid(T).name()); PADDLE_ENFORCE(holder_ != nullptr, "Variable must hold some thing");
PADDLE_ENFORCE(IsType<T>(),
"Variable must be type %s, the holding type is %s",
typeid(T).name(), holder_->Type().name());
return *static_cast<const T*>(holder_->Ptr()); return *static_cast<const T*>(holder_->Ptr());
} }
......
...@@ -126,7 +126,7 @@ void MKLDNNEltwiseActivation::resetFwd(Argument& act) { ...@@ -126,7 +126,7 @@ void MKLDNNEltwiseActivation::resetFwd(Argument& act) {
copyInVal_ = nullptr; copyInVal_ = nullptr;
if (act.grad && algo == algorithm::eltwise_tanh) { if (act.grad && algo == algorithm::eltwise_tanh) {
// tanh need save src input for backward // tanh need save src input for backward
inVal_ = MKLDNNMatrix::create(nullptr, val_->getPrimitiveDesc()); inVal_ = MKLDNNMatrix::create(val_->getPrimitiveDesc());
copyInVal_ = std::make_shared<mkldnn::reorder>(*val_, *inVal_); copyInVal_ = std::make_shared<mkldnn::reorder>(*val_, *inVal_);
CHECK(copyInVal_) << "should not be emptry"; CHECK(copyInVal_) << "should not be emptry";
pipelineFwd_.push_back(*copyInVal_); pipelineFwd_.push_back(*copyInVal_);
...@@ -145,7 +145,7 @@ void MKLDNNEltwiseActivation::resetBwd(Argument& act) { ...@@ -145,7 +145,7 @@ void MKLDNNEltwiseActivation::resetBwd(Argument& act) {
algorithm algo = getAlgo(this->getName()); algorithm algo = getAlgo(this->getName());
float alpha = getBwdAlpha(); float alpha = getBwdAlpha();
float beta = getBeta(); float beta = getBeta();
grad_ = MKLDNNMatrix::create(act.grad, val_->getPrimitiveDesc()); grad_ = MKLDNNMatrix::create(val_->getPrimitiveDesc(), act.grad);
auto eng = CPUEngine::Instance().getEngine(); auto eng = CPUEngine::Instance().getEngine();
auto bwdDesc = eltwise_bwd::desc( auto bwdDesc = eltwise_bwd::desc(
algo, grad_->getMemoryDesc(), val_->getMemoryDesc(), alpha, beta); algo, grad_->getMemoryDesc(), val_->getMemoryDesc(), alpha, beta);
...@@ -230,7 +230,7 @@ void MKLDNNActivation::resetFwd(Argument& act) { ...@@ -230,7 +230,7 @@ void MKLDNNActivation::resetFwd(Argument& act) {
int ic = cnt_ / bs / ih / iw; int ic = cnt_ / bs / ih / iw;
CHECK_EQ(cnt_, (size_t)bs * ic * ih * iw); CHECK_EQ(cnt_, (size_t)bs * ic * ih * iw);
val_ = MKLDNNMatrix::create( val_ = MKLDNNMatrix::create(
act.value, {bs, ic, ih, iw}, mkldnn::memory::format::nchw, *engine_); {bs, ic, ih, iw}, mkldnn::memory::format::nchw, *engine_, act.value);
CHECK(val_); CHECK(val_);
val_->downSpatial(); val_->downSpatial();
} }
......
...@@ -21,8 +21,8 @@ namespace paddle { ...@@ -21,8 +21,8 @@ namespace paddle {
typedef enum { typedef enum {
MKLDNN_BASE = 1, // basical info of MKLDNN MKLDNN_BASE = 1, // basical info of MKLDNN
MKLDNN_TESTS = 1, // gtest info of MKLDNN MKLDNN_TESTS = 1, // gtest info of MKLDNN
MKLDNN_SIZES = 2, // size info of MKLDNN MKLDNN_FMTS = 2, // format info of MKLDNN
MKLDNN_FMTS = 3, // format info of MKLDNN MKLDNN_SIZES = 3, // size info of MKLDNN
MKLDNN_ALL = 4, // show all info of MKLDNN MKLDNN_ALL = 4, // show all info of MKLDNN
} MKLDNN_LOG_LEVEL; } MKLDNN_LOG_LEVEL;
......
...@@ -116,8 +116,6 @@ void MKLDNNConvLayer::resetFwd(std::vector<primitive>& pipeline, ...@@ -116,8 +116,6 @@ void MKLDNNConvLayer::resetFwd(std::vector<primitive>& pipeline,
resetFwdBuffers(fwdPD_, in, wgt, bias, out); resetFwdBuffers(fwdPD_, in, wgt, bias, out);
resetFwdPipeline(pipeline, fwdPD_, in, wgt, bias, out); resetFwdPipeline(pipeline, fwdPD_, in, wgt, bias, out);
printValueFormatFlow();
} }
void MKLDNNConvLayer::resetBwd(std::vector<primitive>& pipeline, void MKLDNNConvLayer::resetBwd(std::vector<primitive>& pipeline,
...@@ -135,12 +133,6 @@ void MKLDNNConvLayer::resetBwd(std::vector<primitive>& pipeline, ...@@ -135,12 +133,6 @@ void MKLDNNConvLayer::resetBwd(std::vector<primitive>& pipeline,
resetBwdBuffers(bwdWgtPD, bwdDataPD, in, wgt, bias, out); resetBwdBuffers(bwdWgtPD, bwdDataPD, in, wgt, bias, out);
resetBwdPipeline(pipeline, bwdWgtPD, bwdDataPD, in, wgt, bias, out); resetBwdPipeline(pipeline, bwdWgtPD, bwdDataPD, in, wgt, bias, out);
printGradFormatFlow();
}
void MKLDNNConvLayer::updateInputData() {
cpuInVal_->setData(getInputValue(0, CPU_DEVICE)->getData());
} }
void MKLDNNConvLayer::updateWeights(const UpdateCallback& callback) { void MKLDNNConvLayer::updateWeights(const UpdateCallback& callback) {
...@@ -211,11 +203,18 @@ void MKLDNNConvLayer::resetFwdBuffers( ...@@ -211,11 +203,18 @@ void MKLDNNConvLayer::resetFwdBuffers(
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
CHECK(pd); CHECK(pd);
resetInValue(pd, in); resetInValue(
in, std::make_shared<memory::primitive_desc>(pd->src_primitive_desc()));
resetOutValue(out, pd->dst_primitive_desc());
resetWgtBiasValue(pd, wgt, bias); resetWithMatrix(wgt, weight_->getW(), pd->weights_primitive_desc());
resetOutValue(pd, out); if (biases_ && biases_->getW()) {
resetWithMatrix(bias, biases_->getW(), pd->bias_primitive_desc());
} else {
bias = nullptr;
}
} }
void MKLDNNConvLayer::resetFwdPipeline( void MKLDNNConvLayer::resetFwdPipeline(
...@@ -225,104 +224,12 @@ void MKLDNNConvLayer::resetFwdPipeline( ...@@ -225,104 +224,12 @@ void MKLDNNConvLayer::resetFwdPipeline(
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
if (cvtInVal_) {
pipeline.push_back(*cvtInVal_);
}
if (bias) { if (bias) {
fwd_.reset(new conv_fwd(*pd, *in, *wgt, *bias, *out)); fwd_.reset(new conv_fwd(*pd, *in, *wgt, *bias, *out));
} else { } else {
fwd_.reset(new conv_fwd(*pd, *in, *wgt, *out)); fwd_.reset(new conv_fwd(*pd, *in, *wgt, *out));
} }
pipeline.push_back(*fwd_); pipeline.push_back(*fwd_);
if (cvtOutVal_) {
pipeline.push_back(*cvtOutVal_);
}
}
void MKLDNNConvLayer::resetInValue(
std::shared_ptr<conv_fwd::primitive_desc>& pd, MKLDNNMatrixPtr& in) {
const MatrixPtr& inMat = inputLayers_[0]->getOutputValue();
in = MKLDNNMatrix::create(inMat, pd->src_primitive_desc());
// create buffer and reorder if input value do not match
cpuInVal_ = nullptr;
cvtInVal_ = nullptr;
MKLDNNMatrixPtr dnnIn = std::dynamic_pointer_cast<MKLDNNMatrix>(inMat);
CHECK_EQ(inputIsOnlyMKLDNN(), dnnIn != nullptr);
if (dnnIn != nullptr && dnnIn->getPrimitiveDesc() == in->getPrimitiveDesc()) {
in = dnnIn;
return;
}
if (dnnIn) {
if (dnnIn->getFormat() == format::nc) {
CHECK(ih_ == 1 && iw_ == 1) << "when input is nc format";
// create a new one with nchw format and same data
memory::dims inDims = memory::dims{bs_, ic_, 1, 1};
dnnIn = MKLDNNMatrix::create(inMat, inDims, format::nchw, engine_);
}
if (dnnIn->getPrimitiveDesc() == in->getPrimitiveDesc()) {
in = dnnIn;
return;
}
cpuInVal_ = dnnIn;
in = MKLDNNMatrix::create(nullptr, pd->src_primitive_desc());
cvtInVal_ = MKLDNNMatrix::createReorder(cpuInVal_, in);
CHECK(cvtInVal_) << "should not be emptry";
} else {
memory::dims inDims = memory::dims{bs_, ic_, ih_, iw_};
cpuInVal_ = MKLDNNMatrix::create(inMat, inDims, format::nchw, engine_);
if (cpuInVal_->getPrimitiveDesc() != in->getPrimitiveDesc()) {
// create new mkldnn matrix
in = MKLDNNMatrix::create(nullptr, pd->src_primitive_desc());
cvtInVal_ = MKLDNNMatrix::createReorder(cpuInVal_, in);
CHECK(cvtInVal_) << "should not be emptry";
} else {
in = cpuInVal_;
}
}
}
void MKLDNNConvLayer::resetWgtBiasValue(
std::shared_ptr<conv_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias) {
wgt = MKLDNNMatrix::create(weight_->getW(), pd->weights_primitive_desc());
VLOG(MKLDNN_FMTS) << "Weight value format: " << wgt->getFormat();
bias = (biases_ && biases_->getW())
? MKLDNNMatrix::create(biases_->getW(), pd->bias_primitive_desc())
: nullptr;
}
void MKLDNNConvLayer::resetOutValue(
std::shared_ptr<conv_fwd::primitive_desc>& pd, MKLDNNMatrixPtr& out) {
out = MKLDNNMatrix::create(output_.value, pd->dst_primitive_desc());
// create reorder if output value has cpu device and pd do not match
cpuOutVal_ = nullptr;
cvtOutVal_ = nullptr;
if (!outputIsOnlyMKLDNN()) {
const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).value;
memory::dims outDims = memory::dims{bs_, oc_, oh_, ow_};
cpuOutVal_ = MKLDNNMatrix::create(cpuOut, outDims, format::nchw, engine_);
if (cpuOutVal_->getPrimitiveDesc() != pd->dst_primitive_desc()) {
out = MKLDNNMatrix::create(nullptr, pd->dst_primitive_desc());
cvtOutVal_ = MKLDNNMatrix::createReorder(out, cpuOutVal_);
CHECK(cvtOutVal_) << "should not be empty";
} else {
cpuOut->setData(output_.value->getData());
cpuOutVal_ = out;
}
// when output is cpu device, change the mkldnn output value and make them
// share the same data. Then if next layer use inputlayer->getOuputValue()
// to achieve the input value, it will get the right data.
output_.value = std::dynamic_pointer_cast<Matrix>(cpuOutVal_);
return;
}
output_.value = std::dynamic_pointer_cast<Matrix>(out);
} }
void MKLDNNConvLayer::resetBwdWgtPD( void MKLDNNConvLayer::resetBwdWgtPD(
...@@ -331,8 +238,8 @@ void MKLDNNConvLayer::resetBwdWgtPD( ...@@ -331,8 +238,8 @@ void MKLDNNConvLayer::resetBwdWgtPD(
loadConvSettings(wgtDims, biasDims, strides, dilations, padL, padR); loadConvSettings(wgtDims, biasDims, strides, dilations, padL, padR);
// create backward weight using input, output and weight value memory desc // create backward weight using input, output and weight value memory desc
CHECK(inVal_) << "Should have input value"; CHECK(inVal_) << "Should have internal input value";
CHECK(outVal_) << "Should have output value"; CHECK(outVal_) << "Should have internal output value";
CHECK(wgtVal_) << "Should have weight value"; CHECK(wgtVal_) << "Should have weight value";
algorithm algo = algorithm::convolution_direct; algorithm algo = algorithm::convolution_direct;
padding_kind padKind = padding_kind::zero; padding_kind padKind = padding_kind::zero;
...@@ -372,8 +279,8 @@ void MKLDNNConvLayer::resetBwdDataPD( ...@@ -372,8 +279,8 @@ void MKLDNNConvLayer::resetBwdDataPD(
memory::dims wgtDims, biasDims, strides, dilations, padL, padR; memory::dims wgtDims, biasDims, strides, dilations, padL, padR;
loadConvSettings(wgtDims, biasDims, strides, dilations, padL, padR); loadConvSettings(wgtDims, biasDims, strides, dilations, padL, padR);
CHECK(inVal_) << "Should have input value"; CHECK(inVal_) << "Should have internal input value";
CHECK(outVal_) << "Should have output value"; CHECK(outVal_) << "Should have internal output value";
// create backward data using input and output value memory desc // create backward data using input and output value memory desc
// but using weight memory desc with any format // but using weight memory desc with any format
auto bwdDataDesc = conv_bwdData::desc(algorithm::convolution_direct, auto bwdDataDesc = conv_bwdData::desc(algorithm::convolution_direct,
...@@ -399,12 +306,27 @@ void MKLDNNConvLayer::resetBwdBuffers( ...@@ -399,12 +306,27 @@ void MKLDNNConvLayer::resetBwdBuffers(
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
CHECK(wgtPD); CHECK(wgtPD);
resetOutGrad(wgtPD, out); resetOutGrad(out, wgtPD->diff_dst_primitive_desc());
resetWgtBiasGrad(wgtPD, wgt, bias); resetWithMatrix(
wgt, weight_->getWGrad(), wgtPD->diff_weights_primitive_desc());
CHECK(wgtVal_ != nullptr &&
wgt->getPrimitiveDesc() == wgtVal_->getPrimitiveDesc())
<< "primitive desc of weight grad and value should be equal";
resetInGrad(dataPD, in); bias = nullptr;
if (biases_ && biases_->getWGrad()) {
resetWithMatrix(
bias, biases_->getWGrad(), wgtPD->diff_bias_primitive_desc());
CHECK(bias && biasVal_ &&
bias->getPrimitiveDesc() == biasVal_->getPrimitiveDesc())
<< "primitive desc of bias grad should equal the bias value";
}
if (dataPD == nullptr) {
return;
}
resetInGrad(in, dataPD->diff_src_primitive_desc());
resetWgtValBwdData(dataPD, wgtValBwdData_); resetWgtValBwdData(dataPD, wgtValBwdData_);
} }
...@@ -416,10 +338,7 @@ void MKLDNNConvLayer::resetBwdPipeline( ...@@ -416,10 +338,7 @@ void MKLDNNConvLayer::resetBwdPipeline(
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
if (cvtOutGrad_) { CHECK(inVal_);
pipeline.push_back(*cvtOutGrad_);
}
// add bwdWgt handle // add bwdWgt handle
if (bias) { if (bias) {
bwdWgt_.reset(new conv_bwdWgt(*wgtPD, *inVal_, *out, *wgt, *bias)); bwdWgt_.reset(new conv_bwdWgt(*wgtPD, *inVal_, *out, *wgt, *bias));
...@@ -431,99 +350,13 @@ void MKLDNNConvLayer::resetBwdPipeline( ...@@ -431,99 +350,13 @@ void MKLDNNConvLayer::resetBwdPipeline(
if (dataPD == nullptr) { if (dataPD == nullptr) {
return; return;
} }
if (cvtWgtVal_) { if (cvtWgtVal_) {
pipeline.push_back(*cvtWgtVal_); pipeline.push_back(*cvtWgtVal_);
} }
// add bwdData handle // add bwdData handle
CHECK(wgtValBwdData_) << "Should have weight memory"; CHECK(wgtValBwdData_) << "Should have weight memory";
bwdData_.reset(new conv_bwdData(*dataPD, *out, *wgtValBwdData_, *in)); bwdData_.reset(new conv_bwdData(*dataPD, *out, *wgtValBwdData_, *in));
pipeline.push_back(*bwdData_); pipeline.push_back(*bwdData_);
if (cvtInGrad_) {
pipeline.push_back(*cvtInGrad_);
}
}
void MKLDNNConvLayer::resetOutGrad(
std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD, MKLDNNMatrixPtr& out) {
cpuOutGrad_ = nullptr;
cvtOutGrad_ = nullptr;
CHECK(outVal_ != nullptr &&
outVal_->getPrimitiveDesc() == wgtPD->diff_dst_primitive_desc())
<< "primitive desc of out grad and value should be equal";
if (outputIsOnlyMKLDNN()) {
MKLDNNLayer::resetOutGrad(out, outVal_->getPrimitiveDesc());
} else {
const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).grad;
// always share the same grad data of CPU output
// then the activation can get the right grad from output_.grad
output_.grad->setData(cpuOut->getData());
// same PrimitiveDesc with cpuInVal_
CHECK(cpuOutVal_);
cpuOutGrad_ = MKLDNNMatrix::create(cpuOut, cpuOutVal_->getPrimitiveDesc());
// create reorder if primitive desc does not match
if (cpuOutGrad_->getPrimitiveDesc() != outVal_->getPrimitiveDesc()) {
out = MKLDNNMatrix::create(nullptr, outVal_->getPrimitiveDesc());
cvtOutGrad_ = MKLDNNMatrix::createReorder(cpuOutGrad_, out);
CHECK(cvtOutGrad_);
} else {
out = cpuOutGrad_;
}
}
}
void MKLDNNConvLayer::resetWgtBiasGrad(
std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias) {
wgt = MKLDNNMatrix::create(weight_->getWGrad(),
wgtPD->diff_weights_primitive_desc());
CHECK(nullptr != wgtVal_ &&
wgt->getPrimitiveDesc() == wgtVal_->getPrimitiveDesc())
<< "primitive desc of weight grad and value should be equal";
VLOG(MKLDNN_FMTS) << "weight grad format: " << wgt->getFormat();
bias = nullptr;
if (biasVal_ == nullptr) {
return;
}
bias = MKLDNNMatrix::create(biases_->getWGrad(),
wgtPD->diff_bias_primitive_desc());
CHECK(bias->getPrimitiveDesc() == biasVal_->getPrimitiveDesc())
<< "primitive desc of bias grad should equal the bias value";
}
void MKLDNNConvLayer::resetInGrad(
std::shared_ptr<conv_bwdData::primitive_desc>& dataPD,
MKLDNNMatrixPtr& in) {
in = nullptr;
cpuInGrad_ = nullptr;
cvtInGrad_ = nullptr;
if (dataPD == nullptr) {
return;
}
if (inputIsOnlyMKLDNN()) {
MKLDNNLayer::resetInGrad(in, dataPD->diff_src_primitive_desc());
CHECK(nullptr != inVal_ &&
in->getPrimitiveDesc() == inVal_->getPrimitiveDesc())
<< "primitive desc of input grad and value should be equal";
} else {
const MatrixPtr& cpuIn = getInputGrad(0, CPU_DEVICE);
// same PrimitiveDesc with cpuInVal_
CHECK(cpuInVal_);
cpuInGrad_ = MKLDNNMatrix::create(cpuIn, cpuInVal_->getPrimitiveDesc());
in = cpuInGrad_;
// create reorder if PrimitiveDesc does not match
if (cpuInGrad_->getPrimitiveDesc() != dataPD->diff_src_primitive_desc()) {
in = MKLDNNMatrix::create(getInputGrad(0, MKLDNN_DEVICE),
dataPD->diff_src_primitive_desc());
cvtInGrad_ = MKLDNNMatrix::createReorder(in, cpuInGrad_);
CHECK(cvtInGrad_);
}
}
} }
void MKLDNNConvLayer::resetWgtValBwdData( void MKLDNNConvLayer::resetWgtValBwdData(
...@@ -537,8 +370,7 @@ void MKLDNNConvLayer::resetWgtValBwdData( ...@@ -537,8 +370,7 @@ void MKLDNNConvLayer::resetWgtValBwdData(
// since the primitive_desc would be different with wgtVal_ // since the primitive_desc would be different with wgtVal_
CHECK(wgtVal_) << "should have weight value"; CHECK(wgtVal_) << "should have weight value";
if (dataPD->weights_primitive_desc() != wgtVal_->getPrimitiveDesc()) { if (dataPD->weights_primitive_desc() != wgtVal_->getPrimitiveDesc()) {
wgtValBwdData_ = wgtValBwdData_ = MKLDNNMatrix::create(dataPD->weights_primitive_desc());
MKLDNNMatrix::create(nullptr, dataPD->weights_primitive_desc());
cvtWgtVal_ = MKLDNNMatrix::createReorder(wgtVal_, wgtValBwdData_); cvtWgtVal_ = MKLDNNMatrix::createReorder(wgtVal_, wgtValBwdData_);
CHECK(cvtWgtVal_); CHECK(cvtWgtVal_);
} else { } else {
......
...@@ -48,17 +48,6 @@ protected: ...@@ -48,17 +48,6 @@ protected:
// save forward primitive_desc, which can be used backward // save forward primitive_desc, which can be used backward
std::shared_ptr<conv_fwd::primitive_desc> fwdPD_; std::shared_ptr<conv_fwd::primitive_desc> fwdPD_;
// MKLDNNMatrixPtr which should be created from CPU Device
MKLDNNMatrixPtr cpuInVal_;
MKLDNNMatrixPtr cpuInGrad_;
MKLDNNMatrixPtr cpuOutVal_;
MKLDNNMatrixPtr cpuOutGrad_;
// convert handle between CPU device and MKLDNN device
std::shared_ptr<mkldnn::reorder> cvtInVal_;
std::shared_ptr<mkldnn::reorder> cvtInGrad_;
std::shared_ptr<mkldnn::reorder> cvtOutVal_;
std::shared_ptr<mkldnn::reorder> cvtOutGrad_;
// whether the weight has been init // whether the weight has been init
bool hasInitedWgt_; bool hasInitedWgt_;
...@@ -94,8 +83,6 @@ public: ...@@ -94,8 +83,6 @@ public:
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void updateInputData() override;
void updateWeights(const UpdateCallback& callback) override; void updateWeights(const UpdateCallback& callback) override;
void convertWeightsFromPaddle() override; void convertWeightsFromPaddle() override;
...@@ -109,26 +96,6 @@ public: ...@@ -109,26 +96,6 @@ public:
<< ", sw: " << sw_ << ", dh: " << dh_ << ", dw: " << dw_; << ", sw: " << sw_ << ", dh: " << dh_ << ", dw: " << dw_;
} }
void printValueFormatFlow() override {
if (cpuInVal_) {
VLOG(MKLDNN_FMTS) << cpuInVal_->getFormat() << " >>>";
}
MKLDNNLayer::printValueFormatFlow();
if (cpuOutVal_) {
VLOG(MKLDNN_FMTS) << " >>> " << cpuOutVal_->getFormat();
}
}
void printGradFormatFlow() override {
if (cpuInGrad_) {
VLOG(MKLDNN_FMTS) << cpuInGrad_->getFormat() << " <<<";
}
MKLDNNLayer::printGradFormatFlow();
if (cpuOutGrad_) {
VLOG(MKLDNN_FMTS) << " <<< " << cpuOutGrad_->getFormat();
}
}
protected: protected:
/** /**
* load the dims settings of this conv * load the dims settings of this conv
...@@ -162,23 +129,6 @@ protected: ...@@ -162,23 +129,6 @@ protected:
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
/**
* reset MKLDNNMatrix of input value
*/
void resetInValue(std::shared_ptr<conv_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in);
/**
* reset MKLDNNMatrix of weight and bias value
*/
void resetWgtBiasValue(std::shared_ptr<conv_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias);
/**
* reset MKLDNNMatrix of output value
*/
void resetOutValue(std::shared_ptr<conv_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr& out);
/** /**
* reset the backward weight primitive descriptor. * reset the backward weight primitive descriptor.
*/ */
...@@ -207,22 +157,6 @@ protected: ...@@ -207,22 +157,6 @@ protected:
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
/**
* reset MKLDNNMatrix of output grad
*/
void resetOutGrad(std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD,
MKLDNNMatrixPtr& out);
/**
* reset MKLDNNMatrix of weight and bias grad
*/
void resetWgtBiasGrad(std::shared_ptr<conv_bwdWgt::primitive_desc>& wgtPD,
MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias);
/**
* reset MKLDNNMatrix of input grad
*/
void resetInGrad(std::shared_ptr<conv_bwdData::primitive_desc>& dataPD,
MKLDNNMatrixPtr& in);
/** /**
* reset MKLDNNMatrix of weight value for backward data * reset MKLDNNMatrix of weight value for backward data
* since the primitive_desc would be different with wgtVal_ * since the primitive_desc would be different with wgtVal_
......
...@@ -62,7 +62,7 @@ void MKLDNNFcLayer::convertWeightsFromPaddle() { ...@@ -62,7 +62,7 @@ void MKLDNNFcLayer::convertWeightsFromPaddle() {
CHECK(wgtVal_) << "should have been initialized"; CHECK(wgtVal_) << "should have been initialized";
bool hasNoSpatial_ = ih_ == 1 && iw_ == 1; bool hasNoSpatial_ = ih_ == 1 && iw_ == 1;
auto targetDim = wgtVal_->getDims(); auto targetDim = wgtVal_->getDims();
auto srcFmt = hasNoSpatial_ ? memory::format::io : memory::format::ihwo; auto srcFmt = hasNoSpatial_ ? format::io : format::ihwo;
wgtVal_->reorderDataFrom(wgtVal_, srcFmt, targetDim); wgtVal_->reorderDataFrom(wgtVal_, srcFmt, targetDim);
hasInitedWgt_ = true; hasInitedWgt_ = true;
} }
...@@ -71,7 +71,7 @@ void MKLDNNFcLayer::convertWeightsToPaddle() { ...@@ -71,7 +71,7 @@ void MKLDNNFcLayer::convertWeightsToPaddle() {
CHECK(wgtVal_) << "should have been initialized"; CHECK(wgtVal_) << "should have been initialized";
bool hasNoSpatial_ = ih_ == 1 && iw_ == 1; bool hasNoSpatial_ = ih_ == 1 && iw_ == 1;
auto targetDim = wgtVal_->getDims(); auto targetDim = wgtVal_->getDims();
auto dstFmt = hasNoSpatial_ ? memory::format::io : memory::format::ihwo; auto dstFmt = hasNoSpatial_ ? format::io : format::ihwo;
wgtVal_->reorderDataTo(wgtVal_, dstFmt, targetDim); wgtVal_->reorderDataTo(wgtVal_, dstFmt, targetDim);
} }
...@@ -100,8 +100,6 @@ void MKLDNNFcLayer::resetFwd(std::vector<primitive>& pipeline, ...@@ -100,8 +100,6 @@ void MKLDNNFcLayer::resetFwd(std::vector<primitive>& pipeline,
resetFwdPD(fwdPD_, in, wgt, bias, out); resetFwdPD(fwdPD_, in, wgt, bias, out);
resetFwdPipeline(pipeline, fwdPD_, in, wgt, bias, out); resetFwdPipeline(pipeline, fwdPD_, in, wgt, bias, out);
printValueFormatFlow();
} }
void MKLDNNFcLayer::resetBwd(std::vector<primitive>& pipeline, void MKLDNNFcLayer::resetBwd(std::vector<primitive>& pipeline,
...@@ -119,12 +117,6 @@ void MKLDNNFcLayer::resetBwd(std::vector<primitive>& pipeline, ...@@ -119,12 +117,6 @@ void MKLDNNFcLayer::resetBwd(std::vector<primitive>& pipeline,
resetBwdDataPD(bwdDataPD, in, out); resetBwdDataPD(bwdDataPD, in, out);
resetBwdPipeline(pipeline, bwdWgtPD, bwdDataPD, in, wgt, bias, out); resetBwdPipeline(pipeline, bwdWgtPD, bwdDataPD, in, wgt, bias, out);
printGradFormatFlow();
}
void MKLDNNFcLayer::updateInputData() {
inVal_->setData(getInputValue(0, CPU_DEVICE)->getData());
} }
void MKLDNNFcLayer::updateWeights(const UpdateCallback& callback) { void MKLDNNFcLayer::updateWeights(const UpdateCallback& callback) {
...@@ -139,51 +131,30 @@ void MKLDNNFcLayer::resetFwdBuffers(MKLDNNMatrixPtr& in, ...@@ -139,51 +131,30 @@ void MKLDNNFcLayer::resetFwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
resetInValue(in); resetInValue(in);
CHECK(in);
resetWgtBiasValue(wgt, bias);
resetOutValue(out);
}
void MKLDNNFcLayer::resetInValue(MKLDNNMatrixPtr& in) {
if (inputIsOnlyMKLDNN()) {
const MatrixPtr& dnnIn = getInputValue(0);
in = std::dynamic_pointer_cast<MKLDNNMatrix>(dnnIn);
CHECK(in) << "Input should be MKLDNNMatrix";
} else {
CHECK_EQ(getPrev(0)->getDeviceId(), CPU_DEVICE) << "Only support CPU yet";
const MatrixPtr& cpuIn = getInputValue(0, CPU_DEVICE);
in = MKLDNNMatrix::create(
cpuIn, {bs_, ic_, ih_, iw_}, format::nchw, engine_);
}
in->downSpatial(); in->downSpatial();
}
void MKLDNNFcLayer::resetWgtBiasValue(MKLDNNMatrixPtr& wgt, auto outPD =
MKLDNNMatrixPtr& bias) { MKLDNNMatrix::createPrimitiveDesc({bs_, oc_}, format::nc, engine_);
resetOutValue(out, outPD);
format wgtFmt = format::oihw; format wgtFmt = format::oihw;
if (inVal_->getFormat() == format::nChw8c) { if (in->getFormat() == format::nChw8c) {
wgtFmt = format::oIhw8i; wgtFmt = format::oIhw8i;
} else if (inVal_->getFormat() == format::nChw16c) { } else if (in->getFormat() == format::nChw16c) {
wgtFmt = format::oIhw16i; wgtFmt = format::oIhw16i;
} }
wgt = MKLDNNMatrix::create( auto wgtPD =
weight_->getW(), {oc_, ic_, ih_, iw_}, wgtFmt, engine_); MKLDNNMatrix::createPrimitiveDesc({oc_, ic_, ih_, iw_}, wgtFmt, engine_);
resetWithMatrix(wgt, weight_->getW(), wgtPD);
wgt->downSpatial(); wgt->downSpatial();
VLOG(MKLDNN_FMTS) << "Weight value format: " << wgt->getFormat();
bias = (biases_ && biases_->getW())
? MKLDNNMatrix::create(biases_->getW(), {oc_}, format::x, engine_)
: nullptr;
}
void MKLDNNFcLayer::resetOutValue(MKLDNNMatrixPtr& out) { if (biases_ && biases_->getW()) {
out = MKLDNNMatrix::create(output_.value, {bs_, oc_}, format::nc, engine_); auto biasPD = MKLDNNMatrix::createPrimitiveDesc({oc_}, format::x, engine_);
if (!outputIsOnlyMKLDNN()) { resetWithMatrix(bias, biases_->getW(), biasPD);
// fc cpu output value do not need create convert, just share data } else {
getOutput(CPU_DEVICE).value->setData(out->getData()); bias = nullptr;
} }
output_.value = std::dynamic_pointer_cast<Matrix>(out);
} }
void MKLDNNFcLayer::resetFwdPD(std::shared_ptr<fc_fwd::primitive_desc>& pd, void MKLDNNFcLayer::resetFwdPD(std::shared_ptr<fc_fwd::primitive_desc>& pd,
...@@ -219,7 +190,6 @@ void MKLDNNFcLayer::resetFwdPipeline( ...@@ -219,7 +190,6 @@ void MKLDNNFcLayer::resetFwdPipeline(
} else { } else {
fwd_.reset(new fc_fwd(*pd, *in, *wgt, *out)); fwd_.reset(new fc_fwd(*pd, *in, *wgt, *out));
} }
pipeline.push_back(*fwd_); pipeline.push_back(*fwd_);
} }
...@@ -227,44 +197,18 @@ void MKLDNNFcLayer::resetBwdBuffers(MKLDNNMatrixPtr& in, ...@@ -227,44 +197,18 @@ void MKLDNNFcLayer::resetBwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
resetOutGrad(out); CHECK(inVal_ && outVal_);
resetOutGrad(out, outVal_->getPrimitiveDesc());
resetWgtBiasGrad(wgt, bias); resetInGrad(in, inVal_->getPrimitiveDesc());
resetInGrad(in);
}
void MKLDNNFcLayer::resetOutGrad(MKLDNNMatrixPtr& out) {
CHECK(outVal_);
if (outputIsOnlyMKLDNN()) {
MKLDNNLayer::resetOutGrad(out, outVal_->getPrimitiveDesc());
} else {
const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).grad;
output_.grad->setData(cpuOut->getData());
out = MKLDNNMatrix::create(cpuOut, outVal_->getPrimitiveDesc());
}
}
void MKLDNNFcLayer::resetWgtBiasGrad(MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias) {
CHECK(wgtVal_); CHECK(wgtVal_);
wgt = MKLDNNMatrix::create(weight_->getWGrad(), wgtVal_->getPrimitiveDesc()); resetWithMatrix(wgt, weight_->getWGrad(), wgtVal_->getPrimitiveDesc());
if (biasVal_) {
resetWithMatrix(bias, biases_->getWGrad(), biasVal_->getPrimitiveDesc());
} else {
bias = nullptr; bias = nullptr;
if (biasVal_ == nullptr) {
return;
}
bias =
MKLDNNMatrix::create(biases_->getWGrad(), biasVal_->getPrimitiveDesc());
}
void MKLDNNFcLayer::resetInGrad(MKLDNNMatrixPtr& in) {
in = nullptr;
if (inputLayers_[0]->getOutput().grad == nullptr) {
return;
} }
CHECK(inVal_);
MKLDNNLayer::resetInGrad(in, inVal_->getPrimitiveDesc());
} }
void MKLDNNFcLayer::resetBwdWgtPD( void MKLDNNFcLayer::resetBwdWgtPD(
......
...@@ -66,8 +66,6 @@ public: ...@@ -66,8 +66,6 @@ public:
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void updateInputData() override;
void updateWeights(const UpdateCallback& callback) override; void updateWeights(const UpdateCallback& callback) override;
void convertWeightsFromPaddle() override; void convertWeightsFromPaddle() override;
...@@ -84,9 +82,6 @@ protected: ...@@ -84,9 +82,6 @@ protected:
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
void resetInValue(MKLDNNMatrixPtr& in);
void resetWgtBiasValue(MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& bias);
void resetOutValue(MKLDNNMatrixPtr& out);
void resetFwdPD(std::shared_ptr<fc_fwd::primitive_desc>& pd, void resetFwdPD(std::shared_ptr<fc_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr in, MKLDNNMatrixPtr in,
MKLDNNMatrixPtr wgt, MKLDNNMatrixPtr wgt,
...@@ -109,9 +104,6 @@ protected: ...@@ -109,9 +104,6 @@ protected:
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
void resetOutGrad(MKLDNNMatrixPtr& out);
void resetWgtBiasGrad(MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& bias);
void resetInGrad(MKLDNNMatrixPtr& in);
void resetBwdWgtPD(std::shared_ptr<fc_bwdWgt::primitive_desc>& pd, void resetBwdWgtPD(std::shared_ptr<fc_bwdWgt::primitive_desc>& pd,
MKLDNNMatrixPtr& wgt, MKLDNNMatrixPtr& wgt,
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
......
/* Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "MKLDNNLayer.h"
using namespace mkldnn; // NOLINT
typedef memory::format format;
namespace paddle {
bool MKLDNNLayer::init(const LayerMap& layerMap,
const ParameterMap& parameterMap) {
CHECK(FLAGS_use_mkldnn) << "MkldnnLayers only support use_mkldnn."
<< "Please set WITH_MKLDNN=ON "
<< "and set use_mkldnn=True";
CHECK(!useGpu_) << "Do not support GPU yet";
// set device id before Layer::init
setDevice(MKLDNN_DEVICE);
// change param device to MKLDNN device
setParamsDevice(MKLDNN_DEVICE, parameterMap);
if (!Layer::init(layerMap, parameterMap)) {
return false;
}
setOutputMap();
checkCPUOutputsNumber();
stream_.reset(new MKLDNNStream());
engine_ = CPUEngine::Instance().getEngine();
return true;
}
void MKLDNNLayer::forward(PassType passType) {
passType_ = passType;
{
REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
CHECK(!inputLayers_.empty());
copySeqInfoToOutputs();
size_t elemenCnt = inputLayers_[0]->getOutputValue()->getElementCnt();
if (inputElemenCnt_ != elemenCnt) {
VLOG(MKLDNN_BASE) << getName() << " reset mkldnn forward";
// reset when input total sizes changed, not only the batchsize
inputElemenCnt_ = elemenCnt;
pipelineFwd_.clear();
reshape(bs_, ic_, ih_, iw_, oc_, oh_, ow_);
// all cpu device output grad or value share output's
shareCPUDevice();
resetFwd(pipelineFwd_, inVal_, wgtVal_, biasVal_, outVal_);
// MKLDNNLayer output value should be MKLDNNMatrix
// so external output value is necessary.
// Then external input value is not necessary,
// since input may be mkldnn internal buffer.
CHECK(extOutVal_) << "external output value is necessary";
output_.value = std::dynamic_pointer_cast<Matrix>(extOutVal_);
CHECK(inVal_ && outVal_) << "internal memories are necessary";
if (cvtInVal_) {
pipelineFwd_.insert(pipelineFwd_.begin(), *cvtInVal_);
}
if (cvtOutVal_) {
pipelineFwd_.push_back(*cvtOutVal_);
}
convertWeightsFromPaddle();
printSizeInfo();
printValueFormat();
needResetBwd_ = true;
}
if (inputLayers_[0]->getType() == "data") {
// Update input value data when input layer is "data" type,
// since the input value data address might be changed.
CHECK(extInVal_);
extInVal_->setData(getInputValue(0, CPU_DEVICE)->getData());
}
if (!outputOnlyMKLDNN_) {
clearGrads();
}
stream_->submit(pipelineFwd_);
}
{
REGISTER_TIMER_INFO("FwActTimer", getName().c_str());
forwardActivation();
}
}
void MKLDNNLayer::backward(const UpdateCallback& callback) {
if (needResetBwd_) {
VLOG(MKLDNN_BASE) << getName() << " reset mkldnn backward";
pipelineBwd_.clear();
pipelineMergeGrad_.clear();
mergeGrad_ = nullptr;
resetBwd(pipelineBwd_, inGrad_, wgtGrad_, biasGrad_, outGrad_);
// external output grad is not necessary
// since output may be mkldnn internal buffer or merge them directly.
CHECK(outGrad_) << "internal output grad is necessary";
if (extOutGrad_) {
CHECK_EQ(extOutGrad_->getData(), output_.grad->getData())
<< "the external buffer should share the same data with output_.grad";
}
if (cvtOutGrad_) {
pipelineBwd_.insert(pipelineBwd_.begin(), *cvtOutGrad_);
}
if (cvtInGrad_) {
pipelineBwd_.push_back(*cvtInGrad_);
}
printGradFormat();
needResetBwd_ = false;
}
// merge grad must before backward activation
if (mergeGrad_) {
REGISTER_TIMER_INFO("MergeBpGrad", getName().c_str());
stream_->submit(pipelineMergeGrad_);
}
{
REGISTER_TIMER_INFO("BpActTimer", getName().c_str());
backwardActivation();
}
{
REGISTER_TIMER_INFO("mkldnn_bwdTimer", getName().c_str());
stream_->submit(pipelineBwd_);
}
{
REGISTER_TIMER_INFO("WeightUpdate", getName().c_str());
updateWeights(callback);
}
}
void MKLDNNLayer::reshapeInput(int& batchsize, int& height, int& width) {
const Argument& input = inputLayers_[0]->getOutput();
batchsize = input.getBatchSize();
int h = input.getFrameHeight();
int w = input.getFrameWidth();
if (h != 0) {
height = h;
}
if (w != 0) {
width = w;
}
}
void MKLDNNLayer::reshapeOutput(size_t height, size_t width) {
output_.setFrameHeight(height);
output_.setFrameWidth(width);
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].setFrameHeight(height);
outputOtherDevice_[i].setFrameWidth(width);
}
}
void MKLDNNLayer::resetWithMatrix(MKLDNNMatrixPtr& dnn,
const MatrixPtr& mat,
memory::primitive_desc pd) {
dnn = nullptr;
if (mat == nullptr) {
return;
}
dnn = MKLDNNMatrix::create(pd, mat);
}
void MKLDNNLayer::resetInValue(
MKLDNNMatrixPtr& in, const std::shared_ptr<memory::primitive_desc>& intPD) {
cvtInVal_ = nullptr;
extInVal_ = nullptr;
in = nullptr;
CHECK_GT(bs_ * ic_ * ih_ * iw_, 0);
auto extPD = MKLDNNMatrix::createPrimitiveDesc(
{bs_, ic_, ih_, iw_}, format::nchw, engine_);
const MatrixPtr& inMat = inputLayers_[0]->getOutputValue();
in = std::dynamic_pointer_cast<MKLDNNMatrix>(inMat);
CHECK_EQ(inputIsOnlyMKLDNN(), in != nullptr);
if (in == nullptr || in->getFormat() == format::nc) {
in = MKLDNNMatrix::create(extPD, inMat);
}
extInVal_ = isPaddleFormat(in->getFormat()) ? in : nullptr;
if (in->getFormat() == format::nc) {
CHECK(ih_ == 1 && iw_ == 1);
}
if (nullptr == intPD || in->getPrimitiveDesc() == *intPD) {
return;
}
// need create reorder
in = MKLDNNMatrix::create(*intPD);
extInVal_ = extInVal_ ? extInVal_ : MKLDNNMatrix::create(extPD, inMat);
cvtInVal_ = MKLDNNMatrix::createReorder(extInVal_, in);
CHECK(cvtInVal_) << "should not be emptry";
}
void MKLDNNLayer::resetOutValue(MKLDNNMatrixPtr& out,
memory::primitive_desc intPD) {
cvtOutVal_ = nullptr;
out = MKLDNNMatrix::create(intPD, output_.value);
extOutVal_ = out;
if (outputIsOnlyMKLDNN() || isPaddleFormat(extOutVal_->getFormat())) {
return;
}
// need create reorder
CHECK_GT(bs_ * oc_ * oh_ * ow_, 0);
extOutVal_ = MKLDNNMatrix::create(
memory::dims{bs_, oc_, oh_, ow_}, format::nchw, engine_, output_.value);
out = MKLDNNMatrix::create(intPD);
cvtOutVal_ = MKLDNNMatrix::createReorder(out, extOutVal_);
CHECK(cvtOutVal_) << "should not be empty";
}
void MKLDNNLayer::resetInGrad(MKLDNNMatrixPtr& in,
memory::primitive_desc intPD) {
cvtInGrad_ = nullptr;
extInGrad_ = nullptr;
in = nullptr;
LayerPtr& input = inputLayers_[0];
if (input->getOutputGrad() == nullptr) {
// no need input grad
return;
}
CHECK(inputIsOnlyMKLDNN() || input->getOutputMapSize() <= 1)
<< "only support input is MKLDNN layer or only have one output layer";
// when input is a mkldnn branch node,
// this layer will save input grad to a internal buffer,
// and the mkldnn input layer will merge them to actual prev->output_.grad
const MatrixPtr& inMat =
input->getOutputMapSize() <= 1 ? input->getOutputGrad() : nullptr;
in = MKLDNNMatrix::create(intPD, inMat);
Argument& arg = input->getOutput(this->getName());
arg.grad = std::dynamic_pointer_cast<Matrix>(in);
CHECK(inVal_);
CHECK(inVal_->getPrimitiveDesc() == intPD) << "the primitive desc must equal";
if (inputIsOnlyMKLDNN()) {
return;
}
extInGrad_ = in;
if (isPaddleFormat(extInGrad_->getFormat())) {
return;
}
// need create reorder
// TODO(TJ): add macro definition to simplify it
CHECK(extInVal_ != nullptr && isPaddleFormat(extInVal_->getFormat()))
<< "should have external input value and the format must be nchw(nc)";
extInGrad_ = MKLDNNMatrix::create(extInVal_->getPrimitiveDesc(), inMat);
CHECK(inVal_ != nullptr && inVal_->getPrimitiveDesc() == intPD)
<< "should have internal input value and primitive desc must equal";
in = MKLDNNMatrix::create(intPD);
cvtInGrad_ = MKLDNNMatrix::createReorder(in, extInGrad_);
CHECK(cvtInGrad_);
}
void MKLDNNLayer::resetOutGrad(MKLDNNMatrixPtr& out,
memory::primitive_desc intPD) {
cvtOutGrad_ = nullptr;
extOutGrad_ = nullptr;
out = nullptr;
MatrixPtr& outMat = output_.grad;
out = MKLDNNMatrix::create(intPD, outMat);
resetMergeGrad(out);
if (outputIsOnlyMKLDNN()) {
return;
}
CHECK_LE(outputMap_.size(), 1U) << "do not support mixed with cpu device";
extOutGrad_ = out;
if (isPaddleFormat(extOutGrad_->getFormat())) {
return;
}
// need create reorder
CHECK(extOutVal_ != nullptr && isPaddleFormat(extOutVal_->getFormat()))
<< "should have external output value and the format must be nchw(nc)";
extOutGrad_ = MKLDNNMatrix::create(extOutVal_->getPrimitiveDesc(), outMat);
CHECK(outVal_ != nullptr && outVal_->getPrimitiveDesc() == intPD)
<< "should have internal output value and primitive desc must equal";
out = MKLDNNMatrix::create(intPD);
cvtOutGrad_ = MKLDNNMatrix::createReorder(extOutGrad_, out);
CHECK(cvtOutGrad_);
}
void MKLDNNLayer::resetMergeGrad(MKLDNNMatrixPtr& out) {
mergeGrad_ = nullptr;
pipelineMergeGrad_.clear();
if (outputMap_.size() <= 1 || !outputIsOnlyMKLDNN()) {
// do not merge when output is not all MKLDNN or only one output
return;
}
CHECK(out) << "should have reset internal ouput grad";
std::vector<double> scales(outputMap_.size(), 1.0);
std::vector<memory::primitive_desc> srcPDs;
std::vector<primitive::at> srcs;
for (auto it = outputMap_.begin(); it != outputMap_.end(); ++it) {
MKLDNNMatrixPtr src =
std::dynamic_pointer_cast<MKLDNNMatrix>(it->second->grad);
CHECK(src) << "should be MKLDNNMatrix";
auto srcDims = src->getDims();
auto dstDims = out->getDims();
CHECK_EQ(srcDims.size(), dstDims.size());
for (size_t i = 0; i < srcDims.size(); ++i) {
CHECK_EQ(srcDims[i], dstDims[i]);
}
VLOG(MKLDNN_BASE) << getName() << " has output grad " << it->first
<< ", format " << src->getFormat();
srcPDs.push_back(src->getPrimitiveDesc());
srcs.push_back(*src);
}
// TODO(TJ): remove me when mkldnn sum support different formats
for (size_t i = 1; i < srcPDs.size(); ++i) {
CHECK(srcPDs[0] == srcPDs[i]);
}
tmpOutGrad_ = out;
tmpCvt_ = nullptr;
if (out->getPrimitiveDesc() != srcPDs[0]) {
tmpOutGrad_ = MKLDNNMatrix::create(srcPDs[0]);
tmpCvt_ = MKLDNNMatrix::createReorder(tmpOutGrad_, out);
CHECK(tmpCvt_);
pipelineMergeGrad_.push_back(*tmpCvt_);
}
auto sumPD =
sum::primitive_desc(tmpOutGrad_->getMemoryDesc(), scales, srcPDs);
mergeGrad_.reset(new sum(sumPD, srcs, *tmpOutGrad_));
pipelineMergeGrad_.insert(pipelineMergeGrad_.begin(), *mergeGrad_);
}
} // namespace paddle
...@@ -58,11 +58,31 @@ protected: ...@@ -58,11 +58,31 @@ protected:
std::vector<mkldnn::primitive> pipelineFwd_; std::vector<mkldnn::primitive> pipelineFwd_;
std::vector<mkldnn::primitive> pipelineBwd_; std::vector<mkldnn::primitive> pipelineBwd_;
// MKLDNNMatrixPtr with internal format /* Value and grad are seperated as internal and external buffers.
* Each MKLDNNLayer must init or reset internal buffer at least,
* and the external buffer format is always nchw of nc(when h==w==1),
* which is the same format as paddle.
* The output_.value and output_.grad always save the external data,
* when mixed with cpu device.
* When all layers are mkldnn layers, they could save internal data.
*/
// below MKLDNNMatrix buffers are all internal buffers
MKLDNNMatrixPtr inVal_; MKLDNNMatrixPtr inVal_;
MKLDNNMatrixPtr inGrad_; MKLDNNMatrixPtr inGrad_;
MKLDNNMatrixPtr outVal_; MKLDNNMatrixPtr outVal_;
MKLDNNMatrixPtr outGrad_; MKLDNNMatrixPtr outGrad_;
// below are external value and grad
MKLDNNMatrixPtr extInVal_;
MKLDNNMatrixPtr extInGrad_;
MKLDNNMatrixPtr extOutVal_;
MKLDNNMatrixPtr extOutGrad_;
// convert handle between external and internal buffers
std::shared_ptr<mkldnn::reorder> cvtInVal_;
std::shared_ptr<mkldnn::reorder> cvtInGrad_;
std::shared_ptr<mkldnn::reorder> cvtOutVal_;
std::shared_ptr<mkldnn::reorder> cvtOutGrad_;
// weight and bias are always internal buffers
MKLDNNMatrixPtr wgtVal_; MKLDNNMatrixPtr wgtVal_;
MKLDNNMatrixPtr wgtGrad_; MKLDNNMatrixPtr wgtGrad_;
MKLDNNMatrixPtr biasVal_; MKLDNNMatrixPtr biasVal_;
...@@ -91,6 +111,7 @@ public: ...@@ -91,6 +111,7 @@ public:
oh_(0), oh_(0),
ow_(0), ow_(0),
needResetBwd_(true), needResetBwd_(true),
outputOnlyMKLDNN_(false),
engine_(mkldnn::engine::cpu, 0), engine_(mkldnn::engine::cpu, 0),
stream_(nullptr), stream_(nullptr),
fwd_(nullptr), fwd_(nullptr),
...@@ -99,92 +120,9 @@ public: ...@@ -99,92 +120,9 @@ public:
~MKLDNNLayer() {} ~MKLDNNLayer() {}
virtual bool init(const LayerMap& layerMap, virtual bool init(const LayerMap& layerMap, const ParameterMap& parameterMap);
const ParameterMap& parameterMap) { virtual void forward(PassType passType);
CHECK(FLAGS_use_mkldnn) << "MkldnnLayers only support use_mkldnn." virtual void backward(const UpdateCallback& callback);
<< "Please set WITH_MKLDNN=ON "
<< "and set use_mkldnn=True";
CHECK(!useGpu_) << "Do not support GPU yet";
// set device id before Layer::init
setDevice(MKLDNN_DEVICE);
// change param device to MKLDNN device
setParamsDevice(MKLDNN_DEVICE, parameterMap);
if (!Layer::init(layerMap, parameterMap)) {
return false;
}
setOutputMap();
checkCPUOutputsNumber();
stream_.reset(new MKLDNNStream());
engine_ = CPUEngine::Instance().getEngine();
return true;
}
void forward(PassType passType) override {
passType_ = passType;
{
REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
CHECK(!inputLayers_.empty());
copySeqInfoToOutputs();
size_t elemenCnt = inputLayers_[0]->getOutput().value->getElementCnt();
if (inputElemenCnt_ != elemenCnt) {
VLOG(MKLDNN_BASE) << getName() << " reset mkldnn forward";
// reset when input total sizes changed, not only the batchsize
inputElemenCnt_ = elemenCnt;
pipelineFwd_.clear();
reshape(bs_, ic_, ih_, iw_, oc_, oh_, ow_);
resetFwd(pipelineFwd_, inVal_, wgtVal_, biasVal_, outVal_);
convertWeightsFromPaddle();
needResetBwd_ = true;
}
if (inputLayers_[0]->getType() == "data") {
updateInputData();
}
if (!outputOnlyMKLDNN_) {
clearGrads();
}
stream_->submit(pipelineFwd_);
}
/* activation */ {
REGISTER_TIMER_INFO("FwActTimer", getName().c_str());
forwardActivation();
}
}
void backward(const UpdateCallback& callback) override {
if (needResetBwd_) {
VLOG(MKLDNN_BASE) << getName() << " reset mkldnn backward";
pipelineBwd_.clear();
pipelineMergeGrad_.clear();
mergeGrad_ = nullptr;
resetBwd(pipelineBwd_, inGrad_, wgtGrad_, biasGrad_, outGrad_);
needResetBwd_ = false;
}
// merge grad must before backward activation
if (mergeGrad_) {
REGISTER_TIMER_INFO("MergeBpGrad", getName().c_str());
stream_->submit(pipelineMergeGrad_);
}
{
REGISTER_TIMER_INFO("BpActTimer", getName().c_str());
backwardActivation();
}
{
REGISTER_TIMER_INFO("mkldnn_bwdTimer", getName().c_str());
stream_->submit(pipelineBwd_);
}
{
REGISTER_TIMER_INFO("WeightUpdate", getName().c_str());
updateWeights(callback);
}
}
/** /**
* reshape the input image sizes * reshape the input image sizes
...@@ -195,7 +133,7 @@ public: ...@@ -195,7 +133,7 @@ public:
int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) = 0; int& bs, int& ic, int& ih, int& iw, int oc, int& oh, int& ow) = 0;
/** /**
* reset the mkldnn forward primitve and memory * reset the mkldnn forward primitve and memories
* only would be called when input size changes * only would be called when input size changes
*/ */
virtual void resetFwd(std::vector<mkldnn::primitive>& pipeline, virtual void resetFwd(std::vector<mkldnn::primitive>& pipeline,
...@@ -205,7 +143,7 @@ public: ...@@ -205,7 +143,7 @@ public:
MKLDNNMatrixPtr& out) = 0; MKLDNNMatrixPtr& out) = 0;
/** /**
* reset the mkldnn backward primitve and memory for mkldnn fc * reset the mkldnn backward primitve and memories
* only would be called when needed * only would be called when needed
*/ */
virtual void resetBwd(std::vector<mkldnn::primitive>& pipeline, virtual void resetBwd(std::vector<mkldnn::primitive>& pipeline,
...@@ -214,12 +152,6 @@ public: ...@@ -214,12 +152,6 @@ public:
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) = 0; MKLDNNMatrixPtr& out) = 0;
/**
* Update input value data when input layer is "data" type.
* Since the input value data address might be changed.
*/
virtual void updateInputData() {}
/** /**
* Update weights and biases if necessary. * Update weights and biases if necessary.
*/ */
...@@ -246,131 +178,78 @@ protected: ...@@ -246,131 +178,78 @@ protected:
/** /**
* reshape the input image sizes and input batchsize * reshape the input image sizes and input batchsize
*/ */
virtual void reshapeInput(int& batchsize, int& height, int& width) { void reshapeInput(int& batchsize, int& height, int& width);
const Argument& input = inputLayers_[0]->getOutput();
batchsize = input.getBatchSize();
int h = input.getFrameHeight();
int w = input.getFrameWidth();
if (h != 0) {
height = h;
}
if (w != 0) {
width = w;
}
}
/** /**
* reshape output image sizes * reshape output image sizes
*/ */
virtual void reshapeOutput(size_t height, size_t width) { void reshapeOutput(size_t height, size_t width);
output_.setFrameHeight(height);
output_.setFrameWidth(width);
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].setFrameHeight(height);
outputOtherDevice_[i].setFrameWidth(width);
}
}
/** /**
* reset the output grad matrix from primitive desc. * reset MKLDNNMatrix from Matrix and internal primitive desc.
* and reset the merge grad primitive if needed. * reset nullptr if matrix or primitive desc is empty
* note: when this layer has serval outputs,
* it could not be mixed with cpu device,
* since it can not get memory desc from cpu device.
*/ */
virtual void resetOutGrad(MKLDNNMatrixPtr& out, void resetWithMatrix(MKLDNNMatrixPtr& dnn,
mkldnn::memory::primitive_desc pd) { const MatrixPtr& mat,
CHECK(outputIsOnlyMKLDNN()) << "do not support mixed with other device yet"; mkldnn::memory::primitive_desc pd);
mergeGrad_ = nullptr;
pipelineMergeGrad_.clear();
out = MKLDNNMatrix::create(output_.grad, pd);
if (outputMap_.size() <= 1) {
return;
}
std::vector<double> scales(outputMap_.size(), 1.0);
std::vector<mkldnn::memory::primitive_desc> srcPDs;
std::vector<mkldnn::primitive::at> srcs;
for (auto it = outputMap_.begin(); it != outputMap_.end(); ++it) {
MKLDNNMatrixPtr src =
std::dynamic_pointer_cast<MKLDNNMatrix>(it->second->grad);
VLOG(MKLDNN_BASE) << getName() << " has output grad " << it->first;
CHECK(src) << "should be MKLDNNMatrix";
auto srcDims = src->getDims();
auto dstDims = out->getDims();
CHECK_EQ(srcDims.size(), dstDims.size());
for (size_t i = 0; i < srcDims.size(); ++i) {
CHECK_EQ(srcDims[i], dstDims[i]);
}
srcPDs.push_back(src->getPrimitiveDesc());
srcs.push_back(*src);
}
// TODO(TJ): remove me when mkldnn sum support different formats
for (size_t i = 1; i < srcPDs.size(); ++i) {
CHECK(srcPDs[0] == srcPDs[i]);
}
tmpOutGrad_ = nullptr;
tmpCvt_ = nullptr;
if (out->getPrimitiveDesc() != srcPDs[0]) {
tmpOutGrad_ = MKLDNNMatrix::create(nullptr, srcPDs[0]);
tmpCvt_ = MKLDNNMatrix::createReorder(tmpOutGrad_, out);
CHECK(tmpCvt_);
pipelineMergeGrad_.push_back(*tmpCvt_);
} else {
tmpOutGrad_ = out;
}
auto sumPD = mkldnn::sum::primitive_desc( /**
tmpOutGrad_->getMemoryDesc(), scales, srcPDs); * reset input value from input MKLDNNMatrix and internal primitive desc.
mergeGrad_.reset(new mkldnn::sum(sumPD, srcs, *tmpOutGrad_)); * reset both internal and external buffer and create reorder if necessary.
pipelineMergeGrad_.insert(pipelineMergeGrad_.begin(), *mergeGrad_); */
} void resetInValue(
MKLDNNMatrixPtr& in,
const std::shared_ptr<mkldnn::memory::primitive_desc>& intPD = nullptr);
/** /**
* reset input grad from primitive desc. * reset output value from internal primitive desc.
* this function is avaiable for input is only mkldnn * reset both internal and external buffer and create reorder if necessary.
* or input do not care cpu device
*/ */
virtual void resetInGrad(MKLDNNMatrixPtr& in, void resetOutValue(MKLDNNMatrixPtr& out,
mkldnn::memory::primitive_desc pd) { mkldnn::memory::primitive_desc intPD);
LayerPtr& input = inputLayers_[0];
const MatrixPtr& grad =
input->getOutputMapSize() > 1 ? nullptr : input->getOutput().grad;
in = MKLDNNMatrix::create(grad, pd);
Argument& arg = input->getOutput(this->getName());
arg.grad = std::dynamic_pointer_cast<Matrix>(in);
}
/** /**
* print info about sizes * reset input grad from internal primitive desc.
* reset both internal and external buffer and create reorder if necessary.
*/ */
virtual void printSizeInfo() { void resetInGrad(MKLDNNMatrixPtr& in, mkldnn::memory::primitive_desc intPD);
VLOG(MKLDNN_SIZES) << getName() << ": bs: " << bs_ << ", ic: " << ic_
<< ", ih: " << ih_ << ", iw: " << iw_ << ", oc: " << oc_
<< ", oh: " << oh_ << ", ow: " << ow_;
}
/** /**
* Print the mkldnn memory format flow of value * reset output grad from internal primitive desc.
* merge grad if necessary.
* reset both internal and external buffer and create reorder if necessary.
* note: about merge grad, when this layer has several outputs,
* it could not be mixed with cpu device,
* since it can not get memory desc from cpu device.
*/ */
virtual void printValueFormatFlow() { void resetOutGrad(MKLDNNMatrixPtr& out, mkldnn::memory::primitive_desc intPD);
if (inVal_ && outVal_) {
VLOG(MKLDNN_FMTS) << inVal_->getFormat() << " >>> " /**
<< outVal_->getFormat(); * reset the merge grad primitive if necessary.
} * note: do not support the grads mixed with cpu device,
} * since it can not get memory desc from cpu device.
*/
void resetMergeGrad(MKLDNNMatrixPtr& out);
protected:
/**
* Set deviceId of this layer.
*/
void setDevice(int id) { deviceId_ = id; }
/** /**
* Print the mkldnn memory format flow of grad * check the format is nchw or nc,
* which is supported by Paddle default memory layout
*/ */
virtual void printGradFormatFlow() { bool isPaddleFormat(mkldnn::memory::format fmt) {
if (inGrad_ && outGrad_) { if (fmt == mkldnn::memory::format::nchw ||
VLOG(MKLDNN_FMTS) << inGrad_->getFormat() << " <<< " fmt == mkldnn::memory::format::nc) {
<< outGrad_->getFormat(); return true;
} else {
return false;
} }
} }
protected:
/** /**
* If input only has MKLDNN device. * If input only has MKLDNN device.
* Otherwise, only support the previous layer using CPU device. * Otherwise, only support the previous layer using CPU device.
...@@ -380,7 +259,6 @@ protected: ...@@ -380,7 +259,6 @@ protected:
if (prevDevice == MKLDNN_DEVICE) { if (prevDevice == MKLDNN_DEVICE) {
return true; return true;
} else { } else {
// do not support GPU yet
CHECK_EQ(prevDevice, CPU_DEVICE) << "Only support CPU yet"; CHECK_EQ(prevDevice, CPU_DEVICE) << "Only support CPU yet";
return false; return false;
} }
...@@ -400,20 +278,76 @@ protected: ...@@ -400,20 +278,76 @@ protected:
} }
/** /**
* Set deviceId of this layer. * print info about sizes
*/ */
void setDevice(int id) { deviceId_ = id; } virtual void printSizeInfo() {
VLOG(MKLDNN_SIZES) << getName() << ": bs: " << bs_ << ", ic: " << ic_
<< ", ih: " << ih_ << ", iw: " << iw_ << ", oc: " << oc_
<< ", oh: " << oh_ << ", ow: " << ow_;
}
/**
* print the mkldnn memory format of value
*/
virtual void printValueFormat() {
if (extInVal_) {
VLOG(MKLDNN_FMTS) << extInVal_->getFormat() << " >>> ";
}
if (inVal_) {
VLOG(MKLDNN_FMTS) << inVal_->getFormat() << " >>>";
}
if (outVal_) {
VLOG(MKLDNN_FMTS) << outVal_->getFormat() << " >>> ";
}
if (extOutVal_) {
VLOG(MKLDNN_FMTS) << extOutVal_->getFormat();
}
if (wgtVal_) {
VLOG(MKLDNN_FMTS) << "Weight value format: " << wgtVal_->getFormat();
}
if (biasVal_) {
VLOG(MKLDNN_FMTS) << "Bias value format: " << biasVal_->getFormat();
}
}
/**
* print the mkldnn memory format of grad
*/
virtual void printGradFormat() {
if (extOutGrad_) {
VLOG(MKLDNN_FMTS) << extOutGrad_->getFormat();
}
if (outGrad_) {
VLOG(MKLDNN_FMTS) << outGrad_->getFormat() << " <<< ";
}
if (inGrad_) {
VLOG(MKLDNN_FMTS) << inGrad_->getFormat() << " <<<";
}
if (extInGrad_) {
VLOG(MKLDNN_FMTS) << extInGrad_->getFormat() << " <<< ";
}
if (wgtGrad_) {
VLOG(MKLDNN_FMTS) << "Weight grad format: " << wgtGrad_->getFormat();
}
if (biasGrad_) {
VLOG(MKLDNN_FMTS) << "Bias grad format: " << biasGrad_->getFormat();
}
}
private: private:
/** /**
* clear all grad * clear all grad
*/ */
void clearGrads() { void clearGrads() {
if (output_.grad) {
output_.grad->zeroMem(); output_.grad->zeroMem();
}
for (size_t i = 0; i < outputOtherDevice_.size(); i++) { for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
if (outputOtherDevice_[i].grad) {
outputOtherDevice_[i].grad->zeroMem(); outputOtherDevice_[i].grad->zeroMem();
} }
} }
}
/** /**
* Set deviceId of the params used in this layer. * Set deviceId of the params used in this layer.
...@@ -449,6 +383,19 @@ private: ...@@ -449,6 +383,19 @@ private:
} }
} }
/**
* if have cpu device, share value and grad data with output_
*/
void shareCPUDevice() {
if (outputIsOnlyMKLDNN()) {
return;
}
for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
outputOtherDevice_[i].value = output_.value;
outputOtherDevice_[i].grad = output_.grad;
}
}
/** /**
* Check the cpu device number of outputOtherDevice_. * Check the cpu device number of outputOtherDevice_.
* should have only one at most. * should have only one at most.
......
...@@ -85,8 +85,6 @@ void MKLDNNPoolLayer::resetFwd(std::vector<primitive>& pipeline, ...@@ -85,8 +85,6 @@ void MKLDNNPoolLayer::resetFwd(std::vector<primitive>& pipeline,
resetFwdPD(fwdPD_, in, out); resetFwdPD(fwdPD_, in, out);
resetFwdPipeline(pipeline, fwdPD_, in, out); resetFwdPipeline(pipeline, fwdPD_, in, out);
printValueFormatFlow();
} }
void MKLDNNPoolLayer::resetBwd(std::vector<primitive>& pipeline, void MKLDNNPoolLayer::resetBwd(std::vector<primitive>& pipeline,
...@@ -101,65 +99,22 @@ void MKLDNNPoolLayer::resetBwd(std::vector<primitive>& pipeline, ...@@ -101,65 +99,22 @@ void MKLDNNPoolLayer::resetBwd(std::vector<primitive>& pipeline,
resetBwdPD(pd, in, out); resetBwdPD(pd, in, out);
resetBwdPipeline(pipeline, pd, in, out); resetBwdPipeline(pipeline, pd, in, out);
printGradFormatFlow();
}
void MKLDNNPoolLayer::updateInputData() {
inVal_->setData(getInputValue(0, CPU_DEVICE)->getData());
} }
void MKLDNNPoolLayer::resetFwdBuffers(MKLDNNMatrixPtr& in, void MKLDNNPoolLayer::resetFwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
resetInValue(in); resetInValue(in);
resetOutValue(out);
}
void MKLDNNPoolLayer::resetInValue(MKLDNNMatrixPtr& in) {
if (inputIsOnlyMKLDNN()) {
const MatrixPtr& dnnIn = getInputValue(0);
in = std::dynamic_pointer_cast<MKLDNNMatrix>(dnnIn);
CHECK(in) << "Input should be MKLDNNMatrix";
} else {
CHECK_EQ(getPrev(0)->getDeviceId(), CPU_DEVICE) << "Only support CPU yet";
const MatrixPtr& cpuIn = getInputValue(0, CPU_DEVICE);
in = MKLDNNMatrix::create(
cpuIn, {bs_, ic_, ih_, iw_}, format::nchw, engine_);
}
}
void MKLDNNPoolLayer::resetOutValue(MKLDNNMatrixPtr& out) {
CHECK(inVal_) << "Should reset input value first";
memory::dims outDims = memory::dims{bs_, oc_, oh_, ow_}; memory::dims outDims = memory::dims{bs_, oc_, oh_, ow_};
out = MKLDNNMatrix::create( CHECK(in);
output_.value, outDims, inVal_->getFormat(), engine_); auto outPD =
MKLDNNMatrix::createPrimitiveDesc(outDims, in->getFormat(), engine_);
// create reorder if output value has cpu device and pd do not match resetOutValue(out, outPD);
cpuOutVal_ = nullptr;
cvtOutVal_ = nullptr;
if (!outputIsOnlyMKLDNN()) {
const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).value;
cpuOutVal_ = MKLDNNMatrix::create(cpuOut, outDims, format::nchw, engine_);
if (cpuOutVal_->getPrimitiveDesc() != out->getPrimitiveDesc()) {
out = MKLDNNMatrix::create(nullptr, out->getPrimitiveDesc());
cvtOutVal_ = MKLDNNMatrix::createReorder(out, cpuOutVal_);
CHECK(cvtOutVal_) << "should not be emptry";
} else {
cpuOut->setData(output_.value->getData());
cpuOutVal_ = out;
}
output_.value = std::dynamic_pointer_cast<Matrix>(cpuOutVal_);
return;
}
output_.value = std::dynamic_pointer_cast<Matrix>(outVal_);
} }
void MKLDNNPoolLayer::resetFwdPD(std::shared_ptr<pool_fwd::primitive_desc>& pd, void MKLDNNPoolLayer::resetFwdPD(std::shared_ptr<pool_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr in, MKLDNNMatrixPtr in,
MKLDNNMatrixPtr out) { MKLDNNMatrixPtr out) {
memory::dims inDims = memory::dims{bs_, ic_, ih_, iw_};
memory::dims outDims = memory::dims{bs_, oc_, oh_, ow_};
memory::dims kernels = memory::dims{fh_, fw_}; memory::dims kernels = memory::dims{fh_, fw_};
memory::dims strides = memory::dims{sh_, sw_}; memory::dims strides = memory::dims{sh_, sw_};
memory::dims padL = memory::dims{ph_, pw_}; memory::dims padL = memory::dims{ph_, pw_};
...@@ -194,58 +149,26 @@ void MKLDNNPoolLayer::resetFwdPipeline( ...@@ -194,58 +149,26 @@ void MKLDNNPoolLayer::resetFwdPipeline(
? std::make_shared<pool_fwd>(pool_fwd(*pd, *in, *out, *workspace_)) ? std::make_shared<pool_fwd>(pool_fwd(*pd, *in, *out, *workspace_))
: std::make_shared<pool_fwd>(pool_fwd(*pd, *in, *out)); : std::make_shared<pool_fwd>(pool_fwd(*pd, *in, *out));
pipeline.push_back(*fwd_); pipeline.push_back(*fwd_);
if (cvtOutVal_) {
pipeline.push_back(*cvtOutVal_);
}
} }
void MKLDNNPoolLayer::resetBwdBuffers(MKLDNNMatrixPtr& in, void MKLDNNPoolLayer::resetBwdBuffers(MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
resetOutGrad(out); CHECK(inVal_ && outVal_);
resetOutGrad(out, outVal_->getPrimitiveDesc());
resetInGrad(in); resetInGrad(in, inVal_->getPrimitiveDesc());
}
void MKLDNNPoolLayer::resetOutGrad(MKLDNNMatrixPtr& out) {
cpuOutGrad_ = nullptr;
cvtOutGrad_ = nullptr;
CHECK(outVal_);
if (outputIsOnlyMKLDNN()) {
MKLDNNLayer::resetOutGrad(out, outVal_->getPrimitiveDesc());
} else {
const MatrixPtr& cpuOut = getOutput(CPU_DEVICE).grad;
// always share the same grad data of CPU output
// then the activation can get the right grad from output_.grad
output_.grad->setData(cpuOut->getData());
cpuOutGrad_ = MKLDNNMatrix::create(
cpuOut, memory::dims{bs_, oc_, oh_, ow_}, format::nchw, engine_);
if (cpuOutGrad_->getPrimitiveDesc() != outVal_->getPrimitiveDesc()) {
out = MKLDNNMatrix::create(nullptr, outVal_->getPrimitiveDesc());
cvtOutGrad_ = MKLDNNMatrix::createReorder(cpuOutGrad_, out);
CHECK(cvtOutGrad_) << "should not be emptry";
} else {
out = cpuOutGrad_;
}
}
}
void MKLDNNPoolLayer::resetInGrad(MKLDNNMatrixPtr& in) {
in = nullptr;
if (inputLayers_[0]->getOutput().grad == nullptr) {
return;
}
CHECK(inVal_);
MKLDNNLayer::resetInGrad(in, inVal_->getPrimitiveDesc());
} }
void MKLDNNPoolLayer::resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd, void MKLDNNPoolLayer::resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
pd = nullptr;
if (in == nullptr) {
return;
}
memory::dims kernels = memory::dims{fh_, fw_}; memory::dims kernels = memory::dims{fh_, fw_};
memory::dims strides = memory::dims{sh_, sw_}; memory::dims strides = memory::dims{sh_, sw_};
memory::dims padL = memory::dims{ph_, pw_}; memory::dims padL = memory::dims{ph_, pw_};
memory::dims padR = getPaddingR(); memory::dims padR = getPaddingR();
CHECK(in);
CHECK(out); CHECK(out);
auto bwdDesc = pool_bwd::desc(poolAlgo_, auto bwdDesc = pool_bwd::desc(poolAlgo_,
in->getMemoryDesc(), in->getMemoryDesc(),
...@@ -263,8 +186,8 @@ void MKLDNNPoolLayer::resetBwdPipeline( ...@@ -263,8 +186,8 @@ void MKLDNNPoolLayer::resetBwdPipeline(
std::shared_ptr<pool_bwd::primitive_desc>& pd, std::shared_ptr<pool_bwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& out) { MKLDNNMatrixPtr& out) {
if (cvtOutGrad_) { if (pd == nullptr) {
pipeline.push_back(*cvtOutGrad_); return;
} }
bwdData_ = bwdData_ =
......
...@@ -38,13 +38,6 @@ protected: ...@@ -38,13 +38,6 @@ protected:
// pooling_avg or pooling_max // pooling_avg or pooling_max
mkldnn::algorithm poolAlgo_; mkldnn::algorithm poolAlgo_;
// MKLDNNMatrixPtr which should be created from CPU Device
MKLDNNMatrixPtr cpuOutVal_;
MKLDNNMatrixPtr cpuOutGrad_;
// convert handle between CPU device and MKLDNN device
std::shared_ptr<mkldnn::reorder> cvtOutVal_;
std::shared_ptr<mkldnn::reorder> cvtOutGrad_;
// save forward primitive_desc, which can be used backward // save forward primitive_desc, which can be used backward
std::shared_ptr<pool_fwd::primitive_desc> fwdPD_; std::shared_ptr<pool_fwd::primitive_desc> fwdPD_;
// according to https://github.com/01org/mkl-dnn/blob/master/tests/gtests/ // according to https://github.com/01org/mkl-dnn/blob/master/tests/gtests/
...@@ -74,8 +67,6 @@ public: ...@@ -74,8 +67,6 @@ public:
MKLDNNMatrixPtr& bias, MKLDNNMatrixPtr& bias,
MKLDNNMatrixPtr& out) override; MKLDNNMatrixPtr& out) override;
void updateInputData() override;
void printSizeInfo() override { void printSizeInfo() override {
MKLDNNLayer::printSizeInfo(); MKLDNNLayer::printSizeInfo();
VLOG(MKLDNN_SIZES) << getName() << ": fh: " << fh_ << ", fw: " << fw_ VLOG(MKLDNN_SIZES) << getName() << ": fh: " << fh_ << ", fw: " << fw_
...@@ -90,8 +81,6 @@ protected: ...@@ -90,8 +81,6 @@ protected:
* reset pipeline. * reset pipeline.
*/ */
void resetFwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out); void resetFwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out);
void resetInValue(MKLDNNMatrixPtr& in);
void resetOutValue(MKLDNNMatrixPtr& out);
void resetFwdPD(std::shared_ptr<pool_fwd::primitive_desc>& pd, void resetFwdPD(std::shared_ptr<pool_fwd::primitive_desc>& pd,
MKLDNNMatrixPtr in, MKLDNNMatrixPtr in,
MKLDNNMatrixPtr out); MKLDNNMatrixPtr out);
...@@ -106,8 +95,6 @@ protected: ...@@ -106,8 +95,6 @@ protected:
* reset pipeline. * reset pipeline.
*/ */
void resetBwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out); void resetBwdBuffers(MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& out);
void resetOutGrad(MKLDNNMatrixPtr& out);
void resetInGrad(MKLDNNMatrixPtr& in);
void resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd, void resetBwdPD(std::shared_ptr<pool_bwd::primitive_desc>& pd,
MKLDNNMatrixPtr& in, MKLDNNMatrixPtr& in,
MKLDNNMatrixPtr& out); MKLDNNMatrixPtr& out);
......
...@@ -97,7 +97,7 @@ void MKLDNNTester::randomWgtDatas() { ...@@ -97,7 +97,7 @@ void MKLDNNTester::randomWgtDatas() {
parameters_[REF][i]->randomize(); parameters_[REF][i]->randomize();
dnnValue->copyFrom(*refValue); dnnValue->copyFrom(*refValue);
VLOG(lvl_) << "Random weight data " << parameters_[DNN][i]->getName(); VLOG(MKLDNN_TESTS) << "Random weight " << parameters_[DNN][i]->getName();
printVector(dnnValue); printVector(dnnValue);
} }
} }
...@@ -109,7 +109,7 @@ void MKLDNNTester::randomBotDatas() { ...@@ -109,7 +109,7 @@ void MKLDNNTester::randomBotDatas() {
dataLayers_[REF][i]->getOutputValue()->randomizeUniform(); dataLayers_[REF][i]->getOutputValue()->randomizeUniform();
dataLayers_[DNN][i]->getOutputValue()->copyFrom( dataLayers_[DNN][i]->getOutputValue()->copyFrom(
*(dataLayers_[REF][i]->getOutputValue())); *(dataLayers_[REF][i]->getOutputValue()));
VLOG(lvl_) << "Input " << i << " data:"; VLOG(MKLDNN_TESTS) << "Random Foward, InputValue " << i;
printMatrix(dataLayers_[REF][i]->getOutputValue()); printMatrix(dataLayers_[REF][i]->getOutputValue());
} }
} }
...@@ -118,12 +118,12 @@ void MKLDNNTester::randomTopDiffs() { ...@@ -118,12 +118,12 @@ void MKLDNNTester::randomTopDiffs() {
refLayer_->getOutputGrad()->randomizeUniform(); refLayer_->getOutputGrad()->randomizeUniform();
dnnLayer_->getOutput(CPU_DEVICE) dnnLayer_->getOutput(CPU_DEVICE)
.grad->copyFrom(*(refLayer_->getOutputGrad())); .grad->copyFrom(*(refLayer_->getOutputGrad()));
VLOG(lvl_) << "Random Backward Input, TopDiff: "; VLOG(MKLDNN_TESTS) << "Random Backward, OutputGrad";
printMatrix(refLayer_->getOutputGrad()); printMatrix(refLayer_->getOutputGrad());
} }
void MKLDNNTester::checkForward() { void MKLDNNTester::checkForward() {
VLOG(MKLDNN_ALL) << "Check Forward"; VLOG(MKLDNN_TESTS) << "Check Forward";
printTopDatas(); printTopDatas();
double delta = double delta =
compareMatrix(dnnLayer_->getOutputValue(), refLayer_->getOutputValue()); compareMatrix(dnnLayer_->getOutputValue(), refLayer_->getOutputValue());
...@@ -131,15 +131,15 @@ void MKLDNNTester::checkForward() { ...@@ -131,15 +131,15 @@ void MKLDNNTester::checkForward() {
} }
void MKLDNNTester::checkBackwardData() { void MKLDNNTester::checkBackwardData() {
VLOG(MKLDNN_ALL) << "Check Backward Data"; VLOG(MKLDNN_TESTS) << "Check Backward Data";
// TODO(TJ): uncomment me when batch norm ready // TODO(TJ): uncomment me when batch norm ready
// const bool isBN = dnnLayer_->getType() == "mkldnn_batch_norm"; // const bool isBN = dnnLayer_->getType() == "mkldnn_batch_norm";
for (size_t i = 0; i < dataLayers_[DNN].size(); ++i) { for (size_t i = 0; i < dataLayers_[DNN].size(); ++i) {
const MatrixPtr& dnnDiff = dataLayers_[DNN][i]->getOutputGrad(); const MatrixPtr& dnnDiff = dataLayers_[DNN][i]->getOutputGrad();
const MatrixPtr& refDiff = dataLayers_[REF][i]->getOutputGrad(); const MatrixPtr& refDiff = dataLayers_[REF][i]->getOutputGrad();
VLOG(lvl_) << "Mkldnn Backward Output BotDiff " << i; VLOG(MKLDNN_ALL) << "MKLDNN Backward Result: InputGrad " << i;
printMatrix(dnnDiff); printMatrix(dnnDiff);
VLOG(lvl_) << "Reference Backward Output BotDiff " << i; VLOG(MKLDNN_ALL) << "Reference Backward Result: InputGrad " << i;
printMatrix(refDiff); printMatrix(refDiff);
double delta = compareMatrix(dnnDiff, refDiff); double delta = compareMatrix(dnnDiff, refDiff);
...@@ -153,7 +153,7 @@ void MKLDNNTester::checkBackwardData() { ...@@ -153,7 +153,7 @@ void MKLDNNTester::checkBackwardData() {
} }
void MKLDNNTester::checkBackwardWgts() { void MKLDNNTester::checkBackwardWgts() {
VLOG(MKLDNN_ALL) << "Check Backward Weight"; VLOG(MKLDNN_TESTS) << "Check Backward Weight";
CHECK_EQ(parameters_[DNN].size(), parameters_[REF].size()); CHECK_EQ(parameters_[DNN].size(), parameters_[REF].size());
vector<VectorPtr> dnnWgts; // used to temply save mkldnn weights vector<VectorPtr> dnnWgts; // used to temply save mkldnn weights
saveWgt(parameters_[DNN], dnnWgts); saveWgt(parameters_[DNN], dnnWgts);
...@@ -165,9 +165,11 @@ void MKLDNNTester::checkBackwardWgts() { ...@@ -165,9 +165,11 @@ void MKLDNNTester::checkBackwardWgts() {
for (size_t i = 0; i < parameters_[DNN].size(); ++i) { for (size_t i = 0; i < parameters_[DNN].size(); ++i) {
const VectorPtr& dnn = parameters_[DNN][i]->getBuf(PARAMETER_VALUE); const VectorPtr& dnn = parameters_[DNN][i]->getBuf(PARAMETER_VALUE);
const VectorPtr& ref = parameters_[REF][i]->getBuf(PARAMETER_VALUE); const VectorPtr& ref = parameters_[REF][i]->getBuf(PARAMETER_VALUE);
VLOG(lvl_) << "Mkldnn Output weight " << parameters_[DNN][i]->getName(); VLOG(MKLDNN_ALL) << "MKLDNN Result: weight value"
<< parameters_[DNN][i]->getName();
printVector(dnn); printVector(dnn);
VLOG(lvl_) << "Reference Output weight " << parameters_[REF][i]->getName(); VLOG(MKLDNN_ALL) << "Reference Result: weight value "
<< parameters_[REF][i]->getName();
printVector(ref); printVector(ref);
double delta = compareVector(dnn, ref); double delta = compareVector(dnn, ref);
...@@ -240,7 +242,8 @@ void MKLDNNTester::printTopDatas() { ...@@ -240,7 +242,8 @@ void MKLDNNTester::printTopDatas() {
} }
for (int n = 0; n < NUM; ++n) { for (int n = 0; n < NUM; ++n) {
VLOG(lvl_) << testLayers_[n]->getType() << " forward output TopData: "; VLOG(MKLDNN_ALL) << testLayers_[n]->getType()
<< " Forward Result: OutputValue";
printMatrix(testLayers_[n]->getOutputValue()); printMatrix(testLayers_[n]->getOutputValue());
} }
} }
...@@ -252,7 +255,7 @@ void MKLDNNTester::printMatrix(const MatrixPtr& m) { ...@@ -252,7 +255,7 @@ void MKLDNNTester::printMatrix(const MatrixPtr& m) {
std::ostringstream ostr; std::ostringstream ostr;
m->print(ostr); m->print(ostr);
VLOG(lvl_) << std::endl << ostr.str(); VLOG(MKLDNN_ALL) << std::endl << ostr.str();
} }
void MKLDNNTester::printVector(const VectorPtr& v) { void MKLDNNTester::printVector(const VectorPtr& v) {
...@@ -262,7 +265,7 @@ void MKLDNNTester::printVector(const VectorPtr& v) { ...@@ -262,7 +265,7 @@ void MKLDNNTester::printVector(const VectorPtr& v) {
std::ostringstream ostr; std::ostringstream ostr;
v->print(ostr, v->getSize()); v->print(ostr, v->getSize());
VLOG(lvl_) << std::endl << ostr.str(); VLOG(MKLDNN_ALL) << std::endl << ostr.str();
} }
double MKLDNNTester::getDelta(const real* d1, double MKLDNNTester::getDelta(const real* d1,
...@@ -314,7 +317,7 @@ void MKLDNNTester::runOnce() { ...@@ -314,7 +317,7 @@ void MKLDNNTester::runOnce() {
UpdateCallback updateCallback = [](Parameter* para) { UpdateCallback updateCallback = [](Parameter* para) {
auto& grad = para->getBuf(PARAMETER_GRADIENT); auto& grad = para->getBuf(PARAMETER_GRADIENT);
auto& value = para->getBuf(PARAMETER_VALUE); auto& value = para->getBuf(PARAMETER_VALUE);
real lr = 1e-3; real lr = 1e-2;
value->add(*grad, lr); value->add(*grad, lr);
grad->zeroMem(); grad->zeroMem();
}; };
...@@ -340,10 +343,9 @@ void MKLDNNTester::run(const TestConfig& dnn, ...@@ -340,10 +343,9 @@ void MKLDNNTester::run(const TestConfig& dnn,
size_t batchSize, size_t batchSize,
size_t inputImgH, size_t inputImgH,
size_t inputImgW, size_t inputImgW,
bool printDetails,
size_t iter, size_t iter,
float epsilon, float epsilon) {
bool log,
int level) {
CHECK(dnn.layerConfig.type().compare(0, 7, "mkldnn_") == 0 || CHECK(dnn.layerConfig.type().compare(0, 7, "mkldnn_") == 0 ||
dnn.layerConfig.active_type().compare(0, 7, "mkldnn_") == 0) dnn.layerConfig.active_type().compare(0, 7, "mkldnn_") == 0)
<< "should be MKLDNN layer or MKLDNN activation"; << "should be MKLDNN layer or MKLDNN activation";
...@@ -359,10 +361,9 @@ void MKLDNNTester::run(const TestConfig& dnn, ...@@ -359,10 +361,9 @@ void MKLDNNTester::run(const TestConfig& dnn,
ih_ = inputImgH; ih_ = inputImgH;
iw_ = inputImgW; iw_ = inputImgW;
log_ = printDetails;
iter_ = iter; iter_ = iter;
eps_ = epsilon; eps_ = epsilon;
log_ = log;
lvl_ = level;
// Firstly test mkldnn init from PARAM_FORMAT_ORIGINAL weight // Firstly test mkldnn init from PARAM_FORMAT_ORIGINAL weight
reset(dnn, ref, batchSize); reset(dnn, ref, batchSize);
...@@ -531,9 +532,11 @@ void MKLDNNTester::getOutResult(const std::string& configPath, ...@@ -531,9 +532,11 @@ void MKLDNNTester::getOutResult(const std::string& configPath,
void MKLDNNTester::compareResult(DataOut& ref, DataOut& dnn, float eps) { void MKLDNNTester::compareResult(DataOut& ref, DataOut& dnn, float eps) {
CHECK_EQ(ref.outValues.size(), dnn.outValues.size()); CHECK_EQ(ref.outValues.size(), dnn.outValues.size());
CHECK_EQ(ref.paraValues.size(), dnn.paraValues.size()); CHECK_EQ(ref.paraValues.size(), dnn.paraValues.size());
VLOG(MKLDNN_TESTS) << "compare value size: " << ref.outValues.size();
for (size_t i = 0; i < ref.outValues.size(); i++) { for (size_t i = 0; i < ref.outValues.size(); i++) {
EXPECT_LE(fabs(compareMatrix(ref.outValues[i], dnn.outValues[i])), eps); EXPECT_LE(fabs(compareMatrix(ref.outValues[i], dnn.outValues[i])), eps);
} }
VLOG(MKLDNN_TESTS) << "compare param size: " << ref.outValues.size();
for (size_t i = 0; i < ref.paraValues.size(); i++) { for (size_t i = 0; i < ref.paraValues.size(); i++) {
EXPECT_LE(fabs(compareVector(ref.paraValues[i], dnn.paraValues[i])), eps); EXPECT_LE(fabs(compareVector(ref.paraValues[i], dnn.paraValues[i])), eps);
} }
...@@ -544,9 +547,10 @@ void MKLDNNTester::runBranchesTest(const std::string& configPath, ...@@ -544,9 +547,10 @@ void MKLDNNTester::runBranchesTest(const std::string& configPath,
float eps) { float eps) {
DataIn in; DataIn in;
initArgument(in, configPath, iter); initArgument(in, configPath, iter);
DataOut outCpu, outDnn; DataOut outCpu, outDnn;
VLOG(MKLDNN_TESTS) << "runing cpu network";
getOutResult(configPath, in, outCpu, false, iter); getOutResult(configPath, in, outCpu, false, iter);
VLOG(MKLDNN_TESTS) << "runing mkldnn network";
getOutResult(configPath, in, outDnn, true, iter); getOutResult(configPath, in, outDnn, true, iter);
compareResult(outCpu, outDnn, eps); compareResult(outCpu, outDnn, eps);
......
...@@ -58,8 +58,6 @@ protected: ...@@ -58,8 +58,6 @@ protected:
size_t iter_; size_t iter_;
/// whether to print out the details /// whether to print out the details
bool log_; bool log_;
/// vlog level to print the matrix details datas
int lvl_;
/// epsilon /// epsilon
float eps_; float eps_;
/// input image size, default 1 /// input image size, default 1
...@@ -70,7 +68,6 @@ public: ...@@ -70,7 +68,6 @@ public:
iter_ = iter; iter_ = iter;
eps_ = epsilon; eps_ = epsilon;
log_ = false; log_ = false;
lvl_ = MKLDNN_ALL;
} }
~MKLDNNTester() {} ~MKLDNNTester() {}
...@@ -81,10 +78,9 @@ public: ...@@ -81,10 +78,9 @@ public:
size_t batchSize, size_t batchSize,
size_t inputImgH = 1, size_t inputImgH = 1,
size_t inputImgW = 1, size_t inputImgW = 1,
bool printDetails = false,
size_t iter = 3, size_t iter = 3,
float epsilon = 1e-4, float epsilon = 1e-4);
bool log = false,
int level = MKLDNN_ALL);
static void runBranchesTest(const std::string& configPath, static void runBranchesTest(const std::string& configPath,
size_t iter = 3, size_t iter = 3,
float eps = 1e-4); float eps = 1e-4);
......
# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.trainer_config_helpers import *
settings(batch_size=16)
channels = get_config_arg("channels", int, 2)
def two_fc(input, group_name):
out1 = fc_layer(input=input,
name=group_name+'_fc1',
size=channels,
bias_attr=False,
act=LinearActivation())
out2 = fc_layer(input=input,
name=group_name+'_fc2',
size=channels,
bias_attr=False,
act=LinearActivation())
return out1, out2
data = data_layer(name ="input", size=channels*16*16)
conv = img_conv_layer(input=data,
num_channels=channels,
filter_size=3,
num_filters=channels,
padding=1,
shared_biases=True,
act=LinearActivation())
pool = img_pool_layer(input=conv,
pool_size=3,
stride=2,
padding=1,
pool_type=AvgPooling())
a1, a2 = two_fc(input=pool, group_name='a')
concat = concat_layer(input=[a1, a2])
b1, b2 = two_fc(input=pool, group_name='b')
addto = addto_layer(input=[b1, b2])
outputs([concat, addto])
# Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.trainer_config_helpers import *
settings(batch_size=16)
channels = get_config_arg("channels", int, 2)
def two_pool(input, group_name):
out1 = img_pool_layer(input=input,
name=group_name+'_pool1',
pool_size=3,
stride=2,
padding=0,
pool_type=MaxPooling())
out2 = img_pool_layer(input=input,
name=group_name+'_pool2',
pool_size=5,
stride=2,
padding=1,
pool_type=MaxPooling())
return out1, out2
data = data_layer(name ="input", size=channels*16*16)
conv = img_conv_layer(input=data,
num_channels=channels,
filter_size=3,
num_filters=channels,
padding=1,
shared_biases=True,
act=LinearActivation())
pool = img_pool_layer(input=conv,
pool_size=3,
stride=1,
padding=1,
pool_type=AvgPooling())
a1, a2 = two_pool(input=pool, group_name='a')
concat = concat_layer(input=[a1, a2])
b1, b2 = two_pool(input=pool, group_name='b')
addto = addto_layer(input=[b1, b2])
outputs([concat, addto])
...@@ -250,7 +250,7 @@ TEST(MKLDNNActivation, Activations) { ...@@ -250,7 +250,7 @@ TEST(MKLDNNActivation, Activations) {
DECLARE_string(config_args); DECLARE_string(config_args);
TEST(MKLDNNLayer, branches) { TEST(MKLDNNLayer, branches) {
std::vector<std::string> cases = {"conv"}; std::vector<std::string> cases = {"conv", "pool", "fc"};
for (auto name : cases) { for (auto name : cases) {
std::string config = "./gserver/tests/mkldnn_branches_" + name + ".conf"; std::string config = "./gserver/tests/mkldnn_branches_" + name + ".conf";
for (auto channels : {2, 32}) { for (auto channels : {2, 32}) {
......
...@@ -51,7 +51,10 @@ def test_sparse_non_value_no_seq(setting, filename): ...@@ -51,7 +51,10 @@ def test_sparse_non_value_no_seq(setting, filename):
yield [(i + 1) * (j + 1) for j in xrange(10)] yield [(i + 1) * (j + 1) for j in xrange(10)]
@provider(input_types=[sparse_vector(30000, seq_type=SequenceType.NO_SEQUENCE)]) @provider(input_types=[
sparse_float_vector(
30000, seq_type=SequenceType.NO_SEQUENCE)
])
def test_sparse_value_no_seq(setting, filename): def test_sparse_value_no_seq(setting, filename):
for i in xrange(200): for i in xrange(200):
yield [((i + 1) * (j + 1), float(j) / float(i + 1)) for j in xrange(10)] yield [((i + 1) * (j + 1), float(j) / float(i + 1)) for j in xrange(10)]
......
...@@ -18,7 +18,7 @@ using namespace mkldnn; // NOLINT ...@@ -18,7 +18,7 @@ using namespace mkldnn; // NOLINT
namespace paddle { namespace paddle {
MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, memory::primitive_desc pd) { MKLDNNMatrixPtr MKLDNNMatrix::create(memory::primitive_desc pd, MatrixPtr m) {
memory::desc md = pd.desc(); memory::desc md = pd.desc();
size_t ndims = md.data.ndims; size_t ndims = md.data.ndims;
int* dims = md.data.dims; int* dims = md.data.dims;
...@@ -41,12 +41,12 @@ MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, memory::primitive_desc pd) { ...@@ -41,12 +41,12 @@ MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, memory::primitive_desc pd) {
return std::make_shared<MKLDNNMatrix>(cpuMatrix, pd); return std::make_shared<MKLDNNMatrix>(cpuMatrix, pd);
} }
MKLDNNMatrixPtr MKLDNNMatrix::create(MatrixPtr m, MKLDNNMatrixPtr MKLDNNMatrix::create(memory::dims dims,
memory::dims dims,
memory::format fmt, memory::format fmt,
engine& eg, engine& eg,
MatrixPtr m,
mkldnn::memory::data_type dtype) { mkldnn::memory::data_type dtype) {
return create(m, memory::primitive_desc(memory::desc(dims, dtype, fmt), eg)); return create(createPrimitiveDesc(dims, fmt, eg, dtype), m);
} }
std::shared_ptr<reorder> MKLDNNMatrix::createReorder(const MKLDNNMatrixPtr& src, std::shared_ptr<reorder> MKLDNNMatrix::createReorder(const MKLDNNMatrixPtr& src,
......
...@@ -40,24 +40,37 @@ public: ...@@ -40,24 +40,37 @@ public:
/** /**
* Create MKLDNNMatrix from a MatrixPtr and memory primitive_desc * Create MKLDNNMatrix from a MatrixPtr and memory primitive_desc
*/ */
static MKLDNNMatrixPtr create(MatrixPtr m, mkldnn::memory::primitive_desc pd); static MKLDNNMatrixPtr create(mkldnn::memory::primitive_desc pd,
MatrixPtr m = nullptr);
/** /**
* Create MKLDNNMatrix from a MatrixPtr and memory details info * Create MKLDNNMatrix from a MatrixPtr and memory details info
*/ */
static MKLDNNMatrixPtr create( static MKLDNNMatrixPtr create(
MatrixPtr m,
mkldnn::memory::dims dims, mkldnn::memory::dims dims,
mkldnn::memory::format fmt, mkldnn::memory::format fmt,
mkldnn::engine& eg, mkldnn::engine& eg,
MatrixPtr m = nullptr,
mkldnn::memory::data_type dtype = mkldnn::memory::data_type::f32); mkldnn::memory::data_type dtype = mkldnn::memory::data_type::f32);
/**
* Create primitive descriptor.
* default with f32 dtype
*/
static mkldnn::memory::primitive_desc createPrimitiveDesc(
const mkldnn::memory::dims dims,
const mkldnn::memory::format& fmt,
const mkldnn::engine& eg,
const mkldnn::memory::data_type& dtype = mkldnn::memory::data_type::f32) {
return mkldnn::memory::primitive_desc(memory::desc(dims, dtype, fmt), eg);
}
/** /**
* Create Memory descriptor. * Create Memory descriptor.
* default with any format and f32 dtype * default with any format and f32 dtype
*/ */
static mkldnn::memory::desc createMemoryDesc( static mkldnn::memory::desc createMemoryDesc(
const mkldnn::memory::dims& dims, const mkldnn::memory::dims dims,
const mkldnn::memory::format& fmt = mkldnn::memory::format::any, const mkldnn::memory::format& fmt = mkldnn::memory::format::any,
const mkldnn::memory::data_type& dtype = mkldnn::memory::data_type::f32) { const mkldnn::memory::data_type& dtype = mkldnn::memory::data_type::f32) {
return mkldnn::memory::desc(dims, dtype, fmt); return mkldnn::memory::desc(dims, dtype, fmt);
......
...@@ -69,5 +69,8 @@ information, or not. But the output only shares the LoD with input `Inference`. ...@@ -69,5 +69,8 @@ information, or not. But the output only shares the LoD with input `Inference`.
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_WITHOUT_GRADIENT(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker); REGISTER_OP_WITHOUT_GRADIENT(accuracy, ops::AccuracyOp, ops::AccuracyOpMaker);
REGISTER_OP_CPU_KERNEL(accuracy, REGISTER_OP_CPU_KERNEL(
ops::AccuracyKernel<paddle::platform::CPUPlace, float>); accuracy, ops::AccuracyKernel<paddle::platform::CPUPlace, float>,
ops::AccuracyKernel<paddle::platform::CPUPlace, int>,
ops::AccuracyKernel<paddle::platform::CPUPlace, double>,
ops::AccuracyKernel<paddle::platform::CPUPlace, int64_t>);
...@@ -21,9 +21,9 @@ namespace paddle { ...@@ -21,9 +21,9 @@ namespace paddle {
namespace operators { namespace operators {
using platform::PADDLE_CUDA_NUM_THREADS; using platform::PADDLE_CUDA_NUM_THREADS;
template <int BlockSize> template <typename T, int BlockSize>
__global__ void AccuracyCudaKernel(const int N, const int D, const int* Xdata, __global__ void AccuracyCudaKernel(const int N, const int D, const T* Xdata,
const int* labeldata, float* accuracy) { const T* labeldata, float* accuracy) {
int count = 0; int count = 0;
__shared__ int total[BlockSize]; __shared__ int total[BlockSize];
...@@ -57,8 +57,8 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -57,8 +57,8 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
auto* accuracy = ctx.Output<Tensor>("Accuracy"); auto* accuracy = ctx.Output<Tensor>("Accuracy");
// FIXME(typhoonzero): only support indices currently // FIXME(typhoonzero): only support indices currently
// if add support for output values, how to detect the data type? // if add support for output values, how to detect the data type?
const int* inference_data = inference->data<int>(); const T* inference_data = inference->data<T>();
const int* label_data = label->data<int>(); const T* label_data = label->data<T>();
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace()); float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
size_t num_samples = inference->dims()[0]; size_t num_samples = inference->dims()[0];
...@@ -69,7 +69,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -69,7 +69,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
return; return;
} }
AccuracyCudaKernel<PADDLE_CUDA_NUM_THREADS><<< AccuracyCudaKernel<T, PADDLE_CUDA_NUM_THREADS><<<
1, PADDLE_CUDA_NUM_THREADS, 0, 1, PADDLE_CUDA_NUM_THREADS, 0,
reinterpret_cast<const platform::CUDADeviceContext&>( reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
...@@ -81,5 +81,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> { ...@@ -81,5 +81,7 @@ class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
REGISTER_OP_GPU_KERNEL(accuracy, REGISTER_OP_GPU_KERNEL(accuracy, paddle::operators::AccuracyOpCUDAKernel<float>,
paddle::operators::AccuracyOpCUDAKernel<float>); paddle::operators::AccuracyOpCUDAKernel<double>,
paddle::operators::AccuracyOpCUDAKernel<int>,
paddle::operators::AccuracyOpCUDAKernel<int64_t>);
...@@ -43,10 +43,6 @@ class AdamOp : public framework::OperatorWithKernel { ...@@ -43,10 +43,6 @@ class AdamOp : public framework::OperatorWithKernel {
"Output(Moment1Out) of AdamOp should not be null."); "Output(Moment1Out) of AdamOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Moment2Out"), PADDLE_ENFORCE(ctx->HasOutput("Moment2Out"),
"Output(Moment2Out) of AdamOp should not be null."); "Output(Moment2Out) of AdamOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Beta1PowOut"),
"Output(Beta1PowOut) of AdamOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Beta2PowOut"),
"Output(Beta2PowOut) of AdamOp should not be null.");
auto lr_dims = ctx->GetInputDim("LearningRate"); auto lr_dims = ctx->GetInputDim("LearningRate");
PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1, PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1,
...@@ -72,8 +68,6 @@ class AdamOp : public framework::OperatorWithKernel { ...@@ -72,8 +68,6 @@ class AdamOp : public framework::OperatorWithKernel {
ctx->SetOutputDim("ParamOut", param_dims); ctx->SetOutputDim("ParamOut", param_dims);
ctx->SetOutputDim("Moment1Out", param_dims); ctx->SetOutputDim("Moment1Out", param_dims);
ctx->SetOutputDim("Moment2Out", param_dims); ctx->SetOutputDim("Moment2Out", param_dims);
ctx->SetOutputDim("Beta1PowOut", beta1_pow_dims);
ctx->SetOutputDim("Beta2PowOut", beta2_pow_dims);
} }
}; };
...@@ -92,8 +86,6 @@ class AdamOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -92,8 +86,6 @@ class AdamOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("ParamOut", "(Tensor) Output parameter"); AddOutput("ParamOut", "(Tensor) Output parameter");
AddOutput("Moment1Out", "(Tensor) Output first moment"); AddOutput("Moment1Out", "(Tensor) Output first moment");
AddOutput("Moment2Out", "(Tensor) Output second moment"); AddOutput("Moment2Out", "(Tensor) Output second moment");
AddOutput("Beta1PowOut", "(Tensor) Output beta1 power accumulator");
AddOutput("Beta2PowOut", "(Tensor) Output beta2 power accumulator");
AddAttr<float>("beta1", AddAttr<float>("beta1",
"(float, default 0.9) " "(float, default 0.9) "
...@@ -121,10 +113,8 @@ Adam updates: ...@@ -121,10 +113,8 @@ Adam updates:
moment1_out = beta1 * moment1 + (1 − beta1) * grad moment1_out = beta1 * moment1 + (1 − beta1) * grad
moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad
beta1_pow_out = beta1_pow * beta1
beta2_pow_out = beta2_pow * beta2
learning_rate_t = learning_rate_t * learning_rate_t = learning_rate_t *
sqrt(1 - beta2_pow_out) / (1 - beta1_pow_out) sqrt(1 - beta2_pow) / (1 - beta1_pow)
param_out = param - learning_rate_t * moment1/ (sqrt(moment2) + epsilon) param_out = param - learning_rate_t * moment1/ (sqrt(moment2) + epsilon)
References: References:
......
...@@ -26,14 +26,10 @@ class AdamOpKernel : public framework::OpKernel<T> { ...@@ -26,14 +26,10 @@ class AdamOpKernel : public framework::OpKernel<T> {
auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut"); auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut");
auto moment1_out_tensor = ctx.Output<framework::Tensor>("Moment1Out"); auto moment1_out_tensor = ctx.Output<framework::Tensor>("Moment1Out");
auto moment2_out_tensor = ctx.Output<framework::Tensor>("Moment2Out"); auto moment2_out_tensor = ctx.Output<framework::Tensor>("Moment2Out");
auto beta1_pow_out_tensor = ctx.Output<framework::Tensor>("Beta1PowOut");
auto beta2_pow_out_tensor = ctx.Output<framework::Tensor>("Beta2PowOut");
param_out_tensor->mutable_data<T>(ctx.GetPlace()); param_out_tensor->mutable_data<T>(ctx.GetPlace());
moment1_out_tensor->mutable_data<T>(ctx.GetPlace()); moment1_out_tensor->mutable_data<T>(ctx.GetPlace());
moment2_out_tensor->mutable_data<T>(ctx.GetPlace()); moment2_out_tensor->mutable_data<T>(ctx.GetPlace());
beta1_pow_out_tensor->mutable_data<T>(ctx.GetPlace());
beta2_pow_out_tensor->mutable_data<T>(ctx.GetPlace());
float beta1 = ctx.Attr<float>("beta1"); float beta1 = ctx.Attr<float>("beta1");
float beta2 = ctx.Attr<float>("beta2"); float beta2 = ctx.Attr<float>("beta2");
...@@ -56,18 +52,13 @@ class AdamOpKernel : public framework::OpKernel<T> { ...@@ -56,18 +52,13 @@ class AdamOpKernel : public framework::OpKernel<T> {
auto param_out = framework::EigenVector<T>::Flatten(*param_out_tensor); auto param_out = framework::EigenVector<T>::Flatten(*param_out_tensor);
auto moment1_out = framework::EigenVector<T>::Flatten(*moment1_out_tensor); auto moment1_out = framework::EigenVector<T>::Flatten(*moment1_out_tensor);
auto moment2_out = framework::EigenVector<T>::Flatten(*moment2_out_tensor); auto moment2_out = framework::EigenVector<T>::Flatten(*moment2_out_tensor);
auto beta1_pow_out =
framework::EigenVector<T>::Flatten(*beta1_pow_out_tensor);
auto beta2_pow_out =
framework::EigenVector<T>::Flatten(*beta2_pow_out_tensor);
auto place = ctx.GetEigenDevice<Place>(); auto place = ctx.GetEigenDevice<Place>();
moment1_out.device(place) = beta1 * moment1 + (1 - beta1) * grad; moment1_out.device(place) = beta1 * moment1 + (1 - beta1) * grad;
moment2_out.device(place) = beta2 * moment2 + (1 - beta2) * grad.square(); moment2_out.device(place) = beta2 * moment2 + (1 - beta2) * grad.square();
beta1_pow_out.device(place) = beta1_pow * beta1;
beta2_pow_out.device(place) = beta2_pow * beta2;
// All of these are tensors of 1 element // All of these are tensors of 1 element
auto lr_t = lr * (1 - beta2_pow_out).sqrt() / (1 - beta1_pow_out); auto lr_t = lr * (1 - beta2_pow).sqrt() / (1 - beta1_pow);
// Eigen does not support automatic broadcast // Eigen does not support automatic broadcast
// Get dimensions of moment vector to broadcast lr_t // Get dimensions of moment vector to broadcast lr_t
Eigen::DSizes<int, 1> m_dsize(moment1_out_tensor->numel()); Eigen::DSizes<int, 1> m_dsize(moment1_out_tensor->numel());
......
...@@ -41,8 +41,6 @@ class AdamaxOp : public framework::OperatorWithKernel { ...@@ -41,8 +41,6 @@ class AdamaxOp : public framework::OperatorWithKernel {
"Output(MomentOut) of AdamaxOp should not be null."); "Output(MomentOut) of AdamaxOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("InfNormOut"), PADDLE_ENFORCE(ctx->HasOutput("InfNormOut"),
"Output(InfNormOut) of AdamaxOp should not be null."); "Output(InfNormOut) of AdamaxOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Beta1PowOut"),
"Output(Beta1PowOut) of AdamaxOp should not be null.");
auto lr_dims = ctx->GetInputDim("LearningRate"); auto lr_dims = ctx->GetInputDim("LearningRate");
PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1, PADDLE_ENFORCE_EQ(framework::product(lr_dims), 1,
...@@ -64,7 +62,6 @@ class AdamaxOp : public framework::OperatorWithKernel { ...@@ -64,7 +62,6 @@ class AdamaxOp : public framework::OperatorWithKernel {
ctx->SetOutputDim("ParamOut", param_dims); ctx->SetOutputDim("ParamOut", param_dims);
ctx->SetOutputDim("MomentOut", param_dims); ctx->SetOutputDim("MomentOut", param_dims);
ctx->SetOutputDim("InfNormOut", param_dims); ctx->SetOutputDim("InfNormOut", param_dims);
ctx->SetOutputDim("Beta1PowOut", beta1_pow_dims);
} }
}; };
...@@ -86,7 +83,6 @@ class AdamaxOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -86,7 +83,6 @@ class AdamaxOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("InfNormOut", AddOutput("InfNormOut",
"(Tensor) " "(Tensor) "
"Output exponentially weighted infinity norm"); "Output exponentially weighted infinity norm");
AddOutput("Beta1PowOut", "(Tensor) Output beta1 power accumulator");
AddAttr<float>("beta1", AddAttr<float>("beta1",
"(float, default 0.9) " "(float, default 0.9) "
...@@ -113,8 +109,7 @@ Adamax updates: ...@@ -113,8 +109,7 @@ Adamax updates:
moment_out = beta1 * moment + (1 - beta1) * grad moment_out = beta1 * moment + (1 - beta1) * grad
inf_norm_out = max(beta2 * inf_norm + epsilon, abs(grad)) inf_norm_out = max(beta2 * inf_norm + epsilon, abs(grad))
beta1_pow_out = beta1_pow * beta1 learning_rate_t = learning_rate/(1 - beta1_pow)
learning_rate_t = learning_rate/(1 - beta1_pow_out)
param_out = param - learning_rate_t * moment_out/inf_norm_out param_out = param - learning_rate_t * moment_out/inf_norm_out
The original paper does not have an epsilon attribute. The original paper does not have an epsilon attribute.
......
...@@ -26,12 +26,10 @@ class AdamaxOpKernel : public framework::OpKernel<T> { ...@@ -26,12 +26,10 @@ class AdamaxOpKernel : public framework::OpKernel<T> {
auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut"); auto param_out_tensor = ctx.Output<framework::Tensor>("ParamOut");
auto moment_out_tensor = ctx.Output<framework::Tensor>("MomentOut"); auto moment_out_tensor = ctx.Output<framework::Tensor>("MomentOut");
auto inf_norm_out_tensor = ctx.Output<framework::Tensor>("InfNormOut"); auto inf_norm_out_tensor = ctx.Output<framework::Tensor>("InfNormOut");
auto beta1_pow_out_tensor = ctx.Output<framework::Tensor>("Beta1PowOut");
param_out_tensor->mutable_data<T>(ctx.GetPlace()); param_out_tensor->mutable_data<T>(ctx.GetPlace());
moment_out_tensor->mutable_data<T>(ctx.GetPlace()); moment_out_tensor->mutable_data<T>(ctx.GetPlace());
inf_norm_out_tensor->mutable_data<T>(ctx.GetPlace()); inf_norm_out_tensor->mutable_data<T>(ctx.GetPlace());
beta1_pow_out_tensor->mutable_data<T>(ctx.GetPlace());
float beta1 = ctx.Attr<float>("beta1"); float beta1 = ctx.Attr<float>("beta1");
float beta2 = ctx.Attr<float>("beta2"); float beta2 = ctx.Attr<float>("beta2");
...@@ -53,15 +51,12 @@ class AdamaxOpKernel : public framework::OpKernel<T> { ...@@ -53,15 +51,12 @@ class AdamaxOpKernel : public framework::OpKernel<T> {
auto moment_out = framework::EigenVector<T>::Flatten(*moment_out_tensor); auto moment_out = framework::EigenVector<T>::Flatten(*moment_out_tensor);
auto inf_norm_out = auto inf_norm_out =
framework::EigenVector<T>::Flatten(*inf_norm_out_tensor); framework::EigenVector<T>::Flatten(*inf_norm_out_tensor);
auto beta1_pow_out =
framework::EigenVector<T>::Flatten(*beta1_pow_out_tensor);
auto place = ctx.GetEigenDevice<Place>(); auto place = ctx.GetEigenDevice<Place>();
moment_out.device(place) = beta1 * moment + (1 - beta1) * grad; moment_out.device(place) = beta1 * moment + (1 - beta1) * grad;
inf_norm_out.device(place) = inf_norm_out.device(place) =
grad.abs().cwiseMax((beta2 * inf_norm) + epsilon); grad.abs().cwiseMax((beta2 * inf_norm) + epsilon);
beta1_pow_out.device(place) = beta1_pow * beta1; auto lr_t = lr / (1 - beta1_pow);
auto lr_t = lr / (1 - beta1_pow_out);
Eigen::DSizes<int, 1> m_dsize(moment_out_tensor->numel()); Eigen::DSizes<int, 1> m_dsize(moment_out_tensor->numel());
param_out.device(place) = param_out.device(place) =
param - lr_t.broadcast(m_dsize) * (moment_out / inf_norm_out); param - lr_t.broadcast(m_dsize) * (moment_out / inf_norm_out);
......
...@@ -27,8 +27,8 @@ class ClipOp : public framework::OperatorWithKernel { ...@@ -27,8 +27,8 @@ class ClipOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE(ctx->HasOutput("Out"), PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of ClipOp should not be null."); "Output(Out) of ClipOp should not be null.");
auto x_dims = ctx->GetInputDim("X"); auto x_dims = ctx->GetInputDim("X");
auto max = Attr<float>("max"); auto max = ctx->Attrs().Get<float>("max");
auto min = Attr<float>("min"); auto min = ctx->Attrs().Get<float>("min");
PADDLE_ENFORCE_LT(min, max, "max should be greater than min."); PADDLE_ENFORCE_LT(min, max, "max should be greater than min.");
ctx->SetOutputDim("Out", x_dims); ctx->SetOutputDim("Out", x_dims);
ctx->ShareLoD("X", /*->*/ "Out"); ctx->ShareLoD("X", /*->*/ "Out");
......
...@@ -108,17 +108,17 @@ class GemmConv2DKernel : public framework::OpKernel<T> { ...@@ -108,17 +108,17 @@ class GemmConv2DKernel : public framework::OpKernel<T> {
int in_step = input_channels / groups; int in_step = input_channels / groups;
int out_step = output_channels / groups; int out_step = output_channels / groups;
for (int i = 0; i < batch_size; i++) { for (int i = 0; i < batch_size; i++) {
Tensor in_batch = input->Slice<T>(i, i + 1).Resize(input_shape); Tensor in_batch = input->Slice(i, i + 1).Resize(input_shape);
Tensor out_batch = output->Slice<T>(i, i + 1).Resize(output_matrix_shape); Tensor out_batch = output->Slice(i, i + 1).Resize(output_matrix_shape);
for (int g = 0; g < groups; g++) { for (int g = 0; g < groups; g++) {
// im2col // im2col
Tensor in_slice = in_batch.Slice<T>(g * in_step, (g + 1) * in_step); Tensor in_slice = in_batch.Slice(g * in_step, (g + 1) * in_step);
im2col(context.device_context(), in_slice, col, strides[0], strides[1], im2col(context.device_context(), in_slice, col, strides[0], strides[1],
paddings[0], paddings[1]); paddings[0], paddings[1]);
// gemm // gemm
Tensor out_slice = out_batch.Slice<T>(g * out_step, (g + 1) * out_step); Tensor out_slice = out_batch.Slice(g * out_step, (g + 1) * out_step);
Tensor filter_slice = filter.Slice<T>(g * out_step, (g + 1) * out_step); Tensor filter_slice = filter.Slice(g * out_step, (g + 1) * out_step);
math::matmul<Place, T>(context.device_context(), filter_slice, false, math::matmul<Place, T>(context.device_context(), filter_slice, false,
col_matrix, false, T(1.0), &out_slice, T(0.0)); col_matrix, false, T(1.0), &out_slice, T(0.0));
} }
...@@ -198,22 +198,20 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> { ...@@ -198,22 +198,20 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> {
for (int i = 0; i < batch_size; i++) { for (int i = 0; i < batch_size; i++) {
Tensor out_grad_batch = Tensor out_grad_batch =
output_grad->Slice<T>(i, i + 1).Resize(output_matrix_shape); output_grad->Slice(i, i + 1).Resize(output_matrix_shape);
Tensor in_grad_batch = Tensor in_grad_batch = input_grad->Slice(i, i + 1).Resize(input_shape);
input_grad->Slice<T>(i, i + 1).Resize(input_shape);
for (int g = 0; g < groups; g++) { for (int g = 0; g < groups; g++) {
// gemm // gemm
Tensor out_grad_slice = Tensor out_grad_slice =
out_grad_batch.Slice<T>(g * out_step, (g + 1) * out_step); out_grad_batch.Slice(g * out_step, (g + 1) * out_step);
Tensor filter_slice = Tensor filter_slice = filter.Slice(g * out_step, (g + 1) * out_step);
filter.Slice<T>(g * out_step, (g + 1) * out_step);
math::matmul<Place, T>(context.device_context(), filter_slice, true, math::matmul<Place, T>(context.device_context(), filter_slice, true,
out_grad_slice, false, T(1.0), &col_matrix, out_grad_slice, false, T(1.0), &col_matrix,
T(0.0)); T(0.0));
// col2im // col2im
Tensor in_grad_slice = Tensor in_grad_slice =
in_grad_batch.Slice<T>(g * in_step, (g + 1) * in_step); in_grad_batch.Slice(g * in_step, (g + 1) * in_step);
col2im(context.device_context(), in_grad_slice, col, strides[0], col2im(context.device_context(), in_grad_slice, col, strides[0],
strides[1], paddings[0], paddings[1]); strides[1], paddings[0], paddings[1]);
} }
...@@ -229,19 +227,19 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> { ...@@ -229,19 +227,19 @@ class GemmConvGrad2DKernel : public framework::OpKernel<T> {
for (int i = 0; i < batch_size; i++) { for (int i = 0; i < batch_size; i++) {
Tensor out_grad_batch = Tensor out_grad_batch =
output_grad->Slice<T>(i, i + 1).Resize(output_matrix_shape); output_grad->Slice(i, i + 1).Resize(output_matrix_shape);
Tensor in_batch = input->Slice<T>(i, i + 1).Resize(input_shape); Tensor in_batch = input->Slice(i, i + 1).Resize(input_shape);
for (int g = 0; g < groups; g++) { for (int g = 0; g < groups; g++) {
// im2col // im2col
Tensor out_grad_slice = Tensor out_grad_slice =
out_grad_batch.Slice<T>(g * out_step, (g + 1) * out_step); out_grad_batch.Slice(g * out_step, (g + 1) * out_step);
Tensor in_slice = in_batch.Slice<T>(g * in_step, (g + 1) * in_step); Tensor in_slice = in_batch.Slice(g * in_step, (g + 1) * in_step);
im2col(context.device_context(), in_slice, col, strides[0], im2col(context.device_context(), in_slice, col, strides[0],
strides[1], paddings[0], paddings[1]); strides[1], paddings[0], paddings[1]);
// gemm // gemm
Tensor filter_grad_slice = Tensor filter_grad_slice =
filter_grad_.Slice<T>(g * out_step, (g + 1) * out_step); filter_grad_.Slice(g * out_step, (g + 1) * out_step);
math::matmul<Place, T>(context.device_context(), out_grad_slice, math::matmul<Place, T>(context.device_context(), out_grad_slice,
false, col_matrix, true, T(1.0), false, col_matrix, true, T(1.0),
&filter_grad_slice, T(1.0)); &filter_grad_slice, T(1.0));
......
...@@ -23,6 +23,7 @@ using framework::Scope; ...@@ -23,6 +23,7 @@ using framework::Scope;
using framework::TensorArray; using framework::TensorArray;
using framework::LoDTensor; using framework::LoDTensor;
using framework::Variable; using framework::Variable;
using framework::OperatorBase;
using framework::DySeqMetaBatch; using framework::DySeqMetaBatch;
namespace detail { namespace detail {
...@@ -43,72 +44,72 @@ inline void CreateVariables(Scope& scope, ...@@ -43,72 +44,72 @@ inline void CreateVariables(Scope& scope,
* be reordered, but the RNN op should not change the `boot_state` as an input * be reordered, but the RNN op should not change the `boot_state` as an input
* variable's content. * variable's content.
*/ */
template <typename T> inline void ReorderInitialState(const DySeqMetaBatch& metas,
inline void ReorderBootState(const DySeqMetaBatch& metas,
const LoDTensor& boot_state, LoDTensor* tensor, const LoDTensor& boot_state, LoDTensor* tensor,
const platform::Place& dst_place) { const platform::Place& dst_place) {
for (size_t seq_id = 0; seq_id < metas.size(); seq_id++) { for (size_t seq_id = 0; seq_id < metas.size(); seq_id++) {
auto slice = tensor->Slice<T>(seq_id, seq_id + 1); auto slice = tensor->Slice(seq_id, seq_id + 1);
auto boot_slice = auto boot_slice =
boot_state.Slice<T>(metas[seq_id].ori_idx, metas[seq_id].ori_idx + 1); boot_state.Slice(metas[seq_id].ori_idx, metas[seq_id].ori_idx + 1);
// TODO(superjom) pass in device context as an argument // TODO(superjom) pass in device context as an argument
slice.template CopyFrom<T>(boot_slice, dst_place, slice.CopyFrom(boot_slice, dst_place, platform::CPUDeviceContext());
platform::CPUDeviceContext());
} }
} }
} // namespace detail inline void RestoreInitialState(const DySeqMetaBatch& metas,
const LoDTensor& tensor, LoDTensor* boot_state,
class DynamicRecurrentOpProtoAndCheckerMaker const platform::Place& dst_place) {
: public framework::OpProtoAndCheckerMaker { for (size_t seq_id = 0; seq_id < metas.size(); seq_id++) {
public: auto slice = tensor.Slice(seq_id, seq_id + 1);
DynamicRecurrentOpProtoAndCheckerMaker(framework::OpProto* proto, auto boot_slice =
framework::OpAttrChecker* op_checker) boot_state->Slice(metas[seq_id].ori_idx, metas[seq_id].ori_idx + 1);
: OpProtoAndCheckerMaker(proto, op_checker) { boot_slice.CopyFrom(slice, dst_place, platform::CPUDeviceContext());
const auto& name = DynamicRecurrentOp::kArgName;
// inputs and outputs stored in proto
AddInput(name.inlinks,
"the inputs that need to be segmented for each step.")
.AsDuplicable();
AddInput(name.boot_memories, "variables to initialize memories.")
.AsDuplicable();
AddOutput(name.outlinks, "the outputs that need to concated for all steps.")
.AsDuplicable();
AddOutput(name.step_scopes, "step scopes");
// Attributes stored in AttributeMap
AddAttr<std::vector<std::string>>(name.pre_memories,
"names of pre-memories");
AddAttr<std::vector<std::string>>(name.memories, "names of memories");
AddComment("This is a RNN operator for varience-length sequences.");
} }
}; }
void DynamicRecurrentOp::Run(const Scope& scope, } // namespace detail
const platform::DeviceContext& dev_ctx) const {
cache_.Init(kArgName, *this, scope, &arg_); // Implementation for forward propagation.
template <>
void RNNAlgorithm::Run<RNNAlgorithm::ComputeMode::kForward>(
const framework::Scope& scope, const framework::OperatorBase& op,
const platform::DeviceContext& dev_ctx) {
SetComputeMode(ComputeMode::kForward);
cache_.Init(kArgNames[mode_], op, scope, &dev_ctx, &arg_);
SplitInputs(); SplitInputs();
CreateScopes(); CreateScopes();
WriteStepInputs(); WriteStepInputs();
InitStates(); InitStates();
WriteStepOutputs(); WriteStepOutputs();
RunSteps();
ConcatOutputs();
}
// call stepnet in all the time steps // Implementation for backward propagation.
for (size_t step = 0; step < cache_.num_steps; step++) { template <>
auto& step_scope = cache_.GetScope(step); void RNNAlgorithm::Run<RNNAlgorithm::ComputeMode::kBackward>(
stepnet_->Run(step_scope, dev_ctx); const framework::Scope& scope, const framework::OperatorBase& op,
const platform::DeviceContext& dev_ctx) {
SetComputeMode(ComputeMode::kBackward);
cache_.Init(kArgNames[mode_], op, scope, &dev_ctx, &arg_);
SplitInputs();
WriteStepInputs();
InitStates();
WriteStepOutputs();
RunSteps();
// copy boot-states' gradients back.
for (const auto& state : arg_.states) {
ExportInitialStateGradient(state);
} }
ConcatOutputs(); ConcatOutputs();
} }
void DynamicRecurrentOp::SplitInputs() const { void RNNAlgorithm::SplitInputs() {
// TODO(superjom) make level a config // TODO(superjom) make level a config
// TODO(superjom) check all the inputs has the same LoD // TODO(superjom) check all the inputs has the same LoD
int level = 0; int level = 0;
for (const auto& item : cache_.inlinks) { for (const auto& item : cache_.inputs) {
const auto& var = item.second; const auto& var = item.second;
const auto& tensor = var->Get<LoDTensor>(); const auto& tensor = var->Get<LoDTensor>();
TensorArray& ta = step_inputs_[item.first]; TensorArray& ta = step_inputs_[item.first];
...@@ -125,8 +126,8 @@ void DynamicRecurrentOp::SplitInputs() const { ...@@ -125,8 +126,8 @@ void DynamicRecurrentOp::SplitInputs() const {
} }
} }
void DynamicRecurrentOp::WriteStepInputs() const { void RNNAlgorithm::WriteStepInputs() {
for (const auto& item : cache_.inlinks) { for (const auto& item : cache_.inputs) {
auto ta_it = step_inputs_.find(item.first); auto ta_it = step_inputs_.find(item.first);
PADDLE_ENFORCE(ta_it != step_inputs_.end(), PADDLE_ENFORCE(ta_it != step_inputs_.end(),
"step_inputs_ not compatible with memory set"); "step_inputs_ not compatible with memory set");
...@@ -138,20 +139,20 @@ void DynamicRecurrentOp::WriteStepInputs() const { ...@@ -138,20 +139,20 @@ void DynamicRecurrentOp::WriteStepInputs() const {
if (var == nullptr) { if (var == nullptr) {
var = step_scope.Var(item.first); var = step_scope.Var(item.first);
} }
var->GetMutable<LoDTensor>()->ShareDataWith<value_type>(tensor); var->GetMutable<LoDTensor>()->ShareDataWith(tensor);
} }
} }
} }
void DynamicRecurrentOp::WriteStepOutputs() const { void RNNAlgorithm::WriteStepOutputs() {
// initialize step outputs // initialize step outputs
for (const auto& item : cache_.outlinks) { for (const auto& item : cache_.outputs) {
step_outputs_.emplace(item.first, TensorArray()); step_outputs_.emplace(item.first, TensorArray());
} }
PADDLE_ENFORCE_GT(step_outputs_.size(), 0UL); PADDLE_ENFORCE_GT(step_outputs_.size(), 0UL);
} }
void DynamicRecurrentOp::CreateScopes() const { void RNNAlgorithm::CreateScopes() {
PADDLE_ENFORCE_GT(cache_.num_steps, 0); PADDLE_ENFORCE_GT(cache_.num_steps, 0);
// resize scopes // resize scopes
size_t num_scopes_need_create = cache_.num_steps - cache_.scopes->size(); size_t num_scopes_need_create = cache_.num_steps - cache_.scopes->size();
...@@ -160,19 +161,19 @@ void DynamicRecurrentOp::CreateScopes() const { ...@@ -160,19 +161,19 @@ void DynamicRecurrentOp::CreateScopes() const {
} }
// init temporary inputs // init temporary inputs
PADDLE_ENFORCE_NOT_NULL(stepnet_, "stepnet should be set first"); PADDLE_ENFORCE_NOT_NULL(step_unit_, "stepnet should be set first");
std::vector<std::string> memories; std::vector<std::string> states;
std::vector<std::string> pre_memories; std::vector<std::string> ex_states;
std::vector<std::string> stepnet_outputs; std::vector<std::string> step_unit_outputs;
std::transform(arg_.memories.begin(), arg_.memories.end(), std::transform(arg_.states.begin(), arg_.states.end(),
std::back_inserter(memories), std::back_inserter(states),
[](const rnn::MemoryAttr& m) { return m.var; }); [](const rnn::StateAttr& m) { return m.var; });
std::transform(arg_.memories.begin(), arg_.memories.end(), std::transform(arg_.states.begin(), arg_.states.end(),
std::back_inserter(pre_memories), std::back_inserter(ex_states),
[](const rnn::MemoryAttr& m) { return m.pre_var; }); [](const rnn::StateAttr& m) { return m.pre_var; });
for (const auto& item : stepnet_->Outputs()) { for (const auto& item : step_unit_->Outputs()) {
for (const auto& var : item.second) { for (const auto& var : item.second) {
stepnet_outputs.push_back(var); step_unit_outputs.push_back(var);
} }
} }
...@@ -180,13 +181,13 @@ void DynamicRecurrentOp::CreateScopes() const { ...@@ -180,13 +181,13 @@ void DynamicRecurrentOp::CreateScopes() const {
auto& scope = cache_.GetScope(step); auto& scope = cache_.GetScope(step);
detail::CreateVariables(scope, arg_.inlinks); detail::CreateVariables(scope, arg_.inlinks);
detail::CreateVariables(scope, arg_.outlinks); detail::CreateVariables(scope, arg_.outlinks);
detail::CreateVariables(scope, memories); detail::CreateVariables(scope, states);
detail::CreateVariables(scope, pre_memories); detail::CreateVariables(scope, ex_states);
detail::CreateVariables(scope, stepnet_outputs); detail::CreateVariables(scope, step_unit_outputs);
} }
} }
void DynamicRecurrentOp::ConcatOutputs() const { void RNNAlgorithm::ConcatOutputs() {
// TODO(superjom) transform this to a config // TODO(superjom) transform this to a config
int level = 0; int level = 0;
for (size_t step = 0; step < cache_.num_steps; step++) { for (size_t step = 0; step < cache_.num_steps; step++) {
...@@ -199,31 +200,45 @@ void DynamicRecurrentOp::ConcatOutputs() const { ...@@ -199,31 +200,45 @@ void DynamicRecurrentOp::ConcatOutputs() const {
item.second.WriteShared(step, *tensor); item.second.WriteShared(step, *tensor);
} }
} }
// the inlinks' lods should be the same, so randomly get one lod. // the inputs' lods should be the same, so randomly get one lod.
const auto& some_lod = const auto& some_lod =
cache_.scope->FindVar(arg_.inlinks.front())->Get<LoDTensor>().lod(); cache_.scope->FindVar(arg_.inlinks.front())->Get<LoDTensor>().lod();
const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()]; const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()];
for (auto& item : step_outputs_) { for (auto& item : step_outputs_) {
auto tensor = item.second.Pack(level, some_meta, some_lod); auto tensor = item.second.Pack(level, some_meta, some_lod);
auto* output = cache_.outlinks[item.first]->GetMutable<LoDTensor>(); auto* output = cache_.outputs[item.first]->GetMutable<LoDTensor>();
const_cast<LoDTensor*>(output)->ShareDataWith<value_type>(tensor); const_cast<LoDTensor*>(output)->ShareDataWith(tensor);
}
}
void RNNAlgorithm::RunSteps() {
if (IsBackward()) {
// call stepnet in all the time steps reversely
for (int step = cache_.num_steps - 1; step >= 0; step--) {
auto& step_scope = cache_.GetScope(step);
step_unit_->Run(step_scope, *cache_.dev_ctx);
}
} else {
for (size_t step = 0; step < cache_.num_steps; step++) {
auto& step_scope = cache_.GetScope(step);
step_unit_->Run(step_scope, *cache_.dev_ctx);
}
} }
} }
void DynamicRecurrentOp::InitStates() const { void RNNAlgorithm::InitStates() {
for (size_t step = 0; step < cache_.num_steps; step++) { for (size_t step = 0; step < cache_.num_steps; step++) {
for (const auto& memory : arg_.memories) { for (const auto& state : arg_.states) {
CreateState(memory, step); CreateState(state, step);
LinkState(memory, step); LinkState(state, step);
} }
} }
} }
void DynamicRecurrentOp::CreateState(const rnn::MemoryAttr& memory, void RNNAlgorithm::CreateState(const rnn::StateAttr& state_attr, size_t step) {
size_t step) const {
auto& scope = cache_.GetScope(step); auto& scope = cache_.GetScope(step);
auto& state = *cache_.GetTensor(scope, memory.var); auto& state = *cache_.GetTensor(scope, state_attr.var);
auto& boot_state = *cache_.GetTensor(*cache_.scope, memory.boot_var); auto& boot_state = *cache_.GetTensor(*cache_.scope, state_attr.boot_var);
size_t num_instances = size_t num_instances =
step_inputs_[arg_.inlinks.front()].Read(step).dims()[0]; step_inputs_[arg_.inlinks.front()].Read(step).dims()[0];
...@@ -232,55 +247,78 @@ void DynamicRecurrentOp::CreateState(const rnn::MemoryAttr& memory, ...@@ -232,55 +247,78 @@ void DynamicRecurrentOp::CreateState(const rnn::MemoryAttr& memory,
state.Resize(dims); state.Resize(dims);
state.mutable_data<value_type>(platform::CPUPlace()); state.mutable_data<value_type>(platform::CPUPlace());
states_[memory.var].WriteShared(step, state); states_[state_attr.var].WriteShared(step, state);
} }
void DynamicRecurrentOp::LinkState(const rnn::MemoryAttr& memory, void RNNAlgorithm::LinkState(const rnn::StateAttr& state, size_t step) {
size_t step) const {
auto& scope = cache_.GetScope(step); auto& scope = cache_.GetScope(step);
auto& state_pre = *cache_.GetTensor(scope, memory.pre_var); auto& state_pre = *cache_.GetTensor(scope, state.pre_var);
// process the first state's boot-state(the 0-step in forward mode or the
// last step in backward mode)
// Only forward mode need to link the boot-state to the `pre-state` in first
// time step. In backward mode, need to copy the gradient of `pre-state` in
// first time step to the gradient of `boot-state`.
if (step == 0 && IsForward()) {
LinkInitialState(state);
} else {
size_t num_instances =
step_inputs_[arg_.inlinks.front()].Read(step).dims()[0];
auto* pre_state = cache_.GetTensor(cache_.GetScope(step - 1), state.var);
// shink and share from previous state
auto shrinked_pre_state = pre_state->Slice(0, num_instances);
state_pre.ShareDataWith(shrinked_pre_state);
}
}
void RNNAlgorithm::LinkInitialState(const rnn::StateAttr& state) {
// all the step_inputs' metas should be the same, just randomly select one // all the step_inputs' metas should be the same, just randomly select one
// and get the dyseq meta. // and get the dyseq meta.
const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()]; const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()];
size_t num_instances = auto& scope = cache_.GetScope(0);
step_inputs_[arg_.inlinks.front()].Read(step).dims()[0]; auto& state_pre = *cache_.GetTensor(scope, state.pre_var);
auto* pre_state = cache_.GetTensor(*cache_.scope, state.boot_var);
LoDTensor* pre_state{nullptr};
if (step == 0) {
pre_state = cache_.GetTensor(*cache_.scope, memory.boot_var);
pre_state->mutable_data<float>(platform::CPUPlace()); pre_state->mutable_data<float>(platform::CPUPlace());
// allocate memory // allocate state
state_pre.Resize(pre_state->dims()); state_pre.Resize(pre_state->dims());
state_pre.mutable_data<value_type>(platform::CPUPlace()); state_pre.mutable_data<value_type>(platform::CPUPlace());
detail::ReorderBootState<value_type>(some_meta, *pre_state, &state_pre, detail::ReorderInitialState(some_meta, *pre_state, &state_pre,
pre_state->place()); pre_state->place());
} else { }
pre_state = cache_.GetTensor(cache_.GetScope(step - 1), memory.var);
}
// shink and share from previous state void RNNAlgorithm::ExportInitialStateGradient(const rnn::StateAttr& state) {
auto shrinked_pre_state = pre_state->Slice<value_type>(0, num_instances); // all the step_inputs' metas should be the same, just randomly select one
state_pre.ShareDataWith<value_type>(shrinked_pre_state); // and get the dyseq meta.
const auto& some_meta = dy_seq_metas_[arg_.inlinks.front()];
auto& scope = cache_.GetScope(0);
auto& state_pre = *cache_.GetTensor(scope, state.pre_var);
auto& pre_state = *cache_.GetTensor(*cache_.scope, state.boot_var);
pre_state.Resize(state_pre.dims());
detail::RestoreInitialState(some_meta, state_pre, &pre_state,
pre_state.place());
} }
void DynamicRecurrentOp::ArgCache::Init( void RNNAlgorithm::ArgCache::Init(const rnn::ArgumentName& name,
const rnn::ArgumentName& name, const paddle::framework::OperatorBase& op, const paddle::framework::OperatorBase& op,
const paddle::framework::Scope& scope, rnn::Argument* arg) { const paddle::framework::Scope& scope,
platform::DeviceContext const* dev_ctx,
rnn::Argument* arg) {
this->scope = &scope; this->scope = &scope;
InitArgument(name, op, arg); InitArgument(name, op, arg);
CacheScopes(scope, *arg); CacheScopes(scope, *arg);
CacheInlinks(scope, arg->inlinks); CacheInlinks(scope, arg->inlinks);
CacheOutlinks(scope, arg->outlinks); CacheOutlinks(scope, arg->outlinks);
this->dev_ctx = dev_ctx;
} }
void DynamicRecurrentOp::ArgCache::InitArgument(const rnn::ArgumentName& name, void RNNAlgorithm::ArgCache::InitArgument(const rnn::ArgumentName& name,
const OperatorBase& op, const OperatorBase& op,
rnn::Argument* arg) { rnn::Argument* arg) {
rnn::InitArgument(name, arg, op, false /*is_grad*/); rnn::InitArgument(name, arg, op, false /*is_grad*/);
} }
void DynamicRecurrentOp::ArgCache::CacheScopes(const Scope& scope, void RNNAlgorithm::ArgCache::CacheScopes(const Scope& scope,
const rnn::Argument& arg) { const rnn::Argument& arg) {
auto scopes_var = scope.FindVar(arg.step_scopes); auto scopes_var = scope.FindVar(arg.step_scopes);
PADDLE_ENFORCE(scopes_var != nullptr, PADDLE_ENFORCE(scopes_var != nullptr,
...@@ -290,45 +328,85 @@ void DynamicRecurrentOp::ArgCache::CacheScopes(const Scope& scope, ...@@ -290,45 +328,85 @@ void DynamicRecurrentOp::ArgCache::CacheScopes(const Scope& scope,
this->scopes = scopes_var->GetMutable<std::vector<Scope*>>(); this->scopes = scopes_var->GetMutable<std::vector<Scope*>>();
} }
void DynamicRecurrentOp::ArgCache::CacheInlinks( void RNNAlgorithm::ArgCache::CacheInlinks(
const Scope& scope, const std::vector<std::string>& names) { const Scope& scope, const std::vector<std::string>& names) {
for (auto name : names) { for (auto name : names) {
auto* var = GetVariable(scope, name); auto* var = GetVariable(scope, name);
inlinks[name] = var; inputs[name] = var;
} }
} }
void DynamicRecurrentOp::ArgCache::CacheOutlinks( void RNNAlgorithm::ArgCache::CacheOutlinks(
const Scope& scope, const std::vector<std::string>& names) { const Scope& scope, const std::vector<std::string>& names) {
for (auto name : names) { for (auto name : names) {
auto* var = GetVariable(scope, name); auto* var = GetVariable(scope, name);
outlinks[name] = var; outputs[name] = var;
} }
} }
Variable* DynamicRecurrentOp::ArgCache::GetVariable(const Scope& scope, Variable* RNNAlgorithm::ArgCache::GetVariable(const Scope& scope,
const std::string& name) { const std::string& name) {
auto* var = scope.FindVar(name); auto* var = scope.FindVar(name);
PADDLE_ENFORCE_NOT_NULL(var, "variable [%s] not exist in scope", name); PADDLE_ENFORCE_NOT_NULL(var, "variable [%s] not exist in scope", name);
return var; return var;
} }
LoDTensor* DynamicRecurrentOp::ArgCache::GetTensor( LoDTensor* RNNAlgorithm::ArgCache::GetTensor(const framework::Scope& scope,
const framework::Scope& scope, const std::string& name) { const std::string& name) {
auto* var = GetVariable(scope, name); auto* var = GetVariable(scope, name);
return var->GetMutable<LoDTensor>(); return var->GetMutable<LoDTensor>();
} }
const rnn::ArgumentName DynamicRecurrentOp::kArgName{ const std::array<rnn::ArgumentName, 2> RNNAlgorithm::kArgNames{
"step_net", "step_scopes", "inlinks", "outlinks", {rnn::ArgumentName{"step_unit", "step_scopes", "inputs", "outputs",
"memories", "pre_memories", "boot_memories"}; "states", "ex_states", "initial_states"},
rnn::ArgumentName{"step_unit", "step_scopes@GRAD", "outputs@GRAD",
"inputs@GRAD", "states", "ex_states",
"initial_states@GRAD"}}};
void DynamicRecurrentOp::Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const {
rnn.Run<RNNAlgorithm::ComputeMode::kForward>(
scope, *dynamic_cast<const OperatorBase*>(this), dev_ctx);
}
void DynamicRecurrentGradientOp::Run( void DynamicRecurrentGradientOp::Run(
const Scope& scope, const platform::DeviceContext& dev_ctx) const {} const Scope& scope, const platform::DeviceContext& dev_ctx) const {
rnn.Run<RNNAlgorithm::ComputeMode::kBackward>(
scope, *dynamic_cast<const OperatorBase*>(this), dev_ctx);
}
class DynamicRecurrentOpProtoAndCheckerMaker
: public framework::OpProtoAndCheckerMaker {
public:
DynamicRecurrentOpProtoAndCheckerMaker(framework::OpProto* proto,
framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
const auto& name =
RNNAlgorithm::kArgNames[RNNAlgorithm::ComputeMode::kForward];
// inputs and outputs stored in proto
AddInput(name.inlinks,
"the inputs that need to be segmented for each step.")
.AsDuplicable();
AddInput(name.initial_states, "variables to initialize states.")
.AsDuplicable();
AddOutput(name.outlinks, "the outputs that need to concated for all steps.")
.AsDuplicable();
AddOutput(name.step_scopes, "step scopes");
// Attributes stored in AttributeMap
AddAttr<std::vector<std::string>>(name.ex_states, "names of ex_states");
AddAttr<std::vector<std::string>>(name.states, "names of states");
AddComment("This is a RNN operator for varience-length sequences.");
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
REGISTER_OP_WITHOUT_GRADIENT( REGISTER_OP(dynamic_recurrent, paddle::operators::DynamicRecurrentOp,
dynamic_recurrent, paddle::operators::DynamicRecurrentOp, paddle::operators::DynamicRecurrentOpProtoAndCheckerMaker,
paddle::operators::DynamicRecurrentOpProtoAndCheckerMaker); dynamic_recurrent_grad,
paddle::operators::DynamicRecurrentGradientOp);
...@@ -27,47 +27,39 @@ ...@@ -27,47 +27,39 @@
namespace paddle { namespace paddle {
namespace operators { namespace operators {
class DynamicRecurrentOp : public framework::OperatorBase { class RNNAlgorithm {
public: public:
static const rnn::ArgumentName kArgName; enum ComputeMode { kForward = 0, kBackward = 1 };
static const std::array<rnn::ArgumentName, 2> kArgNames;
using value_type = float; using value_type = float;
DynamicRecurrentOp(const std::string& type, /*
const framework::VariableNameMap& inputs, * Different `Run` method for forward and backward, `_` is just for template
const framework::VariableNameMap& outputs, * specifialization.
const framework::AttributeMap& attrs) */
: OperatorBase(type, inputs, outputs, attrs) {} template <ComputeMode _>
void Run(const framework::Scope& scope, const framework::OperatorBase& op,
DynamicRecurrentOp(const DynamicRecurrentOp& o) const platform::DeviceContext& dev_ctx);
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
// TODO(yuyang18): Implement copy ctor well.
PADDLE_THROW("Not implemented");
}
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override;
/* /*
* Split the inputs(LoDTensors) to segments for each time step. * Split the inputs(LoDTensors) to segments for each time step.
*/ */
void SplitInputs() const; void SplitInputs();
/* /*
* Create step-scopes to store temporary outputs in each time steps. * Create step-scopes to store temporary outputs in each time steps.
*/ */
void CreateScopes() const; void CreateScopes();
/* /*
* Link TensorArray steps to the corresponding variables located in * Link TensorArray steps to the corresponding variables located in
* step-scopes. * step-scopes.
*/ */
void WriteStepInputs() const; void WriteStepInputs();
/* /*
* Write output of each step to the corresponding TensorArray. * Write output of each step to the corresponding TensorArray.
*/ */
void WriteStepOutputs() const; void WriteStepOutputs();
/* /*
* Initialize the states, each state will have a corresponding pre-state, * Initialize the states, each state will have a corresponding pre-state,
...@@ -75,54 +67,83 @@ class DynamicRecurrentOp : public framework::OperatorBase { ...@@ -75,54 +67,83 @@ class DynamicRecurrentOp : public framework::OperatorBase {
* pre-state in the first time step will be initialized with an zero tensor or * pre-state in the first time step will be initialized with an zero tensor or
* a tensor in parent scope if is provided. * a tensor in parent scope if is provided.
*/ */
void InitStates() const; void InitStates();
/* /*
* Create state variables for each time step. * Create state variables for each time step.
*/ */
void CreateState(const rnn::MemoryAttr& memory, size_t step) const; void CreateState(const rnn::StateAttr& state, size_t step);
/* /*
* Link pre-state variable in current scope to the state variable in the * Link pre-state variable in current scope to the state variable in the
* previous time step (scope). * previous time step (scope) by reference.
*/
void LinkState(const rnn::StateAttr& state, size_t step);
/*
* Link the pre-state of the first time step to the `boot-state` in parent's
* scope.
*/
void LinkInitialState(const rnn::StateAttr& state);
/*
* Copy the gradient from `pre-state` in the first step-scope to the
* `boot-state` in parent's scope.
*/
void ExportInitialStateGradient(const rnn::StateAttr& state);
/*
* Calculate time steps.
*/ */
void LinkState(const rnn::MemoryAttr& memory, size_t step) const; void RunSteps();
/* /*
* Concatenate outputs in each time step and generate a LoDTensor. * Concatenate outputs in each time step and generate a LoDTensor.
*/ */
void ConcatOutputs() const; void ConcatOutputs();
void SetComputeMode(ComputeMode mode) { mode_ = mode; }
bool IsForward() const { return mode_ == ComputeMode::kForward; }
bool IsBackward() const { return mode_ == ComputeMode::kBackward; }
/* /*
* set a stepnet that is created according to a RecurrentOp's stepnet. * set a step unit that is created according to a RecurrentOp's step unit.
*/ */
void SetStepNet(std::unique_ptr<OperatorBase> net) { void SetStepUnit(std::unique_ptr<framework::OperatorBase> step_unit) {
PADDLE_ENFORCE_NOT_NULL(net); PADDLE_ENFORCE_NOT_NULL(step_unit);
stepnet_ = std::move(net); step_unit_ = std::move(step_unit);
} }
const OperatorBase& GetStepNet() const { return *stepnet_; } const framework::OperatorBase& GetStepUnit() const { return *step_unit_; }
const framework::TensorArray& state(const std::string& name) const { const framework::TensorArray& state(const std::string& name) const {
return states_[name]; auto it = states_.find(name);
PADDLE_ENFORCE(it != states_.end());
return it->second;
} }
const framework::TensorArray& step_input(const std::string& name) const { const framework::TensorArray& step_input(const std::string& name) const {
return step_inputs_[name]; auto it = step_inputs_.find(name);
PADDLE_ENFORCE(it != step_inputs_.end());
return it->second;
} }
const framework::TensorArray& step_output(const std::string& name) const { const framework::TensorArray& step_output(const std::string& name) const {
return step_outputs_[name]; auto it = step_outputs_.find(name);
PADDLE_ENFORCE(it != step_outputs_.end());
return it->second;
} }
protected: protected:
struct ArgCache { struct ArgCache {
framework::Scope const* scope; framework::Scope const* scope;
std::vector<framework::Scope*>* scopes; std::vector<framework::Scope*>* scopes;
std::map<std::string, framework::Variable*> inlinks; std::map<std::string, framework::Variable*> inputs;
std::map<std::string, framework::Variable*> outlinks; std::map<std::string, framework::Variable*> outputs;
platform::DeviceContext const* dev_ctx;
size_t num_steps{0}; size_t num_steps{0};
void Init(const rnn::ArgumentName& name, const OperatorBase& op, void Init(const rnn::ArgumentName& name, const framework::OperatorBase& op,
const framework::Scope& scope, rnn::Argument* arg); const framework::Scope& scope,
platform::DeviceContext const* dev_ctx, rnn::Argument* arg);
framework::Scope& GetScope(size_t index) { framework::Scope& GetScope(size_t index) {
PADDLE_ENFORCE_LT(index, num_steps); PADDLE_ENFORCE_LT(index, num_steps);
...@@ -133,8 +154,8 @@ class DynamicRecurrentOp : public framework::OperatorBase { ...@@ -133,8 +154,8 @@ class DynamicRecurrentOp : public framework::OperatorBase {
const std::string& name); const std::string& name);
private: private:
void InitArgument(const rnn::ArgumentName& name, const OperatorBase& op, void InitArgument(const rnn::ArgumentName& name,
rnn::Argument* arg); const framework::OperatorBase& op, rnn::Argument* arg);
void CacheScopes(const framework::Scope& scope, const rnn::Argument& arg); void CacheScopes(const framework::Scope& scope, const rnn::Argument& arg);
void CacheInlinks(const framework::Scope& scope, void CacheInlinks(const framework::Scope& scope,
const std::vector<std::string>& names); const std::vector<std::string>& names);
...@@ -145,27 +166,49 @@ class DynamicRecurrentOp : public framework::OperatorBase { ...@@ -145,27 +166,49 @@ class DynamicRecurrentOp : public framework::OperatorBase {
}; };
private: private:
std::unique_ptr<OperatorBase> stepnet_; std::unique_ptr<framework::OperatorBase> step_unit_;
mutable std::map<std::string, framework::TensorArray> states_; std::map<std::string, framework::TensorArray> states_;
mutable std::map<std::string, framework::TensorArray> step_inputs_; std::map<std::string, framework::TensorArray> step_inputs_;
mutable std::map<std::string, framework::TensorArray> step_outputs_; std::map<std::string, framework::TensorArray> step_outputs_;
mutable std::map<std::string, std::vector<framework::DySeqMeta>> std::map<std::string, std::vector<framework::DySeqMeta>> dy_seq_metas_;
dy_seq_metas_; rnn::Argument arg_;
mutable rnn::Argument arg_; ArgCache cache_;
mutable ArgCache cache_; ComputeMode mode_{ComputeMode::kForward};
#ifdef PADDLE_WITH_TESTING #ifdef PADDLE_WITH_TESTING
friend class DynamicRecurrentOpTestHelper; // test forward
FRIEND_TEST(DynamicRecurrentOpTestHelper, SplitInputs); friend class RNNAlgorithmTestHelper;
FRIEND_TEST(DynamicRecurrentOpTestHelper, CreateCache); FRIEND_TEST(RNNAlgorithmTestHelper, SplitInputs);
FRIEND_TEST(DynamicRecurrentOpTestHelper, CreateScopes); FRIEND_TEST(RNNAlgorithmTestHelper, CreateCache);
FRIEND_TEST(DynamicRecurrentOpTestHelper, WriteStepInputs); FRIEND_TEST(RNNAlgorithmTestHelper, CreateScopes);
FRIEND_TEST(DynamicRecurrentOpTestHelper, WriteStepOutputs); FRIEND_TEST(RNNAlgorithmTestHelper, WriteStepInputs);
FRIEND_TEST(DynamicRecurrentOpTestHelper, InitStates); FRIEND_TEST(RNNAlgorithmTestHelper, WriteStepOutputs);
FRIEND_TEST(DynamicRecurrentOpTestHelper, ConcatOutputs); FRIEND_TEST(RNNAlgorithmTestHelper, InitStates);
FRIEND_TEST(RNNAlgorithmTestHelper, ConcatOutputs);
// TODO(superjom) test backward
#endif #endif
}; };
class DynamicRecurrentOp : public framework::OperatorBase {
public:
DynamicRecurrentOp(const std::string& type,
const framework::VariableNameMap& inputs,
const framework::VariableNameMap& outputs,
const framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {}
DynamicRecurrentOp(const DynamicRecurrentOp& o)
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
PADDLE_THROW("Not implemented");
}
void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override;
mutable RNNAlgorithm rnn;
};
class DynamicRecurrentGradientOp : public framework::OperatorBase { class DynamicRecurrentGradientOp : public framework::OperatorBase {
public: public:
DynamicRecurrentGradientOp(const std::string& type, DynamicRecurrentGradientOp(const std::string& type,
...@@ -174,8 +217,16 @@ class DynamicRecurrentGradientOp : public framework::OperatorBase { ...@@ -174,8 +217,16 @@ class DynamicRecurrentGradientOp : public framework::OperatorBase {
const framework::AttributeMap& attrs) const framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {} : OperatorBase(type, inputs, outputs, attrs) {}
DynamicRecurrentGradientOp(const DynamicRecurrentGradientOp& o)
: framework::OperatorBase(
static_cast<const framework::OperatorBase&>(o)) {
PADDLE_THROW("Not implemented");
}
void Run(const framework::Scope& scope, void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override; const platform::DeviceContext& dev_ctx) const override;
mutable RNNAlgorithm rnn;
}; };
} // namespace operators } // namespace operators
......
...@@ -43,16 +43,16 @@ LoDTensor* CreateVar(Scope& scope, std::string name, framework::DDim dims, ...@@ -43,16 +43,16 @@ LoDTensor* CreateVar(Scope& scope, std::string name, framework::DDim dims,
return tensor; return tensor;
} }
class DynamicRecurrentOpTestHelper : public ::testing::Test { class RNNAlgorithmTestHelper : public ::testing::Test {
protected: protected:
const rnn::ArgumentName argname = DynamicRecurrentOp::kArgName; const rnn::ArgumentName argname = RNNAlgorithm::kArgNames[0];
virtual void SetUp() override { virtual void SetUp() override {
CreateGlobalVariables(); CreateGlobalVariables();
auto op_desc = CreateOpDesc(); auto op_desc = CreateOpDesc();
op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr); op = paddle::framework::OpRegistry::CreateOp(op_desc, nullptr);
dop = dynamic_cast<DynamicRecurrentOp*>(op.get()); dop = &(dynamic_cast<DynamicRecurrentOp*>(op.get())->rnn);
InitCacheManually(); InitCacheManually();
InitStepNet(); InitStepNet();
} }
...@@ -63,20 +63,20 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test { ...@@ -63,20 +63,20 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test {
op_desc.set_type("dynamic_recurrent"); op_desc.set_type("dynamic_recurrent");
OpDescNewVar(argname.inlinks, {"in0"}, op_desc.add_inputs()); OpDescNewVar(argname.inlinks, {"in0"}, op_desc.add_inputs());
OpDescNewVar(argname.boot_memories, {"boot_mem"}, op_desc.add_inputs()); OpDescNewVar(argname.initial_states, {"boot_mem"}, op_desc.add_inputs());
OpDescNewVar(argname.step_scopes, {"step_scopes"}, op_desc.add_outputs()); OpDescNewVar(argname.step_scopes, {"step_scopes"}, op_desc.add_outputs());
OpDescNewVar(argname.outlinks, {"out0"}, op_desc.add_outputs()); OpDescNewVar(argname.outlinks, {"out0"}, op_desc.add_outputs());
// set pre-memories // set pre-states
auto pre_memories = op_desc.mutable_attrs()->Add(); auto pre_memories = op_desc.mutable_attrs()->Add();
pre_memories->set_name(argname.pre_memories); pre_memories->set_name(argname.ex_states);
pre_memories->set_type(paddle::framework::AttrType::STRINGS); pre_memories->set_type(paddle::framework::AttrType::STRINGS);
auto pre_memories_item = pre_memories->add_strings(); auto pre_memories_item = pre_memories->add_strings();
*pre_memories_item = "mem@pre"; *pre_memories_item = "mem@pre";
// set memories // set states
auto memories = op_desc.mutable_attrs()->Add(); auto memories = op_desc.mutable_attrs()->Add();
memories->set_name(argname.memories); memories->set_name(argname.states);
memories->set_type(paddle::framework::AttrType::STRINGS); memories->set_type(paddle::framework::AttrType::STRINGS);
auto memories_item = memories->add_strings(); auto memories_item = memories->add_strings();
*memories_item = "mem"; *memories_item = "mem";
...@@ -113,32 +113,33 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test { ...@@ -113,32 +113,33 @@ class DynamicRecurrentOpTestHelper : public ::testing::Test {
} }
void InitCacheManually() { void InitCacheManually() {
dop->cache_.Init(DynamicRecurrentOp::kArgName, *dop, scope, &dop->arg_); dop->cache_.Init(RNNAlgorithm::kArgNames[0], *op, scope, &device_context,
&dop->arg_);
} }
void InitStepNet() { void InitStepNet() {
std::unique_ptr<framework::OperatorBase> stepnet{new NetOp}; std::unique_ptr<framework::OperatorBase> stepnet{new NetOp};
dynamic_cast<NetOp*>(stepnet.get()) dynamic_cast<NetOp*>(stepnet.get())
->AppendOp(std::unique_ptr<TestOp>(new TestOp( ->AppendOp(std::unique_ptr<TestOp>(new TestOp(
"test", {{"inlinks", {"in0"}}, {"boot_memories", {"boot_mem"}}}, "test", {{"inputs", {"in0"}}, {"initial_states", {"boot_mem"}}},
{{"outlinks", {"out0"}}, {"step_scopes", {"step_scopes"}}}, {}))); {{"outputs", {"out0"}}, {"step_scopes", {"step_scopes"}}}, {})));
dop->SetStepNet(std::move(stepnet)); dop->SetStepUnit(std::move(stepnet));
} }
protected: protected:
DynamicRecurrentOp* dop; RNNAlgorithm* dop;
std::unique_ptr<framework::OperatorBase> op; std::unique_ptr<framework::OperatorBase> op;
paddle::platform::CPUDeviceContext device_context; paddle::platform::CPUDeviceContext device_context;
paddle::framework::Scope scope; paddle::framework::Scope scope;
}; };
TEST_F(DynamicRecurrentOpTestHelper, CreateCache) { TEST_F(RNNAlgorithmTestHelper, CreateCache) {
const rnn::Argument& arg = dop->arg_; const rnn::Argument& arg = dop->arg_;
ASSERT_EQ(arg.inlinks.size(), 1UL); ASSERT_EQ(arg.inlinks.size(), 1UL);
ASSERT_EQ(arg.outlinks.size(), 1UL); ASSERT_EQ(arg.outlinks.size(), 1UL);
} }
TEST_F(DynamicRecurrentOpTestHelper, SplitInputs) { TEST_F(RNNAlgorithmTestHelper, SplitInputs) {
dop->SplitInputs(); dop->SplitInputs();
auto& in0_ta = dop->step_inputs_["in0"]; auto& in0_ta = dop->step_inputs_["in0"];
ASSERT_EQ(in0_ta.size(), 4UL); ASSERT_EQ(in0_ta.size(), 4UL);
...@@ -153,14 +154,14 @@ TEST_F(DynamicRecurrentOpTestHelper, SplitInputs) { ...@@ -153,14 +154,14 @@ TEST_F(DynamicRecurrentOpTestHelper, SplitInputs) {
EXPECT_EQ(batch3.dims()[0], 1); EXPECT_EQ(batch3.dims()[0], 1);
} }
TEST_F(DynamicRecurrentOpTestHelper, CreateScopes) { TEST_F(RNNAlgorithmTestHelper, CreateScopes) {
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
ASSERT_EQ(dop->cache_.num_steps, 4UL); ASSERT_EQ(dop->cache_.num_steps, 4UL);
ASSERT_EQ(dop->cache_.scopes->size(), 4UL); ASSERT_EQ(dop->cache_.scopes->size(), 4UL);
} }
TEST_F(DynamicRecurrentOpTestHelper, WriteStepInputs) { TEST_F(RNNAlgorithmTestHelper, WriteStepInputs) {
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
dop->WriteStepInputs(); dop->WriteStepInputs();
...@@ -173,7 +174,7 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepInputs) { ...@@ -173,7 +174,7 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepInputs) {
} }
} }
TEST_F(DynamicRecurrentOpTestHelper, WriteStepOutputs) { TEST_F(RNNAlgorithmTestHelper, WriteStepOutputs) {
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
dop->WriteStepInputs(); dop->WriteStepInputs();
...@@ -187,11 +188,12 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepOutputs) { ...@@ -187,11 +188,12 @@ TEST_F(DynamicRecurrentOpTestHelper, WriteStepOutputs) {
} }
} }
TEST_F(DynamicRecurrentOpTestHelper, ConcatOutputs) { TEST_F(RNNAlgorithmTestHelper, ConcatOutputs) {
// Let's leave this test to python unittest. // Let's leave this test to python unittest.
} }
TEST_F(DynamicRecurrentOpTestHelper, InitStates) { TEST_F(RNNAlgorithmTestHelper, InitStates) {
dop->SetComputeMode(RNNAlgorithm::ComputeMode::kForward);
dop->SplitInputs(); dop->SplitInputs();
dop->CreateScopes(); dop->CreateScopes();
dop->WriteStepInputs(); dop->WriteStepInputs();
...@@ -208,12 +210,6 @@ TEST_F(DynamicRecurrentOpTestHelper, InitStates) { ...@@ -208,12 +210,6 @@ TEST_F(DynamicRecurrentOpTestHelper, InitStates) {
auto* boot_state = scope.FindVar("boot_mem"); auto* boot_state = scope.FindVar("boot_mem");
ASSERT_TRUE(boot_state != nullptr); ASSERT_TRUE(boot_state != nullptr);
if (step == 0) {
// check pre_state is a reference of boot_state
ASSERT_EQ(boot_state->Get<LoDTensor>().data<float>(),
pre_state->Get<LoDTensor>().data<float>());
}
} }
} }
......
...@@ -108,7 +108,7 @@ void ElementwiseCompute(const framework::ExecutionContext& ctx) { ...@@ -108,7 +108,7 @@ void ElementwiseCompute(const framework::ExecutionContext& ctx) {
PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(), PADDLE_ENFORCE_GE(x_dims.size(), y_dims.size(),
"Rank of first input must >= rank of second input.") "Rank of first input must >= rank of second input.")
if (x_dims == y_dims || product(y_dims) == 1) { if (x_dims == y_dims) {
functor f; functor f;
f.template Run<Place, T>(x, y, z, ctx); f.template Run<Place, T>(x, y, z, ctx);
return; return;
...@@ -174,12 +174,6 @@ void ElementwiseGradCompute(const framework::ExecutionContext& ctx) { ...@@ -174,12 +174,6 @@ void ElementwiseGradCompute(const framework::ExecutionContext& ctx) {
return; return;
} }
if (product(y_dims) == 1) {
functor1 f;
f(place, x, y, out, dx, dy, dout);
return;
}
int axis = ctx.Attr<int>("axis"); int axis = ctx.Attr<int>("axis");
axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis); axis = (axis == -1 ? x_dims.size() - y_dims.size() : axis);
......
...@@ -26,8 +26,9 @@ class FeedOp : public framework::OperatorBase { ...@@ -26,8 +26,9 @@ class FeedOp : public framework::OperatorBase {
: OperatorBase(type, inputs, outputs, attrs) {} : OperatorBase(type, inputs, outputs, attrs) {}
void Run(const framework::Scope &scope, void Run(const framework::Scope &scope,
const platform::DeviceContext &dev_ctx) const override { const platform::DeviceContext &dev_ctx) const override {
auto feed_var_name = Input("Input"); auto feed_var_name = Input("X");
auto *feed_var = scope.FindVar(feed_var_name); auto *feed_var = scope.FindVar(feed_var_name);
PADDLE_ENFORCE(feed_var != nullptr, PADDLE_ENFORCE(feed_var != nullptr,
"Cannot find feed_var in scope, feed_var_name is %s", "Cannot find feed_var in scope, feed_var_name is %s",
feed_var_name); feed_var_name);
...@@ -40,18 +41,32 @@ class FeedOp : public framework::OperatorBase { ...@@ -40,18 +41,32 @@ class FeedOp : public framework::OperatorBase {
auto col = Attr<int>("col"); auto col = Attr<int>("col");
VLOG(3) << "Feed Var " << feed_var_name << "'s " << col << " column to var"
<< out_name;
auto &feed_list = feed_var->Get<framework::FeedFetchList>(); auto &feed_list = feed_var->Get<framework::FeedFetchList>();
auto &feed_item = feed_list.at(static_cast<size_t>(col)); auto &feed_item = feed_list.at(static_cast<size_t>(col));
auto *out_item = out_var->GetMutable<framework::FeedFetchType>(); auto *out_item = out_var->GetMutable<framework::FeedFetchType>();
out_item->CopyFromTensor(feed_item, dev_ctx.GetPlace(), dev_ctx); out_item->CopyFrom(feed_item, dev_ctx.GetPlace(), dev_ctx);
out_item->set_lod(feed_item.lod()); out_item->set_lod(feed_item.lod());
} }
}; };
class FeedOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public:
FeedOpInfoMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of feed op");
AddOutput("Out", "The output of feed op");
AddComment("feed op, it should not be configured by users directly");
AddAttr<int>("col", "column of feed");
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
// We do not need to register OpInfoMaker,
// since feed operator will not be used by end users directly
REGISTER_OPERATOR(feed, paddle::operators::FeedOp, REGISTER_OPERATOR(feed, paddle::operators::FeedOp,
paddle::framework::EmptyGradOpMaker); paddle::framework::EmptyGradOpMaker,
paddle::operators::FeedOpInfoMaker);
...@@ -27,7 +27,7 @@ class FetchOp : public framework::OperatorBase { ...@@ -27,7 +27,7 @@ class FetchOp : public framework::OperatorBase {
void Run(const framework::Scope &scope, void Run(const framework::Scope &scope,
const platform::DeviceContext &dev_ctx) const override { const platform::DeviceContext &dev_ctx) const override {
auto fetch_var_name = Input("Input"); auto fetch_var_name = Input("X");
auto *fetch_var = scope.FindVar(fetch_var_name); auto *fetch_var = scope.FindVar(fetch_var_name);
PADDLE_ENFORCE(fetch_var != nullptr, PADDLE_ENFORCE(fetch_var != nullptr,
"Cannot find fetch variable in scope, fetch_var_name is %s", "Cannot find fetch variable in scope, fetch_var_name is %s",
...@@ -51,14 +51,26 @@ class FetchOp : public framework::OperatorBase { ...@@ -51,14 +51,26 @@ class FetchOp : public framework::OperatorBase {
// FIXME(yuyang18): Should we assume the fetch operator always generate // FIXME(yuyang18): Should we assume the fetch operator always generate
// CPU outputs? // CPU outputs?
dst_item.CopyFromTensor(src_item, platform::CPUPlace(), dev_ctx); dst_item.CopyFrom(src_item, platform::CPUPlace(), dev_ctx);
VLOG(3) << "Fetch variable " << fetch_var_name << " to " << out_name;
} }
}; };
class FetchOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public:
FetchOpInfoMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of fetch op");
AddOutput("Out", "The output of fetch op");
AddComment("fetch op, it should not be configured by users directly");
AddAttr<int>("col", "column of fetch");
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
// We do not need to register OpInfoMaker,
// since fetch operator will not be used by end users directly
REGISTER_OPERATOR(fetch, paddle::operators::FetchOp, REGISTER_OPERATOR(fetch, paddle::operators::FetchOp,
paddle::framework::EmptyGradOpMaker); paddle::framework::EmptyGradOpMaker,
paddle::operators::FetchOpInfoMaker);
...@@ -59,7 +59,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel { ...@@ -59,7 +59,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
protected: protected:
framework::DataType IndicateDataType( framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override { const framework::ExecutionContext& ctx) const override {
return static_cast<framework::DataType>(Attr<int>("data_type")); return static_cast<framework::DataType>(ctx.Attr<int>("data_type"));
} }
}; };
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/increment_op.h"
namespace paddle {
namespace operators {
class IncrementOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;
void InferShape(framework::InferShapeContext *ctx) const override {
PADDLE_ENFORCE(ctx->HasInput("X"),
"Input(X) of IncrementOp should not be null.");
PADDLE_ENFORCE(ctx->HasOutput("Out"),
"Output(Out) of IncrementOp should not be null.");
ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
ctx->ShareLoD("X", /*->*/ "Out");
}
};
template <typename AttrType>
class IncrementOpMaker : public framework::OpProtoAndCheckerMaker {
public:
IncrementOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor) The input tensor of increment operator");
AddOutput("Out", "(Tensor) The output tensor of increment operator.");
AddComment(R"DOC(Increment operator
The equation is: Out = X + step
)DOC");
AddAttr<AttrType>("step",
"The step size by which the "
"input tensor will be incremented.")
.SetDefault(1.0);
}
};
class IncrementGradOpMaker : public framework::SingleGradOpDescMaker {
public:
using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
std::unique_ptr<framework::OpDescBind> Apply() const override {
auto *grad_op = new framework::OpDescBind();
grad_op->SetType("scale");
grad_op->SetInput("X", OutputGrad("Out"));
grad_op->SetOutput("Out", InputGrad("X"));
grad_op->SetAttr("scale", 1.0f);
return std::unique_ptr<framework::OpDescBind>(grad_op);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OPERATOR(increment, ops::IncrementOp, ops::IncrementOpMaker<float>,
ops::IncrementGradOpMaker);
REGISTER_OP_CPU_KERNEL(increment,
ops::IncrementKernel<paddle::platform::CPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/increment_op.h"
REGISTER_OP_GPU_KERNEL(
increment,
paddle::operators::IncrementKernel<paddle::platform::GPUPlace, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h"
namespace paddle {
namespace operators {
template <typename Place, typename T, typename AttrType = T>
class IncrementKernel : public framework::OpKernel<T> {
public:
virtual void Compute(const framework::ExecutionContext& context) const {
auto* tensor = context.Output<framework::Tensor>("Out");
auto* in = context.Input<framework::Tensor>("X");
tensor->mutable_data<T>(in->place());
auto step = static_cast<T>(context.Attr<AttrType>("step"));
auto eigen_out = framework::EigenVector<T>::Flatten(*tensor);
auto eigen_in = framework::EigenVector<T>::Flatten(*in);
auto& place = context.GetEigenDevice<Place>();
eigen_out.device(place) = eigen_in + step;
}
};
} // namespace operators
} // namespace paddle
...@@ -91,17 +91,17 @@ class LSTMKernel : public framework::OpKernel<T> { ...@@ -91,17 +91,17 @@ class LSTMKernel : public framework::OpKernel<T> {
int bstart = static_cast<int>(batch_starts[n]); int bstart = static_cast<int>(batch_starts[n]);
int bend = static_cast<int>(batch_starts[n + 1]); int bend = static_cast<int>(batch_starts[n + 1]);
Tensor gate_t = batch_gate->Slice<T>(bstart, bend); Tensor gate_t = batch_gate->Slice(bstart, bend);
Tensor out_t = batch_out.Slice<T>(bstart, bend); Tensor out_t = batch_out.Slice(bstart, bend);
Tensor cell_t = batch_cell.Slice<T>(bstart, bend); Tensor cell_t = batch_cell.Slice(bstart, bend);
Tensor cell_pre_act_t = batch_cell_pre_act.Slice<T>(bstart, bend); Tensor cell_pre_act_t = batch_cell_pre_act.Slice(bstart, bend);
int cur_batch_size = bend - bstart; int cur_batch_size = bend - bstart;
if (n != 0) { if (n != 0) {
int pre_h_start = static_cast<int>(batch_starts[n - 1]); int pre_h_start = static_cast<int>(batch_starts[n - 1]);
int pre_h_end = pre_h_start + cur_batch_size; int pre_h_end = pre_h_start + cur_batch_size;
auto pre_hidden_t = batch_out.Slice<T>(pre_h_start, pre_h_end); auto pre_hidden_t = batch_out.Slice(pre_h_start, pre_h_end);
math::matmul<Place, T>(ctx.device_context(), pre_hidden_t, false, math::matmul<Place, T>(ctx.device_context(), pre_hidden_t, false,
*weight, false, static_cast<T>(1.0), &gate_t, *weight, false, static_cast<T>(1.0), &gate_t,
static_cast<T>(1.0)); static_cast<T>(1.0));
......
...@@ -64,7 +64,7 @@ void testIm2col() { ...@@ -64,7 +64,7 @@ void testIm2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
input = input_tmp; input = input_tmp;
} else { } else {
input.CopyFrom<float>(input_tmp, *place, *context); input.CopyFrom(input_tmp, *place, *context);
} }
output_cfo.mutable_data<float>( output_cfo.mutable_data<float>(
{1, filter_size, filter_size, output_height, output_width}, *place); {1, filter_size, filter_size, output_height, output_width}, *place);
...@@ -85,8 +85,7 @@ void testIm2col() { ...@@ -85,8 +85,7 @@ void testIm2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
out_cfo_ptr = output_cfo.data<float>(); out_cfo_ptr = output_cfo.data<float>();
} else { } else {
output_tmp.CopyFrom<float>(output_cfo, paddle::platform::CPUPlace(), output_tmp.CopyFrom(output_cfo, paddle::platform::CPUPlace(), *context);
*context);
out_cfo_ptr = output_tmp.data<float>(); out_cfo_ptr = output_tmp.data<float>();
} }
EXPECT_EQ(out_cfo_ptr[0], 0); EXPECT_EQ(out_cfo_ptr[0], 0);
...@@ -102,8 +101,7 @@ void testIm2col() { ...@@ -102,8 +101,7 @@ void testIm2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
out_ocf_ptr = output_ocf.data<float>(); out_ocf_ptr = output_ocf.data<float>();
} else { } else {
output_tmp.CopyFrom<float>(output_ocf, paddle::platform::CPUPlace(), output_tmp.CopyFrom(output_ocf, paddle::platform::CPUPlace(), *context);
*context);
out_ocf_ptr = output_tmp.data<float>(); out_ocf_ptr = output_tmp.data<float>();
} }
EXPECT_EQ(out_ocf_ptr[0], 0); EXPECT_EQ(out_ocf_ptr[0], 0);
......
...@@ -16,15 +16,15 @@ TEST(math_function, notrans_mul_trans) { ...@@ -16,15 +16,15 @@ TEST(math_function, notrans_mul_trans) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input1, *gpu_place, context); input2_gpu.CopyFrom(input1, *gpu_place, context);
out_gpu.mutable_data<float>({2, 2}, *gpu_place); out_gpu.mutable_data<float>({2, 2}, *gpu_place);
paddle::operators::math::matmul<paddle::platform::GPUPlace, float>( paddle::operators::math::matmul<paddle::platform::GPUPlace, float>(
context, input1_gpu, false, input2_gpu, true, 1, &out_gpu, 0); context, input1_gpu, false, input2_gpu, true, 1, &out_gpu, 0);
out.CopyFrom<float>(out_gpu, *cpu_place, context); out.CopyFrom(out_gpu, *cpu_place, context);
float* out_ptr = out.data<float>(); float* out_ptr = out.data<float>();
context.Wait(); context.Wait();
...@@ -50,15 +50,15 @@ TEST(math_function, trans_mul_notrans) { ...@@ -50,15 +50,15 @@ TEST(math_function, trans_mul_notrans) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input1, *gpu_place, context); input2_gpu.CopyFrom(input1, *gpu_place, context);
out_gpu.mutable_data<float>({3, 3}, *gpu_place); out_gpu.mutable_data<float>({3, 3}, *gpu_place);
paddle::operators::math::matmul<paddle::platform::GPUPlace, float>( paddle::operators::math::matmul<paddle::platform::GPUPlace, float>(
context, input1_gpu, true, input2_gpu, false, 1, &out_gpu, 0); context, input1_gpu, true, input2_gpu, false, 1, &out_gpu, 0);
out.CopyFrom<float>(out_gpu, *cpu_place, context); out.CopyFrom(out_gpu, *cpu_place, context);
float* out_ptr = out.data<float>(); float* out_ptr = out.data<float>();
context.Wait(); context.Wait();
...@@ -99,9 +99,9 @@ TEST(math_function, gemm_notrans_cublas) { ...@@ -99,9 +99,9 @@ TEST(math_function, gemm_notrans_cublas) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input2, *gpu_place, context); input2_gpu.CopyFrom(input2, *gpu_place, context);
input3_gpu.CopyFrom<float>(input3, *gpu_place, context); input3_gpu.CopyFrom(input3, *gpu_place, context);
float* a = input1_gpu.data<float>(); float* a = input1_gpu.data<float>();
float* b = input2_gpu.data<float>(); float* b = input2_gpu.data<float>();
float* c = input3_gpu.mutable_data<float>(*gpu_place); float* c = input3_gpu.mutable_data<float>(*gpu_place);
...@@ -109,7 +109,7 @@ TEST(math_function, gemm_notrans_cublas) { ...@@ -109,7 +109,7 @@ TEST(math_function, gemm_notrans_cublas) {
paddle::operators::math::gemm<paddle::platform::GPUPlace, float>( paddle::operators::math::gemm<paddle::platform::GPUPlace, float>(
context, false, false, m, n, k, 1, a, 3, b + 1, 4, 1, c + 1, 4); context, false, false, m, n, k, 1, a, 3, b + 1, 4, 1, c + 1, 4);
input3.CopyFrom<float>(input3_gpu, *cpu_place, context); input3.CopyFrom(input3_gpu, *cpu_place, context);
// numpy code: // numpy code:
// a = np.arange(6).reshape(2, 3) // a = np.arange(6).reshape(2, 3)
...@@ -154,9 +154,9 @@ TEST(math_function, gemm_trans_cublas) { ...@@ -154,9 +154,9 @@ TEST(math_function, gemm_trans_cublas) {
auto* gpu_place = new paddle::platform::GPUPlace(0); auto* gpu_place = new paddle::platform::GPUPlace(0);
paddle::platform::CUDADeviceContext context(*gpu_place); paddle::platform::CUDADeviceContext context(*gpu_place);
input1_gpu.CopyFrom<float>(input1, *gpu_place, context); input1_gpu.CopyFrom(input1, *gpu_place, context);
input2_gpu.CopyFrom<float>(input2, *gpu_place, context); input2_gpu.CopyFrom(input2, *gpu_place, context);
input3_gpu.CopyFrom<float>(input3, *gpu_place, context); input3_gpu.CopyFrom(input3, *gpu_place, context);
float* a = input1_gpu.data<float>(); float* a = input1_gpu.data<float>();
float* b = input2_gpu.data<float>(); float* b = input2_gpu.data<float>();
float* c = input3_gpu.mutable_data<float>(*gpu_place); float* c = input3_gpu.mutable_data<float>(*gpu_place);
...@@ -164,7 +164,7 @@ TEST(math_function, gemm_trans_cublas) { ...@@ -164,7 +164,7 @@ TEST(math_function, gemm_trans_cublas) {
paddle::operators::math::gemm<paddle::platform::GPUPlace, float>( paddle::operators::math::gemm<paddle::platform::GPUPlace, float>(
context, false, true, m, n, k, 1, a, 3, b + 3, 3, 1, c + 1, 4); context, false, true, m, n, k, 1, a, 3, b + 3, 3, 1, c + 1, 4);
input3.CopyFrom<float>(input3_gpu, *cpu_place, context); input3.CopyFrom(input3_gpu, *cpu_place, context);
context.Wait(); context.Wait();
EXPECT_EQ(input3_ptr[0], 0); EXPECT_EQ(input3_ptr[0], 0);
......
...@@ -67,7 +67,7 @@ TEST(selected_rows_functor, gpu_add) { ...@@ -67,7 +67,7 @@ TEST(selected_rows_functor, gpu_add) {
EXPECT_EQ(out_rows[6], 9); EXPECT_EQ(out_rows[6], 9);
Tensor out_cpu; Tensor out_cpu;
out_cpu.CopyFrom<float>(*out_value, cpu_place, ctx); out_cpu.CopyFrom(*out_value, cpu_place, ctx);
ctx.Wait(); ctx.Wait();
auto* out_cpu_data = out_cpu.data<float>(); auto* out_cpu_data = out_cpu.data<float>();
...@@ -94,7 +94,7 @@ TEST(selected_rows_functor, gpu_add) { ...@@ -94,7 +94,7 @@ TEST(selected_rows_functor, gpu_add) {
add_tensor_functor(ctx, *output, *tensor1, tensor2.get()); add_tensor_functor(ctx, *output, *tensor1, tensor2.get());
Tensor tensor2_cpu; Tensor tensor2_cpu;
tensor2_cpu.CopyFrom<float>(*tensor2, cpu_place, ctx); tensor2_cpu.CopyFrom(*tensor2, cpu_place, ctx);
ctx.Wait(); ctx.Wait();
auto* tensor2_cpu_data = tensor2_cpu.data<float>(); auto* tensor2_cpu_data = tensor2_cpu.data<float>();
......
...@@ -107,11 +107,8 @@ class LoDTensor2BatchFunctor { ...@@ -107,11 +107,8 @@ class LoDTensor2BatchFunctor {
size_t seq_len = seq_info[i].length; size_t seq_len = seq_info[i].length;
int start = seq_info[i].start; int start = seq_info[i].start;
if (n < seq_len) { if (n < seq_len) {
if (!is_reverse) { seq2batch_idx[batch_id] =
seq2batch_idx[batch_id] = start + n; is_reverse ? start + seq_len - 1 - n : start + n;
} else {
seq2batch_idx[batch_id] = start + seq_len - 1 - n;
}
batch_id++; batch_id++;
} else { } else {
break; break;
......
...@@ -78,7 +78,7 @@ void testVol2col() { ...@@ -78,7 +78,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
input = input_tmp; input = input_tmp;
} else { } else {
input.CopyFrom<float>(input_tmp, *place, *context); input.CopyFrom(input_tmp, *place, *context);
} }
output.mutable_data<float>({1, filter_size, filter_size, filter_size, output.mutable_data<float>({1, filter_size, filter_size, filter_size,
output_depth, output_height, output_width}, output_depth, output_height, output_width},
...@@ -93,7 +93,7 @@ void testVol2col() { ...@@ -93,7 +93,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
out_cfo_ptr = output.data<float>(); out_cfo_ptr = output.data<float>();
} else { } else {
output_tmp.CopyFrom<float>(output, paddle::platform::CPUPlace(), *context); output_tmp.CopyFrom(output, paddle::platform::CPUPlace(), *context);
out_cfo_ptr = output_tmp.data<float>(); out_cfo_ptr = output_tmp.data<float>();
} }
...@@ -107,7 +107,7 @@ void testVol2col() { ...@@ -107,7 +107,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
input = input_tmp; input = input_tmp;
} else { } else {
input.CopyFrom<float>(input_tmp, *place, *context); input.CopyFrom(input_tmp, *place, *context);
} }
paddle::operators::math::Col2VolFunctor<Place, float> col2vol; paddle::operators::math::Col2VolFunctor<Place, float> col2vol;
...@@ -118,7 +118,7 @@ void testVol2col() { ...@@ -118,7 +118,7 @@ void testVol2col() {
if (paddle::platform::is_cpu_place(*place)) { if (paddle::platform::is_cpu_place(*place)) {
in_ptr = input.data<float>(); in_ptr = input.data<float>();
} else { } else {
input_tmp.CopyFrom<float>(input, paddle::platform::CPUPlace(), *context); input_tmp.CopyFrom(input, paddle::platform::CPUPlace(), *context);
in_ptr = input_tmp.data<float>(); in_ptr = input_tmp.data<float>();
} }
......
...@@ -46,7 +46,7 @@ class MatMulKernel : public framework::OpKernel<T> { ...@@ -46,7 +46,7 @@ class MatMulKernel : public framework::OpKernel<T> {
template <typename T> template <typename T>
inline Tensor Reshape(const Tensor& input, const DDim& dims) { inline Tensor Reshape(const Tensor& input, const DDim& dims) {
Tensor output; Tensor output;
output.ShareDataWith<T>(input); output.ShareDataWith(input);
output.Resize(dims); output.Resize(dims);
return output; return output;
} }
...@@ -56,7 +56,7 @@ inline Tensor Reshape(const Tensor& input, const DDim& dims) { ...@@ -56,7 +56,7 @@ inline Tensor Reshape(const Tensor& input, const DDim& dims) {
template <typename T> template <typename T>
Tensor CombineBatchAndM(const Tensor& input) { Tensor CombineBatchAndM(const Tensor& input) {
Tensor output; Tensor output;
output.ShareDataWith<T>(input); output.ShareDataWith(input);
auto in_dims = input.dims(); auto in_dims = input.dims();
if (in_dims.size() == 3) { if (in_dims.size() == 3) {
std::vector<int64_t> out_dims = {in_dims[0] * in_dims[1], in_dims[2]}; std::vector<int64_t> out_dims = {in_dims[0] * in_dims[1], in_dims[2]};
...@@ -80,7 +80,7 @@ Tensor CombineBatchAndN(const framework::ExecutionContext& context, ...@@ -80,7 +80,7 @@ Tensor CombineBatchAndN(const framework::ExecutionContext& context,
std::vector<int64_t> out_dims = {in_dims[1], in_dims[0] * in_dims[2]}; std::vector<int64_t> out_dims = {in_dims[1], in_dims[0] * in_dims[2]};
output.Resize(make_ddim(out_dims)); output.Resize(make_ddim(out_dims));
} else { } else {
output.ShareDataWith<T>(input); output.ShareDataWith(input);
} }
return output; return output;
} }
......
...@@ -75,12 +75,17 @@ class MomentumOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -75,12 +75,17 @@ class MomentumOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("VelocityOut", "(Tensor) Output updated velocity"); AddOutput("VelocityOut", "(Tensor) Output updated velocity");
AddAttr<float>("mu", "(float) Momentum coefficient"); AddAttr<float>("mu", "(float) Momentum coefficient");
AddAttr<bool>("useNesterov", "(bool) Use Nesterov Momentum")
.SetDefault(false);
AddComment(R"DOC( AddComment(R"DOC(
Momentum Algorithm (momentum). Momentum Algorithm with a flag for Nestrov Moemntum (momentum).
velocity = mu * velocity + gradient velocity = mu * velocity + gradient
param = param - learning_rate * velocity if (use_nesterov):
param = param - gradient * learning_rate + mu * velocity * learning_rate
else:
param = param - learning_rate * velocity
)DOC"); )DOC");
} }
......
...@@ -34,6 +34,7 @@ class MomentumOpKernel : public framework::OpKernel<T> { ...@@ -34,6 +34,7 @@ class MomentumOpKernel : public framework::OpKernel<T> {
velocity_out->mutable_data<T>(ctx.GetPlace()); velocity_out->mutable_data<T>(ctx.GetPlace());
float mu = ctx.Attr<float>("mu"); float mu = ctx.Attr<float>("mu");
bool use_nesterov = ctx.Attr<bool>("useNesterov");
auto p_out = framework::EigenVector<T>::Flatten(*param_out); auto p_out = framework::EigenVector<T>::Flatten(*param_out);
auto v_out = framework::EigenVector<T>::Flatten(*velocity_out); auto v_out = framework::EigenVector<T>::Flatten(*velocity_out);
...@@ -46,9 +47,15 @@ class MomentumOpKernel : public framework::OpKernel<T> { ...@@ -46,9 +47,15 @@ class MomentumOpKernel : public framework::OpKernel<T> {
auto place = ctx.GetEigenDevice<Place>(); auto place = ctx.GetEigenDevice<Place>();
Eigen::DSizes<int, 1> grad_dsize(grad->numel()); Eigen::DSizes<int, 1> grad_dsize(grad->numel());
v_out.device(place) = v * mu + g; v_out.device(place) = v * mu + g;
if (use_nesterov) {
p_out.device(place) = p - g * lr.broadcast(grad_dsize) +
v_out * mu * lr.broadcast(grad_dsize);
} else {
p_out.device(place) = p - lr.broadcast(grad_dsize) * v_out; p_out.device(place) = p - lr.broadcast(grad_dsize) * v_out;
} }
}
}; };
} // namespace operators } // namespace operators
......
...@@ -36,12 +36,12 @@ class MulKernel : public framework::OpKernel<T> { ...@@ -36,12 +36,12 @@ class MulKernel : public framework::OpKernel<T> {
Tensor* z = context.Output<Tensor>("Out"); Tensor* z = context.Output<Tensor>("Out");
const Tensor x_matrix = const Tensor x_matrix =
x->dims().size() > 2 x->dims().size() > 2
? framework::ReshapeToMatrix<T>( ? framework::ReshapeToMatrix(
*x, context.template Attr<int>("x_num_col_dims")) *x, context.template Attr<int>("x_num_col_dims"))
: *x; : *x;
const Tensor y_matrix = const Tensor y_matrix =
y->dims().size() > 2 y->dims().size() > 2
? framework::ReshapeToMatrix<T>( ? framework::ReshapeToMatrix(
*y, context.template Attr<int>("y_num_col_dims")) *y, context.template Attr<int>("y_num_col_dims"))
: *y; : *y;
...@@ -59,11 +59,11 @@ class MulGradKernel : public framework::OpKernel<T> { ...@@ -59,11 +59,11 @@ class MulGradKernel : public framework::OpKernel<T> {
int y_num_col_dims = ctx.template Attr<int>("y_num_col_dims"); int y_num_col_dims = ctx.template Attr<int>("y_num_col_dims");
const Tensor* x = ctx.Input<Tensor>("X"); const Tensor* x = ctx.Input<Tensor>("X");
const Tensor* y = ctx.Input<Tensor>("Y"); const Tensor* y = ctx.Input<Tensor>("Y");
const Tensor x_matrix = const Tensor x_matrix = x->dims().size() > 2
x->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*x, x_num_col_dims) ? framework::ReshapeToMatrix(*x, x_num_col_dims)
: *x; : *x;
const Tensor y_matrix = const Tensor y_matrix = y->dims().size() > 2
y->dims().size() > 2 ? framework::ReshapeToMatrix<T>(*y, y_num_col_dims) ? framework::ReshapeToMatrix(*y, y_num_col_dims)
: *y; : *y;
const Tensor* dout = ctx.Input<Tensor>(framework::GradVarName("Out")); const Tensor* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
...@@ -71,8 +71,8 @@ class MulGradKernel : public framework::OpKernel<T> { ...@@ -71,8 +71,8 @@ class MulGradKernel : public framework::OpKernel<T> {
Tensor* dy = ctx.Output<Tensor>(framework::GradVarName("Y")); Tensor* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
if (dx) { if (dx) {
dx->mutable_data<T>(ctx.GetPlace()); dx->mutable_data<T>(ctx.GetPlace());
Tensor dx_matrix = dx->dims().size() > 2 ? framework::ReshapeToMatrix<T>( Tensor dx_matrix = dx->dims().size() > 2
*dx, x_num_col_dims) ? framework::ReshapeToMatrix(*dx, x_num_col_dims)
: *dx; : *dx;
// dx = dout * y'. dx: M x K, dout : M x N, y : K x N // dx = dout * y'. dx: M x K, dout : M x N, y : K x N
math::matmul<Place, T>(ctx.device_context(), *dout, false, y_matrix, true, math::matmul<Place, T>(ctx.device_context(), *dout, false, y_matrix, true,
...@@ -80,8 +80,8 @@ class MulGradKernel : public framework::OpKernel<T> { ...@@ -80,8 +80,8 @@ class MulGradKernel : public framework::OpKernel<T> {
} }
if (dy) { if (dy) {
dy->mutable_data<T>(ctx.GetPlace()); dy->mutable_data<T>(ctx.GetPlace());
Tensor dy_matrix = dy->dims().size() > 2 ? framework::ReshapeToMatrix<T>( Tensor dy_matrix = dy->dims().size() > 2
*dy, y_num_col_dims) ? framework::ReshapeToMatrix(*dy, y_num_col_dims)
: *dy; : *dy;
// dy = x' * dout. dy K x N, dout : M x N, x : M x K // dy = x' * dout. dy K x N, dout : M x N, x : M x K
math::matmul<Place, T>(ctx.device_context(), x_matrix, true, *dout, false, math::matmul<Place, T>(ctx.device_context(), x_matrix, true, *dout, false,
......
...@@ -33,8 +33,7 @@ class MultiplexGPUKernel : public framework::OpKernel<T> { ...@@ -33,8 +33,7 @@ class MultiplexGPUKernel : public framework::OpKernel<T> {
auto cols = ins[0]->numel() / rows; auto cols = ins[0]->numel() / rows;
// copy index to cpu // copy index to cpu
Tensor index_t_cpu; Tensor index_t_cpu;
index_t_cpu.CopyFrom<int32_t>(*ids, platform::CPUPlace(), index_t_cpu.CopyFrom(*ids, platform::CPUPlace(), ctx.device_context());
ctx.device_context());
auto* index = index_t_cpu.data<int32_t>(); auto* index = index_t_cpu.data<int32_t>();
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context()) ctx.device_context())
...@@ -71,8 +70,7 @@ class MultiplexGradGPUKernel : public framework::OpKernel<T> { ...@@ -71,8 +70,7 @@ class MultiplexGradGPUKernel : public framework::OpKernel<T> {
auto cols = ins[0]->numel() / rows; auto cols = ins[0]->numel() / rows;
// copy index to cpu // copy index to cpu
Tensor index_t_cpu; Tensor index_t_cpu;
index_t_cpu.CopyFrom<int32_t>(*ids, platform::CPUPlace(), index_t_cpu.CopyFrom(*ids, platform::CPUPlace(), ctx.device_context());
ctx.device_context());
auto* index = index_t_cpu.data<int32_t>(); auto* index = index_t_cpu.data<int32_t>();
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
......
...@@ -42,7 +42,7 @@ void RecurrentAlgorithm::Run(const Scope& scope, ...@@ -42,7 +42,7 @@ void RecurrentAlgorithm::Run(const Scope& scope,
for (size_t step_id = 0; step_id < seq_len; step_id++) { for (size_t step_id = 0; step_id < seq_len; step_id++) {
if (step_id > 0) { if (step_id > 0) {
rnn::LinkMemories(step_scopes, arg_->memories, step_id, -1); rnn::LinkMemories(step_scopes, arg_->states, step_id, -1);
} }
(*stepnet_)->Run(*step_scopes[step_id], dev_ctx); (*stepnet_)->Run(*step_scopes[step_id], dev_ctx);
} }
...@@ -59,7 +59,8 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope, ...@@ -59,7 +59,8 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope,
// Now all variables in scope must be created outside of op. // Now all variables in scope must be created outside of op.
PADDLE_ENFORCE_NOT_NULL(stepnet_); PADDLE_ENFORCE_NOT_NULL(stepnet_);
PADDLE_ENFORCE(!(*stepnet_)->Outputs().empty(), "stepnet_ op has no outputs"); PADDLE_ENFORCE(!(*stepnet_)->Outputs().empty(),
"step_unit_ op has no outputs");
if (seq_len > step_scopes->size()) { if (seq_len > step_scopes->size()) {
for (size_t i = step_scopes->size(); i < seq_len; ++i) { for (size_t i = step_scopes->size(); i < seq_len; ++i) {
...@@ -86,7 +87,7 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope, ...@@ -86,7 +87,7 @@ void RecurrentAlgorithm::CreateScopes(const Scope& scope,
} }
void RecurrentAlgorithm::InitMemories(Scope* step_scope) const { void RecurrentAlgorithm::InitMemories(Scope* step_scope) const {
for (auto& attr : arg_->memories) { for (auto& attr : arg_->states) {
auto* pre_mem = step_scope->Var(attr.pre_var)->GetMutable<LoDTensor>(); auto* pre_mem = step_scope->Var(attr.pre_var)->GetMutable<LoDTensor>();
PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr, PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr,
"memory [%s]'s boot variable [%s] not exists", attr.var, "memory [%s]'s boot variable [%s] not exists", attr.var,
...@@ -95,17 +96,17 @@ void RecurrentAlgorithm::InitMemories(Scope* step_scope) const { ...@@ -95,17 +96,17 @@ void RecurrentAlgorithm::InitMemories(Scope* step_scope) const {
step_scope->FindVar(attr.boot_var)->GetMutable<LoDTensor>(); step_scope->FindVar(attr.boot_var)->GetMutable<LoDTensor>();
pre_mem->Resize(boot_mem->dims()); pre_mem->Resize(boot_mem->dims());
PADDLE_ENFORCE_EQ(pre_mem->dims().size(), 2); PADDLE_ENFORCE_EQ(pre_mem->dims().size(), 2);
pre_mem->ShareDataWith<float>(*boot_mem); pre_mem->ShareDataWith(*boot_mem);
} }
} }
const rnn::ArgumentName RecurrentOp::kArgName{ const rnn::ArgumentName RecurrentOp::kArgName{
"step_net", "step_scopes", "inlinks", "outlinks", "step_net", "step_scopes", "inputs", "outputs",
"memories", "pre_memories", "boot_memories"}; "states", "ex_states", "initial_states"};
const rnn::ArgumentName RecurrentGradientOp::kArgName{ const rnn::ArgumentName RecurrentGradientOp::kArgName{
"step_net", "step_scopes@GRAD", "outlinks@GRAD", "inlinks@GRAD", "step_net", "step_scopes@GRAD", "outputs@GRAD", "inputs@GRAD",
"memories", "pre_memories", "boot_memories@GRAD"}; "states", "ex_states", "initial_states@GRAD"};
RecurrentOp::RecurrentOp(const std::string& type, RecurrentOp::RecurrentOp(const std::string& type,
const framework::VariableNameMap& inputs, const framework::VariableNameMap& inputs,
...@@ -127,7 +128,7 @@ class RecurrentAlgorithmProtoAndCheckerMaker ...@@ -127,7 +128,7 @@ class RecurrentAlgorithmProtoAndCheckerMaker
AddInput(name.inlinks, AddInput(name.inlinks,
"the inputs that need to be segmented for each step.") "the inputs that need to be segmented for each step.")
.AsDuplicable(); .AsDuplicable();
AddInput(name.boot_memories, "variables to initialize memories.") AddInput(name.initial_states, "variables to initialize states.")
.AsDuplicable(); .AsDuplicable();
AddOutput(name.outlinks, "the outputs that need to concated for all steps.") AddOutput(name.outlinks, "the outputs that need to concated for all steps.")
...@@ -135,9 +136,8 @@ class RecurrentAlgorithmProtoAndCheckerMaker ...@@ -135,9 +136,8 @@ class RecurrentAlgorithmProtoAndCheckerMaker
AddOutput(name.step_scopes, "step scopes"); AddOutput(name.step_scopes, "step scopes");
// Attributes stored in AttributeMap // Attributes stored in AttributeMap
AddAttr<std::vector<std::string>>(name.pre_memories, AddAttr<std::vector<std::string>>(name.ex_states, "names of pre-states");
"names of pre-memories"); AddAttr<std::vector<std::string>>(name.states, "names of states");
AddAttr<std::vector<std::string>>(name.memories, "names of memories");
AddComment("This is a recurrent group operator."); AddComment("This is a recurrent group operator.");
} }
...@@ -152,7 +152,7 @@ void RecurrentGradientAlgorithm::Run( ...@@ -152,7 +152,7 @@ void RecurrentGradientAlgorithm::Run(
rnn::SegmentInputs(step_scopes, arg_->inlinks, seq_len); rnn::SegmentInputs(step_scopes, arg_->inlinks, seq_len);
for (int step_id = seq_len - 1; step_id >= 0; --step_id) { for (int step_id = seq_len - 1; step_id >= 0; --step_id) {
if (static_cast<size_t>(step_id) != seq_len - 1) { if (static_cast<size_t>(step_id) != seq_len - 1) {
rnn::LinkMemories(step_scopes, arg_->memories, step_id, 1); rnn::LinkMemories(step_scopes, arg_->states, step_id, 1);
} }
(*stepnet_)->Run(*step_scopes[step_id], dev_ctx); (*stepnet_)->Run(*step_scopes[step_id], dev_ctx);
} }
...@@ -162,7 +162,7 @@ void RecurrentGradientAlgorithm::Run( ...@@ -162,7 +162,7 @@ void RecurrentGradientAlgorithm::Run(
void RecurrentGradientAlgorithm::LinkBootMemoryGradients( void RecurrentGradientAlgorithm::LinkBootMemoryGradients(
Scope* step_scope) const { Scope* step_scope) const {
for (auto& attr : arg_->memories) { for (auto& attr : arg_->states) {
PADDLE_ENFORCE(step_scope->FindVar(attr.var) != nullptr, PADDLE_ENFORCE(step_scope->FindVar(attr.var) != nullptr,
"memory variable [%s] does not exists", attr.var); "memory variable [%s] does not exists", attr.var);
PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr, PADDLE_ENFORCE(step_scope->FindVar(attr.boot_var) != nullptr,
...@@ -171,7 +171,7 @@ void RecurrentGradientAlgorithm::LinkBootMemoryGradients( ...@@ -171,7 +171,7 @@ void RecurrentGradientAlgorithm::LinkBootMemoryGradients(
auto* boot_mem_grad = auto* boot_mem_grad =
step_scope->Var(attr.boot_var)->GetMutable<LoDTensor>(); step_scope->Var(attr.boot_var)->GetMutable<LoDTensor>();
boot_mem_grad->Resize(mem_grad->dims()); boot_mem_grad->Resize(mem_grad->dims());
boot_mem_grad->ShareDataWith<float>(*mem_grad); boot_mem_grad->ShareDataWith(*mem_grad);
} }
} }
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
limitations under the License. */ limitations under the License. */
#include "paddle/operators/reduce_op.h" #include "paddle/operators/reduce_op.h"
#include "paddle/operators/net_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
...@@ -159,6 +160,66 @@ class ReduceMinOpMaker : public ReduceOpMaker { ...@@ -159,6 +160,66 @@ class ReduceMinOpMaker : public ReduceOpMaker {
} }
}; };
class NormOp : public NetOp {
public:
NormOp(const std::string &type, const framework::VariableNameMap &inputs,
const framework::VariableNameMap &outputs,
const framework::AttributeMap &attrs)
: NetOp(type, inputs, outputs, attrs) {
PADDLE_ENFORCE_NE(Input("X"), framework::kEmptyVarName,
"Input(X) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("AbsOut"), framework::kEmptyVarName,
"Output(AbsOut) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("PowOut"), framework::kEmptyVarName,
"Output(PowOut) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("SumOut"), framework::kEmptyVarName,
"Output(SumOut) of NormOp should not be null.");
PADDLE_ENFORCE_NE(Output("Out"), framework::kEmptyVarName,
"Output(Out) of NormOp should not be null.");
auto dim = Attr<int>("dim");
auto keep_dim = Attr<bool>("keep_dim");
auto p = Attr<float>("p");
PADDLE_ENFORCE_GT(p, 0, "Order of the norm should be positive.");
AppendOp(framework::OpRegistry::CreateOp("abs", {{"X", {Input("X")}}},
{{"Y", {Output("AbsOut")}}}, {}));
AppendOp(framework::OpRegistry::CreateOp("pow", {{"X", {Output("AbsOut")}}},
{{"Y", {Output("PowOut")}}},
{{"factor", p}}));
framework::AttributeMap sum_attr;
sum_attr["dim"] = dim;
sum_attr["keep_dim"] = keep_dim;
AppendOp(framework::OpRegistry::CreateOp(
"reduce_sum", {{"X", {Output("PowOut")}}},
{{"Out", {Output("SumOut")}}}, sum_attr));
AppendOp(framework::OpRegistry::CreateOp(
"pow", {{"X", {Output("SumOut")}}}, {{"Y", {Output("Out")}}},
{{"factor", static_cast<float>(1. / p)}}));
CompleteAddOp(false);
}
};
class NormOpMaker : public ReduceOpMaker {
public:
NormOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
: ReduceOpMaker(proto, op_checker) {
AddOutput("AbsOut",
"(Tensor) The intermediate output of Norm operator, "
"saving the absolute value of the input tensor X.")
.AsIntermediate();
AddOutput("PowOut",
"(Tensor) The intermediate output of Norm operator, "
"saving the p-th power of the output tensor AbsOut.")
.AsIntermediate();
AddOutput("SumOut",
"(Tensor) the intermediate output of Norm operator, "
"saving the sum of PowOut reduced on the given dimension.")
.AsIntermediate();
AddAttr<float>("p", "(float, default 2) The order of Norm.").SetDefault(2);
SetComment("Norm", "vector p-norm");
AddComment(comment_);
}
};
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -176,6 +237,8 @@ REGISTER_OP(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker, reduce_max_grad, ...@@ -176,6 +237,8 @@ REGISTER_OP(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker, reduce_max_grad,
REGISTER_OP(reduce_min, ops::ReduceOp, ops::ReduceMinOpMaker, reduce_min_grad, REGISTER_OP(reduce_min, ops::ReduceOp, ops::ReduceMinOpMaker, reduce_min_grad,
ops::ReduceGradOp); ops::ReduceGradOp);
REGISTER_OP_WITHOUT_GRADIENT(norm, ops::NormOp, ops::NormOpMaker);
#define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \ #define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \
REGISTER_OP_CPU_KERNEL( \ REGISTER_OP_CPU_KERNEL( \
reduce_type, \ reduce_type, \
......
...@@ -33,7 +33,7 @@ class ReshapeKernel : public framework::OpKernel<T> { ...@@ -33,7 +33,7 @@ class ReshapeKernel : public framework::OpKernel<T> {
std::transform(shape.begin(), shape.end(), shape_int64.begin(), std::transform(shape.begin(), shape.end(), shape_int64.begin(),
[](int a) { return static_cast<int64_t>(a); }); [](int a) { return static_cast<int64_t>(a); });
auto out_dims = framework::make_ddim(shape_int64); auto out_dims = framework::make_ddim(shape_int64);
out->CopyFrom<T>(*in, ctx.GetPlace(), ctx.device_context()); out->CopyFrom(*in, ctx.GetPlace(), ctx.device_context());
out->Resize(out_dims); out->Resize(out_dims);
} }
}; };
...@@ -47,7 +47,7 @@ class ReshapeGradKernel : public framework::OpKernel<T> { ...@@ -47,7 +47,7 @@ class ReshapeGradKernel : public framework::OpKernel<T> {
d_x->mutable_data<T>(ctx.GetPlace()); d_x->mutable_data<T>(ctx.GetPlace());
auto in_dims = d_x->dims(); auto in_dims = d_x->dims();
d_x->CopyFrom<T>(*d_out, ctx.GetPlace(), ctx.device_context()); d_x->CopyFrom(*d_out, ctx.GetPlace(), ctx.device_context());
d_x->Resize(in_dims); d_x->Resize(in_dims);
} }
}; };
......
...@@ -36,14 +36,14 @@ void SegmentInputs(const std::vector<Scope*>& step_scopes, ...@@ -36,14 +36,14 @@ void SegmentInputs(const std::vector<Scope*>& step_scopes,
LoDTensor* input = input_var->GetMutable<LoDTensor>(); LoDTensor* input = input_var->GetMutable<LoDTensor>();
f::DDim dims = input->dims(); f::DDim dims = input->dims();
PADDLE_ENFORCE_EQ(static_cast<size_t>(dims[0]), seq_len, PADDLE_ENFORCE_EQ(static_cast<size_t>(dims[0]), seq_len,
"all the inlinks be the same length"); "all the inputs be the same length");
f::DDim step_dims = slice_ddim(dims, 1, dims.size()); f::DDim step_dims = slice_ddim(dims, 1, dims.size());
for (size_t j = 0; j < seq_len; j++) { for (size_t j = 0; j < seq_len; j++) {
Tensor* step_input = Tensor* step_input =
step_scopes[j]->Var(inlinks[i])->GetMutable<Tensor>(); step_scopes[j]->Var(inlinks[i])->GetMutable<Tensor>();
// The input of operators of each step is Tensor here. // The input of operators of each step is Tensor here.
// Maybe need to modify Slice function. // Maybe need to modify Slice function.
*step_input = input->Slice<float>(j, j + 1); *step_input = input->Slice(j, j + 1);
step_input->Resize(step_dims); step_input->Resize(step_dims);
} }
} }
...@@ -71,14 +71,14 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes, ...@@ -71,14 +71,14 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes,
step_scopes[j]->FindVar(outlinks[i])->GetMutable<LoDTensor>(); step_scopes[j]->FindVar(outlinks[i])->GetMutable<LoDTensor>();
// TODO(luotao02) data type and platform::DeviceContext() should set // TODO(luotao02) data type and platform::DeviceContext() should set
// correctly // correctly
(output->Slice<float>(j, j + 1)) (output->Slice(j, j + 1))
.CopyFrom<float>(*step_output, platform::CPUPlace(), ctx); .CopyFrom(*step_output, platform::CPUPlace(), ctx);
} }
} }
} }
void LinkMemories(const std::vector<Scope*>& scopes, void LinkMemories(const std::vector<Scope*>& scopes,
const std::vector<rnn::MemoryAttr>& memories, const std::vector<rnn::StateAttr>& memories,
const size_t step_id, const int offset) { const size_t step_id, const int offset) {
PADDLE_ENFORCE_LT(step_id, scopes.size(), PADDLE_ENFORCE_LT(step_id, scopes.size(),
"step [%d] is out of range of step scopes' size [%d]", "step [%d] is out of range of step scopes' size [%d]",
...@@ -95,7 +95,7 @@ void LinkMemories(const std::vector<Scope*>& scopes, ...@@ -95,7 +95,7 @@ void LinkMemories(const std::vector<Scope*>& scopes,
auto* mem = scope->FindVar(attr.pre_var)->GetMutable<LoDTensor>(); auto* mem = scope->FindVar(attr.pre_var)->GetMutable<LoDTensor>();
auto* linked_mem = linked_scope->FindVar(attr.var)->GetMutable<LoDTensor>(); auto* linked_mem = linked_scope->FindVar(attr.var)->GetMutable<LoDTensor>();
mem->Resize(linked_mem->dims()); mem->Resize(linked_mem->dims());
mem->ShareDataWith<float>(*linked_mem); mem->ShareDataWith(*linked_mem);
} }
} }
...@@ -106,26 +106,26 @@ void InitArgument(const ArgumentName& name, Argument* arg, ...@@ -106,26 +106,26 @@ void InitArgument(const ArgumentName& name, Argument* arg,
arg->inlinks = op.Inputs(name.inlinks); arg->inlinks = op.Inputs(name.inlinks);
arg->outlinks = op.Outputs(name.outlinks); arg->outlinks = op.Outputs(name.outlinks);
auto& boot_memories = auto& boot_memories = is_grad ? op.Outputs(name.initial_states)
is_grad ? op.Outputs(name.boot_memories) : op.Inputs(name.boot_memories); : op.Inputs(name.initial_states);
// attributes // attributes
auto& memories = op.Attr<std::vector<std::string>>(name.memories); auto& memories = op.Attr<std::vector<std::string>>(name.states);
auto& pre_memories = op.Attr<std::vector<std::string>>(name.pre_memories); auto& pre_memories = op.Attr<std::vector<std::string>>(name.ex_states);
PADDLE_ENFORCE(memories.size() == boot_memories.size(), PADDLE_ENFORCE(memories.size() == boot_memories.size(),
"the size of memories, boot_memories don't match:%d,%d", "the size of states, initial_states don't match:%d,%d",
memories.size(), boot_memories.size()); memories.size(), boot_memories.size());
PADDLE_ENFORCE(pre_memories.size() == boot_memories.size(), PADDLE_ENFORCE(pre_memories.size() == boot_memories.size(),
"the size of pre_memories, boot_memories don't match:%d,%d", "the size of ex_states, initial_states don't match:%d,%d",
pre_memories.size(), boot_memories.size()); pre_memories.size(), boot_memories.size());
PADDLE_ENFORCE(memories.size() > 0, "more than 1 memories should be set"); PADDLE_ENFORCE(memories.size() > 0, "more than 1 states should be set");
for (size_t i = 0; i < memories.size(); ++i) { for (size_t i = 0; i < memories.size(); ++i) {
rnn::MemoryAttr mem_attr; rnn::StateAttr mem_attr;
mem_attr.var = memories[i]; mem_attr.var = memories[i];
mem_attr.pre_var = pre_memories[i]; mem_attr.pre_var = pre_memories[i];
mem_attr.boot_var = boot_memories[i]; mem_attr.boot_var = boot_memories[i];
(arg->memories).push_back(mem_attr); (arg->states).push_back(mem_attr);
} }
} }
......
...@@ -31,7 +31,7 @@ using Scope = framework::Scope; ...@@ -31,7 +31,7 @@ using Scope = framework::Scope;
* boot memories in father scope. Other attributes are copied from Op's proto * boot memories in father scope. Other attributes are copied from Op's proto
* attributes. * attributes.
*/ */
struct MemoryAttr { struct StateAttr {
// name of current state variable // name of current state variable
std::string var; std::string var;
// name of previous step's state variable // name of previous step's state variable
...@@ -46,7 +46,7 @@ struct Argument { ...@@ -46,7 +46,7 @@ struct Argument {
std::string step_scopes; std::string step_scopes;
std::vector<std::string> inlinks; std::vector<std::string> inlinks;
std::vector<std::string> outlinks; std::vector<std::string> outlinks;
std::vector<rnn::MemoryAttr> memories; std::vector<rnn::StateAttr> states;
}; };
struct ArgumentName { struct ArgumentName {
...@@ -54,9 +54,9 @@ struct ArgumentName { ...@@ -54,9 +54,9 @@ struct ArgumentName {
std::string step_scopes; std::string step_scopes;
std::string inlinks; std::string inlinks;
std::string outlinks; std::string outlinks;
std::string memories; // the memory name std::string states; // the memory name
std::string pre_memories; // the previous memory name std::string ex_states; // the previous memory name
std::string boot_memories; // the boot memory name std::string initial_states; // the boot memory name
}; };
/** /**
...@@ -74,7 +74,7 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes, ...@@ -74,7 +74,7 @@ void ConcatOutputs(const std::vector<Scope*>& step_scopes,
const size_t seq_len, const platform::DeviceContext& ctx); const size_t seq_len, const platform::DeviceContext& ctx);
void LinkMemories(const std::vector<Scope*>& step_scopes, void LinkMemories(const std::vector<Scope*>& step_scopes,
const std::vector<MemoryAttr>& memories, const size_t step_id, const std::vector<StateAttr>& memories, const size_t step_id,
const int offset); const int offset);
void InitArgument(const ArgumentName& name, Argument* arg, void InitArgument(const ArgumentName& name, Argument* arg,
......
...@@ -30,7 +30,7 @@ class ScatterOpCUDAKernel : public framework::OpKernel<T> { ...@@ -30,7 +30,7 @@ class ScatterOpCUDAKernel : public framework::OpKernel<T> {
auto *Updates = ctx.Input<Tensor>("Updates"); auto *Updates = ctx.Input<Tensor>("Updates");
auto *Out = ctx.Output<Tensor>("Out"); auto *Out = ctx.Output<Tensor>("Out");
Out->ShareDataWith<T>(*Ref); Out->ShareDataWith(*Ref);
GPUScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out); GPUScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out);
} }
...@@ -48,7 +48,7 @@ class ScatterGradOpCUDAKernel : public framework::OpKernel<T> { ...@@ -48,7 +48,7 @@ class ScatterGradOpCUDAKernel : public framework::OpKernel<T> {
auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out")); auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out"));
// In place gradient: dRef = dO // In place gradient: dRef = dO
dRef->ShareDataWith<T>(*dOut); dRef->ShareDataWith(*dOut);
dUpdates->mutable_data<T>(ctx.GetPlace()); dUpdates->mutable_data<T>(ctx.GetPlace());
// Gradient by Gather: dUpdates = dO[Index] // Gradient by Gather: dUpdates = dO[Index]
GPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates); GPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates);
......
...@@ -35,7 +35,7 @@ class ScatterOpKernel : public framework::OpKernel<T> { ...@@ -35,7 +35,7 @@ class ScatterOpKernel : public framework::OpKernel<T> {
auto *Out = ctx.Output<Tensor>("Out"); auto *Out = ctx.Output<Tensor>("Out");
// In place output: Out = Ref, Out[Index] += Updates // In place output: Out = Ref, Out[Index] += Updates
Out->ShareDataWith<T>(*Ref); Out->ShareDataWith(*Ref);
// Apply ScatterUpdate: Out[index] += Updates[:] // Apply ScatterUpdate: Out[index] += Updates[:]
ScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out); ScatterAssign<T>(ctx.device_context(), *Updates, *Index, Out);
} }
...@@ -53,7 +53,7 @@ class ScatterGradientOpKernel : public framework::OpKernel<T> { ...@@ -53,7 +53,7 @@ class ScatterGradientOpKernel : public framework::OpKernel<T> {
auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out")); auto *dOut = ctx.Input<Tensor>(framework::GradVarName("Out"));
// In place gradient: dRef = dO // In place gradient: dRef = dO
dRef->ShareDataWith<T>(*dOut); dRef->ShareDataWith(*dOut);
dUpdates->mutable_data<T>(ctx.GetPlace()); dUpdates->mutable_data<T>(ctx.GetPlace());
// Gradient by Gather: dUpdates += dO[Index] // Gradient by Gather: dUpdates += dO[Index]
CPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates); CPUGather<T>(ctx.device_context(), *dOut, *Index, dUpdates);
......
...@@ -87,7 +87,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> { ...@@ -87,7 +87,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> {
auto out_lod_level = out_lod[level]; auto out_lod_level = out_lod[level];
for (size_t i = 0; i < out_lod_level.size() - 1; ++i) { for (size_t i = 0; i < out_lod_level.size() - 1; ++i) {
Tensor out_t = out->Slice<T>(static_cast<int>(out_lod_level[i]), Tensor out_t = out->Slice(static_cast<int>(out_lod_level[i]),
static_cast<int>(out_lod_level[i + 1])); static_cast<int>(out_lod_level[i + 1]));
auto out_stride = framework::stride(out_t.dims()); auto out_stride = framework::stride(out_t.dims());
size_t offset = 0; size_t offset = 0;
...@@ -95,7 +95,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> { ...@@ -95,7 +95,7 @@ class SequenceConcatOpKernel : public framework::OpKernel<T> {
for (size_t j = 0; j < n; ++j) { for (size_t j = 0; j < n; ++j) {
auto in_lod_level = ins[j]->lod()[level]; auto in_lod_level = ins[j]->lod()[level];
auto in_stride = framework::stride(ins[j]->dims()); auto in_stride = framework::stride(ins[j]->dims());
Tensor in_t = ins[j]->Slice<T>(static_cast<int>(in_lod_level[i]), Tensor in_t = ins[j]->Slice(static_cast<int>(in_lod_level[i]),
static_cast<int>(in_lod_level[i + 1])); static_cast<int>(in_lod_level[i + 1]));
size_t axis_dim = in_t.dims()[axis]; size_t axis_dim = in_t.dims()[axis];
StridedMemcpy<T>(ctx.device_context(), in_t.data<T>(), in_stride, StridedMemcpy<T>(ctx.device_context(), in_t.data<T>(), in_stride,
...@@ -130,7 +130,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> { ...@@ -130,7 +130,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> {
for (size_t i = 0; i < out_lod_level.size() - 1; ++i) { for (size_t i = 0; i < out_lod_level.size() - 1; ++i) {
Tensor out_grad_t = Tensor out_grad_t =
out_grad->Slice<T>(static_cast<int>(out_lod_level[i]), out_grad->Slice(static_cast<int>(out_lod_level[i]),
static_cast<int>(out_lod_level[i + 1])); static_cast<int>(out_lod_level[i + 1]));
auto out_grad_stride = framework::stride(out_grad_t.dims()); auto out_grad_stride = framework::stride(out_grad_t.dims());
size_t offset = 0; size_t offset = 0;
...@@ -139,7 +139,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> { ...@@ -139,7 +139,7 @@ class SequenceConcatGradOpKernel : public framework::OpKernel<T> {
auto x_grad_lod_level = x_grads[j]->lod()[level]; auto x_grad_lod_level = x_grads[j]->lod()[level];
auto x_grad_stride = framework::stride(x_grads[j]->dims()); auto x_grad_stride = framework::stride(x_grads[j]->dims());
Tensor x_grad_t = Tensor x_grad_t =
x_grads[j]->Slice<T>(static_cast<int>(x_grad_lod_level[i]), x_grads[j]->Slice(static_cast<int>(x_grad_lod_level[i]),
static_cast<int>(x_grad_lod_level[i + 1])); static_cast<int>(x_grad_lod_level[i + 1]));
size_t axis_dim = x_grad_t.dims()[axis]; size_t axis_dim = x_grad_t.dims()[axis];
StridedMemcpy<T>(ctx.device_context(), out_grad_t.data<T>() + offset, StridedMemcpy<T>(ctx.device_context(), out_grad_t.data<T>() + offset,
......
...@@ -64,9 +64,9 @@ class SequencePoolKernel : public framework::OpKernel<T> { ...@@ -64,9 +64,9 @@ class SequencePoolKernel : public framework::OpKernel<T> {
out->mutable_data<T>(context.GetPlace()); out->mutable_data<T>(context.GetPlace());
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod_level_0.size()) - 1; ++i) {
Tensor in_t = in->Slice<T>(static_cast<int>(lod_level_0[i]), Tensor in_t = in->Slice(static_cast<int>(lod_level_0[i]),
static_cast<int>(lod_level_0[i + 1])); static_cast<int>(lod_level_0[i + 1]));
Tensor out_t = out->Slice<T>(i, i + 1); Tensor out_t = out->Slice(i, i + 1);
int64_t h = static_cast<int64_t>(lod_level_0[i + 1] - lod_level_0[i]); int64_t h = static_cast<int64_t>(lod_level_0[i + 1] - lod_level_0[i]);
auto in_e = EigenMatrix<T>::From(in_t, framework::make_ddim({h, w})); auto in_e = EigenMatrix<T>::From(in_t, framework::make_ddim({h, w}));
auto out_e = EigenVector<T>::Flatten(out_t); auto out_e = EigenVector<T>::Flatten(out_t);
...@@ -116,9 +116,9 @@ class SequencePoolGradKernel : public framework::OpKernel<T> { ...@@ -116,9 +116,9 @@ class SequencePoolGradKernel : public framework::OpKernel<T> {
} }
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
for (int i = 0; i < static_cast<int>(lod.size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod.size()) - 1; ++i) {
auto in_g_t = in_g->Slice<T>(static_cast<int>(lod[i]), auto in_g_t =
static_cast<int>(lod[i + 1])); in_g->Slice(static_cast<int>(lod[i]), static_cast<int>(lod[i + 1]));
auto out_g_t = out_g->Slice<T>(i, i + 1); auto out_g_t = out_g->Slice(i, i + 1);
int64_t h = static_cast<int64_t>(lod[i + 1] - lod[i]); int64_t h = static_cast<int64_t>(lod[i + 1] - lod[i]);
auto in_g_e = EigenMatrix<T>::From(in_g_t, {h, w}); auto in_g_e = EigenMatrix<T>::From(in_g_t, {h, w});
auto out_g_e = EigenMatrix<T>::From(out_g_t, {1, w}); auto out_g_e = EigenMatrix<T>::From(out_g_t, {1, w});
......
...@@ -46,8 +46,8 @@ class SequenceSoftmaxKernel : public framework::OpKernel<T> { ...@@ -46,8 +46,8 @@ class SequenceSoftmaxKernel : public framework::OpKernel<T> {
for (int i = 0; i < static_cast<int>(lod[level].size()) - 1; ++i) { for (int i = 0; i < static_cast<int>(lod[level].size()) - 1; ++i) {
int start_pos = static_cast<int>(lod[level][i]); int start_pos = static_cast<int>(lod[level][i]);
int end_pos = static_cast<int>(lod[level][i + 1]); int end_pos = static_cast<int>(lod[level][i + 1]);
Tensor x_i = x->Slice<T>(start_pos, end_pos); Tensor x_i = x->Slice(start_pos, end_pos);
Tensor out_i = out->Slice<T>(start_pos, end_pos); Tensor out_i = out->Slice(start_pos, end_pos);
// Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos) // Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos)
framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos}); framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos});
...@@ -75,9 +75,9 @@ class SequenceSoftmaxGradKernel : public framework::OpKernel<T> { ...@@ -75,9 +75,9 @@ class SequenceSoftmaxGradKernel : public framework::OpKernel<T> {
int start_pos = static_cast<int>(lod[level][i]); int start_pos = static_cast<int>(lod[level][i]);
int end_pos = static_cast<int>(lod[level][i + 1]); int end_pos = static_cast<int>(lod[level][i + 1]);
Tensor out_i = out->Slice<T>(start_pos, end_pos); Tensor out_i = out->Slice(start_pos, end_pos);
Tensor out_grad_i = out_grad->Slice<T>(start_pos, end_pos); Tensor out_grad_i = out_grad->Slice(start_pos, end_pos);
Tensor x_grad_i = x_grad->Slice<T>(start_pos, end_pos); Tensor x_grad_i = x_grad->Slice(start_pos, end_pos);
// Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos) // Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos)
framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos}); framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos});
......
...@@ -85,7 +85,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> { ...@@ -85,7 +85,7 @@ class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel<T> {
context.Input<Tensor>(framework::GradVarName("Loss"))->data<T>(); context.Input<Tensor>(framework::GradVarName("Loss"))->data<T>();
Tensor* logit_grad = Tensor* logit_grad =
context.Output<Tensor>(framework::GradVarName("Logits")); context.Output<Tensor>(framework::GradVarName("Logits"));
logit_grad->ShareDataWith<T>(*context.Input<Tensor>("Softmax")); logit_grad->ShareDataWith(*context.Input<Tensor>("Softmax"));
T* logit_grad_data = logit_grad->data<T>(); T* logit_grad_data = logit_grad->data<T>();
const int batch_size = logit_grad->dims()[0]; const int batch_size = logit_grad->dims()[0];
......
...@@ -57,7 +57,7 @@ class SoftmaxWithCrossEntropyGradKernel : public framework::OpKernel<T> { ...@@ -57,7 +57,7 @@ class SoftmaxWithCrossEntropyGradKernel : public framework::OpKernel<T> {
const Tensor* labels = context.Input<Tensor>("Label"); const Tensor* labels = context.Input<Tensor>("Label");
Tensor* logit_grad = Tensor* logit_grad =
context.Output<Tensor>(framework::GradVarName("Logits")); context.Output<Tensor>(framework::GradVarName("Logits"));
logit_grad->ShareDataWith<T>(*context.Input<Tensor>("Softmax")); logit_grad->ShareDataWith(*context.Input<Tensor>("Softmax"));
const int class_num = logit_grad->dims()[1]; const int class_num = logit_grad->dims()[1];
if (context.Attr<bool>("soft_label")) { if (context.Attr<bool>("soft_label")) {
......
...@@ -53,10 +53,10 @@ class UniformRandomOp : public framework::OperatorWithKernel { ...@@ -53,10 +53,10 @@ class UniformRandomOp : public framework::OperatorWithKernel {
PADDLE_ENFORCE( PADDLE_ENFORCE(
ctx->Attrs().Get<float>("min") < ctx->Attrs().Get<float>("max"), ctx->Attrs().Get<float>("min") < ctx->Attrs().Get<float>("max"),
"uniform_random's min must less then max"); "uniform_random's min must less then max");
auto& dims = ctx->Attrs().Get<std::vector<int>>("dims"); auto& shape = ctx->Attrs().Get<std::vector<int>>("shape");
std::vector<int64_t> temp; std::vector<int64_t> temp;
temp.reserve(dims.size()); temp.reserve(shape.size());
for (auto dim : dims) { for (auto dim : shape) {
temp.push_back(static_cast<int64_t>(dim)); temp.push_back(static_cast<int64_t>(dim));
} }
ctx->SetOutputDim("Out", framework::make_ddim(temp)); ctx->SetOutputDim("Out", framework::make_ddim(temp));
...@@ -65,7 +65,7 @@ class UniformRandomOp : public framework::OperatorWithKernel { ...@@ -65,7 +65,7 @@ class UniformRandomOp : public framework::OperatorWithKernel {
protected: protected:
framework::DataType IndicateDataType( framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override { const framework::ExecutionContext& ctx) const override {
return static_cast<framework::DataType>(Attr<int>("data_type")); return static_cast<framework::DataType>(ctx.Attr<int>("data_type"));
} }
}; };
...@@ -78,7 +78,7 @@ class UniformRandomOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -78,7 +78,7 @@ class UniformRandomOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(Uniform random operator. AddComment(R"DOC(Uniform random operator.
Used to initialize tensor with uniform random generator. Used to initialize tensor with uniform random generator.
)DOC"); )DOC");
AddAttr<std::vector<int>>("dims", "the dimension of random tensor"); AddAttr<std::vector<int>>("shape", "the dimension of random tensor");
AddAttr<float>("min", "Minimum value of uniform random").SetDefault(-1.0f); AddAttr<float>("min", "Minimum value of uniform random").SetDefault(-1.0f);
AddAttr<float>("max", "Maximun value of uniform random").SetDefault(1.0f); AddAttr<float>("max", "Maximun value of uniform random").SetDefault(1.0f);
AddAttr<int>("seed", AddAttr<int>("seed",
......
...@@ -222,7 +222,9 @@ void BindVarDsec(py::module &m) { ...@@ -222,7 +222,9 @@ void BindVarDsec(py::module &m) {
py::enum_<VarDesc::VarType>(var_desc, "VarType", "") py::enum_<VarDesc::VarType>(var_desc, "VarType", "")
.value("LOD_TENSOR", VarDesc::LOD_TENSOR) .value("LOD_TENSOR", VarDesc::LOD_TENSOR)
.value("SELECTED_ROWS", VarDesc::SELECTED_ROWS); .value("SELECTED_ROWS", VarDesc::SELECTED_ROWS)
.value("FEED_MINIBATCH", VarDesc::FEED_MINIBATCH)
.value("FETCH_LIST", VarDesc::FETCH_LIST);
} }
void BindOpDesc(py::module &m) { void BindOpDesc(py::module &m) {
......
...@@ -84,10 +84,12 @@ PYBIND11_PLUGIN(core) { ...@@ -84,10 +84,12 @@ PYBIND11_PLUGIN(core) {
.def("set", PyCPUTensorSetFromArray<float>) .def("set", PyCPUTensorSetFromArray<float>)
.def("set", PyCPUTensorSetFromArray<int>) .def("set", PyCPUTensorSetFromArray<int>)
.def("set", PyCPUTensorSetFromArray<double>) .def("set", PyCPUTensorSetFromArray<double>)
.def("set", PyCPUTensorSetFromArray<int64_t>)
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
.def("set", PyCUDATensorSetFromArray<float>) .def("set", PyCUDATensorSetFromArray<float>)
.def("set", PyCUDATensorSetFromArray<int>) .def("set", PyCUDATensorSetFromArray<int>)
.def("set", PyCUDATensorSetFromArray<double>) .def("set", PyCUDATensorSetFromArray<double>)
.def("set", PyCUDATensorSetFromArray<int64_t>)
#endif #endif
.def("shape", [](Tensor &self) { return vectorize(self.dims()); }) .def("shape", [](Tensor &self) { return vectorize(self.dims()); })
.def("set_float_element", TensorSetElement<float>) .def("set_float_element", TensorSetElement<float>)
...@@ -111,6 +113,7 @@ PYBIND11_PLUGIN(core) { ...@@ -111,6 +113,7 @@ PYBIND11_PLUGIN(core) {
new (&instance) LoDTensor(new_lod); new (&instance) LoDTensor(new_lod);
#endif #endif
}) })
.def("__init__", [](LoDTensor &instance) { new (&instance) LoDTensor(); })
.def("set_lod", .def("set_lod",
[](LoDTensor &self, const std::vector<std::vector<size_t>> &lod) { [](LoDTensor &self, const std::vector<std::vector<size_t>> &lod) {
#ifndef PADDLE_WITH_CUDA #ifndef PADDLE_WITH_CUDA
...@@ -264,6 +267,17 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -264,6 +267,17 @@ All parameter, weight, gradient are variables in Paddle.
.def(py::init<>()) .def(py::init<>())
.def("__str__", string::to_string<const platform::CPUPlace &>); .def("__str__", string::to_string<const platform::CPUPlace &>);
py::class_<platform::Place>(m, "Place")
.def(py::init<>())
.def("set_place",
[](platform::Place &self, const platform::CPUPlace &cpu_place) {
self = cpu_place;
})
.def("set_place",
[](platform::Place &self, const platform::GPUPlace &gpu_place) {
self = gpu_place;
});
py::class_<OperatorBase>(m, "Operator") py::class_<OperatorBase>(m, "Operator")
.def_static("create", .def_static("create",
[](py::bytes protobin) { [](py::bytes protobin) {
...@@ -399,18 +413,18 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -399,18 +413,18 @@ All parameter, weight, gradient are variables in Paddle.
return static_cast<operators::DynamicRecurrentOp *>( return static_cast<operators::DynamicRecurrentOp *>(
rnn_op.release()); rnn_op.release());
}) })
.def("set_stepnet", .def("set_step_unit",
[](operators::DynamicRecurrentOp &self, const operators::NetOp &net) [](operators::DynamicRecurrentOp &self, const operators::NetOp &net)
-> void { self.SetStepNet(net.Clone()); }) -> void { self.rnn.SetStepUnit(net.Clone()); })
.def("get_state", .def("get_state",
[](operators::DynamicRecurrentOp &self, const std::string &name) [](operators::DynamicRecurrentOp &self, const std::string &name)
-> const TensorArray & { return self.state(name); }) -> const TensorArray & { return self.rnn.state(name); })
.def("get_step_input", .def("get_step_input",
[](operators::DynamicRecurrentOp &self, const std::string &name) [](operators::DynamicRecurrentOp &self, const std::string &name)
-> const TensorArray & { return self.step_input(name); }) -> const TensorArray & { return self.rnn.step_input(name); })
.def("get_step_output", .def("get_step_output",
[](operators::DynamicRecurrentOp &self, const std::string &name) [](operators::DynamicRecurrentOp &self, const std::string &name)
-> const TensorArray & { return self.step_output(name); }); -> const TensorArray & { return self.rnn.step_output(name); });
// cond_op // cond_op
py::class_<operators::CondOp, OperatorBase>(m, "CondOp") py::class_<operators::CondOp, OperatorBase>(m, "CondOp")
...@@ -436,18 +450,15 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -436,18 +450,15 @@ All parameter, weight, gradient are variables in Paddle.
py::class_<framework::Executor>(m, "Executor") py::class_<framework::Executor>(m, "Executor")
.def(py::init<std::vector<platform::Place> &>()) .def(py::init<std::vector<platform::Place> &>())
.def("run", .def("run", [](Executor &self, ProgramDescBind *program_bind,
[](Executor &self, const ProgramDesc &program_desc, int block_id) { Scope *scope, int block_id) {
framework::Scope &global_scope = GetGlobalScope(); self.Run(*program_bind->Proto(), scope, block_id);
self.Run(program_desc, &global_scope, block_id);
}); });
m.def("unique_integer", UniqueIntegerGenerator); m.def("unique_integer", UniqueIntegerGenerator);
m.def("is_compile_gpu", IsCompileGPU); m.def("is_compile_gpu", IsCompileGPU);
m.def("set_feed_variable_float", framework::SetFeedVariable<float>); m.def("set_feed_variable", framework::SetFeedVariable);
m.def("set_feed_variable_double", framework::SetFeedVariable<double>);
m.def("set_feed_variable_int", framework::SetFeedVariable<int>);
m.def("get_fetch_variable", framework::GetFetchVariable); m.def("get_fetch_variable", framework::GetFetchVariable);
BindProgramDesc(m); BindProgramDesc(m);
...@@ -455,6 +466,8 @@ All parameter, weight, gradient are variables in Paddle. ...@@ -455,6 +466,8 @@ All parameter, weight, gradient are variables in Paddle.
BindVarDsec(m); BindVarDsec(m);
BindOpDesc(m); BindOpDesc(m);
m.def("op_support_gpu", OpSupportGPU);
return m.ptr(); return m.ptr();
} }
} // namespace pybind } // namespace pybind
......
...@@ -141,10 +141,17 @@ RUN sed -i '${APT_MIRROR}' /etc/apt/sources.list ...@@ -141,10 +141,17 @@ RUN sed -i '${APT_MIRROR}' /etc/apt/sources.list
EOF EOF
fi fi
if [[ ${WITH_GPU} == "ON" ]]; then
NCCL_DEPS="apt-get install -y libnccl-dev &&"
else
NCCL_DEPS=""
fi
cat >> /paddle/build/Dockerfile <<EOF cat >> /paddle/build/Dockerfile <<EOF
ADD python/dist/*.whl / ADD python/dist/*.whl /
# run paddle version to install python packages first # run paddle version to install python packages first
RUN apt-get update &&\ RUN apt-get update &&\
${NCCL_DEPS}\
apt-get install -y wget python-pip && pip install -U pip && \ apt-get install -y wget python-pip && pip install -U pip && \
pip install /*.whl; apt-get install -f -y && \ pip install /*.whl; apt-get install -f -y && \
apt-get clean -y && \ apt-get clean -y && \
......
...@@ -17,7 +17,7 @@ from paddle.trainer_config_helpers import * ...@@ -17,7 +17,7 @@ from paddle.trainer_config_helpers import *
################################### Data Configuration ################################### ################################### Data Configuration ###################################
TrainData(ProtoData(files = "trainer/tests/mnist.list")) TrainData(ProtoData(files = "trainer/tests/mnist.list"))
################################### Algorithm Configuration ################################### ################################### Algorithm Configuration ###################################
settings(batch_size = 256, settings(batch_size = 128,
learning_method = MomentumOptimizer(momentum=0.5, sparse=False)) learning_method = MomentumOptimizer(momentum=0.5, sparse=False))
################################### Network Configuration ################################### ################################### Network Configuration ###################################
data = data_layer(name ="input", size=784) data = data_layer(name ="input", size=784)
...@@ -44,10 +44,11 @@ a2 = img_conv_layer(input=tmp, ...@@ -44,10 +44,11 @@ a2 = img_conv_layer(input=tmp,
shared_biases=True, shared_biases=True,
act=ReluActivation()) act=ReluActivation())
tmp = concat_layer(input=[a1, a2]) tmp = addto_layer(input=[a1, a2],
act=ReluActivation(),
bias_attr=False)
tmp = img_pool_layer(input=tmp, tmp = img_pool_layer(input=tmp,
num_channels=64,
pool_size=3, pool_size=3,
stride=2, stride=2,
padding=1, padding=1,
...@@ -55,35 +56,34 @@ tmp = img_pool_layer(input=tmp, ...@@ -55,35 +56,34 @@ tmp = img_pool_layer(input=tmp,
b1 = img_conv_layer(input=tmp, b1 = img_conv_layer(input=tmp,
filter_size=3, filter_size=3,
num_filters=64, num_filters=32,
padding=1, padding=1,
shared_biases=True, shared_biases=True,
act=ReluActivation()) act=ReluActivation())
b1 = img_pool_layer(input=b1, b1 = img_pool_layer(input=b1,
pool_size=3, pool_size=3,
stride=1, stride=2,
padding=1, padding=0,
pool_type=MaxPooling()) pool_type=MaxPooling())
b2 = img_conv_layer(input=tmp, b2 = img_conv_layer(input=tmp,
filter_size=5, filter_size=3,
num_filters=64, num_filters=64,
padding=2, padding=1,
shared_biases=True, shared_biases=True,
act=ReluActivation()) act=ReluActivation())
b2 = img_pool_layer(input=b2, b2 = img_pool_layer(input=b2,
pool_size=5, pool_size=5,
stride=1, stride=2,
padding=2, padding=1,
pool_type=MaxPooling()) pool_type=MaxPooling())
tmp = addto_layer(input=[b1, b2], tmp = concat_layer(input=[b1, b2])
act=ReluActivation(),
bias_attr=False)
tmp = img_pool_layer(input=tmp, tmp = img_pool_layer(input=tmp,
num_channels=96,
pool_size=3, pool_size=3,
stride=2, stride=2,
padding=1, padding=1,
......
...@@ -17,7 +17,7 @@ from paddle.trainer_config_helpers import * ...@@ -17,7 +17,7 @@ from paddle.trainer_config_helpers import *
################################### Data Configuration ################################### ################################### Data Configuration ###################################
TrainData(ProtoData(files = "trainer/tests/mnist.list")) TrainData(ProtoData(files = "trainer/tests/mnist.list"))
################################### Algorithm Configuration ################################### ################################### Algorithm Configuration ###################################
settings(batch_size = 1000, settings(batch_size = 128,
learning_method = MomentumOptimizer(momentum=0.5, sparse=False)) learning_method = MomentumOptimizer(momentum=0.5, sparse=False))
################################### Network Configuration ################################### ################################### Network Configuration ###################################
data = data_layer(name ="input", size=784) data = data_layer(name ="input", size=784)
......
...@@ -175,7 +175,7 @@ def index_slot(value_range, seq_type=SequenceType.NO_SEQUENCE): ...@@ -175,7 +175,7 @@ def index_slot(value_range, seq_type=SequenceType.NO_SEQUENCE):
dense_vector = dense_slot dense_vector = dense_slot
sparse_binary_vector = sparse_non_value_slot sparse_binary_vector = sparse_non_value_slot
sparse_vector = sparse_value_slot sparse_float_vector = sparse_value_slot
integer_value = index_slot integer_value = index_slot
# dense_array can be used for variable-length input feature. # dense_array can be used for variable-length input feature.
...@@ -216,7 +216,7 @@ def sparse_binary_vector_sub_sequence(dim): ...@@ -216,7 +216,7 @@ def sparse_binary_vector_sub_sequence(dim):
return sparse_binary_vector(dim, seq_type=SequenceType.SUB_SEQUENCE) return sparse_binary_vector(dim, seq_type=SequenceType.SUB_SEQUENCE)
def sparse_vector_sequence(dim): def sparse_float_vector_sequence(dim):
""" """
Data type of a sequence of sparse vector, which most elements are zero, Data type of a sequence of sparse vector, which most elements are zero,
others could be any float value. others could be any float value.
...@@ -226,11 +226,11 @@ def sparse_vector_sequence(dim): ...@@ -226,11 +226,11 @@ def sparse_vector_sequence(dim):
:return: An input type object :return: An input type object
:rtype: InputType :rtype: InputType
""" """
return sparse_vector(dim, seq_type=SequenceType.SEQUENCE) return sparse_float_vector(dim, seq_type=SequenceType.SEQUENCE)
def sparse_vector_sub_sequence(dim): def sparse_float_vector_sub_sequence(dim):
return sparse_vector(dim, seq_type=SequenceType.SUB_SEQUENCE) return sparse_float_vector(dim, seq_type=SequenceType.SUB_SEQUENCE)
def integer_value_sequence(value_range): def integer_value_sequence(value_range):
......
import paddle.v2.framework.core as core
from paddle.v2.framework.framework import Block, Program
g_scope = core.Scope()
class Executor(object):
def __init__(self, places):
if not isinstance(places, list) and not isinstance(places, tuple):
places = [places]
act_places = []
for each in places:
p = core.Place()
p.set_place(each)
act_places.append(p)
self.executor = core.Executor(act_places)
def run(self,
program,
feed,
fetch_list,
feed_var_name='feed',
fetch_var_name='fetch',
scope=None):
if not isinstance(program, Program):
raise TypeError()
if scope is None:
scope = g_scope
program = program.clone()
global_block = program.global_block()
feed_var = global_block.create_var(
name=feed_var_name,
type=core.VarDesc.VarType.FEED_MINIBATCH,
persistable=True)
for i, name in enumerate(feed):
out = global_block.var(name)
global_block.prepend_op(
'feed',
inputs={'X': [feed_var]},
outputs={'Out': [out]},
attrs={'col': i})
core.set_feed_variable(scope, feed[name], feed_var.name, i)
fetch_var = global_block.create_var(
name=fetch_var_name,
type=core.VarDesc.VarType.FETCH_LIST,
persistable=True)
for i, var in enumerate(fetch_list):
global_block.append_op(
type='fetch',
inputs={'X': [var]},
outputs={'Out': [fetch_var]},
attrs={'col': i})
self.executor.run(program.desc, scope, 0)
return [
core.get_fetch_variable(scope, fetch_var_name, i)
for i in xrange(len(fetch_list))
]
...@@ -15,7 +15,7 @@ class Variable(object): ...@@ -15,7 +15,7 @@ class Variable(object):
shape=None, shape=None,
dtype=None, dtype=None,
lod_level=None, lod_level=None,
persistable=False, persistable=None,
**kwargs): **kwargs):
self.block = block self.block = block
...@@ -256,6 +256,7 @@ class Operator(object): ...@@ -256,6 +256,7 @@ class Operator(object):
self.desc.set_block_attr(attr_name, attrs[attr_name].desc) self.desc.set_block_attr(attr_name, attrs[attr_name].desc)
self.desc.check_attrs() self.desc.check_attrs()
if type not in {'feed', 'fetch'}:
self.desc.infer_shape(self.block.desc) self.desc.infer_shape(self.block.desc)
def __str__(self): def __str__(self):
...@@ -323,9 +324,12 @@ class Block(object): ...@@ -323,9 +324,12 @@ class Block(object):
return self.desc.id return self.desc.id
def var(self, name): def var(self, name):
if name not in self.vars: if not isinstance(name, basestring):
raise TypeError()
v = self.vars.get(name, None)
if v is None:
raise ValueError("var %s not in this block" % name) raise ValueError("var %s not in this block" % name)
return self.vars[name] return v
def all_parameters(self): def all_parameters(self):
return {v for k, v in self.vars.iteritems() if isinstance(v, Parameter)} return {v for k, v in self.vars.iteritems() if isinstance(v, Parameter)}
...@@ -339,6 +343,8 @@ class Block(object): ...@@ -339,6 +343,8 @@ class Block(object):
def create_parameter(self, *args, **kwargs): def create_parameter(self, *args, **kwargs):
global_block = self.program.global_block() global_block = self.program.global_block()
param = Parameter(global_block, *args, **kwargs) param = Parameter(global_block, *args, **kwargs)
if 'init_attr' in kwargs:
self._prepend_initialize_ops_(param, kwargs['init_attr'])
return param return param
def append_op(self, *args, **kwargs): def append_op(self, *args, **kwargs):
...@@ -397,6 +403,17 @@ class Block(object): ...@@ -397,6 +403,17 @@ class Block(object):
for index in range(len(self.ops)): for index in range(len(self.ops)):
assert self.ops[index].desc == ops_in_cpp[index] assert self.ops[index].desc == ops_in_cpp[index]
def _prepend_initialize_ops_(self, param, init_attr):
op_type = init_attr['type']
init_attr['shape'] = param.shape
init_attr['data_type'] = int(param.data_type)
op = self.prepend_op(
type=op_type,
inputs=None,
outputs={'Out': [param]},
attrs=init_attr)
param.op = op
class Program(object): class Program(object):
def __init__(self): def __init__(self):
...@@ -428,11 +445,13 @@ class Program(object): ...@@ -428,11 +445,13 @@ class Program(object):
def current_block(self): def current_block(self):
return self.blocks[self.current_block_idx] return self.blocks[self.current_block_idx]
def append_backward(self, target, no_grad_set): def append_backward(self, target, no_grad_set=None):
""" """
return map(param_name -> (grad_name, block_index, op_index)) return map(param_name -> (grad_name, block_index, op_index))
""" """
assert isinstance(target, Variable) assert isinstance(target, Variable)
if no_grad_set is None:
no_grad_set = set()
param_to_grad_info = self.desc.append_backward(target.desc, no_grad_set) param_to_grad_info = self.desc.append_backward(target.desc, no_grad_set)
self.sync_with_cpp() self.sync_with_cpp()
return param_to_grad_info return param_to_grad_info
...@@ -469,27 +488,10 @@ class Parameter(Variable): ...@@ -469,27 +488,10 @@ class Parameter(Variable):
Variable.__init__( Variable.__init__(
self, block, persistable=True, shape=shape, dtype=dtype, **kwargs) self, block, persistable=True, shape=shape, dtype=dtype, **kwargs)
self.trainable = kwargs.get('trainable', True) self.trainable = kwargs.get('trainable', True)
self.init_attr = kwargs.get('initialize_attr', {
'type': 'uniform_random',
'min': -1.0,
'max': 1.0
})
self.optimize_attr = kwargs.get('optimize_attr', {'learning_rate': 1.0}) self.optimize_attr = kwargs.get('optimize_attr', {'learning_rate': 1.0})
self._append_initialize_ops_()
def _append_initialize_ops_(self):
attr = self.init_attr
op_type = attr.pop('type', None)
block = self.block
assert isinstance(block, Block)
shape = self.shape
attr['dims'] = shape
attr['data_type'] = int(self.data_type)
op = block.prepend_op(
type=op_type, inputs=None, outputs={'Out': [self]}, attrs=attr)
self.op = op
# program is a global instance. # program is a global instance.
g_program = Program() g_program = Program()
g_init_program = Program()
from paddle.v2.framework.framework import Variable, OpProtoHolder, g_program from paddle.v2.framework.framework import Variable, OpProtoHolder, g_program, g_init_program
import paddle.v2.framework.core as core import paddle.v2.framework.core as core
import copy import copy
import itertools import itertools
...@@ -29,6 +29,14 @@ class LayerHelper(object): ...@@ -29,6 +29,14 @@ class LayerHelper(object):
else: else:
return prog return prog
@property
def init_program(self):
prog = self.kwargs.get('init_program', None)
if prog is None:
return g_init_program
else:
return prog
def append_op(self, *args, **kwargs): def append_op(self, *args, **kwargs):
return self.program.current_block().append_op(*args, **kwargs) return self.program.current_block().append_op(*args, **kwargs)
...@@ -66,16 +74,14 @@ class LayerHelper(object): ...@@ -66,16 +74,14 @@ class LayerHelper(object):
actual = self.kwargs.get('param_attr', None) actual = self.kwargs.get('param_attr', None)
return actual if actual is not None else default return actual if actual is not None else default
def bias_attr(self, shape, dtype): def bias_attr(self):
bias_attr = self.kwargs.get('bias_attr', None) bias_attr = self.kwargs.get('bias_attr', None)
if bias_attr is True: if bias_attr is True:
bias_attr = { bias_attr = {
'name': None, 'name': None,
'init_attr': { 'init_attr': {
'type': 'fill_constant', 'type': 'fill_constant',
'value': 0.0, 'value': 0.0
'shape': shape,
'dataType': dtype
} }
} }
return bias_attr return bias_attr
...@@ -113,22 +119,27 @@ class LayerHelper(object): ...@@ -113,22 +119,27 @@ class LayerHelper(object):
def create_parameter(self, attr, shape, dtype, suffix='w'): def create_parameter(self, attr, shape, dtype, suffix='w'):
if attr['name'] is None: if attr['name'] is None:
attr['name'] = unique_name(".".join([self.name, suffix])) attr['name'] = unique_name(".".join([self.name, suffix]))
return self.program.global_block().create_parameter( self.init_program.global_block().create_parameter(
name=attr['name'], name=attr['name'],
dtype=dtype, dtype=dtype,
shape=shape, shape=shape,
initialize_attr=attr['init_attr']) init_attr=attr['init_attr'])
return self.program.global_block().create_parameter(
name=attr['name'], dtype=dtype, shape=shape)
def create_tmp_variable(self, dtype): def create_tmp_variable(self, dtype):
return self.program.current_block().create_var( return self.program.current_block().create_var(
name=unique_name(".".join([self.name, 'tmp'])), dtype=dtype) name=unique_name(".".join([self.name, 'tmp'])),
dtype=dtype,
persistable=False)
def create_global_variable(self, *args, **kwargs): def create_global_variable(self, *args, **kwargs):
return self.program.global_block().create_var(*args, **kwargs) return self.program.global_block().create_var(
*args, persistable=False, **kwargs)
def append_bias_op(self, input_var): def append_bias_op(self, input_var):
size = list(input_var.shape[1:]) size = list(input_var.shape[1:])
bias_attr = self.bias_attr(size, dtype=input_var.data_type) bias_attr = self.bias_attr()
if not bias_attr: if not bias_attr:
return input_var return input_var
......
...@@ -3,7 +3,7 @@ import paddle.v2.framework.core as core ...@@ -3,7 +3,7 @@ import paddle.v2.framework.core as core
from paddle.v2.framework.framework import OpProtoHolder, Variable from paddle.v2.framework.framework import OpProtoHolder, Variable
import re import re
__all__ = ['fc', 'data', 'cross_entropy', 'conv2d'] __all__ = ['fc', 'data', 'cross_entropy', 'conv2d', 'pool2d']
def fc(input, def fc(input,
...@@ -13,7 +13,8 @@ def fc(input, ...@@ -13,7 +13,8 @@ def fc(input,
name=None, name=None,
act=None, act=None,
num_flatten_dims=1, num_flatten_dims=1,
program=None): program=None,
init_program=None):
# create helper # create helper
helper = LayerHelper('fc', **locals()) helper = LayerHelper('fc', **locals())
...@@ -35,7 +36,10 @@ def fc(input, ...@@ -35,7 +36,10 @@ def fc(input,
"Y": w, "Y": w,
}, },
outputs={"Out": tmp}, outputs={"Out": tmp},
attrs={'x_num_col_dims': num_flatten_dims}) attrs={
'x_num_col_dims': num_flatten_dims,
'y_num_col_dims': len(input_shape) - num_flatten_dims
})
mul_results.append(tmp) mul_results.append(tmp)
# sum # sum
...@@ -55,8 +59,11 @@ def data(name, ...@@ -55,8 +59,11 @@ def data(name,
shape, shape,
data_type='float32', data_type='float32',
type=core.VarDesc.VarType.LOD_TENSOR, type=core.VarDesc.VarType.LOD_TENSOR,
program=None): append_batch_size=True,
program=None,
init_program=None):
helper = LayerHelper('data', **locals()) helper = LayerHelper('data', **locals())
if append_batch_size:
shape = [-1] + shape # append batch size as -1 shape = [-1] + shape # append batch size as -1
return helper.create_global_variable( return helper.create_global_variable(
name=name, shape=shape, dtype=data_type, type=type) name=name, shape=shape, dtype=data_type, type=type)
...@@ -112,7 +119,7 @@ def _create_op_func_(op_type): ...@@ -112,7 +119,7 @@ def _create_op_func_(op_type):
_create_op_func_('mean') _create_op_func_('mean')
_create_op_func_('pool2d') _create_op_func_('mul')
def cross_entropy(input, label, **kwargs): def cross_entropy(input, label, **kwargs):
...@@ -155,7 +162,8 @@ def conv2d(input, ...@@ -155,7 +162,8 @@ def conv2d(input,
padding=None, padding=None,
bias_attr=None, bias_attr=None,
param_attr=None, param_attr=None,
program=None): program=None,
init_program=None):
helper = LayerHelper('conv2d', **locals()) helper = LayerHelper('conv2d', **locals())
dtype = helper.input_dtype() dtype = helper.input_dtype()
...@@ -167,6 +175,13 @@ def conv2d(input, ...@@ -167,6 +175,13 @@ def conv2d(input,
raise ValueError("num_channels must be divisible by groups.") raise ValueError("num_channels must be divisible by groups.")
num_filter_channels = num_channels / groups num_filter_channels = num_channels / groups
if isinstance(filter_size, int):
filter_size = [filter_size, filter_size]
if isinstance(stride, int):
stride = [stride, stride]
if isinstance(padding, int):
padding = [padding, padding]
input_shape = input.shape input_shape = input.shape
filter_shape = [num_filters, num_filter_channels] + filter_size filter_shape = [num_filters, num_filter_channels] + filter_size
filter = helper.create_parameter( filter = helper.create_parameter(
...@@ -187,3 +202,41 @@ def conv2d(input, ...@@ -187,3 +202,41 @@ def conv2d(input,
pre_act = helper.append_bias_op(pre_bias) pre_act = helper.append_bias_op(pre_bias)
return helper.append_activation(pre_act) return helper.append_activation(pre_act)
def pool2d(input,
pool_size,
pool_type,
pool_stride=[1, 1],
pool_padding=[0, 0],
global_pooling=False,
program=None,
init_program=None):
if pool_type not in ["max", "avg"]:
raise ValueError(
"Unknown pool_type: '%s'. It can only be 'max' or 'avg'.",
str(pool_type))
if isinstance(pool_size, int):
pool_size = [pool_size, pool_size]
if isinstance(pool_stride, int):
pool_stride = [pool_stride, pool_stride]
if isinstance(pool_padding, int):
pool_padding = [pool_padding, pool_padding]
helper = LayerHelper('conv2d', **locals())
dtype = helper.input_dtype()
pool_out = helper.create_tmp_variable(dtype)
helper.append_op(
type="pool2d",
inputs={"X": input},
outputs={"Out": pool_out},
attrs={
"pooling_type": pool_type,
"ksize": pool_size,
"global_pooling": global_pooling,
"strides": pool_stride,
"paddings": pool_padding
})
return pool_out
import paddle.v2.framework.layers as layers
def simple_img_conv_pool(input,
filter_size,
num_filters,
pool_size,
pool_stride,
act,
program=None,
init_program=None):
conv_out = layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
act=act,
program=program,
init_program=init_program)
pool_out = layers.pool2d(
input=conv_out,
pool_size=pool_size,
pool_type='max',
pool_stride=pool_stride,
program=program,
init_program=init_program)
return pool_out
import paddle.v2.framework.framework as framework import paddle.v2.framework.framework as framework
from collections import defaultdict
__all__ = ['SGDOptimizer'] __all__ = ['SGDOptimizer', 'MomentumOptimizer', 'AdagradOptimizer']
class Optimizer(object): class Optimizer(object):
"""Optimizer Base class. """Optimizer Base class.
Define the common interface of an optimizer. Define the common interface of an optimizer.
User should not use this class directly, but need to use one of it's implementation. User should not use this class directly,
but need to use one of it's implementation.
""" """
def __init__(self): def __init__(self):
pass # Dictionary of accumulators. Some optimizer subclasses need to
# allocate and manage extra variables associated with the parameters
# to train. These variables are called accumulators.
# {accum_name : { paramter_name : accumulator_for_parameter, ...}, ...}
self._accumulators = defaultdict(lambda: dict())
def _append_optimize_op(self, block, param_and_grad): def _append_optimize_op(self, block, param_and_grad):
""" append optimize operator to block and return all the added optimize_op """ append optimize operator to block and return all the added optimize_op
""" """
raise NotImplementedError() raise NotImplementedError()
def create_backward_pass(self, loss, parameter_list=None, no_grad_set=None): def _initialize_tensors(self, block):
"""Create all necessary tensors, that will be shared for all parameter updates.
Tensors like learning rate should be initialized here.
Args:
block: the block in which the loss variable is present
"""
pass
def _create_accumulators(self, block, parameters):
"""Create all accumulators needed by the parameters
Args:
block: the block in which the loss variable is present
parameters: list of parameter variables for the optimizer
""" """
create and add gradient Operators in BlockDesc to Compute gradients of `loss` pass
for parameters in parameter_list
def _add_accumulator(self, block, name, param, dtype=None, fill_value=0.0):
"""Utility function to add an accumulator for a parameter
Args:
block: the block in which the loss variable is present
name: name of the accumulator
param: parameter variable for which accumulator is to be added
dtype: data type of the accumulator variable
fill_value: value to initialize the accumulator variable
"""
if (name in self._accumulators and
param.name in self._accumulators[name]):
raise Exception("Accumulator {} already exists for parmeter {}".
format(name, param.name))
global_block = block.program.global_block()
param_shape = list(param.shape)
param_acc = global_block.create_var(
dtype=dtype, shape=param_shape, lod_level=0)
# Initialize the accumulator with fill_value
# FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
global_block.append_op(
type="fill_constant",
outputs={"Out": param_acc},
attrs={"shape": param_shape,
"value": fill_value})
# Add to accumulators dict
self._accumulators[name][param.name] = param_acc
def _get_accumulator(self, name, param):
"""Utility function to fetch an accumulator for a parameter
Args:
name: name of the accumulator
param: parameter variable for which accumulator is to be fetched
Returns:
accumulator variable for the parameter
"""
if (name not in self._accumulators or
param.name not in self._accumulators[name]):
raise Exception("Accumulator {} does not exist for parameter {}".
format(name, param.name))
return self._accumulators[name][param.name]
def create_backward_pass(self, loss, parameter_list=None, no_grad_set=None):
"""Create and add gradient Operators in BlockDesc to compute
gradients of `loss` for parameters in parameter_list
Args: Args:
loss: an variable generated by cost function. loss: an variable generated by cost function.
no_grad_set: variable that should not create gradient no_grad_set: variable that should not create gradient
parameter_list: parameters that need to compute gradient and update to optimize the lost. parameter_list: parameters that need to compute gradient and
update to optimize the lost.
Returns: Returns:
list of (parameters, gradients) pair. list of (parameters, gradients) pair.
...@@ -48,7 +120,8 @@ class Optimizer(object): ...@@ -48,7 +120,8 @@ class Optimizer(object):
if not grad_block.has_var(grad_info[0]): if not grad_block.has_var(grad_info[0]):
raise Exception("grad block[%d] did not have grad var %s" % raise Exception("grad block[%d] did not have grad var %s" %
grad_info[1], grad_info[0]) grad_info[1], grad_info[0])
param_var = loss.block.var(param) # Get the param var from the global block
param_var = loss.block.program.global_block().var(param)
grad_var = grad_block.var(grad_info[0]) grad_var = grad_block.var(grad_info[0])
if loss.block.has_var(grad_info[0]): if loss.block.has_var(grad_info[0]):
params_and_grads.append((param_var, grad_var)) params_and_grads.append((param_var, grad_var))
...@@ -64,14 +137,29 @@ class Optimizer(object): ...@@ -64,14 +137,29 @@ class Optimizer(object):
parameters_and_grads: a list of (variable, gradient) pair to update. parameters_and_grads: a list of (variable, gradient) pair to update.
Returns: Returns:
optmization_op_list: a list of optimization operator that will update parameter using gradient. optmization_op_list: a list of optimization operator that will update
parameter using gradient.
""" """
# This is a default implementation of create_optimization_pass that
# can be shared by most optimizers. This implementation assumes that
# the subclass will implement the _append_optimize_op method and the
# _initialize_tensors method. The subclass can extend the
# _create_accumulators method if it needs to create accumulators
# for parameters.
# Create any accumulators
self._create_accumulators(loss.block,
[p[0] for p in parameters_and_grads])
# Create any necessary tensors
self._initialize_tensors(loss.block)
optimize_ops = [] optimize_ops = []
for param_and_grad in parameters_and_grads: for param_and_grad in parameters_and_grads:
if param_and_grad[1] is not None: if param_and_grad[1] is not None:
optimize_op = self._append_optimize_op(loss.block, optimize_op = self._append_optimize_op(loss.block,
param_and_grad) param_and_grad)
optimize_ops.append(optimize_op) optimize_ops.append(optimize_op)
return optimize_ops return optimize_ops
def minimize(self, loss, parameter_list=None, no_grad_set=None): def minimize(self, loss, parameter_list=None, no_grad_set=None):
...@@ -92,33 +180,152 @@ class SGDOptimizer(Optimizer): ...@@ -92,33 +180,152 @@ class SGDOptimizer(Optimizer):
def __init__(self, learning_rate): def __init__(self, learning_rate):
assert learning_rate is not None assert learning_rate is not None
super(Optimizer, self).__init__() super(SGDOptimizer, self).__init__()
self.type = "sgd" self.type = "sgd"
self._learning_rate = learning_rate self._learning_rate = learning_rate
def _append_optimize_op(self, block, param_and_grad): def _initialize_tensors(self, block):
assert isinstance(block, framework.Block) assert isinstance(block, framework.Block)
lr_shape = [1] lr_shape = [1]
# create a var for learning_rate # create a variable for learning_rate
lr = block.create_var(dtype="float32", shape=lr_shape, lod_level=0) self._lr = block.create_var(
dtype="float32", shape=lr_shape, lod_level=0)
# create an op to init the learning_rate # create an op to init the learning_rate
init_op = block.append_op( # FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
block.append_op(
type="fill_constant", type="fill_constant",
outputs={"Out": lr}, outputs={"Out": self._lr},
attrs={"shape": lr_shape, attrs={"shape": lr_shape,
"value": self._learning_rate}) "value": self._learning_rate})
def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block)
# create the optimize op # create the optimize op
sgd_op = block.append_op( sgd_op = block.append_op(
type=self.type, type=self.type,
inputs={ inputs={
"Param": param_and_grad[0], "Param": param_and_grad[0],
"Grad": param_and_grad[1], "Grad": param_and_grad[1],
"LearningRate": lr "LearningRate": self._lr
}, },
outputs={"ParamOut": param_and_grad[0]}, outputs={"ParamOut": param_and_grad[0]})
attrs={"shape": [1],
"value": self._learning_rate})
return sgd_op return sgd_op
class MomentumOptimizer(Optimizer):
"""Simple Momentum optimizer with velocity state
"""
_velocity_acc_str = "velocity"
def __init__(self, learning_rate, momentum):
assert learning_rate is not None
assert momentum is not None
super(MomentumOptimizer, self).__init__()
self.type = "momentum"
self._learning_rate = learning_rate
self._momentum = momentum
def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1]
# create a variable for learning_rate
self._lr = block.create_var(
dtype="float32", shape=lr_shape, lod_level=0)
# create an op to init the learning_rate
# FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
block.append_op(
type="fill_constant",
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block)
for p in parameters:
self._add_accumulator(block, self._velocity_acc_str, p, 'float32')
def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block)
velocity_acc = self._get_accumulator(self._velocity_acc_str,
param_and_grad[0])
# create the momentum optimize op
momentum_op = block.append_op(
type=self.type,
inputs={
"Param": param_and_grad[0],
"Grad": param_and_grad[1],
"Velocity": velocity_acc,
"LearningRate": self._lr
},
outputs={
"ParamOut": param_and_grad[0],
"VelocityOut": velocity_acc
},
attrs={"mu": self._momentum})
return momentum_op
class AdagradOptimizer(Optimizer):
"""Simple Adagrad optimizer with moment state
"""
_moment_acc_str = "moment"
def __init__(self, learning_rate, epsilon=1.0e-6):
assert learning_rate is not None
assert epsilon is not None
super(AdagradOptimizer, self).__init__()
self.type = "adagrad"
self._learning_rate = learning_rate
self._epsilon = epsilon
def _initialize_tensors(self, block):
assert isinstance(block, framework.Block)
lr_shape = [1]
# create a variable for learning_rate
self._lr = block.create_var(
dtype="float32", shape=lr_shape, lod_level=0)
# create an op to init the learning_rate
# FIXME: Fix when Initialization design has been implemented
# https://github.com/PaddlePaddle/Paddle/pull/4852
block.append_op(
type="fill_constant",
outputs={"Out": self._lr},
attrs={"shape": lr_shape,
"value": self._learning_rate})
def _create_accumulators(self, block, parameters):
assert isinstance(block, framework.Block)
for p in parameters:
self._add_accumulator(block, self._moment_acc_str, p, 'float32')
def _append_optimize_op(self, block, param_and_grad):
assert isinstance(block, framework.Block)
moment_acc = self._get_accumulator(self._moment_acc_str,
param_and_grad[0])
# create the adagrad optimizer op
adagrad_op = block.append_op(
type=self.type,
inputs={
"Param": param_and_grad[0],
"Grad": param_and_grad[1],
"Moment": moment_acc,
"LearningRate": self._lr
},
outputs={"ParamOut": param_and_grad[0],
"MomentOut": moment_acc},
attrs={"epsilon": self._epsilon})
return adagrad_op
...@@ -33,14 +33,12 @@ class TestAdamOp1(OpTest): ...@@ -33,14 +33,12 @@ class TestAdamOp1(OpTest):
self.attrs = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2} self.attrs = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2}
param_out, moment1_out, moment2_out, beta1_pow_out, \ param_out, moment1_out, \
beta2_pow_out = adam_step(self.inputs, self.attrs) moment2_out = adam_step(self.inputs, self.attrs)
self.outputs = { self.outputs = {
'Moment1Out': moment1_out, 'Moment1Out': moment1_out,
'Moment2Out': moment2_out, 'Moment2Out': moment2_out,
'Beta1PowOut': beta1_pow_out,
'Beta2PowOut': beta2_pow_out,
'ParamOut': param_out 'ParamOut': param_out
} }
...@@ -78,14 +76,12 @@ class TestAdamOp2(OpTest): ...@@ -78,14 +76,12 @@ class TestAdamOp2(OpTest):
attributes = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2} attributes = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2}
param_out, moment1_out, moment2_out, beta1_pow_out, \ param_out, moment1_out, \
beta2_pow_out = adam_step(self.inputs, attributes) moment2_out = adam_step(self.inputs, attributes)
self.outputs = { self.outputs = {
'Moment1Out': moment1_out, 'Moment1Out': moment1_out,
'Moment2Out': moment2_out, 'Moment2Out': moment2_out,
'Beta1PowOut': beta1_pow_out,
'Beta2PowOut': beta2_pow_out,
'ParamOut': param_out 'ParamOut': param_out
} }
...@@ -127,14 +123,12 @@ class TestAdamOpMultipleSteps(OpTest): ...@@ -127,14 +123,12 @@ class TestAdamOpMultipleSteps(OpTest):
def test_check_output(self): def test_check_output(self):
for _ in range(self.num_steps): for _ in range(self.num_steps):
param_out, moment1_out, moment2_out, beta1_pow_out, \ param_out, moment1_out, \
beta2_pow_out = adam_step(self.inputs, self.attrs) moment2_out = adam_step(self.inputs, self.attrs)
self.outputs = { self.outputs = {
'Moment1Out': moment1_out, 'Moment1Out': moment1_out,
'Moment2Out': moment2_out, 'Moment2Out': moment2_out,
'Beta1PowOut': beta1_pow_out,
'Beta2PowOut': beta2_pow_out,
'ParamOut': param_out 'ParamOut': param_out
} }
...@@ -145,8 +139,10 @@ class TestAdamOpMultipleSteps(OpTest): ...@@ -145,8 +139,10 @@ class TestAdamOpMultipleSteps(OpTest):
self.inputs['Param'] = param_out self.inputs['Param'] = param_out
self.inputs['Moment1'] = moment1_out self.inputs['Moment1'] = moment1_out
self.inputs['Moment2'] = moment2_out self.inputs['Moment2'] = moment2_out
self.inputs['Beta1Pow'] = beta1_pow_out
self.inputs['Beta2Pow'] = beta2_pow_out # Update powers of Beta1 and Beta2 for next time step
self.inputs['Beta1Pow'] *= self.attrs['beta1']
self.inputs['Beta2Pow'] *= self.attrs['beta1']
# Randomize gradient for next step # Randomize gradient for next step
self.inputs['Grad'] = np.random.uniform( self.inputs['Grad'] = np.random.uniform(
...@@ -175,11 +171,9 @@ def adam_step(inputs, attributes): ...@@ -175,11 +171,9 @@ def adam_step(inputs, attributes):
moment1_out = beta1 * moment1 + (1 - beta1) * grad moment1_out = beta1 * moment1 + (1 - beta1) * grad
moment2_out = beta2 * moment2 + (1 - beta2) * np.square(grad) moment2_out = beta2 * moment2 + (1 - beta2) * np.square(grad)
beta1_pow_out = beta1_pow * beta1 lr_t = lr * np.sqrt(1 - beta2_pow) / (1 - beta1_pow)
beta2_pow_out = beta2_pow * beta2
lr_t = lr * np.sqrt(1 - beta2_pow_out) / (1 - beta1_pow_out)
param_out = param - lr_t * (moment1_out / (np.sqrt(moment2_out) + epsilon)) param_out = param - lr_t * (moment1_out / (np.sqrt(moment2_out) + epsilon))
return param_out, moment1_out, moment2_out, beta1_pow_out, beta2_pow_out return param_out, moment1_out, moment2_out
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -31,14 +31,13 @@ class TestAdamaxOp1(OpTest): ...@@ -31,14 +31,13 @@ class TestAdamaxOp1(OpTest):
self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon} self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon}
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step( param_out, moment_out, inf_norm_out = adamax_step(self.inputs,
self.inputs, self.attrs) self.attrs)
self.outputs = { self.outputs = {
'ParamOut': param_out, 'ParamOut': param_out,
'MomentOut': moment_out, 'MomentOut': moment_out,
'InfNormOut': inf_norm_out, 'InfNormOut': inf_norm_out
'Beta1PowOut': beta1_pow_out
} }
def test_check_output(self): def test_check_output(self):
...@@ -73,14 +72,12 @@ class TestAdamaxOp2(OpTest): ...@@ -73,14 +72,12 @@ class TestAdamaxOp2(OpTest):
} }
attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon} attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon}
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step( param_out, moment_out, inf_norm_out = adamax_step(self.inputs, attrs)
self.inputs, attrs)
self.outputs = { self.outputs = {
'ParamOut': param_out, 'ParamOut': param_out,
'MomentOut': moment_out, 'MomentOut': moment_out,
'InfNormOut': inf_norm_out, 'InfNormOut': inf_norm_out
'Beta1PowOut': beta1_pow_out
} }
def test_check_output(self): def test_check_output(self):
...@@ -117,19 +114,15 @@ class TestAdamaxOpMultipleSteps(OpTest): ...@@ -117,19 +114,15 @@ class TestAdamaxOpMultipleSteps(OpTest):
self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon} self.attrs = {'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon}
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step(
self.inputs, self.attrs)
def test_check_output(self): def test_check_output(self):
for _ in range(self.num_steps): for _ in range(self.num_steps):
param_out, moment_out, inf_norm_out, beta1_pow_out = adamax_step( param_out, moment_out, inf_norm_out = adamax_step(self.inputs,
self.inputs, self.attrs) self.attrs)
self.outputs = { self.outputs = {
'ParamOut': param_out, 'ParamOut': param_out,
'MomentOut': moment_out, 'MomentOut': moment_out,
'InfNormOut': inf_norm_out, 'InfNormOut': inf_norm_out
'Beta1PowOut': beta1_pow_out
} }
# Verify output for this step # Verify output for this step
...@@ -139,7 +132,9 @@ class TestAdamaxOpMultipleSteps(OpTest): ...@@ -139,7 +132,9 @@ class TestAdamaxOpMultipleSteps(OpTest):
self.inputs['Param'] = param_out self.inputs['Param'] = param_out
self.inputs['Moment'] = moment_out self.inputs['Moment'] = moment_out
self.inputs['InfNorm'] = inf_norm_out self.inputs['InfNorm'] = inf_norm_out
self.inputs['Beta1Pow'] = beta1_pow_out
# Update Beta1 Power accumulator for next step
self.inputs['Beta1Pow'] *= self.attrs['beta1']
# Randomize gradient for next step # Randomize gradient for next step
self.inputs['Grad'] = np.random.uniform( self.inputs['Grad'] = np.random.uniform(
...@@ -167,11 +162,10 @@ def adamax_step(inputs, attributes): ...@@ -167,11 +162,10 @@ def adamax_step(inputs, attributes):
moment_out = beta1 * moment + (1 - beta1) * grad moment_out = beta1 * moment + (1 - beta1) * grad
inf_norm_out = np.maximum(beta2 * inf_norm + epsilon, np.abs(grad)) inf_norm_out = np.maximum(beta2 * inf_norm + epsilon, np.abs(grad))
beta1_pow_out = beta1_pow * beta1 lr_t = (lr / (1 - beta1_pow))
lr_t = (lr / (1 - beta1_pow_out))
param_out = param - lr_t * np.divide(moment_out, inf_norm_out) param_out = param - lr_t * np.divide(moment_out, inf_norm_out)
return param_out, moment_out, inf_norm_out, beta1_pow_out return param_out, moment_out, inf_norm_out
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -21,7 +21,7 @@ class TestCrossEntropyOp1(OpTest): ...@@ -21,7 +21,7 @@ class TestCrossEntropyOp1(OpTest):
self.inputs = {"X": X, "Label": label} self.inputs = {"X": X, "Label": label}
self.outputs = {"Y": cross_entropy} self.outputs = {"Y": cross_entropy}
self.attrs = {"softLabel": False} self.attrs = {"soft_label": False}
def test_check_output(self): def test_check_output(self):
self.check_output() self.check_output()
......
...@@ -4,6 +4,12 @@ import unittest ...@@ -4,6 +4,12 @@ import unittest
from paddle.v2.framework.op import Operator, DynamicRecurrentOp from paddle.v2.framework.op import Operator, DynamicRecurrentOp
import numpy as np import numpy as np
# for siplicity, just one level LoD
lod_py = [[0, 4, 7, 9, 10]]
input_dim = 30
num_sents = len(lod_py[0]) - 1
weight_dim = 15
def create_tensor(scope, name, shape, np_data): def create_tensor(scope, name, shape, np_data):
tensor = scope.var(name).get_tensor() tensor = scope.var(name).get_tensor()
...@@ -12,6 +18,17 @@ def create_tensor(scope, name, shape, np_data): ...@@ -12,6 +18,17 @@ def create_tensor(scope, name, shape, np_data):
return tensor return tensor
class PyRNNStep(object):
def __init__(self):
self.x = np.random.normal(size=(lod_py[0][-1],
input_dim)).astype("float32")
self.W = np.random.normal(size=(input_dim, input_dim)).astype("float32")
self.U = np.random.normal(size=(input_dim, input_dim)).astype("float32")
self.h_boot = np.random.normal(size=(num_sents,
input_dim)).astype("float32")
class DynamicRecurrentOpTest(unittest.TestCase): class DynamicRecurrentOpTest(unittest.TestCase):
''' '''
Test RNNOp Test RNNOp
...@@ -23,17 +40,13 @@ class DynamicRecurrentOpTest(unittest.TestCase): ...@@ -23,17 +40,13 @@ class DynamicRecurrentOpTest(unittest.TestCase):
- U - U
vars: vars:
- x - x
memories: states:
- h - h
outputs: outputs:
- h - h
''' '''
# for siplicity, just one level LoD py = PyRNNStep()
lod_py = [[0, 4, 7, 9, 10]]
input_dim = 30
num_sents = len(lod_py[0]) - 1
weight_dim = 15
def forward(self): def forward(self):
self.scope = core.Scope() self.scope = core.Scope()
...@@ -42,64 +55,55 @@ class DynamicRecurrentOpTest(unittest.TestCase): ...@@ -42,64 +55,55 @@ class DynamicRecurrentOpTest(unittest.TestCase):
self.create_step_net() self.create_step_net()
ctx = core.DeviceContext.create(core.CPUPlace()) ctx = core.DeviceContext.create(core.CPUPlace())
self.rnnop.run(self.scope, ctx) self.rnnop.run(self.scope, ctx)
state = self.rnnop.get_state("h@mem") state = self.rnnop.get_state("h@state")
print 'state size: ', state.size() print 'state size: ', state.size()
step_inputs = self.rnnop.get_step_input("x") step_inputs = self.rnnop.get_step_input("x")
print "x size ", step_inputs.size() print "x size ", step_inputs.size()
for i in range(step_inputs.size()): for i in range(step_inputs.size()):
print "x %d" % i, np.array(step_inputs.read(i).get_dims()) print "x %d" % i, np.array(step_inputs.read(i).get_dims())
step_outputs = self.rnnop.get_step_output('h@mem') step_outputs = self.rnnop.get_step_output('h@state')
print 'step_outputs.size ', step_outputs.size() print 'step_outputs.size ', step_outputs.size()
output = self.scope.find_var("h@mem").get_tensor() output = self.scope.find_var("h@state").get_tensor()
print 'output', np.array(output).shape print 'output', np.array(output).shape
def create_global_variables(self): def create_global_variables(self):
x = np.random.normal(size=(self.lod_py[0][-1],
self.input_dim)).astype("float32")
W = np.random.normal(size=(self.input_dim,
self.input_dim)).astype("float32")
U = np.random.normal(size=(self.input_dim,
self.input_dim)).astype("float32")
h_boot = np.random.normal(size=(self.num_sents,
self.input_dim)).astype("float32")
# create inlink # create inlink
x_tensor = create_tensor(self.scope, "x", x_tensor = create_tensor(self.scope, "x", [num_sents, input_dim],
[self.num_sents, self.input_dim], x) self.py.x)
x_tensor.set_lod(self.lod_py) x_tensor.set_lod(lod_py)
create_tensor(self.scope, "W", [self.input_dim, self.input_dim], W) create_tensor(self.scope, "W", [input_dim, input_dim], self.py.W)
create_tensor(self.scope, "U", [self.input_dim, self.input_dim], U) create_tensor(self.scope, "U", [input_dim, input_dim], self.py.U)
create_tensor(self.scope, "h_boot", [self.num_sents, self.input_dim], create_tensor(self.scope, "h_boot", [num_sents, input_dim],
h_boot) self.py.h_boot)
self.scope.var("step_scopes") self.scope.var("step_scopes")
self.scope.var("h@mem") self.scope.var("h@state")
def create_rnn_op(self): def create_rnn_op(self):
# create RNNOp # create RNNOp
self.rnnop = DynamicRecurrentOp( self.rnnop = DynamicRecurrentOp(
# inputs # inputs
inlinks=["x"], inputs=["x"],
boot_memories=["h_boot"], initial_states=["h_boot"],
step_net="stepnet", step_net="step_unit",
# outputs # outputs
outlinks=["h@mem"], outputs=["h@state"],
step_scopes="step_scopes", step_scopes="step_scopes",
# attributes # attributes
pre_memories=["h@pre"], ex_states=["h@pre"],
memories=["h@mem"]) states=["h@state"])
def create_step_net(self): def create_step_net(self):
stepnet = core.Net.create() step_unit = core.Net.create()
x_fc_op = Operator("mul", X="x", Y="W", Out="Wx") x_fc_op = Operator("mul", X="x", Y="W", Out="Wx")
h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh") h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh")
sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum") sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum")
sig_op = Operator("sigmoid", X="sum", Y="h@mem") sig_op = Operator("sigmoid", X="sum", Y="h@state")
for op in [x_fc_op, h_fc_op, sum_op, sig_op]: for op in [x_fc_op, h_fc_op, sum_op, sig_op]:
stepnet.append_op(op) step_unit.append_op(op)
stepnet.complete_add_op(True) step_unit.complete_add_op(True)
self.rnnop.set_stepnet(stepnet) self.rnnop.set_step_unit(step_unit)
def test_forward(self): def test_forward(self):
print 'test recurrent op forward' print 'test recurrent op forward'
...@@ -107,5 +111,58 @@ class DynamicRecurrentOpTest(unittest.TestCase): ...@@ -107,5 +111,58 @@ class DynamicRecurrentOpTest(unittest.TestCase):
print 'pd_output', pd_output print 'pd_output', pd_output
class RecurrentGradientOpTest(unittest.TestCase):
py = PyRNNStep()
def create_forward_op(self):
# create RNNOp
self.forward_op = DynamicRecurrentOp(
# inputs
inputs=["x"],
initial_states=["h_boot"],
step_net="step_unit",
# outputs
outputs=["h@state"],
step_scopes="step_scopes",
# attributes
ex_states=["h@pre"],
states=["h@state"])
def create_gradient_op(self):
a = set()
backward_op = core.DynamicRecurrentOp.backward(self.forward_op, a)
def create_step_net(self):
step_unit = core.Net.create()
x_fc_op = Operator("mul", X="x", Y="W", Out="Wx")
h_fc_op = Operator("mul", X="h@pre", Y="U", Out="Uh")
sum_op = Operator("sum", X=["Wx", "Uh"], Out="sum")
sig_op = Operator("sigmoid", X="sum", Y="h@state")
for op in [x_fc_op, h_fc_op, sum_op, sig_op]:
step_unit.append_op(op)
step_unit.complete_add_op(True)
self.forward_op.set_step_unit(step_unit)
def create_global_variables(self):
# create inlink
x_tensor = create_tensor(self.scope, "x", [num_sents, input_dim],
self.py.x)
x_tensor.set_lod(lod_py)
create_tensor(self.scope, "W", [input_dim, input_dim], self.py.W)
create_tensor(self.scope, "U", [input_dim, input_dim], self.py.U)
create_tensor(self.scope, "h_boot", [num_sents, input_dim],
self.py.h_boot)
self.scope.var("step_scopes")
self.scope.var("h@state")
def test_grad(self):
self.scope = core.Scope()
self.create_forward_op()
self.create_global_variables()
self.create_step_net()
self.create_gradient_op()
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -92,5 +92,33 @@ class TestElementwiseAddOp_broadcast_3(TestElementwiseOp): ...@@ -92,5 +92,33 @@ class TestElementwiseAddOp_broadcast_3(TestElementwiseOp):
} }
class TestElementwiseAddOp_rowwise_add_0(TestElementwiseOp):
def setUp(self):
self.op_type = "elementwise_add"
self.inputs = {
'X': np.random.rand(2, 3, 4).astype(np.float32),
'Y': np.random.rand(3, 4).astype(np.float32)
}
self.attrs = {'axis': 1}
self.outputs = {
'Out': self.inputs['X'] + self.inputs['Y'].reshape(1, 3, 4)
}
class TestElementwiseAddOp_rowwise_add_1(TestElementwiseOp):
def setUp(self):
self.op_type = "elementwise_add"
self.inputs = {
'X': np.random.rand(2, 1).astype(np.float32),
'Y': np.random.rand(1).astype(np.float32)
}
self.attrs = {'axis': 1}
self.outputs = {
'Out': self.inputs['X'] + self.inputs['Y'].reshape(1, 1)
}
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
import unittest
from paddle.v2.framework.layers import mul, data
import paddle.v2.framework.core as core
from paddle.v2.framework.executor import Executor
from paddle.v2.framework.framework import g_program
import numpy
class TestExecutor(unittest.TestCase):
def test_mul(self):
a = data(name='a', shape=[784], data_type='float32')
b = data(
name='b',
shape=[784, 100],
data_type='float32',
append_batch_size=False)
out = mul(x=a, y=b)
place = core.CPUPlace()
a_np = numpy.random.random((100, 784)).astype('float32')
tensor_a = core.LoDTensor()
tensor_a.set(a_np, place)
b_np = numpy.random.random((784, 100)).astype('float32')
tensor_b = core.LoDTensor()
tensor_b.set(b_np, place)
exe = Executor(place)
outs = exe.run(g_program,
feed={'a': tensor_a,
'b': tensor_b},
fetch_list=[out])
out = numpy.array(outs[0])
self.assertEqual((100, 100), out.shape)
self.assertTrue(numpy.allclose(out, numpy.dot(a_np, b_np)))
if __name__ == '__main__':
unittest.main()
...@@ -5,6 +5,7 @@ import numpy as np ...@@ -5,6 +5,7 @@ import numpy as np
class TestFeedFetch(unittest.TestCase): class TestFeedFetch(unittest.TestCase):
def test_feed_fetch(self): def test_feed_fetch(self):
scope = core.Scope()
place = core.CPUPlace() place = core.CPUPlace()
input_array = np.ones((4, 4, 6)).astype("float32") input_array = np.ones((4, 4, 6)).astype("float32")
input_array[0, 0, 0] = 3 input_array[0, 0, 0] = 3
...@@ -12,9 +13,9 @@ class TestFeedFetch(unittest.TestCase): ...@@ -12,9 +13,9 @@ class TestFeedFetch(unittest.TestCase):
input_tensor = core.LoDTensor([[0, 2, 4]]) input_tensor = core.LoDTensor([[0, 2, 4]])
input_tensor.set(input_array, place) input_tensor.set(input_array, place)
core.set_feed_variable_float(input_tensor, "feed", 0) core.set_feed_variable(scope, input_tensor, "feed", 0)
output_tensor = core.get_fetch_variable("feed", 0) output_tensor = core.get_fetch_variable(scope, "feed", 0)
output_lod = output_tensor.lod() output_lod = output_tensor.lod()
self.assertEqual(0, output_lod[0][0]) self.assertEqual(0, output_lod[0][0])
......
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program
from paddle.v2.framework.executor import Executor
import numpy as np
init_program = Program()
program = Program()
x = layers.data(
name='x',
shape=[13],
data_type='float32',
program=program,
init_program=init_program)
y_predict = layers.fc(input=x,
size=1,
act=None,
program=program,
init_program=init_program)
y = layers.data(
name='y',
shape=[1],
data_type='float32',
program=program,
init_program=init_program)
cost = layers.square_error_cost(
input=y_predict, label=y, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 20
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(init_program, feed={}, fetch_list=[])
PASS_NUM = 100
for pass_id in range(PASS_NUM):
for data in train_reader():
x_data = np.array(map(lambda x: x[0], data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("float32")
tensor_x = core.LoDTensor()
tensor_x.set(x_data, place)
# print tensor_x.get_dims()
tensor_y = core.LoDTensor()
tensor_y.set(y_data, place)
# print tensor_y.get_dims()
outs = exe.run(program,
feed={'x': tensor_x,
'y': tensor_y},
fetch_list=[avg_cost])
out = np.array(outs[0])
if out[0] < 10.0:
exit(0) # if avg cost less than 10.0, we think our code is good.
exit(1)
import unittest
import numpy as np
from op_test import OpTest
class TestIncrementOpPositiveStep(OpTest):
"""Test increment op with positive step
"""
def setUp(self):
self.op_type = "increment"
self.inputs = {'X': np.random.random((10, 10)).astype("float32")}
self.attrs = {'step': 14.8}
self.outputs = {'Out': self.inputs['X'] + self.attrs['step']}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['X'], 'Out')
class TestIncrementOpNegativeStep(OpTest):
"""Test increment op with negative step
"""
def setUp(self):
self.op_type = "increment"
self.inputs = {'X': np.random.random((10, 10)).astype("float32")}
self.attrs = {'step': -3.8}
self.outputs = {'Out': self.inputs['X'] + self.attrs['step']}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['X'], 'Out')
if __name__ == "__main__":
unittest.main()
import paddle.v2.framework.layers as layers import paddle.v2.framework.layers as layers
import paddle.v2.framework.nets as nets
from paddle.v2.framework.framework import Program, g_program from paddle.v2.framework.framework import Program, g_program
import paddle.v2.framework.core as core import paddle.v2.framework.core as core
import unittest import unittest
...@@ -18,7 +19,7 @@ class TestBook(unittest.TestCase): ...@@ -18,7 +19,7 @@ class TestBook(unittest.TestCase):
avg_cost = layers.mean(x=cost, program=program) avg_cost = layers.mean(x=cost, program=program)
self.assertIsNotNone(avg_cost) self.assertIsNotNone(avg_cost)
program.append_backward(avg_cost, set()) program.append_backward(avg_cost)
print str(program) print str(program)
def test_recognize_digits_mlp(self): def test_recognize_digits_mlp(self):
...@@ -38,24 +39,52 @@ class TestBook(unittest.TestCase): ...@@ -38,24 +39,52 @@ class TestBook(unittest.TestCase):
cost = layers.cross_entropy(input=predict, label=label, program=program) cost = layers.cross_entropy(input=predict, label=label, program=program)
avg_cost = layers.mean(x=cost, program=program) avg_cost = layers.mean(x=cost, program=program)
self.assertIsNotNone(avg_cost) self.assertIsNotNone(avg_cost)
# print str(program) print str(program)
def test_simple_conv2d(self): def test_simple_conv2d(self):
pd = core.ProgramDesc.__create_program_desc__() program = Program()
program = Program(desc=pd) images = layers.data(
images = data_layer(
name='pixel', shape=[3, 48, 48], data_type='int32', program=program) name='pixel', shape=[3, 48, 48], data_type='int32', program=program)
conv2d_layer( layers.conv2d(
input=images, num_filters=3, filter_size=[4, 4], program=program) input=images, num_filters=3, filter_size=[4, 4], program=program)
# print str(program) print str(program)
def test_simple_conv2d(self): def test_recognize_digits_conv(self):
program = Program() program = Program()
images = layers.data( images = layers.data(
name='pixel', shape=[3, 48, 48], data_type='int32', program=program) name='pixel',
layers.conv2d( shape=[1, 28, 28],
input=images, num_filters=3, filter_size=[4, 4], program=program) data_type='float32',
program=program)
label = layers.data(
name='label', shape=[1], data_type='int32', program=program)
conv_pool_1 = nets.simple_img_conv_pool(
input=images,
filter_size=5,
num_filters=2,
pool_size=2,
pool_stride=2,
act="relu",
program=program)
conv_pool_2 = nets.simple_img_conv_pool(
input=conv_pool_1,
filter_size=5,
num_filters=4,
pool_size=2,
pool_stride=2,
act="relu",
program=program)
predict = layers.fc(input=conv_pool_2,
size=10,
act="softmax",
program=program)
cost = layers.cross_entropy(input=predict, label=label, program=program)
avg_cost = layers.mean(x=cost, program=program)
program.append_backward(avg_cost)
print str(program) print str(program)
......
...@@ -3,7 +3,7 @@ import numpy as np ...@@ -3,7 +3,7 @@ import numpy as np
from op_test import OpTest from op_test import OpTest
class TestMomentumOp(OpTest): class TestMomentumOp1(OpTest):
def setUp(self): def setUp(self):
self.op_type = "momentum" self.op_type = "momentum"
...@@ -12,6 +12,7 @@ class TestMomentumOp(OpTest): ...@@ -12,6 +12,7 @@ class TestMomentumOp(OpTest):
velocity = np.zeros((123, 321)).astype("float32") velocity = np.zeros((123, 321)).astype("float32")
learning_rate = np.array([0.001]).astype("float32") learning_rate = np.array([0.001]).astype("float32")
mu = 0.0001 mu = 0.0001
use_nesterov = False
self.inputs = { self.inputs = {
'Param': param, 'Param': param,
...@@ -23,6 +24,46 @@ class TestMomentumOp(OpTest): ...@@ -23,6 +24,46 @@ class TestMomentumOp(OpTest):
self.attrs = {'mu': mu} self.attrs = {'mu': mu}
velocity_out = mu * velocity + grad velocity_out = mu * velocity + grad
if use_nesterov:
param_out = param - grad * learning_rate + \
velocity_out * mu * learning_rate
else:
param_out = param - learning_rate * velocity_out
self.outputs = {'ParamOut': param_out, 'VelocityOut': velocity_out}
def test_check_output(self):
self.check_output()
class TestMomentumOp2(OpTest):
'''Test Momentum with defaukt values for attributes
'''
def setUp(self):
self.op_type = "momentum"
param = np.random.random((123, 321)).astype("float32")
grad = np.random.random((123, 321)).astype("float32")
velocity = np.zeros((123, 321)).astype("float32")
learning_rate = np.array([0.001]).astype("float32")
mu = 0.0001
use_nesterov = True
self.inputs = {
'Param': param,
'Grad': grad,
'Velocity': velocity,
'LearningRate': learning_rate
}
self.attrs = {'mu': mu, 'useNesterov': use_nesterov}
velocity_out = mu * velocity + grad
if use_nesterov:
param_out = param - grad * learning_rate + \
velocity_out * mu * learning_rate
else:
param_out = param - learning_rate * velocity_out param_out = param - learning_rate * velocity_out
self.outputs = {'ParamOut': param_out, 'VelocityOut': velocity_out} self.outputs = {'ParamOut': param_out, 'VelocityOut': velocity_out}
......
import unittest
import paddle.v2.framework.core as core
class TestOpSupportGPU(unittest.TestCase):
def test_case(self):
self.assertEqual(core.is_compile_gpu(), core.op_support_gpu("sum"))
if __name__ == '__main__':
unittest.main()
...@@ -6,7 +6,7 @@ import paddle.v2.framework.optimizer as optimizer ...@@ -6,7 +6,7 @@ import paddle.v2.framework.optimizer as optimizer
class TestOptimizer(unittest.TestCase): class TestOptimizer(unittest.TestCase):
def test_sgd_optimizer(self): def test_sgd_optimizer(self):
program = framework.g_program program = framework.Program()
block = program.global_block() block = program.global_block()
mul_x = block.create_parameter( mul_x = block.create_parameter(
dtype="float32", shape=[5, 10], lod_level=0, name="mul.x") dtype="float32", shape=[5, 10], lod_level=0, name="mul.x")
...@@ -14,7 +14,7 @@ class TestOptimizer(unittest.TestCase): ...@@ -14,7 +14,7 @@ class TestOptimizer(unittest.TestCase):
dtype="float32", shape=[10, 8], lod_level=0, name="mul.y") dtype="float32", shape=[10, 8], lod_level=0, name="mul.y")
mul_out = block.create_var( mul_out = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="mul.out") dtype="float32", shape=[5, 8], lod_level=0, name="mul.out")
mul_op = block.append_op( block.append_op(
type="mul", type="mul",
inputs={"X": mul_x, inputs={"X": mul_x,
"Y": mul_y}, "Y": mul_y},
...@@ -27,5 +27,88 @@ class TestOptimizer(unittest.TestCase): ...@@ -27,5 +27,88 @@ class TestOptimizer(unittest.TestCase):
self.assertEqual(sgd_op.type, "sgd") self.assertEqual(sgd_op.type, "sgd")
class TestMomentumOptimizer(unittest.TestCase):
class MockMomentum(optimizer.MomentumOptimizer):
def get_accumulators(self):
return self._accumulators
def get_velocity_str(self):
return self._velocity_acc_str
def test_momentum_optimizer(self):
program = framework.Program()
block = program.global_block()
mul_x = block.create_parameter(
dtype="float32", shape=[5, 10], lod_level=0, name="mul.x")
mul_y = block.create_var(
dtype="float32", shape=[10, 8], lod_level=0, name="mul.y")
mul_out = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="mul.out")
block.append_op(
type="mul",
inputs={"X": mul_x,
"Y": mul_y},
outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1})
momentum_optimizer = self.MockMomentum(learning_rate=0.01, momentum=0.2)
params_grads = momentum_optimizer.create_backward_pass(mul_out)
self.assertEqual(len(params_grads), 1)
self.assertEqual(len(momentum_optimizer.get_accumulators()), 0)
opts = momentum_optimizer.create_optimization_pass(params_grads,
mul_out)
self.assertEqual(len(opts), 1)
sgd_op = opts[0]
self.assertEqual(sgd_op.type, "momentum")
# Check accumulators
accumulators = momentum_optimizer.get_accumulators()
self.assertEqual(len(accumulators), 1)
self.assertTrue(momentum_optimizer.get_velocity_str() in accumulators)
velocity_acc = accumulators[momentum_optimizer.get_velocity_str()]
self.assertEqual(len(velocity_acc), 1)
self.assertTrue(mul_x.name in velocity_acc)
class TestAdagradOptimizer(unittest.TestCase):
class MockAdagrad(optimizer.AdagradOptimizer):
def get_accumulators(self):
return self._accumulators
def get_moment_str(self):
return self._moment_acc_str
def test_adagrad_optimizer(self):
program = framework.Program()
block = program.global_block()
mul_x = block.create_parameter(
dtype="float32", shape=[5, 10], lod_level=0, name="mul.x")
mul_y = block.create_var(
dtype="float32", shape=[10, 8], lod_level=0, name="mul.y")
mul_out = block.create_var(
dtype="float32", shape=[5, 8], lod_level=0, name="mul.out")
block.append_op(
type="mul",
inputs={"X": mul_x,
"Y": mul_y},
outputs={"Out": mul_out},
attrs={"x_num_col_dims": 1})
adagrad_optimizer = self.MockAdagrad(learning_rate=0.01, epsilon=1.0e-6)
params_grads = adagrad_optimizer.create_backward_pass(mul_out)
self.assertEqual(len(params_grads), 1)
self.assertEqual(len(adagrad_optimizer.get_accumulators()), 0)
opts = adagrad_optimizer.create_optimization_pass(params_grads, mul_out)
self.assertEqual(len(opts), 1)
adagrad_op = opts[0]
self.assertEqual(adagrad_op.type, "adagrad")
# check accumulators
accumulators = adagrad_optimizer.get_accumulators()
self.assertEqual(len(accumulators), 1)
self.assertTrue(adagrad_optimizer.get_moment_str() in accumulators)
moment_acc = accumulators[adagrad_optimizer.get_moment_str()]
self.assertEqual(len(moment_acc), 1)
self.assertTrue(mul_x.name in moment_acc)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.nets as nets
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program
from paddle.v2.framework.executor import Executor
import numpy as np
init_program = Program()
program = Program()
images = layers.data(
name='pixel',
shape=[1, 28, 28],
data_type='float32',
program=program,
init_program=init_program)
label = layers.data(
name='label',
shape=[1],
data_type='int32',
program=program,
init_program=init_program)
conv_pool_1 = nets.simple_img_conv_pool(
input=images,
filter_size=5,
num_filters=20,
pool_size=2,
pool_stride=2,
act="relu",
program=program,
init_program=init_program)
conv_pool_2 = nets.simple_img_conv_pool(
input=conv_pool_1,
filter_size=5,
num_filters=50,
pool_size=2,
pool_stride=2,
act="relu",
program=program,
init_program=init_program)
predict = layers.fc(input=conv_pool_2,
size=10,
act="softmax",
program=program,
init_program=init_program)
cost = layers.cross_entropy(
input=predict, label=label, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 50
PASS_NUM = 1
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=500),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(init_program, feed={}, fetch_list=[])
for pass_id in range(PASS_NUM):
count = 0
for data in train_reader():
img_data = np.array(map(lambda x: x[0].reshape([1, 28, 28]),
data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("int32")
y_data = y_data.reshape([BATCH_SIZE, 1])
tensor_img = core.LoDTensor()
tensor_y = core.LoDTensor()
tensor_img.set(img_data, place)
tensor_y.set(y_data, place)
outs = exe.run(program,
feed={"pixel": tensor_img,
"label": tensor_y},
fetch_list=[avg_cost])
loss = np.array(outs[0])
if loss < 10.0:
exit(0) # if avg cost less than 10.0, we think our code is good.
exit(1)
import paddle.v2 as paddle
import paddle.v2.framework.layers as layers
import paddle.v2.framework.core as core
import paddle.v2.framework.optimizer as optimizer
from paddle.v2.framework.framework import Program, g_program
from paddle.v2.framework.executor import Executor
import numpy as np
init_program = Program()
program = Program()
image = layers.data(
name='x',
shape=[784],
data_type='float32',
program=program,
init_program=init_program)
hidden1 = layers.fc(input=image,
size=128,
act='relu',
program=program,
init_program=init_program)
hidden2 = layers.fc(input=hidden1,
size=64,
act='relu',
program=program,
init_program=init_program)
predict = layers.fc(input=hidden2,
size=10,
act='softmax',
program=program,
init_program=init_program)
label = layers.data(
name='y',
shape=[1],
data_type='int32',
program=program,
init_program=init_program)
cost = layers.cross_entropy(
input=predict, label=label, program=program, init_program=init_program)
avg_cost = layers.mean(x=cost, program=program, init_program=init_program)
sgd_optimizer = optimizer.SGDOptimizer(learning_rate=0.001)
opts = sgd_optimizer.minimize(avg_cost)
BATCH_SIZE = 128
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=8192),
batch_size=BATCH_SIZE)
place = core.CPUPlace()
exe = Executor(place)
exe.run(init_program, feed={}, fetch_list=[])
PASS_NUM = 100
for pass_id in range(PASS_NUM):
for data in train_reader():
x_data = np.array(map(lambda x: x[0], data)).astype("float32")
y_data = np.array(map(lambda x: x[1], data)).astype("int32")
y_data = np.expand_dims(y_data, axis=1)
tensor_x = core.LoDTensor()
tensor_x.set(x_data, place)
tensor_y = core.LoDTensor()
tensor_y.set(y_data, place)
outs = exe.run(program,
feed={'x': tensor_x,
'y': tensor_y},
fetch_list=[avg_cost])
out = np.array(outs[0])
if out[0] < 5.0:
exit(0) # if avg cost less than 5.0, we think our code is good.
exit(1)
...@@ -132,15 +132,15 @@ class RecurrentOpTest(unittest.TestCase): ...@@ -132,15 +132,15 @@ class RecurrentOpTest(unittest.TestCase):
# create RNNOp # create RNNOp
self.rnnop = RecurrentOp( self.rnnop = RecurrentOp(
# inputs # inputs
inlinks=["x"], inputs=["x"],
boot_memories=["h_boot"], initial_states=["h_boot"],
step_net="stepnet", step_net="stepnet",
# outputs # outputs
outlinks=["h@mem"], outputs=["h@mem"],
step_scopes="step_scopes", step_scopes="step_scopes",
# attributes # attributes
pre_memories=["h@pre"], ex_states=["h@pre"],
memories=["h@mem"]) states=["h@mem"])
def create_step_net(self): def create_step_net(self):
stepnet = core.Net.create() stepnet = core.Net.create()
...@@ -169,15 +169,15 @@ class RecurrentGradientOpTest(unittest.TestCase): ...@@ -169,15 +169,15 @@ class RecurrentGradientOpTest(unittest.TestCase):
def create_forward_op(self): def create_forward_op(self):
self.forward_op = RecurrentOp( self.forward_op = RecurrentOp(
# inputs # inputs
inlinks=["x"], inputs=["x"],
boot_memories=["h_boot"], initial_states=["h_boot"],
step_net="stepnet", step_net="stepnet",
# outputs # outputs
outlinks=["h"], outputs=["h"],
step_scopes="step_scopes", step_scopes="step_scopes",
# attributes # attributes
pre_memories=["h@pre"], ex_states=["h@pre"],
memories=["h@alias"]) states=["h@alias"])
# create a stepnet for RNN # create a stepnet for RNN
stepnet = core.Net.create() stepnet = core.Net.create()
......
...@@ -85,5 +85,33 @@ class Test1DReduce(OpTest): ...@@ -85,5 +85,33 @@ class Test1DReduce(OpTest):
self.check_grad(['X'], 'Out') self.check_grad(['X'], 'Out')
class TestNorm(OpTest):
def setUp(self):
# use x away from 0 to avoid errors of numerical gradient when gradient near 0
x = np.random.random((5, 6, 10)).astype("float32") + 0.2
p = 2
dim = 1
keep_dim = False
abs_out = np.absolute(x)
pow_out = np.power(x, p)
sum_out = np.sum(pow_out, axis=dim, keepdims=keep_dim)
out = np.power(sum_out, 1. / p)
self.op_type = "norm"
self.inputs = {'X': x}
self.attrs = {"p": p, "dim": dim, "keep_dim": keep_dim}
self.outputs = {
"AbsOut": abs_out,
"PowOut": pow_out,
"SumOut": sum_out,
"Out": out
}
def test_check_output(self):
self.check_output()
def test_check_grad(self):
self.check_grad(['X'], 'Out', max_relative_error=0.01)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -46,7 +46,7 @@ class TestRmspropOp1(OpTest): ...@@ -46,7 +46,7 @@ class TestRmspropOp1(OpTest):
class TestRmspropOp2(OpTest): class TestRmspropOp2(OpTest):
'''Test RMSProp with defaukt values for attributes '''Test RMSProp with default values for attributes
''' '''
def setUp(self): def setUp(self):
......
...@@ -19,7 +19,7 @@ class TestUniformRandomOp(unittest.TestCase): ...@@ -19,7 +19,7 @@ class TestUniformRandomOp(unittest.TestCase):
op = Operator( op = Operator(
"uniform_random", "uniform_random",
Out='X', Out='X',
dims=[1000, 784], shape=[1000, 784],
min=-5.0, min=-5.0,
max=10.0, max=10.0,
seed=10) seed=10)
......
...@@ -101,6 +101,10 @@ class Parameters(object): ...@@ -101,6 +101,10 @@ class Parameters(object):
self.__param_conf__[param_conf.name] = param_conf self.__param_conf__[param_conf.name] = param_conf
def update_param_conf(self, model_config):
for p in model_config.parameters:
self.__param_conf__[p.name] = p
def keys(self): def keys(self):
""" """
keys are the names of each parameter. keys are the names of each parameter.
......
...@@ -5,3 +5,4 @@ py_test(test_topology SRCS test_topology.py) ...@@ -5,3 +5,4 @@ py_test(test_topology SRCS test_topology.py)
py_test(test_rnn_layer SRCS test_rnn_layer.py) py_test(test_rnn_layer SRCS test_rnn_layer.py)
py_test(test_parameters SRCS test_parameters.py) py_test(test_parameters SRCS test_parameters.py)
py_test(test_data_feeder SRCS test_data_feeder.py) py_test(test_data_feeder SRCS test_data_feeder.py)
py_test(test_paramconf_order SRCS test_paramconf_order.py)
...@@ -97,7 +97,7 @@ class DataFeederTest(unittest.TestCase): ...@@ -97,7 +97,7 @@ class DataFeederTest(unittest.TestCase):
each_sample.append(zip(a, b)) each_sample.append(zip(a, b))
data.append(each_sample) data.append(each_sample)
feeder = DataFeeder([('input', data_type.sparse_vector(dim))], feeder = DataFeeder([('input', data_type.sparse_float_vector(dim))],
{'input': 0}) {'input': 0})
arg = feeder(data) arg = feeder(data)
output = arg.getSlotValue(0) output = arg.getSlotValue(0)
......
# Copyright PaddlePaddle contributors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import math
import paddle.v2 as paddle
def wordemb(inlayer):
wordemb = paddle.layer.table_projection(
input=inlayer,
size=5,
param_attr=paddle.attr.Param(
name="_proj", initial_std=0.001, learning_rate=1, l2_rate=0))
return wordemb
def train():
word_dict = paddle.dataset.imikolov.build_dict()
dict_size = len(word_dict)
# Every layer takes integer value of range [0, dict_size)
firstword = paddle.layer.data(
name="firstw", type=paddle.data_type.integer_value(dict_size))
secondword = paddle.layer.data(
name="secondw", type=paddle.data_type.integer_value(dict_size))
thirdword = paddle.layer.data(
name="thirdw", type=paddle.data_type.integer_value(dict_size))
fourthword = paddle.layer.data(
name="fourthw", type=paddle.data_type.integer_value(dict_size))
nextword = paddle.layer.data(
name="fifthw", type=paddle.data_type.integer_value(dict_size))
Efirst = wordemb(firstword)
Esecond = wordemb(secondword)
Ethird = wordemb(thirdword)
Efourth = wordemb(fourthword)
contextemb = paddle.layer.concat(input=[Efirst, Esecond, Ethird, Efourth])
hidden1 = paddle.layer.fc(name="fc1",
input=contextemb,
size=128,
act=paddle.activation.Sigmoid(),
layer_attr=paddle.attr.Extra(drop_rate=0.5),
bias_attr=paddle.attr.Param(learning_rate=2),
param_attr=paddle.attr.Param(
initial_std=1. / math.sqrt(5 * 8),
learning_rate=1,
l2_rate=6e-4))
predictword = paddle.layer.fc(input=hidden1,
size=dict_size,
bias_attr=paddle.attr.Param(learning_rate=2),
act=paddle.activation.Softmax())
return paddle.layer.classification_cost(input=predictword, label=nextword)
class TestParamConfOrder(unittest.TestCase):
def test_param_conf_order(self):
paddle.init()
cost = train()
parameters = paddle.parameters.create(cost)
adagrad = paddle.optimizer.AdaGrad(
learning_rate=3e-3,
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
trainer = paddle.trainer.SGD(cost, parameters, adagrad)
for p in trainer.get_topology_proto().parameters:
if p.name == "_fc1.w0":
self.assertEqual(p.decay_rate, 6e-4)
else:
self.assertEqual(p.decay_rate, 8e-4)
if __name__ == '__main__':
unittest.main()
...@@ -19,6 +19,7 @@ import paddle.trainer_config_helpers as conf_helps ...@@ -19,6 +19,7 @@ import paddle.trainer_config_helpers as conf_helps
import layer as v2_layer import layer as v2_layer
import config_base import config_base
import cPickle import cPickle
from paddle.trainer import config_parser as cp
__all__ = ['Topology'] __all__ = ['Topology']
...@@ -50,6 +51,35 @@ class Topology(object): ...@@ -50,6 +51,35 @@ class Topology(object):
assert isinstance(self.__model_config__, ModelConfig) assert isinstance(self.__model_config__, ModelConfig)
def update_from_default(self):
# HACK(typhoonzero): update ParameterConfig(proto) in case of
# optimizers are defined after layers, or between layers.
# Must be called from trainer.__init__()
for parameter in self.__model_config__.parameters:
if parameter.momentum == 0.0 and cp.g_default_momentum:
parameter.momentum = cp.g_default_momentum
if parameter.decay_rate == 0.0 and cp.g_default_decay_rate:
parameter.decay_rate = cp.g_default_decay_rate
if parameter.initial_mean == 0.0:
parameter.initial_mean = cp.g_default_initial_mean
if parameter.initial_std == 0.01:
parameter.initial_std = cp.g_default_initial_std
if parameter.initial_strategy == 0:
parameter.initial_strategy = cp.g_default_initial_strategy
if parameter.initial_smart == False:
parameter.initial_smart = cp.g_default_initial_smart
if parameter.num_batches_regularization == 1 and \
cp.g_default_num_batches_regularization:
parameter.num_batches_regularization = \
cp.g_default_num_batches_regularization
if parameter.gradient_clipping_threshold == 0.0 and \
cp.g_default_gradient_clipping_threshold:
parameter.gradient_clipping_threshold = \
cp.g_default_gradient_clipping_threshold
if parameter.device == -1 and cp.g_default_device:
parameter.device = cp.g_default_device
# FIXME(typhoonzero): ignored: update_hooks, g_default_compact_func
def use_sparse_updater(self): def use_sparse_updater(self):
""" """
check if any parameter require to use sparse_update check if any parameter require to use sparse_update
......
...@@ -64,6 +64,11 @@ class SGD(object): ...@@ -64,6 +64,11 @@ class SGD(object):
"paddle.v2.optimizer.Optimizer") "paddle.v2.optimizer.Optimizer")
import py_paddle.swig_paddle as api import py_paddle.swig_paddle as api
topology = Topology(cost, extra_layers=extra_layers) topology = Topology(cost, extra_layers=extra_layers)
# HACK(typhoonzero): update ParameterConfig(proto) in case of optimizers
# are defined after layers, or between layers.
topology.update_from_default()
parameters.update_param_conf(topology.proto())
self.__optimizer__ = update_equation self.__optimizer__ = update_equation
self.__topology__ = topology self.__topology__ = topology
self.__parameters__ = parameters self.__parameters__ = parameters
...@@ -91,6 +96,9 @@ class SGD(object): ...@@ -91,6 +96,9 @@ class SGD(object):
self.__parameters__.append_gradient_machine(gm) self.__parameters__.append_gradient_machine(gm)
self.__parameter_updater__ = None self.__parameter_updater__ = None
def get_topology_proto(self):
return self.__topology_in_proto__
def __use_remote_sparse_updater__(self): def __use_remote_sparse_updater__(self):
return self.__use_sparse_updater__ and not self.__is_local__ return self.__use_sparse_updater__ and not self.__is_local__
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册