& no_grad_vars);
-```
-
-The implementation behind it can be divided into two parts: **Backward Operator Creating** and **Backward Network Building**.
-
-### Backward Operator Registry
-
-A backward network is built up with several backward operators. Backward operators take the forward operators' inputs, outputs, and output gradients, and then calculate their input gradients.
-
-| | forward operator | backward operator |
-| ---------------------- | ---------------- |------------------------- |
-| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
-| **Operator::outputs_** | Outputs | InputGradients |
-
- In most cases, there is a one-to-one relation between the forward and backward operators. These relations are recorded in a global hash map (`OpInfoMap`). To follow the philosophy of a minimal core and to keep operators pluggable, a registry mechanism is introduced.
-
-For example, we have `mul_op`, and we can register its information and corresponding backward operator by the following macro:
-
-```cpp
-REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
-```
-
-`mul` is the operator's type. `MulOp` and `MulOpMaker` are the operator class and the operator maker class, respectively.
-
-`mul_grad` is the backward operator's type, and `MulOpGrad` is its class name.
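-
-Conceptually, the registry is a map from an operator type to its metadata, which includes the type of its gradient operator. The snippet below is only a minimal sketch of that idea; the struct and map names are illustrative and not the actual `OpInfoMap` definition.
-
-```cpp
-#include <string>
-#include <unordered_map>
-
-// Illustrative sketch: each registered operator type maps to metadata that
-// records, among other things, the type of its backward operator.
-struct OpInfoSketch {
-  std::string grad_op_type_;  // e.g. "mul_grad" for "mul"
-};
-
-std::unordered_map<std::string, OpInfoSketch> op_info_map{
-    {"mul", OpInfoSketch{"mul_grad"}},
-};
-```
-
-With such a table, the framework can look up `op_info_map.at("mul").grad_op_type_` to know which backward operator to instantiate.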
-
-### Backward Operator Creating
-
-Given a certain forward operator, we can get its corresponding backward operator by calling:
-
-```cpp
-OperatorBase* bwd_op = BuildGradOp(fwd_op);  // fwd_op: const OperatorBase*
-```
-
-The function `BuildGradOp` sequentially executes the following steps (a brief sketch follows the list):
-
-1. Get the `type_` of the given forward operator, and look up the corresponding backward operator's type in the `OpInfoMap`.
-
-2. Build two maps named `inputs` and `outputs` to temporarily store the backward operator's inputs and outputs. Copy the forward operator's `inputs_` and `outputs_` into map `inputs`, except for those that are not needed for gradient computation.
-
-3. Add the forward inputs' gradient variables into map `outputs`, and add the forward outputs' gradient variables into map `inputs`.
-
-4. Build the backward operator with `inputs`, `outputs`, and the forward operator's attributes.
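-
-A minimal, self-contained sketch of steps 2 and 3 is given below. The `VarNameMap` alias and the `BuildGradIO` helper are illustrative stand-ins rather than the framework's real types; only the `@GRAD` suffix convention comes from the design above.
-
-```cpp
-#include <map>
-#include <string>
-#include <vector>
-
-// Illustrative stand-in for the framework's name maps: slot -> variable names.
-using VarNameMap = std::map<std::string, std::vector<std::string>>;
-
-std::string GradVarName(const std::string& name) { return name + "@GRAD"; }
-
-// Sketch of steps 2-3: build the backward op's input/output name maps from
-// the forward op's maps.
-void BuildGradIO(const VarNameMap& fwd_in, const VarNameMap& fwd_out,
-                 VarNameMap* bwd_in, VarNameMap* bwd_out) {
-  *bwd_in = fwd_in;                                // forward Inputs
-  bwd_in->insert(fwd_out.begin(), fwd_out.end());  // forward Outputs
-  for (const auto& slot : fwd_out) {               // OutputGradients -> inputs
-    std::vector<std::string> grads;
-    for (const auto& n : slot.second) grads.push_back(GradVarName(n));
-    (*bwd_in)[GradVarName(slot.first)] = grads;
-  }
-  for (const auto& slot : fwd_in) {                // InputGradients -> outputs
-    std::vector<std::string> grads;
-    for (const auto& n : slot.second) grads.push_back(GradVarName(n));
-    (*bwd_out)[GradVarName(slot.first)] = grads;
-  }
-}
-```
-
-Step 1 (looking up the gradient op type) and step 4 (constructing the operator) are omitted from the sketch because they depend on the registry and the operator classes.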
-
-### Backward Network Building
-
-A backward network is a series of backward operators. The main idea of building a backward network is to create the backward operators in reverse order and append them one by one. There are some corner cases that need special processing.
-
-1. Op
-
-  When the input forward network is an Op, return its gradient operator immediately. If all of its outputs are in the no-gradient set, return a special `NOP` instead.
-
-2. NetOp
-
-  In our design, the network itself is also a kind of operator (**NetOp**). So the operators contained in a big network may themselves be smaller networks. When the input forward network is a NetOp, we call the backward functions of its sub-NetOps/operators recursively. During this process, we also need to collect the `OutputGradients` names according to the forward NetOp.
-
-3. RnnOp
-
-  RnnOp is a nested stepnet operator. The backward module needs to recursively call `Backward` for every stepnet.
-
-4. Sharing Variables
-
-  As illustrated in Figure 1 and Figure 2, two operators share the same variable name **W@GRAD**, so one will overwrite the other's result in the shared variable.
-
-
-
-
- Figure 1. Sharing variables in operators.
-
-
-
-  Sharing a variable between operators, or using the same input variable in multiple operators, can lead to duplicate gradient variables. As illustrated in Figure 2, we need to rename the gradient variables recursively and insert a generic `Add` operator to prevent overwriting.
-
-
-
-
-  Figure 2. Replacing the shared variable's gradient with an `Add` operator.
-
-
-
-  Because the framework looks up variables by their names, we need to rename the duplicated output links. We add an integer suffix to each renamed output to represent its position in the clockwise direction.
-
-5. Part of the Gradient is Zero.
-
-  In the whole graph, there are cases where one operator's gradient is not needed, but its input's gradient is a dependency of another operator; in such positions we need to fill in a gradient matrix of the same shape. In our implementation, we insert a special `fillZeroLike` operator.
-
-
-Following the rules above, we collect the sub-graph's `OutputGradients`/`InputGradients` as the NetOp's and return it.
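-
-Put together, the builder can be viewed as a recursive traversal of the forward network. The sketch below uses stand-in types (`OpSketch`, `BackwardSketch`) purely for illustration; it is not the framework's implementation, and the corner cases above appear only as comments.
-
-```cpp
-#include <memory>
-#include <set>
-#include <string>
-#include <vector>
-
-// Illustrative stand-in for OperatorBase / NetOp: a NetOp is an operator
-// that contains sub-operators.
-struct OpSketch {
-  std::vector<std::unique_ptr<OpSketch>> sub_ops_;
-  bool IsNet() const { return !sub_ops_.empty(); }
-};
-
-// Schematic recursive backward builder. Creating the real gradient op,
-// renaming duplicated outputs, appending an Add op, and inserting
-// fill-zeros-like ops are only indicated by comments here.
-std::unique_ptr<OpSketch> BackwardSketch(
-    const OpSketch& fwd, const std::set<std::string>& no_grad_vars) {
-  auto bwd = std::unique_ptr<OpSketch>(new OpSketch());
-  if (!fwd.IsNet()) {
-    // Case 1: a plain Op. Return its gradient op, or a NOP if all of its
-    // outputs are in no_grad_vars.
-    return bwd;
-  }
-  // Cases 2/3: a NetOp (or RnnOp stepnet). Visit sub-operators in reverse
-  // order and recursively build each one's backward part.
-  for (auto it = fwd.sub_ops_.rbegin(); it != fwd.sub_ops_.rend(); ++it) {
-    bwd->sub_ops_.push_back(BackwardSketch(**it, no_grad_vars));
-    // Case 4: if two backward ops write the same gradient variable, rename
-    // the duplicates and append an Add op that sums them.
-    // Case 5: if a required gradient is never produced, insert a
-    // fill-zeros-like op so later operators get a correctly shaped input.
-  }
-  return bwd;
-}
-```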
diff --git a/paddle/framework/block_desc.cc b/paddle/framework/block_desc.cc
index 0668b08ff7ab3c8ca4f1e989fc7af45a8ec5f63c..54498e175dacfa0a220e3d839f4feb02502b2c03 100644
--- a/paddle/framework/block_desc.cc
+++ b/paddle/framework/block_desc.cc
@@ -53,12 +53,12 @@ VarDesc *BlockDesc::FindVarRecursive(const std::string &name) const {
return it->second.get();
}
-VarDesc *BlockDesc::FindRecursiveOrCreateVar(const std::string &name_bytes) {
+VarDesc &BlockDesc::FindRecursiveOrCreateVar(const std::string &name_bytes) {
VarDesc *res = FindVarRecursive(name_bytes);
if (res == nullptr) {
res = Var(name_bytes);
}
- return res;
+ return *res;
}
bool BlockDesc::HasVarRecursive(const std::string &name) const {
diff --git a/paddle/framework/block_desc.h b/paddle/framework/block_desc.h
index 6c8c81b332d99e52db41018e117aa837be6745bc..4b609e4bcb67bb8dda5924a639e7a8165eda4353 100644
--- a/paddle/framework/block_desc.h
+++ b/paddle/framework/block_desc.h
@@ -57,7 +57,7 @@ class BlockDesc {
VarDesc *FindVarRecursive(const std::string &name_bytes) const;
- VarDesc *FindRecursiveOrCreateVar(const std::string &name_bytes);
+ VarDesc &FindRecursiveOrCreateVar(const std::string &name_bytes);
bool HasVarRecursive(const std::string &var_name) const;
diff --git a/paddle/framework/data_transform.cc b/paddle/framework/data_transform.cc
index 35f16025a9ae44bd70e15b19b25deb08299bea88..fed958db1584c4fda5394d59a2ef8936045a9ce9 100644
--- a/paddle/framework/data_transform.cc
+++ b/paddle/framework/data_transform.cc
@@ -11,8 +11,13 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
+#include
#include "paddle/framework/data_transform.h"
+#include "paddle/framework/device_data_transform.h"
+#include "paddle/framework/lod_tensor.h"
+#include "paddle/framework/selected_rows.h"
+#include "paddle/platform/device_context.h"
namespace paddle {
namespace framework {
@@ -22,5 +27,165 @@ DataTransformFnMap& DataTransformFnMap::Instance() {
return data_transform_map;
}
+Tensor* DataTransform(const OpKernelType& expected_kernel_type,
+ const OpKernelType& kernel_type_for_var,
+ const Tensor& input_tensor) {
+ Tensor* out = nullptr;
+ if (!platform::is_same_place(kernel_type_for_var.place_,
+ expected_kernel_type.place_)) {
+ out = DeviceTransform(input_tensor, expected_kernel_type.place_);
+ }
+ PADDLE_ENFORCE_NOT_NULL(out, "out should not be null");
+ return out;
+}
+
+void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
+ Variable& out_var) {
+ if (in_var.IsType<LoDTensor>()) {
+ auto& in_lod_tensor = in_var.Get<LoDTensor>();
+ auto* tran_lod_tensor = out_var.GetMutable<LoDTensor>();
+ tran_lod_tensor->set_lod(in_lod_tensor.lod());
+ tran_lod_tensor->set_layout(in_lod_tensor.layout());
+ tran_lod_tensor->ShareDataWith(tensor);
+ } else if (in_var.IsType<SelectedRows>()) {
+ auto& in_selected_rows = in_var.Get<SelectedRows>();
+ auto* trans_selected_rows = out_var.GetMutable<SelectedRows>();
+ trans_selected_rows->set_height(in_selected_rows.height());
+ trans_selected_rows->set_rows(in_selected_rows.rows());
+ trans_selected_rows->mutable_value()->ShareDataWith(tensor);
+ } else {
+ PADDLE_THROW("unknown var type");
+ }
+}
+
+auto KernelFP32 = OpKernelType(proto::DataType::FP32, platform::CPUPlace(),
+ DataLayout::kNHWC, LibraryType::kPlain);
+
+auto KernelFP64 = OpKernelType(proto::DataType::FP64, platform::CPUPlace(),
+ DataLayout::kNHWC, LibraryType::kPlain);
+
+auto KernelNHWC = OpKernelType(proto::DataType::FP64, platform::CPUPlace(),
+ DataLayout::kNHWC, LibraryType::kPlain);
+
+auto KernelNCHW = OpKernelType(proto::DataType::FP64, platform::CPUPlace(),
+ DataLayout::kNCHW, LibraryType::kPlain);
+
+// TODO(dzhwinter): Only for testing multiple op kernel.
+// Dummy transform function for library_type
+// should be removed.
+auto KernelPlain = OpKernelType(proto::DataType::FP32, platform::CUDAPlace(0),
+ DataLayout::kAnyLayout, LibraryType::kPlain);
+
+auto KernelCUDNN = OpKernelType(proto::DataType::FP32, platform::CUDAPlace(0),
+ DataLayout::kAnyLayout, LibraryType::kCUDNN);
+
+void DummyTrans(const platform::DeviceContext* ctx,
+ const KernelTypePair& kernel_pair, const Variable& in,
+ Variable* out) {
+ PADDLE_ENFORCE(in.IsType<Tensor>(), "Only Support Tensor transform!.");
+ PADDLE_ENFORCE(
+ platform::places_are_same_class(kernel_pair.first.place_,
+ kernel_pair.second.place_),
+ "TransDataType Only Support DataType transform on same place!");
+ auto src = in.Get<Tensor>();
+ auto* dst = out->GetMutable<Tensor>();
+ *dst = src;
+}
+
+void TransDataType(const platform::DeviceContext* ctx,
+ const KernelTypePair& kernel_pair, const Variable& in,
+ Variable* out) {
+ PADDLE_ENFORCE(in.IsType<Tensor>(), "Only Support Tensor transform!.");
+ PADDLE_ENFORCE(
+ platform::places_are_same_class(kernel_pair.first.place_,
+ kernel_pair.second.place_),
+ "TransDataType Only Support DataType transform on same place!");
+
+ auto src = in.Get<Tensor>();
+ auto* dst = out->GetMutable<Tensor>();
+
+ auto dims = src.dims();
+ dst->Resize(dims);
+ auto dst_type = kernel_pair.second.data_type_;
+ auto src_type = kernel_pair.first.data_type_;
+
+ switch (src_type) {
+ case proto::DataType::FP32:
+ framework::VisitDataType(dst_type, CastDataType<float>(src, dst, ctx));
+ break;
+ case proto::DataType::FP64:
+ framework::VisitDataType(dst_type, CastDataType<double>(src, dst, ctx));
+ break;
+ case proto::DataType::INT32:
+ framework::VisitDataType(dst_type, CastDataType<int>(src, dst, ctx));
+ break;
+ case proto::DataType::INT64:
+ framework::VisitDataType(dst_type, CastDataType<int64_t>(src, dst, ctx));
+ break;
+ case proto::DataType::BOOL:
+ framework::VisitDataType(dst_type, CastDataType<bool>(src, dst, ctx));
+ break;
+ default:
+ PADDLE_THROW("Not support type %d", src_type);
+ }
+}
+
+void TransDataLayout(const std::vector<int>& axis,
+ const platform::DeviceContext* ctx,
+ const KernelTypePair& kernel_pair, const Variable& in,
+ Variable* out) {
+ PADDLE_ENFORCE(in.IsType<Tensor>(), "Only support Tensor transform!.");
+ PADDLE_ENFORCE(
+ platform::places_are_same_class(kernel_pair.first.place_,
+ kernel_pair.second.place_),
+ "TransDataLayout only support DataLayout transform on same place!");
+ PADDLE_ENFORCE(kernel_pair.first.data_type_ == kernel_pair.second.data_type_,
+ "TransDataLayout only support Datatype are same!");
+
+ auto src = in.Get<Tensor>();
+ auto* dst = out->GetMutable<Tensor>();
+ PADDLE_ENFORCE(arity(src.dims()) == 4, "Input Arity Only Suppport 4!");
+
+ auto src_dim = src.dims();
+ std::vector<int64_t> dst_dim;
+
+ dst_dim.resize(axis.size());
+ for (size_t i = 0; i < axis.size(); i++) {
+ dst_dim[i] = src_dim[axis[i]];
+ }
+
+ dst->Resize(make_ddim(dst_dim));
+ auto place = kernel_pair.second.place_;
+ dst->mutable_data(place, src.type());
+
+ auto src_type = kernel_pair.first.data_type_;
+ framework::VisitDataType(src_type, CastDataLayout(ctx, axis, src, dst));
+
+ dst->set_layout(kernel_pair.second.data_layout_);
+}
+
} // namespace framework
} // namespace paddle
+
+namespace f = paddle::framework;
+
+namespace {
+std::vector<int> NHWC2NCHW = {0, 3, 1, 2};
+std::vector<int> NCHW2NHWC = {0, 2, 3, 1};
+}
+
+REGISTER_DATA_TRANSFORM_FN(f::KernelFP32, f::KernelFP64, f::TransDataType);
+REGISTER_DATA_TRANSFORM_FN(f::KernelPlain, f::KernelCUDNN, f::DummyTrans);
+REGISTER_DATA_TRANSFORM_FN(f::KernelCUDNN, f::KernelPlain, f::DummyTrans);
+REGISTER_DATA_TRANSFORM_FN(f::KernelNHWC, f::KernelNCHW,
+ std::bind(f::TransDataLayout, NHWC2NCHW,
+ std::placeholders::_1,
+ std::placeholders::_2,
+ std::placeholders::_3,
+ std::placeholders::_4));
+REGISTER_DATA_TRANSFORM_FN(f::KernelNCHW, f::KernelNHWC,
+ std::bind(f::TransDataLayout, NCHW2NHWC,
+ std::placeholders::_1,
+ std::placeholders::_2,
+ std::placeholders::_3,
+ std::placeholders::_4));
diff --git a/paddle/framework/data_transform.h b/paddle/framework/data_transform.h
index 73f894a3e20ab779f8607e63a67139b0e8cce79a..e4e5c30a96a3c985ae2ecd494b723c8afeceb12f 100644
--- a/paddle/framework/data_transform.h
+++ b/paddle/framework/data_transform.h
@@ -19,19 +19,23 @@ limitations under the License. */
#include
#include "paddle/framework/op_kernel_type.h"
+#include "paddle/framework/selected_rows.h"
#include "paddle/framework/tensor.h"
#include "paddle/framework/variable.h"
+#include "paddle/operators/math/math_function.h"
#include "paddle/platform/device_context.h"
#include "paddle/platform/macros.h"
+#include "paddle/platform/transform.h"
namespace paddle {
namespace framework {
-using DataTransformFN =
- std::function ctx,
- const Variable& in, Variable* out)>;
using KernelTypePair = std::pair<OpKernelType, OpKernelType>;
+using DataTransformFn =
+ std::function<void(const platform::DeviceContext*, const KernelTypePair&, const Variable& in, Variable* out)>;
+
struct KernelTypePairHash {
static void HashCombine(const OpKernelType& t, std::size_t* seed) {
OpKernelType::Hash kernel_type_hasher;
@@ -46,8 +50,76 @@ struct KernelTypePairHash {
}
};
+Tensor* DataTransform(const OpKernelType& expected_kernel_type,
+ const OpKernelType& kernel_type_for_var,
+ const Tensor& input_tensor);
+
+void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
+ Variable& out_var);
+
+template <typename InType, typename OutType>
+struct CastDataTypeFunctor {
+ HOSTDEVICE inline OutType operator()(InType in) const {
+ return static_cast<OutType>(in);
+ }
+};
+
+template <typename InType>
+struct CastDataType {
+ CastDataType(const framework::Tensor& in, framework::Tensor* out,
+ const platform::DeviceContext* ctx)
+ : in_(in), out_(out), ctx_(ctx) {}
+ const framework::Tensor in_;
+ framework::Tensor* out_;
+ const platform::DeviceContext* ctx_;
+
+ template <typename OutType>
+ void operator()() {
+ auto place = ctx_->GetPlace();
+
+ auto* in_begin = in_.data<InType>();
+ auto numel = in_.numel();
+ auto* in_end = in_begin + numel;
+ auto* out_begin = out_->mutable_data<OutType>(place);
+
+ if (platform::is_cpu_place(place)) {
+ platform::Transform<platform::CPUDeviceContext> trans;
+ auto* context = static_cast<const platform::CPUDeviceContext*>(ctx_);
+ trans(*context, in_begin, in_end, out_begin,
+ CastDataTypeFunctor<InType, OutType>());
+ } else {
+ // TODO(dzhwinter): enhance Copy CPU<->GPU with different data type?
+ PADDLE_THROW("Unsupport CPU <-> GPU!");
+ }
+ }
+};
+
+struct CastDataLayout {
+ CastDataLayout(const platform::DeviceContext* ctx,
+ const std::vector<int>& axis, const framework::Tensor& in,
+ framework::Tensor* out)
+ : in_(in), out_(out), ctx_(ctx), axis_(axis) {}
+ const framework::Tensor in_;
+ framework::Tensor* out_;
+ const platform::DeviceContext* ctx_;
+ const std::vector<int> axis_;
+
+ template <typename T>
+ void operator()() {
+ auto place = ctx_->GetPlace();
+
+ if (platform::is_cpu_place(place)) {
+ operators::math::Transpose<platform::CPUDeviceContext, T, 4> trans4;
+ auto* context = static_cast<const platform::CPUDeviceContext*>(ctx_);
+ trans4(*context, in_, out_, axis_);
+ } else {
+ PADDLE_THROW("Unsupport CPU <-> GPU!");
+ }
+ }
+};
+
using DataTransformMap =
- std::unordered_map;
+ std::unordered_map<KernelTypePair, DataTransformFn, KernelTypePairHash>;
class DataTransformFnMap {
public:
@@ -58,25 +130,25 @@ class DataTransformFnMap {
}
void Insert(const OpKernelType& left, const OpKernelType& right,
- const DataTransformFN& data_tranform_fn) {
+ const DataTransformFn& data_tranform_fn) {
Insert(std::make_pair(left, right), data_tranform_fn);
}
void Insert(const KernelTypePair& kernel_type_pair,
- const DataTransformFN& data_tranform_fn) {
+ const DataTransformFn& data_tranform_fn) {
PADDLE_ENFORCE(!Has(kernel_type_pair),
"KernelTypePair %s has been registered", "");
map_.insert({kernel_type_pair, data_tranform_fn});
}
- const DataTransformFN& Get(const KernelTypePair& key_pair) const {
+ const DataTransformFn& Get(const KernelTypePair& key_pair) const {
auto data_transformer = GetNullable(key_pair);
PADDLE_ENFORCE_NOT_NULL(data_transformer,
- "DataTransformFN should not be NULL");
+ "DataTransformFn should not be NULL");
return *data_transformer;
}
- const DataTransformFN* GetNullable(const KernelTypePair& key_pair) const {
+ const DataTransformFn* GetNullable(const KernelTypePair& key_pair) const {
auto it = map_.find(key_pair);
if (it == map_.end()) {
return nullptr;
diff --git a/paddle/framework/data_transform_test.cc b/paddle/framework/data_transform_test.cc
index f93a47eeb567c4fc984954aa5198362c9939c556..edd305fd17ae202926b83fbec10089719baa2e16 100644
--- a/paddle/framework/data_transform_test.cc
+++ b/paddle/framework/data_transform_test.cc
@@ -11,36 +11,67 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
+#include
+#include
-#include "paddle/framework/data_transform.h"
#include
+#include "paddle/framework/data_transform.h"
+#include "paddle/platform/device_context.h"
+
namespace paddle {
namespace framework {
-
using namespace platform;
+/**
+ * @brief cross validation of different kernel type transform
+ * We use a four-bit map to represent the different combinations.
+ * If a field has multiple possible values, only two of them are chosen.
+ * For DataType, only test the FP32(float), FP64(double).
+ * e.g. 0000 -> FP32, CPUPlace, kNHWC, kPlain
+ * 1111 -> FP64, GPUPlace, kNCHW, kMKLDNN
+ */
+
+std::array<proto::DataType, 2> kDataType = {
+ {proto::DataType::FP32, proto::DataType::FP64}};
+
+std::array<Place, 2> kPlace = {{CPUPlace(), CUDAPlace(0)}};
+
+std::array<DataLayout, 2> kDataLayout = {{
+ DataLayout::kNHWC, DataLayout::kNCHW,
+}};
+
+std::array<LibraryType, 2> kLibraryType = {{
+ LibraryType::kPlain, LibraryType::kMKLDNN,
+}};
+
+OpKernelType GenFromBit(const std::vector<bool> bits) {
+ return OpKernelType(kDataType[bits[0]], kPlace[bits[1]], kDataLayout[bits[2]],
+ kLibraryType[bits[3]]);
+}
+
int test_value = 0;
-OpKernelType kernel_type_1(proto::DataType::FP32, CPUPlace(), DataLayout::kNCHW,
- LibraryType::kCUDNN);
-OpKernelType kernel_type_2(proto::DataType::FP32, CUDAPlace(0),
- DataLayout::kNCHW, LibraryType::kCUDNN);
-OpKernelType kernel_type_3(proto::DataType::FP16, CUDAPlace(0),
- DataLayout::kNCHW, LibraryType::kCUDNN);
+auto kernel0 = GenFromBit({0, 0, 0, 0});
+auto kernel1 = GenFromBit({0, 0, 0, 1});
+auto kernel2 = GenFromBit({0, 0, 1, 0});
+auto kernel3 = GenFromBit({0, 0, 1, 1});
-void type1_to_type2(std::vector ctx,
- const Variable& in, Variable* out) {
+void TransDataType_t(const platform::DeviceContext* ctx,
+ const KernelTypePair& p, const Variable& in,
+ Variable* out) {
test_value++;
}
-void type2_to_type3(std::vector ctx,
- const Variable& in, Variable* out) {
+void TransDataLayout_t(const platform::DeviceContext* ctx,
+ const KernelTypePair& p, const Variable& in,
+ Variable* out) {
test_value--;
}
-void type1_to_type3(std::vector ctx,
- const Variable& in, Variable* out) {
+void TransLibraryType_t(const platform::DeviceContext* ctx,
+ const KernelTypePair& p, const Variable& in,
+ Variable* out) {
test_value += 2;
}
@@ -49,30 +80,89 @@ void type1_to_type3(std::vector ctx,
namespace frw = paddle::framework;
-REGISTER_DATA_TRANSFORM_FN(frw::kernel_type_1, frw::kernel_type_2,
- frw::type1_to_type2);
-REGISTER_DATA_TRANSFORM_FN(frw::kernel_type_2, frw::kernel_type_3,
- frw::type2_to_type3);
-REGISTER_DATA_TRANSFORM_FN(frw::kernel_type_1, frw::kernel_type_3,
- frw::type1_to_type3);
+REGISTER_DATA_TRANSFORM_FN(frw::kernel0, frw::kernel1, frw::TransDataType_t);
+REGISTER_DATA_TRANSFORM_FN(frw::kernel1, frw::kernel2, frw::TransDataLayout_t);
+REGISTER_DATA_TRANSFORM_FN(frw::kernel0, frw::kernel2, frw::TransLibraryType_t);
TEST(DataTransform, Register) {
using namespace paddle::framework;
using namespace paddle::platform;
auto& instance = DataTransformFnMap::Instance();
- ASSERT_EQ(instance.Map().size(), 3UL);
- std::vector ctx;
paddle::framework::Variable in;
paddle::framework::Variable out;
- instance.Get(std::make_pair(frw::kernel_type_1, frw::kernel_type_2))(ctx, in,
- &out);
+ DeviceContext* ctx = new CPUDeviceContext();
+ auto pair0 = std::make_pair(frw::kernel0, frw::kernel1);
+ instance.Get(pair0)(ctx, pair0, in, &out);
ASSERT_EQ(test_value, 1);
- instance.Get(std::make_pair(frw::kernel_type_2, frw::kernel_type_3))(ctx, in,
- &out);
+
+ auto pair1 = std::make_pair(frw::kernel1, frw::kernel2);
+ instance.Get(pair1)(ctx, pair1, in, &out);
ASSERT_EQ(test_value, 0);
- instance.Get(std::make_pair(frw::kernel_type_1, frw::kernel_type_3))(ctx, in,
- &out);
+
+ auto pair3 = std::make_pair(frw::kernel0, frw::kernel2);
+ instance.Get(pair3)(ctx, pair3, in, &out);
ASSERT_EQ(test_value, 2);
}
+
+TEST(DataTransform, DataLayout) {
+ using namespace paddle::framework;
+ using namespace paddle::platform;
+
+ auto& instance = DataTransformFnMap::Instance();
+ Variable in;
+ Variable out;
+ Tensor* src = in.GetMutable<Tensor>();
+ src->mutable_data<double>(make_ddim({2, 3, 1, 2}), CPUPlace());
+ src->set_layout(DataLayout::kNHWC);
+
+ DeviceContext* ctx = new CPUDeviceContext();
+
+ {
+ auto kernel1 = GenFromBit({1, 0, 0, 0});
+ auto kernel2 = GenFromBit({1, 0, 1, 0});
+ auto pair0 = std::make_pair(kernel1, kernel2);
+ instance.Get(pair0)(ctx, pair0, in, &out);
+ }
+
+ Tensor dst = out.Get<Tensor>();
+
+ EXPECT_TRUE(dst.layout() == DataLayout::kNCHW);
+ EXPECT_TRUE(dst.dims() == make_ddim({2, 2, 3, 1}));
+
+ {
+ auto kernel1 = GenFromBit({1, 0, 1, 0});
+ auto kernel2 = GenFromBit({1, 0, 0, 0});
+ auto pair0 = std::make_pair(kernel1, kernel2);
+ instance.Get(pair0)(ctx, pair0, out, &in);
+ }
+
+ EXPECT_TRUE(src->layout() == DataLayout::kNHWC);
+ EXPECT_TRUE(src->dims() == make_ddim({2, 3, 1, 2}));
+}
+
+TEST(DataTransform, DataType) {
+ using namespace paddle::framework;
+ using namespace paddle::platform;
+
+ auto& instance = DataTransformFnMap::Instance();
+ DeviceContext* ctx = new CPUDeviceContext();
+
+ Variable in;
+ Variable out;
+ Tensor* src = in.GetMutable<Tensor>();
+ float* ptr = src->mutable_data<float>(make_ddim({2, 3}), CPUPlace());
+ for (int i = 0; i < 6; ++i) {
+ ptr[i] = i / 3;
+ }
+
+ {
+ auto kernel1 = GenFromBit({0, 0, 0, 0});
+ auto kernel2 = GenFromBit({1, 0, 0, 0});
+ auto pair0 = std::make_pair(kernel1, kernel2);
+ instance.Get(pair0)(ctx, pair0, in, &out);
+ }
+ Tensor dst = out.Get<Tensor>();
+ EXPECT_TRUE(dst.data<double>() != nullptr);
+}
diff --git a/paddle/framework/details/cow_ptr.h b/paddle/framework/details/cow_ptr.h
new file mode 100644
index 0000000000000000000000000000000000000000..7e308ffb5a49876aa2c1833b3b7e2a2c7eb137aa
--- /dev/null
+++ b/paddle/framework/details/cow_ptr.h
@@ -0,0 +1,98 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. */
+
+#pragma once
+#include
+#include
+
+namespace paddle {
+namespace framework {
+namespace details {
+
+// Change it to thread safe flags if needed.
+class ThreadUnsafeOwnershipFlags {
+ public:
+ ThreadUnsafeOwnershipFlags(bool flag) : flag_(flag) {}
+
+ ThreadUnsafeOwnershipFlags(const ThreadUnsafeOwnershipFlags& other) = delete;
+ ThreadUnsafeOwnershipFlags& operator=(
+ const ThreadUnsafeOwnershipFlags& other) = delete;
+ ThreadUnsafeOwnershipFlags(ThreadUnsafeOwnershipFlags&& other) = default;
+
+ void SetOwnership(bool flag) { flag_ = flag; }
+
+ // Invoke the callback if it is not owned.
+ template <typename Callback>
+ void AcquireOwnershipOnce(Callback acquire) {
+ if (!flag_) {
+ acquire();
+ flag_ = true;
+ }
+ }
+
+ private:
+ bool flag_;
+};
+
+// Copy-On-Write pointer.
+// It will hold a T* pointer, and only copy once when `MutableData` is invoked.
+//
+// The template parameter OwnershipFlags should have:
+// * a constructor takes a bool. True if own.
+// * SetOwnership(bool flag).
+// * AcquireOwnershipOnce(Callback). It will invoke the callback if it is not
+// owned.
+//
+// https://en.wikipedia.org/wiki/Copy-on-write
+template <typename T, typename OwnershipFlags = ThreadUnsafeOwnershipFlags>
+class COWPtr {
+ public:
+ // Ctor from raw pointer.
+ explicit COWPtr(T* ptr) : payload_(ptr), ownership_{true} {}
+
+ // Move methods. Steal ownership from origin
+ COWPtr(COWPtr&& other)
+ : payload_(other.payload_), ownership_{std::move(other.ownership_)} {}
+ COWPtr& operator=(COWPtr&& origin) = default;
+
+ // Copy methods. Not own payload
+ COWPtr(const COWPtr& other) : payload_(other.payload_), ownership_{false} {}
+ COWPtr& operator=(const COWPtr& other) {
+ payload_ = other.payload_;
+ ownership_.SetOwnership(false);
+ return *this;
+ }
+
+ // Access read only data.
+ const T& Data() const { return *payload_; }
+
+ // Access mutable data. If the data is not owned, the data will be copied
+ // before.
+ T* MutableData() {
+ ownership_.AcquireOwnershipOnce(
+ [this] { payload_.reset(new T(*payload_)); });
+ return payload_.get();
+ }
+
+ private:
+ // Actual data pointer.
+ std::shared_ptr<T> payload_;
+
+ // Ownership flag.
+ OwnershipFlags ownership_;
+};
+
+} // namespace details
+} // namespace framework
+} // namespace paddle
diff --git a/paddle/framework/details/cow_ptr_test.cc b/paddle/framework/details/cow_ptr_test.cc
new file mode 100644
index 0000000000000000000000000000000000000000..936954a2333e7e5d2a932abad641279db9ef7b9f
--- /dev/null
+++ b/paddle/framework/details/cow_ptr_test.cc
@@ -0,0 +1,35 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. */
+
+#include "paddle/framework/details/cow_ptr.h"
+#include "gtest/gtest.h"
+
+namespace paddle {
+namespace framework {
+namespace details {
+
+TEST(COWPtr, all) {
+ COWPtr<int> ptr(new int{0});
+ ASSERT_EQ(ptr.Data(), 0);
+ COWPtr<int> ptr2 = ptr;
+ ASSERT_EQ(ptr2.Data(), 0);
+ ASSERT_EQ(&ptr2.Data(), &ptr.Data());
+ *ptr2.MutableData() = 10;
+ ASSERT_EQ(ptr.Data(), 0);
+ ASSERT_EQ(ptr2.Data(), 10);
+}
+
+} // namespace details
+} // namespace framework
+} // namespace paddle
diff --git a/paddle/framework/device_data_transform.cc b/paddle/framework/device_data_transform.cc
new file mode 100644
index 0000000000000000000000000000000000000000..cd5104cc6f287315ed9d22aa2ec6414f7204d214
--- /dev/null
+++ b/paddle/framework/device_data_transform.cc
@@ -0,0 +1,46 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */
+
+#include "paddle/framework/device_data_transform.h"
+
+namespace paddle {
+namespace framework {
+
+static const platform::DeviceContext* GetDeviceContext(
+ const platform::Place& src_place, const platform::Place& dst_place) {
+ platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
+
+ if (platform::is_gpu_place(src_place) && platform::is_cpu_place(dst_place)) {
+ return pool.Get(src_place);
+ } else if (platform::is_cpu_place(src_place) &&
+ platform::is_gpu_place(dst_place)) {
+ return pool.Get(dst_place);
+ } else {
+ PADDLE_THROW(
+ "Currently, model parallelism is only supported between CPU and CUDA");
+ }
+}
+
+Tensor* DeviceTransform(const Tensor& in, const platform::Place& dst_place) {
+ VLOG(3) << "DeviceTransform in, src_place " << in.place()
+ << " dst_place: " << dst_place;
+ Tensor* out = new Tensor();
+ auto* dev_ctx = GetDeviceContext(in.place(), dst_place);
+ dev_ctx->Wait();
+ Copy(in, dst_place, *dev_ctx, out);
+ dev_ctx->Wait();
+ return out;
+}
+
+} // namespace framework
+} // namespace paddle
diff --git a/paddle/framework/device_data_transform.h b/paddle/framework/device_data_transform.h
new file mode 100644
index 0000000000000000000000000000000000000000..bebf0d1b320183f46ab226dc6493ba09a365fc35
--- /dev/null
+++ b/paddle/framework/device_data_transform.h
@@ -0,0 +1,27 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */
+
+#pragma once
+
+#include "paddle/framework/lod_tensor.h"
+#include "paddle/framework/tensor.h"
+#include "paddle/framework/tensor_util.h"
+#include "paddle/platform/device_context.h"
+
+namespace paddle {
+namespace framework {
+
+Tensor* DeviceTransform(const Tensor& in, const platform::Place& dst_place);
+
+} // namespace framework
+} // namespace paddle
diff --git a/paddle/framework/device_data_transform_test.cu b/paddle/framework/device_data_transform_test.cu
new file mode 100644
index 0000000000000000000000000000000000000000..5d89f5546fa87241dec6364d86a100ca51bce687
--- /dev/null
+++ b/paddle/framework/device_data_transform_test.cu
@@ -0,0 +1,167 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */
+
+#include "gtest/gtest.h"
+
+#include "paddle/framework/init.h"
+#include "paddle/framework/lod_tensor.h"
+#include "paddle/framework/op_info.h"
+#include "paddle/framework/op_registry.h"
+#include "paddle/operators/elementwise_op_function.h"
+#include "paddle/operators/math/math_function.h"
+#include "paddle/platform/device_context.h"
+
+namespace paddle {
+namespace framework {
+
+template <typename T>
+struct AddFunctor {
+ inline HOSTDEVICE T operator()(T a, T b) const { return a + b; }
+};
+
+class OpKernelTestProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
+ public:
+ OpKernelTestProtoAndCheckerMaker(OpProto* proto, OpAttrChecker* op_checker)
+ : OpProtoAndCheckerMaker(proto, op_checker) {
+ AddInput("input", "input1 of test op");
+ AddOutput("output", "output of test op");
+ AddAttr("use_gpu", "force to use gpu kernel").SetDefault(false);
+ AddComment("This is test op");
+ }
+};
+
+class TestOpWithKernel : public OperatorWithKernel {
+ public:
+ using OperatorWithKernel::OperatorWithKernel;
+
+ protected:
+ void InferShape(framework::InferShapeContext* ctx) const override {}
+ OpKernelType GetExpectedKernelType(
+ const ExecutionContext& ctx) const override {
+ if (Attr("use_gpu")) {
+ VLOG(3) << "force use gpu kernel";
+ return OpKernelType(proto::DataType::FP32, platform::CUDAPlace(0));
+ } else {
+ VLOG(3) << "use default kernel";
+ return OpKernelType(proto::DataType::FP32,
+ ctx.Input("input")->place());
+ }
+ }
+};
+
+template <typename DeviceContext, typename T>
+class TestKernel : public OpKernel {
+ public:
+ void Compute(const ExecutionContext& ctx) const {
+ std::cout << ctx.op().DebugString() << std::endl;
+
+ const Tensor* input = ctx.Input("input");
+
+ std::cout << "input place:" << input->place() << std::endl;
+ auto* output = ctx.Output("output");
+ output->Resize(input->dims());
+ output->mutable_data(ctx.GetPlace());
+
+ operators::TransformFunctor<AddFunctor<T>, T, DeviceContext> functor(
+ input, input, output, ctx.template device_context<DeviceContext>(),
+ AddFunctor<T>());
+ functor.Run();
+ }
+};
+
+} // namespace framework
+} // namespace paddle
+
+REGISTER_OP_WITHOUT_GRADIENT(
+ test_op, paddle::framework::TestOpWithKernel,
+ paddle::framework::OpKernelTestProtoAndCheckerMaker);
+REGISTER_OP_CPU_KERNEL(
+ test_op,
+ paddle::framework::TestKernel<paddle::platform::CPUDeviceContext, float>);
+REGISTER_OP_CUDA_KERNEL(
+ test_op,
+ paddle::framework::TestKernel<paddle::platform::CUDADeviceContext, float>);
+
+static void BuildVar(const std::string& param_name,
+ std::initializer_list<const char*> arguments,
+ paddle::framework::proto::OpDesc::Var* var) {
+ var->set_parameter(param_name);
+ for (auto& arg_name : arguments) {
+ *var->mutable_arguments()->Add() = arg_name;
+ }
+}
+
+TEST(Operator, CPUtoGPU) {
+ using namespace paddle::framework;
+ using namespace paddle::platform;
+ InitDevices();
+
+ paddle::framework::Scope scope;
+ paddle::platform::CPUPlace cpu_place;
+
+ // create an op to run on CPU
+ paddle::framework::proto::OpDesc cpu_op_desc;
+ cpu_op_desc.set_type("test_op");
+ BuildVar("input", {"IN1"}, cpu_op_desc.add_inputs());
+ BuildVar("output", {"OUT1"}, cpu_op_desc.add_outputs());
+
+ auto cpu_op = paddle::framework::OpRegistry::CreateOp(cpu_op_desc);
+ // prepare input
+ auto* in_t = scope.Var("IN1")->GetMutable();
+ auto* src_ptr = in_t->mutable_data({2, 3}, CPUPlace());
+ for (int i = 0; i < 2 * 3; ++i) {
+ src_ptr[i] = static_cast(i);
+ }
+
+ // get output
+ auto* output = scope.Var("OUT1");
+ cpu_op->Run(scope, cpu_place);
+
+ auto* output_ptr = output->Get<LoDTensor>().data<float>();
+ for (int i = 0; i < 2 * 3; ++i) {
+ ASSERT_EQ(output_ptr[i], static_cast<float>(i) * 2);
+ }
+
+ // create an op to run on GPU
+ paddle::framework::proto::OpDesc gpu_op_desc;
+ gpu_op_desc.set_type("test_op");
+ BuildVar("input", {"OUT1"}, gpu_op_desc.add_inputs());
+ BuildVar("output", {"OUT2"}, gpu_op_desc.add_outputs());
+
+ auto attr = gpu_op_desc.mutable_attrs()->Add();
+ attr->set_name("use_gpu");
+ attr->set_type(paddle::framework::proto::AttrType::BOOLEAN);
+ attr->set_b(true);
+
+ auto gpu_op = paddle::framework::OpRegistry::CreateOp(gpu_op_desc);
+
+ paddle::platform::CUDAPlace cuda_place(0);
+ // get output
+ auto* output2 = scope.Var("OUT2");
+ gpu_op->Run(scope, cuda_place);
+
+ // auto* output2_ptr = output2->Get().data();
+ DeviceContextPool& pool = DeviceContextPool::Instance();
+ auto dev_ctx = pool.Get(cuda_place);
+
+ paddle::framework::Tensor output_tensor;
+ Copy(output2->Get<LoDTensor>(), paddle::platform::CPUPlace(), *dev_ctx,
+ &output_tensor);
+
+ dev_ctx->Wait();
+ float* output2_ptr = output_tensor.data<float>();
+ for (int i = 0; i < 2 * 3; ++i) {
+ ASSERT_EQ(output2_ptr[i], static_cast<float>(i) * 4);
+ }
+}
diff --git a/paddle/framework/executor.cc b/paddle/framework/executor.cc
index 997773c1689efad4ce5a86c09ce58bd3a40185e0..c0418c9266e257bd7567861543e557f354451b17 100644
--- a/paddle/framework/executor.cc
+++ b/paddle/framework/executor.cc
@@ -14,18 +14,18 @@ limitations under the License. */
#include "paddle/framework/executor.h"
-#include
-#include
-#include
#include
-#include
+#include "gflags/gflags.h"
#include "paddle/framework/feed_fetch_type.h"
#include "paddle/framework/lod_rank_table.h"
-#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/lod_tensor_array.h"
#include "paddle/framework/op_registry.h"
-#include "paddle/framework/scope.h"
+#include "paddle/platform/place.h"
+
+DEFINE_bool(check_nan_inf, false,
+ "Checking whether operator produce NAN/INF or not. It will be "
+ "extremely slow so please use this flag wisely.");
namespace paddle {
namespace framework {
@@ -50,14 +50,30 @@ static void CreateTensor(Variable* var, proto::VarDesc::VarType var_type) {
var->GetMutable();
} else if (var_type == proto::VarDesc::LOD_TENSOR_ARRAY) {
var->GetMutable<LoDTensorArray>();
+ } else if (var_type == proto::VarDesc::PLACE_LIST) {
+ var->GetMutable<platform::PlaceList>();
} else {
PADDLE_THROW(
"Variable type %d is not in "
- "[LoDTensor, SelectedRows, FEED_MINIBATCH, FETCH_LIST, LOD_RANK_TABLE]",
+ "[LoDTensor, SelectedRows, FEED_MINIBATCH, FETCH_LIST, LOD_RANK_TABLE,"
+ " PLACE_LIST]",
var_type);
}
}
+static void CheckTensorNANOrInf(const std::string& name,
+ const framework::Tensor& tensor) {
+ if (tensor.memory_size() == 0) {
+ return;
+ }
+ if (tensor.type().hash_code() != typeid(float).hash_code() &&
+ tensor.type().hash_code() != typeid(double).hash_code()) {
+ return;
+ }
+ PADDLE_ENFORCE(!framework::HasInf(tensor), "Tensor %s has Inf", name);
+ PADDLE_ENFORCE(!framework::HasNAN(tensor), "Tensor %s has NAN", name);
+}
+
void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id,
bool create_local_scope, bool create_vars) {
// TODO(tonyyang-svail):
@@ -99,10 +115,19 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id,
for (auto& op_desc : block.AllOps()) {
auto op = paddle::framework::OpRegistry::CreateOp(*op_desc);
- VLOG(3) << op->DebugString();
+ VLOG(3) << op->DebugStringEx(local_scope);
op->Run(*local_scope, place_);
+ if (FLAGS_check_nan_inf) {
+ for (auto& vname : op->OutputVars(true)) {
+ auto* var = local_scope->FindVar(vname);
+ if (var == nullptr) continue;
+ if (var->IsType<framework::LoDTensor>()) {
+ CheckTensorNANOrInf(vname, var->Get<framework::LoDTensor>());
+ }
+ }
+ }
}
- if (create_local_scope) {
+ if (create_vars && create_local_scope) {
scope->DeleteScope(local_scope);
}
}
diff --git a/paddle/framework/framework.proto b/paddle/framework/framework.proto
index 4f2746e4b86ee5fe095897ff6ef9d3f6473e8a14..ea69b87e2ac7dc587333b623c310182bb39eb452 100644
--- a/paddle/framework/framework.proto
+++ b/paddle/framework/framework.proto
@@ -123,6 +123,7 @@ message VarDesc {
STEP_SCOPES = 5;
LOD_RANK_TABLE = 6;
LOD_TENSOR_ARRAY = 7;
+ PLACE_LIST = 8;
}
required string name = 1;
required VarType type = 2;
diff --git a/paddle/framework/grad_op_desc_maker.h b/paddle/framework/grad_op_desc_maker.h
index 2de5242831835b47893a5825e5532500ad5ec3f9..2082f8bb76fb62bc36f033fecbd4eaa76d12d949 100644
--- a/paddle/framework/grad_op_desc_maker.h
+++ b/paddle/framework/grad_op_desc_maker.h
@@ -87,7 +87,11 @@ class GradOpDescMakerBase {
auto onames = this->Output(name);
ret_val.reserve(onames.size());
std::transform(onames.begin(), onames.end(), std::back_inserter(ret_val),
- GradVarName);
+ [this](const std::string& fwd_var_name) -> std::string {
+ auto g_name = GradVarName(fwd_var_name);
+ (*this->grad_to_var_)[g_name] = fwd_var_name;
+ return g_name;
+ });
return ret_val;
}
diff --git a/paddle/framework/init.cc b/paddle/framework/init.cc
index d6601090d5b6150a5aa467210038d3693c3e67a8..e12bac1d78e3f6bbc46849c06b53e3b93e147cfc 100644
--- a/paddle/framework/init.cc
+++ b/paddle/framework/init.cc
@@ -15,6 +15,7 @@ limitations under the License. */
#include
#include "paddle/framework/init.h"
+#include "paddle/framework/operator.h"
#include "paddle/platform/device_context.h"
#include "paddle/platform/place.h"
#include "paddle/string/piece.h"
@@ -24,7 +25,6 @@ namespace framework {
std::once_flag gflags_init_flag;
-// TODO(qijun) move init gflags to init.cc
void InitGflags(std::vector<std::string> &argv) {
std::call_once(gflags_init_flag, [&]() {
int argc = argv.size();
@@ -40,39 +40,28 @@ void InitGflags(std::vector &argv) {
});
}
-bool InitDevices(const std::vector &devices) {
- // device format
- // CPU
- // GPU:1
- // TODO(dzhwinter) : add device format annotation for users.
+void InitDevices() {
+ /* Init all available devices by default */
+
std::vector<platform::Place> places;
- for (auto &device : devices) {
- auto p = string::Piece(device);
- if (string::HasPrefix(p, "CPU")) {
- places.emplace_back(platform::CPUPlace());
- } else if (string::HasPrefix(p, "GPU")) {
+ places.emplace_back(platform::CPUPlace());
+
#ifdef PADDLE_WITH_CUDA
- auto pos = string::RFind(p, ':', string::Piece::npos);
- auto number = device.substr(pos + 1);
- places.emplace_back(platform::CUDAPlace(std::stoi(number)));
+ int count = platform::GetCUDADeviceCount();
+ for (int i = 0; i < count; ++i) {
+ places.emplace_back(platform::CUDAPlace(i));
+ }
#else
- LOG(WARNING)
- << "'GPU' is not supported, Please re-compile with WITH_GPU option";
+ LOG(WARNING)
+ << "'GPU' is not supported, Please re-compile with WITH_GPU option";
#endif
- } else {
- return false;
- }
- }
- if (std::find_if(places.begin(), places.end(),
- [&](const platform::Place &place) {
- return platform::is_cpu_place(place);
- }) == places.end()) {
- places.emplace_back(platform::CPUPlace());
- LOG(WARNING) << "Not specified CPU device, create CPU by Default.";
- }
- platform::DeviceContextPool::Create(places);
- return true;
+ platform::DeviceContextPool::Init(places);
+}
+
+void InitGLOG(const std::string &prog_name) {
+ google::InitGoogleLogging(prog_name.c_str());
+ google::InstallFailureSignalHandler();
}
} // namespace framework
diff --git a/paddle/framework/init.h b/paddle/framework/init.h
index 33907f9eb00fb3469b53dcf8151557cc7a2d3791..c8fd964d006baf729888414ded2aec85ba5a024e 100644
--- a/paddle/framework/init.h
+++ b/paddle/framework/init.h
@@ -22,7 +22,9 @@ namespace framework {
void InitGflags(std::vector<std::string> &argv);
-bool InitDevices(const std::vector &devices);
+void InitGLOG(const std::string &prog_name);
+
+void InitDevices();
} // namespace framework
} // namespace paddle
diff --git a/paddle/framework/init_test.cc b/paddle/framework/init_test.cc
index f0788051d4855a175d2d7ea1f1a0805c776c462b..f837a965d3be7d40c20803ae4462b3bfd91bffd0 100644
--- a/paddle/framework/init_test.cc
+++ b/paddle/framework/init_test.cc
@@ -14,18 +14,13 @@ limitations under the License. */
#include "gtest/gtest.h"
#include "paddle/framework/init.h"
+#include "paddle/platform/device_context.h"
-TEST(Init, InitDevices) {
+TEST(InitDevices, CPU) {
using paddle::framework::InitDevices;
- std::vector ds1 = {"CPU"};
- ASSERT_EQ(InitDevices(ds1), true);
+ using paddle::platform::DeviceContextPool;
-#ifdef PADDLE_WITH_CUDA
- std::vector ds2 = {"CPU", "GPU:0", "GPU:1"};
- ASSERT_EQ(InitDevices(ds2), true);
-
- // test re-init
- std::vector ds3 = {"GPU:0", "GPU:1"};
- ASSERT_EQ(InitDevices(ds3), true);
-#endif
+ InitDevices();
+ DeviceContextPool& pool = DeviceContextPool::Instance();
+ ASSERT_GE(pool.size(), 1U);
}
diff --git a/paddle/framework/library_type.h b/paddle/framework/library_type.h
index 7707799cae8c4edc304cd81725270a85f01fd28d..1e3084835439b0d55de72a669b93acbaef7ed6b9 100644
--- a/paddle/framework/library_type.h
+++ b/paddle/framework/library_type.h
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
+#include
namespace paddle {
namespace framework {
@@ -41,6 +42,9 @@ inline std::string LibraryTypeToString(const LibraryType& library_type) {
inline LibraryType StringToLibraryType(const char* ctype) {
std::string s(ctype);
+ for (size_t i = 0; i < s.size(); ++i) {
+ s[i] = toupper(s[i]);
+ }
if (s == std::string("PLAIN")) {
return LibraryType::kPlain;
} else if (s == std::string("MKLDNN")) {
diff --git a/paddle/framework/lod_tensor.cc b/paddle/framework/lod_tensor.cc
index f8a3be9a82bdbaf82550634d36122eb7bbe85e54..7ae94c646537e0d7c4687b949a1b06cd3a7f3404 100644
--- a/paddle/framework/lod_tensor.cc
+++ b/paddle/framework/lod_tensor.cc
@@ -43,16 +43,30 @@ std::ostream &operator<<(std::ostream &os, const LoD &lod) {
return os;
}
-LoD SliceLevels(const LoD &in, size_t level_begin, size_t level_end) {
- LoD new_lod;
- new_lod.reserve(level_end - level_begin);
- for (size_t i = level_begin; i < level_end; i++) {
- new_lod.emplace_back(in.at(i));
+std::ostream &operator<<(std::ostream &os, const LoDTensor &t) {
+ PADDLE_ENFORCE(t.type().hash_code() == typeid(float).hash_code());
+
+ if (!platform::is_cpu_place(t.place())) {
+ LoDTensor tt;
+ framework::Copy(t, platform::CPUPlace(), &tt);
+ platform::DeviceContextPool &pool = platform::DeviceContextPool::Instance();
+ auto &dev_ctx = *pool.Get(t.place());
+ dev_ctx.Wait();
+
+ os << tt;
+ return os;
+ }
+
+ os << "dim: " << t.dims() << "\n";
+ os << "lod: " << t.lod() << "\n";
+
+ // only print first ten elements
+ int64_t size = t.numel() < 10 ? t.numel() : 10;
+ for (int64_t i = 0; i < size; ++i) {
+ os << t.data<float>()[i] << " ";
}
- // transform the lowest level to absolute offset.
- LoD abs_offset_lod = ToAbsOffset(in);
- new_lod.back() = abs_offset_lod[level_end - 1];
- return new_lod;
+
+ return os;
}
LoD SliceInLevel(const LoD &in, size_t level, size_t elem_begin,
@@ -115,43 +129,6 @@ bool operator==(const LoD &a, const LoD &b) {
return true;
}
-size_t LoDTensor::NumElements(size_t level, size_t idx) const {
- PADDLE_ENFORCE_LT(level, NumLevels());
- PADDLE_ENFORCE_LT(idx, NumElements(level));
- return lod_[level][idx + 1] - lod_[level][idx];
-}
-
-size_t LoDTensor::NumInstancesInElement(size_t level, size_t idx) const {
- PADDLE_ENFORCE_LT(level, NumLevels());
- PADDLE_ENFORCE_LT(idx, NumElements(level));
- auto abs_lod = ToAbsOffset(lod());
- size_t begin = abs_lod[level][idx];
- size_t end = abs_lod[level][idx + 1];
- return end - begin;
-}
-
-void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) {
- auto new_lod = framework::SliceLevels(lod_, level_begin, level_end);
- lod_ = new_lod;
-}
-
-void LoDTensor::ShrinkInLevel(size_t level, size_t elem_begin,
- size_t elem_end) {
- PADDLE_ENFORCE_LT(level, NumLevels());
- PADDLE_ENFORCE_LT(elem_begin, NumElements(level));
- PADDLE_ENFORCE_LT(elem_end, NumElements(level) + 1);
-
- auto abs_lod = framework::ToAbsOffset(lod());
- auto new_lod = framework::SliceInLevel(lod_, level, elem_begin, elem_end);
- lod_ = new_lod;
-
- // slice the underlying tensor
- size_t begin = abs_lod[level][elem_begin];
- size_t end = abs_lod[level][elem_end];
- PADDLE_ENFORCE_LT(begin, end, "Cannot shrink, the result tensor is empty.");
- ShareDataWith(Slice(begin, end));
-}
-
using LoDAndOffset = std::pair<LoD, std::pair<size_t, size_t>>;
LoDAndOffset GetSubLoDAndAbsoluteOffset(const LoD &lod, size_t start_idx,
size_t end_idx, size_t start_level) {
@@ -177,6 +154,9 @@ void AppendLoD(LoD *lod, const LoD &lod_length) {
lod->empty() || lod->size() == lod_length.size(),
"The lod_length should has the same size with the appended lod.");
if (lod->empty()) {
+ for (size_t i = 0; i < lod_length.size(); ++i) {
+ lod->emplace_back(1, 0); // size = 1, value = 0;
+ }
*lod = LoD(lod_length.size(), std::vector<size_t>({0}));
}
for (size_t i = 0; i < lod->size(); ++i) {
@@ -189,62 +169,16 @@ void AppendLoD(LoD *lod, const LoD &lod_length) {
void SerializeToStream(std::ostream &os, const LoDTensor &tensor,
const platform::DeviceContext &dev_ctx) {
- // TODO(typhoonzero): serialize to ostream
- { // the 1st field, uint32_t version
+ { // the 1st field, uint32_t version for LoDTensor
constexpr uint32_t version = 0;
os.write(reinterpret_cast(&version), sizeof(version));
}
- { // the 2nd field, tensor description
- // int32_t size
- // void* protobuf message
- proto::TensorDesc desc;
- desc.set_data_type(framework::ToDataType(tensor.type()));
- auto dims = framework::vectorize(tensor.dims());
- auto *pb_dims = desc.mutable_dims();
- pb_dims->Resize(static_cast(dims.size()), 0);
- std::copy(dims.begin(), dims.end(), pb_dims->begin());
- int32_t size = desc.ByteSize();
- os.write(reinterpret_cast(&size), sizeof(size));
- auto out = desc.SerializeAsString();
- os.write(out.data(), size);
- }
- { // the 3rd field, tensor data
- uint64_t size = tensor.memory_size();
- auto *data_ptr = tensor.data();
- PADDLE_ENFORCE(size < std::numeric_limits::max(),
- "Index overflow when writing tensor");
- if (platform::is_gpu_place(tensor.place())) {
-#ifdef PADDLE_WITH_CUDA
- constexpr size_t kBufSize = 1024 * 1024 * 64; // 64MB
- std::unique_ptr buf(new char[kBufSize]);
- auto &gpu_dev_ctx =
- static_cast(dev_ctx);
- platform::CPUPlace cpu;
- uintptr_t data = reinterpret_cast(data_ptr);
- while (size != 0) {
- size_t size_to_write = std::min(kBufSize, static_cast(size));
- memory::Copy(cpu, buf.get(),
- boost::get(tensor.place()),
- reinterpret_cast(data), size_to_write,
- gpu_dev_ctx.stream());
- gpu_dev_ctx.Wait();
- os.write(buf.get(), size_to_write);
- data += size_to_write;
- size -= size_to_write;
- }
-#else
- PADDLE_THROW("Unexpected branch");
-#endif
- } else {
- os.write(static_cast(data_ptr),
- static_cast(size));
- }
- }
- { // the 4th field, lod information
- // uint64_t lod_level
- // uint64_t lod_level_1 size in byte.
- // int* lod_level_1 data
- // ...
+ {
+ // the 2st field, LoD information
+ // uint64_t lod_level
+ // uint64_t lod_level_1 size in byte.
+ // int* lod_level_1 data
+ // ...
auto lod = tensor.lod();
uint64_t size = lod.size();
os.write(reinterpret_cast(&size), sizeof(size));
@@ -256,49 +190,20 @@ void SerializeToStream(std::ostream &os, const LoDTensor &tensor,
static_cast(size));
}
}
+ // the 3st field, Tensor
+ SerializeToStream(os, static_cast(tensor), dev_ctx);
}
-void DeserializeFromStream(std::istream &is, LoDTensor *tensor) {
- uint32_t version;
- is.read(reinterpret_cast(&version), sizeof(version));
- PADDLE_ENFORCE_EQ(version, 0U, "Only version 0 is supported");
- proto::TensorDesc desc;
- { // int32_t size
- // proto buffer
- int32_t size;
- is.read(reinterpret_cast(&size), sizeof(size));
- std::unique_ptr buf(new char[size]);
- is.read(reinterpret_cast(buf.get()), size);
- PADDLE_ENFORCE(desc.ParseFromArray(buf.get(), size),
- "Cannot parse tensor desc");
+void DeserializeFromStream(std::istream &is, LoDTensor *tensor,
+ const platform::DeviceContext &dev_ctx) {
+ {
+ // the 1st field, unit32_t version for LoDTensor
+ uint32_t version;
+ is.read(reinterpret_cast(&version), sizeof(version));
+ PADDLE_ENFORCE_EQ(version, 0U, "Only version 0 is supported");
}
- { // read tensor
- std::vector dims;
- dims.reserve(static_cast(desc.dims().size()));
- std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims));
- tensor->Resize(framework::make_ddim(dims));
-
- void *buf;
- platform::Place cpu = platform::CPUPlace();
- switch (desc.data_type()) {
- case proto::FP32:
- buf = tensor->mutable_data(cpu);
- break;
- case proto::FP64:
- buf = tensor->mutable_data(cpu);
- break;
- case proto::INT32:
- buf = tensor->mutable_data(cpu);
- break;
- case proto::INT64:
- buf = tensor->mutable_data(cpu);
- break;
- default:
- PADDLE_THROW("DataType %d not supported", desc.data_type());
- }
- is.read(static_cast(buf), tensor->memory_size());
- }
- { // read lod
+ {
+ // the 2st field, LoD information
uint64_t lod_level;
is.read(reinterpret_cast(&lod_level), sizeof(lod_level));
auto &lod = *tensor->mutable_lod();
@@ -312,6 +217,59 @@ void DeserializeFromStream(std::istream &is, LoDTensor *tensor) {
lod[i] = tmp;
}
}
+ // the 3st filed, Tensor
+ DeserializeFromStream(is, static_cast(tensor), dev_ctx);
+}
+
+// TODO(tonyyang-svail): make this function support LoD
+std::vector<LoDTensor> LoDTensor::SplitLoDTensor(
+ const std::vector<platform::Place> places) const {
+ check_memory_size();
+ PADDLE_ENFORCE(lod().empty(), "Disable parallel lod for now");
+ PADDLE_ENFORCE(dims()[0] % places.size() == 0,
+ "Batch size should be divided by places size");
+
+ std::vector<LoDTensor> lods;
+ for (size_t place_idx = 0; place_idx < places.size(); ++place_idx) {
+ int begin = place_idx * dims()[0] / places.size();
+ int end = (place_idx + 1) * dims()[0] / places.size();
+
+ auto src = Slice(begin, end);
+ auto &dst_place = places[place_idx];
+ LoDTensor dst;
+ framework::Copy(src, dst_place, &dst);
+
+ lods.emplace_back(dst);
+ }
+
+ return lods;
+}
+
+// TODO(tonyyang-svail): make this function support LoD
+void LoDTensor::MergeLoDTensor(
+ const std::vector &lod_tensors,
+ platform::Place dst_place) {
+ PADDLE_ENFORCE(!lod_tensors.empty());
+ framework::DDim new_dim = lod_tensors[0]->dims();
+ std::type_index new_type = lod_tensors[0]->type();
+ auto new_layout = lod_tensors[0]->layout();
+ for (auto *lod : lod_tensors) {
+ PADDLE_ENFORCE(new_dim == lod->dims());
+ PADDLE_ENFORCE(new_type == lod->type());
+ PADDLE_ENFORCE(new_layout == lod->layout());
+ }
+ new_dim[0] *= lod_tensors.size();
+ Resize(new_dim);
+ set_layout(new_layout);
+
+ mutable_data(dst_place, new_type);
+ int begin = 0;
+ for (auto *src : lod_tensors) {
+ int end = begin + src->dims()[0];
+ auto dst = Slice(begin, end);
+ framework::Copy(*src, dst_place, &dst);
+ begin = end;
+ }
}
} // namespace framework
diff --git a/paddle/framework/lod_tensor.h b/paddle/framework/lod_tensor.h
index 147db3ab0877662d9e47ae7ee6df05638b5fcbd1..37753f5f4ddea4755ad6211007c367de00aad754 100644
--- a/paddle/framework/lod_tensor.h
+++ b/paddle/framework/lod_tensor.h
@@ -58,14 +58,7 @@ using Vector = thrust::host_vector<
using LoD = std::vector>;
std::ostream& operator<<(std::ostream& os, const LoD& lod);
-
-/*
- * Slice levels from a LoD.
- * NOTE the lowest level should always be the absolute offsets of the underlying
- * tensor instances. So if higher layers are sliced without the lowest level,
- * the lower level of the sliced LoD will be transformed to the absolute offset.
- */
-LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end);
+std::ostream& operator<<(std::ostream& os, const LoDTensor& t);
LoD SliceInLevel(const LoD& in, size_t level, size_t elem_begin,
size_t elem_end);
@@ -115,34 +108,11 @@ class LoDTensor : public Tensor {
return (lod_)[level].size() - 1;
}
- /*
- * Number of lower-level elements.
- * For example, a 2-level lod-tensor
- *
- * 0-th level | |
- * 1-th level || |||
- *
- * NumElements(0, 0) get 2
- * NumElements(0, 1) get 3
- */
- size_t NumElements(size_t level, size_t idx) const;
+ std::vector<LoDTensor> SplitLoDTensor(
+ const std::vector<platform::Place> places) const;
- /*
- * Get the number of instances in the underlying tensor in the `idx`-th
- * element.
- */
- size_t NumInstancesInElement(size_t level, size_t idx) const;
-
- /*
- * Shrink levels[level_begin:level_end]
- */
- void ShrinkLevels(size_t level_begin, size_t level_end);
-
- /*
- * Shrink elements of a level, [elem_begin: elem_end]
- * @note: low performance in slice lod_.
- */
- void ShrinkInLevel(size_t level, size_t elem_begin, size_t elem_end);
+ void MergeLoDTensor(const std::vector<const LoDTensor*>& lod_tensors,
+ platform::Place place);
private:
LoD lod_;
@@ -177,8 +147,8 @@ LoDTensor LodExpand(const LoDTensor& source, const LoD& lod, size_t level,
for (size_t ins = 0; ins < num_instances; ins++) {
for (size_t elem = lod_level[ins]; elem < lod_level[ins + 1]; elem++) {
auto slice = tensor.Slice(elem, elem + 1);
- CopyFrom(source.Slice(ins, ins + 1), platform::CPUPlace(),
- platform::CPUDeviceContext(), &slice);
+ Copy(source.Slice(ins, ins + 1), platform::CPUPlace(),
+ platform::CPUDeviceContext(), &slice);
}
}
return tensor;
@@ -208,7 +178,8 @@ void AppendLoD(LoD* lod, const LoD& lod_length);
*/
void SerializeToStream(std::ostream& os, const LoDTensor& tensor,
const platform::DeviceContext& dev_ctx);
-void DeserializeFromStream(std::istream& is, LoDTensor* tensor);
+void DeserializeFromStream(std::istream& is, LoDTensor* tensor,
+ const platform::DeviceContext& dev_ctx);
} // namespace framework
} // namespace paddle
diff --git a/paddle/framework/lod_tensor_test.cc b/paddle/framework/lod_tensor_test.cc
index 02d84b68233f2fdfc66e1df2fc7ce20307cadd94..baad9c6f98ac135c3650fe3113522850328c1298 100644
--- a/paddle/framework/lod_tensor_test.cc
+++ b/paddle/framework/lod_tensor_test.cc
@@ -54,78 +54,6 @@ class LoDTensorTester : public ::testing::Test {
LoDTensor lod_tensor_;
};
-TEST_F(LoDTensorTester, NumLevels) { ASSERT_EQ(lod_tensor_.NumLevels(), 3UL); }
-
-TEST_F(LoDTensorTester, NumElements) {
- ASSERT_EQ(lod_tensor_.NumElements(0), 2UL);
- ASSERT_EQ(lod_tensor_.NumElements(1), 3UL);
- ASSERT_EQ(lod_tensor_.NumElements(2), 8UL);
-}
-
-TEST_F(LoDTensorTester, NumElements2) {
- ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL);
- ASSERT_EQ(lod_tensor_.NumElements(0, 1), 1UL);
- ASSERT_EQ(lod_tensor_.NumElements(1, 1), 3UL);
-}
-
-TEST_F(LoDTensorTester, ShrinkLevels) {
- // slice 1 level
- for (size_t level = 0; level < 3UL; ++level) {
- LoDTensor new_lod_tensor = lod_tensor_;
- new_lod_tensor.ShrinkLevels(level, level + 1);
- ASSERT_EQ(new_lod_tensor.NumLevels(), 1UL);
- ASSERT_EQ(new_lod_tensor.data(), lod_tensor_.data());
- }
- // shrink 2 level
- for (size_t level = 0; level < 2UL; ++level) {
- LoDTensor new_lod_tensor = lod_tensor_;
- new_lod_tensor.ShrinkLevels(level, level + 2);
- // the lowest level's last element should be the tensor's batch_size.
- ASSERT_EQ(new_lod_tensor.lod().back().back(),
- lod_tensor_.lod().back().back());
- ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
- ASSERT_EQ(new_lod_tensor.data(), lod_tensor_.data());
- }
-}
-
-TEST_F(LoDTensorTester, ShrinkInLevel) {
- size_t level = 0;
- LoDTensor new_lod_tensor = lod_tensor_;
- new_lod_tensor.ShrinkInLevel(level, 0, 1);
- ASSERT_EQ(new_lod_tensor.NumLevels(), 3UL);
- ASSERT_EQ(new_lod_tensor.NumElements(0), 1UL);
- ASSERT_EQ(new_lod_tensor.NumElements(1), 2UL);
- ASSERT_EQ(new_lod_tensor.NumElements(2), 5UL);
- ASSERT_EQ(new_lod_tensor.dims()[0], 12);
- for (int i = 0; i < 12 * 128; i++) {
-    ASSERT_EQ(new_lod_tensor.data<float>()[i], i);
- }
-
- level = 1;
- new_lod_tensor = lod_tensor_;
- new_lod_tensor.ShrinkInLevel(level, 1, 2);
- ASSERT_EQ(new_lod_tensor.NumLevels(), 2UL);
- ASSERT_EQ(new_lod_tensor.NumElements(0), 1UL);
- ASSERT_EQ(new_lod_tensor.NumElements(1), 3UL);
- ASSERT_EQ(new_lod_tensor.dims()[0], 7);
- for (int i = 5 * 128; i < 12 * 128; i++) {
-    ASSERT_EQ(new_lod_tensor.data<float>()[i - 5 * 128], i);
- }
-
- LoDTensor t1;
- t1.set_lod(lod_tensor_.lod());
- t1.ShareDataWith(lod_tensor_);
-
- LoDTensor t2;
- t2.set_lod(lod_tensor_.lod());
- t2.ShareDataWith(lod_tensor_);
-
- t1.ShrinkInLevel(0, 1, 2);
- t2.ShrinkInLevel(0, 0, 1);
-  EXPECT_NE(t1.data<float>(), t2.data<float>());
-  EXPECT_NE(t1.data<float>(), lod_tensor_.data<float>());
-}
-
TEST(LodExpand, test) {
LoD lod{{0, 2}};
LoDTensor tensor;
@@ -187,5 +115,21 @@ TEST(LoD, AppendLoD) {
EXPECT_EQ(origin, expected);
}
+TEST(LoD, ToAbsOffset) {
+ LoD relative_lod;
+  relative_lod.push_back(std::vector<size_t>({0, 2}));
+  relative_lod.push_back(std::vector<size_t>({0, 1, 3}));
+  relative_lod.push_back(std::vector<size_t>({0, 2, 4, 5}));
+
+ LoD abs_lod = paddle::framework::ToAbsOffset(relative_lod);
+
+ LoD expected;
+  expected.push_back(std::vector<size_t>({0, 5}));
+  expected.push_back(std::vector<size_t>({0, 2, 5}));
+  expected.push_back(std::vector<size_t>({0, 2, 4, 5}));
+
+ EXPECT_EQ(abs_lod, expected);
+}
+
} // namespace framework
} // namespace paddle
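The new `ToAbsOffset` test above pins down the expected behaviour: every upper level of a relative LoD is rewritten so that its offsets point into the bottom (instance) level, while the bottom level is left untouched. A self-contained sketch of that conversion, reproducing the exact numbers used in the test:

```cpp
// Standalone sketch of the offset conversion the test above expects.
// A LoD is a vector of per-level offset vectors, bottom level last.
#include <cassert>
#include <cstddef>
#include <vector>

using LoD = std::vector<std::vector<size_t>>;

LoD ToAbsOffsetSketch(const LoD& in) {
  if (in.size() <= 1) return in;
  LoD result = in;
  // Walk from the next-to-last level upwards: each entry indexes into the
  // level below, so map it through that level's (already absolute) offsets.
  for (int level = static_cast<int>(result.size()) - 2; level >= 0; level--) {
    for (size_t i = 0; i < result[level].size(); ++i) {
      size_t index = result[level][i];
      result[level][i] = result[level + 1][index];
    }
  }
  return result;
}

int main() {
  LoD relative = {{0, 2}, {0, 1, 3}, {0, 2, 4, 5}};
  LoD expected = {{0, 5}, {0, 2, 5}, {0, 2, 4, 5}};
  assert(ToAbsOffsetSketch(relative) == expected);
  return 0;
}
```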
diff --git a/paddle/framework/op_desc.cc b/paddle/framework/op_desc.cc
index 781bbb4c19f1c610df485c3061ca8b510e727019..1c0372bb16c04e155a68a0411939e4887322107a 100644
--- a/paddle/framework/op_desc.cc
+++ b/paddle/framework/op_desc.cc
@@ -64,8 +64,9 @@ class CompileTimeInferShapeContext : public InferShapeContext {
PADDLE_ENFORCE_EQ(in_var->GetType(), proto::VarDesc::LOD_TENSOR,
"The %d-th output of Output(%s) must be LoDTensor.", j,
out);
- out_var->SetLoDLevel(in_var->GetLodLevel());
+ out_var->SetLoDLevel(in_var->GetLoDLevel());
}
+
bool IsRuntime() const override;
protected:
@@ -260,7 +261,13 @@ struct SetAttrDescVisitor : public boost::static_visitor {
void operator()(int v) const { attr_->set_i(v); }
void operator()(float v) const { attr_->set_f(v); }
void operator()(const std::string &v) const { attr_->set_s(v); }
- void operator()(bool b) const { attr_->set_b(b); }
+
+ // Please refer to https://github.com/PaddlePaddle/Paddle/issues/7162
+  template <typename T,
+            typename = typename std::enable_if<std::is_same<bool, T>::value>::type>
+ void operator()(T b) const {
+ attr_->set_b(b);
+ }
  void operator()(const std::vector<int> &v) const {
VectorToRepeated(v, attr_->mutable_ints());
@@ -274,9 +281,7 @@ struct SetAttrDescVisitor : public boost::static_visitor {
  void operator()(const std::vector<bool> &v) const {
VectorToRepeated(v, attr_->mutable_bools());
}
- void operator()(proto::BlockDesc *desc) const {
- attr_->set_block_idx(desc->idx());
- }
+ void operator()(BlockDesc *desc) const { attr_->set_block_idx(desc->ID()); }
void operator()(boost::blank) const { PADDLE_THROW("Unexpected branch"); }
};
@@ -379,7 +384,7 @@ void OpDesc::InferVarType(BlockDesc *block) const {
for (auto &out_pair : this->outputs_) {
for (auto &out_var_name : out_pair.second) {
block->FindRecursiveOrCreateVar(out_var_name)
- ->SetType(proto::VarDesc::LOD_TENSOR);
+ .SetType(proto::VarDesc::LOD_TENSOR);
}
}
}
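The `SetAttrDescVisitor` change above replaces the plain `operator()(bool)` overload with a template constrained by `enable_if`, presumably so that only an argument that is exactly `bool` can select the boolean branch (see the issue linked in the comment). A standalone illustration of that overload-constraining idiom, with made-up types:

```cpp
// Standalone illustration of constraining an overload to exactly `bool`,
// so integral arguments cannot reach it through implicit conversions.
#include <cstdint>
#include <iostream>
#include <type_traits>

struct Visitor {
  void operator()(int v) const { std::cout << "int: " << v << "\n"; }
  void operator()(int64_t v) const { std::cout << "int64: " << v << "\n"; }

  // Participates in overload resolution only when T is exactly bool.
  template <typename T, typename = typename std::enable_if<
                            std::is_same<bool, T>::value>::type>
  void operator()(T b) const {
    std::cout << "bool: " << std::boolalpha << b << "\n";
  }
};

int main() {
  Visitor visit;
  visit(true);        // exact match on the constrained template
  visit(42);          // plain int overload
  visit(int64_t{7});  // int64_t overload
  return 0;
}
```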
diff --git a/paddle/framework/op_desc.h b/paddle/framework/op_desc.h
index 4cf784a0d0d319d09caa27b4e2b589bd7ac4f324..a5ffb162928bfd355d35d3f9b63aab59a88dd061 100644
--- a/paddle/framework/op_desc.h
+++ b/paddle/framework/op_desc.h
@@ -129,7 +129,7 @@ class OpDesc {
}
proto::OpDesc desc_;
- // input arg name => output variable names
+ // input arg name => input variable names
VariableNameMap inputs_;
// output arg name => output variable names
VariableNameMap outputs_;
diff --git a/paddle/framework/op_kernel_type.h b/paddle/framework/op_kernel_type.h
index 97b542e345feab0bab701dd967558ce23375dc7f..053897784c1c4350deadf39e2a009220d38f65f9 100644
--- a/paddle/framework/op_kernel_type.h
+++ b/paddle/framework/op_kernel_type.h
@@ -26,13 +26,12 @@ namespace framework {
struct OpKernelType {
struct Hash {
size_t operator()(const OpKernelType& key) const {
-    int place = key.place_.which() + (1 << LEFT_SHIFT);
-    int data_type =
-        static_cast<int>(key.data_type_) + (1 << (LEFT_SHIFT + 1));
-    int data_layout =
-        static_cast<int>(key.data_layout_) + (1 << (LEFT_SHIFT + 2));
-    int library_type =
-        static_cast<int>(key.library_type_) + (1 << (LEFT_SHIFT + 3));
+    int place = key.place_.which();
+    int data_type = static_cast<int>(key.data_type_) << LEFT_SHIFT;
+    int data_layout = static_cast<int>(key.data_layout_) << (LEFT_SHIFT * 2);
+    int library_type = static_cast<int>(key.library_type_)
+                       << (LEFT_SHIFT * 3);
+
    std::hash<int> hasher;
return hasher(place + data_type + data_layout + library_type);
}
@@ -68,6 +67,8 @@ struct OpKernelType {
data_type_ == o.data_type_ && data_layout_ == o.data_layout_ &&
library_type_ == o.library_type_;
}
+
+ bool operator!=(const OpKernelType& o) const { return !(*this == o); }
};
inline std::ostream& operator<<(std::ostream& os,
@@ -78,5 +79,11 @@ inline std::ostream& operator<<(std::ostream& os,
return os;
}
+inline std::string KernelTypeToString(const OpKernelType& kernel_key) {
+ std::ostringstream stream;
+ stream << kernel_key;
+ return stream.str();
+}
+
} // namespace framework
} // namespace paddle
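The revised `OpKernelType::Hash` packs the four key fields into disjoint bit ranges of a single `int` before hashing, instead of adding overlapping offsets. A standalone sketch of the packing, with the shift width assumed here rather than taken from the real header:

```cpp
// Standalone sketch of the packing scheme: each field occupies its own bit
// range of one int, so distinct keys pack to distinct values as long as
// every field fits into LEFT_SHIFT bits.
#include <functional>
#include <iostream>

constexpr int LEFT_SHIFT = 4;  // assumed width; the real constant may differ

size_t HashKernelKey(int place, int data_type, int data_layout,
                     int library_type) {
  int packed = place + (data_type << LEFT_SHIFT) +
               (data_layout << (LEFT_SHIFT * 2)) +
               (library_type << (LEFT_SHIFT * 3));
  return std::hash<int>()(packed);
}

int main() {
  // Keys differing only in library_type land in different buckets.
  std::cout << HashKernelKey(0, 5, 0, 0) << "\n";
  std::cout << HashKernelKey(0, 5, 0, 1) << "\n";
  return 0;
}
```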
diff --git a/paddle/framework/op_kernel_type_test.cc b/paddle/framework/op_kernel_type_test.cc
index dd048405007974667bbb8a052b77ab8b3aa4580e..649afeee8a846b0579545f2edff77e9dbe3b4dd8 100644
--- a/paddle/framework/op_kernel_type_test.cc
+++ b/paddle/framework/op_kernel_type_test.cc
@@ -26,10 +26,8 @@ TEST(OpKernelType, ToString) {
OpKernelType op_kernel_type(DataType::FP32, CPUPlace(), DataLayout::kNCHW,
LibraryType::kCUDNN);
- std::ostringstream stream;
- stream << op_kernel_type;
ASSERT_EQ(
- stream.str(),
+ paddle::framework::KernelTypeToString(op_kernel_type),
"data_type[5]:data_layout[NCHW]:place[CPUPlace]:library_type[CUDNN]");
}
diff --git a/paddle/framework/op_registry.h b/paddle/framework/op_registry.h
index bdaa25918155caca4b64b0ed60aa3f6be03eb12f..d75c0233e8e0134ddf4edc50c07490a234b65cd0 100644
--- a/paddle/framework/op_registry.h
+++ b/paddle/framework/op_registry.h
@@ -37,8 +37,8 @@ class Registrar {
public:
// In our design, various kinds of classes, e.g., operators and kernels,
// have their corresponding registry and registrar. The action of
- // registration is in the constructor of a global registrar variable, which,
- // however, are not used in the code that calls package framework, and would
+  // registration is in the constructor of a global registrar variable, which
+  // is not used in the code that uses the framework package, and would
// be removed from the generated binary file by the linker. To avoid such
// removal, we add Touch to all registrar classes and make USE_OP macros to
// call this method. So, as long as the callee code calls USE_OP, the global
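The comment above explains why `Touch()` exists at all: registration happens in the constructor of a global registrar object, and the linker may drop that object unless the consuming binary references it. A single-file sketch of the idiom; the names below are illustrative, not Paddle's actual macros:

```cpp
// Single-file sketch of the "touch the registrar" idiom. In the real code
// the registrar global lives in a static library and the touch call is
// emitted by a USE_OP-style macro in the consuming binary.
#include <iostream>
#include <map>
#include <string>

std::map<std::string, int>& Registry() {
  static std::map<std::string, int> registry;
  return registry;
}

struct Registrar {
  Registrar(const std::string& name, int id) { Registry()[name] = id; }
  // No-op whose only purpose is to give callers something to reference,
  // so the linker cannot discard the global below as unused.
  void Touch() {}
};

static Registrar my_op_registrar("my_op", 42);

// Normally generated by a macro; referencing the global keeps it alive.
inline int TouchMyOp() {
  my_op_registrar.Touch();
  return 0;
}

int main() {
  static int use_my_op = TouchMyOp();
  (void)use_my_op;
  std::cout << "registered ops: " << Registry().size() << "\n";
  return 0;
}
```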
diff --git a/paddle/framework/op_registry_test.cc b/paddle/framework/op_registry_test.cc
index cef530c6e639f6e2188869fa57d114ec6b885aa8..66f07b6757fe1fe613e61ac66057be43ef5aced7 100644
--- a/paddle/framework/op_registry_test.cc
+++ b/paddle/framework/op_registry_test.cc
@@ -12,13 +12,16 @@
See the License for the specific language governing permissions and
limitations under the License. */
-#include "paddle/framework/op_registry.h"
+#include <glog/logging.h>
#include <gtest/gtest.h>
+#include "paddle/framework/op_registry.h"
+
namespace pd = paddle::framework;
namespace paddle {
namespace framework {
+
class CosineOp : public OperatorBase {
public:
using OperatorBase::OperatorBase;
@@ -215,7 +218,7 @@ class OpWithKernelTest : public OperatorWithKernel {
protected:
void InferShape(InferShapeContext* ctx) const override {}
- framework::OpKernelType GetActualKernelType(
+ framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(proto::DataType::FP32, ctx.device_context());
}
@@ -252,7 +255,6 @@ TEST(OperatorRegistrar, CPU) {
op->Run(scope, cpu_place);
}
-#ifdef PADDLE_WITH_CUDA
TEST(OperatorRegistrar, CUDA) {
paddle::framework::proto::OpDesc op_desc;
paddle::platform::CUDAPlace cuda_place(0);
@@ -263,4 +265,127 @@ TEST(OperatorRegistrar, CUDA) {
op->Run(scope, cuda_place);
}
-#endif
+
+static int op_test_value = 0;
+
+using paddle::platform::DeviceContext;
+using paddle::platform::CPUDeviceContext;
+using paddle::platform::CUDADeviceContext;
+
+namespace paddle {
+namespace framework {
+
+class OpWithMultiKernelTest : public OperatorWithKernel {
+ public:
+ using OperatorWithKernel::OperatorWithKernel;
+
+ protected:
+ void InferShape(InferShapeContext* ctx) const override {}
+
+ framework::OpKernelType GetExpectedKernelType(
+ const framework::ExecutionContext& ctx) const override {
+ return framework::OpKernelType(
+ proto::DataType::FP32, platform::CUDAPlace(0), DataLayout::kAnyLayout,
+ framework::LibraryType::kCUDNN);
+ }
+};
+
+template <typename DeviceContext, typename T>
+class OpMultiKernelTest : public paddle::framework::OpKernel<T> {
+ public:
+ void Compute(const paddle::framework::ExecutionContext& ctx) const;
+};
+
+template <typename T>
+class OpMultiKernelTest<CPUDeviceContext, T>
+    : public paddle::framework::OpKernel<T> {
+ public:
+ void Compute(const paddle::framework::ExecutionContext& ctx) const {
+ ++op_test_value;
+ }
+};
+
+template <typename T>
+class OpMultiKernelTest<CUDADeviceContext, T>
+    : public paddle::framework::OpKernel<T> {
+ public:
+ void Compute(const paddle::framework::ExecutionContext& ctx) const {
+ --op_test_value;
+ }
+};
+
+template <typename DeviceContext, typename T>
+class OpMultiKernelTest2 : public paddle::framework::OpKernel<T> {
+ public:
+ void Compute(const paddle::framework::ExecutionContext& ctx) const;
+};
+
+template <typename T>
+class OpMultiKernelTest2<CPUDeviceContext, T>
+    : public paddle::framework::OpKernel<T> {
+ public:
+ void Compute(const paddle::framework::ExecutionContext& ctx) const {
+ op_test_value += 10;
+ }
+};
+
+template <typename T>
+class OpMultiKernelTest2<CUDADeviceContext, T>
+    : public paddle::framework::OpKernel<T> {
+ public:
+ void Compute(const paddle::framework::ExecutionContext& ctx) const {
+ op_test_value -= 10;
+ }
+};
+
+} // namespace framework
+} // namespace paddle
+
+REGISTER_OP_WITHOUT_GRADIENT(op_with_multi_kernel,
+ paddle::framework::OpWithMultiKernelTest,
+ paddle::framework::OpKernelTestMaker);
+REGISTER_OP_KERNEL(
+ op_with_multi_kernel, CPU, paddle::platform::CPUPlace,
+    paddle::framework::OpMultiKernelTest<CPUDeviceContext, float>);
+REGISTER_OP_KERNEL(
+ op_with_multi_kernel, MKLDNN, paddle::platform::CPUPlace,
+    paddle::framework::OpMultiKernelTest2<CPUDeviceContext, float>);
+REGISTER_OP_KERNEL(
+ op_with_multi_kernel, CUDA, paddle::platform::CUDAPlace,
+    paddle::framework::OpMultiKernelTest<CUDADeviceContext, float>);
+REGISTER_OP_KERNEL(
+ op_with_multi_kernel, CUDNN, paddle::platform::CUDAPlace,
+    paddle::framework::OpMultiKernelTest2<CUDADeviceContext, float>);
+
+TEST(OperatorRegistrar, OpWithMultiKernel) {
+ paddle::framework::proto::OpDesc op_desc;
+ paddle::platform::CUDAPlace cuda_place(0);
+ paddle::platform::CPUPlace cpu_place;
+ paddle::framework::Scope scope;
+
+ op_desc.set_type("op_with_multi_kernel");
+ auto op = paddle::framework::OpRegistry::CreateOp(op_desc);
+
+ // TODO(qiao) add priority back
+ // use all available kernels
+ paddle::framework::UseALL();
+ op->Run(scope, cuda_place);
+ EXPECT_EQ(op_test_value, -10);
+
+ // remove cuda kernels
+ paddle::framework::UseCPU();
+ op->Run(scope, cpu_place);
+
+ EXPECT_EQ(op_test_value, -9);
+
+ // add cuda kernels
+ paddle::framework::UseCUDA();
+ op->Run(scope, cuda_place);
+
+ EXPECT_EQ(op_test_value, -10);
+
+ // use cudnn kernel
+ paddle::framework::UseCUDNN();
+ op->Run(scope, cuda_place);
+ EXPECT_EQ(op_test_value, -20);
+}
diff --git a/paddle/framework/operator.cc b/paddle/framework/operator.cc
index 886f73e7b81c35cac573bd041e6462eb2111bf85..35ebe48ba682f135b7f85edb3b2999db7c29e51a 100644
--- a/paddle/framework/operator.cc
+++ b/paddle/framework/operator.cc
@@ -11,13 +11,13 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
+#include <glog/logging.h>
#include <algorithm>
-#include <atomic>
#include "paddle/framework/data_transform.h"
+#include "paddle/framework/device_data_transform.h"
#include "paddle/framework/executor.h"
-#include "paddle/framework/lod_tensor_array.h"
#include "paddle/framework/operator.h"
#include "paddle/framework/shape_inference.h"
#include "paddle/framework/var_type.h"
@@ -25,6 +25,66 @@ limitations under the License. */
namespace paddle {
namespace framework {
+std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority;
+
+void UseCPU() {
+ kKernelPriority.clear();
+ /*Plain CPU*/
+ auto pair0 = std::make_tuple(platform::CPUPlace(), LibraryType::kPlain);
+ kKernelPriority.insert(kKernelPriority.begin(), pair0);
+}
+
+void UseMKLDNN() {
+ UseCPU();
+#if PADDLE_WITH_MKLML
+ {
+ /*MKLDNN Kernel*/
+ auto pair0 = std::make_tuple(platform::CPUPlace(), LibraryType::kMKLDNN);
+ kKernelPriority.insert(kKernelPriority.begin(), pair0);
+ }
+#endif
+}
+
+void UseCUDA() {
+ UseMKLDNN();
+#if PADDLE_WITH_CUDA
+ /*Plain GPU*/
+ auto pair0 = std::make_tuple(platform::CUDAPlace(0), LibraryType::kPlain);
+ kKernelPriority.insert(kKernelPriority.begin(), pair0);
+#endif
+}
+
+void UseCUDNN() {
+ UseCUDA();
+#if PADDLE_WITH_CUDA
+ if (platform::dynload::HasCUDNN()) {
+ /*CUDNN Kernel*/
+ auto pair0 = std::make_tuple(platform::CUDAPlace(0), LibraryType::kCUDNN);
+ kKernelPriority.insert(kKernelPriority.begin(), pair0);
+ }
+#endif
+}
+
+void UseALL() {
+ UseCPU();
+ UseMKLDNN();
+ UseCUDA();
+ UseCUDNN();
+}
+
+static DDim GetDims(const Scope& scope, const std::string& name) {
+ Variable* var = scope.FindVar(name);
+ if (var == nullptr) {
+ return DDim({-1});
+  } else if (var->IsType<LoDTensor>()) {
+    return var->Get<LoDTensor>().dims();
+  } else if (var->IsType<SelectedRows>()) {
+    return var->Get<SelectedRows>().GetCompleteDims();
+ } else {
+ return DDim({-1});
+ }
+}
+
std::string OperatorBase::Input(const std::string& name) const {
auto& ins = Inputs(name);
PADDLE_ENFORCE_LE(ins.size(), 1UL,
@@ -57,7 +117,7 @@ const std::vector& OperatorBase::Outputs(
return it->second;
}
-std::string OperatorBase::DebugString() const {
+std::string OperatorBase::DebugStringEx(const Scope* scope) const {
std::stringstream ss;
ss << "Op(" << type_ << "), inputs:{";
for (auto it = inputs_.begin(); it != inputs_.end();) {
@@ -65,6 +125,9 @@ std::string OperatorBase::DebugString() const {
ss << input.first << "[";
for (size_t i = 0; i < input.second.size(); ++i) {
ss << input.second[i];
+ if (scope) {
+ ss << "(" << GetDims(*scope, input.second[i]) << ")";
+ }
if (i != input.second.size() - 1) {
ss << ", ";
}
@@ -81,6 +144,9 @@ std::string OperatorBase::DebugString() const {
ss << output.first << "[";
for (size_t i = 0; i < output.second.size(); ++i) {
ss << output.second[i];
+ if (scope) {
+ ss << "(" << GetDims(*scope, output.second[i]) << ")";
+ }
if (i != output.second.size() - 1) {
ss << ", ";
}
@@ -178,6 +244,10 @@ void OperatorBase::GenerateTemporaryNames() {
}
}
+static bool VarIsTensor(const Variable* var) {
+  return var->IsType<LoDTensor>() || var->IsType<SelectedRows>();
+}
+
static const Tensor* GetTensorFromVar(const Variable* var) {
const Tensor* t = nullptr;
  if (var->IsType<LoDTensor>()) {
@@ -185,7 +255,8 @@ static const Tensor* GetTensorFromVar(const Variable* var) {
  } else if (var->IsType<SelectedRows>()) {
    t = &(var->Get<SelectedRows>().value());
} else {
- PADDLE_THROW("Variable type must be LoDTensor/SelectedRows.");
+ PADDLE_THROW("Variable type_id %s, expect LoDTensor/SelectedRows.",
+ var->Type().name());
}
return t;
}
@@ -197,7 +268,8 @@ static Tensor* GetMutableTensorFromVar(Variable* var) {
  } else if (var->IsType<SelectedRows>()) {
    t = var->GetMutable<SelectedRows>()->mutable_value();
} else {
- PADDLE_THROW("Variable type must be LoDTensor/SelectedRows.");
+ PADDLE_THROW("Variable type_id %s, expect LoDTensor/SelectedRows.",
+ var->Type().name());
}
return t;
}
@@ -347,6 +419,25 @@ class RuntimeInferShapeContext : public InferShapeContext {
    auto in_tensor = in_var->Get<LoDTensor>();
    auto* out_tensor = out_var->GetMutable<LoDTensor>();
out_tensor->set_lod(in_tensor.lod());
+
+ // TODO(dzhwinter) : reuse ShareLoD in most operators.
+ // Need to call ShareLayout explicitly in sequence related ops.
+ // Shall we have a better method to shared info between in/out Tensor?
+ out_tensor->set_layout(in_tensor.layout());
+ }
+
+ void ShareLayout(const std::string& in, const std::string& out, size_t i = 0,
+ size_t j = 0) const {
+ PADDLE_ENFORCE_LT(i, Inputs(in).size());
+ PADDLE_ENFORCE_LT(j, Outputs(out).size());
+ Variable* in_var = scope_.FindVar(Inputs(in)[i]);
+ Variable* out_var = scope_.FindVar(Outputs(out)[j]);
+    if (!in_var->IsType<LoDTensor>()) return;
+    PADDLE_ENFORCE(out_var->IsType<LoDTensor>(),
+                   "The %d-th output of Output(%s) must be LoDTensor.", j, out);
+    auto in_tensor = in_var->Get<LoDTensor>();
+    auto* out_tensor = out_var->GetMutable<LoDTensor>();
+ out_tensor->set_layout(in_tensor.layout());
}
bool IsRuntime() const override { return true; }
@@ -359,7 +450,8 @@ class RuntimeInferShapeContext : public InferShapeContext {
    } else if (var->IsType<SelectedRows>()) {
      return var->Get<SelectedRows>().GetCompleteDims();
} else {
- PADDLE_THROW("Variable type must be LoDTensor/SelectedRows.");
+ PADDLE_THROW("Variable %s type_id %s, expect LoDTensor/SelectedRows.",
+ name, var->Type().name());
}
}
@@ -370,7 +462,8 @@ class RuntimeInferShapeContext : public InferShapeContext {
    } else if (var->IsType<SelectedRows>()) {
      var->GetMutable<SelectedRows>()->set_height(dim[0]);
} else {
- PADDLE_THROW("Variable type must be LoDTensor/SelectedRows.");
+ PADDLE_THROW("Variable %s type_id %s, expect LoDTensor/SelectedRows.",
+ name, var->Type().name());
}
}
@@ -388,8 +481,8 @@ void OperatorWithKernel::Run(const Scope& scope,
const platform::Place& place) const {
RuntimeInferShapeContext infer_shape_ctx(*this, scope);
this->InferShape(&infer_shape_ctx);
- platform::DeviceContextPool& pool = platform::DeviceContextPool::Get();
- auto dev_ctx = pool.Borrow(place);
+ platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
+ auto dev_ctx = pool.Get(place);
// check if op[type] has kernel registered.
auto& all_op_kernels = AllOpKernels();
@@ -399,61 +492,59 @@ void OperatorWithKernel::Run(const Scope& scope,
"There are no kernels which are registered in the %s operator.", type_);
}
- // check if op[type] have kernel for kernel_key
- OpKernelMap& kernels = kernels_iter->second;
-
ExecutionContext ctx(*this, scope, *dev_ctx);
- auto actual_kernel_key = GetActualKernelType(ctx);
- auto expected_kernel_key = GetExpectedKernelType(actual_kernel_key);
- auto kernel_iter = kernels.find(expected_kernel_key);
-
- if (kernel_iter == kernels.end()) {
- PADDLE_THROW("The operator %s does not support %s", type_,
- expected_kernel_key);
- }
+ auto expected_kernel_key = this->GetExpectedKernelType(ctx);
- if (actual_kernel_key == expected_kernel_key) {
- kernel_iter->second->Compute(ctx);
- } else {
- Scope& op_scope = scope.NewScope();
- auto input_vars = this->InputVars();
- for (auto var_name : input_vars) {
- op_scope.Var(var_name);
- }
-
- // TODO(qijun) get appropriate DeviceContext from DeviceContext pool
- platform::DeviceContext* trans_dev_ctx = nullptr;
-    std::vector<platform::DeviceContext*> trans_dev_ctx_vec{trans_dev_ctx};
-
- // TODO(qijun) get appropriate DataTransformFN from global map
- framework::DataTransformFN trans_fun = nullptr;
+ OpKernelMap& kernels = kernels_iter->second;
- // Wait for transform starting
- dev_ctx->Wait();
+ for (auto& candidate : kKernelPriority) {
+ auto candidate_key =
+ OpKernelType(expected_kernel_key.data_type_, std::get<0>(candidate),
+ expected_kernel_key.data_layout_, std::get<1>(candidate));
- for (auto var_name : input_vars) {
- trans_fun(trans_dev_ctx_vec, *(scope.FindVar(var_name)),
- op_scope.FindVar(var_name));
- }
- // Wait for data transform finishing
- for (auto ctx : trans_dev_ctx_vec) {
- ctx->Wait();
+ if ((candidate_key == expected_kernel_key) ||
+ (kernels.count(candidate_key))) {
+ expected_kernel_key = candidate_key;
+ break;
}
+ }
- // Create a new ExecutionContext
- ExecutionContext op_ctx(*this, op_scope, *dev_ctx);
- kernel_iter->second->Compute(op_ctx);
+ VLOG(3) << "expected_kernel_key:" << expected_kernel_key;
+
+ Scope& new_scope = scope.NewScope();
+
+ for (auto& var_name_item : this->Inputs()) {
+ for (auto& var_name : var_name_item.second) {
+ auto* var = scope.FindVar(var_name);
+ if (var && VarIsTensor(var)) {
+ auto* tensor_in = GetTensorFromVar(var);
+ if (tensor_in->IsInitialized()) {
+ auto kernel_type_for_var = this->GetKernelTypeForVar(
+ var_name_item.first, *tensor_in, expected_kernel_key);
+ if (kernel_type_for_var != expected_kernel_key) {
+ auto out_var_names = OutputVars(true);
+ if (std::find(out_var_names.begin(), out_var_names.end(),
+ var_name) != out_var_names.end()) {
+ PADDLE_THROW(
+ "var %s is both input and output, "
+ "does not support transform",
+ var_name);
+ }
+ VLOG(3) << "need to do transform for var " << var_name;
+ auto* trans_var = new_scope.Var(var_name);
+ auto* out = DataTransform(expected_kernel_key, kernel_type_for_var,
+ *tensor_in);
+ CopyVariableWithTensor(*var, *out, *trans_var);
+ }
+ }
+ }
+ }
}
-}
-OpKernelType OperatorWithKernel::GetActualKernelType(
- const ExecutionContext& ctx) const {
- return OpKernelType(IndicateDataType(ctx), ctx.GetPlace());
-}
+ auto kernel_iter = kernels.find(expected_kernel_key);
-OpKernelType OperatorWithKernel::GetExpectedKernelType(
- const OpKernelType& actual_kernel_type) const {
- return actual_kernel_type;
+ kernel_iter->second->Compute(ExecutionContext(
+ *this, new_scope, *pool.Get(expected_kernel_key.place_)));
}
proto::DataType OperatorWithKernel::IndicateDataType(
@@ -485,5 +576,16 @@ proto::DataType OperatorWithKernel::IndicateDataType(
  return static_cast<proto::DataType>(data_type);
}
+OpKernelType OperatorWithKernel::GetExpectedKernelType(
+ const ExecutionContext& ctx) const {
+ return OpKernelType(IndicateDataType(ctx), ctx.GetPlace());
+}
+
+OpKernelType OperatorWithKernel::GetKernelTypeForVar(
+ const std::string& var_name, const Tensor& tensor,
+ const OpKernelType& expected_kernel_type) const {
+ return OpKernelType(expected_kernel_type.data_type_, tensor.place());
+}
+
} // namespace framework
} // namespace paddle
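The rewritten `OperatorWithKernel::Run` above does two things: it walks `kKernelPriority` and switches the expected kernel key to the first candidate that is either equal to it or actually registered, and it transforms any input tensor whose per-variable kernel type disagrees with the chosen key. A simplified, self-contained sketch of just the priority walk, with stand-in types:

```cpp
// Standalone sketch of the kernel-selection walk: try each (place, library)
// pair in priority order and take the first one that matches the expected
// key or has a registered kernel. Data type and layout are omitted here.
#include <iostream>
#include <set>
#include <string>
#include <tuple>
#include <vector>

using Place = std::string;
using Library = std::string;
using KernelKey = std::tuple<Place, Library>;

KernelKey SelectKernel(const KernelKey& expected,
                       const std::vector<KernelKey>& priority,
                       const std::set<KernelKey>& registered) {
  for (const auto& candidate : priority) {
    if (candidate == expected || registered.count(candidate)) {
      return candidate;  // first hit in priority order wins
    }
  }
  return expected;  // nothing better found; keep the op's own choice
}

int main() {
  std::set<KernelKey> registered = {{"CPU", "Plain"}, {"CUDA", "Plain"},
                                    {"CUDA", "CUDNN"}};
  // Priority as built by UseCUDNN(): CUDNN > CUDA > MKLDNN > CPU.
  std::vector<KernelKey> priority = {{"CUDA", "CUDNN"}, {"CUDA", "Plain"},
                                     {"CPU", "MKLDNN"}, {"CPU", "Plain"}};
  KernelKey expected{"CPU", "Plain"};
  auto chosen = SelectKernel(expected, priority, registered);
  std::cout << std::get<0>(chosen) << "/" << std::get<1>(chosen) << "\n";
  return 0;
}
```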
diff --git a/paddle/framework/operator.h b/paddle/framework/operator.h
index d0a9b643d565d6651fd7ec0b515f088362852ba3..d5feb598649c97a9517b7c2b1764fd54ff9f8693 100644
--- a/paddle/framework/operator.h
+++ b/paddle/framework/operator.h
@@ -17,6 +17,7 @@ limitations under the License. */
#include
#include
#include
+#include <tuple>
#include
#include
@@ -52,10 +53,33 @@ constexpr char kGradVarSuffix[] = "@GRAD";
/// Variables with this suffix are supposed to be filled up with zeros.
constexpr char kZeroVarSuffix[] = "@ZERO";
-// define some kernel hint
-const std::string kUseCPU = "use_cpu";
-const std::string kUseCUDNN = "use_cudnn";
-const std::string kUseMKLDNN = "use_mkldnn";
+// define some kernel priority
+extern std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority;
+
+/**
+ * @brief Use cpu kernel only
+ */
+void UseCPU();
+
+/**
+ * @brief Prefer the MKLDNN kernel over the plain CPU kernel
+ */
+void UseMKLDNN();
+
+/**
+ * @brief Prefer the CUDA kernel over the plain CPU kernel
+ */
+void UseCUDA();
+
+/**
+ * @brief Prefer the cudnn kernel over the plain CUDA kernel
+ */
+void UseCUDNN();
+
+/**
+ * @brief Use all available kernels
+ */
+void UseALL();
inline std::string GradVarName(const std::string& var_name) {
return var_name + kGradVarSuffix;
@@ -84,7 +108,10 @@ class OperatorBase {
    return boost::get<T>(attrs_.at(name));
}
- virtual std::string DebugString() const;
+ /// if scope is not null, also show dimensions of arguments
+ virtual std::string DebugStringEx(const Scope* scope) const;
+
+ std::string DebugString() const { return DebugStringEx(nullptr); }
/// Net will call this function to Run an op.
virtual void Run(const Scope& scope, const platform::Place& place) const = 0;
@@ -381,9 +408,10 @@ class OperatorWithKernel : public OperatorBase {
}
protected:
- virtual OpKernelType GetActualKernelType(const ExecutionContext& ctx) const;
- virtual OpKernelType GetExpectedKernelType(
- const OpKernelType& actual_kernel_type) const;
+ virtual OpKernelType GetExpectedKernelType(const ExecutionContext& ctx) const;
+ virtual OpKernelType GetKernelTypeForVar(
+ const std::string& var_name, const Tensor& tensor,
+ const OpKernelType& expected_kernel_type) const;
private:
// indicate kernel DataType by input data. Defaultly all input data must be
diff --git a/paddle/framework/operator_test.cc b/paddle/framework/operator_test.cc
index 4d38a7ada91af834aa1a19b49e36d606ebe786ba..b69d7c7a7406eb3e18d385c568cb9c21b9b4107b 100644
--- a/paddle/framework/operator_test.cc
+++ b/paddle/framework/operator_test.cc
@@ -69,7 +69,7 @@ REGISTER_OP_WITHOUT_GRADIENT(test_operator,
paddle::framework::OpWithoutKernelCheckerMaker);
TEST(OperatorBase, all) {
- paddle::framework::InitDevices({"CPU"});
+ paddle::framework::InitDevices();
paddle::framework::proto::OpDesc op_desc;
op_desc.set_type("test_operator");
BuildVar("input", {"IN1"}, op_desc.add_inputs());
@@ -114,7 +114,8 @@ class OpWithKernelTest : public OperatorWithKernel {
protected:
void InferShape(framework::InferShapeContext* ctx) const override {}
- OpKernelType GetActualKernelType(const ExecutionContext& ctx) const override {
+ OpKernelType GetExpectedKernelType(
+ const ExecutionContext& ctx) const override {
return OpKernelType(proto::DataType::FP32, ctx.GetPlace());
}
};
@@ -194,7 +195,7 @@ REGISTER_OP_CPU_KERNEL(op_with_kernel,
// test with single input
TEST(OpKernel, all) {
- paddle::framework::InitDevices({"CPU"});
+ paddle::framework::InitDevices();
paddle::framework::proto::OpDesc op_desc;
op_desc.set_type("op_with_kernel");
BuildVar("x", {"IN1"}, op_desc.add_inputs());
@@ -224,7 +225,7 @@ REGISTER_OP_CPU_KERNEL(op_multi_inputs_with_kernel,
TEST(OpKernel, multi_inputs) {
using namespace paddle::framework;
- paddle::framework::InitDevices({"CPU"});
+ paddle::framework::InitDevices();
proto::OpDesc op_desc;
op_desc.set_type("op_multi_inputs_with_kernel");
@@ -263,7 +264,7 @@ class OperatorClone : public paddle::framework::OperatorBase {
};
TEST(Operator, Clone) {
- paddle::framework::InitDevices({"CPU"});
+ paddle::framework::InitDevices();
OperatorClone a("ABC", paddle::framework::VariableNameMap{},
paddle::framework::VariableNameMap{},
paddle::framework::AttributeMap{});
diff --git a/paddle/framework/scope.cc b/paddle/framework/scope.cc
index 0c01d605bcd95f5796fba1e5a3351a2640b2898a..2bd0ac8f5a9eb6439a4196dd9c61e13797c1a8e3 100644
--- a/paddle/framework/scope.cc
+++ b/paddle/framework/scope.cc
@@ -17,6 +17,7 @@ limitations under the License. */
#include <memory>  // for unique_ptr
#include <mutex>   // for call_once
#include "glog/logging.h"
+#include "paddle/framework/threadpool.h"
#include "paddle/string/printf.h"
namespace paddle {
@@ -87,7 +88,8 @@ void Scope::DeleteScope(Scope* scope) {
auto it = std::find(this->kids_.begin(), this->kids_.end(), scope);
PADDLE_ENFORCE(it != this->kids_.end(), "Cannot find %p as kid scope", scope);
this->kids_.erase(it);
- delete scope;
+ // Make delete async.
+ Async([scope] { delete scope; });
}
void Scope::Rename(const std::string& origin_name,
@@ -107,6 +109,7 @@ std::string Scope::Rename(const std::string& origin_name) const {
Rename(origin_name, var_name);
return var_name;
}
+
Variable* Scope::FindVarLocally(const std::string& name) const {
auto it = vars_.find(name);
if (it != vars_.end()) return it->second;
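`Scope::DeleteScope` above now unlinks the child synchronously but defers the actual `delete` to the framework's thread pool via `Async`. A standalone sketch of that deferred-destruction pattern, using `std::async` in place of the thread pool:

```cpp
// Standalone sketch: remove the child from the parent's list right away,
// then hand the destruction itself to a worker so the caller never blocks
// on a potentially large teardown.
#include <algorithm>
#include <future>
#include <iostream>
#include <vector>

struct Node {
  std::vector<Node*> kids;
  ~Node() { std::cout << "node destroyed\n"; }
};

std::future<void> DeleteKidAsync(Node* parent, Node* kid) {
  auto it = std::find(parent->kids.begin(), parent->kids.end(), kid);
  if (it != parent->kids.end()) parent->kids.erase(it);        // unlink now
  return std::async(std::launch::async, [kid] { delete kid; });  // destroy later
}

int main() {
  Node parent;
  Node* kid = new Node;
  parent.kids.push_back(kid);
  DeleteKidAsync(&parent, kid).wait();
  return 0;
}
```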
diff --git a/paddle/framework/scope.h b/paddle/framework/scope.h
index 10143326dfa201894c777b3e5e226d5ca5015eda..a1da81cc7977d2f31b99c41fb3db3ec03188f954 100644
--- a/paddle/framework/scope.h
+++ b/paddle/framework/scope.h
@@ -75,9 +75,9 @@ class Scope {
// Rename variable to a new name and return the new name
std::string Rename(const std::string& origin_name) const;
- private:
Variable* FindVarLocally(const std::string& name) const;
+ private:
// Call Scope::NewScope for a sub-scope.
explicit Scope(Scope const* parent) : parent_(parent) {}
diff --git a/paddle/framework/selected_rows.cc b/paddle/framework/selected_rows.cc
index c74459c9dd7006a24615b1d6df041583088fb25c..3b3e60177a495cc99f38ee8b82af41c4c76b8652 100644
--- a/paddle/framework/selected_rows.cc
+++ b/paddle/framework/selected_rows.cc
@@ -12,5 +12,58 @@ limitations under the License. */
#include "paddle/framework/selected_rows.h"
namespace paddle {
-namespace framework {} // namespace framework
+namespace framework {
+void SerializeToStream(std::ostream& os, const SelectedRows& selected_rows,
+ const platform::DeviceContext& dev_ctx) {
+ { // the 1st field, uint32_t version
+ constexpr uint32_t version = 0;
+ os.write(reinterpret_cast