Unverified commit b9fdd3bc, authored by Chen Weihang, committed by GitHub

Paddle Tensor Operation Library initial implementation (#34425)

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several CI failures

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rocm compile failed

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar macro

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & naming rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add macro for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflict

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key construct

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for checking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unittests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unittests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflict

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflict with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: zyfncg <1370305206@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Parent 3c0a68ce
......@@ -116,6 +116,20 @@ function(find_fluid_modules TARGET_NAME)
endif()
endfunction(find_fluid_modules)
set_property(GLOBAL PROPERTY PTEN_MODULES "")
# find all pten modules; the result is used for the paddle static library
# when building inference libs
function(find_pten_modules TARGET_NAME)
get_filename_component(__target_path ${TARGET_NAME} ABSOLUTE)
string(REGEX REPLACE "^${PADDLE_SOURCE_DIR}/" "" __target_path ${__target_path})
string(FIND "${__target_path}" "pten" pos)
if(pos GREATER 1)
get_property(pten_modules GLOBAL PROPERTY PTEN_MODULES)
set(pten_modules ${pten_modules} ${TARGET_NAME})
set_property(GLOBAL PROPERTY PTEN_MODULES "${pten_modules}")
endif()
endfunction(find_pten_modules)
function(common_link TARGET_NAME)
if (WITH_PROFILER)
target_link_libraries(${TARGET_NAME} gperftools::profiler)
......@@ -310,6 +324,7 @@ function(cc_library TARGET_NAME)
else()
add_library(${TARGET_NAME} STATIC ${cc_library_SRCS})
find_fluid_modules(${TARGET_NAME})
find_pten_modules(${TARGET_NAME})
endif()
if(cc_library_DEPS)
# Don't need link libwarpctc.so
......@@ -482,6 +497,7 @@ function(nv_library TARGET_NAME)
else()
add_library(${TARGET_NAME} STATIC ${nv_library_SRCS})
find_fluid_modules(${TARGET_NAME})
find_pten_modules(${TARGET_NAME})
endif()
if (nv_library_DEPS)
add_dependencies(${TARGET_NAME} ${nv_library_DEPS})
......@@ -572,6 +588,7 @@ function(hip_library TARGET_NAME)
else()
hip_add_library(${TARGET_NAME} STATIC ${hip_library_SRCS})
find_fluid_modules(${TARGET_NAME})
find_pten_modules(${TARGET_NAME})
endif()
if (hip_library_DEPS)
add_dependencies(${TARGET_NAME} ${hip_library_DEPS})
......
add_subdirectory(scripts)
add_subdirectory(testing)
set(PYTHON_TESTS_DIR ${PADDLE_BINARY_DIR}/python/paddle/fluid/tests CACHE INTERNAL "python tests directory")
add_subdirectory(pten)
add_subdirectory(fluid)
......@@ -197,10 +197,12 @@ cc_library(unused_var_check SRCS unused_var_check.cc DEPS glog no_need_buffer_va
IF(WITH_XPU)
cc_library(operator SRCS operator.cc DEPS xpu_op_list op_info device_context tensor scope glog trainer_desc_proto data_feed_proto
shape_inference data_transform lod_tensor profiler transfer_scope_cache op_kernel_type op_call_stack unused_var_check nan_inf_utils)
shape_inference data_transform lod_tensor profiler transfer_scope_cache op_kernel_type op_call_stack unused_var_check nan_inf_utils
pten pten_utils kernel_factory)
ELSE()
cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog trainer_desc_proto data_feed_proto
shape_inference data_transform lod_tensor profiler transfer_scope_cache op_kernel_type op_call_stack unused_var_check nan_inf_utils)
shape_inference data_transform lod_tensor profiler transfer_scope_cache op_kernel_type op_call_stack unused_var_check nan_inf_utils
pten pten_utils kernel_factory)
ENDIF()
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry device_context)
......@@ -394,6 +396,8 @@ cc_library(save_load_util SRCS save_load_util.cc DEPS tensor scope layer)
cc_test(save_load_util_test SRCS save_load_util_test.cc DEPS save_load_util tensor scope layer)
cc_library(generator SRCS generator.cc DEPS enforce place)
cc_library(pten_utils SRCS pten_utils.cc DEPS lod_tensor selected_rows place pten var_type_traits pten_hapi_utils)
# Get the current working branch
execute_process(
COMMAND git rev-parse --abbrev-ref HEAD
......@@ -456,3 +460,4 @@ if(WITH_TESTING AND TEST selected_rows_test)
endif()
cc_test(scope_guard_test SRCS scope_guard_test.cc)
cc_test(pten_utils_test SRCS pten_utils_test.cc DEPS pten_utils)
......@@ -29,6 +29,7 @@ limitations under the License. */
#include "paddle/fluid/framework/var_type.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/profiler.h"
#include "paddle/pten/common/scalar.h"
namespace paddle {
namespace framework {
......@@ -49,6 +50,7 @@ DECLARE_bool(check_nan_inf);
DECLARE_bool(enable_unused_var_check);
PADDLE_DEFINE_EXPORTED_int32(inner_op_parallelism, 0,
"number of threads for inner op");
DECLARE_bool(run_pten_kernel);
namespace paddle {
namespace framework {
......@@ -1120,8 +1122,24 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
}
#endif
auto exe_ctx = ExecutionContext(*this, scope, *dev_ctx, *runtime_ctx);
// TODO(chenweihang): Now we are still reusing a lot of the original fluid
// implementation; this is a gradual replacement process
// TODO(chenweihang): in the first phase of the project, we only support the CPU,
// CUDA and ROCM backends; XPU, NPU and MKLDNN will be supported in the second
// phase
if (FLAGS_run_pten_kernel &&
pten::KernelFactory::Instance().HasCompatiblePtenKernel(type_)) {
if (pt_kernel_signature_.get() == nullptr || pt_kernel_.get() == nullptr) {
ChoosePtenKernel(exe_ctx);
}
run_pten_kernel_ = pt_kernel_->IsValid();
}
if (!run_pten_kernel_) {
if (kernel_type_.get() == nullptr || kernel_func_.get() == nullptr) {
ChooseKernel(*runtime_ctx, scope, place);
ChooseKernel(exe_ctx);
}
}
// do data transformScope &transfer_scope;
......@@ -1159,9 +1177,14 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
{
platform::RecordEvent record_event("compute",
platform::EventRole::kInnerOp);
if (run_pten_kernel_) {
auto op_kernel_ctx = BuildPtenKernelContext(*runtime_ctx, *dev_ctx);
(*pt_kernel_)(&op_kernel_ctx);
} else {
(*kernel_func_)(
ExecutionContext(*this, exec_scope, *dev_ctx, *runtime_ctx));
}
}
if (!transfered_inplace_vars.empty()) {
// there is inplace variable has been transferred.
......@@ -1208,25 +1231,11 @@ void OperatorWithKernel::RunImpl(const Scope& scope,
}
}
void OperatorWithKernel::ChooseKernel(const RuntimeContext& ctx,
const Scope& scope,
const platform::Place& place) const {
platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
auto* dev_ctx = pool.Get(place);
// check if op[type] has kernel registered.
auto& all_op_kernels = AllOpKernels();
auto kernels_iter = all_op_kernels.find(type_);
PADDLE_ENFORCE_NE(
kernels_iter, all_op_kernels.end(),
platform::errors::Unavailable(
"There are no kernels which are registered in the %s operator.",
type_));
OpKernelMap& kernels = kernels_iter->second;
OpKernelType OperatorWithKernel::InnerGetExpectedKernelType(
const ExecutionContext& ctx) const {
auto& dev_ctx = ctx.device_context();
auto expected_kernel_key = this->GetExpectedKernelType(
ExecutionContext(*this, scope, *dev_ctx, ctx));
auto expected_kernel_key = this->GetExpectedKernelType(ctx);
if (HasAttr("op_device")) {
if (Attr<std::string>("op_device") == "cpu") {
expected_kernel_key.place_ = platform::CPUPlace();
......@@ -1243,9 +1252,9 @@ void OperatorWithKernel::ChooseKernel(const RuntimeContext& ctx,
// when the Op that only has CPUKernel is assigned to GPU, the CPUKernel
// will be executed and a warning will be given at the same time.
if (SupportGPU()) {
expected_kernel_key.place_ = dev_ctx->GetPlace();
expected_kernel_key.place_ = dev_ctx.GetPlace();
} else if (SupportNPU()) {
expected_kernel_key.place_ = dev_ctx->GetPlace();
expected_kernel_key.place_ = dev_ctx.GetPlace();
} else {
expected_kernel_key.place_ = platform::CPUPlace();
LOG_FIRST_N(WARNING, 1)
......@@ -1256,6 +1265,47 @@ void OperatorWithKernel::ChooseKernel(const RuntimeContext& ctx,
}
VLOG(3) << "op type:" << type_
<< ", expected_kernel_key:" << expected_kernel_key;
return expected_kernel_key;
}
void OperatorWithKernel::ChoosePtenKernel(const ExecutionContext& ctx) const {
pt_kernel_signature_.reset(
new KernelSignature(std::move(this->GetExpectedPtenKernelArgs(ctx))));
VLOG(1) << KernelSignatureToString(*pt_kernel_signature_.get());
kernel_type_.reset(
new OpKernelType(std::move(InnerGetExpectedKernelType(ctx))));
auto pt_kernel_name = pten::KernelName(pt_kernel_signature_->name);
auto pt_kernel_key = TransOpKernelTypeToPtenKernelKey(*kernel_type_.get());
pt_kernel_.reset(
new pten::Kernel(pten::KernelFactory::Instance().SelectKernel(
pt_kernel_name, pt_kernel_key)));
if (pt_kernel_->IsValid()) {
VLOG(1) << "Static mode ChoosePtenKernel - kernel name: " << pt_kernel_name
<< " | kernel key: " << pt_kernel_key
<< " | kernel: " << *pt_kernel_;
} else {
VLOG(1) << "Static mode ChoosePtenKernel - kernel `" << pt_kernel_name
<< "` not found.";
}
}
void OperatorWithKernel::ChooseKernel(const ExecutionContext& ctx) const {
// check if op[type] has kernel registered.
auto& all_op_kernels = AllOpKernels();
auto kernels_iter = all_op_kernels.find(type_);
PADDLE_ENFORCE_NE(
kernels_iter, all_op_kernels.end(),
platform::errors::Unavailable(
"There are no kernels which are registered in the %s operator.",
type_));
OpKernelMap& kernels = kernels_iter->second;
auto expected_kernel_key = InnerGetExpectedKernelType(ctx);
auto kernel_iter = kernels.find(expected_kernel_key);
#ifdef PADDLE_WITH_MKLDNN
......@@ -1562,11 +1612,10 @@ Scope* OperatorWithKernel::PrepareData(
}
void OperatorWithKernel::ParseInputDataType(
const ExecutionContext& ctx, const std::string& name,
const std::vector<Variable*>& vars, const std::string& name,
proto::VarType::Type* data_type) const {
proto::VarType::Type default_data_type =
static_cast<proto::VarType::Type>(-1);
const std::vector<Variable*> vars = ctx.MultiInputVar(name);
for (size_t i = 0; i < vars.size(); ++i) {
const Variable* var = vars[i];
if (var != nullptr) {
......@@ -1588,10 +1637,9 @@ void OperatorWithKernel::ParseInputDataType(
if (t != nullptr) {
PADDLE_ENFORCE_EQ(
t->IsInitialized(), true,
platform::errors::InvalidArgument(
"The Tensor in the %s Op's Input Variable %s(%s) is "
"not initialized.",
Type(), name, ctx.InputNames(name).at(i)));
platform::errors::InvalidArgument("The %s Op's Input Variable `%s` "
"contains uninitialized Tensor.",
Type(), name));
proto::VarType::Type tmp = t->type();
PADDLE_ENFORCE(tmp == *data_type || *data_type == default_data_type,
platform::errors::InvalidArgument(
......@@ -1614,7 +1662,8 @@ proto::VarType::Type OperatorWithKernel::IndicateDataType(
static_cast<proto::VarType::Type>(-1);
proto::VarType::Type data_type = dafault_data_type;
for (auto& input : ctx.InNameList()) {
ParseInputDataType(ctx, input, &data_type);
const std::vector<Variable*> vars = ctx.MultiInputVar(input);
ParseInputDataType(vars, input, &data_type);
}
PADDLE_ENFORCE_NE(
data_type, dafault_data_type,
......@@ -1628,7 +1677,7 @@ proto::VarType::Type OperatorWithKernel::IndicateVarDataType(
proto::VarType::Type dafault_data_type =
static_cast<proto::VarType::Type>(-1);
proto::VarType::Type data_type = dafault_data_type;
ParseInputDataType(ctx, name, &data_type);
ParseInputDataType(ctx.MultiInputVar(name), name, &data_type);
PADDLE_ENFORCE_NE(
data_type, dafault_data_type,
platform::errors::InvalidArgument(
......@@ -1711,5 +1760,115 @@ OpKernelType OperatorWithKernel::GetKernelTypeForVar(
tensor.layout());
}
KernelSignature OperatorWithKernel::GetExpectedPtenKernelArgs(
const ExecutionContext& ctx) const {
if (!KernelSignatureMap::Instance().Has(Type())) {
// TODO(chenweihang): we can generate this map by proto info in compile time
KernelArgsNameMakerByOpProto maker(Info().proto_);
KernelSignatureMap::Instance().Emplace(
Type(), std::move(maker.GetKernelSignature()));
}
return KernelSignatureMap::Instance().Get(Type());
}
pten::KernelContext OperatorWithKernel::BuildPtenKernelContext(
const RuntimeContext& ctx, const platform::DeviceContext& dev_ctx) const {
// TODO(chenweihang): now this only works for very simple cases;
// many cases need to be dealt with later:
// 1. the input and output are not tensor
// 2. the dispensable, duplicable input and output
// 3. needless attributes remove
// 4. use pt Tensor directly
// 5. kernel input is not DenseTensor
pten::KernelContext op_kernel_ctx(dev_ctx);
auto& input_names = std::get<0>(pt_kernel_signature_->args);
auto& attr_names = std::get<1>(pt_kernel_signature_->args);
auto& output_names = std::get<2>(pt_kernel_signature_->args);
auto input_defs = pt_kernel_->args_def().input_defs();
auto attr_defs = pt_kernel_->args_def().attribute_defs();
auto output_defs = pt_kernel_->args_def().output_defs();
PADDLE_ENFORCE_EQ(input_names.size(), input_defs.size(),
platform::errors::InvalidArgument(
"The size of inputs_args names (%d) must be equal to "
"the size of kernel input_defs (%d).",
input_names.size(), input_defs.size()));
PADDLE_ENFORCE_EQ(output_names.size(), output_defs.size(),
platform::errors::InvalidArgument(
"The size of outputs_args names (%d) must be equal to "
"the size of kernel output_defs (%d).",
output_names.size(), output_defs.size()));
PADDLE_ENFORCE_EQ(attr_names.size(), attr_defs.size(),
platform::errors::InvalidArgument(
"The size of attribute_args names (%d) must be equal "
"to the size of kernel attribute_defs (%d).",
attr_names.size(), attr_defs.size()));
for (size_t i = 0; i < input_names.size(); ++i) {
auto in_def = input_defs.at(i);
VLOG(2) << "in_def: " << in_def.backend << ", " << in_def.dtype << ", "
<< in_def.layout;
auto ins_vector = ctx.inputs.at(input_names[i]);
paddle::SmallVector<std::shared_ptr<pten::TensorBase>> tmp_inputs;
for (auto var : ins_vector) {
tmp_inputs.emplace_back(
experimental::MakePtenTensorBaseFromVar(*var, in_def));
}
op_kernel_ctx.EmplaceBackInputs(std::move(tmp_inputs));
}
for (size_t i = 0; i < output_names.size(); ++i) {
auto out_def = output_defs.at(i);
auto outs_vector = ctx.outputs.at(output_names[i]);
paddle::SmallVector<std::shared_ptr<pten::TensorBase>> tmp_outputs;
for (auto var : outs_vector) {
tmp_outputs.emplace_back(
experimental::MakePtenTensorBaseFromVar(var, out_def));
}
op_kernel_ctx.EmplaceBackOutputs(std::move(tmp_outputs));
}
for (size_t i = 0; i < attr_names.size(); ++i) {
auto& attr = Attrs().at(attr_names[i]);
if (attr_defs[i].type_index == std::type_index(typeid(pten::Scalar))) {
// TODO(chenweihang): support other attrs later
// TODO(zhangyunfei): Scalar should hold scalar type, and we should check
// attribute type by attr_defs
if (std::type_index(attr.type()) == std::type_index(typeid(float))) {
op_kernel_ctx.EmplaceBackAttr(
std::move(pten::Scalar(BOOST_GET_CONST(float, attr))));
} else {
PADDLE_THROW(platform::errors::Unimplemented(
"unsupported cast op attribute `%s` to Scalar when construct "
"KernelContext.",
attr_names[i]));
}
} else {
// TODO(chenweihang): support other attrs later
if (attr_defs[i].type_index == std::type_index(typeid(int))) {
op_kernel_ctx.EmplaceBackAttr(BOOST_GET_CONST(int, attr));
} else if (attr_defs[i].type_index == std::type_index(typeid(float))) {
op_kernel_ctx.EmplaceBackAttr(BOOST_GET_CONST(float, attr));
} else if (attr_defs[i].type_index == std::type_index(typeid(bool))) {
op_kernel_ctx.EmplaceBackAttr(BOOST_GET_CONST(bool, attr));
} else {
PADDLE_THROW(platform::errors::Unimplemented(
"unsupported cast op attribute `%s` when construct "
"KernelContext.",
attr_names[i]));
}
}
}
return op_kernel_ctx;
}
} // namespace framework
} // namespace paddle
......@@ -30,6 +30,7 @@ limitations under the License. */
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_info.h"
#include "paddle/fluid/framework/op_kernel_type.h"
#include "paddle/fluid/framework/pten_utils.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/framework/selected_rows.h"
#include "paddle/fluid/framework/tensor.h"
......@@ -39,6 +40,8 @@ limitations under the License. */
#include "paddle/fluid/platform/variant.h"
#include "paddle/utils/flat_hash_map.h"
#include "paddle/pten/api/include/core.h"
namespace paddle {
namespace framework {
class InferShapeContext;
......@@ -529,6 +532,17 @@ class OperatorWithKernel : public OperatorBase {
return kernel_type_->place_;
}
/* member functions for adapting to pten lib */
/** In the Tensor calculation library, the new Kernel adopts a clearer and
* more streamlined design. The arguments of the Kernel and the input and
* output arguments registered in the original OpMaker do not match in some
* cases, so we use a map to record the arguments required by the kernel.
* When selecting a Kernel during Op execution, the arguments of the original
* Op are selected according to the arguments returned by GetExpectedPtenKernelArgs.
*/
virtual KernelSignature GetExpectedPtenKernelArgs(
const ExecutionContext& ctx) const;
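(Editorial illustration, not part of this diff.) A concrete operator overrides GetExpectedPtenKernelArgs to declare which inputs, attributes and outputs its pten kernel consumes; the fill_any_like operator later in this change does exactly that. A minimal sketch for a hypothetical scale-style op — the kernel name "scale" and the argument names below are illustrative assumptions, not symbols introduced by this commit:

// Hypothetical override in a concrete operator, mirroring FillAnyLikeOp below.
framework::KernelSignature GetExpectedPtenKernelArgs(
    const framework::ExecutionContext& ctx) const override {
  // order: kernel name, {input names}, {attr names}, {output names}
  return framework::KernelSignature("scale", {"X"}, {"scale"}, {"Out"});
}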
private:
void RunImpl(const Scope& scope, const platform::Place& place) const final;
void RunImpl(const Scope& scope, const platform::Place& place,
......@@ -550,8 +564,9 @@ class OperatorWithKernel : public OperatorBase {
const std::vector<std::string>& inplace_vars,
const Scope& exec_scope) const;
void ChooseKernel(const RuntimeContext& ctx, const Scope& scope,
const platform::Place& place) const;
OpKernelType InnerGetExpectedKernelType(const ExecutionContext& ctx) const;
void ChooseKernel(const ExecutionContext& ctx) const;
void HandleComplexGradToRealGrad(const Scope& scope,
RuntimeContext* ctx) const;
......@@ -561,12 +576,19 @@ class OperatorWithKernel : public OperatorBase {
// By default all input data must be same.
proto::VarType::Type IndicateDataType(const ExecutionContext& ctx) const;
// used for IndicateDataType
void ParseInputDataType(const ExecutionContext& ctx, const std::string& name,
proto::VarType::Type* type) const;
void ParseInputDataType(const std::vector<Variable*>& vars,
const std::string& name,
proto::VarType::Type* data_type) const;
// used for IndicateOrPromoteVarDataTypes
Tensor* GetTensorFormInputSafely(const ExecutionContext& ctx,
const std::string& name) const;
/* member functions for adapting to pten lib */
void ChoosePtenKernel(const ExecutionContext& ctx) const;
pten::KernelContext BuildPtenKernelContext(
const RuntimeContext& ctx, const platform::DeviceContext& dev_ctx) const;
protected:
mutable std::unique_ptr<OpKernelType> kernel_type_;
mutable std::unique_ptr<OpKernelFunc> kernel_func_;
......@@ -577,6 +599,12 @@ class OperatorWithKernel : public OperatorBase {
mutable bool all_kernels_must_compute_runtime_shape_ = false;
mutable std::mutex cache_update_mutex_;
mutable bool enable_cache_transfer_scope_ = false;
// NOTE(chenweihang): Similar op members are used to adapt to
// new pten kernel, if there is a better design in the future,
// we may polish the implementation here
mutable bool run_pten_kernel_ = false;
mutable std::unique_ptr<KernelSignature> pt_kernel_signature_;
mutable std::unique_ptr<pten::Kernel> pt_kernel_;
};
extern bool OpSupportGPU(const std::string& op_type);
......
......@@ -439,9 +439,8 @@ TEST(IndicateVarDataTypeTest, lodtensor) {
std::string ex_msg = err.what();
EXPECT_TRUE(
ex_msg.find(
"The Tensor in the indicate_lod_tensor_data_type_test Op's "
"Input Variable LoDTensor(lodtensor_1) is not initialized") !=
std::string::npos);
"The indicate_lod_tensor_data_type_test Op's Input Variable "
"`LoDTensor` contains uninitialized Tensor.") != std::string::npos);
}
ASSERT_TRUE(caught);
}
......@@ -466,9 +465,9 @@ TEST(IndicateVarDataTypeTest, selectedrows) {
caught = true;
std::string ex_msg = err.what();
EXPECT_TRUE(
ex_msg.find("The Tensor in the indicate_selected_rows_data_type_test "
"Op's Input Variable SelectedRows(selected_rows_1) is not "
"initialized") != std::string::npos);
ex_msg.find("The indicate_selected_rows_data_type_test Op's "
"Input Variable `SelectedRows` contains uninitialized "
"Tensor.") != std::string::npos);
}
ASSERT_TRUE(caught);
}
......
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <sstream>
#include "paddle/fluid/framework/pten_utils.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/selected_rows.h"
#include "paddle/fluid/framework/variable.h"
#include "paddle/fluid/string/string_helper.h"
namespace paddle {
namespace framework {
OpKernelType TransPtenKernelKeyToOpKernelType(
const pten::KernelKey& kernel_key) {
proto::VarType::Type data_type =
pten::TransToProtoVarType(kernel_key.dtype());
platform::Place place = pten::TransToFluidPlace(kernel_key.backend());
DataLayout data_layout = pten::TransToFluidDataLayout(kernel_key.layout());
LibraryType library_type = LibraryType::kPlain;
if (kernel_key.backend() == pten::Backend::MKLDNN) {
library_type = LibraryType::kMKLDNN;
} else if (kernel_key.backend() == pten::Backend::CUDNN) {
library_type = LibraryType::kCUDNN;
} else {
// do nothing
}
// TODO(chenweihang): the customized_type_value is lost
return OpKernelType(data_type, place, data_layout, library_type);
}
pten::KernelKey TransOpKernelTypeToPtenKernelKey(
const OpKernelType& kernel_type) {
pten::Backend backend = pten::TransToPtenBackend(kernel_type.place_);
if (kernel_type.library_type_ == LibraryType::kMKLDNN) {
backend = pten::Backend::MKLDNN;
} else if (kernel_type.library_type_ == LibraryType::kCUDNN) {
backend = pten::Backend::CUDNN;
} else {
// do nothing
}
paddle::experimental::DataLayout layout =
pten::TransToPtenDataLayout(kernel_type.data_layout_);
paddle::experimental::DataType dtype =
pten::TransToPtenDataType(kernel_type.data_type_);
return pten::KernelKey(backend, layout, dtype);
}
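(Editorial usage sketch, not part of this diff.) The two helpers above translate between fluid's OpKernelType and pten's KernelKey; ChoosePtenKernel in operator.cc uses the fluid-to-pten direction when selecting a pten kernel. A minimal, hedged example:

// Assumed usage: translate an expected fluid kernel type into a pten kernel key.
paddle::framework::OpKernelType fluid_type(
    paddle::framework::proto::VarType::FP32, paddle::platform::CPUPlace(),
    paddle::framework::DataLayout::kNCHW);
pten::KernelKey pt_key =
    paddle::framework::TransOpKernelTypeToPtenKernelKey(fluid_type);
// pt_key should now carry Backend::CPU, the NCHW layout and DataType::FLOAT32,
// and can be passed to pten::KernelFactory::Instance().SelectKernel(...).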
const paddle::SmallVector<std::string>&
KernelArgsNameMakerByOpProto::GetInputArgsNames() {
for (int i = 0; i < op_proto_->inputs_size(); ++i) {
auto& in = op_proto_->inputs()[i];
auto& in_name = in.name();
if ((in.has_extra() && in.extra()) || (in.has_quant() && in.quant())) {
VLOG(1) << "Parse PtenKernel input: skip extra & quant input - "
<< in_name;
continue;
}
// If contains dispensable input, we should override the
// GetExpectedPtenKernelArgs method self
if (in.has_dispensable() && in.dispensable()) {
VLOG(1) << "Parse PtenKernel input: skip dispensable input - " << in_name;
continue;
}
VLOG(1) << "Parse PtenKernel input: " << in_name;
input_names_.emplace_back(in_name);
}
return input_names_;
}
const paddle::SmallVector<std::string>&
KernelArgsNameMakerByOpProto::GetOutputArgsNames() {
for (int i = 0; i < op_proto_->outputs_size(); ++i) {
auto& out = op_proto_->outputs()[i];
auto& out_name = out.name();
// TODO(chenweihang): outputs also need skip some cases
VLOG(1) << "Parse PtenKernel output: " << out_name;
output_names_.emplace_back(out_name);
}
return output_names_;
}
const paddle::SmallVector<std::string>&
KernelArgsNameMakerByOpProto::GetAttrsArgsNames() {
for (int i = 0; i < op_proto_->attrs_size(); ++i) {
auto& attr = op_proto_->attrs()[i];
auto& attr_name = attr.name();
if (attr_name == "use_mkldnn" || attr_name == "op_role" ||
attr_name == "op_role_var" || attr_name == "op_namescope" ||
attr_name == "op_callstack" || attr_name == "op_device") {
VLOG(1) << "Parse PtenKernel attribute: skip needless attr - "
<< attr_name;
continue;
}
if ((attr.has_extra() && attr.extra()) ||
(attr.has_quant() && attr.quant())) {
VLOG(1) << "Parse PtenKernel attribute: skip extra & quant attr - "
<< attr_name;
continue;
}
VLOG(1) << "Parse PtenKernel attribute: " << attr_name;
attr_names_.emplace_back(attr_name);
}
return attr_names_;
}
KernelSignature KernelArgsNameMakerByOpProto::GetKernelSignature() {
return KernelSignature(op_proto_->type(), GetInputArgsNames(),
GetAttrsArgsNames(), GetOutputArgsNames());
}
std::string KernelSignatureToString(const KernelSignature& signature) {
std::stringstream os;
os << "Kernel Signature - name: " << signature.name
<< "; inputs: " << string::join_strings(std::get<0>(signature.args), ", ")
<< "; attributes: "
<< string::join_strings(std::get<1>(signature.args), ", ") << "; outputs: "
<< string::join_strings(std::get<2>(signature.args), ", ");
return os.str();
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include <unordered_map>
#include <vector>
#include "paddle/fluid/framework/framework.pb.h"
#include "paddle/fluid/framework/op_kernel_type.h"
#include "paddle/fluid/framework/tensor.h"
#include "paddle/fluid/imperative/type_defs.h"
#include "paddle/fluid/platform/macros.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/pten/api/include/core.h"
#include "paddle/pten/hapi/lib/utils/tensor_utils.h"
#include "paddle/utils/flat_hash_map.h"
#include "paddle/utils/small_vector.h"
namespace paddle {
namespace framework {
/* Kernel Key translate */
OpKernelType TransPtenKernelKeyToOpKernelType(
const pten::KernelKey& kernel_key);
pten::KernelKey TransOpKernelTypeToPtenKernelKey(
const OpKernelType& kernel_type);
/* Kernel Args parse */
struct KernelSignature {
std::string name;
KernelArgsTuple args;
KernelSignature() = default;
KernelSignature(std::string&& kernel_name,
paddle::SmallVector<std::string>&& inputs,
paddle::SmallVector<std::string>&& attrs,
paddle::SmallVector<std::string>&& outputs)
: name(std::move(kernel_name)),
args(std::make_tuple(inputs, attrs, outputs)) {}
KernelSignature(const std::string& kernel_name,
const paddle::SmallVector<std::string>& inputs,
const paddle::SmallVector<std::string>& attrs,
const paddle::SmallVector<std::string>& outputs)
: name(kernel_name), args(std::make_tuple(inputs, attrs, outputs)) {}
};
// TODO(chenweihang): we can generate this map by proto info in compile time
class KernelSignatureMap {
public:
static KernelSignatureMap& Instance() {
static KernelSignatureMap g_kernel_signature_map;
return g_kernel_signature_map;
}
bool Has(const std::string& op_type) const {
return map_.find(op_type) != map_.end();
}
void Emplace(const std::string& op_type, KernelSignature&& signature) {
if (!Has(op_type)) {
map_.emplace(op_type, signature);
}
}
const KernelSignature& Get(const std::string& op_type) const {
auto it = map_.find(op_type);
PADDLE_ENFORCE_NE(
it, map_.end(),
platform::errors::NotFound(
"Operator `%s`'s kernel signature is not registered.", op_type));
return it->second;
}
private:
KernelSignatureMap() = default;
paddle::flat_hash_map<std::string, KernelSignature> map_;
DISABLE_COPY_AND_ASSIGN(KernelSignatureMap);
};
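(Editorial sketch, not part of this diff.) The singleton map is filled lazily the first time an operator asks for its signature; the default OperatorWithKernel::GetExpectedPtenKernelArgs in operator.cc follows exactly this shape:

// Populate the cache from the op's proto on first use, then reuse it.
if (!KernelSignatureMap::Instance().Has(op_type)) {
  KernelArgsNameMakerByOpProto maker(op_proto);  // op_proto: the operator's OpProto
  KernelSignatureMap::Instance().Emplace(op_type,
                                         std::move(maker.GetKernelSignature()));
}
const KernelSignature& sig = KernelSignatureMap::Instance().Get(op_type);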
class KernelArgsNameMaker {
public:
virtual ~KernelArgsNameMaker() {}
virtual const paddle::SmallVector<std::string>& GetInputArgsNames() = 0;
virtual const paddle::SmallVector<std::string>& GetOutputArgsNames() = 0;
virtual const paddle::SmallVector<std::string>& GetAttrsArgsNames() = 0;
};
class KernelArgsNameMakerByOpProto : public KernelArgsNameMaker {
public:
explicit KernelArgsNameMakerByOpProto(framework::proto::OpProto* op_proto)
: op_proto_(op_proto) {}
~KernelArgsNameMakerByOpProto() {}
const paddle::SmallVector<std::string>& GetInputArgsNames() override;
const paddle::SmallVector<std::string>& GetOutputArgsNames() override;
const paddle::SmallVector<std::string>& GetAttrsArgsNames() override;
KernelSignature GetKernelSignature();
private:
framework::proto::OpProto* op_proto_;
paddle::SmallVector<std::string> input_names_;
paddle::SmallVector<std::string> output_names_;
paddle::SmallVector<std::string> attr_names_;
};
std::string KernelSignatureToString(const KernelSignature& signature);
} // namespace framework
} // namespace paddle
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/framework/pten_utils.h"
#include "gtest/gtest.h"
#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/selected_rows.h"
#include "paddle/fluid/framework/variable.h"
TEST(PtenUtils, TransPtenKernelKeyToOpKernelType) {
pten::KernelKey kernel_key(pten::Backend::CPU, pten::DataLayout::NCHW,
pten::DataType::FLOAT32);
auto op_kernel_type =
paddle::framework::TransPtenKernelKeyToOpKernelType(kernel_key);
ASSERT_EQ(op_kernel_type.data_type_, paddle::framework::proto::VarType::FP32);
ASSERT_EQ(op_kernel_type.data_layout_, paddle::framework::DataLayout::kNCHW);
ASSERT_TRUE(paddle::platform::is_cpu_place(op_kernel_type.place_));
ASSERT_EQ(op_kernel_type.library_type_,
paddle::framework::LibraryType::kPlain);
#ifdef PADDLE_WITH_MKLDNN
pten::KernelKey kernel_key_mkldnn(
pten::Backend::MKLDNN, pten::DataLayout::NCHW, pten::DataType::FLOAT32);
op_kernel_type =
paddle::framework::TransPtenKernelKeyToOpKernelType(kernel_key_mkldnn);
ASSERT_EQ(op_kernel_type.data_type_, paddle::framework::proto::VarType::FP32);
ASSERT_EQ(op_kernel_type.data_layout_, paddle::framework::DataLayout::kNCHW);
ASSERT_TRUE(paddle::platform::is_cpu_place(op_kernel_type.place_));
ASSERT_EQ(op_kernel_type.library_type_,
paddle::framework::LibraryType::kMKLDNN);
#endif
#ifdef PADDLE_WITH_CUDA
pten::KernelKey kernel_key_cudnn(pten::Backend::CUDNN, pten::DataLayout::NCHW,
pten::DataType::FLOAT32);
op_kernel_type =
paddle::framework::TransPtenKernelKeyToOpKernelType(kernel_key_cudnn);
ASSERT_EQ(op_kernel_type.data_type_, paddle::framework::proto::VarType::FP32);
ASSERT_EQ(op_kernel_type.data_layout_, paddle::framework::DataLayout::kNCHW);
ASSERT_TRUE(paddle::platform::is_gpu_place(op_kernel_type.place_));
ASSERT_EQ(op_kernel_type.library_type_,
paddle::framework::LibraryType::kCUDNN);
#endif
}
......@@ -17,11 +17,13 @@ limitations under the License. */
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include "paddle/fluid/imperative/type_defs.h"
#include "paddle/fluid/platform/variant.h"
#include "paddle/utils/small_vector.h"
namespace paddle {
namespace framework {
......@@ -33,8 +35,8 @@ class BlockDesc;
class Variable;
class InferNoNeedBufferVarsFN;
using VariableNameMap = std::map<std::string, std::vector<std::string>>;
// TODO(panyx0718): Replace vector with something like gtl::Vector.
using VariableNameMap = std::map<std::string, std::vector<std::string>>;
using VariableValueMap = std::map<std::string, std::vector<Variable*>>;
// The order should be as same as framework.proto
......@@ -82,5 +84,10 @@ using InferShapeFN = std::function<void(InferShapeContext*)>;
using InplacePair = std::unordered_map<std::string, std::string>;
using InferInplaceOpFN = std::function<InplacePair(bool /*use_cuda*/)>;
// tuple(input_names, attr_names, output_names)
using KernelArgsTuple = std::tuple<paddle::SmallVector<std::string>,
paddle::SmallVector<std::string>,
paddle::SmallVector<std::string>>;
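(Editorial sketch, not part of this diff.) Consumers unpack the tuple positionally, as BuildPtenKernelContext does in operator.cc:

const auto& input_names = std::get<0>(signature.args);   // input argument names
const auto& attr_names = std::get<1>(signature.args);    // attribute argument names
const auto& output_names = std::get<2>(signature.args);  // output argument names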
} // namespace framework
} // namespace paddle
cc_library(imperative_flag SRCS flags.cc DEPS gflags flags)
IF(WITH_XPU)
cc_library(prepared_operator SRCS prepared_operator.cc DEPS xpu_op_list proto_desc operator device_context lod_tensor selected_rows var_type_traits op_kernel_type data_transform nan_inf_utils)
cc_library(prepared_operator SRCS prepared_operator.cc DEPS xpu_op_list proto_desc operator device_context lod_tensor selected_rows var_type_traits op_kernel_type data_transform nan_inf_utils pten_utils)
ELSE()
cc_library(prepared_operator SRCS prepared_operator.cc DEPS proto_desc operator device_context lod_tensor selected_rows var_type_traits op_kernel_type data_transform nan_inf_utils)
cc_library(prepared_operator SRCS prepared_operator.cc DEPS proto_desc operator device_context lod_tensor selected_rows var_type_traits op_kernel_type data_transform nan_inf_utils pten_utils)
ENDIF()
cc_library(layer SRCS layer.cc DEPS prepared_operator math_function imperative_flag variable_helper op_registry)
add_subdirectory(jit)
......
......@@ -17,10 +17,13 @@
#include "paddle/fluid/framework/data_type_transform.h"
#include "paddle/fluid/framework/details/nan_inf_utils.h"
#include "paddle/fluid/imperative/infer_shape_context.h"
#include "paddle/pten/common/scalar.h"
#include "paddle/utils/small_vector.h"
#ifdef PADDLE_WITH_XPU
#include "paddle/fluid/platform/xpu/xpu_op_list.h"
#endif
DECLARE_bool(check_nan_inf);
DECLARE_bool(run_pten_kernel);
DECLARE_bool(benchmark);
namespace paddle {
......@@ -46,6 +49,21 @@ const framework::Tensor* GetTensorFromVar(const framework::Variable& var) {
}
}
static const framework::Attribute& GetAttr(
const framework::AttributeMap& attrs,
const framework::AttributeMap& default_attrs, const std::string& name) {
auto it = attrs.find(name);
bool found = it != attrs.end();
if (!found) {
it = default_attrs.find(name);
found = it != default_attrs.end();
}
PADDLE_ENFORCE_EQ(
found, true,
platform::errors::NotFound("(%s) is not found in AttributeMap.", name));
return it->second;
}
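(Editorial note, not part of this diff.) Attributes passed at call time take precedence over the op's default attributes; a name missing from both maps raises a NotFound error. A hedged usage sketch, with the attribute name "value" chosen purely for illustration:

// Read a float attribute, falling back to default_attrs when absent in attrs.
const auto& attr = GetAttr(attrs, default_attrs, "value");
float v = BOOST_GET_CONST(float, attr);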
template <typename VarType>
static void HandleComplexGradToRealGrad(const NameVarMap<VarType>& outs) {
for (auto& pair : outs) {
......@@ -89,6 +107,21 @@ PreparedOp::PreparedOp(const framework::OperatorBase& op,
func_(func),
dev_ctx_(dev_ctx) {}
PreparedOp::PreparedOp(const framework::OperatorBase& op,
const framework::RuntimeContext& ctx,
const framework::OpKernelType& kernel_type,
const framework::KernelSignature& kernel_signature,
const pten::Kernel& pt_kernel,
platform::DeviceContext* dev_ctx)
: op_(op),
ctx_(ctx),
kernel_type_(kernel_type),
func_(nullptr),
dev_ctx_(dev_ctx),
run_pten_kernel_(true),
pt_kernel_signature_(kernel_signature),
pt_kernel_(pt_kernel) {}
template <typename VarType>
PreparedOp PrepareImpl(const NameVarMap<VarType>& ins,
const NameVarMap<VarType>& outs,
......@@ -115,11 +148,36 @@ PreparedOp PrepareImpl(const NameVarMap<VarType>& ins,
#endif
// 1. get expected kernel key
auto expected_kernel_key = op.GetExpectedKernelType(
DygraphExecutionContext<VarType>(op, framework::Scope(), *dev_ctx, ctx,
ins, outs, attrs, default_attrs));
auto dygraph_exe_ctx = DygraphExecutionContext<VarType>(
op, framework::Scope(), *dev_ctx, ctx, ins, outs, attrs, default_attrs);
auto expected_kernel_key = op.GetExpectedKernelType(dygraph_exe_ctx);
VLOG(3) << "expected_kernel_key:" << expected_kernel_key;
if (FLAGS_run_pten_kernel &&
pten::KernelFactory::Instance().HasCompatiblePtenKernel(op.Type())) {
auto pt_kernel_signature = op.GetExpectedPtenKernelArgs(dygraph_exe_ctx);
VLOG(1) << framework::KernelSignatureToString(pt_kernel_signature);
auto pt_kernel_name = pten::KernelName(pt_kernel_signature.name);
auto pt_kernel_key = TransOpKernelTypeToPtenKernelKey(expected_kernel_key);
auto pt_kernel = pten::KernelFactory::Instance().SelectKernel(
pt_kernel_name, pt_kernel_key);
if (pt_kernel.IsValid()) {
VLOG(1) << "Dynamic mode PrepareImpl - kernel name: " << pt_kernel_name
<< " | kernel key: " << pt_kernel_key
<< " | kernel: " << pt_kernel;
// TODO(chenweihang): use CPUKernel when the device kernel is missing
return PreparedOp(op, ctx, expected_kernel_key, pt_kernel_signature,
pt_kernel, dev_ctx);
} else {
VLOG(1) << "Dynamic mode ChoosePtenKernel - kernel `" << pt_kernel_name
<< "` not found.";
}
}
// 2. check if op[type] has kernel registered.
auto& all_op_kernels = op.AllOpKernels();
auto kernels_iter = all_op_kernels.find(op.Type());
......@@ -153,7 +211,8 @@ PreparedOp PrepareImpl(const NameVarMap<VarType>& ins,
kernel_iter = kernels.find(expected_kernel_key);
}
#endif
// TODO(jiabin): Add operator.cc's line 1000 part back when we need that case
// TODO(jiabin): Add operator.cc's line 1000 part back when we need that
// case
PADDLE_ENFORCE_NE(kernel_iter, kernels.end(),
platform::errors::NotFound(
"Operator %s does not have kernel for %s.", op.Type(),
......@@ -185,6 +244,109 @@ PreparedOp PreparedOp::Prepare(const NameVarMap<VariableWrapper>& ins,
default_attrs);
}
template <typename VarType>
static pten::KernelContext BuildDygraphPtenKernelContext(
const framework::KernelSignature& pt_kernel_signature,
const pten::Kernel& pt_kernel, const NameVarMap<VarType>& ins,
const NameVarMap<VarType>& outs, const framework::AttributeMap& attrs,
const framework::AttributeMap& default_attrs,
const platform::DeviceContext& dev_ctx) {
// TODO(chenweihang): now this only works for very simple cases;
// many cases need to be dealt with later:
// 1. the input and output are not tensor
// 2. the dispensable, duplicable input and output
// 3. needless attributes remove
// 4. use pt Tensor directly
// 5. kernel input is not DenseTensor
pten::KernelContext op_kernel_ctx(dev_ctx);
auto& input_names = std::get<0>(pt_kernel_signature.args);
auto& attr_names = std::get<1>(pt_kernel_signature.args);
auto& output_names = std::get<2>(pt_kernel_signature.args);
auto& input_defs = pt_kernel.args_def().input_defs();
auto& output_defs = pt_kernel.args_def().output_defs();
auto& attr_defs = pt_kernel.args_def().attribute_defs();
PADDLE_ENFORCE_EQ(input_names.size(), input_defs.size(),
platform::errors::InvalidArgument(
"the size of inputs_args names (%d) must be equal to "
"the size of kernel input_defs (%d).",
input_names.size(), input_defs.size()));
PADDLE_ENFORCE_EQ(output_names.size(), output_defs.size(),
platform::errors::InvalidArgument(
"the size of outputs_args names (%d) must be equal to "
"the size of kernel output_defs (%d).",
output_names.size(), output_defs.size()));
PADDLE_ENFORCE_EQ(attr_names.size(), attr_defs.size(),
platform::errors::InvalidArgument(
"the size of attribute_args names (%d) must be equal "
"to the size of kernel attribute_defs (%d).",
attr_names.size(), attr_defs.size()));
for (size_t i = 0; i < input_names.size(); ++i) {
auto& in_def = input_defs.at(i);
auto& ins_vector = ins.at(input_names[i]);
paddle::SmallVector<std::shared_ptr<pten::TensorBase>> tmp_inputs;
for (auto var : ins_vector) {
const auto& variable = var->Var();
tmp_inputs.emplace_back(
experimental::MakePtenTensorBaseFromVar(variable, in_def));
}
op_kernel_ctx.EmplaceBackInputs(std::move(tmp_inputs));
}
for (size_t i = 0; i < output_names.size(); ++i) {
auto& out_def = output_defs.at(i);
auto& outs_vector = outs.at(output_names[i]);
paddle::SmallVector<std::shared_ptr<pten::TensorBase>> tmp_outputs;
for (auto var : outs_vector) {
auto* variable = var->MutableVar();
tmp_outputs.emplace_back(
experimental::MakePtenTensorBaseFromVar(variable, out_def));
}
op_kernel_ctx.EmplaceBackOutputs(std::move(tmp_outputs));
}
for (size_t i = 0; i < attr_names.size(); ++i) {
auto& attr = GetAttr(attrs, default_attrs, attr_names[i]);
if (attr_defs[i].type_index == std::type_index(typeid(pten::Scalar))) {
// TODO(chenweihang): support other attrs later
// TODO(zhangyunfei): Scalar should hold scalar type, and we should check
// attribute type by attr_defs
if (std::type_index(attr.type()) == std::type_index(typeid(float))) {
op_kernel_ctx.EmplaceBackAttr(
std::move(pten::Scalar(BOOST_GET_CONST(float, attr))));
} else {
PADDLE_THROW(platform::errors::Unimplemented(
"unsupported cast op attribute `%s` to Scalar when construct "
"KernelContext in dygraph.",
attr_names[i]));
}
} else {
// TODO(chenweihang): support other attrs later
if (attr_defs[i].type_index == std::type_index(typeid(int))) {
op_kernel_ctx.EmplaceBackAttr(BOOST_GET_CONST(int, attr));
} else if (attr_defs[i].type_index == std::type_index(typeid(float))) {
op_kernel_ctx.EmplaceBackAttr(BOOST_GET_CONST(float, attr));
} else if (attr_defs[i].type_index == std::type_index(typeid(bool))) {
op_kernel_ctx.EmplaceBackAttr(BOOST_GET_CONST(bool, attr));
} else {
PADDLE_THROW(platform::errors::Unimplemented(
"unsupported cast op attribute `%s` when construct "
"KernelContext in dygraph.",
attr_names[i]));
}
}
}
return op_kernel_ctx;
}
template <typename VarType>
static void PreparedOpRunImpl(
const framework::OperatorBase& op, const framework::RuntimeContext& ctx,
......@@ -239,20 +401,54 @@ static void PreparedOpRunImpl(
}
}
template <typename VarType>
static void PreparedOpRunPtImpl(
const framework::OperatorBase& op,
const framework::KernelSignature& pt_kernel_signature,
const pten::Kernel& pt_kernel, platform::DeviceContext* dev_ctx,
const NameVarMap<VarType>& ins, const NameVarMap<VarType>& outs,
const framework::AttributeMap& attrs,
const framework::AttributeMap& default_attrs) {
DygraphInferShapeContext<VarType> infer_shape_ctx(&ins, &outs, &attrs,
&default_attrs, op.Type());
static_cast<const framework::OperatorWithKernel&>(op).InferShape(
&infer_shape_ctx);
auto op_kernel_ctx = BuildDygraphPtenKernelContext<VarType>(
pt_kernel_signature, pt_kernel, ins, outs, attrs, default_attrs,
*dev_ctx);
pt_kernel(&op_kernel_ctx);
// TODO(chenweihang): add debug flags later
// TODO(chenweihang): deal with complex cases later
}
void PreparedOp::Run(const NameVarMap<VarBase>& ins,
const NameVarMap<VarBase>& outs,
const framework::AttributeMap& attrs,
const framework::AttributeMap& default_attrs) {
if (run_pten_kernel_) {
PreparedOpRunPtImpl<VarBase>(op_, pt_kernel_signature_, pt_kernel_,
dev_ctx_, ins, outs, attrs, default_attrs);
} else {
PreparedOpRunImpl<VarBase>(op_, ctx_, kernel_type_, func_, dev_ctx_, ins,
outs, attrs, default_attrs);
}
}
void PreparedOp::Run(const NameVarMap<VariableWrapper>& ins,
const NameVarMap<VariableWrapper>& outs,
const framework::AttributeMap& attrs,
const framework::AttributeMap& default_attrs) {
if (run_pten_kernel_) {
PreparedOpRunPtImpl<VariableWrapper>(op_, pt_kernel_signature_, pt_kernel_,
dev_ctx_, ins, outs, attrs,
default_attrs);
} else {
PreparedOpRunImpl<VariableWrapper>(op_, ctx_, kernel_type_, func_, dev_ctx_,
ins, outs, attrs, default_attrs);
}
}
} // namespace imperative
......
......@@ -21,10 +21,14 @@
#include "paddle/fluid/framework/data_transform.h"
#include "paddle/fluid/framework/op_kernel_type.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/pten_utils.h"
#include "paddle/fluid/framework/type_defs.h"
#include "paddle/fluid/imperative/execution_context.h"
#include "paddle/fluid/imperative/layer.h"
#include "paddle/fluid/imperative/type_defs.h"
#include "paddle/pten/api/include/core.h"
DECLARE_bool(use_mkldnn);
namespace paddle {
......@@ -147,6 +151,12 @@ class PreparedOp {
const framework::OperatorWithKernel::OpKernelFunc& func,
platform::DeviceContext* dev_ctx);
PreparedOp(const framework::OperatorBase& op,
const framework::RuntimeContext& ctx,
const framework::OpKernelType& kernel_type,
const framework::KernelSignature& kernel_signature,
const pten::Kernel& pt_kernel, platform::DeviceContext* dev_ctx);
static PreparedOp Prepare(const NameVarMap<VarBase>& ins,
const NameVarMap<VarBase>& outs,
const framework::OperatorWithKernel& op,
......@@ -178,6 +188,12 @@ class PreparedOp {
framework::OpKernelType kernel_type_;
framework::OperatorWithKernel::OpKernelFunc func_;
platform::DeviceContext* dev_ctx_;
// NOTE(chenweihang): Similar op members are used to adapt to
// new pten kernel, if there is a better design in the future,
// we may polish the implementation here
bool run_pten_kernel_{false};
framework::KernelSignature pt_kernel_signature_;
pten::Kernel pt_kernel_;
};
} // namespace imperative
......
......@@ -35,6 +35,7 @@ endif()
# fluid_modules exclude API-interface of inference/api and inference/capi_exp
get_property(fluid_modules GLOBAL PROPERTY FLUID_MODULES)
get_property(pten_modules GLOBAL PROPERTY PTEN_MODULES)
# Adapt to custom op mechanism: Include the header files related to the data type
# to avoid exposing the path of the underlying file
......@@ -50,9 +51,9 @@ set(STATIC_INFERENCE_API paddle_inference_api analysis_predictor
analysis_config paddle_pass_builder activation_functions ${mkldnn_quantizer_cfg})
#TODO(wilber, T8T9): Do we still need to support windows gpu static library?
if(WIN32 AND WITH_GPU)
cc_library(paddle_inference DEPS ${fluid_modules} ${STATIC_INFERENCE_API})
cc_library(paddle_inference DEPS ${fluid_modules} ${pten_modules} ${STATIC_INFERENCE_API})
else()
create_static_lib(paddle_inference ${fluid_modules} ${STATIC_INFERENCE_API})
create_static_lib(paddle_inference ${fluid_modules} ${pten_modules} ${STATIC_INFERENCE_API})
endif()
if(NOT APPLE)
......@@ -82,7 +83,7 @@ set(SHARED_INFERENCE_SRCS
${PADDLE_CUSTOM_OP_SRCS})
# shared inference library deps
set(SHARED_INFERENCE_DEPS ${fluid_modules} analysis_predictor)
set(SHARED_INFERENCE_DEPS ${fluid_modules} ${pten_modules} analysis_predictor)
if (WITH_CRYPTO)
set(SHARED_INFERENCE_DEPS ${SHARED_INFERENCE_DEPS} paddle_crypto)
......
......@@ -79,6 +79,8 @@ if(WITH_UNITY_BUILD)
include(unity_build_rule.cmake)
endif()
set(OP_HEADER_DEPS ${OP_HEADER_DEPS} pten)
register_operators(EXCLUDES py_layer_op py_func_op warpctc_op dgc_op load_combine_op lstm_op run_program_op eye_op
recurrent_op save_combine_op sparse_attention_op sync_batch_norm_op spectral_op cinn_launch_op ${OP_MKL_DEPS} DEPS ${OP_HEADER_DEPS})
......
......@@ -61,7 +61,7 @@ void Compare1(f::Scope* scope, const p::DeviceContext& ctx,
// run
f::AttributeMap attrs = {{"to_main_scope", false}, {"num_micro_batches", 3}};
std::map<std::string, std::vector<std::string>> output;
f::VariableNameMap output;
auto op = f::OpRegistry::CreateOp(op_type, {{"X", {"tmp"}}, {"Id", {"Id"}}},
output, attrs);
......@@ -109,7 +109,7 @@ void Compare2(f::Scope* scope, const p::DeviceContext& ctx,
// run
f::AttributeMap attrs = {{"to_main_scope", true}, {"num_micro_batches", 3}};
std::map<std::string, std::vector<std::string>> output;
f::VariableNameMap output;
auto op = f::OpRegistry::CreateOp(op_type, {{"X", {"tmp"}}, {"Id", {"Id"}}},
output, attrs);
......
......@@ -19,6 +19,11 @@
#include "paddle/fluid/operators/math/complex_functors.h"
#include "paddle/fluid/platform/for_range.h"
// only can include the headers in paddle/pten/api dirs
#include "paddle/pten/api/include/core.h"
#include "paddle/pten/api/include/linalg.h"
#include "paddle/pten/hapi/lib/utils/tensor_utils.h"
namespace paddle {
namespace operators {
......@@ -228,48 +233,23 @@ struct DotGradFunction<DeviceContext, T, math::DisableComplex<T>> {
}
};
// See Note [ Why still keep the original kernel implementation? ]
template <typename DeviceContext, typename T>
class DotKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* tensor_x = ctx.Input<Tensor>("X");
auto* tensor_y = ctx.Input<Tensor>("Y");
auto* tensor_out = ctx.Output<Tensor>("Out");
tensor_out->mutable_data<T>(ctx.GetPlace());
#if defined(__NVCC__) || defined(__HIPCC__)
if (1 == tensor_out->dims().size()) {
auto out = framework::EigenScalar<T>::From(*tensor_out);
auto x = framework::EigenVector<T>::Flatten(*tensor_x);
auto y = framework::EigenVector<T>::Flatten(*tensor_y);
auto& dev = *ctx.template device_context<DeviceContext>().eigen_device();
out.device(dev) = (x * y).sum();
} else {
auto out = framework::EigenMatrix<T>::From(*tensor_out);
auto x = framework::EigenMatrix<T>::From(*tensor_x);
auto y = framework::EigenMatrix<T>::From(*tensor_y);
auto& dev = *ctx.template device_context<DeviceContext>().eigen_device();
out.device(dev) = (x * y).sum(Eigen::DSizes<int, 1>(1));
}
#else
auto const *x = tensor_x->data<T>(), *x_ = &x[0];
auto const *y = tensor_y->data<T>(), *y_ = &y[0];
auto* z = tensor_out->data<T>();
// Loop over the total N elements of both operands while sum-reducing every
// B pairs along the way where B is the dimension of the least ordered axis
auto&& d = tensor_x->dims();
auto const N = tensor_x->numel();
auto const B = d[d.size() - 1];
for (int j = 0; j < N / B; j++) {
T ss = 0;
for (int i = 0; i < B; i++) ss += (*x_++) * (*y_++);
z[j] = ss;
}
#endif
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Output<Tensor>("Out");
auto& dev_ctx = ctx.device_context<DeviceContext>();
out->mutable_data<T>(x->place());
auto pt_x = paddle::experimental::MakePtenDenseTensor(*x);
auto pt_y = paddle::experimental::MakePtenDenseTensor(*y);
auto pt_out = paddle::experimental::MakePtenDenseTensor(*out);
// call new kernel
pten::Dot<T>(dev_ctx, *pt_x.get(), *pt_y.get(), pt_out.get());
}
};
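(Editorial sketch, not part of this diff.) The rewritten DotKernel shows the porting pattern this PR applies to the retained fluid kernels: keep the fluid OpKernel registration and signature, but wrap the fluid Tensors into pten DenseTensors and forward the work to the new pten kernel. A generic, hedged sketch — SomeKernel and pten::SomeOp are placeholders, not real symbols:

template <typename DeviceContext, typename T>
class SomeKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto* x = ctx.Input<Tensor>("X");        // fluid input
    auto* out = ctx.Output<Tensor>("Out");   // fluid output
    out->mutable_data<T>(x->place());        // allocate output storage
    auto& dev_ctx = ctx.device_context<DeviceContext>();
    // bridge fluid Tensor -> pten DenseTensor
    auto pt_x = paddle::experimental::MakePtenDenseTensor(*x);
    auto pt_out = paddle::experimental::MakePtenDenseTensor(*out);
    pten::SomeOp<T>(dev_ctx, *pt_x.get(), pt_out.get());  // call the new kernel
  }
};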
......
......@@ -47,6 +47,12 @@ class FillAnyLikeOp : public framework::OperatorWithKernel {
expected_kernel_type.place_,
tensor.layout());
}
framework::KernelSignature GetExpectedPtenKernelArgs(
const framework::ExecutionContext &ctx) const override {
return framework::KernelSignature("fill_any_like", {"X"}, {"value"},
{"Out"});
}
};
class FillAnyLikeOpMaker : public framework::OpProtoAndCheckerMaker {
......
......@@ -17,7 +17,10 @@ limitations under the License. */
#include <limits>
#include <type_traits>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/framework/pten_utils.h"
#include "paddle/pten/api/include/core.h"
#include "paddle/pten/api/include/creation.h"
namespace paddle {
namespace operators {
......@@ -31,6 +34,7 @@ class FillAnyLikeKernel : public framework::OpKernel<T> {
float, T>::type>::type;
void Compute(const framework::ExecutionContext& context) const override {
auto* in = context.Input<framework::Tensor>("X");
auto* out = context.Output<framework::Tensor>("Out");
out->mutable_data<T>(context.GetPlace());
......@@ -58,9 +62,12 @@ class FillAnyLikeKernel : public framework::OpKernel<T> {
std::isnan(value), false,
platform::errors::InvalidArgument("The filled value is NaN."));
math::SetConstant<DeviceContext, T> setter;
setter(context.template device_context<DeviceContext>(), out,
static_cast<T>(value));
auto pt_x = paddle::experimental::MakePtenDenseTensor(*in);
auto pt_out = paddle::experimental::MakePtenDenseTensor(*out);
const auto& dev_ctx = context.template device_context<DeviceContext>();
// call new kernel
pten::FillAnyLike<T>(dev_ctx, *pt_x, value, pt_out.get());
}
};
......
......@@ -25,17 +25,6 @@ namespace cub = hipcub;
namespace paddle {
namespace operators {
template <typename T>
struct DivideFunctor {
HOSTDEVICE explicit inline DivideFunctor(int n)
: n_inv(static_cast<T>(1.0 / n)) {}
HOSTDEVICE inline T operator()(const T& x) const { return x * n_inv; }
private:
T n_inv;
};
template <typename T>
__global__ void MeanRunKernel(const T* in_data, T* out_data, int N) {
int idx = blockDim.x * blockIdx.x + threadIdx.x;
......@@ -45,37 +34,6 @@ __global__ void MeanRunKernel(const T* in_data, T* out_data, int N) {
}
}
template <typename DeviceContext, typename T>
class MeanCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* input = context.Input<Tensor>("X");
auto* output = context.Output<Tensor>("Out");
output->mutable_data<T>(context.GetPlace());
auto size_prob = input->numel();
const T* in_data = input->data<T>();
T* out_data = output->mutable_data<T>(context.GetPlace());
auto stream = context.cuda_device_context().stream();
DivideFunctor<T> transformer(size_prob);
cub::TransformInputIterator<T, DivideFunctor<T>, const T*> trans_x(
in_data, transformer);
size_t temp_storage_bytes = 0;
auto err = cub::DeviceReduce::Sum(nullptr, temp_storage_bytes, trans_x,
out_data, size_prob, stream);
PADDLE_ENFORCE_CUDA_SUCCESS(err);
framework::Tensor tmp;
auto* temp_storage = tmp.mutable_data<uint8_t>(
framework::make_ddim({static_cast<int64_t>(temp_storage_bytes)}),
context.GetPlace());
err = cub::DeviceReduce::Sum(temp_storage, temp_storage_bytes, trans_x,
out_data, size_prob, stream);
PADDLE_ENFORCE_CUDA_SUCCESS(err);
}
};
template <typename DeviceContext, typename T>
class MeanCUDAGradKernel : public framework::OpKernel<T> {
public:
......@@ -104,10 +62,11 @@ class MeanCUDAGradKernel : public framework::OpKernel<T> {
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_CUDA_KERNEL(
mean, ops::MeanCUDAKernel<paddle::platform::CUDADeviceContext, float>,
ops::MeanCUDAKernel<paddle::platform::CUDADeviceContext, double>,
ops::MeanCUDAKernel<paddle::platform::CUDADeviceContext, plat::float16>);
mean, ops::MeanKernel<paddle::platform::CUDADeviceContext, float>,
ops::MeanKernel<paddle::platform::CUDADeviceContext, double>,
ops::MeanKernel<paddle::platform::CUDADeviceContext, plat::float16>);
REGISTER_OP_CUDA_KERNEL(
mean_grad,
ops::MeanCUDAGradKernel<paddle::platform::CUDADeviceContext, float>,
......
......@@ -15,6 +15,12 @@ limitations under the License. */
#pragma once
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/pten_utils.h"
// can only include headers from the paddle/pten/api dirs
#include "paddle/pten/api/include/core.h"
#include "paddle/pten/api/include/math.h"
#include "paddle/pten/hapi/lib/utils/tensor_utils.h"
namespace paddle {
namespace operators {
......@@ -27,21 +33,40 @@ template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
/** [ Why still keep the original kernel implementation? ]
*
* Removal of the original kernel implementation and kernel registration needs
* to ensure that the new kernel mechanism adapts to multiple sets of execution
* mechanisms, including:
*
* 1. Executor and ParallelExecutor
* 2. Dygraph OpBase (Tracer and Engine)
* 3. New Executor
* 4. Predictor
 * 5. NPU and XPU, which lack kernels and need to reuse the CPU kernel
 *
 * Removing the original kernels requires a more complete solution to ensure
 * that the current execution system is not affected.
 * Currently, only the first two cases are adapted.
 *
 * The principle here is that the kernel implementation must reuse the
 * corresponding functions in the Tensor Operation library and must not
 * maintain two copies of the code.
*/
template <typename DeviceContext, typename T>
class MeanKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* input = context.Input<Tensor>("X");
auto* output = context.Output<Tensor>("Out");
output->mutable_data<T>(context.GetPlace());
auto* x = context.Input<Tensor>("X");
auto* out = context.Output<Tensor>("Out");
auto& dev_ctx = context.device_context<DeviceContext>();
out->mutable_data<T>(x->place());
auto X = EigenVector<T>::Flatten(*input);
auto y = EigenScalar<T>::From(*output);
auto& place =
*context.template device_context<DeviceContext>().eigen_device();
auto pt_x = paddle::experimental::MakePtenDenseTensor(*x);
auto pt_out = paddle::experimental::MakePtenDenseTensor(*out);
y.device(place) = X.mean();
// call new kernel
pten::Mean<T>(dev_ctx, *pt_x.get(), pt_out.get());
}
};
......
......@@ -70,6 +70,17 @@ class ScaleOp : public framework::OperatorWithKernel {
#endif
return framework::OpKernelType(input_data_type, ctx.GetPlace());
}
framework::KernelSignature GetExpectedPtenKernelArgs(
const framework::ExecutionContext &ctx) const override {
if (ctx.HasInput("ScaleTensor")) {
return framework::KernelSignature("scale.host", {"X", "ScaleTensor"},
{"bias", "bias_after_scale"}, {"Out"});
} else {
return framework::KernelSignature(
"scale", {"X"}, {"scale", "bias", "bias_after_scale"}, {"Out"});
}
}
};
class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
......
......@@ -14,9 +14,13 @@ limitations under the License. */
#pragma once
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/eigen/eigen_function.h"
#include "paddle/fluid/framework/pten_utils.h"
// can only include headers from the paddle/pten/api dirs
#include "paddle/pten/api/include/core.h"
#include "paddle/pten/api/include/math.h"
#include "paddle/pten/hapi/lib/utils/tensor_utils.h"
namespace paddle {
namespace operators {
......@@ -33,6 +37,7 @@ static inline T GetAttrFromTensor(const framework::Tensor* tensor) {
return tensor_data[0];
}
// See Note [ Why still keep the original kernel implementation? ]
template <typename DeviceContext, typename T>
class ScaleKernel : public framework::OpKernel<T> {
public:
......@@ -40,13 +45,13 @@ class ScaleKernel : public framework::OpKernel<T> {
auto* in_var = ctx.InputVar("X");
auto* in = framework::GetLoDTensorOrSelectedRowsValueFromVar(*in_var);
auto bias = static_cast<T>(ctx.Attr<float>("bias"));
auto bias = ctx.Attr<float>("bias");
auto bias_after_scale = ctx.Attr<bool>("bias_after_scale");
auto scale = static_cast<T>(ctx.Attr<float>("scale"));
auto scale = ctx.Attr<float>("scale");
if (ctx.HasInput("ScaleTensor")) {
auto* scale_tensor = ctx.Input<framework::Tensor>("ScaleTensor");
scale = GetAttrFromTensor<T>(scale_tensor);
scale = static_cast<float>(GetAttrFromTensor<T>(scale_tensor));
}
auto* out_var = ctx.OutputVar("Out");
......@@ -56,22 +61,17 @@ class ScaleKernel : public framework::OpKernel<T> {
out_slr->set_rows(in_slr.rows());
out_slr->set_height(in_slr.height());
}
auto* out =
framework::GetMutableLoDTensorOrSelectedRowsValueFromVar(out_var);
out->mutable_data<T>(in->place());
auto& dev_ctx = ctx.device_context<DeviceContext>();
PADDLE_ENFORCE_EQ(in->dims(), out->dims(),
paddle::platform::errors::InvalidArgument(
"the input and output should have the same dim"
"but input dim is %s, output dim is %s",
in->dims(), out->dims()));
auto pt_x = paddle::experimental::MakePtenDenseTensor(*in);
auto pt_out = paddle::experimental::MakePtenDenseTensor(*out);
auto eigen_out = framework::EigenVector<T>::Flatten(*out);
auto eigen_in = framework::EigenVector<T>::Flatten(*in);
auto& dev = *ctx.template device_context<DeviceContext>().eigen_device();
EigenScale<std::decay_t<decltype(dev)>, T>::Eval(
dev, eigen_out, eigen_in, scale, bias, bias_after_scale);
// call new kernel
pten::Scale<T>(dev_ctx, *pt_x.get(), scale, bias, bias_after_scale,
pt_out.get());
}
};
......
......@@ -16,24 +16,31 @@ limitations under the License. */
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/pten_utils.h"
#include "paddle/fluid/operators/eigen/eigen_function.h"
// can only include headers from the paddle/pten/api dirs
#include "paddle/pten/api/include/core.h"
#include "paddle/pten/api/include/math.h"
namespace paddle {
namespace operators {
// See Note [ Why still keep the original kernel implementation? ]
template <typename DeviceContext, typename T>
class SignKernel : public framework::OpKernel<T> {
public:
virtual void Compute(const framework::ExecutionContext& context) const {
auto* x = context.Input<framework::Tensor>("X");
auto* out = context.Output<framework::Tensor>("Out");
auto* in = context.Input<framework::Tensor>("X");
out->mutable_data<T>(in->place());
auto eigen_out = framework::EigenVector<T>::Flatten(*out);
auto eigen_in = framework::EigenVector<T>::Flatten(*in);
auto& place =
*context.template device_context<DeviceContext>().eigen_device();
EigenSign<std::decay_t<decltype(place)>, T>::Eval(place, eigen_out,
eigen_in);
auto& dev_ctx = context.device_context<DeviceContext>();
out->mutable_data<T>(x->place());
auto pt_x = paddle::experimental::MakePtenDenseTensor(*x);
auto pt_out = paddle::experimental::MakePtenDenseTensor(*out);
// call new kernel
pten::Sign<T>(dev_ctx, *pt_x.get(), pt_out.get());
}
};
......
......@@ -109,7 +109,6 @@ register_unity_group(cc
gaussian_random_batch_size_like_op.cc
gaussian_random_op.cc
mkldnn/gaussian_random_mkldnn_op.cc
grid_sampler_op.cc
group_norm_op.cc gru_op.cc)
register_unity_group(cc
hash_op.cc
......
......@@ -169,7 +169,7 @@ if(WITH_GPU)
nv_test(device_event_test SRCS device_event_test.cc DEPS device_event_gpu)
nv_test(device_context_test SRCS device_context_test.cu DEPS device_context gpu_info)
nv_test(cudnn_helper_test SRCS cudnn_helper_test.cc DEPS dynload_cuda)
nv_test(cudnn_helper_test SRCS cudnn_helper_test.cc DEPS dynload_cuda pten)
nv_test(cudnn_desc_test SRCS cudnn_desc_test.cc DEPS dynload_cuda)
nv_test(transform_test SRCS transform_test.cu DEPS memory place device_context)
endif()
......
......@@ -188,11 +188,8 @@ struct TypeConverterImpl<T1, T2, false> {
template <typename T1, typename T2>
struct TypeConverter {
private:
static constexpr bool kIsArithmetic =
IsArithmetic<T1>() && IsArithmetic<T2>();
public:
using Type1 = typename TypeConverterImpl<T1, T2, kIsArithmetic>::Type1;
using Type2 = typename TypeConverterImpl<T1, T2, kIsArithmetic>::Type2;
};
......
......@@ -681,6 +681,18 @@ PADDLE_DEFINE_EXPORTED_bool(
apply_pass_to_program, false,
"It controls whether to apply IR pass to program when using Fleet APIs");
/**
* Pt kernel related FLAG
* Name: FLAGS_run_pten_kernel
* Since Version: 2.3.0
 * Value Range: bool, default=true
 * Example: FLAGS_run_pten_kernel=true would use the pten kernel to compute in
 * the Op.
* Note:
*/
PADDLE_DEFINE_EXPORTED_bool(run_pten_kernel, true,
"It controls whether to use pten kernel");
/**
* Distributed related FLAG
* Name: FLAGS_allreduce_record_one_event
......
......@@ -38,12 +38,13 @@ limitations under the License. */
#endif
#endif
#include <boost/any.hpp>
#include <boost/mpl/comparison.hpp>
#include <boost/mpl/less_equal.hpp>
#include <boost/optional.hpp>
#include <boost/variant.hpp>
#include "paddle/utils/any.h"
#include "paddle/utils/optional.h"
// some platform-independent definitions
#if defined(_WIN32)
#define UNUSED
......
......@@ -567,7 +567,9 @@ GenerateOpFunctions() {
auto& op_type = op_proto->type();
// Skip operators that do not inherit from OperatorWithKernel, like while,
// since only OperatorWithKernel can run in dygraph mode.
if (!all_kernels.count(op_type)) {
// If the pten lib contains the op's kernel, we still generate the op's method
if (!all_kernels.count(op_type) &&
!pten::KernelFactory::Instance().HasCompatiblePtenKernel(op_type)) {
continue;
}
......
# pten api
add_subdirectory(api)
# pten high level api
add_subdirectory(hapi)
# pten core components
add_subdirectory(core)
# pten kernels for diff device
add_subdirectory(kernels)
# pten infershape
add_subdirectory(infershape)
# pten tests
add_subdirectory(tests)
set(PTEN_DEPS convert_utils dense_tensor kernel_factory kernel_context)
set(PTEN_DEPS ${PTEN_DEPS} math_cpu linalg_cpu creation_cpu manipulation_cpu)
set(PTEN_DEPS ${PTEN_DEPS} unary binary)
if(WITH_GPU OR WITH_ROCM)
set(PTEN_DEPS ${PTEN_DEPS} math_cuda linalg_cuda creation_cuda manipulation_cuda)
endif()
cc_library(pten SRCS all.cc DEPS ${PTEN_DEPS})
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pten/api/all.h"
namespace pten {} // namespace pten
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
// develop apis
#include "paddle/pten/api/include/core.h"
#include "paddle/pten/api/include/creation.h"
#include "paddle/pten/api/include/infershape.h"
#include "paddle/pten/api/include/linalg.h"
#include "paddle/pten/api/include/manipulation.h"
#include "paddle/pten/api/include/math.h"
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
// See Note: [ How do we organize the kernel directory ]
#include "paddle/pten/core/convert_utils.h"
#include "paddle/pten/core/dense_tensor.h"
#include "paddle/pten/core/kernel_context.h"
#include "paddle/pten/core/kernel_factory.h"
#include "paddle/pten/core/tensor_meta.h"
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "paddle/pten/kernels/cpu/creation.h"
#include "paddle/pten/kernels/cuda/creation.h"
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
// See Note: [ How do we organize the kernel directory ]
#include "paddle/pten/infershape/binary.h"
#include "paddle/pten/infershape/unary.h"
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
// See Note: [ How do we organize the kernel directory ]
#include "paddle/pten/kernels/cpu/linalg.h"
#include "paddle/pten/kernels/cuda/linalg.h"
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
// See Note: [ How do we organize the kernel directory ]
#include "paddle/pten/kernels/cpu/manipulation.h"
#include "paddle/pten/kernels/cuda/manipulation.h"
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
// See Note: [ How do we organize the kernel directory ]
#include "paddle/pten/kernels/cpu/math.h"
#include "paddle/pten/kernels/cuda/math.h"
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <ostream>
#include "paddle/fluid/platform/enforce.h"
namespace paddle {
namespace experimental {
/**
* [ Why need Backend? ]
*
* Backend not only means place. Backend is a superset of place.
*
* Place cannot indicate the difference in calculation methods on the device,
* but in order to make the boundary of the kernel clearer and the function
* more specific, we need to distinguish the calculation method.
*
 * For example, a kernel for the CPU device can be a native CPU kernel,
 * or a kernel implemented with the MKLDNN library.
*
 * Note(chenweihang): HIP is not needed now, we can add it if needed
* in the future
*/
enum class Backend : uint8_t {
// kernel backend cannot be undefined
UNDEFINED = 0,
// basic kernel backend
CPU,
// various acceleration devices' backends
CUDA,
XPU, // XPU currently does not exist at the same time as CUDA
NPU, // NPU currently does not exist at the same time as CUDA
// the third library backend
MKLDNN,
CUDNN,
// end of backend types
NUM_BACKENDS,
};
inline std::ostream& operator<<(std::ostream& os, Backend backend) {
switch (backend) {
case Backend::UNDEFINED:
os << "Undefined";
break;
case Backend::CPU:
os << "CPU";
break;
case Backend::CUDA:
os << "CUDA";
break;
case Backend::XPU:
os << "XPU";
break;
case Backend::NPU:
os << "NPU";
break;
case Backend::MKLDNN:
os << "MKLDNN";
break;
case Backend::CUDNN:
os << "CUDNN";
break;
default:
PADDLE_THROW(platform::errors::InvalidArgument(
"Invalid enum backend type `%d`.", static_cast<int>(backend)));
}
return os;
}
} // namespace experimental
} // namespace paddle
namespace pten {
using Backend = paddle::experimental::Backend;
}
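To make the dispatch role of Backend above concrete, here is a minimal usage sketch. It is illustrative only and not part of the change; PickBackend and the use_gpu switch are hypothetical, while the enum, its operator<<, and the paddle/pten/common/backend.h path come from this PR.

// Illustrative sketch, not part of the diff.
#include <iostream>
#include "paddle/pten/common/backend.h"

// Hypothetical helper: choose a kernel backend from a user-level switch.
pten::Backend PickBackend(bool use_gpu) {
  return use_gpu ? pten::Backend::CUDA : pten::Backend::CPU;
}

int main() {
  pten::Backend backend = PickBackend(/*use_gpu=*/false);
  // The operator<< defined above prints the readable name, e.g. "CPU".
  std::cout << "selected backend: " << backend << std::endl;
  return 0;
}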
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/platform/bfloat16.h"
#include "paddle/fluid/platform/complex.h"
#include "paddle/fluid/platform/enforce.h"
#include "paddle/fluid/platform/float16.h"
namespace paddle {
namespace experimental {
using complex64 = ::paddle::platform::complex<float>;
using complex128 = ::paddle::platform::complex<double>;
using float16 = ::paddle::platform::float16;
using bfloat16 = ::paddle::platform::bfloat16;
enum class DataType {
UNDEFINED = 0,
BOOL,
INT8, // Char
UINT8, // Byte
INT16,
INT32,
UINT32,
INT64,
UINT64,
BFLOAT16,
FLOAT16,
UINT16,
FLOAT32,
FLOAT64,
COMPLEX64,
COMPLEX128,
NUM_DATA_TYPES
};
inline size_t SizeOf(DataType data_type) {
switch (data_type) {
case DataType::BOOL:
case DataType::UINT8:
case DataType::INT8:
return 1;
case DataType::BFLOAT16:
case DataType::FLOAT16:
case DataType::INT16:
case DataType::UINT16:
return 2;
case DataType::FLOAT32:
case DataType::INT32:
case DataType::UINT32:
return 4;
case DataType::FLOAT64:
case DataType::INT64:
case DataType::UINT64:
case DataType::COMPLEX64:
return 8;
case DataType::COMPLEX128:
return 16;
case DataType::UNDEFINED:
case DataType::NUM_DATA_TYPES:
PADDLE_THROW(platform::errors::Unimplemented(
"Data type %d is not supported by tensor.",
static_cast<int>(data_type)));
}
return 0;
}
#define PT_FOR_EACH_DATA_TYPE(_) \
_(bool, DataType::BOOL) \
_(int8_t, DataType::INT8) \
_(uint8_t, DataType::UINT8) \
_(int16_t, DataType::INT16) \
_(uint16_t, DataType::UINT16) \
_(int32_t, DataType::INT32) \
_(uint32_t, DataType::UINT32) \
_(int64_t, DataType::INT64) \
_(uint64_t, DataType::UINT64) \
_(bfloat16, DataType::BFLOAT16) \
_(float16, DataType::FLOAT16) \
_(float, DataType::FLOAT32) \
_(double, DataType::FLOAT64) \
_(complex64, DataType::COMPLEX64) \
_(complex128, DataType::COMPLEX128)
template <DataType T>
struct DataTypeToCppType;
template <typename T>
struct CppTypeToDataType;
#define PT_SPECIALIZE_DataTypeToCppType(cpp_type, data_type) \
template <> \
struct DataTypeToCppType<data_type> { \
using type = cpp_type; \
};
PT_FOR_EACH_DATA_TYPE(PT_SPECIALIZE_DataTypeToCppType)
#undef PT_SPECIALIZE_DataTypeToCppType
#define PT_SPECIALIZE_CppTypeToDataType(cpp_type, data_type) \
template <> \
struct CppTypeToDataType<cpp_type> { \
constexpr static DataType Type() { return data_type; } \
};
PT_FOR_EACH_DATA_TYPE(PT_SPECIALIZE_CppTypeToDataType)
#undef PT_SPECIALIZE_CppTypeToDataType
inline std::ostream& operator<<(std::ostream& os, DataType dtype) {
switch (dtype) {
case DataType::UNDEFINED:
os << "Undefined";
break;
case DataType::BOOL:
os << "bool";
break;
case DataType::INT8:
os << "int8";
break;
case DataType::UINT8:
os << "uint8";
break;
case DataType::INT16:
os << "int16";
break;
case DataType::UINT16:
os << "uint16";
break;
case DataType::INT32:
os << "int32";
break;
case DataType::UINT32:
os << "uint32";
break;
case DataType::INT64:
os << "int64";
break;
case DataType::UINT64:
os << "uint64";
break;
case DataType::BFLOAT16:
os << "bfloat16";
break;
case DataType::FLOAT16:
os << "float16";
break;
case DataType::FLOAT32:
os << "float32";
break;
case DataType::FLOAT64:
os << "float64";
break;
case DataType::COMPLEX64:
os << "complex64";
break;
case DataType::COMPLEX128:
os << "complex128";
break;
default:
PADDLE_THROW(platform::errors::InvalidArgument(
"Invalid enum data type `%d`.", static_cast<int>(dtype)));
}
return os;
}
} // namespace experimental
} // namespace paddle
namespace pten {
using DataType = paddle::experimental::DataType;
}
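As a quick check of how the data type utilities above fit together, the following standalone sketch uses SizeOf together with the PT_SPECIALIZE_* trait specializations. It is illustrative only; the header path paddle/pten/common/data_type.h is taken from the include list that appears later in this diff.

// Illustrative sketch, not part of the diff.
#include <iostream>
#include <type_traits>
#include "paddle/pten/common/data_type.h"

int main() {
  using paddle::experimental::DataType;
  // Compile-time mapping from a C++ type to the DataType enum value.
  constexpr DataType dt =
      paddle::experimental::CppTypeToDataType<float>::Type();  // FLOAT32
  // And back again, from the enum value to the C++ type.
  static_assert(
      std::is_same<
          paddle::experimental::DataTypeToCppType<DataType::FLOAT32>::type,
          float>::value,
      "round-trip between float and DataType::FLOAT32");
  std::cout << dt << " occupies " << SizeOf(dt) << " bytes" << std::endl;
  return 0;
}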
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/fluid/platform/enforce.h"
namespace paddle {
namespace experimental {
enum class DataLayout {
UNDEFINED = 0,
ANY,
NHWC,
NCHW,
MKLDNN,
NUM_DATA_LAYOUTS,
};
inline std::ostream& operator<<(std::ostream& os, DataLayout layout) {
switch (layout) {
case DataLayout::UNDEFINED:
os << "Undefined";
break;
case DataLayout::ANY:
os << "Any";
break;
case DataLayout::NHWC:
os << "NHWC";
break;
case DataLayout::NCHW:
os << "NCHW";
break;
case DataLayout::MKLDNN:
os << "MKLDNN";
break;
default:
PADDLE_THROW(platform::errors::InvalidArgument(
"Invalid enum data layout type `%d`.", static_cast<int>(layout)));
}
return os;
}
} // namespace experimental
} // namespace paddle
namespace pten {
using DataLayout = paddle::experimental::DataLayout;
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <cstdint>
#include "paddle/fluid/platform/enforce.h"
namespace paddle {
namespace experimental {
class Scalar {
public:
// Constructors support implicit conversion
Scalar(float val) : tag(Tag::HAS_F) { data_.f = val; } // NOLINT
Scalar(double val) : tag(Tag::HAS_D) { data_.d = val; } // NOLINT
Scalar(int32_t val) : tag(Tag::HAS_I32) { data_.i32 = val; } // NOLINT
Scalar(int64_t val) : tag(Tag::HAS_I64) { data_.i64 = val; } // NOLINT
Scalar(bool val) : tag(Tag::HAS_B) { data_.b = val; } // NOLINT
template <typename T>
inline T to() const {
switch (tag) {
case Tag::HAS_F:
return static_cast<T>(data_.f);
case Tag::HAS_D:
return static_cast<T>(data_.d);
case Tag::HAS_I32:
return static_cast<T>(data_.i32);
case Tag::HAS_I64:
return static_cast<T>(data_.i64);
case Tag::HAS_B:
return static_cast<T>(data_.b);
default:
PADDLE_THROW(platform::errors::InvalidArgument(
"Invalid enum scalar type tag `%d`.", static_cast<int>(tag)));
}
}
private:
enum class Tag { HAS_F, HAS_D, HAS_I32, HAS_I64, HAS_B };
Tag tag;
union data {
float f;
double d;
int32_t i32;
int64_t i64;
bool b;
} data_;
};
} // namespace experimental
} // namespace paddle
namespace pten {
using Scalar = paddle::experimental::Scalar;
}
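A short usage sketch of Scalar follows. It is illustrative and not part of the diff; the header path is an assumption (only the neighbouring common/ headers are confirmed elsewhere in this PR), and FillLike is a hypothetical function, not the real fill_any_like kernel.

// Illustrative sketch, not part of the diff.
// Assumed header path; only the surrounding common/ headers are confirmed.
#include "paddle/pten/common/scalar.h"

// Hypothetical consumer: one Scalar parameter covers int, float, double, bool.
void FillLike(const pten::Scalar& value) {
  float v = value.to<float>();  // convert back to the type the kernel needs
  (void)v;
}

void Demo() {
  FillLike(1);     // int32_t -> Tag::HAS_I32
  FillLike(2.5);   // double  -> Tag::HAS_D
  FillLike(true);  // bool    -> Tag::HAS_B
}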
IF(WITH_MKLDNN)
set(MKLDNN_CTX_DEPS mkldnn)
ELSE()
set(MKLDNN_CTX_DEPS)
ENDIF()
if(WITH_GPU)
cc_library(convert_utils SRCS convert_utils.cc DEPS data_type place gpu_info)
elseif(WITH_ROCM)
cc_library(convert_utils SRCS convert_utils.cc DEPS data_type place gpu_info)
else()
cc_library(convert_utils SRCS convert_utils.cc DEPS data_type place)
endif()
cc_library(kernel_factory SRCS kernel_factory.cc DEPS enforce)
cc_library(kernel_context SRCS kernel_context.cc DEPS enforce device_context)
cc_library(tensor_base SRCS tensor_base.cc allocator.cc storage.cc DEPS enforce)
cc_library(dense_tensor SRCS dense_tensor.cc DEPS tensor_base)
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pten/core/allocator.h"
namespace pten {} // namespace pten
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <cstdint>
#include "paddle/fluid/platform/place.h"
namespace pten {
/// \brief Encapsulates strategies for access/addressing, allocation/
/// deallocation and construction/destruction of objects.
class RawAllocator {
public:
using Place = paddle::platform::Place;
/// \brief Default destructor.
virtual ~RawAllocator() = default;
/// \brief Allocates storage suitable for an array object of n bytes
/// and creates the array, but does not construct array elements.
/// May throw exceptions.
/// \param bytes_size The number of bytes to allocate.
/// \return The first address allocated.
virtual void* Allocate(size_t bytes_size) = 0;
/// \brief Deallocates storage pointed to by ptr, which must be a value
/// returned by a previous call to allocate that has not been
/// invalidated by an intervening call to deallocate. The bytes_size
/// must match the value previously passed to allocate.
/// \param ptr The first address to deallocate.
/// \param bytes_size The number of bytes to deallocate.
virtual void Deallocate(void* ptr, size_t bytes_size) = 0;
/// \brief Get the place value of the allocator and the allocation.
/// \return The place value of the allocator and the allocation.
virtual const Place& place() const = 0;
};
/// \brief Fancy pointer with context. The use of this data type
/// is to be compatible with allocators from different frameworks
/// without significant performance loss. This class does not
/// support being inherited.
class Allocation final {
public:
using Place = paddle::platform::Place;
using DeleterFnPtr = void (*)(void*);
Allocation() = default;
Allocation(Allocation&&) = default;
Allocation& operator=(Allocation&&) = default;
Allocation(void* data, const Place& place) : data_(data), place_(place) {}
Allocation(void* data,
void* ctx,
DeleterFnPtr ctx_deleter,
const Place& place)
: data_(data), ctx_(ctx, ctx_deleter), place_(place) {}
void* operator->() const noexcept { return data_; }
operator bool() const noexcept { return data_ || ctx_.Get(); }
const Place& place() const noexcept { return place_; }
void Clear() noexcept {
data_ = nullptr;
ctx_.Clear();
}
/// \brief Statically cast the void pointer of the context object to
/// the primitive type. Conversion of any pointer to void* and back
/// to pointer to the original cv type preserves its original value.
/// \param T The primitive type name of the context pointer.
/// \param expected_deleter The deleter passed in to enhance type
/// safety checking.
template <typename T>
T* CastContext(DeleterFnPtr expected_deleter) const noexcept {
if (ctx_.deleter() != expected_deleter) {
return nullptr;
}
return static_cast<T*>(ctx_.Get());
}
public:
class Context {
public:
Context() = default;
Context(void* ctx, DeleterFnPtr deleter) noexcept : ctx_(ctx),
deleter_(deleter) {}
Context(Context&& other) noexcept {
// Exchange them explicitly to avoid the move being equivalent
// to a copy.
swap(*this, other);
}
Context& operator=(Context&& other) noexcept {
swap(*this, other);
return *this;
}
~Context() {
if (deleter_) {
deleter_(ctx_);
}
}
void Clear() noexcept {
ctx_ = nullptr;
deleter_ = nullptr;
}
void* Get() const noexcept { return ctx_; }
DeleterFnPtr deleter() const noexcept { return deleter_; }
void* Release() noexcept {
deleter_ = nullptr;
return ctx_;
}
friend void swap(Context& a, Context& b) noexcept;
private:
void* ctx_{nullptr};
DeleterFnPtr deleter_{nullptr};
};
private:
void* data_{nullptr};
Context ctx_;
// TODO(Shixiaowei02): Enum needs to be used instead to reduce
// the construction overhead by more than 50%.
Place place_;
};
inline void swap(Allocation::Context& a, Allocation::Context& b) noexcept {
::std::swap(a.ctx_, b.ctx_);
::std::swap(a.deleter_, b.deleter_);
}
/// \brief Context compatible allocator interface. This allocator is
/// mainly used for general data structures such as Tensor. The raw
/// allocator is more universal and efficient.
class Allocator {
public:
virtual ~Allocator() = default;
virtual Allocation Allocate(size_t bytes_size) = 0;
};
inline Allocation Allocate(const std::shared_ptr<Allocator>& a, size_t n) {
CHECK(a);
return a->Allocate(n);
}
} // namespace pten
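To illustrate how the Allocator/Allocation interfaces above are meant to be implemented, here is a toy host allocator based on plain malloc/free. It is a sketch only; example::HostAllocator is hypothetical and not part of this change.

// Illustrative sketch, not part of the diff.
#include <cstdlib>
#include "paddle/pten/core/allocator.h"

namespace example {

class HostAllocator : public pten::Allocator {
 public:
  pten::Allocation Allocate(size_t bytes_size) override {
    void* data = std::malloc(bytes_size);
    // The pointer is also stored as the context, with std::free as its
    // deleter, so the memory is released when the Allocation is destroyed.
    return pten::Allocation(data, data, Delete, paddle::platform::CPUPlace());
  }

 private:
  static void Delete(void* ptr) { std::free(ptr); }
};

}  // namespace example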
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pten/core/convert_utils.h"
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/platform/gpu_info.h"
namespace pten {
// TODO(chenweihang): Add other place trans cases later
Backend TransToPtenBackend(const paddle::platform::Place& place) {
if (paddle::platform::is_cpu_place(place)) {
return Backend::CPU;
} else if (paddle::platform::is_gpu_place(place)) {
return Backend::CUDA;
} else {
return Backend::UNDEFINED;
}
}
paddle::experimental::DataType TransToPtenDataType(
const paddle::framework::proto::VarType::Type& dtype) {
// Set the order of case branches according to the frequency with which
// the data type is used
switch (dtype) {
case paddle::framework::proto::VarType::FP32:
return DataType::FLOAT32;
case paddle::framework::proto::VarType::FP64:
return DataType::FLOAT64;
case paddle::framework::proto::VarType::INT64:
return DataType::INT64;
case paddle::framework::proto::VarType::INT32:
return DataType::INT32;
case paddle::framework::proto::VarType::INT8:
return DataType::INT8;
case paddle::framework::proto::VarType::UINT8:
return DataType::UINT8;
case paddle::framework::proto::VarType::INT16:
return DataType::INT16;
case paddle::framework::proto::VarType::COMPLEX64:
return DataType::COMPLEX64;
case paddle::framework::proto::VarType::COMPLEX128:
return DataType::COMPLEX128;
case paddle::framework::proto::VarType::FP16:
return DataType::FLOAT16;
case paddle::framework::proto::VarType::BF16:
return DataType::BFLOAT16;
case paddle::framework::proto::VarType::BOOL:
return DataType::BOOL;
default:
return DataType::UNDEFINED;
}
}
DataLayout TransToPtenDataLayout(const paddle::framework::DataLayout& layout) {
switch (layout) {
case paddle::framework::DataLayout::kNHWC:
return DataLayout::NHWC;
case paddle::framework::DataLayout::kNCHW:
return DataLayout::NCHW;
case paddle::framework::DataLayout::kAnyLayout:
return DataLayout::ANY;
case paddle::framework::DataLayout::kMKLDNN:
return DataLayout::MKLDNN;
default:
return DataLayout::UNDEFINED;
}
}
paddle::platform::Place TransToFluidPlace(const Backend& backend) {
// TODO(chenweihang): add other trans cases later
switch (backend) {
case pten::Backend::CPU:
return paddle::platform::CPUPlace();
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
case pten::Backend::CUDA:
return paddle::platform::CUDAPlace(
paddle::platform::GetCurrentDeviceId());
#endif
#ifdef PADDLE_WITH_MKLDNN
case pten::Backend::MKLDNN:
return paddle::platform::CPUPlace();
#endif
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
case pten::Backend::CUDNN:
return paddle::platform::CUDAPlace(
paddle::platform::GetCurrentDeviceId());
#endif
default:
PADDLE_THROW(paddle::platform::errors::Unimplemented(
"Unsupported backend `%s` when casting it to paddle place type.",
backend));
}
}
paddle::framework::proto::VarType::Type TransToProtoVarType(
const paddle::experimental::DataType& dtype) {
// Set the order of case branches according to the frequency with which
// the data type is used
switch (dtype) {
case DataType::FLOAT32:
return paddle::framework::proto::VarType::FP32;
case DataType::FLOAT64:
return paddle::framework::proto::VarType::FP64;
case DataType::INT64:
return paddle::framework::proto::VarType::INT64;
case DataType::INT32:
return paddle::framework::proto::VarType::INT32;
case DataType::INT8:
return paddle::framework::proto::VarType::INT8;
case DataType::UINT8:
return paddle::framework::proto::VarType::UINT8;
case DataType::INT16:
return paddle::framework::proto::VarType::INT16;
case DataType::COMPLEX64:
return paddle::framework::proto::VarType::COMPLEX64;
case DataType::COMPLEX128:
return paddle::framework::proto::VarType::COMPLEX128;
case DataType::FLOAT16:
return paddle::framework::proto::VarType::FP16;
case DataType::BFLOAT16:
return paddle::framework::proto::VarType::BF16;
case DataType::BOOL:
return paddle::framework::proto::VarType::BOOL;
default:
PADDLE_THROW(paddle::platform::errors::Unimplemented(
"Unsupported data type `%s` when casting it into "
"paddle data type.",
dtype));
}
}
paddle::framework::DataLayout TransToFluidDataLayout(const DataLayout& layout) {
switch (layout) {
case DataLayout::NHWC:
return paddle::framework::DataLayout::kNHWC;
case DataLayout::NCHW:
return paddle::framework::DataLayout::kNCHW;
case DataLayout::ANY:
return paddle::framework::DataLayout::kAnyLayout;
case DataLayout::MKLDNN:
return paddle::framework::DataLayout::kMKLDNN;
default:
PADDLE_THROW(paddle::platform::errors::Unimplemented(
"Unsupported data layout `%s` when casting it into "
"paddle data layout.",
layout));
}
}
} // namespace pten
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/pten/common/backend.h"
#include "paddle/pten/common/data_type.h"
#include "paddle/pten/common/layout.h"
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/framework/data_layout.h"
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/platform/place.h"
// TODO(chenweihang): this file may need to be removed
namespace pten {
using DataType = paddle::experimental::DataType;
using DataLayout = paddle::experimental::DataLayout;
Backend TransToPtenBackend(const paddle::platform::Place& place);
DataType TransToPtenDataType(
const paddle::framework::proto::VarType::Type& dtype);
DataLayout TransToPtenDataLayout(const paddle::framework::DataLayout& layout);
paddle::platform::Place TransToFluidPlace(const Backend& backend);
paddle::framework::proto::VarType::Type TransToProtoVarType(
const DataType& dtype);
paddle::framework::DataLayout TransToFluidDataLayout(const DataLayout& layout);
} // namespace pten
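The conversion helpers declared above can be exercised with a small round-trip sketch (illustrative only, not part of the diff):

// Illustrative sketch, not part of the diff.
#include "paddle/pten/core/convert_utils.h"

void ConvertDemo() {
  paddle::platform::CPUPlace cpu;
  // fluid place -> pten backend, and back to a fluid place.
  pten::Backend backend = pten::TransToPtenBackend(cpu);  // Backend::CPU
  paddle::platform::Place place = pten::TransToFluidPlace(backend);
  // fluid proto var type -> pten data type.
  pten::DataType dtype =
      pten::TransToPtenDataType(paddle::framework::proto::VarType::FP32);
  (void)place;
  (void)dtype;
}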
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pten/core/dense_tensor.h"
namespace pten {
DenseTensor::DenseTensor(const std::shared_ptr<Allocator>& a,
const DenseTensorMeta& meta)
: meta_(meta),
storage_(
make_intrusive<TensorStorage>(a, SizeOf(data_type()) * numel())) {}
DenseTensor::DenseTensor(const std::shared_ptr<Allocator>& a,
DenseTensorMeta&& meta)
: meta_(std::move(meta)),
storage_(
make_intrusive<TensorStorage>(a, SizeOf(data_type()) * numel())) {}
DenseTensor::DenseTensor(intrusive_ptr<Storage> storage,
const DenseTensorMeta& meta)
: meta_(meta), storage_(std::move(storage)) {}
DenseTensor::DenseTensor(intrusive_ptr<Storage> storage, DenseTensorMeta&& meta)
: meta_(std::move(meta)), storage_(std::move(storage)) {}
int64_t DenseTensor::numel() const {
if (meta_.is_scalar) {
return 1;
}
return product(meta_.dims);
}
bool DenseTensor::IsSharedWith(const DenseTensor& b) const {
return storage_.get() == b.storage_.get() && storage_.get() != nullptr;
}
void* DenseTensor::mutable_data(size_t request_bytes) {
PADDLE_ENFORCE(
valid(),
paddle::platform::errors::PreconditionNotMet(
"The meta data must be valid when call the mutable data function."));
PADDLE_ENFORCE_NOT_NULL(
storage_,
paddle::platform::errors::PreconditionNotMet(
"The storage must be valid when call the mutable data function."));
size_t bytes = numel() * SizeOf(data_type());
if (request_bytes) {
PADDLE_ENFORCE_GE(request_bytes,
bytes,
paddle::platform::errors::InvalidArgument(
"The reserved size %d should be enough to meet the "
"volume required by metadata %d.",
request_bytes,
bytes));
bytes = request_bytes;
}
if (storage_->size() < bytes) {
storage_->Realloc(bytes);
}
return storage_->data();
}
template <typename T>
T* DenseTensor::mutable_data() {
PADDLE_ENFORCE(
(data_type() == paddle::experimental::CppTypeToDataType<T>::Type()),
paddle::platform::errors::PreconditionNotMet(
"The type of data (%d) we are trying to retrieve does not match the "
"type of data currently contained in the container (%d).",
static_cast<int>(paddle::experimental::CppTypeToDataType<T>::Type()),
static_cast<int>(data_type())));
return static_cast<T*>(mutable_data());
}
template <typename T>
const T* DenseTensor::data() const {
PADDLE_ENFORCE(
(data_type() == paddle::experimental::CppTypeToDataType<T>::Type()),
paddle::platform::errors::PreconditionNotMet(
"The type of data we are trying to retrieve does not match the "
"type of data currently contained in the container."));
return static_cast<const T*>(data());
}
const void* DenseTensor::data() const {
PADDLE_ENFORCE_NOT_NULL(
storage_,
paddle::platform::errors::PreconditionNotMet(
"The storage must be valid when call the mutable data function."));
return storage_->data();
}
void DenseTensor::check_memory_size() const {
size_t bytes = numel() * SizeOf(data_type());
PADDLE_ENFORCE_GE(memory_size(),
bytes,
paddle::platform::errors::InvalidArgument(
"The memory size %d should be enough to meet the "
"volume required by metadata %d.",
memory_size(),
bytes));
}
#define DATA_MEMBER_FUNC_INSTANTIATION(dtype) \
template dtype* DenseTensor::mutable_data(); \
template const dtype* DenseTensor::data() const;
DATA_MEMBER_FUNC_INSTANTIATION(bool);
DATA_MEMBER_FUNC_INSTANTIATION(int8_t);
DATA_MEMBER_FUNC_INSTANTIATION(uint8_t);
DATA_MEMBER_FUNC_INSTANTIATION(int16_t);
DATA_MEMBER_FUNC_INSTANTIATION(uint16_t);
DATA_MEMBER_FUNC_INSTANTIATION(int32_t);
DATA_MEMBER_FUNC_INSTANTIATION(uint32_t);
DATA_MEMBER_FUNC_INSTANTIATION(int64_t);
DATA_MEMBER_FUNC_INSTANTIATION(uint64_t);
DATA_MEMBER_FUNC_INSTANTIATION(::paddle::platform::bfloat16);
DATA_MEMBER_FUNC_INSTANTIATION(::paddle::platform::float16);
DATA_MEMBER_FUNC_INSTANTIATION(float);
DATA_MEMBER_FUNC_INSTANTIATION(double);
DATA_MEMBER_FUNC_INSTANTIATION(::paddle::experimental::complex64);
DATA_MEMBER_FUNC_INSTANTIATION(::paddle::experimental::complex128);
#undef DATA_MEMBER_FUNC_INSTANTIATION
} // namespace pten
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/pten/core/allocator.h"
#include "paddle/pten/core/storage.h"
#include "paddle/pten/core/tensor_base.h"
#include "paddle/pten/core/tensor_meta.h"
namespace pten {
/// \brief The Dense tensor stores values in a contiguous sequential block
/// of memory where all values are represented. Tensors or multi-dimensional
/// arrays are used in math operators.
/// During the entire life cycle of a DenseTensor, its device type and key
/// metadata remain unchanged.
class DenseTensor : public TensorBase,
public TypeInfoTraits<TensorBase, DenseTensor> {
public:
/// \brief Construct a dense tensor and allocate space.
/// \param a The allocator used to allocate space.
/// \param meta The meta data of dense tensor.
DenseTensor(const std::shared_ptr<Allocator>& a, const DenseTensorMeta& meta);
/// \brief Construct a dense tensor and allocate space.
/// \param a The allocator used to allocate space.
/// \param meta The meta data of dense tensor.
DenseTensor(const std::shared_ptr<Allocator>& a, DenseTensorMeta&& meta);
/// \brief Use existing storage space to create dense tensor. This interface
/// can be used to deliberately create an uninitialized dense tensor.
/// \param storage The existing storage.
/// \param meta The meta data of dense tensor.
DenseTensor(intrusive_ptr<Storage> storage, const DenseTensorMeta& meta);
/// \brief Use existing storage space to create dense tensor. This interface
/// can be used to deliberately create an uninitialized dense tensor.
/// \param storage The existing storage.
/// \param meta The meta data of dense tensor.
DenseTensor(intrusive_ptr<Storage> storage, DenseTensorMeta&& meta);
/// \brief Because dense tensor is a kind of container, we provide a default
/// constructor for use in stl containers. However, a dense tensor created with
/// the default constructor is not directly usable.
DenseTensor() = default;
/// \brief Because dense tensor is a resource handle, we provide a default
/// move constructor to support move semantics.
DenseTensor(DenseTensor&& other) = default;
/// \brief We do not recommend deep copying a dense tensor because of its cost
/// and complexity across devices. The copy constructor is disabled here.
DenseTensor(const DenseTensor& other) = delete;
/// \brief Destroy the tensor object and release exclusive resources.
virtual ~DenseTensor() = default;
public:
/// \brief Returns the name of the class for type traits.
/// \return The name of the class.
static const char* name() { return "DenseTensor"; }
/// \brief Returns the number of elements contained in tensor.
/// \return The number of elements contained in tensor.
int64_t numel() const;
/// \brief Returns the dims of the tensor.
/// \return The dims of the tensor.
const DDim& dims() const noexcept { return meta_.dims; }
/// \brief Returns the lod of the tensor.
/// \return The lod of the tensor.
const std::vector<std::vector<size_t>>& lod() const noexcept {
return meta_.lod;
}
/// \brief Set the lod of the tensor.
void set_lod(const std::vector<std::vector<size_t>>& lod) { meta_.lod = lod; }
/// \brief Returns the data type of the tensor.
/// \return The data type of the tensor.
DataType data_type() const noexcept { return meta_.type; }
/// \brief Returns the data layout of the tensor.
/// \return The data layout of the tensor.
DataLayout layout() const noexcept { return meta_.layout; }
/// \brief Returns the data place of the tensor.
/// \return The data place of the tensor.
const Place& place() const { return storage_->place(); }
/// \brief Returns the meta information of the tensor.
/// \return The meta information of the tensor.
const DenseTensorMeta& meta() const noexcept { return meta_; }
/// \brief Test whether the metadata is valid.
/// \return Whether the metadata is valid.
bool valid() const noexcept { return meta_.valid(); }
/// \brief Test whether the storage is allocated.
/// \return Whether the storage is allocated.
bool initialized() const { return storage_->data(); }
/// \brief Check if storage is shared with other objects.
/// \return Whether the storage is shared with other objects.
bool IsSharedWith(const DenseTensor& b) const;
/// \brief Change the dims information in the metadata, and the corresponding
/// memory allocation will occur when the `mutable_data` is called.
/// \param dims The new dims of the dense tensor.
void Resize(const DDim& dims) noexcept { meta_.dims = dims; }
/// \brief Returns the actual storage size occupied by tensor, may be larger
/// than its shape dims.
/// \return The actual storage size occupied by tensor.
size_t memory_size() const { return storage_->size(); }
/// \brief Check that the storage area is large enough to hold the volume of
/// data required by the metadata, and throw an exception if it is not.
void check_memory_size() const;
/// \brief Release the storage area for other purposes. Because this breaks the
/// encapsulation, we do not support two dense tensors directly sharing the
/// same intrusive pointer.
/// \return The rvalue of the intrusive pointer related to the released storage.
intrusive_ptr<Storage> release() { return std::move(storage_); }
/// \brief Get the mutable data pointer value of type T.
/// Memory allocation may occur when calling this interface:
/// 1. When the storage size is not enough to meet the current shape of the
/// data.
/// \return The mutable data pointer value of type T.
template <typename T>
T* mutable_data();
/// \brief Get the mutable data pointer value of raw type.
/// Memory allocation may occur when calling this interface:
/// 1. When the storage size is not enough to meet the current shape of the
/// data.
/// 2. When the request_bytes parameter is used to reserve extra data storage.
/// \param request_bytes The number of bytes to reserve for the data storage.
/// \return The mutable data pointer value of raw type.
void* mutable_data(size_t request_bytes = 0);
/// \brief Get the const data pointer value of type T.
/// \return The const data pointer value of type T.
template <typename T>
const T* data() const;
/// \brief Get the const data pointer value of raw type.
/// \return The const data pointer value of raw type.
const void* data() const;
private:
DenseTensorMeta meta_;
intrusive_ptr<Storage> storage_;
};
} // namespace pten
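As a usage sketch for the interface above, the two helpers below show typical read and write access to an already constructed DenseTensor; construction itself is elided because DenseTensorMeta is declared in tensor_meta.h, which is not shown in this hunk. Illustrative only, not part of the diff.

// Illustrative sketch, not part of the diff.
#include "paddle/pten/core/dense_tensor.h"

void FillWithOnes(pten::DenseTensor* t) {
  // mutable_data<T> checks that T matches the tensor's DataType and allocates
  // storage lazily if the current storage cannot hold numel() elements.
  float* data = t->mutable_data<float>();
  for (int64_t i = 0; i < t->numel(); ++i) {
    data[i] = 1.0f;
  }
}

float ReadFirst(const pten::DenseTensor& t) {
  t.check_memory_size();      // throws if the storage is too small
  return t.data<float>()[0];  // typed const access
}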
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/pten/core/kernel_context.h"
namespace pten {} // namespace pten
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <utility>
#include "paddle/pten/core/tensor_base.h"
#include "paddle/utils/any.h"
#include "paddle/utils/small_vector.h"
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/enforce.h"
namespace pten {
using DeviceContext = paddle::platform::DeviceContext;
using DataType = paddle::experimental::DataType;
using DataLayout = paddle::experimental::DataLayout;
/**
 * Note: KernelContext doesn't manage the lifetime of DeviceContext and Tensor
*
* Note: KernelContext does not couple the concept of framework,
* its constructor can only take the members it needs as parameters,
* not Scope, RuntimeContext, etc. as parameters
*/
class KernelContext {
public:
explicit KernelContext(const DeviceContext& dev_ctx) : dev_ctx_(dev_ctx) {}
KernelContext(const DeviceContext& dev_ctx,
const paddle::SmallVector<std::shared_ptr<TensorBase>>& inputs,
const paddle::SmallVector<std::shared_ptr<TensorBase>>& outputs,
const paddle::SmallVector<paddle::any>& attrs)
: dev_ctx_(dev_ctx), inputs_(inputs), outputs_(outputs), attrs_(attrs) {}
template <typename CtxType>
const CtxType& GetDeviceContext() const {
return static_cast<const CtxType&>(dev_ctx_);
}
void EmplaceBackInput(std::shared_ptr<TensorBase> input) {
inputs_.emplace_back(std::move(input));
// Record the start and end index of the input
int index = inputs_.size();
input_range_.emplace_back(std::pair<int, int>(index, index + 1));
}
void EmplaceBackInputs(
paddle::SmallVector<std::shared_ptr<TensorBase>> inputs) {
for (auto in : inputs) {
inputs_.emplace_back(in);
}
// Record the start and end index of the input
int index = inputs_.size();
input_range_.emplace_back(
std::pair<int, int>(index, index + inputs.size()));
}
void EmplaceBackOutput(std::shared_ptr<TensorBase> output) {
outputs_.emplace_back(std::move(output));
// Record the start and end index of the output
int index = outputs_.size();
output_range_.emplace_back(std::pair<int, int>(index, index + 1));
}
void EmplaceBackOutputs(
paddle::SmallVector<std::shared_ptr<TensorBase>> outputs) {
for (auto out : outputs) {
outputs_.emplace_back(out);
}
// Record the start and end index of the output
int index = outputs_.size();
output_range_.emplace_back(
std::pair<int, int>(index, index + outputs.size()));
}
void EmplaceBackAttr(paddle::any attr) {
attrs_.emplace_back(std::move(attr));
}
template <typename TensorType>
const TensorType& InputAt(size_t idx) const {
return static_cast<const TensorType&>(*(inputs_.at(idx)));
}
template <typename TensorType>
TensorType* MutableOutputAt(size_t idx) {
return static_cast<TensorType*>(outputs_.at(idx).get());
}
template <typename AttrType>
AttrType AttrAt(size_t idx) const {
try {
return paddle::any_cast<AttrType>(attrs_.at(idx));
} catch (paddle::bad_any_cast&) {
PADDLE_THROW(paddle::platform::errors::InvalidArgument(
"Attribute cast error in Op Kernel Context."));
}
}
private:
bool IsDuplicable() const { return input_range_.size() != inputs_.size(); }
private:
// DeviceContext base class
const DeviceContext& dev_ctx_;
  // TODO(chenweihang): Tensor -> Tensor*, Tensor should be managed by `scope`
  // Note: can't use API Tensor here; inference doesn't use this API Tensor
paddle::SmallVector<std::shared_ptr<TensorBase>> inputs_;
paddle::SmallVector<std::shared_ptr<TensorBase>> outputs_;
paddle::SmallVector<paddle::any> attrs_;
  // Only inputs/outputs like list[Tensor] need `range`
paddle::SmallVector<std::pair<int, int>> input_range_;
paddle::SmallVector<std::pair<int, int>> output_range_;
  // Only the static graph needs `name`
  // TODO(chenweihang): replaced by paddle::string_view
paddle::SmallVector<std::string> input_names_;
paddle::SmallVector<std::string> output_names_;
};
} // namespace pten
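// [Editor's illustration, not part of the commit] A minimal sketch of how a
// caller could fill a KernelContext and dispatch a kernel with it. The helper
// name `RunScaleLikeKernel` and the single float attribute are assumptions
// made only for this example.
#include <memory>
#include <utility>
#include "paddle/pten/core/dense_tensor.h"
#include "paddle/pten/core/kernel_context.h"
#include "paddle/pten/core/kernel_factory.h"

namespace pten {
namespace example {

void RunScaleLikeKernel(const Kernel& kernel,
                        const DeviceContext& dev_ctx,
                        std::shared_ptr<DenseTensor> x,
                        std::shared_ptr<DenseTensor> out,
                        float scale) {
  KernelContext ctx(dev_ctx);
  ctx.EmplaceBackInput(std::move(x));     // records input_range_ (0, 1)
  ctx.EmplaceBackOutput(std::move(out));  // records output_range_ (0, 1)
  ctx.EmplaceBackAttr(scale);             // stored as paddle::any
  kernel(&ctx);                           // forwards to the registered KernelFn
}

}  // namespace example
}  // namespace pten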
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
namespace pten {
class Kernel;
class KernelKey;
class KernelArgsDef;
class KernelContext;
using KernelFn = void (*)(KernelContext* ctx);
using KernelArgsDefFn = void (*)(Kernel* kernel);
using KernelArgsParseFn = void (*)(const KernelKey& default_key,
KernelArgsDef* args_def);
// Multiple kernels of the same operation are distinguished by their overload
// names. For convenience of reuse, we define some overload naming strings
// here for kernel naming.
// For kernels that contain a dynamic tensor attribute which always needs to
// stay on the host device, such as `ScaleTensor`
constexpr char kContainHostTensorSuffix[] = "host";
// For kernels with SelectedRowsTensor input and output
constexpr char kContainSelectedRowsSuffix[] = "sr";
// For kernels with intermediate output
constexpr char kContainMidOutputTensorSuffix[] = "mid";
} // namespace pten
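// [Editor's illustration, not part of the commit] Sketch of how the overload
// suffixes above compose into a full kernel name string; the "scale" base
// name is only an example.
#include <string>
#include "paddle/pten/core/kernel_def.h"

namespace pten {
namespace example {

inline std::string HostScaleKernelName() {
  // yields "scale.host"; KernelName later parses this into
  // name() == "scale" and overload_name() == "host"
  return std::string("scale") + "." + kContainHostTensorSuffix;
}

}  // namespace example
}  // namespace pten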
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/pten/core/kernel_factory.h"
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/platform/enforce.h"
namespace pten {
uint32_t KernelKey::Hash::operator()(const KernelKey& key) const {
uint32_t hash_value = 0;
// |----31-20------|---19-12---|---11-8----|---7-0---|
// | For extension | DataType | DataLayout | Backend |
hash_value |= static_cast<uint8_t>(key.backend());
hash_value |=
(static_cast<uint8_t>(key.layout()) << KernelKey::kBackendBitLength);
  hash_value |=
      (static_cast<uint16_t>(key.dtype())
       << (KernelKey::kBackendBitLength + KernelKey::kDataLayoutBitLength));
return hash_value;
}
KernelFactory& KernelFactory::Instance() {
static KernelFactory g_op_kernel_factory;
return g_op_kernel_factory;
}
Kernel KernelFactory::SelectKernel(const KernelName& kernel_name,
const KernelKey& kernel_key) const {
auto iter = kernels_.find(kernel_name);
if (iter == kernels_.end()) {
return Kernel();
}
auto kernel_iter = iter->second.find(kernel_key);
if (kernel_iter == iter->second.end()) {
return Kernel();
}
return kernel_iter->second;
}
const Kernel& KernelFactory::SelectKernelOrThrowError(
const KernelName& kernel_name, const KernelKey& kernel_key) const {
auto iter = kernels_.find(kernel_name);
PADDLE_ENFORCE_NE(iter,
kernels_.end(),
paddle::platform::errors::NotFound(
"The kernel `%s` is not registered.", kernel_name));
auto kernel_iter = iter->second.find(kernel_key);
  // TODO(chenweihang): polish the re-find impl here
if (kernel_key.layout() != pten::DataLayout::ANY) {
pten::KernelKey any_layout_kernel_key(
kernel_key.backend(), pten::DataLayout::ANY, kernel_key.dtype());
kernel_iter = iter->second.find(any_layout_kernel_key);
}
PADDLE_ENFORCE_NE(
kernel_iter,
iter->second.end(),
paddle::platform::errors::NotFound(
"The kernel with key %s of kernel `%s` is not registered.",
kernel_key,
kernel_name));
return kernel_iter->second;
}
const Kernel& KernelFactory::SelectKernelOrThrowError(
const KernelName& kernel_name,
Backend backend,
DataLayout layout,
DataType dtype) const {
return SelectKernelOrThrowError(kernel_name,
KernelKey(backend, layout, dtype));
}
std::ostream& operator<<(std::ostream& os, const Kernel& kernel) {
os << "InputNum(" << kernel.args_def().input_defs().size() << "): [";
for (auto& in_def : kernel.args_def().input_defs()) {
os << "<" << in_def.backend << ", " << in_def.layout << ", " << in_def.dtype
<< ">";
}
os << "]), AttributeNum(" << kernel.args_def().attribute_defs().size()
<< "), OutputNum(" << kernel.args_def().output_defs().size() << ")";
return os;
}
std::ostream& operator<<(std::ostream& os, KernelFactory& kernel_factory) {
for (const auto& op_kernel_pair : kernel_factory.kernels()) {
os << "- kernel name: " << op_kernel_pair.first << "\n";
for (const auto& kernel_pair : op_kernel_pair.second) {
os << "\t- kernel key: " << kernel_pair.first << " | "
<< "kernel: " << kernel_pair.second << "\n";
}
}
return os;
}
} // namespace pten
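// [Editor's illustration, not part of the commit] A minimal lookup sketch
// against the factory above; the "sign" kernel name and the CPU/ANY/FLOAT32
// key are assumptions for illustration only.
#include "paddle/pten/core/kernel_factory.h"

namespace pten {
namespace example {

const Kernel& FindSignCpuKernel() {
  // Throws NotFound if no kernel named "sign" is registered for this key
  // (an ANY-layout fallback is tried, as implemented above).
  return KernelFactory::Instance().SelectKernelOrThrowError(
      "sign", Backend::CPU, DataLayout::ANY, DataType::FLOAT32);
}

}  // namespace example
}  // namespace pten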
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <ostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include "paddle/pten/common/backend.h"
#include "paddle/pten/common/data_type.h"
#include "paddle/pten/common/layout.h"
#include "paddle/pten/core/kernel_def.h"
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/platform/enforce.h"
#include "paddle/utils/flat_hash_map.h"
#include "paddle/utils/small_vector.h"
namespace pten {
using DataType = paddle::experimental::DataType;
using DataLayout = paddle::experimental::DataLayout;
/**
* [ Naming considerations ]
*
 * The tensor operation library contains many kernels, and the computation
 * in each specific scenario is represented by a kernel.
 *
 * We directly name it `Kernel` instead of `OpKernel`: the tensor operation
 * library here and fluid are independent, which avoids developers
 * misunderstanding the relationship between the two concepts.
*/
class KernelContext;
using KernelFn = void (*)(KernelContext* ctx);
class KernelName final {
public:
KernelName(std::string name, std::string overload_name)
: name_(std::move(name)), overload_name_(std::move(overload_name)) {}
KernelName(const std::string& kernel_name) {
ParseNameAndOverloadNameFromString(kernel_name);
}
KernelName(const char* kernel_name) {
std::string kernel_name_str(kernel_name);
ParseNameAndOverloadNameFromString(kernel_name_str);
}
const std::string& name() const { return name_; }
const std::string& overload_name() const { return overload_name_; }
struct Hash {
size_t operator()(const KernelName& kernel_name) const {
return std::hash<std::string>()(kernel_name.name()) ^
(std::hash<std::string>()(kernel_name.overload_name()) << 1);
}
};
size_t hash_value() const { return Hash()(*this); }
bool operator<(const KernelName& kernel_name) const {
return hash_value() < kernel_name.hash_value();
}
bool operator==(const KernelName& kernel_name) const {
return hash_value() == kernel_name.hash_value();
}
bool operator!=(const KernelName& kernel_name) const {
return hash_value() != kernel_name.hash_value();
}
private:
void ParseNameAndOverloadNameFromString(const std::string& kernel_name) {
size_t pos = kernel_name.find_first_of('.');
if (pos == std::string::npos) {
name_ = kernel_name;
overload_name_ = "";
} else {
name_ = kernel_name.substr(0, pos);
overload_name_ = kernel_name.substr(pos + 1, kernel_name.size());
}
}
// TODO(chenweihang): use string_view to improve performance later
std::string name_;
std::string overload_name_;
};
class KernelKey {
public:
KernelKey() = default;
KernelKey(Backend backend, DataLayout layout, DataType dtype)
: backend_(backend), layout_(layout), dtype_(dtype) {}
Backend backend() const { return backend_; }
DataLayout layout() const { return layout_; }
DataType dtype() const { return dtype_; }
struct Hash {
// Note: Now the number of bits we need does not exceed 32 bits, so there is
// no need to use 64 bits. If needed in the future, it can be expanded,
// but now we don’t over-design.
uint32_t operator()(const KernelKey& key) const;
};
uint32_t hash_value() const { return Hash()(*this); }
bool operator<(const KernelKey& key) const {
return hash_value() < key.hash_value();
}
bool operator==(const KernelKey& key) const {
return hash_value() == key.hash_value();
}
bool operator!=(const KernelKey& key) const {
return hash_value() != key.hash_value();
}
private:
// In total should be smaller than 32.
constexpr static int kBackendBitLength = 8;
constexpr static int kDataLayoutBitLength = 4;
constexpr static int kDataTypeBitLength = 8;
Backend backend_{Backend::UNDEFINED};
DataLayout layout_{DataLayout::UNDEFINED};
DataType dtype_{DataType::UNDEFINED};
};
// TODO(chenweihang): how to deal with vector<Param>?
struct TensorArgDef {
Backend backend;
DataLayout layout;
DataType dtype;
TensorArgDef(Backend in_backend, DataLayout in_layout, DataType in_dtype)
: backend(in_backend), layout(in_layout), dtype(in_dtype) {}
TensorArgDef& SetBackend(Backend in_backend) {
backend = in_backend;
return *this;
}
TensorArgDef& SetDataLayout(DataLayout in_layout) {
layout = in_layout;
return *this;
}
TensorArgDef& SetDataType(DataType in_dtype) {
dtype = in_dtype;
return *this;
}
};
struct AttributeArgDef {
std::type_index type_index;
explicit AttributeArgDef(std::type_index type_index)
: type_index(type_index) {}
};
class KernelArgsDef {
public:
KernelArgsDef() = default;
void AppendInput(Backend backend, DataLayout layout, DataType dtype) {
input_defs_.emplace_back(TensorArgDef(backend, layout, dtype));
}
void AppendOutput(Backend backend, DataLayout layout, DataType dtype) {
output_defs_.emplace_back(TensorArgDef(backend, layout, dtype));
}
void AppendAttribute(std::type_index type_index) {
attribute_defs_.emplace_back(AttributeArgDef(type_index));
}
const paddle::SmallVector<TensorArgDef>& input_defs() const {
return input_defs_;
}
const paddle::SmallVector<TensorArgDef>& output_defs() const {
return output_defs_;
}
const paddle::SmallVector<AttributeArgDef>& attribute_defs() const {
return attribute_defs_;
}
paddle::SmallVector<TensorArgDef>& input_defs() { return input_defs_; }
paddle::SmallVector<TensorArgDef>& output_defs() { return output_defs_; }
paddle::SmallVector<AttributeArgDef>& attribute_defs() {
return attribute_defs_;
}
private:
paddle::SmallVector<TensorArgDef> input_defs_{{}};
paddle::SmallVector<TensorArgDef> output_defs_{{}};
paddle::SmallVector<AttributeArgDef> attribute_defs_{{}};
};
class Kernel {
public:
  // for map element construction
Kernel() = default;
explicit Kernel(KernelFn fn) : fn_(fn) {}
void operator()(KernelContext* ctx) const { fn_(ctx); }
KernelArgsDef* mutable_args_def() { return &args_def_; }
const KernelArgsDef& args_def() const { return args_def_; }
TensorArgDef& InputAt(size_t idx) { return args_def_.input_defs().at(idx); }
TensorArgDef& OutputAt(size_t idx) { return args_def_.output_defs().at(idx); }
bool IsValid() { return fn_ != nullptr; }
private:
KernelFn fn_{nullptr};
KernelArgsDef args_def_;
};
/**
 * Note: Each computation needs a basic kernel map named by kernel_name.
 *       For example, for the scale op, the KernelMap contains a `scale`
 *       kernel map; if other overload kernels are still needed, the kernel
 *       name can be `scale.***`.
*/
class KernelFactory {
public:
// replaced by paddle::flat_hash_map later
using KernelMap = paddle::flat_hash_map<
KernelName,
paddle::flat_hash_map<KernelKey, Kernel, KernelKey::Hash>,
KernelName::Hash>;
static KernelFactory& Instance();
KernelMap& kernels() { return kernels_; }
void InsertCompatibleOpType(const std::string& op_type) {
compatible_op_types_.insert(op_type);
}
bool HasCompatiblePtenKernel(const std::string& op_type) const {
return compatible_op_types_.count(op_type) > 0;
}
const Kernel& SelectKernelOrThrowError(const KernelName& kernel_name,
const KernelKey& kernel_key) const;
const Kernel& SelectKernelOrThrowError(const KernelName& kernel_name,
Backend backend,
DataLayout layout,
DataType dtype) const;
Kernel SelectKernel(const KernelName& kernel_name,
const KernelKey& kernel_key) const;
private:
KernelFactory() = default;
KernelMap kernels_;
  // Used for compatibility with the original execution system, to quickly
  // confirm whether the new kernel can be called
std::unordered_set<std::string> compatible_op_types_;
};
/** operator << overload **/
inline std::ostream& operator<<(std::ostream& os,
const KernelName& kernel_name) {
if (kernel_name.overload_name().empty()) {
os << kernel_name.name();
} else {
os << kernel_name.name() << "." << kernel_name.overload_name();
}
return os;
}
inline std::ostream& operator<<(std::ostream& os, const KernelKey& kernel_key) {
os << "(" << kernel_key.backend() << ", " << kernel_key.layout() << ", "
<< kernel_key.dtype() << ")";
return os;
}
std::ostream& operator<<(std::ostream& os, const Kernel& kernel);
std::ostream& operator<<(std::ostream& os, KernelFactory& kernel_factory);
} // namespace pten
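// [Editor's illustration, not part of the commit] A minimal sketch of how a
// Kernel and its KernelArgsDef could be assembled by registration code; the
// argument layout (one float32 CPU input/output and a float attribute) is an
// arbitrary assumption.
#include <typeindex>
#include "paddle/pten/core/kernel_factory.h"

namespace pten {
namespace example {

inline Kernel MakeKernelSketch(KernelFn fn) {
  Kernel kernel(fn);
  KernelArgsDef* def = kernel.mutable_args_def();
  def->AppendInput(Backend::CPU, DataLayout::NCHW, DataType::FLOAT32);
  def->AppendOutput(Backend::CPU, DataLayout::NCHW, DataType::FLOAT32);
  def->AppendAttribute(std::type_index(typeid(float)));
  return kernel;
}

}  // namespace example
}  // namespace pten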
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "paddle/pten/common/scalar.h"
#include "paddle/pten/core/dense_tensor.h"
#include "paddle/pten/core/kernel_context.h"
#include "paddle/pten/core/kernel_def.h"
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/enforce.h"
namespace pten {
// TODO(shixiaowei): replaced by new DeviceContext later
using CPUContext = paddle::platform::CPUDeviceContext;
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
using CUDAContext = paddle::platform::CUDADeviceContext;
#endif
#ifdef PADDLE_WITH_MKLDNN
using MKLDNNContext = paddle::platform::MKLDNNDeviceContext;
#endif
#ifdef PADDLE_WITH_ASCEND_CL
using NPUContext = paddle::platform::NPUDeviceContext;
#endif
#ifdef PADDLE_WITH_XPU
using XPUContext = paddle::platform::XPUDeviceContext;
#endif
#define PT_KERNEL(...) \
::pten::KernelImpl<decltype(&__VA_ARGS__), &__VA_ARGS__>::Compute
#define PT_SPECIALIZE_KernelCallHelper_FOR_DEVICE_CONTEXT(dev_ctx) \
template <typename... Tail> \
struct KernelCallHelper<const dev_ctx&, Tail...> { \
template <int dev_ctx_idx, \
int in_idx, \
int attr_idx, \
int out_idx, \
typename... PreviousArgs> \
static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
static_assert(in_idx == 0, \
"Kernel's DeviceContext should appear before Inputs."); \
static_assert( \
attr_idx == 0, \
"Kernel's DeviceContext should appear before Attributes."); \
static_assert(out_idx == 0, \
"Kernel's DeviceContext should appear before Outputs."); \
const dev_ctx& arg = ctx->GetDeviceContext<dev_ctx>(); \
KernelCallHelper<Tail...>:: \
template Compute<dev_ctx_idx + 1, in_idx, attr_idx, out_idx>( \
ctx, pargs..., arg); \
} \
}
#define PT_SPECIALIZE_KernelCallHelper_FOR_INPUT(tensor_type) \
template <typename... Tail> \
struct KernelCallHelper<const tensor_type&, Tail...> { \
template <int dev_ctx_idx, \
int in_idx, \
int attr_idx, \
int out_idx, \
typename... PreviousArgs> \
static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
static_assert(attr_idx == 0, \
"Kernel's Input should appear before Attributes."); \
static_assert(out_idx == 0, \
"Kernel's Input should appear before Outputs."); \
const tensor_type& arg = ctx->InputAt<tensor_type>(in_idx); \
KernelCallHelper<Tail...>:: \
template Compute<dev_ctx_idx, in_idx + 1, attr_idx, out_idx>( \
ctx, pargs..., arg); \
} \
}
#define PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(attr_type) \
template <typename... Tail> \
struct KernelCallHelper<attr_type, Tail...> { \
template <int dev_ctx_idx, \
int in_idx, \
int attr_idx, \
int out_idx, \
typename... PreviousArgs> \
static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
static_assert(out_idx == 0, \
"Kernel's Attributes should appear before Outputs."); \
attr_type arg = ctx->AttrAt<attr_type>(attr_idx); \
KernelCallHelper<Tail...>:: \
template Compute<dev_ctx_idx, in_idx, attr_idx + 1, out_idx>( \
ctx, pargs..., arg); \
} \
}
#define PT_SPECIALIZE_KernelCallHelper_FOR_OUTPUT(tensor_type) \
template <typename... Tail> \
struct KernelCallHelper<tensor_type*, Tail...> { \
template <int dev_ctx_idx, \
int in_idx, \
int attr_idx, \
int out_idx, \
typename... PreviousArgs> \
static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
tensor_type* arg = ctx->MutableOutputAt<tensor_type>(out_idx); \
KernelCallHelper<Tail...>:: \
template Compute<dev_ctx_idx, in_idx, attr_idx, out_idx + 1>( \
ctx, pargs..., arg); \
} \
}
template <typename T>
struct TypeTag {};
template <typename Fn, Fn fn>
struct KernelImpl;
template <typename Return, typename... Args, Return (*kernel_fn)(Args...)>
struct KernelImpl<Return (*)(Args...), kernel_fn> {
static void Compute(KernelContext* ctx) {
KernelCallHelper<Args..., TypeTag<int>>::template Compute<0, 0, 0, 0>(ctx);
}
private:
template <typename... RemainingArgs>
struct KernelCallHelper;
/* DeviceContext Helpers */
PT_SPECIALIZE_KernelCallHelper_FOR_DEVICE_CONTEXT(CPUContext);
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
PT_SPECIALIZE_KernelCallHelper_FOR_DEVICE_CONTEXT(CUDAContext);
#endif
#ifdef PADDLE_WITH_ASCEND_CL
PT_SPECIALIZE_KernelCallHelper_FOR_DEVICE_CONTEXT(NPUContext);
#endif
#ifdef PADDLE_WITH_XPU
PT_SPECIALIZE_KernelCallHelper_FOR_DEVICE_CONTEXT(XPUContext);
#endif
/* Input Helpers */
PT_SPECIALIZE_KernelCallHelper_FOR_INPUT(DenseTensor);
// TODO(chenweihang): adapt SelectedRows
// PT_SPECIALIZE_KernelCallHelper_FOR_INPUT(SelectedRowsTensor);
/* Attribute Helpers */
PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(bool);
PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(float);
PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(double);
PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(int);
PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(int64_t);
PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(paddle::platform::float16);
PT_SPECIALIZE_KernelCallHelper_FOR_ATTRIBUTE(const Scalar&);
/* Output Helpers */
PT_SPECIALIZE_KernelCallHelper_FOR_OUTPUT(DenseTensor);
// TODO(chenweihang): adapt SelectedRows
// PT_SPECIALIZE_KernelCallHelper_FOR_OUTPUT(SelectedRowsTensor);
/* End case */
template <typename T>
struct KernelCallHelper<TypeTag<T>> {
template <int dev_ctx_idx, int in_idx, int attr_idx, int out_idx>
static void Compute(KernelContext* ctx, Args&... args) {
static_assert(dev_ctx_idx > 0,
"Kernel should pass DeviceContext as argument.");
static_assert(out_idx > 0, "Kernel should have output argument.");
// TODO(chenweihang): check dev_ctx, in, attr, out number
return kernel_fn(args...);
}
};
};
} // namespace pten
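// [Editor's illustration, not part of the commit] Sketch of the plain-function
// kernel form that KernelImpl expects (DeviceContext first, then inputs, then
// attributes, then output pointers) and how PT_KERNEL turns it into a
// KernelFn. `ScaleSketch` is a placeholder, not a registered kernel.
#include "paddle/pten/core/kernel_utils.h"

namespace pten {
namespace example {

void ScaleSketch(const CPUContext& dev_ctx,
                 const DenseTensor& x,
                 float scale,
                 DenseTensor* out) {
  // The real computation (elementwise x * scale) is omitted here.
}

// PT_KERNEL(ScaleSketch) expands to
// KernelImpl<decltype(&ScaleSketch), &ScaleSketch>::Compute, which unpacks a
// KernelContext and forwards its contents to ScaleSketch.
static const KernelFn kScaleSketchFn = PT_KERNEL(ScaleSketch);

}  // namespace example
}  // namespace pten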
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pten/core/storage.h"
namespace pten {
void TensorStorage::Realloc(size_t size) {
data_.Clear();
data_ = Allocate(alloc_, size);
size_ = size;
}
} // namespace pten
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <cstddef>
#include "boost/intrusive_ptr.hpp"
#include "paddle/pten/core/utils/intrusive_ptr.h"
#include "paddle/pten/core/utils/intrusive_ref_counter.h"
#include "paddle/pten/core/utils/type_info.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/pten/core/allocator.h"
namespace pten {
/// \brief The interface of contiguous storage used for the dense tensor.
/// It should be used in conjunction with the intrusive pointer. We prohibit
/// all default copy operations to ensure the integrity of the package.
class Storage : public intrusive_ref_counter<Storage> {
public:
using Place = paddle::platform::Place;
Storage() = default;
Storage(const Storage&) = delete;
explicit Storage(Allocation&& data) : data_(std::move(data)) {}
virtual ~Storage() = default;
/// \brief Get the mutable data pointer of the storage.
/// This function is set to inline to improve performance.
/// \return The mutable data pointer of the storage.
void* data() const noexcept { return data_.operator->(); }
virtual size_t size() const = 0;
virtual const Place& place() const = 0;
virtual bool OwnsMemory() const = 0;
virtual void Realloc(size_t n) = 0;
protected:
Allocation data_;
};
class TensorStorage : public Storage {
public:
using Place = paddle::platform::Place;
explicit TensorStorage(const std::shared_ptr<Allocator>& a) : alloc_(a) {}
TensorStorage(const std::shared_ptr<Allocator>& a, size_t size)
: Storage(Allocate(a, size)), alloc_(a), size_(size) {}
~TensorStorage() = default;
static const char* name() { return "TensorStorage"; }
void Realloc(size_t size) override;
size_t size() const noexcept override { return size_; }
const Place& place() const override { return data_.place(); }
bool OwnsMemory() const noexcept override { return true; }
const std::shared_ptr<Allocator>& allocator() const noexcept {
return alloc_;
}
private:
const std::shared_ptr<Allocator> alloc_;
int64_t size_{0};
};
} // namespace pten
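// [Editor's illustration, not part of the commit] Sketch of creating a
// TensorStorage through the intrusive pointer utilities, mirroring how a
// dense tensor would hold its storage; `alloc` is an already-constructed
// Allocator implementation supplied by the caller.
#include <cstddef>
#include <memory>
#include "paddle/pten/core/storage.h"
#include "paddle/pten/core/utils/intrusive_ptr.h"

namespace pten {
namespace example {

intrusive_ptr<Storage> MakeStorage(const std::shared_ptr<Allocator>& alloc,
                                   size_t bytes) {
  // make_intrusive constructs the TensorStorage in place and hands ownership
  // to the returned intrusive pointer; Realloc() can grow it later.
  return make_intrusive<TensorStorage>(alloc, bytes);
}

}  // namespace example
}  // namespace pten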
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pten/core/tensor_base.h"
#include "paddle/pten/core/utils/type_registry.h"
namespace pten {}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/fluid/framework/ddim.h"
#include "paddle/fluid/platform/place.h"
#include "paddle/pten/common/backend.h"
#include "paddle/pten/common/data_type.h"
#include "paddle/pten/common/layout.h"
#include "paddle/pten/core/storage.h"
#include "paddle/pten/core/utils/type_registry.h"
namespace pten {
class TensorBase {
public:
using DataType = paddle::experimental::DataType;
using DataLayout = paddle::experimental::DataLayout;
using DDim = paddle::framework::DDim;
using Place = paddle::platform::Place;
virtual ~TensorBase() = default;
/// \brief Returns the number of elements contained in tensor.
/// \return The number of elements contained in tensor.
virtual int64_t numel() const = 0;
/// \brief Returns the dims of the tensor.
/// \return The dims of the tensor.
virtual const DDim& dims() const = 0;
/// \brief Returns the data type of the tensor.
/// \return The data type of the tensor.
virtual DataType data_type() const = 0;
/// \brief Returns the data layout of the tensor.
/// \return The data layout of the tensor.
virtual DataLayout layout() const = 0;
/// \brief Returns the data place of the tensor.
/// \return The data place of the tensor.
virtual const Place& place() const = 0;
/// \brief Test whether the metadata is valid.
/// \return Whether the metadata is valid.
virtual bool valid() const = 0;
/// \brief Test whether the storage is allocated.
  /// \return Whether the storage is allocated.
  virtual bool initialized() const = 0;
  /// \brief Return the type information of the derived class to support
  /// safe downcasting in a non-RTTI environment.
  /// \return The type information of the derived class.
TypeInfo<TensorBase> type_info() const { return type_info_; }
private:
template <typename T, typename U>
friend class TypeInfoTraits;
TypeInfo<TensorBase> type_info_{TypeInfo<TensorBase>::kUnknownType};
};
} // namespace pten
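// [Editor's illustration, not part of the commit] A minimal, hypothetical
// TensorBase implementation that shows which virtual methods a concrete
// tensor type has to provide; it stores no data and is not part of the
// library.
#include "paddle/pten/core/tensor_base.h"

namespace pten {
namespace example {

class EmptyTensor : public TensorBase {
 public:
  int64_t numel() const override { return 0; }
  const DDim& dims() const override { return dims_; }
  DataType data_type() const override { return DataType::UNDEFINED; }
  DataLayout layout() const override { return DataLayout::UNDEFINED; }
  const Place& place() const override { return place_; }
  bool valid() const override { return false; }
  bool initialized() const override { return false; }

 private:
  DDim dims_;
  Place place_;
};

}  // namespace example
}  // namespace pten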
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/pten/common/backend.h"
#include "paddle/pten/common/data_type.h"
#include "paddle/pten/common/layout.h"
// See Note [ Why still include the fluid headers? ]
#include "paddle/fluid/framework/ddim.h"
// Note: mixed_vector pulls in many headers now; will LoD be
// used on CUDA devices? Can we use small_vector here?
// #include "paddle/fluid/framework/mixed_vector.h"
namespace pten {
using DDim = paddle::framework::DDim;
using LoD = std::vector<std::vector<size_t>>;
/// \brief The metadata of a dense tensor. It is a plain struct type and
/// uses all default operations.
///
struct DenseTensorMeta {
using DataType = paddle::experimental::DataType;
using DataLayout = paddle::experimental::DataLayout;
DenseTensorMeta() = default;
DenseTensorMeta(DataType type, const DDim& dims);
DenseTensorMeta(DataType type, const DDim& dims, DataLayout layout);
DenseTensorMeta(DataType type,
const DDim& dims,
DataLayout layout,
const std::vector<std::vector<size_t>>& lod);
/// \brief Test whether the metadata is valid. Does not throw exceptions.
/// \return Whether the metadata is valid.
bool valid() const noexcept;
/// During the entire life cycle of a DenseTensor, the following attributes
/// marked with `const` are expected to remain unchanged.
const bool is_scalar{false};
DDim dims;
const DataType type{DataType::FLOAT32};
const DataLayout layout{DataLayout::NCHW};
LoD lod;
};
inline DenseTensorMeta::DenseTensorMeta(DataType type, const DDim& dims)
: dims(dims), type(type) {}
inline DenseTensorMeta::DenseTensorMeta(DataType type,
const DDim& dims,
DataLayout layout)
: dims(dims), type(type), layout(layout) {}
inline DenseTensorMeta::DenseTensorMeta(
DataType type,
const DDim& dims,
DataLayout layout,
const std::vector<std::vector<size_t>>& lod)
: dims(dims), type(type), layout(layout), lod(lod) {}
inline bool DenseTensorMeta::valid() const noexcept {
bool valid{true};
valid = valid && (type != DataType::UNDEFINED);
valid = valid && (layout != DataLayout::UNDEFINED);
valid = valid && (is_scalar || product(dims) >= 0);
return valid;
}
} // namespace pten
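// [Editor's illustration, not part of the commit] Sketch of building a
// DenseTensorMeta and validating it; the 2x3 float32 shape is an arbitrary
// example and the include path is assumed to follow the layout of the
// surrounding files.
#include "paddle/fluid/framework/ddim.h"
#include "paddle/pten/core/tensor_meta.h"

namespace pten {
namespace example {

inline bool MetaSketch() {
  DenseTensorMeta meta(DataType::FLOAT32,
                       paddle::framework::make_ddim({2, 3}),
                       DataLayout::NCHW);
  // valid() checks that dtype/layout are defined and that the element count
  // is non-negative (or that the tensor is a scalar).
  return meta.valid();
}

}  // namespace example
}  // namespace pten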
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include "paddle/pten/common/backend.h"
#include "paddle/pten/common/data_type.h"
#include "paddle/pten/common/layout.h"
namespace pten {
class TensorInplaceVersion {
public:
explicit TensorInplaceVersion(uint32_t inplace_version = 0)
: inplace_version_(inplace_version) {}
bool IsUnique() const { return inplace_version_ == 0; }
void Bump() { ++inplace_version_; }
uint32_t CurrentVersion() const { return inplace_version_; }
private:
uint32_t inplace_version_;
};
/**
* The Status data member of DenseTensor.
*
 * Here the `status` represents information describing the state of the
 * Tensor, such as the version counter, or other bool status members.
 *
 * Note: TensorStatus is a struct, and its members are named like
 * ordinary nonmember variables, such as `type` instead of `type_`.
 * We access its members directly; apart from the constructor, destructor,
 * and functions for setting data members, it provides no other functions.
*
* Note: polish impl later
*/
struct TensorStatus {
TensorStatus() = default;
TensorStatus(const TensorStatus&) = default;
TensorStatus(TensorStatus&&) = default;
TensorStatus& operator=(const TensorStatus&) = delete;
TensorStatus& operator=(TensorStatus&&) = delete;
TensorInplaceVersion inplace_version_counter{0};
/**
* For Scalar Tensor design
*/
bool is_scalar{false};
};
} // namespace pten
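// [Editor's illustration, not part of the commit] Sketch of the inplace
// version counter semantics described above; the include path is assumed to
// follow the layout of the surrounding files.
#include "paddle/pten/core/tensor_status.h"

namespace pten {
namespace example {

inline bool InplaceVersionSketch() {
  TensorInplaceVersion v;
  bool unique_before = v.IsUnique();  // true: no in-place modification yet
  v.Bump();                           // record one in-place modification
  return unique_before && !v.IsUnique() && v.CurrentVersion() == 1;
}

}  // namespace example
}  // namespace pten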
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <utility>
#include "glog/logging.h"
#include "paddle/fluid/platform/enforce.h"
namespace pten {
template <typename T>
class intrusive_ptr {
public:
using this_type = intrusive_ptr;
constexpr intrusive_ptr() noexcept = default;
~intrusive_ptr() {
if (px) {
intrusive_ptr_release(px);
}
}
intrusive_ptr(intrusive_ptr&& rhs) noexcept : px(rhs.px) { rhs.px = nullptr; }
template <typename U,
typename = std::enable_if_t<std::is_base_of<T, U>::value>>
intrusive_ptr(intrusive_ptr<U>&& rhs) noexcept : px(rhs.get()) {
rhs.reset();
}
void reset() { this_type().swap(*this); }
void reset(T* rhs) { this_type(rhs).swap(*this); }
void reset(T* rhs, bool add_ref) { this_type(rhs, add_ref).swap(*this); }
T* get() const noexcept { return px; }
T* detach() noexcept {
T* ret = px;
px = nullptr;
return ret;
}
T& operator*() const {
PADDLE_ENFORCE_NOT_NULL(
px,
paddle::platform::errors::PreconditionNotMet(
"The pointer must be non-null before the dereference operation."));
return *px;
}
T* operator->() const {
PADDLE_ENFORCE_NOT_NULL(
px,
paddle::platform::errors::PreconditionNotMet(
"The pointer must be non-null before the dereference operation."));
return px;
}
void swap(intrusive_ptr& rhs) noexcept {
T* tmp = px;
px = rhs.px;
rhs.px = tmp;
}
private:
template <typename U,
typename = std::enable_if_t<std::is_base_of<T, U>::value>>
explicit intrusive_ptr(U* p, bool add_ref = true) : px(p) {
if (px && add_ref) {
intrusive_ptr_add_ref(px);
}
}
template <typename R, typename... Args>
friend intrusive_ptr<R> make_intrusive(Args&&...);
template <typename R>
friend intrusive_ptr<R> copy_intrusive(const intrusive_ptr<R>&);
T* px{nullptr};
};
template <typename T, typename U>
inline bool operator==(const intrusive_ptr<T>& a,
const intrusive_ptr<U>& b) noexcept {
return a.get() == b.get();
}
template <typename T, typename U>
inline bool operator!=(const intrusive_ptr<T>& a,
const intrusive_ptr<U>& b) noexcept {
return a.get() != b.get();
}
template <typename T, typename U>
inline bool operator==(const intrusive_ptr<T>& a, U* b) noexcept {
return a.get() == b;
}
template <typename T, typename U>
inline bool operator!=(const intrusive_ptr<T>& a, U* b) noexcept {
return a.get() != b;
}
template <typename T, typename U>
inline bool operator==(T* a, const intrusive_ptr<U>& b) noexcept {
return a == b.get();
}
template <typename T, typename U>
inline bool operator!=(T* a, const intrusive_ptr<U>& b) noexcept {
return a != b.get();
}
template <typename T>
inline bool operator==(const intrusive_ptr<T>& p, std::nullptr_t) noexcept {
return p.get() == nullptr;
}
template <typename T>
inline bool operator==(std::nullptr_t, const intrusive_ptr<T>& p) noexcept {
return p.get() == nullptr;
}
template <typename T>
inline bool operator!=(const intrusive_ptr<T>& p, std::nullptr_t) noexcept {
return p.get() != nullptr;
}
template <typename T>
inline bool operator!=(std::nullptr_t, const intrusive_ptr<T>& p) noexcept {
return p.get() != nullptr;
}
template <typename T, typename... Args>
inline intrusive_ptr<T> make_intrusive(Args&&... args) {
return intrusive_ptr<T>(new T(std::forward<Args>(args)...), false);
}
template <typename T>
inline intrusive_ptr<T> copy_intrusive(const intrusive_ptr<T>& rhs) {
return intrusive_ptr<T>(rhs.get(), true);
}
} // namespace pten
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <atomic>
namespace pten {
template <typename DerivedT>
class intrusive_ref_counter;
template <typename DerivedT>
void intrusive_ptr_add_ref(const intrusive_ref_counter<DerivedT>* p) noexcept;
template <typename DerivedT>
void intrusive_ptr_release(const intrusive_ref_counter<DerivedT>* p) noexcept;
template <typename DerivedT>
class intrusive_ref_counter {
public:
constexpr intrusive_ref_counter() noexcept : ref_(1) {}
virtual ~intrusive_ref_counter() = default;
unsigned int use_count() const noexcept { return ref_.load(); }
protected:
intrusive_ref_counter(const intrusive_ref_counter&) = delete;
intrusive_ref_counter& operator=(const intrusive_ref_counter&) = delete;
friend void intrusive_ptr_add_ref<DerivedT>(
const intrusive_ref_counter<DerivedT>* p) noexcept;
friend void intrusive_ptr_release<DerivedT>(
const intrusive_ref_counter<DerivedT>* p) noexcept;
private:
mutable std::atomic_int_fast32_t ref_;
};
template <typename DerivedT>
inline void intrusive_ptr_add_ref(
const intrusive_ref_counter<DerivedT>* p) noexcept {
p->ref_.fetch_add(1, std::memory_order_relaxed);
}
template <typename DerivedT>
inline void intrusive_ptr_release(
const intrusive_ref_counter<DerivedT>* p) noexcept {
if (p->ref_.load(std::memory_order_acquire) == 0 ||
p->ref_.fetch_sub(1) == 0) {
delete static_cast<const DerivedT*>(p);
}
}
} // namespace pten
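// [Editor's illustration, not part of the commit] Sketch of the intended
// usage pattern of intrusive_ptr with intrusive_ref_counter: a derived object
// is created with make_intrusive and held through a base-class pointer, just
// as TensorStorage is held through intrusive_ptr<Storage>. `Node` and
// `LeafNode` are hypothetical types, not part of pten.
#include "paddle/pten/core/utils/intrusive_ptr.h"
#include "paddle/pten/core/utils/intrusive_ref_counter.h"

namespace pten {
namespace example {

struct Node : public intrusive_ref_counter<Node> {
  virtual ~Node() = default;
  int value{0};
};

struct LeafNode : public Node {
  explicit LeafNode(int v) { value = v; }
};

inline int IntrusivePtrSketch() {
  // The reference count lives inside the object itself; the pointer only
  // manipulates it through intrusive_ptr_add_ref / intrusive_ptr_release.
  intrusive_ptr<Node> a = make_intrusive<LeafNode>(42);
  // copy_intrusive adds a reference instead of deep-copying the object; the
  // LeafNode is destroyed once both pointers have gone out of scope.
  intrusive_ptr<Node> b = copy_intrusive(a);
  return a->value + b->value;  // both point to the same LeafNode: 84
}

}  // namespace example
}  // namespace pten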
add_subdirectory(lib)
cc_library(pten_hapi SRCS all.cc DEPS linalg_api math_api creation_api)
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/pten/hapi/all.h"
namespace paddle {
namespace experimental {} // namespace experimental
} // namespace paddle
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
// user apis
#include "paddle/pten/hapi/include/creation.h"
#include "paddle/pten/hapi/include/linalg.h"
#include "paddle/pten/hapi/include/manipulation.h"
#include "paddle/pten/hapi/include/math.h"
#include "paddle/pten/hapi/include/tensor.h"
add_subdirectory(utils)
cc_library(math_api SRCS math.cc DEPS pten)
cc_library(linalg_api SRCS linalg.cc DEPS pten)
cc_library(creation_api SRCS creation.cc DEPS pten)
cc_library(manipulation_api SRCS manipulation.cc DEPS pten)
add_subdirectory(tests)
cc_library(pten_hapi_utils SRCS allocator.cc storage.cc tensor_utils.cc DEPS tensor_base convert_utils
dense_tensor lod_tensor selected_rows place var_type_traits)
cc_test(test_framework_storage SRCS test_storage.cc DEPS pten_hapi_utils)
cc_test(test_framework_tensor_utils SRCS test_tensor_utils.cc DEPS pten_hapi_utils)