Unverified commit e4a134ac, author: chentianyu03, committer: GitHub

support multiply inputs and outputs (#36851)

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rocm compile failed

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar macro

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & naming rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add macro for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflict

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key construct

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for checking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unittests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unittests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflict

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflict with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* support multiply inputs and outputs

* rm attrs {}

* fix multioutputs bug

* merge develop

* remove unused header file

* add missing & in const reference

* modify inputAt, outputAt to inputBetween, outputBetween
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: zyfncg <1370305206@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Parent 4a7f1a0d
......@@ -52,37 +52,37 @@ class KernelContext {
   }
   void EmplaceBackInput(std::shared_ptr<TensorBase> input) {
+    int index = inputs_.size();
     inputs_.emplace_back(std::move(input));
     // Record the start and end index of the input
-    int index = inputs_.size();
     input_range_.emplace_back(std::pair<int, int>(index, index + 1));
   }
   void EmplaceBackInputs(
-      paddle::SmallVector<std::shared_ptr<TensorBase>> inputs) {
+      const paddle::SmallVector<std::shared_ptr<TensorBase>>& inputs) {
+    int index = inputs_.size();
     for (auto in : inputs) {
-      inputs_.emplace_back(in);
+      inputs_.emplace_back(std::move(in));
     }
     // Record the start and end index of the input
-    int index = inputs_.size();
     input_range_.emplace_back(
         std::pair<int, int>(index, index + inputs.size()));
   }
   void EmplaceBackOutput(std::shared_ptr<TensorBase> output) {
+    int index = outputs_.size();
     outputs_.emplace_back(std::move(output));
     // Record the start and end index of the input
-    int index = outputs_.size();
     output_range_.emplace_back(std::pair<int, int>(index, index + 1));
   }
   void EmplaceBackOutputs(
-      paddle::SmallVector<std::shared_ptr<TensorBase>> outputs) {
+      const paddle::SmallVector<std::shared_ptr<TensorBase>>& outputs) {
+    int index = outputs_.size();
     for (auto out : outputs) {
-      outputs_.emplace_back(out);
+      outputs_.emplace_back(std::move(out));
     }
     // Record the start and end index of the input
-    int index = outputs_.size();
     output_range_.emplace_back(
         std::pair<int, int>(index, index + outputs.size()));
   }
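The hunk above moves `int index = ...size()` ahead of the append, so each recorded range starts at the slot the new tensors actually occupy. The following is a minimal sketch of that bookkeeping, not the Paddle implementation: `ToyTensor`, `ToyContext`, `InputCount`, and `MakeDemoContext` are hypothetical names, and `std::vector` stands in for `paddle::SmallVector`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for TensorBase; only its identity matters here.
struct ToyTensor {
  int id;
};

// Simplified context: a flat tensor list plus one [start, end) range per argument.
class ToyContext {
 public:
  void EmplaceBackInput(std::shared_ptr<ToyTensor> input) {
    // Read the size BEFORE appending, so the range begins at the new slot.
    int index = static_cast<int>(inputs_.size());
    inputs_.emplace_back(std::move(input));
    input_range_.emplace_back(index, index + 1);
  }

  void EmplaceBackInputs(
      const std::vector<std::shared_ptr<ToyTensor>>& inputs) {
    int index = static_cast<int>(inputs_.size());
    for (const auto& in : inputs) {
      inputs_.emplace_back(in);
    }
    input_range_.emplace_back(index,
                              index + static_cast<int>(inputs.size()));
  }

  const std::pair<int, int>& InputRangeAt(std::size_t idx) const {
    return input_range_.at(idx);
  }

  std::size_t InputCount() const { return inputs_.size(); }

 private:
  std::vector<std::shared_ptr<ToyTensor>> inputs_;
  std::vector<std::pair<int, int>> input_range_;
};

// Builds a context holding one single-tensor argument followed by
// one three-tensor argument.
inline ToyContext MakeDemoContext() {
  ToyContext ctx;
  ctx.EmplaceBackInput(std::make_shared<ToyTensor>(ToyTensor{0}));
  std::vector<std::shared_ptr<ToyTensor>> group;
  for (int i = 1; i <= 3; ++i) {
    group.push_back(std::make_shared<ToyTensor>(ToyTensor{i}));
  }
  ctx.EmplaceBackInputs(group);
  return ctx;
}
```

If the size were read after the append, as in the removed lines, the first argument would be recorded as `[1, 2)` instead of `[0, 1)`, and every later lookup through the range would be off by the number of tensors just added.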
......@@ -96,11 +96,40 @@ class KernelContext {
     return static_cast<const TensorType&>(*(inputs_.at(idx)));
   }
+  template <typename TensorType>
+  std::vector<TensorType> InputBetween(size_t start, size_t end) const {
+    std::vector<TensorType> v;
+    for (size_t i = start; i < end; ++i) {
+      auto t = std::dynamic_pointer_cast<TensorType>(inputs_.at(i));
+      v.emplace_back(std::move(*t.get()));
+    }
+    return v;
+  }
+  const std::pair<int, int>& InputRangeAt(size_t idx) const {
+    return input_range_.at(idx);
+  }
+  const std::pair<int, int>& OutputRangeAt(size_t idx) const {
+    return output_range_.at(idx);
+  }
   template <typename TensorType>
   TensorType* MutableOutputAt(size_t idx) {
     return static_cast<TensorType*>(outputs_.at(idx).get());
   }
+  template <typename TensorType>
+  std::vector<TensorType*> MutableOutputBetween(size_t start, size_t end) {
+    std::vector<TensorType*> v;
+    for (size_t i = start; i < end; ++i) {
+      v.emplace_back(static_cast<TensorType*>(outputs_.at(i).get()));
+    }
+    return v;
+  }
   template <typename AttrType>
   AttrType AttrAt(size_t idx) const {
     try {
......
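`MutableOutputBetween` in the hunk above turns a `[start, end)` slice of the flat output list into a vector of raw pointers that a multi-output kernel can write through. A minimal sketch of the same idea follows, assuming a hypothetical `ToyTensor` type and free functions `MutableBetween` and `FillOutputs` that do not exist in Paddle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for DenseTensor.
struct ToyTensor {
  float value;
};

// Collects raw pointers to the outputs in [start, end). Ownership stays
// with the shared_ptr list; the caller only borrows the tensors.
inline std::vector<ToyTensor*> MutableBetween(
    const std::vector<std::shared_ptr<ToyTensor>>& outputs,
    std::size_t start,
    std::size_t end) {
  std::vector<ToyTensor*> v;
  for (std::size_t i = start; i < end; ++i) {
    v.emplace_back(outputs.at(i).get());  // at() throws on a bad range
  }
  return v;
}

// Hypothetical multi-output kernel body: writes base, base + 1, ... into
// the tensors it was handed.
inline void FillOutputs(const std::vector<ToyTensor*>& outs, float base) {
  for (std::size_t i = 0; i < outs.size(); ++i) {
    outs[i]->value = base + static_cast<float>(i);
  }
}
```

Because the kernel receives pointers, writes land in the tensors held by the context. The input-side `InputBetween` in the diff returns tensors by value instead, so it does not share this borrowing behavior.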
......@@ -62,9 +62,17 @@ struct KernelArgsParseFunctor<Return_ (*)(Args_...)> {
       } else if (arg_type == std::type_index(typeid(const DenseTensor&))) {
         args_def->AppendInput(
             default_key.backend(), default_tensor_layout, default_key.dtype());
+      } else if (arg_type ==
+                 std::type_index(typeid(const std::vector<DenseTensor>&))) {
+        args_def->AppendInput(
+            default_key.backend(), default_tensor_layout, default_key.dtype());
       } else if (arg_type == std::type_index(typeid(DenseTensor*))) {
         args_def->AppendOutput(
             default_key.backend(), default_tensor_layout, default_key.dtype());
+      } else if (arg_type ==
+                 std::type_index(typeid(std::vector<DenseTensor*>))) {
+        args_def->AppendOutput(
+            default_key.backend(), default_tensor_layout, default_key.dtype());
       } else {
         // Attribute deal with
         // TODO(chenweihang): now here allow any types of attribute, maybe
......
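The parse functor above classifies each kernel parameter by comparing its `std::type_index` against a few known tensor signatures and treats everything else as an attribute. Below is a minimal sketch of that classification, assuming hypothetical names (`ToyTensor`, `ArgKind`, `Classify`, `ClassifyArgs`, `ToyKernel`); it is not the `KernelArgsParseFunctor` code itself.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical stand-in for DenseTensor.
struct ToyTensor {};

enum class ArgKind { kInput, kOutput, kAttribute };

// Maps one parameter type to its role, mirroring the if/else chain.
inline ArgKind Classify(std::type_index t) {
  if (t == std::type_index(typeid(const ToyTensor&)) ||
      t == std::type_index(typeid(const std::vector<ToyTensor>&))) {
    return ArgKind::kInput;  // single input or a list of inputs
  }
  if (t == std::type_index(typeid(ToyTensor*)) ||
      t == std::type_index(typeid(std::vector<ToyTensor*>))) {
    return ArgKind::kOutput;  // single output or a list of outputs
  }
  return ArgKind::kAttribute;  // anything else is treated as an attribute
}

// Expands the parameter pack of a kernel function pointer into roles.
template <typename... Args>
std::vector<ArgKind> ClassifyArgs(void (*)(Args...)) {
  return {Classify(std::type_index(typeid(Args)))...};
}

// Hypothetical kernel signature: input, input list, attribute, output,
// output list.
inline void ToyKernel(const ToyTensor&,
                      const std::vector<ToyTensor>&,
                      float,
                      ToyTensor*,
                      std::vector<ToyTensor*>) {}
```

One C++ detail worth keeping in mind: `typeid` ignores references and top-level cv-qualifiers, so `typeid(const ToyTensor&)` and `typeid(ToyTensor)` compare equal. A parameter taken by value would therefore be classified the same way as one taken by const reference.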
......@@ -79,7 +79,30 @@ using XPUContext = paddle::platform::XPUDeviceContext;
                     "Kernel's Input should appear before Attributes."); \
       static_assert(out_idx == 0, \
                     "Kernel's Input should appear before Outputs."); \
-      const tensor_type& arg = ctx->InputAt<tensor_type>(in_idx); \
+      const std::pair<int, int> range = ctx->InputRangeAt(in_idx); \
+      const tensor_type& arg = ctx->InputAt<tensor_type>(range.first); \
       KernelCallHelper<Tail...>:: \
           template Compute<dev_ctx_idx, in_idx + 1, attr_idx, out_idx>( \
               ctx, pargs..., arg); \
+    } \
+  }
+#define PT_SPECIALIZE_KernelCallHelper_FOR_MULTI_INPUT(tensor_type) \
+  template <typename... Tail> \
+  struct KernelCallHelper<const std::vector<tensor_type>&, Tail...> { \
+    template <int dev_ctx_idx, \
+              int in_idx, \
+              int attr_idx, \
+              int out_idx, \
+              typename... PreviousArgs> \
+    static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
+      static_assert(attr_idx == 0, \
+                    "Kernel's Input should appear before Attributes."); \
+      static_assert(out_idx == 0, \
+                    "Kernel's Input should appear before Outputs."); \
+      const std::pair<int, int> range = ctx->InputRangeAt(in_idx); \
+      std::vector<tensor_type> arg = std::move( \
+          ctx->InputBetween<tensor_type>(range.first, range.second)); \
+      KernelCallHelper<Tail...>:: \
+          template Compute<dev_ctx_idx, in_idx + 1, attr_idx, out_idx>( \
+              ctx, pargs..., arg); \
......@@ -104,20 +127,39 @@ using XPUContext = paddle::platform::XPUDeviceContext;
     } \
   }
-#define PT_SPECIALIZE_KernelCallHelper_FOR_OUTPUT(tensor_type) \
-  template <typename... Tail> \
-  struct KernelCallHelper<tensor_type*, Tail...> { \
-    template <int dev_ctx_idx, \
-              int in_idx, \
-              int attr_idx, \
-              int out_idx, \
-              typename... PreviousArgs> \
-    static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
-      tensor_type* arg = ctx->MutableOutputAt<tensor_type>(out_idx); \
-      KernelCallHelper<Tail...>:: \
-          template Compute<dev_ctx_idx, in_idx, attr_idx, out_idx + 1>( \
-              ctx, pargs..., arg); \
-    } \
+#define PT_SPECIALIZE_KernelCallHelper_FOR_OUTPUT(tensor_type) \
+  template <typename... Tail> \
+  struct KernelCallHelper<tensor_type*, Tail...> { \
+    template <int dev_ctx_idx, \
+              int in_idx, \
+              int attr_idx, \
+              int out_idx, \
+              typename... PreviousArgs> \
+    static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
+      const std::pair<int, int> range = ctx->OutputRangeAt(out_idx); \
+      tensor_type* arg = ctx->MutableOutputAt<tensor_type>(range.first); \
+      KernelCallHelper<Tail...>:: \
+          template Compute<dev_ctx_idx, in_idx, attr_idx, out_idx + 1>( \
+              ctx, pargs..., arg); \
+    } \
   }
+#define PT_SPECIALIZE_KernelCallHelper_FOR_MULTI_OUTPUT(tensor_type) \
+  template <typename... Tail> \
+  struct KernelCallHelper<std::vector<tensor_type*>, Tail...> { \
+    template <int dev_ctx_idx, \
+              int in_idx, \
+              int attr_idx, \
+              int out_idx, \
+              typename... PreviousArgs> \
+    static void Compute(KernelContext* ctx, PreviousArgs&... pargs) { \
+      const std::pair<int, int> range = ctx->OutputRangeAt(out_idx); \
+      std::vector<tensor_type*> arg = std::move( \
+          ctx->MutableOutputBetween<tensor_type>(range.first, range.second)); \
+      KernelCallHelper<Tail...>:: \
+          template Compute<dev_ctx_idx, in_idx, attr_idx, out_idx + 1>( \
+              ctx, pargs..., arg); \
+    } \
+  }
 template <typename T>
......@@ -152,6 +194,7 @@ struct KernelImpl<Return (*)(Args...), kernel_fn> {
   /* Input Helpers */
   PT_SPECIALIZE_KernelCallHelper_FOR_INPUT(DenseTensor);
+  PT_SPECIALIZE_KernelCallHelper_FOR_MULTI_INPUT(DenseTensor);
   // TODO(chenweihang): adapt SelectedRows
   // PT_SPECIALIZE_KernelCallHelper_FOR_INPUT(SelectedRowsTensor);
......@@ -168,6 +211,7 @@ struct KernelImpl<Return (*)(Args...), kernel_fn> {
   /* Output Helpers */
   PT_SPECIALIZE_KernelCallHelper_FOR_OUTPUT(DenseTensor);
+  PT_SPECIALIZE_KernelCallHelper_FOR_MULTI_OUTPUT(DenseTensor);
   // TODO(chenweihang): adapt SelectedRows
   // PT_SPECIALIZE_KernelCallHelper_FOR_OUTPUT(SelectedRowsTensor);
......
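The `KernelCallHelper` specializations above peel one parameter type at a time, look up its `[start, end)` range by argument index, fetch one tensor or a whole slice, and recurse with the fetched value appended to the call arguments. A much reduced sketch of that recursion follows. It drops the device-context and attribute indices, stores tensors by value, and uses hypothetical names throughout (`ToyTensor`, `ToyContext`, `CallHelper`, `EndTag`, `Invoke`, `BiasedSum`).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for DenseTensor and KernelContext.
struct ToyTensor {
  int value;
};

struct ToyContext {
  std::vector<ToyTensor> inputs;
  std::vector<ToyTensor> outputs;
  std::vector<std::pair<int, int>> input_ranges;   // one per input argument
  std::vector<std::pair<int, int>> output_ranges;  // one per output argument
};

struct EndTag {};  // marks the end of the parameter list

template <typename... Remaining>
struct CallHelper;

// Single input: take the first tensor of the argument's range.
template <typename... Tail>
struct CallHelper<const ToyTensor&, Tail...> {
  template <int in_idx, int out_idx, typename Fn, typename... Prev>
  static void Compute(ToyContext* ctx, Fn fn, Prev&... pargs) {
    static_assert(out_idx == 0, "inputs should appear before outputs");
    const std::pair<int, int>& range = ctx->input_ranges.at(in_idx);
    const ToyTensor& arg = ctx->inputs.at(range.first);
    CallHelper<Tail...>::template Compute<in_idx + 1, out_idx>(
        ctx, fn, pargs..., arg);
  }
};

// Multiple inputs: copy the whole [first, second) slice.
template <typename... Tail>
struct CallHelper<const std::vector<ToyTensor>&, Tail...> {
  template <int in_idx, int out_idx, typename Fn, typename... Prev>
  static void Compute(ToyContext* ctx, Fn fn, Prev&... pargs) {
    static_assert(out_idx == 0, "inputs should appear before outputs");
    const std::pair<int, int>& range = ctx->input_ranges.at(in_idx);
    std::vector<ToyTensor> arg(ctx->inputs.begin() + range.first,
                               ctx->inputs.begin() + range.second);
    CallHelper<Tail...>::template Compute<in_idx + 1, out_idx>(
        ctx, fn, pargs..., arg);
  }
};

// Single output: hand out a pointer to the first tensor of the range.
template <typename... Tail>
struct CallHelper<ToyTensor*, Tail...> {
  template <int in_idx, int out_idx, typename Fn, typename... Prev>
  static void Compute(ToyContext* ctx, Fn fn, Prev&... pargs) {
    const std::pair<int, int>& range = ctx->output_ranges.at(out_idx);
    ToyTensor* arg = &ctx->outputs.at(range.first);
    CallHelper<Tail...>::template Compute<in_idx, out_idx + 1>(
        ctx, fn, pargs..., arg);
  }
};

// End of the list: every argument is collected, so call the kernel.
template <>
struct CallHelper<EndTag> {
  template <int in_idx, int out_idx, typename Fn, typename... Args>
  static void Compute(ToyContext*, Fn fn, Args&... args) {
    fn(args...);
  }
};

template <typename... Args>
void Invoke(void (*fn)(Args...), ToyContext* ctx) {
  CallHelper<Args..., EndTag>::template Compute<0, 0>(ctx, fn);
}

// Hypothetical kernel: out = bias + sum(xs).
inline void BiasedSum(const ToyTensor& bias,
                      const std::vector<ToyTensor>& xs,
                      ToyTensor* out) {
  out->value = bias.value;
  for (const ToyTensor& x : xs) out->value += x.value;
}
```

The point of indexing ranges by argument position rather than indexing tensors directly is that a list-valued argument may occupy several slots, so the second argument no longer lives at flat index 1 once the first argument is a list.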
......@@ -122,6 +122,14 @@ struct KernelKeyParser : ArgsIterator<KernelKeyParser> {
     key_set.dtype = x.type();
   }
+  void operator()(const std::vector<Tensor>& x) {
+    key_set.backend_set =
+        key_set.backend_set | detail::GetTensorBackendSet(x[0]);
+    // TODO(chenweihang): selecte multi layout and dtype
+    key_set.layout = x[0].layout();
+    key_set.dtype = x[0].type();
+  }
   // skip other type args, these args don't used in kernel selection
   template <typename T>
   void operator()(const T& x) {
......
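The new `KernelKeyParser` overload above derives the kernel key for a tensor list from its first element only: the backend is OR-ed into the running set, and the dtype is overwritten. The sketch below illustrates that overload set with hypothetical types (`ToyBackend`, `ToyDType`, `ToyTensor`, `ToyKeySet`, `ToyKeyParser`). The empty-list guard is an assumption added for the sketch; the diff indexes `x[0]` directly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical backend bit flags and dtypes.
enum class ToyBackend : std::uint32_t { kCPU = 1u << 0, kCUDA = 1u << 1 };
enum class ToyDType { kUndefined, kFloat32, kInt64 };

struct ToyTensor {
  ToyBackend backend;
  ToyDType dtype;
};

struct ToyKeySet {
  std::uint32_t backend_set = 0;
  ToyDType dtype = ToyDType::kUndefined;
};

struct ToyKeyParser {
  ToyKeySet key_set;

  // Single tensor: accumulate its backend, take its dtype.
  void operator()(const ToyTensor& x) {
    key_set.backend_set |= static_cast<std::uint32_t>(x.backend);
    key_set.dtype = x.dtype;
  }

  // Tensor list: only the first element contributes, as in the diff.
  void operator()(const std::vector<ToyTensor>& xs) {
    if (xs.empty()) return;  // assumption: guard not present in the diff
    key_set.backend_set |= static_cast<std::uint32_t>(xs[0].backend);
    key_set.dtype = xs[0].dtype;
  }

  // Any other argument type is ignored for kernel selection.
  template <typename T>
  void operator()(const T&) {}
};
```

Overload resolution prefers the two non-template overloads for exact matches, so only non-tensor arguments fall through to the catch-all template. A list whose elements disagree on dtype or backend is not detected by this scheme, which matches the TODO left in the diff.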