    Add matmul_v2 kernel in pten (#36844) · e11ecfce
    zyfncg committed
    * initial tensor design & sign kernel demo
    
    * add move constructor for meta & add lodtensor
    
    * add dirs & sign xpu kernel
    
    * add mean cpu&cuda kernel impl
    
    * move sign & mean xpu & npu kernel
    
    * add selected_rows basic impl
    
    * refactor design, BaseTensor to DenseTensor, etc.
    
    * add scale mkldnn kernel
    
    * polish xpu & npu impl details
    
    * fix mkldnn reuse compile failed
    
    * change tensor operation lib name
    
    * rename util filename
    
    * add more comments
    
    * change TensorImplInterface to TensorInterface
    
    * add kernel key and factory
    
    * remove MKLDNNTensorMeta, add MKLDNNDenseTensor
    
    * change XXDeviceContext to XXContext
    
    * add base kernel registrar utils & test on sign
    
    * replace boost::any by paddle::any
    
    * fix several ci failed
    
    * fix npu compile error
    
    * add ordered map util
    
    * fix multiple ordered_map compile errors
    
    * move dev into include dir
    
    * support sign op in static op run
    
    * fix static op run error
    
    * fix new executor compile failed
    
    * add dygraph branch & remove sign_op.h
    
    * fix test_infer_no_need_buffer_slots
    
    * fix rocm compile link error
    
    * fix unitybuild error & clear glog
    
    * fix npu compile failed
    
    * skip quant trans test
    
    * fix part windows compile problem
    
    * fix xpu enforce error
    
    * fix inference test failed
    
    * remove ordered_map to solve quant failed
    
    * fix part of rocm compile failed
    
    * add more register kernels
    
    * revert scale kernel temporarily
    
    * fix code format error
    
    * add new kernel registrar macro
    
    * rename top to tcmpt
    
    * revert xpu, npu, mkldnn impl & remove op def
    
    * add kernel args parse functor to auto parse args
    
    * revert some change & add scale kernels
    
    * add op proto in dygraph kernelcontext building
    
    * polish kernel dispatch logic & naming rule
    
    * fix scale kernel match error
    
    * fix scale test failed
    
    * add mean API and unittest
    
    * test mean api success
    
    * add branch to solve compiled error
    
    * skip clang format error
    
    * add mean skip rule in op_library
    
    * add dot kernel, api and unittest (#6)
    
    * remove old kernel and add symbol link
    
    * fix dot compiled failed
    
    * add macro for module declare
    
    * fix npu and xpu compile error
    
    * revert sign, mean, scale, dot kernel removing
    
    * add comment for keeping old kernel impl
    
    * fix mutable_data error
    
    * fix bfloat16 conflict
    
    * fix inference undef error
    
    * adapt to msvc compile rules
    
    * polish comment for template inst
    
    * add cmake template instantiation for win
    
    * fix backend to place device id bug
    
    * fix ifdef error
    
    * Op2functor (#7)
    
    * add kernel args maker class
    
    * make args maker non-const
    
    * remove debug log
    
    * modify code per review comments
    
    * split constructPrKernelContext function
    
    * fix output name bug
    
    * fix test_mean_op test_sign_op failed
    
    * fill_any_like kernel refactor (#10)
    
    * fill_any_like kernel refactor
    
    * remove useless code of full_like c++ api
    
    * skip dtype for fill_any_like
    
    * add attrs for kernel key construct
    
    * add use_pt_kernel Flags to control whether to use pt kernel (#13)
    
    * add use_pt_kernel Flags to control whether to use pt kernel
    
    * change the default value to true for checking pt kernels
    
    * fix mutable_data cuda place error
    
    * move high level apis into hapi
    
    * remove selectedrows adapting temporarily
    
    * Support Scalar in Tensor Compute Library (#14)
    
    * fill_any_like kernel refactor
    
    * remove useless code of full_like c++ api
    
    * Support Scalar in Tensor Compute Library
    
    * add scalar in dygraph and static graph mode
    
    * keep the basic type for attr, instead of using scalar for all
    
    * merge the code
    
    * remove mkldnn tensor & polish details
    
    * use flat_hash_map and small_vector in kernel factory
    
    * Refactor flatten kernel (#12)
    
    * refactor flatten kernel
    
    * update infershape function
    
    * fix compile bugs
    
    * fix bugs when merge
    
    * fix compiler bugs
    
    * fix bugs when run test_flatten_api
    
    * fix bugs when run test
    
    * Revert "use flat_hash_map and small_vector in kernel factory"
    
    This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.
    
    * Move cpu, cuda and other device code into kernels (#15)
    
    * fill_any_like kernel refactor
    
    * remove useless code of full_like c++ api
    
    * Support Scalar in Tensor Compute Library
    
    * add scalar in dygraph and static graph mode
    
    * keep the basic type for attr, instead of using scalar for all
    
    * merge the code
    
    * start refactor matmul
    
    * move cpu, cuda and other device modules into kernels
    
    * merge code
    
    * polish code in operator.cc
    
    * Perfect unittests (#16)
    
    * perfect unittest
    
    * update license
    
    * replace with flat_hash_map, small_vector (#19)
    
    * fix small_vector build error on windows platform
    
    * replace with flat_hash_map, small_vector
    
    * remove todo
    
    * Perfect unittests (#20)
    
    * perfect unittest
    
    * update license
    
    * fix bug when run tcmpt_utils_test
    
    * refactor execution adapting impl
    
    * fix insert conflict
    
    * Fix CI bug of test_yolov3 (#21)
    
    * fill_any_like kernel refactor
    
    * remove useless code of full_like c++ api
    
    * Support Scalar in Tensor Compute Library
    
    * add scalar in dygraph and static graph mode
    
    * keep the basic type for attr, instead of using scalar for all
    
    * merge the code
    
    * start refactor matmul
    
    * move cpu, cuda and other device modules into kernels
    
    * merge code
    
    * polish code in operator.cc
    
    * Fix CI bug of test_yolov3
    
    * add the tensor base class, test=develop (#17)
    
    * update the tensor base class, test=develop
    
    * remove two funcs, test=develop
    
    * update the error msg, test=develop
    Co-authored-by: Chen Weihang <chenweihang@baidu.com>
    
    * [no-verify] commit backend and tensor signature changes
    
    * Rename tcmpt to pten (#23)
    
    * rename tcmpt to pten
    
    * update omitted files for rename to pten
    
    * update omitted file for rename to pten
    
    * remove k of all enum var
    
    * remove kernel_instantiate (#26)
    
    * remove symbols and spatial_tensor
    
    * change common to functions
    
    * readd share tensor impl methods
    
    * add a candidate dense tensor class, test=develop (#28)
    
    * change all Pt to Pten
    
    * resolve conflict with xiaowei
    
    * Op2functor opt1 (#27)
    
    * replace to small vector and change to const &
    
    * add std::move
    Co-authored-by: Chen Weihang <chenweihang@baidu.com>
    
    * polish kernel factory and kernel registry
    
    * fix operator test error msg mismatch
    
    * remove tensor signature and backend set member
    
    * move scalar and polish enforce
    
    * revert dtype layout change to fix error
    
    * fix enum operator override error
    
    * add several base unittests
    
    * add pten utils tests
    
    * polish some details
    
    * Dev/op2func refactor 3 (#30)
    
    * add a candidate dense tensor class, test=develop
    
    * remove TensorBase::backend(), test=develop
    
    * remove some ops, test=develop
    
    * cherry-pick the pr of tensor meta, test=develop
    
    * moves the dense tensor and some ops, test=develop
    
    * update the linalg operator, test=develop
    
    * update other operators, test=develop
    
    * fix errors, test=develop
    
    * fix bugs, test=develop
    
    * try to resolve the problem of windows ci, test=develop
    
    * updates codes, test=develop
    
    * fix the tensor_utils.cc, test=develop
    
    * modify the dense tensor, test=develop
    
    * fix the data type, test=develop
    Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
    
    * polish some details
    
    * polish kernel signature details
    
    * fix a bug about offsets of the tensor, test=develop (#31)
    Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
    
    * add matmul kernel in pten
    
    * add unittest for new matmul_v2 kernel
    
    * fix bug of CI compile
    
    * fix bug of CI compile
    
    * merge conflict
    
    * remove useless file
    Co-authored-by: Chen Weihang <chenweihang@baidu.com>
    Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
    Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
    Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
operator.cc 68.0 KB