- 18 11月, 2021 4 次提交
-
-
由 Li Min 提交于
* fix bug to support dropout eval grad computing. * Remove useless code.
-
由 YuanRisheng 提交于
* elementwise_add kernel refactor * fix compile bugs in elementwise_add refactor * fix compile bugs when run in npu/xpu * fix bugs when run unit test * fix bugs when run ci-windows * modify code as recommended * code format adjust * fix bugs when run ci * fix compile bug when run in ci-windwos * elementwise_sub refactor * add PD_DLL_DECL for elementwise_sub * fix bugs when compilei
-
由 Zhen Wang 提交于
* Add the `GetFetchNames` method in CinnGraphSymbolization. * Use unordered_set instead vector as the type of fetch_var_names. * Reuse the definition of kCompilationKey. * Use CompileOptions to set fetch_var_ids. * Update the argument passing of GraphCompiler.Build. * Fix some bugs in CinnGraphSymbolization::GetFetchIds.
-
由 zhangkaihuo 提交于
topk中有cub和手写kernel两种实现,而cub是通过排序来获取topk,通过多组数据发现只有当input_width>=128且k超过input_width 75%的时候性能会比手写的更好。
-
- 17 11月, 2021 6 次提交
-
-
由 Sławomir Siwek 提交于
* Use oneDNN reorder instead of custom one * Fix whitespace typo * Fix Code format error * Incorporating feedback * Remove unncessary reorder * Support GIOHW format * Fix code format error
-
由 piotrekobiIntel 提交于
* Change first batch of mkldnn headers and namespace names to dnnl * Revert changes to tensor.h, which require approval * Format changes with pre-commit * Add int32 tests * Fix int32 tests and call GetDataFromTensor for int32 * Fix test
-
由 niuliling123 提交于
* Modify reduce_op.op.h for xpu2 with kernel primitive api
-
由 zmx 提交于
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * refactor heter trainer. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
-
由 Leo Chen 提交于
* copy beta pow to same place when skip_update=1 * fix xpu
-
由 WangXi 提交于
-
- 16 11月, 2021 5 次提交
-
-
由 arlesniak 提交于
* Added BF16 Pool2d grad * upstream pulled * fix for CI * fixes after review
-
由 YuanRisheng 提交于
* reshape kernel refactor * fix compile bugs when run ci * support xpu for reshape * fix bugs when run unittest in kunlun ci * fix compile bugs when run kunlun * perfect code according to suggestion * add api and unit test for reshape
-
由 Yiqun Liu 提交于
* Make FLAGS_determinstic effective in conv2d forward. * Add call of SetCinnCudnnDeterministic in cinn_launch op.
-
由 jakpiase 提交于
-
由 Li Min 提交于
fused_attention_op的实现中,使用了bias_add,且其实现是通过使用kernel primitive来实现的,之后kernel primitive的WriteData api接口及函数内部实现发生了更改,将判断越界的逻辑移到了template的参数中,使得调用的分支有错误,产生了越界赋值操作,污染了别的显存空间的内容。具体表现为:test_fused_attention_op_api.py 单次执行基本上不会报错,多次循环执行不同shape的输入,结果计算不对,具有偶发性,bug不易察觉。
-
- 15 11月, 2021 6 次提交
-
-
由 Chen Weihang 提交于
* move extension into pten [no-verify] * append tensor methods by ext_tensor [no-verify] * append other tensor methods [no-verify] * ext related files tidy [no-verify] * include relation tidy [no-verify] * add pten tensor test [no-verify] * replace tensor in custom op & compile success * refine tensor constructor for unittest * custom relu jit run success * fix all custom op unittests * add inference cmake adapt [no-verify] * fix failed unittests * fix windows failed unittests * try to fix kunlun and inference failed * fix test_elementwise_api error * try to fix win compile failed * fix kunlun fp16 type error * remove useless haddle error macro * add custom linear op test * fix compile failed & add win symbols * fix non pten kernel cast failed * add dll decl for api * polish several deetails * polish details by review comment * add dll_decl for register
-
由 feng_shuai 提交于
-
由 arlesniak 提交于
* Added BF16 to mean op * fix for CI * fix for CI * fix for CI
-
由 Weilong Wu 提交于
* Add elementwise_mul triple grad kernel * Removed InplaceInferer and polished code
-
由 Zeng Jinle 提交于
* add split_program * make ut faster * increase ut timeout * make result deterministic * add fuse_all_reduce pass * add ut framework, update * fix ut framework * remove useless code * add coverage support * update * fix CI * fix some bugs and fix ci coverage * fix conflict
-
由 zmx 提交于
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
-
- 14 11月, 2021 1 次提交
-
-
由 YuanRisheng 提交于
* reshape kernel refactor * fix compile bugs when run ci * support xpu for reshape * fix bugs when run unittest in kunlun ci * fix compile bugs when run kunlun * perfect code according to suggestion
-
- 13 11月, 2021 1 次提交
-
-
由 CtfGo 提交于
Modify serveral implements on CinnLaunchOp: 1. Skip checking input variables must be used 2. Move current helper functions to a CinnlaunchContext
-
- 12 11月, 2021 5 次提交
-
-
由 zhangkaihuo 提交于
* fix bug: 1. atten: set the default value of attn_dropout_rate to None 2. ffn: add activation parameter
-
由 Chen Weihang 提交于
-
由 zyfncg 提交于
* adjust the param of full_like api in pten * adjust the code format * adjust the code format * adjust the code format
-
由 Aganlengzi 提交于
-
由 YuanRisheng 提交于
* elementwise_add kernel refactor * fix compile bugs in elementwise_add refactor * fix compile bugs when run in npu/xpu * fix bugs when run unit test * fix bugs when run ci-windows * modify code as recommended * code format adjust * fix bugs when run ci * fix compile bug when run in ci-windwos
-
- 11 11月, 2021 4 次提交
-
-
由 zmx 提交于
-
由 TTerror 提交于
* add where/where_index/masked_select for kunlun * fix where/where_index * update where/masked_select
-
由 jakpiase 提交于
* added softplus + activation fuse plass * minor change * implemented reviewer suggestion * minor fix * minor fix * added scale_out parameter * minor fix * fix for iScan CI * conditionally disabled logs * refactored pass builder
-
由 zmx 提交于
* change username * fix * fix * fix * fix * fix * update * update * update unittests * fix * update * fix * update * fix * fix * fix * update * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update send_and_recv op. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix unit. notest,test=coverage * fix ut. notest, test=coverage * update. notest,test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix. notest, test=coverage * fix. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * add func. notest, test=coverage * fix ut. notest, test=coverage * fix. test=develop * fix. test=develop
-
- 10 11月, 2021 3 次提交
- 09 11月, 2021 3 次提交
-
-
由 Haohongxiang 提交于
-
由 Zeng Jinle 提交于
* try to fix CUDA Graph H2D copy bug * remove useless code * fix ci * fix ROCM CI * fix CUDA_VERSION * improve CI coverage
-
由 TTerror 提交于
-
- 08 11月, 2021 2 次提交
-
-
由 zyfncg 提交于
* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rcom compile faild * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar marco * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & nameing rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add merco for module declare * fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflit * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key constrcut * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for cheking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b. * Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unitests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unitests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflit * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: NChen Weihang <chenweihang@baidu.com> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflit with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move Co-authored-by: NChen Weihang <chenweihang@baidu.com> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (#31) Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> * polish some details * add fill_constant kernel in pten * fix bug of full api (c++) * remove the support for SelectRows in new fill_constant kernel * fix bug of setting fill_any_like kernel key * merge code confilct * modify fill_constant GetExpectedKernelType * fix fill_constant KernelType bug * polish code of build pten KernelContext * refactor code of fill_constant in pten Co-authored-by: NChen Weihang <chenweihang@baidu.com> Co-authored-by: Nchentianyu03 <ctychentianyu@gmail.com> Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com> Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
由 Li Min 提交于
目前的fused_attention_op不支持attn_mask=None的输入,本PR对此进行了补充,并补充了相应的单测逻辑。
-