- 12 May, 2023 1 commit
-
-
Committed by Leo Chen
-
- 21 April, 2023 1 commit
-
-
Committed by zhouweiwei2014
-
- 11 April, 2023 1 commit
-
-
Committed by Yiqun Liu
* Fix scale kernel for low precision, cherry pick #50998.
* Fix the FP16 precision problem of add_n. (#50129)
* Change squared_l2_norm to reuse ReduceKernel, and register fp16 and bf16 kernel, which is cherry pick #48315.
* Cherry-pick the fix of MPTypeTrait in KP, which is implemented in #50993.
* Cherry-pick the multi-precision support of AdamW for bf16, #48041.
* Fix compiling error.
* Cherry-pick the fix of CubTensorReduceImpl for bfloat16 in #50993.
* Fix unittest.
---------
Co-authored-by: liuruyan <44316842+liuruyan@users.noreply.github.com>
-
- 09 April, 2023 2 commits
-
-
Committed by Yiqun Liu
* Cherry-pick the register of bfloat16 for amp_kernel, pull request #45541.
* Cherry-pick the master_grad support of adamw, pull request #51141.
* add bf16 for some ops in static mode (#51582)
* Add bfloat16 support for some api in static mode.
* Fix codestyle.
* Revert the change of layer_function_generator.py.
---------
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
-
Committed by Yiqun Liu
* Register exp/expm1/logit bf16 activation op kernels (#48702)
* register more bf16 ops
* update to register corresponding backward ops
* Addition of bf16 type support for Compare OP (#46413)
* first commit
* clarify the quotes
* change code style format
* support bfloat16
* add bfloat16 support for more ops (#48272)
* [Bfloat16]register bfloat16 datatype for squared l2 norm (#50908)
* Sync the pull request #51903.
* Add some header files back.
* modify cmake file for cuda11.8 compile (#49020)
* modify cmake file for cuda11.8 compile
* add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
* Fix compiling error.
* Cherry-pick pull request #51396.
---------
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
Co-authored-by: zqw_1997 <118182234+zhengqiwen1997@users.noreply.github.com>
-
- 20 March, 2023 1 commit
-
-
Committed by LiYuRio
-
- 13 January, 2023 1 commit
-
-
Committed by Yuanle Liu
* fix fc kernel diff
* disable fc_elementwise_layernorm_fuse_pass
-
- 09 January, 2023 1 commit
-
-
Committed by Haohongxiang
-
- 04 January, 2023 1 commit
-
-
Committed by Yuanle Liu
* disable scale op in amp pass
* Do not insert redundant cast op
* fix fused_fc_elementwise_layernorm kernel diff
* fix fc kernel diff
-
- 03 January, 2023 1 commit
-
-
Committed by xiaoting
* fix fold for large bs
* fix fold for large bs
* fix pre-commit
-
- 29 December, 2022 1 commit
-
-
Committed by YuanRisheng
* cherry-pick 45860
* [BUG FIX]Fix MetaTensor's bug when run infermeta (#46265)
* fix sum bug
* fix ci bugs
* fix ci bugs
* update code according to comment
-
- 29 November, 2022 1 commit
-
-
Committed by yeliang2258
[cherry-pick] updating mul and matmul with set_mem_desc and fix squeeze_transpose for MKLDNN (#47951)
* Fix slice bugs in MKLDNN when input dims are zeros (#46671)
* fix slice bugs
* fix
* update code
* fix
* update code
* updating mul and matmul with set_mem_desc (#45624)
* - mul & matmul changes - fix - bs16 correction of strides
* - cosmetic fixes
* - lint
* - fix
* - fix
* - format -> mem_desc
* - fix
* - fix
* - fix
* - fix
* - fix
* fix squeeze_transpose (#47911)
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
-
- 25 November, 2022 1 commit
-
-
Committed by zyfncg
* Fix wrong eigen header include
* fix compile bug
-
- 07 November, 2022 1 commit
-
-
Committed by zhangkaihuo
Revert SparseConv support duplicate coordinates
-
- 03 November, 2022 1 commit
-
-
Committed by zhangkaihuo
Unified api args name
-
- 02 November, 2022 1 commit
-
-
Committed by Siming Dai
-
- 28 October, 2022 1 commit
-
-
Committed by zhangkaihuo
add sync_batch_norm_bn and deliver indices_dict
-
- 27 October, 2022 1 commit
-
-
Committed by zhangkaihuo
* cherry-pick #46359 and resolve conflict
-
- 24 October, 2022 1 commit
-
-
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.
* support pure bfloat16
* support bf16 linear
* update PR to pass CI
* tiny fix where_grad_kernel.cu
* Support bfloat16 type for reducer and sharding.
* Fix some bug.
* Polish code.
* Polish code.
* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>
Co-authored-by: sneaxiy <sneaxiy@126.com>
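The reduce_sum entry above concerns tensors whose element count exceeds INT32_MAX. As a rough illustration of why that boundary matters (this is not the actual kernel change, and the helper names `as_int32` and `needs_int64_index` are hypothetical), the sketch below shows how an element count wraps around when it is stored in a signed 32-bit integer:

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(value):
    """Reinterpret an integer as a signed 32-bit value (wraps on overflow)."""
    return ctypes.c_int32(value).value


def needs_int64_index(numel):
    """Return True when a 32-bit index cannot address every element."""
    return numel > INT32_MAX


numel = 2**31 + 5          # one example of an element count above INT32_MAX
wrapped = as_int32(numel)  # negative: a 32-bit counter would be corrupted
print(numel, wrapped, needs_int64_index(numel))
```

A kernel that keeps its element count or loop index in a 32-bit integer would see the wrapped value, which is one plausible way a reduction over such a tensor can return a wrong result.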
-
- 21 October, 2022 1 commit
-
-
Committed by JingZhuangzhuang
* Add infer prune function
* add fusion op
-
- 20 October, 2022 4 commits
-
-
Committed by Yiqun Liu
* Simplify the codes of conv. (#45966)
* Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)
-
Committed by liu zhengxi
Add value check & error message for gather_tree cherry-pick #47051
-
Committed by sneaxiy
Fix some operators when the tensor.numel() > INT32_MAX
-
Committed by sneaxiy
support pure bfloat16 for more ops
-
- 19 October, 2022 2 commits
-
-
Committed by Zhang Ting
* strided_slice grad add fp16 support
-
Committed by xiongkun
* [Dy2Static] Support TypeHint for function decorated by @to_static (#47121)
* Add TypeHint Transformer
* add unittest for typehint transformer
* [Dy2Static] Remove GradTransformer (#47063)
* [Dy2Static] Remove GradTransformer
  1. fix einsum infershape bugs.
  2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
  3. add dygraph_and_dy2static_only decorator for dy2static.
* fix bugs
* rename
-
- 18 October, 2022 2 commits
-
-
Committed by zhouweiwei2014
Add three new APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
-
Committed by Wang Bojun
* draft with debug print
* remove debug print
* bug fix for ci
-
- 17 October, 2022 3 commits
-
-
Committed by zhangkaihuo
cherry-pick: #46322, #46245 Sparse API supports static graph mode
-
Committed by Zhang Zheng
Optimize performance of depthwise_conv
Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1
-
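Committed by Zhang Zheng
To improve performance, move the label bounds check from the Python side into the kernel, which reduces calls to extra ops such as min, max, and synchronous copies.
The current template parameter IgnoreIndex only takes effect when ignore_index lies in [0, dim). However, when a label value is out of bounds and ignore_index equals that label, the computation should still proceed normally. The current logic does not produce wrong results, but it is still logically flawed, and the template parameter IgnoreIndex is unnecessary.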
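The ordering described in this commit message can be sketched in plain Python. This is only an illustration under stated assumptions, not the kernel code: the function name `softmax_cross_entropy` and the default `ignore_index=-100` are hypothetical choices for the example. The point is that comparing against ignore_index before the bounds check lets an out-of-range label that equals ignore_index be skipped instead of rejected:

```python
import math


def softmax_cross_entropy(logits, labels, ignore_index=-100):
    """Per-sample softmax cross-entropy; ignored labels contribute 0.0."""
    losses = []
    for row, label in zip(logits, labels):
        # Compare with ignore_index first, so a label outside [0, dim)
        # that equals ignore_index is skipped rather than rejected.
        if label == ignore_index:
            losses.append(0.0)
            continue
        # Bounds check done inside the loop body, standing in for a
        # check performed inside the kernel instead of by separate ops.
        if not 0 <= label < len(row):
            raise ValueError(f"label {label} is outside [0, {len(row)})")
        # Numerically stable log-sum-exp.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[label])
    return losses


# The second label (255) is out of range but equals ignore_index.
losses = softmax_cross_entropy([[0.0, 0.0], [1.0, 2.0]], [0, 255], ignore_index=255)
print(losses)
```

With this ordering no separate code path is needed for the case where ignore_index falls inside [0, dim), which is consistent with the message's remark that the IgnoreIndex template parameter is unnecessary.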
-
- 13 October, 2022 1 commit
-
-
Committed by Sławomir Siwek
* Revert pool+grad oneDNN kernel conversion (#45989)
* [PHI] transpose2_grad op migration (#46139)
* op migrated, Copy(OneDNNContext, ...) added
* mutable_data & op registration in fluid removed
* refactoring
* OneDNNGetDataType to uppercase
* missing cpu check added, handler moved to .h file
* name changed to transpose_grad
* Copy changed back to TensorCopy
* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>
-
- 11 October, 2022 6 commits
-
-
Committed by Feiyu Chan
-
Committed by Sławomir Siwek
-
Committed by Sławomir Siwek
-
Committed by Sławomir Siwek
* [PHI] Migrate gelu kernels (#45596)
* gaussian random
* mkldnn to onednn renaming
* fix merge conflicts
* remove fluid code
* onednn renaming
* gelu fwd
* sort activations
* gelu gradient
* remove unused macros
* merge conflicts
* fix merge conflicts
* remove extra constraint from gelu op
* [PHI] relu6_grad kernel (#46501)
* Relu6
* remove fluid handler
* add individual kernel signature
* coding style
* replace bounded_relu with clip
* whitespace
* code style
-
Committed by Sławomir Siwek
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-
Committed by YuanRisheng
* fix concat bug
* fix ci bugs
* fix ci bugs
-
- 10 October, 2022 2 commits
-
-
Committed by Sławomir Siwek
[cherry-pick] [PHI] Migrate concat+grad, expand+grad, fill_constant … oneDNN kernels (#45863) (#46727)
* [PHI] Migrate concat+grad, expand+grad, fill_constant, nearest_interp and bilinear_interp oneDNN kernels (#45863)
* Migrate concat+grad, expand+grad, fill_constant, nearest_interp_v2 and bilinear_interp_v2 oneDNN kernels to PHI
* Remove old namespace variable
* Fix invalid out dims error
* Add mutable_data method to concat output
* Add check for -1 dim before computing out_dims
* Capitalize oneDNNGetDataType function name
* Change fill_constant kernel to correct PHI kernel
* Attempt to fix dims error
* Fix fill_constant (full) kernel
* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-
Committed by Sławomir Siwek
* [PHI] Migrate sgd and stack oneDNN kernels (#46374)
* Convert slice+grad oneDNN fluid kernels to PHI
* Change mutable_data to Alloc
* Refactor licences
* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-