- 18 Apr, 2023 1 commit
- 14 Apr, 2023 1 commit
Committed by JZ-LIANG
* pr1
* pr2
* pr3
* fixed unit test
* adopt for scale
- 12 Apr, 2023 3 commits
- 11 Apr, 2023 1 commit
Committed by Yiqun Liu
* Fix scale kernel for low precision, cherry pick #50998.
* Fix the FP16 precision problem of add_n. (#50129)
* Change squared_l2_norm to reuse ReduceKernel, and register fp16 and bf16 kernel, which is cherry pick #48315.
* Cherry-pick the fix of MPTypeTrait in KP, which is implemented in #50993.
* Cherry-pick the multi-precision support of AdamW for bf16, #48041.
* Fix compiling error.
* Cherry-pick the fix of CubTensorReduceImpl for bfloat16 in #50993.
* Fix unittest.
Co-authored-by: liuruyan <44316842+liuruyan@users.noreply.github.com>
- 10 Apr, 2023 1 commit
Committed by Yiqun Liu
* Broadcast the master weight along with param for distributed training.
* Fix codestyle.
- 09 Apr, 2023 3 commits
Committed by Yiqun Liu
* Cherry-pick the register of bfloat16 for amp_kernel, pull request #45541.
* Cherry-pick the master_grad support of adamw, pull request #51141.
* add bf16 for some ops in static mode (#51582)
* Add bfloat16 support for some api in static mode.
* Fix codestyle.
* Revert the change of layer_function_generator.py.
Co-authored-by: Shaojie WANG <wsjmessi@163.com>

Committed by Yiqun Liu
* Register exp/expm1/logit bf16 activation op kernels (#48702)
* register more bf16 ops
* update to register corresponding backward ops
* Addition of bf16 type support for Compare OP (#46413)
* first commit
* clarify the quotes
* change code style format
* support bfloat16
* add bfloat16 support for more ops (#48272)
* [Bfloat16] register bfloat16 datatype for squared l2 norm (#50908)
* Sync the pull request #51903.
* Add some header files back.
* modify cmake file for cuda11.8 compile (#49020)
* modify cmake file for cuda11.8 compile
* add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
* Fix compiling error.
* Cherry-pick pull request #51396.
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
Co-authored-by: zqw_1997 <118182234+zhengqiwen1997@users.noreply.github.com>

Committed by shaojie_wang
- 07 Apr, 2023 1 commit
Committed by zhaoyingli
- 03 Apr, 2023 1 commit
Committed by zhaoyingli
- 30 Mar, 2023 1 commit
Committed by Yuang Liu
- 28 Mar, 2023 1 commit
Committed by LiYuRio
- 24 Mar, 2023 1 commit
Committed by LiYuRio
- 20 Mar, 2023 1 commit
Committed by LiYuRio
- 09 Mar, 2023 1 commit
Committed by JZ-LIANG
- 17 Feb, 2023 1 commit
Committed by Wen Sun
- 13 Jan, 2023 2 commits
Committed by xiaoxiaohehe001

Committed by Yuanle Liu
* fix fc kernel diff
* disable fc_elementwise_layernorm_fuse_pass
- 12 Jan, 2023 1 commit
Committed by xiaoxiaohehe001
- 09 Jan, 2023 1 commit
Committed by Haohongxiang
- 04 Jan, 2023 2 commits
Committed by Yuanle Liu
* disable scale op in amp pass
* Do not insert redundant cast op
* fix fused_fc_elementwise_layernorm kernel diff
* fix fc kernel diff

Committed by YUNSHEN XIE
* resolve conflict
* fix format error
- 03 Jan, 2023 2 commits
Committed by xiaoting
* fix fold for large bs
* fix fold for large bs
* fix pre-commit

Committed by feng_shuai
* cherry-pick: Some versions of TensorRT don't support qkv_plugin
* cherry-pick: support coverage CI
- 30 Dec, 2022 1 commit
Committed by Chenxiao Niu
* [MLU] fix compute error of dropout op (#45923)
* [MLU] add mergedAdam kernel. (#45965)
* [MLU] add int64 support for mlu one_hot_v2 (#46313)
* [MLU] fix profiler compile failure (#46208)
* [MLU] add barrier_op kernel. (#46417)
* [MLU] fluid: add mluop (#46429)
* [MLU] add huber_loss kernel. (#46455)
* [MLU] add mlu kernel for add_reduce_max_grad (#45651)
Co-authored-by: liupeiyu <liupeiyu@cambricon.com>
* [MLU] add_fluid_mluop_yolo_box (#46573)
* [MLU] fix phi::Tensor compile error of mlu. (#46649)
* [MLU] add fluid MLUOps prior_box (#46585)
* [MLU] fix cmake error (#46772)
* [MLU] fix unittest of sync_bn (#46797)
* [MLU] add masterparam support for mlu adamw. (#46804)
* [MLU] add int64 support for allgather. (#46830)
* [MLU] fix compile error & add mlu blacklist function. (#47439)
* [MLU] fix softmax_with_cross_entropy failed in 370-X8.
* [MLU] fix cncl stuck caused by multiple initializations.
* [MLU] fix code style check.
Co-authored-by: qipengh <huangqipeng@cambricon.com>
Co-authored-by: cifar10 <41565156+cifar10@users.noreply.github.com>
Co-authored-by: Lux et Veritas <1004239791@qq.com>
Co-authored-by: liupeiyu <liupeiyu@cambricon.com>
Co-authored-by: ronnywang <ronny1996@163.com>
- 29 Dec, 2022 2 commits
Committed by zhouweiwei2014

Committed by YuanRisheng
* cherry-pick 45860
* [BUG FIX] Fix MetaTensor's bug when run infermeta (#46265)
* fix sum bug
* fix ci bugs
* fix ci bugs
* update code according to comment
- 28 Dec, 2022 1 commit
Committed by Huihuang Zheng
Fix CUDA11.8 Unittest Accuracy
- 27 Dec, 2022 2 commits
Committed by Yuanle Liu

Committed by HongyuJia
* [Release2.4] Revert python link prs (#48573)
* Revert "Fix mac link python (#48017)" This reverts commit 3fa7a736.
* Revert "[Cherry-pick] Fix python link error (#47811)" This reverts commit ff642c68.
* Update config.go
* fix custom operator backward=None (#48656)
* [Custom Extension] Fix custom double_grad backward=None (#49224)
* fix custom double_grad backward=None
* fix custom_relu.cu bug && polish testcase of double_grad
* remove old dynamic graph test
* add import fluid
* add import fluid
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
- 22 Dec, 2022 3 commits
Committed by Guanghua Yu

Committed by Yuanle Liu
* [Release2.4] Revert python link prs (#48573)
* Revert "Fix mac link python (#48017)" This reverts commit 3fa7a736.
* Revert "[Cherry-pick] Fix python link error (#47811)" This reverts commit ff642c68.
* Update config.go
* fix mixed precision inference
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

Committed by Ligoml
- 21 Dec, 2022 2 commits
Committed by Aganlengzi

Committed by zhangkaihuo
- 20 Dec, 2022 1 commit
Committed by ShenLiang
Co-authored-by: Ming-Xu Huang <mingh@nvidia.com>
- 19 Dec, 2022 1 commit
Committed by Yuanle Liu
* [Release2.4] Revert python link prs (#48573)
* Revert "Fix mac link python (#48017)" This reverts commit 3fa7a736.
* Revert "[Cherry-pick] Fix python link error (#47811)" This reverts commit ff642c68.
* Update config.go
* [Paddle Inference] Add float_to_half_pass to support inference with mixed precision (#47993)
* [Inference] optimize some code and fix some bugs (#48780)
* clean ir_pass_manager and fix map_depthwise_conv_to_conv_pass
* fix unit test timeout
* [Paddle Inference] clean unused code (#48392)
* fix
* update
* update
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
- 29 Nov, 2022 1 commit
Committed by yeliang2258
[cherry-pick] updating mul and matmul with set_mem_desc and fix squeeze_transpose for MKLDNN (#47951)
* Fix slice bugs in MKLDNN when input dims are zeros (#46671)
* fix slice bugs
* fix
* update code
* fix
* update code
* updating mul and matmul with set_mem_desc (#45624)
* - mul & matmul changes - fix - bs16 correction of strides
* - cosmetic fixes
* - lint
* - fix
* - fix
* - format -> mem_desc
* - fix
* - fix
* - fix
* - fix
* - fix
* fix squeeze_transpose (#47911)
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>