- 13 Feb, 2023 2 commits
- 10 Feb, 2023 2 commits
-
-
Committed by zhangkaihuo
att, cherry-pick #48563
-
Committed by zhangkaihuo
att, cherry-pick: #48563, #50287
-
- 07 Feb, 2023 2 commits
-
-
Committed by Siming Dai
* Fix to_dlpack (#50138) * fix to_dlpack for loop * fix reference count * fix conflicts
-
Committed by zqw_1997
* 2.4: modify cmake file for cuda11.8 compile * fix small mistake * mistake resolved
-
- 06 Feb, 2023 1 commit
-
-
Committed by zhouweiwei2014
-
- 03 Feb, 2023 2 commits
- 02 Feb, 2023 2 commits
-
-
Committed by zhangkaihuo
cherry-pick some PRs that optimize sparse kernels and fix some bugs: #47736 #47703 #47604 #46679 #48439 #49009 #49734
-
Committed by Zhang Jun
* constant folding/trt subgraph pass debug * constant folding set persistable var in OP block, and remove unused log * set node var persistable
-
- 31 Jan, 2023 1 commit
-
-
Committed by zhangkaihuo
att, cherry-pick #48254, and resolve conflict
-
- 19 Jan, 2023 1 commit
-
-
Committed by heliqi
* Fix paddle.squeeze_ bug (#49903) * fix squeeze_ bug * fix solve use squeeze_kernel * fix solve use squeeze_kernel * fix solve use squeeze_kernel * add test case * Update squeeze_kernel.h
-
- 13 Jan, 2023 2 commits
-
-
Committed by xiaoxiaohehe001
-
Committed by Yuanle Liu
* fix fc kernel diff * disable fc_elementwise_layernorm_fuse_pass
-
- 12 Jan, 2023 1 commit
-
-
Committed by xiaoxiaohehe001
-
- 09 Jan, 2023 1 commit
-
-
Committed by Haohongxiang
-
- 04 Jan, 2023 2 commits
-
-
Committed by Yuanle Liu
* disable scale op in amp pass * Do not insert redundant cast op * fix fused_fc_elementwise_layernorm kernel diff * fix fc kernel diff
-
Committed by YUNSHEN XIE
* resolve conflict * fix format error
-
- 03 Jan, 2023 2 commits
-
-
Committed by xiaoting
* fix fold for large bs * fix fold for large bs * fix pre-commit
-
Committed by feng_shuai
* cherry-pick: Some versions of TensorRT don't support qkv_plugin * cherry-pick: support coverage CI
-
- 30 Dec, 2022 1 commit
-
-
Committed by Chenxiao Niu
* [MLU] fix compute error of dropout op (#45923)
* [MLU] add mergedAdam kernel. (#45965)
* [MLU] add int64 support for mlu one_hot_v2 (#46313)
* [MLU] fix profiler compile failure (#46208)
* [MLU] add barrier_op kernel. (#46417)
* [MLU] fluid: add mluop (#46429)
* [MLU] add huber_loss kernel. (#46455)
* [MLU] add mlu kernel for add_reduce_max_grad (#45651) Co-authored-by: liupeiyu <liupeiyu@cambricon.com>
* [MLU] add_fluid_mluop_yolo_box (#46573)
* [MLU] fix phi::Tensor compile error of mlu. (#46649)
* [MLU] add fluid MLUOps prior_box (#46585)
* [MLU] fix cmake error (#46772)
* [MLU] fix unittest of sync_bn (#46797)
* [MLU] add masterparam support for mlu adamw. (#46804)
* [MLU] add int64 support for allgather. (#46830)
* [MLU] fix compile error & add mlu blacklist function. (#47439)
* [MLU] fix softmax_with_cross_entropy failed in 370-X8.
* [MLU] fix cncl stuck caused by multiple initializations.
* [MLU] fix code style check.
Co-authored-by: qipengh <huangqipeng@cambricon.com>
Co-authored-by: cifar10 <41565156+cifar10@users.noreply.github.com>
Co-authored-by: Lux et Veritas <1004239791@qq.com>
Co-authored-by: liupeiyu <liupeiyu@cambricon.com>
Co-authored-by: ronnywang <ronny1996@163.com>
-
- 29 Dec, 2022 2 commits
-
-
Committed by zhouweiwei2014
-
Committed by YuanRisheng
* cherry-pick 45860 * [BUG FIX] Fix MetaTensor's bug when running infermeta (#46265) * fix sum bug * fix ci bugs * fix ci bugs * update code according to comment
-
- 28 Dec, 2022 1 commit
-
-
Committed by Huihuang Zheng
Fix CUDA11.8 Unittest Accuracy
-
- 27 Dec, 2022 2 commits
-
-
Committed by Yuanle Liu
-
Committed by HongyuJia
* [Release2.4] Revert python link prs (#48573)
* Revert "Fix mac link python (#48017)" This reverts commit 3fa7a736.
* Revert "[Cherry-pick] Fix python link error (#47811)" This reverts commit ff642c68.
* Update config.go
* fix custom operator backward=None (#48656)
* [Custom Extension] Fix custom double_grad backward=None (#49224)
* fix custom double_grad backward=None
* fix custom_relu.cu bug && polish testcase of double_grad
* remove old dynamic graph test
* add import fluid
* add import fluid
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
-
- 22 Dec, 2022 3 commits
-
-
Committed by Guanghua Yu
-
Committed by Yuanle Liu
* [Release2.4] Revert python link prs (#48573)
* Revert "Fix mac link python (#48017)" This reverts commit 3fa7a736.
* Revert "[Cherry-pick] Fix python link error (#47811)" This reverts commit ff642c68.
* Update config.go
* fix mixed precision inference
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
-
Committed by Ligoml
-
- 21 Dec, 2022 2 commits
-
-
Committed by Aganlengzi
-
Committed by zhangkaihuo
-
- 20 Dec, 2022 1 commit
-
-
Committed by ShenLiang
Co-authored-by: Ming-Xu Huang <mingh@nvidia.com>
-
- 19 Dec, 2022 1 commit
-
-
Committed by Yuanle Liu
* [Release2.4] Revert python link prs (#48573)
* Revert "Fix mac link python (#48017)" This reverts commit 3fa7a736.
* Revert "[Cherry-pick] Fix python link error (#47811)" This reverts commit ff642c68.
* Update config.go
* [Paddle Inference] Add float_to_half_pass to support inference with mixed precision (#47993)
* [Inference] optimize some code and fix some bug (#48780)
* clean ir_pass_manager and fix map_depthwise_conv_to_conv_pass
* fix unittest timeout
* [Paddle Inference] clean unused code (#48392)
* fix
* update
* update
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
-
- 29 Nov, 2022 1 commit
-
-
Committed by yeliang2258
[cherry-pick] updating mul and matmul with set_mem_desc and fix squeeze_transpose for MKLDNN (#47951)
* Fix slice bugs in MKLDNN when input dims are zeros (#46671)
* fix slice bugs
* fix
* update code
* fix
* update code
* updating mul and matmul with set_mem_desc (#45624)
* - mul & matmul changes - fix - bs16 correction of strides
* - cosmetic fixes
* - lint
* - fix
* - fix
* - format -> mem_desc
* - fix
* - fix
* - fix
* - fix
* - fix
* fix squeeze_transpose (#47911)
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
-
- 28 Nov, 2022 1 commit
-
-
Committed by zlsh80826
* Reduce squeeze2_matmul_fuse_pass, flatten tests time (#47098)
* Add missing fp32 config and reduce the testing combination
* Reduce trt matmul pass test max examples
* Loose TRT fp16 tests tolerance (#47100)
* Loose TRT half test tolerance to 1e-3 (#47101)
* Loose TRT half test tolerance to 1e-3 (#47106)
* Update distributed_strategy.proto (#46531)
* Close popen pipe after used (#47053)
* Add launch_bounds (#47285)
* Fix TRT UT failures (#47488)
* Format cherry-picked commits
* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)
* Skip tests that use fused_ops on H100
* Add error message to FusedOps on H100
Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>
-
- 25 Nov, 2022 2 commits
- 24 Nov, 2022 1 commit
-
-
Committed by ustiniankw
* fixdocs, test=document_fix * fixdocs, test=document_fix
-
- 16 Nov, 2022 1 commit
-
-
Committed by wanghuancoder
* fix mac link python * refine
-