- 19 Oct, 2022 4 commits
-
-
Committed by Ghost Screaming
* Fix bug in the reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.
* Support the allow_partial switch, which can be configured in pipeline_configs. If the tensors sent from different hosts are not the same, they shouldn't be sent partially and then concatenated into a whole tensor.
* Rename allow_partial to enable_partial_send_recv.
* Add global variable _enable_partial_send_recv
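The commit message above does not say why reduce_sum fails past INT32_MAX elements. One plausible cause, assumed here purely for illustration, is an element count or index stored in a signed 32-bit integer. The sketch below shows how such a value wraps around; the helper name `as_int32` is hypothetical and is not part of Paddle.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(n: int) -> int:
    """Truncate a Python int to a signed 32-bit value, as a C int32 would hold it."""
    return ctypes.c_int32(n).value


# A count that still fits in 32 bits is preserved.
fits = as_int32(INT32_MAX)

# One element more wraps to a negative number, so any loop bound or
# offset derived from it would be wrong.
wrapped = as_int32(INT32_MAX + 1)

print(fits, wrapped)
```

Under that assumption, the usual remedy is to carry element counts and offsets in a 64-bit integer type.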
-
Committed by WangZhen
[CherryPick][Dy2St]Fix recurrent op eager deletion pass error in dy2st
-
Committed by YangZhou
* update audio api examples
* fix format
* format
* fix
* test api
* fix format
* fix static check error
* fix doc error
* fix ci
* fix api error
* update api.spec
* fix ci
* fix typo in window gaussian
-
Committed by Hui Zhang
Construct exec and ctx only once in cond op to speed up
-
- 18 Oct, 2022 7 commits
-
-
Committed by Wilber
-
Committed by weishengying
Add symbolic shape deduction function for unfold, scatter_nd_add, p_norm, grid_sampler, pad3d, etc (#46291) (#47003)
-
Committed by zhouweiwei2014
Add three new APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
-
Committed by zhoutianzi666
-
Committed by Yuang Liu
* [dygraph sharding] Overlap the reduce and the calculation for sharding stage 2. (#46495)
* [dygraph sharding stage 2] sharding broadcast overlap (#46656)
* Multi groups for broadcast of sharding stage 2 (#46894)
-
Committed by Haohongxiang
* [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)
* [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)
* update
-
Committed by Wang Bojun
* draft with debug print
* remove debug print
* bug fix for ci
-
- 17 Oct, 2022 9 commits
-
-
Committed by Wen Sun
* Support both use_calc_stream and sync_op in send recv APIs (#46023)
* Support both use_calc_stream and sync_op in allgather API (#46295)
* Support both use_calc_stream and sync_op in collective communication API (#46761)
* Move group and all reduce from collective to communication (#45848)
* Complete bfloat16 dtype for collective api in eager mode (#45844)
* Fix collective APIs not being recognized when building docs (#46962)
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
-
Committed by zhangkaihuo
cherry-pick: #46322, #46245 Sparse API supports static graph
-
Committed by Zhang Zheng
Optimize performance of depthwise_conv. Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1
-
Committed by Guanghua Yu
* fix dygraph new format quant
* fix unittest
* fix conflict
-
Committed by Allen Guo
-
Committed by Allen Guo
-
Committed by Allen Guo
* paddle-inference support custom-ops
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
* fix tolower
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
-
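Committed by Allen Guo
-
Committed by Zhang Zheng
To improve performance, move the label bounds check from the Python side into the kernel, which avoids calls to extra ops such as min, max, and synchronous copies. The current template parameter IgnoreIndex only takes effect when ignore_index lies in the range [0, dim). However, when a label value is out of bounds and ignore_index equals that label, the computation should still proceed normally. Although the current logic does not produce wrong results, it is still logically flawed, and the template parameter IgnoreIndex is unnecessary.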
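The behavior described in this commit message can be sketched in plain Python. This is only an illustration of the stated logic, not the actual Paddle kernel: the function name `softmax_cross_entropy` and its signature are hypothetical. The ignore_index comparison runs before the bounds check, so an ignore_index outside [0, dim) still works, and the bounds check sits inside the per-row loop instead of a separate min/max pass.

```python
import math


def softmax_cross_entropy(logits, labels, ignore_index=-100):
    """Per-row softmax cross-entropy with an in-loop label bounds check.

    Hypothetical sketch: rows whose label equals ignore_index contribute 0.0,
    even when ignore_index itself lies outside [0, dim).
    """
    losses = []
    for row, label in zip(logits, labels):
        # Compare against ignore_index first, so no bounds check is needed
        # for ignored rows.
        if label == ignore_index:
            losses.append(0.0)
            continue
        dim = len(row)
        # Bounds check performed "inside the kernel", per element.
        if not 0 <= label < dim:
            raise ValueError(f"label {label} is outside [0, {dim})")
        # Numerically stable log-sum-exp.
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum - row[label])
    return losses


losses = softmax_cross_entropy([[0.0, 0.0], [1.0, 2.0]], [0, 255], ignore_index=255)
print(losses)
```

With this ordering, no special case is needed for whether ignore_index falls inside the valid label range, which matches the commit's point that a dedicated IgnoreIndex template parameter is unnecessary.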
-
- 14 Oct, 2022 8 commits
-
-
Committed by xiaoxiaohehe001
-
Committed by Wilber
-
Committed by Guanghua Yu
-
Committed by xiaoxiaohehe001
-
Committed by Aurelius84
-
Committed by Aurelius84
* [BUG] Fix expand_as_v2 bug when X and Y have different dtypes
* fix commit
-
Committed by Zhang Jun
* fix reshape2 opteller; add elementwise min/max register for tensorrt
-
Committed by zhoutianzi666
-
- 13 Oct, 2022 3 commits
-
-
Committed by zhangbo9674
-
Committed by 傅剑寒
Fix set_value failure when the source tensor has fp16 dtype and the destination value is a number (dev PR link: #46801)
-
Committed by Sławomir Siwek
* Revert pool+grad oneDNN kernel conversion (#45989)
* [PHI] transpose2_grad op migration (#46139)
* op migrated, Copy(OneDNNContext, ...) added
* mutable_data & op registration in fluid removed
* refactoring
* OneDNNGetDataType to uppercase
* missing cpu check added, handler moved to .h file
* name changed to transpose_grad
* Copy changed back to TensorCopy
* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>
-
- 12 Oct, 2022 2 commits
-
-
Committed by niuliling123
Cherry-pick 46541: ensure automatic Layout tuning works with zero modifications for the ResNet50, TSM, and deeplabv3 models
-
Committed by ronnywang
cherry pick pr46536
-
- 11 Oct, 2022 7 commits
-
-
Committed by Feiyu Chan
-
Committed by Sławomir Siwek
-
Committed by Sławomir Siwek
-
Committed by Sławomir Siwek
* [PHI] Migrate gelu kernels (#45596)
* gaussian random
* mkldnn to onednn renaming
* fix merge conflicts
* remove fluid code
* onednn renaming
* gelu fwd
* sort activations
* gelu gradient
* remove unused macros
* merge conflicts
* fix merge conflicts
* remove extra constraint from gelu op
* [PHI] relu6_grad kernel (#46501)
* Relu6
* remove fluid handler
* add individual kernel signature
* coding style
* replace bounded_relu with clip
* whitespace
* code style
-
Committed by Sławomir Siwek
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-
Committed by ceci3
-
Committed by Yuang Liu
* bug fix for virtual pipeline parallel (#45922)
* don't wait for send op under dygraph pp (#46209)
* [interleave pp] sync recv for 1f1b (#46399)
* [dygraph pp] all sync for allgather partial (#46483)
-