- 20 9月, 2022 21 次提交
-
-
由 ziyoujiyi 提交于
* back fl * delete ssl cert * . * make warning * . * unittest paral degree * solve unittest * heter & multi cloud commm ready * . * . * fix gloo compile warning * adapt for nn fl-ps * flps del fake-init op * add learning_rate_0 intializer op
-
由 Wilber 提交于
-
由 zhoutianzi666 提交于
* fix cast bug
-
由 houj04 提交于
* [XPU] update xdnn activations. (#46246) * [XPU] update xpu cmake. test=kunlun
-
由 zhoutianzi666 提交于
* Move ITensor construction for Weight (persistable variable) from OpConvert to TensorRTEngine.
-
由 HongyuJia 提交于
* polish code comments * polish data_device_transform.cc
-
由 zhaoyingli 提交于
* [Auto Parallel] Change the import way of Auto Parallel (#46115) * fix strategy (#46256) * [Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (#46180) * remove no need grad allreduce communication when sharding-dp * remove no need grad allreduce communication when sharding-dp * bugfix * bugfix * bugfix Co-authored-by: NYulong Ao <aoyulong@baidu.com> Co-authored-by: NJZ-LIANG <jianzhongliang10@gmail.com>
-
由 Jiabin Yang 提交于
* [Eager] Fix ocr (#46124) * fix linspace error in amp * fix log * fix amp error * fix ocr error which caused by amp * add more check * rename dtype ns * [Eager Bug fix]Fix Detection (#46147) * fix linspace error in amp * fix log * fix amp error * Revert "Simplify size op impl (#45808)" This reverts commit c252b1de. * fix_seg * fix detection Co-authored-by: NChen Weihang <sunny_cwh@163.com> Co-authored-by: NChen Weihang <sunny_cwh@163.com>
-
由 Ghost Screaming 提交于
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong. * Cherry-pick of PR 46045 * Fix bug of reduce_sum kp op. * Fix bug of reduce_sum kp operator compilation. If compilation device is XPU, eigen kernel should be ignored.
-
由 zhoutianzi666 提交于
* Support matmul_v2 in Paddle-TensorRT converter.
-
由 Leo Chen 提交于
* add config * add config * follow comments * fix serial run
-
由 WangZhen 提交于
* Fix TransDataBackend Error when call unsqueeze using MKL Tensor * Add UT * Refine UT
-
由 zhangkaihuo 提交于
cherry-pick : #46016, #46021, #45974 * [Sparse]Sparse add support gpu (#45974) * [Sparse]Remove unused code (#46021) * [Sparse] Add infer meta (#46016)
-
由 Jiabin Yang 提交于
* fix linspace error in amp * fix log * fix amp error
-
由 Charles-hit 提交于
* support cast op backward refuse forward and fix some bugs (#46173) * support cast op backward refuse forward * Fix the bug of high order unit test framework * support sign op backward refuse forward (#46002)
-
由 zhoutianzi666 提交于
* fix preln_residual_bias_fuse_pass bug in TNT_small model
-
由 zhangbo9674 提交于
* add scope cache & reuse * add gc scope for end of each train step * del scope reuse for jit * refine code * test
-
由 niuliling123 提交于
cherry-pick from #45826 LayoutAutotune 支持 inplace 类型的OP 根据 Add eager layout autotune #45409 修改意见调整UseAutotune 将LayoutAutotune判断放到controller中,与AMP 判断保持一致
-
由 Roc 提交于
* fix static_check error when compile twice (#46140) * [CI] fix static check in build_pr_dev (#46192) Co-authored-by: Zhou Wei <1183042833@qq.com>
-
由 zyfncg 提交于
* fix wrong eigen header include * fix complie bug * fix nan_inf_utils_detail * fix resource_manager * fix conv_miopen_helper
-
- 19 9月, 2022 19 次提交
-
-
由 wuhuachaocoding 提交于
-
由 RichardWooSJTU 提交于
[vision.ops.nms] Fix return order error and duplicate results with specific inputs (#46148) (#46193) * fix return order error and duplicate results with specific inputs
-
由 Xiaoxu Chen 提交于
* [cherry-pick] extend reduce_sum,reduce_sum,eq,ne,ge,abs,pow,etc higher order operators * add reduce_mean,reduce_sum primitive ops * add ne_p gt_p primitive operators * add ge_p abs_p primitive oparators * add cast primitive operators * add pow,square prim2oirg rules * add elementwise_div orig2prim rule * [cherry-pick] add mean,sum,ge,gt,ne,abs,etc higher-order differentiation operators(#45888) * add reduce_mean,reduce_sum primitive ops * add ne_p gt_p primitive operators * add ge_p abs_p primitive oparators
-
由 WangZhen 提交于
-
由 feifei-111 提交于
* [dy2static] support user to use decorator in their program (#45768) * support deco * fix deco ast type * arg_str * 1 * support callable deco * code style * codestyle * test_error * fix decos in another file * recover conflict codes * [BugFix] fixed a bug in decorator transformer, it can not analyze decorator with params correctly (#46055) * fix deco call * add raise * add test * add warn, fix paddle api * fix error type * fix coverage
-
由 weishengying 提交于
-
由 Wilber 提交于
-
由 Charles-hit 提交于
* add unit test for sum higher level op (#45961) * support slice op backward refuse forward and add high level unit test (#45960) * support tile op backward refuse forward (#45942) * support expand_v2 op backward refuse forward (#45941) * support concat backward refuse forward (#45940)
-
由 WangZhen 提交于
-
由 Jiabin Yang 提交于
* [PHI] Support bmm and bmm_grad in xpu (#45887) * support bmm and bmm_grad in xpu * add error removal * test=kunlun * refactor code for better structure * test=kunlun * add fp16 kernel for bmm * test=kunlun * test=kunlun
-
由 zhaocaibei123 提交于
-
由 Wangzheee 提交于
-
由 minghaoBD 提交于
Co-authored-by: NRichardWooSJTU <37864677+RichardWooSJTU@users.noreply.github.com>
-
由 wuhuachaocoding 提交于
* refactor mp. * update setup.py. * update mp_layers.py for compatibility. * add documents for mp_layers.py * update init.py * update collective.py. * update. * update mp_ops.py * update. * update code style. * update code style.
-
由 Yulong Ao 提交于
* [AutoParallel] adapt gradient merge pass (#45915) * adapt gradient merge * fix op_role * fix strategy * [Auto Parallel] Gradient Fuse Allreduce (#45643) * bugfix (#45332) * dist embedding support lookup table v1 * add unitest * customize wait_comm * group gradients * bugfix * update program * [Auto Parallel] Improve the APIs (#45776) * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Use c++ dist attr in the completion process * [Auto Parallel] Add minor changes * [Auto Parallel] Add the serialization process for dist attrs * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix some bugs * [Auto Parallel] Fix the code style * [Auto Parallel] Remove unnecessary impls * [Auto Parallel] Fix the importing error * [Auto Parallel] Fix the copy from bugs of op dist attr * [Auto Parallel] Replace the use of constexpr if * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh * [Auto Parallel] Change API of the completion unittest * [Auto Parallel] Fix the bug when set_attr an int * [Auto Parallel] Add the unittest for the serialization * [Auto Parallel] Add some unit tests * [Auto Paralle] Unify the strategy * [Auto Parallel] Improve the engine api * [Auto Parallel] Reset the changes made to the framework * [Auto Parallel] Change the engine unittest * [Auto Parallel] Update API of the completion and partitioner * [Auto Parallel] Update unit tests using engine api * update shard annotation * [Auto Parallel] Remove the modifications of other modules * [Auto Parallel] Add docs for APIs * add new strategy * [Auto Parallel] Replace the logger * [Auto Parallel] Restore the test_program.py * [Auto Parallel] Change the import rules * [Auto Parallel] Add the examples for Engine * [Auto Parallel] Do some minor changes * [Auto Parallel] Remove yaml dependency * [Auto Parallel] Fix the unittests * add valid after train * bug fix Co-authored-by: Nzhaoyingli <zhaoyingli@baidu.com> Co-authored-by: Ncaozhou <caozhou@radi.ac.cn> Co-authored-by: Ncaozhou <48191911+Caozhou1995@users.noreply.github.com> * [Auto Parallel] Bugfix allreduce fuse for MP (#46086) * bugfix * bugfix * typos fixed * update strategy (#46138) Co-authored-by: Nzhaoyingli <86812880+zhaoyinglia@users.noreply.github.com> Co-authored-by: NJZ-LIANG <jianzhongliang10@gmail.com> Co-authored-by: Nzhaoyingli <zhaoyingli@baidu.com> Co-authored-by: Ncaozhou <caozhou@radi.ac.cn> Co-authored-by: Ncaozhou <48191911+Caozhou1995@users.noreply.github.com>
-
由 sneaxiy 提交于
-
由 Jiabin Yang 提交于
* make eager log readable * fix compile error * recover test * invoke ci again
-
由 xiaoxiaohehe001 提交于
-
由 Chen Weihang 提交于
This reverts commit c252b1de.
-