- 16 Dec, 2022 1 commit
  - Committed by Wen Sun
- 15 Dec, 2022 1 commit
  - Committed by Wen Sun
- 14 Dec, 2022 1 commit
  - Committed by james
    * nullptr bugfix for XPU pg mode; also, a few kernels are added to the XPU whitelist
    * increase error msg length
- 12 Dec, 2022 1 commit
  - Committed by Wen Sun
    * chore: unify `SingleTensor`
    * feat: dynamic check
- 09 Dec, 2022 1 commit
  - Committed by PuQing
- 05 Dec, 2022 1 commit
  - Committed by ShenLiang
- 03 Dec, 2022 1 commit
  - Committed by Wen Sun
    * refactor: classify static check
    * refactor: rename to static_check & use forward decl
    * refactor: switch to unary & binary funcs
- 28 Nov, 2022 1 commit
  - Committed by 张春乔
    * Update communicator.cc
    * Update communicator.cc
    * remove LoDTensor
    * remove LoDTensor and Tensor
- 24 Nov, 2022 2 commits
  - Committed by huangjiyi
    * rm dependence to "convert_utils.h" in some files
    * fix bugs
    * replace DataType2String with DataTypeToString
    * replace framework::DataTypeSize with phi::SizeOf
    * mv convert_function from fluid to phi and rm old map
    * recommit with pre-commit
    * replace ProtoVarType with ProtoDataType and update comment.
    * fix error about include "dnnl.hpp"
    * revert add dep mkldnn to convert_utils in phi
    * add mkldnn deps in convert_utils.h in phi
    * move deps to convert_utils.h in phi
  - Committed by james
    Note: this is a temporary solution and should be replaced once the reduce kernel is natively supported on KL2.
- 23 Nov, 2022 1 commit
  - Committed by Wen Sun
    * feat: static check
- 21 Nov, 2022 4 commits
- 19 Nov, 2022 1 commit
  - Committed by Wen Sun
- 18 Nov, 2022 3 commits
  - Committed by Wen Sun
  - Committed by james
    * correct sync behavior for XPU distributed training
      XPU supports an event mechanism similar to CUDA events, so it is advisable to use an event to sync the compute/comm streams for performance. However, this mechanism has never been fully tested, and inconsistent loss/ending_epochs have been reported. Therefore, this PR replaces event sync with stream waiting as a temporary solution.
    * remove compile warning
  - Committed by james
    * fix device id issue for xpu eager
      The xpu device id is not correctly set in eager mode, so vars stay on dev0 unless XPUDeviceGuard is called. This leads to the following error message on all nodes with rank != 0: "NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."
    * fix typo
    * fix pybind error
- 17 Nov, 2022 1 commit
  - Committed by Wen Sun
- 16 Nov, 2022 1 commit
  - Committed by Wen Sun
    * refactor: update pg custom
    * fix: use new api in ut
    * fix: typo
    * revert: recover legacy apis
    * fix: add GetDeviceContext
- 14 Nov, 2022 3 commits
- 10 Nov, 2022 3 commits
  - Committed by LiYuRio
  - Committed by james
    * XPU support eager mode
    * add unittest for XPU eager mode
    * minor bugfix
    * minor bugfix, test=kunlun
    * correct copyright info
    * 1. remove unused vars/funcs 2. ProcessGroupBKCL inherits from ProcessGroupStream
    * bugfix for fp16 in eager mode multi-card, test=kunlun
    * rebase & fix a few issues
    * use new processgroup interface, test=kunlun
    * fix compile issue, test=kunlun
  - Committed by Wen Sun
    * refactor: send, recv, send_partial, recv_partial
    * refactor: rm useless const ref
- 09 Nov, 2022 1 commit
  - Committed by Wen Sun
- 08 Nov, 2022 2 commits
- 07 Nov, 2022 1 commit
  - Committed by Wen Sun
- 04 Nov, 2022 2 commits
- 03 Nov, 2022 1 commit
  - Committed by Wang Xin
    * remove unused-variable warning in linux
    * fix unused-variable error in GpuPS
- 01 Nov, 2022 2 commits
  - Committed by Ruibiao Chen
    * [Auto Parallel] Improve the c++ dist attr
    * [Auto Parallel] Modify test_program.py
    * Support custom stream for standalone executor
    Co-authored-by: Yulong Ao <aoyulong@baidu.com>
  - Committed by Yuang Liu
- 31 Oct, 2022 2 commits
- 28 Oct, 2022 2 commits
  - Committed by Haohongxiang
  - Committed by Haohongxiang
    * fix no sync bugs
    * update
    * update task chain
      fix: update wait chain
      feat: add `GetDeviceContext` for gloo
    * fix oom
    * fix dev
    * update
    * update
    Co-authored-by: LiYuRio <liyuruijx@163.com>
    Co-authored-by: ForFishes <2282912238@qq.com>