- 23 May, 2023 1 commit
Committed by huangjiyi
* update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update HostAlloc * update param name * update cpu kernel * remove kernel header * update * update
- 04 April, 2023 1 commit
Committed by huangjiyi
* update * fix bug * fix bug * revert diag_op * revert expand_op and expand_as_op * fix bug * fix bug
- 24 March, 2023 1 commit
Committed by YuanRisheng
* decouple memory copy * fix ci bugs * fix ci compile bugs * fix rocm compile * fix ci bugs * decouple memory * deal with conflict * fix xpu compile bugs * fix xpu bugs * deal with xpu bugs * fix cmake bugs * fix windows bugs * fix ci bugs * fix ci bugs * delete redundant code * add code for pybind * fix py3 bugs * fix ci bugs
- 01 March, 2023 1 commit
Committed by TaoTao Li
- 28 September, 2022 1 commit
Committed by Chen Weihang
* remove needless using tensor * remove needless using tensor * resolve conflict * replace tensor using * fix format error * revert needless changing * fix rocm and npu compile error * fix cinn compile error * fix format error * fix mkldnn format error * fix mkldnn format error * fix cinn compile error * fix cinn compile error * fix cinn compile error * resolve conflict
- 14 September, 2022 1 commit
Committed by sneaxiy
* fix distributed_fused_lamb nan * remove CUDA_ASSERT
- 08 August, 2022 1 commit
Committed by Thomas Young
- 04 August, 2022 1 commit
Committed by sneaxiy
- 03 August, 2022 1 commit
Committed by sneaxiy
* add use_hierarchical_allreduce * support hierarchical allreduce for more cases
- 01 August, 2022 1 commit
Committed by Leo Chen
* remove cudaDeviceContext * remove more template * fix rocm compile * remove alias name CUDADeviceContext * fix compile * fix tests * revert changes
- 27 July, 2022 1 commit
Committed by Yuang Liu
- 26 June, 2022 1 commit
Committed by Sing_chan
- 10 June, 2022 1 commit
Committed by sneaxiy
- 09 June, 2022 1 commit
Committed by sneaxiy
* add nproc_per_node for DistributedFusedLamb * fix nproc_per_node communicator bug * fix ring_id = 1 init bug * fix ci * fix test_parallel_executor_mnist.py
- 07 June, 2022 1 commit
Committed by sneaxiy
* add use_master_acc_grad * add ut
- 05 June, 2022 1 commit
Committed by Sing_chan
- 28 April, 2022 1 commit
Committed by sneaxiy
* add gradient merge for DistributedFusedLamb * use master acc gradient * fix CI ut * polish * remove math_function_impl.h change * fix test_update_loss_scaling_op.py * try to fix XPU/NPU CI * add gm ut
- 07 April, 2022 1 commit
Committed by sneaxiy
* add Output(Step) to distributed fused lamb op * add _set_step
- 04 March, 2022 1 commit
Committed by Leo Chen
* clean distribution_helper, index_impl, aligned_vector code in fluid * fix conflicts
- 02 March, 2022 1 commit
Committed by sneaxiy
- 01 March, 2022 1 commit
Committed by sneaxiy
* vectorize lamb kernel * remove flags, add ut * remove useless codes * refine code, add param order
- 25 February, 2022 1 commit
Committed by sneaxiy
* add multi tensor apply l2 norm * add multi_tensor_apply code * make sizeof(TensorMeta) smaller * move code to distributed_fused_lamb_op.cu * remove useless FLAGS
- 20 February, 2022 1 commit
Committed by Chen Weihang
* rename pten dir to phi * rename namespace to phi * rename infrt pten dir to phi * resolve conflict * rename pten to phi in cmake * revert all infrt change * change needed files * fix infrt failed * fix inference failed
- 19 February, 2022 1 commit
Committed by sneaxiy
* add DistributedFusedLamb op * polish code * fix compile error * compatible with pten change * fix rocm compile error * improve coverage * update upstream/develop * fix cast_with_ptr.h * add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1 * fix clip before allreduce * add use_master_param_norm * code polish * fix bug * fix ROCM ci