- 02 Jun, 2022 16 commits
-
-
Committed by Fan Zhang
* Adapt XPUPS - 1st version - 3.24 * Adapt XPUPS - update XPU PushSparse - 2nd version - 3.24 * Adapt XPUPS - add XPU PullSparseOp - 3rd version - 3.25 * refactor heter comm kernel * update. test=develop * Adapt XPUPS - modify by compilation - 4th version - 3.27 * update calc_shard_offset. test=develop * update xpu kernel. test=develop * update args of calc_shard_offset * update. test=develop * remove customGradMerger * update. test=develop * heter_comm update * heter_comm update * update calc_shard_offset. test=develop * heter_comm update * update args of calc_shard_offset * update. test=develop * remove customGradMerger * update. test=develop * fix. test=develop * update. test=develop * update. test=develop * update optimizer kernel * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30 * update. test=develop * update pslib.cmake * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * Adapt XPUPS - modify by kp compilation - 6th version - 3.30 * update. test=develop * update. test=develop * update. test=develop * update optimizer kernel * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * used by minxu * update heter_comm_inl * fix. test=develop * Adapt XPUPS - modify by kp compilation - 7th version - 3.30 * fix. test=develop * add optimizer kernel. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 3.31 update * Adapt XPUPS - update kp compilation path - 8th version - 3.31 * add optimizer kernel. test=develop * fix kunlun not support size_t. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix kunlun not support size_t. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. 
test=develop * update heter_comm_kernel.kps 3.31 * fix. test=develop * fix. test=develop * update heter_comm_kernel.kps 3.31 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update heter_comm.h 3.31 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update hashtable. test=develop * update. test=develop * Adapt XPUPS - update by kp compilation - 9th version - 4.1 * update hashtable. test=develop * fix. test=develop * update hashtable 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 10th version - 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * update. test=develop * modify by compilation 4.1 * update. test=develop * update. test=develop * fix. test=develop * modify by compilation 4.1 * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * modify by compilation 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * modify by compilation 4.1 19:30 * fix. test=develop * update ps_gpu_wrapper.kps 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 11th version - 4.1 * fix. test=develop * Adapt XPUPS - update by kp compilation - 12th version - 4.2 * fix. test=develop * fix. test=develop * modify by compilation 4.2 * 4.2 update * fix. test=develop * template init. test=develop * update 4.6 * fix. test=develop * template init. test=develop * 4.6 modify by compilation * hashtable template init. test=develop * hashtable template init. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. 
test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 13th version - 4.7 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 4.11 update * fix. test=develop * fix. test=develop * 4.11 update * update by pre-commit * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 4.12 update * fix. test=develop * Adapt XPUPS - update by kp compilation - 14th version - 4.13 * 4.13 update * 4.14 update * 4.14 update * 4.14 update * 4.14 modify by merged latest compilation * retry CI 4.14 * 4.15 pass static check * 4.15 modify by gpups CI * 3.16 update by gpups CI - modify ps_gpu_wrapper.h * 4.16 update * 4.16 pass xpu compile * 4.16 retry CI * 4.16 update * Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24 * update by compilation * Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25 * update device_worker_factory * Adapt XPUPS - split heter_ps into .cu and .cc - 4.27 * Adapt XPUPS - register pull_box_sparse op under XPU_KP - 4.28 * update * 5.7 modify ps_gpu_wrapper pull_sparse * 5.11 update ps_gpu_wrapper CopyKeysKernel * 5.13 modify calc_shard_offset_kernel & fill_shard_key_kernel * modify fill_dvals_kernel & PullCopy & c_sync_calc_stream - 5.18 * modify PushCopy & fill_shard_grads_kernel & register push_box_sparse - 5.19 * Adapt XPUPS - modify BKCL comm op register - 5.26 * Adapt XPUPS - modify BKCL comm op register - 5.27 * Adapt XPUPS - modify BKCL comm op register - 5.27v2 * Adapt XPUPS - modify BKCL comm op register - 5.27v3 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init - 5.30 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v2 - 5.30 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v3 - 5.30 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v4 - 5.31
Co-authored-by: zmxdream <zhangminxu01@baidu.com>
-
Committed by Tomasz Socha
* Fix bfloat16 placement pass * Make it nicer * Fix leftovers * Style
-
Committed by Zhang Zheng
* Support head_dim = 96 in fused_multi_transformer in PLATO-XL * add notes
-
Committed by 光明和真理
Co-authored-by: liupeiyu <liupeiyu@cambricon.com>
-
Committed by Chenxiao Niu
-
Committed by ziyoujiyi
* back fl * delete ssl cert * . * make warning * . * unittest paral degree * solve unittest * heter & multi cloud comm ready * . * . * fl-ps v1.0 * . * support N + N mode * . * . * . * . * delete print * . * . * . * .
-
Committed by Wangzheee
* new general transformer inference support
-
Committed by Zhang Zheng
* Delete inplace strategy in group_norm_fwd * fix
-
Committed by wanghuancoder
* first run accumulation node
-
Committed by Siming Dai
* support heter reindex * add unittest, fix bug * add comment * delete empty line * refine example * fix codestyle * add disable static
-
Committed by Jackwaterveg
* fix usage of prefetch_factor * add assert * add docstring and change prefetch_factor when num_workers=0 * fix doc
-
Committed by Guoxia Wang
-
Committed by Li Min
* extend forward fast_ln_kernel to support more column values.
-
Committed by zhaoyingli
* prepare only once
-
Committed by zhaoyingli
-
Committed by sneaxiy
* support CUDAGraph for partial graph * add ut * fix ci * fix ut again because of eager mode * fix kunlun ci * fix win ci
-
- 01 Jun, 2022 22 commits
-
-
Committed by xiongkun
-
Committed by YuanRisheng
* add yaml * fix infrt compile bugs
-
Committed by Aganlengzi
-
Committed by Qi Li
-
Committed by BrilliantYuKaimin
* Update random.py * test=document_fix * test=document_fix * Update random.py
-
Committed by Guoxia Wang
-
Committed by sneaxiy
* support weight transpose * add ut * add template * fix transpose error * fix transpose_comment * add api tests * add skipif * add doc
-
Committed by YUNSHEN XIE
-
Committed by zhouweiwei2014
-
Committed by Sing_chan
-
Committed by JZ-LIANG
* adapt for 10 loss * partitioner support optimizer
-
Committed by BrilliantYuKaimin
-
Committed by houj04
* update xpu cmake: xdnn 0527. test=kunlun * update to xdnn 0531. * update to xdnn 0531. test=kunlun * update to xdnn 0601. test=kunlun
-
Committed by zhangchunle
unittest parallel Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>
-
Committed by Ruibiao Chen
* Add pinned memory to HostMemoryStats * Add macro for WrapStatAllocator * Fix CI errors
-
Committed by zhiboniu
-
Committed by Guoxia Wang
* fix the bug of adamw which set the attribute in param group not working * fix undefined variable * fix api example typo * add unittest * fix unittest typo
-
Committed by huzhiqiang
-
Committed by caozhou
-
Committed by Yulong Ao
* [Auto Parallel] Add the parallel tuner * [Auto Parallel] Improve the parallel tuner and fix some bugs * update cost model * update import Resharder by dist op * update cost model * fix comp cost bug * update cost model * [Auto Parallel] Amend the dist attr for #processes=1 * update cost model and tuner * update cost model and tuner * update cost model and tuner * update cluster * update reshard * [Auto Parallel] Add the estimation from the cost model * [Auto Parallel] Reimplement the backup and restore functions * [Auto Parallel] Fix the bugs of the parallel tuner * [Auto Parallel] Update the engine api and dist context * [Auto Parallel] Work around the high order grad problem * [Auto Parallel] Add some miscellaneous improvements * [Auto Parallel] Add a unittest for DistributedContext Co-authored-by: caozhou <caozhou@radi.ac.cn>
-
Committed by chentianyu03
* add conv3d yaml * add conv3d_grad, conv3d_double_grad * add final_state_conv3d test case * add conv3d double test case * add depthwise_conv2d grad yaml * add depthwise_conv2d double grad test case * modify the order of args * add depthwise_conv2d_grad_grad config
-
- 31 May, 2022 2 commits
-
-
Committed by Sławomir Siwek
* remove attrs from base op * fix typos * remove brelu * undo removing code related to matmul * remove whitespaces * undo changes in matmul * remove empty line
-
Committed by pangyoki
* add double_grad and triple_grad inplace info in backward.yaml * only generate inplace api in forward
-