- 09 Jun, 2022 1 commit
-
-
Committed by crystal
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
- 08 Jun, 2022 4 commits
-
-
Committed by Aganlengzi
-
Committed by YuanRisheng
* move_group_norm * move group norm backward * fix code format * modify code according to comment
-
Committed by fwenguang
-
Committed by Yiqun Liu
* Polish codes and memory usage for fused_gate_attention. * Fix wrong reduce_dims in fused_gate_attention when computing gradient of nonbatched_bias.
-
- 07 Jun, 2022 8 commits
-
-
Committed by Sławomir Siwek
* add method for post ops * format code * change post-ops pattern * code style
-
Committed by shixingbo
-
Committed by sneaxiy
* add use_master_acc_grad * add ut
-
Committed by qipengh
* [MLU]support cast double type * [MLU]fix cast test
-
Committed by limingshu
Transpose optimization with assistance of Chengdu Supercomputing Center and auto_tune operation (#42704)
-
Committed by niuliling123
-
Committed by sneaxiy
-
Committed by Zhang Zheng
-
- 06 Jun, 2022 1 commit
-
-
Committed by niuliling123
-
- 05 Jun, 2022 2 commits
- 04 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 02 Jun, 2022 9 commits
-
-
Committed by Leo Guo
* Add generate_proposals_v2 op and unittest for kunlun. *test=kunlun * Add the assign op to xpu2_op_list and expand the function of gather op. Add the unit-test of generate_proposals_v2. *test=kunlun
-
Committed by Fan Zhang
* Adapt XPUPS - 1st version - 3.24 * Adapt XPUPS - update XPU PushSparse - 2nd version - 3.24 * Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25 * refactor heter comm kernel * update. test=develop * Adapt XPUPS - modify by compilation - 4th version - 3.27 * update calc_shard_offset. test=develop * update xpu kernel. test=develop * update args of calc_shard_offset * update. test=develop * remove customGradMerger * update. test=develop * heter_comm update * heter_comm update * update calc_shard_offset. test=develop * heter_comm update * update args of calc_shard_offset * update. test=develop * remove customGradMerger * update. test=develop * fix. test=develop * update. test=develop * update. test=develop * update optimizer kernel * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30 * update. test=develop * update pslib.cmake * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * Adapt XPUPS - modify by kp compilation - 6th version - 3.30 * update. test=develop * update. test=develop * update. test=develop * update optimizer kernel * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * used by minxu * update heter_comm_inl * fix. test=develop * Adapt XPUPS - modify by kp compilation - 7th version - 3.30 * fix. test=develop * add optimizer kernel. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 3.31 update * Adapt XPUPS - update kp compilation path - 8th version - 3.31 * add optimizer kernel. test=develop * fix kunlun not support size_t. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix kunlun not support size_t. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. 
test=develop * update heter_comm_kernel.kps 3.31 * fix. test=develop * fix. test=develop * update heter_comm_kernel.kps 3.31 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update heter_comm.h 3.31 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * update hashtable. test=develop * update. test=develop * Adapt XPUPS - update by kp compilation - 9th version - 4.1 * update hashtable. test=develop * fix. test=develop * update hashtable 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 10th version - 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * update. test=develop * modify by compilation 4.1 * update. test=develop * update. test=develop * fix. test=develop * modify by compilation 4.1 * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * modify by compilation 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * modify by compilation 4.1 19:30 * fix. test=develop * update ps_gpu_wrapper.kps 4.1 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 11th version - 4.1 * fix. test=develop * Adapt XPUPS - update by kp compilation - 12nd version - 4.2 * fix. test=develop * fix. test=develop * modify by compilation 4.2 * 4.2 update * fix. test=develop * template init. test=develop * update 4.6 * fix. test=develop * template init. test=develop * 4.6 modify by compilation * hashtable template init. test=develop * hashtable template init. test=develop * fix. test=develop * fix. test=develop * fix. test=devlop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. 
test=develop * fix. test=devlop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * Adapt XPUPS - update by kp compilation - 13nd version - 4.7 * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 4.11 update * fix. test=develop * fix. test=develop * 4.11 update * update by pre-commit * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * 4.12 update * fix. test=develop * Adapt XPUPS - update by kp compilation - 14th version - 4.13 * 4.13 update * 4.14 update * 4.14 update * 4.14 update * 4.14 modify by merged latest compilation * retry CI 4.14 * 4.15 pass static check * 4.15 modify by gpups CI * 3.16 update by gpups CI - modify ps_gpu_wrapper.h * 4.16 update * 4.16 pass xpu compile * 4.16 retry CI * 4.16 update * Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24 * update by compilation * Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25 * update device_worker_factory * Adapt XPUPS - split heter_ps into .cu and .cc - 4.27 * Adapt XPUPS - register pull_box_sparse op under XPU_KP - 4.28 * update * 5.7 modify ps_gpu_wrapper pull_sparse * 5.11 update ps_gpu_wrapper CopyKeysKernel * 5.13 modify calc_shard_offset_kernel & fill_shard_key_kernel * modify fill_dvals_kernel & PullCopy & c_sync_calc_stream - 5.18 * modify PushCopy & fill_shard_grads_kernel & register push_box_sparse - 5.19 * Adapt XPUPS - modify BKCL comm op register - 5.26 * Adapt XPUPS - modify BKCL comm op register - 5.27 * Adapt XPUPS - modify BKCL comm op register - 5.27v2 * Adapt XPUPS - modify BKCL comm op register - 5.27v3 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init - 5.30 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v2 - 5.30 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v3 - 5.30 * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v4 - 5.31 
Co-authored-by: zmxdream <zhangminxu01@baidu.com>
-
Committed by Zhang Zheng
* Support head_dim = 96 in fused_multi_transformer in PLATO-XL * add notes
-
Committed by 光明和真理
Co-authored-by: liupeiyu <liupeiyu@cambricon.com>
-
Committed by Chenxiao Niu
-
Committed by Zhang Zheng
* Delete inplace strategy in group_norm_fwd * fix
-
Committed by Guoxia Wang
-
Committed by Li Min
* extend forward fast_ln_kernel to support more column values.
-
Committed by sneaxiy
* support CUDAGraph for partial graph * add ut * fix ci * fix ut again because of eager mode * fix kunlun ci * fix win ci
-
- 01 Jun, 2022 2 commits
-
-
Committed by Guoxia Wang
-
Committed by sneaxiy
* support weight transpose * add ut * add template * fix transpose error * fix transpose_comment * add api tests * add skipif * add doc
-
- 31 May, 2022 7 commits
-
-
Committed by Sławomir Siwek
* remove attrs from base op * fix typos * remove brelu * undo removing code related to matmul * remove whitespaces * undo changes in matmul * remove empty line
-
Committed by cambriconhsq
-
Committed by Aganlengzi
* fix arg_max and reduce_max * add arg_max ut
-
Committed by thunder95
* rrelu logic * unregistered op kernel (unresolved) * commit before merge * enrich test cases * fix rrelu-sig bug * fix tests in the CPU environment * fix spelling errors * fix code format * try to fix the test-case timeout issue * optimize test cases * remove seed, optimize the random function * update en doc for rrelu * fix rrelu en docs, test=document_fix * add paper link for en docs, test=document_fix * update en doc * add r,test=document_fix
-
Committed by Leo Chen
Co-authored-by: Ryan Jeng <rjeng@nvidia.com>
-
Committed by Li Min
* replace dropout_is_test with is_test. * improve atol on a100.
-
Committed by jakpiase
OneDNN md-in-tensor refactoring part 5: Memory descriptor enabled for elementwises, reductions and expand_v2 ops (#43036) * enabled md in elementwises, reductions and expand_v2 * CI fix for invalid numpy copy * fixed formatting * CI rerun * changes after review
-
- 30 May, 2022 5 commits
-
-
Committed by Chenxiao Niu
-
Committed by Li Min
* add fused_bias_dropout_residual_ln op and layer.
-
Committed by crystal
-
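Committed by thunder95
* nanmedian op * fix bug in the CUDA kernel * fix count_if incompatibility on other hardware platforms * fix incompatibility on some CPU hardware * fix incompatibility on some CPU hardware * fix isnan check * handle older numpy versions that do not support all-NaN input * handle older numpy versions that do not support all-NaN input * fix code example * fix api comment error * revise backward-propagation logic and C++ handling logic * address review suggestions * typo pre_dim * update en docs, test=document_fix * remove numpy in en doc, test=document_fix * add r,test=document_fix * add api to all * follow advice from chenwhql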
-
Committed by cambriconhsq
-