提交 · d17961edc0f32f640861db93ed2e8660062ba2b7 · Crayon鑫 / Paddle

01 3月, 2022 1 次提交
- S
  Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed
  由 sneaxiy 提交于 3月 01, 2022
```
* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order
```
  d17961ed
25 2月, 2022 1 次提交

Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102

由 sneaxiy 提交于 2月 25, 2022

* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smalller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS

d32a0102

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致