paddle/fluid/distributed/collective/reducer.cc · 6baeb2d1066b58be0f64d3f864b6e3aea0f5974d · PaddlePaddle / Paddle

Support BF16 training for sharding (#46846) · 0b39b244

Committed by Ghost Screaming on Oct 17, 2022

* Fix a bug in the reduce_sum op: the result is wrong when
input.numel() > INT32_MAX.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bugs.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>


reducer.cc 43.3 KB

PaddlePaddle / Paddle: synced successfully about 1 year ago
