Committed by Reza Yazdani
f5aa2547
* Add AdamW to the CPU-Adam implementation
* Support the CPU-Adam optimizer for ZeRO-Offload on the DeepSpeed side
* Bump DSE to match the CPU-Adam updates
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
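
For readers unfamiliar with the distinction this commit refers to, here is a minimal sketch of how an AdamW step differs from a classic Adam step with L2 regularization. It is written in plain Python on a single scalar for illustration only. The function name `adam_step`, the `adamw_mode` flag, and all default values are assumptions made for this sketch, not code taken from the commit or from the CPU-Adam kernel.

```python
import math


def adam_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, adamw_mode=True):
    """One Adam or AdamW update on a single float (illustrative only).

    Returns the updated (param, m, v). `step` is the 1-based step count.
    """
    if not adamw_mode:
        # Classic Adam with L2 regularization: the decay term is folded
        # into the gradient, so it is rescaled by the adaptive denominator.
        grad = grad + weight_decay * param

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias correction for the zero-initialized moments.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)

    update = m_hat / (math.sqrt(v_hat) + eps)

    if adamw_mode:
        # AdamW: decoupled weight decay, applied directly to the parameter
        # and not rescaled by the adaptive denominator.
        update = update + weight_decay * param

    return param - lr * update, m, v
```

With a zero gradient and a nonzero `weight_decay`, the AdamW branch shrinks the parameter by exactly `lr * weight_decay * param`, while the L2 branch passes the decay through the adaptive scaling and so takes a step of roughly `lr`.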