- 15 Mar, 2021 1 commit
-
-
Committed by Stas Bekman
Admin merge for a pure-doc PR that does not trigger a build.
-
- 13 Mar, 2021 2 commits
-
-
Committed by Cheng Li
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
- 12 Mar, 2021 4 commits
-
-
Committed by Stas Bekman
* fix log(0) & 1/log(1) bugs * simplify Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com> Co-authored-by: Cheng Li <pistasable@gmail.com>
-
Committed by Olatunji Ruwase
* Control ZeRO wall clock timers * Disable more ZeRO3 debug prints Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
-
Committed by Cheng Li
* add optimizers and schedules to rtd * update ds website and fix links * add optimizers and schedules to rtd * update ds website and fix links * add flops profiler to rtd * fix Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
- 11 Mar, 2021 3 commits
-
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Shaden Smith
-
- 10 Mar, 2021 2 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
- 09 Mar, 2021 7 commits
-
-
Committed by Samyam Rajbhandari
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Samyam Rajbhandari
* Squash stage3 v1 (#146) Co-authored-by: Samyam <samyamr@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: eltonzheng <eltonz@microsoft.com> * Fix correctness bug (#147) * formatting fix (#150) * stage3 bugfix (API) update and simplified FP16 Z3 tests (#151) * fp16 Z3 API update and bugfix * revert debug change * ZeRO-3 detach and race condition bugfixes (#149) * trying out ZeRO-3 race condition fix * CUDA sync instead of stream * reduction stream sync * remove commented code * Fix optimizer state_dict KeyError (#148) Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152) * Simplifying the logic for getting averaged gradients (#153) * skip for now * Z3 Docs redux (#154) * removing some TODOs and commented code (#155) * New Z3 defaults (#156) Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * formatting * megatron external params Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: eltonzheng <eltonz@microsoft.com>
-
Committed by Olatunji Ruwase
-
- 04 Mar, 2021 1 commit
-
-
Committed by Reza Yazdani
* fixing buffers in transformer kernel when gelu-checkpoint is enabled * fixing the test issue for other memory optimization flags * fixing a bug for when attn_dropout_checkpoint is enabled
-
- 01 Mar, 2021 1 commit
-
-
Committed by zmx
Hi, I took a look at the code of column_sum_reduce and have 2 questions: 1. The goal of column_sum_reduce is to get the column sums of the inp matrix with shape [rows, width], so the result should have shape [width], right? The condition that checks pos does not seem suitable. 2. The CUDA kernel is implemented on the assumption that threads with the same threadIdx.y are grouped into one thread_block_tile, with blockDim (32,32). I read the NVIDIA document https://on-demand.gputechconf.com/gtc/2017/presentation/s7622-Kyrylo-perelygin-robust-and-scalable-cuda.pdf, which says a thread block tile is a subset of the threads of a thread block, divided into tiles in row-major order. Doesn't that mean threads with the same threadIdx.x are grouped into a thread_block_tile? Thanks! Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
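The second question above turns on how a thread block is split into tiles. As a minimal sketch, assuming the usual CUDA convention that a thread's linear rank in a 2D block is `threadIdx.y * blockDim.x + threadIdx.x` and that tiles are formed from consecutive ranks, the following Python model lists which threads land in each tile of 32. The helper names `tile_members` and `column_sum` are hypothetical and are not taken from the kernel source; `column_sum` is only a plain reference for the [rows, width] to [width] reduction the message describes.

```python
def tile_members(block_dim_x, block_dim_y, tile_size=32):
    """Group (threadIdx.x, threadIdx.y) pairs into tiles by row-major linear rank."""
    tiles = {}
    for y in range(block_dim_y):
        for x in range(block_dim_x):
            # Assumed convention: x varies fastest within the block.
            rank = y * block_dim_x + x
            tiles.setdefault(rank // tile_size, []).append((x, y))
    return tiles


def column_sum(inp):
    """Reference column sum: input is [rows][width], output has length width."""
    width = len(inp[0])
    return [sum(row[c] for row in inp) for c in range(width)]


if __name__ == "__main__":
    tiles = tile_members(32, 32)
    # With blockDim (32, 32), check whether every tile holds a single threadIdx.y.
    same_y = all(len({y for _, y in members}) == 1 for members in tiles.values())
    print(same_y)                                  # True
    print(column_sum([[1, 2, 3], [4, 5, 6]]))      # [5, 7, 9]
```

Under these assumptions, a (32,32) block yields 32 tiles, each made of the 32 threads that share one threadIdx.y and span threadIdx.x 0 to 31. Whether the kernel's indexing of pos matches this layout would still need to be checked against the kernel itself.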
-
- 27 Feb, 2021 3 commits
-
-
Committed by vfdev
-
Committed by Stas Bekman
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 26 Feb, 2021 1 commit
-
-
Committed by vfdev
-
- 25 Feb, 2021 2 commits
-
-
Committed by Reza Yazdani
* fix the bias-add precision and indexing and also adding the layer-norm-eps as a configurable parameter for transformer * add ACC_HALF config * use defined to check if ACC_Half is defined
-
Committed by Reza Yazdani
-
- 21 Feb, 2021 1 commit
-
-
Committed by Stas Bekman
Invalid param name Thanks.
-
- 19 Feb, 2021 2 commits
-
-
Committed by Jeff Rasley
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 18 Feb, 2021 3 commits
-
-
Committed by Conglong Li
-
Committed by Jeff Rasley
-
Committed by Takuya Makino
-
- 17 Feb, 2021 2 commits
-
-
Committed by Olatunji Ruwase
* Fix docstring * Make screenshots clickable for easier viewing * Navigation menu in alphabetical order; More clickable screenshots * Rename 1Cycle doc * Tweak naming
-
Committed by Cheng Li
* check none tensors when splitting buckets
-
- 13 Feb, 2021 4 commits
-
-
Committed by Olatunji Ruwase
* Activation checkpoint support for non tensor input/output * Format fixes * Address PR comments; Add ordering edge case tests
-
Committed by Jeff Rasley
* add -e/--examples flag to checkout submodules * bump DSE commit
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Sean Naren
* Use log dist function instead of print * Expose ranks Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 12 Feb, 2021 1 commit
-
-
Committed by Conglong Li
* 1-bit adam doc fix * 1-bit adam doc fix * 1-bit adam doc fix Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-