Optimize Softmax Kernel (#3112) (e73de8ce) · Commit · Greenplum / DeepSpeed

Unverified commit e73de8ce, authored Apr 04, 2023 by Molly Smith; committed by GitHub on Apr 05, 2023

Optimize Softmax Kernel (#3112)

* Simplify kernel

* Coalesce memory attempt 1. Logits divergence.

* Logits fix?

* sync after every global mem access

* template on iterations. Down to 8.3% cuda time for 8k tokens

* Up to 64 iterations

* Add alibi/mask check

* fp32

* Revert builder.py

* naming. precommit

* Revert "naming. precommit"

This reverts commit 150eb7d9.

* naming. spacing

* Spacing. simplify checks

* remove bsyncs

* missed bsyncs

* precommit

Parent f2c9a827

