Fix datatype issue with sparse attention softmax (#363)
Fixes a datatype issue in the sparse attention softmax where the number of blocks passed to the Triton kernel was a torch.Tensor but should have been a Python integer. In some environments (e.g., conda), Triton did not know how to serialize the tensor input and crashed in our tests. Switching to the integer type that Triton expects resolves the issue.
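A minimal sketch of the fix, with a hypothetical helper name (`count_blocks`); the point is only that a 0-dim torch.Tensor scalar must be converted to a plain Python int before being handed to a Triton kernel launch:

```python
import torch

def count_blocks(block_mask: torch.Tensor) -> int:
    # Summing a mask yields a 0-dim torch.Tensor, which Triton
    # cannot serialize as a scalar kernel argument in some environments.
    num_blocks = block_mask.sum()
    # Convert to the Python int that Triton expects.
    return int(num_blocks.item())

mask = torch.tensor([[1, 0], [1, 1]])
print(count_blocks(mask))  # → 3
```

The same conversion applies to any scalar launch parameter derived from tensor arithmetic.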
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>