bump DSE and doc tweak

4b1df25a · Jeff Rasley · Ubuntu · 240ea97b · b989b41b · a0a80fcc
Showing 3 changed files with 3 additions and 3 deletions

DeepSpeedExamples +1 -1

docs/_posts/2020-09-09-sparse-attention.md +1 -1

docs/_tutorials/sparse-attention.md +1 -1

--- a/DeepSpeedExamples @ b989b41b
+++ b/DeepSpeedExamples @ b989b41b
-Subproject commit a0a80fcc010be54dca1710d71436859eabc52c0c
+Subproject commit b989b41b526db164611bedd3e73c09b8c2c5cbfc
--- a/docs/_posts/2020-09-09-sparse-attention.md
+++ b/docs/_posts/2020-09-09-sparse-attention.md
@@ -25,7 +25,7 @@ To learn more about Sparsity Config, and also how to use this library, please ch
 ## Performance Results

 * **Power over 10x longer sequences**
-In a pre-training experiment, we ran BERT model under three settings: dense, dense with activation checkpoint, and sparse (SA) with activation checkpoint. SA empowers 10x and 16x longer sequences comparing with dense for BERT base and large, respectively. Following figure shows the longest sequence length runnable in BERT base and large model; experiment is performed with batch size 1 on a single Nvidia V100 GPU-32GB memory.
+In a pre-training experiment, we ran BERT model under three settings: dense, dense with activation checkpoint, and sparse (SA) with activation checkpoint. SA empowers 10x and 16x longer sequences comparing with dense for BERT base and large, respectively. Following figure shows the longest sequence length runnable in BERT base and large model; experiment is performed with batch size 1 on a single NVIDIA V100 GPU-32GB memory.

 ![Maximum sequence runnable on BERT](/assets/images/sa_maximum_sequence_runnable_on_bert.png){: .align-center}


--- a/docs/_tutorials/sparse-attention.md
+++ b/docs/_tutorials/sparse-attention.md
@@ -4,7 +4,7 @@ title: "DeepSpeed  Sparse Attention"

 In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is through DeepSpeed launcher. We will describe this through an example in [How to use sparse attention with DeepSpeed launcher](/tutorials/sparse-attention/#how-to-use-sparse-attention-with-deepspeed-launcher) section. But before that, we introduce modules provided by DeepSpeed SA in the [next](/tutorials/sparse-attention/#sparse-attention-modules) section.

-**Note:** Currently DeepSpeed Sparse Attention can be used only on Nvidia V100 GPU using Torch >= 1.5 and Cuda 10.1 or 10.2.
+**Note:** Currently DeepSpeed Sparse Attention can be used only on NVIDIA V100 GPU using Torch >= 1.5 and Cuda 10.1 or 10.2.
 {: .notice--warning}

 ## Sparse attention modules