Unverified · Commit 961827be · Authored by: Sam Ade Jacobs · Committed by: GitHub

DS-Ulysses formatting (#4204)

* fix indentation

* fix formatting

---------
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Parent 3e82cb64
[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/Microsoft/DeepSpeed/blob/master/LICENSE)
[![PyPI version](https://badge.fury.io/py/deepspeed.svg)](https://pypi.org/project/deepspeed/)
[![Downloads](https://static.pepy.tech/badge/deepspeed)](https://pepy.tech/project/deepspeed)
[![Build](https://badgen.net/badge/build/check-status/blue)](#build-pipeline-status)
[![Twitter](https://img.shields.io/twitter/follow/MSFTDeepSpeed)](https://twitter.com/intent/follow?screen_name=MSFTDeepSpeed)
[![Japanese Twitter](https://img.shields.io/badge/%E6%97%A5%E6%9C%AC%E8%AA%9ETwitter-%40MSFTDeepSpeedJP-blue)](https://twitter.com/MSFTDeepSpeedJP)
@@ -149,7 +149,7 @@ match this analysis.
### Additional Highlights of DeepSpeed-Ulysses
-1) An Attention Agnostic Solution
+***An Attention Agnostic Solution***
DeepSpeed implementation of distributed attention module is general
enough to support any attention: e.g., self-attention, cross-attention,
@@ -165,8 +165,7 @@ per head but just with fewer heads, thus attention computation can be
replaced with any type of attention mechanisms, e.g., dense attention
and various forms of sparse attention.
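
Because each rank ends up holding the full sequence for a subset of heads, the local attention call is an ordinary single-device computation and can be swapped freely. The following is a minimal, illustrative sketch of that all-to-all pattern using plain `torch.distributed` collectives; it is not DeepSpeed's actual implementation, and the function names, tensor layouts, and the toy `dense_attention` are assumptions made for this example (run under `torchrun`, with the sequence length and head count divisible by the world size).

```python
# Illustrative sketch (NOT DeepSpeed's implementation) of attention-agnostic
# sequence parallelism: an all-to-all trades a sequence shard of all heads for
# the full sequence of a head shard, any local attention runs unchanged on
# that full sequence, and a second all-to-all restores the sequence sharding.
# Assumes torch.distributed is initialized (e.g. via torchrun) and that both
# the sequence length and the head count are divisible by the world size P.
import torch
import torch.distributed as dist
import torch.nn.functional as F


def seq_to_head_shard(x, group=None):
    """[B, S/P, H, D] on each rank -> [B, S, H/P, D] on each rank."""
    P = dist.get_world_size(group)
    B, s, H, D = x.shape
    # Chunk the head dimension into P groups; chunk p is sent to rank p.
    x = x.reshape(B, s, P, H // P, D).permute(2, 0, 1, 3, 4).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x, group=group)
    # out[p] is rank p's sequence shard of our head group; stitch the sequence.
    return out.permute(1, 0, 2, 3, 4).reshape(B, P * s, H // P, D)


def head_to_seq_shard(x, group=None):
    """[B, S, H/P, D] on each rank -> [B, S/P, H, D] on each rank (inverse)."""
    P = dist.get_world_size(group)
    B, S, h, D = x.shape
    x = x.reshape(B, P, S // P, h, D).permute(1, 0, 2, 3, 4).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x, group=group)
    # out[p] is our sequence shard of rank p's head group; stitch the heads.
    return out.permute(1, 2, 0, 3, 4).reshape(B, S // P, P * h, D)


def dense_attention(q, k, v):
    """A stand-in local attention; any dense or sparse variant works here."""
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # [B, h, S, D]
    return F.scaled_dot_product_attention(q, k, v).transpose(1, 2)


def distributed_attention(q, k, v, local_attn=dense_attention, group=None):
    """q, k, v: [B, S/P, H, D] sequence shards; returns the same sharding."""
    q, k, v = (seq_to_head_shard(t, group) for t in (q, k, v))
    out = local_attn(q, k, v)          # full sequence, H/P heads per rank
    return head_to_seq_shard(out, group)
```

Since `local_attn` only ever sees full-sequence tensors with a reduced head count, swapping in a FlashAttention kernel, cross-attention, or a sparse variant changes nothing about the communication above.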
-2) Training Bigger Models with Longer Sequences through ZeRO-3
-Integration
+***Training Bigger Models with Longer Sequences through ZeRO-3 Integration***
While DeepSpeed sequence parallelism reduces the activation memory when
training with longer sequences, it does not impact the memory consumed
......
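
As a rough illustration of how the two techniques compose, here is a hedged sketch of a ZeRO-3 setup: the `zero_optimization` keys are standard DeepSpeed config options, while the tiny `torch.nn.Linear` model, the hyperparameter values, and the batch size are placeholders standing in for a real long-sequence transformer whose attention layers are sharded as sketched above.

```python
# Hedged sketch: ZeRO-3 partitions model states (parameters, gradients,
# optimizer states) across data-parallel ranks, complementing sequence
# parallelism, which only reduces activation memory. Values below are
# placeholders; launch with the `deepspeed` launcher.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,                  # partition params, grads, and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Placeholder model; in practice this would be the long-sequence transformer
# whose attention layers use the distributed attention pattern above.
model = torch.nn.Linear(1024, 1024)

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```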