Greenplum
DeepSpeed
Commit 4ae3a3da (unverified)
Authored Mar 01, 2023
Author: Molly Smith
Committer: GitHub, Mar 01, 2023
TP unsupported models and assertions (#2810)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Parent: 8d53ac0c
Showing 2 changed files with 17 additions and 2 deletions (+17 -2)
deepspeed/module_inject/auto_tp.py (+14 -1)
docs/_tutorials/automatic-tensor-parallelism.md (+3 -1)
deepspeed/module_inject/auto_tp.py
...
...
@@ -27,13 +27,25 @@ class AutoTP():
         return mlist

     def supported(model):
-        unsupported = ['bloom', 'codegen', 'flaubert', 'xlm']
+        unsupported = [
+            'bloom',
+            'codegen',
+            'deberta',
+            'flaubert',
+            'fsmt',
+            'gpt2',
+            'led',
+            'longformer',
+            'xlm',
+            'xlnet'
+        ]
         model = str(model)
         key = re.search(r": (.*?)Model", model)
         if key is None:
             key = re.search(r": (.*?)Stack", model)
         if key is None:
             key = re.match(r"(.*?)Model", model)
+        assert key is not None, "Not able to determine model policy automatically. Please provide policy."
         if key.group(1).lower() in unsupported:
             return False
         return True
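
To see how the name-based check in `supported` behaves, the sketch below re-implements its regex cascade on plain strings. It is a standalone illustration, not DeepSpeed's code path: `family_key`, the module-level `UNSUPPORTED` constant, and the hand-written repr strings are assumptions made for the example, and the strings only imitate the layout that `str()` of a PyTorch module produces.

```python
import re

# Family names copied from the updated list in the hunk above.
UNSUPPORTED = ['bloom', 'codegen', 'deberta', 'flaubert', 'fsmt',
               'gpt2', 'led', 'longformer', 'xlm', 'xlnet']

MESSAGE = "Not able to determine model policy automatically. Please provide policy."


def family_key(model_repr):
    """Return the lower-cased family name parsed from a repr string, or None.

    Hypothetical helper: it applies the same three patterns as the diff,
    in the same order, but returns None instead of asserting.
    """
    key = re.search(r": (.*?)Model", model_repr)      # e.g. "(transformer): GPT2Model("
    if key is None:
        key = re.search(r": (.*?)Stack", model_repr)  # e.g. "(encoder): T5Stack("
    if key is None:
        key = re.match(r"(.*?)Model", model_repr)     # e.g. a repr starting with "BertModel("
    return None if key is None else key.group(1).lower()


def supported(model_repr):
    """Mirror of the patched check, operating on a repr string."""
    key = family_key(model_repr)
    assert key is not None, MESSAGE
    return key not in UNSUPPORTED


# Hand-written stand-ins for str(model); real reprs are much longer.
GPT2_REPR = "GPT2LMHeadModel(\n  (transformer): GPT2Model(\n    (wte): Embedding(50257, 768)\n  )\n)"
T5_REPR = "T5ForConditionalGeneration(\n  (shared): Embedding(32128, 512)\n  (encoder): T5Stack(\n  )\n)"
BERT_REPR = "BertModel(\n  (embeddings): BertEmbeddings(\n  )\n)"
PLAIN_REPR = "Sequential(\n  (0): Linear(in_features=4, out_features=4, bias=True)\n)"

if __name__ == "__main__":
    for name, text in [("gpt2", GPT2_REPR), ("t5", T5_REPR), ("bert", BERT_REPR)]:
        print(name, family_key(text), supported(text))
    print("plain", family_key(PLAIN_REPR))
```

Because `.` does not match a newline by default, each pattern has to match within a single line of the repr. A string with no `Model` or `Stack` class name in the expected position yields no key at all, which is the case the new `assert key is not None` guards against before `key.group(1)` is called.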
...
...
@@ -91,4 +103,5 @@ class AutoTP():
                 gem_list = list(set(gem_list))
                 policy_list = AutoTP.update_policy_list(policy_list, module, gem_list)
                 gem_list = []
+        assert len(policy_list), "Not able to determine model policy automatically. Please provide policy."
         return policy_list
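
The added `assert len(policy_list), ...` relies on an empty list having length 0, which is falsy. A minimal sketch of that guard in isolation follows; `checked` and the sample tuple are hypothetical names used only for illustration and are not part of `AutoTP`.

```python
MESSAGE = "Not able to determine model policy automatically. Please provide policy."


def checked(policy_list):
    # Hypothetical helper: len([]) == 0 is falsy, so an empty list trips the assert.
    assert len(policy_list), MESSAGE
    return policy_list


# A non-empty list passes through unchanged.
sample = [("layer", ["dense"])]  # placeholder entry, not a real policy
passed = checked(sample)

# An empty list raises AssertionError carrying the message from the diff.
try:
    checked([])
    raised = False
except AssertionError as err:
    raised = str(err) == MESSAGE

if __name__ == "__main__":
    print(passed, raised)
```

Note that Python skips `assert` statements when run with the `-O` flag, so a guard written this way is only active in non-optimized runs.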
docs/_tutorials/automatic-tensor-parallelism.md
...
...
@@ -88,6 +88,7 @@ deepspeed --num_gpus <num_gpus> DeepSpeedExamples/inference/huggingface/text-gen
The following results were collected using V100 SXM2 32GB GPUs.
### Max New Tokens = 50
| Test | Memory Allocated per GPU | Max Batch Size | Max Throughput per GPU |
| ---------- | -------------------------- | ---------------- | ------------------------ |
| No TP | 23.94 GB | 64 | 18.84 TFlops |
...
...
@@ -95,6 +96,7 @@ The following results were collected using V100 SXM2 32GB GPUs.
| 4 GPU TP | 6.36 GB | 664 | 27.63 TFlops |
### Max New Tokens = 1024
| Test | Memory Allocated per GPU | Max Batch Size | Max Throughput per GPU |
| ---------- | -------------------------- | ---------------- | ------------------------ |
| No TP | 23.94 GB | 2 | 1.65 TFlops |
...
...
@@ -113,7 +115,6 @@ The following model families have been successfully tested with automatic tensor
 - electra
 - ernie
 - esm
-- gpt2
 - gpt-j
 - gpt-neo
 - gpt-neox
...
...
@@ -146,6 +147,7 @@ The following models are not currently supported with automatic tensor paralleli
 - deberta
 - flaubert
 - fsmt
+- gpt2
 - led
 - longformer
 - xlm
...
...