Greenplum / DeepSpeed

Commit c26b7c73

Whitespace and PR feedback

Authored Aug 22, 2023 by Logan Adams
Parent: 61e6d069
Showing 1 changed file with 3 additions and 12 deletions.

tests/unit/inference/test_inference.py (+3, -12)
...
@@ -384,7 +384,7 @@ class TestMPSize(DistributedTest):
 @pytest.mark.seq_inference
-@pytest.mark.parametrize("model_w_task", [("EleutherAI/gpt-j-6B", "text-generation")], ids=["gpt-j"])
+@pytest.mark.parametrize("model_w_task", [("gpt2", "text-generation")], ids=["gpt2"])
 class TestLowCpuMemUsage(DistributedTest):
     world_size = 1
...
@@ -399,25 +399,16 @@ class TestLowCpuMemUsage(DistributedTest):
         model, task = model_w_task
         local_rank = int(os.getenv("LOCAL_RANK", "0"))
-        tokenizer = AutoTokenizer.from_pretrained(model)
-        model = AutoModelForCausalLM.from_pretrained(model, low_cpu_mem_usage=True)
-        # We have to load these large models on CPU with pipeline because not
-        # enough GPU memory
-        pipe = pipeline(task, model=model, tokenizer=tokenizer, device=-1, framework="pt")
+        pipe = pipeline(task, model=model, model_kwargs={"low_cpu_mem_usage": True}, device=local_rank, framework="pt")
         bs_output = pipe(query, **inf_kwargs)
         pipe.model = deepspeed.init_inference(pipe.model,
                                               mp_size=self.world_size,
                                               dtype=dtype,
                                               replace_method="auto",
                                               replace_with_kernel_inject=True)
-        # Switch device to GPU so that input tensors are not on CPU
-        pipe.device = torch.device(f"cuda:{local_rank}")
         ds_output = pipe(query, **inf_kwargs)
         print(local_rank, "baseline", bs_output)
         print(local_rank, "deepspeed", ds_output)
         assert assert_fn(bs_output, ds_output)
...
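For context, the pattern this test exercises can be sketched standalone. This is a minimal, hypothetical reproduction, not the test itself: it assumes a CUDA machine with transformers and deepspeed installed, and the prompt, generation kwargs, and float16 dtype below are illustrative choices rather than values taken from the commit.

import os

import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv("LOCAL_RANK", "0"))

# low_cpu_mem_usage is forwarded through model_kwargs to from_pretrained, so
# the pipeline builds the model with reduced peak CPU memory in a single call
# instead of loading the tokenizer and model separately on CPU first.
pipe = pipeline("text-generation",
                model="gpt2",
                model_kwargs={"low_cpu_mem_usage": True},
                device=local_rank,
                framework="pt")

baseline = pipe("DeepSpeed is", do_sample=False, max_length=20)

# Swap the pipeline's model for a DeepSpeed inference engine; with
# replace_with_kernel_inject=True, optimized inference kernels replace the
# original transformer modules.
pipe.model = deepspeed.init_inference(pipe.model,
                                      mp_size=1,
                                      dtype=torch.float16,
                                      replace_method="auto",
                                      replace_with_kernel_inject=True)

ds_output = pipe("DeepSpeed is", do_sample=False, max_length=20)
print(local_rank, "baseline", baseline)
print(local_rank, "deepspeed", ds_output)

Loading directly onto the local GPU (device=local_rank) is also why the commit can delete the old CPU-to-GPU device switch (pipe.device = torch.device(f"cuda:{local_rank}")) along with the gpt-j-6B parametrization: gpt2 is small enough to load on-device, so the low-CPU-memory path is testable without the large-model workaround.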