MegEngine 天元 / MegEngine
Commit b1baee60, authored Apr 07, 2021 by Megvii Engine Team
feat(imperative/utils): add optimize-for-inference interface for opgraph
GitOrigin-RevId: 9f93f821905dc05e3968247129920a0a1d43712f
Parent: 86598c82
Showing 1 changed file with 55 additions and 37 deletions.
imperative/python/megengine/utils/network.py
```diff
@@ -11,7 +11,7 @@ import fnmatch
 import itertools
 import re
 from collections import OrderedDict
-from typing import Dict, List
+from typing import Dict, List, Sequence
 import numpy as np
```
```diff
@@ -87,6 +87,58 @@ class Network:
             for o in opr.outputs:
                 self.all_vars_map[o.var.id] = o
 
+    def optimize_for_inference(self, dest_vars, **kwargs):
+        r"""
+        Applies optimize_for_inference pass for operator graph.
+
+        :param dest_vars: list of output vars in the operator graph
+
+        :Keyword Arguments:
+
+            * enable_io16xc32 --
+                whether to use float16 for I/O between oprs and use
+                float32 as internal computation precision. Note the output var would be
+                changed to float16.
+            * enable_ioc16 --
+                whether to use float16 for both I/O and computation
+                precision.
+            * enable_hwcd4 --
+                whether to use NHWCD4 data layout. This is faster on some
+                OpenCL backend.
+            * enable_nchw88 --
+                whether to use NCHW88 data layout, currently
+                used in X86 AVX backend.
+            * enable_nchw44 --
+                whether to use NCHW44 data layout, currently
+                used in arm backend.
+            * enable_nchw44_dot --
+                whether to use NCHW44_dot data layout, currently
+                used in armv8.2+dotprod backend.
+            * enable_nchw4 --
+                whether to use NCHW4 data layout, currently
+                used in nvidia backend (based on cudnn).
+            * enable_nchw32 --
+                whether to use NCHW32 data layout, currently
+                used in nvidia backend with tensorcore (based on cudnn).
+            * enable_chwn4 --
+                whether to use CHWN4 data layout, currently
+                used in nvidia backend with tensorcore.
+            * enable_fuse_conv_bias_nonlinearity: whether to fuse conv+bias+nonlinearity
+                into one opr.
+            * enable_fuse_conv_bias_with_z: whether to fuse conv_bias with z
+                input for inference on nvidia backend (this optimization pass will
+                result in mismatch of the precision of output of training and
+                inference)
+        """
+
+        if not isinstance(dest_vars, Sequence):
+            dest_vars = [dest_vars]
+        dest_vars = list(G.VarNode(var.var) for var in dest_vars)
+        new_vars = G.optimize_for_inference(dest_vars, **kwargs)
+        return list(self._get_var(var) for var in new_vars)
+
     def dump(
         self,
         file,
```
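The new method normalizes its input (a single var or a sequence of vars), unwraps each var to a graph-level node, runs the pass with the keyword flags forwarded unchanged, and wraps the results back into network vars. The sketch below reproduces only that control flow in plain Python so it runs without MegEngine. `FakeVar`, `run_pass`, and the `cache` dict are hypothetical stand-ins for the network's var wrapper, `G.optimize_for_inference`, and the bookkeeping behind `self._get_var`; the get-or-create behaviour of the cache is an assumption, not something this commit shows.

```python
from collections.abc import Sequence
from typing import Any, Callable, Dict, List


class FakeVar:
    """Hypothetical stand-in for a network-level var; `.var` plays the graph node."""

    def __init__(self, node: int):
        self.var = node


def optimize_for_inference_sketch(
    dest_vars: Any,
    run_pass: Callable[..., List[int]],
    cache: Dict[int, FakeVar],
    **kwargs: bool,
) -> List[FakeVar]:
    # Accept a single var as well as a list or tuple of vars.
    if not isinstance(dest_vars, Sequence):
        dest_vars = [dest_vars]
    # Unwrap to graph-level nodes (G.VarNode(var.var) in the real method).
    nodes = [v.var for v in dest_vars]
    # Forward every keyword flag unchanged to the stand-in pass.
    new_nodes = run_pass(nodes, **kwargs)
    # Wrap each result, reusing an existing wrapper for a node seen before
    # (assumed behaviour of the bookkeeping behind self._get_var).
    out = []
    for n in new_nodes:
        if n not in cache:
            cache[n] = FakeVar(n)
        out.append(cache[n])
    return out


if __name__ == "__main__":
    def shift_pass(nodes, **flags):
        # Pretend the pass rewrites node n into node n + 100.
        return [n + 100 for n in nodes]

    result = optimize_for_inference_sketch(FakeVar(1), shift_pass, {}, enable_nchw44=True)
    print([v.var for v in result])  # prints [101]
```

Wrapping a non-`Sequence` argument in a list is what lets a caller pass one output var directly instead of `[var]`.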
```diff
@@ -126,42 +178,8 @@ class Network:
         :Keyword Arguments:
 
-            * enable_io16xc32 --
-                whether to use float16 for I/O between oprs and use
-                float32 as internal computation precision. Note the output var would be
-                changed to float16.
-            * enable_ioc16 --
-                whether to use float16 for both I/O and computation
-                precision.
-            * enable_hwcd4 --
-                whether to use NHWCD4 data layout. This is faster on some
-                OpenCL backend.
-            * enable_nchw88 --
-                whether to use NCHW88 data layout, currently
-                used in X86 AVX backend.
-            * enable_nchw44 --
-                whether to use NCHW44 data layout, currently
-                used in arm backend.
-            * enable_nchw44_dot --
-                whether to use NCHW44_dot data layout, currently
-                used in armv8.2+dotprod backend.
-            * enable_nchw4 --
-                whether to use NCHW4 data layout, currently
-                used in nvidia backend (based on cudnn).
-            * enable_nchw32 --
-                whether to use NCHW32 data layout, currently
-                used in nvidia backend with tensorcore (based on cudnn).
-            * enable_chwn4 --
-                whether to use CHWN4 data layout, currently
-                used in nvidia backend with tensorcore.
-            * enable_fuse_conv_bias_nonlinearity: whether to fuse conv+bias+nonlinearity
-                into one opr.
-            * enable_fuse_conv_bias_with_z: whether to fuse conv_bias with z
-                input for inference on nvidia backend (this optimization pass will
-                result in mismatch of the precision of output of training and
-                inference)
+            See also :py:meth:`optimize_for_inference`.
 
         """
 
         self._compile()
```
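Because the keyword flags are now documented in one place, a caller may want to assemble them programmatically. The helper below is a hypothetical convenience, not part of MegEngine: it checks names against the flags listed in the `optimize_for_inference` docstring and, as an assumption of this sketch, allows at most one data-layout flag per call, since each layout is documented for a different backend. Whether MegEngine itself rejects unknown or conflicting flags is not stated in this commit.

```python
from typing import Dict, Optional

# Data-layout flags from the docstring, with the backend each one is documented for.
LAYOUT_FLAGS: Dict[str, str] = {
    "enable_hwcd4": "some OpenCL backends",
    "enable_nchw88": "x86 AVX",
    "enable_nchw44": "ARM",
    "enable_nchw44_dot": "ARMv8.2 with dotprod",
    "enable_nchw4": "NVIDIA (cuDNN)",
    "enable_nchw32": "NVIDIA with Tensor Cores (cuDNN)",
    "enable_chwn4": "NVIDIA with Tensor Cores",
}

# The other boolean switches documented in the same docstring.
OTHER_FLAGS = frozenset(
    {
        "enable_io16xc32",
        "enable_ioc16",
        "enable_fuse_conv_bias_nonlinearity",
        "enable_fuse_conv_bias_with_z",
    }
)


def build_inference_kwargs(layout: Optional[str] = None, **flags: bool) -> Dict[str, bool]:
    """Return a kwargs dict with at most one layout flag (a rule assumed by this sketch)."""
    kwargs: Dict[str, bool] = {}
    if layout is not None:
        if layout not in LAYOUT_FLAGS:
            raise ValueError(f"unknown layout flag: {layout!r}")
        kwargs[layout] = True
    for name, value in flags.items():
        # Layout flags must go through `layout=` so only one can be chosen.
        if name in LAYOUT_FLAGS:
            raise ValueError(f"pass layout flags via `layout=`, got {name!r}")
        # Catch misspelled names before they are forwarded as **kwargs.
        if name not in OTHER_FLAGS:
            raise ValueError(f"unknown flag: {name!r}")
        kwargs[name] = bool(value)
    return kwargs


if __name__ == "__main__":
    kw = build_inference_kwargs("enable_nchw44", enable_fuse_conv_bias_nonlinearity=True)
    print(kw)
```

The returned dict would then be unpacked into the call, e.g. `net.optimize_for_inference(dest_vars, **kw)`, where `net` and `dest_vars` are whatever network and output vars the caller already has.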