毕竟曾有刹那 / Mace (forked from Xiaomi / Mace)
Commit 12c4dace

Authored December 11, 2018 by liuqi

Update the document about usage of ARM Linux

Parent: 51b14100

Showing 12 changed files with 112 additions and 784 deletions (+112 −784)
docs/user_guide/advanced_usage.rst              +34  −43
docs/user_guide/basic_usage.rst                  +4   −3
docs/user_guide/devices/demo_device_nanopi.yml   +0   −3
mace/core/runtime/cpu/cpu_runtime.cc             +1  −31
mace/python/tools/memory_optimizer.py            +0 −350
tools/bazel.rc                                   +3   −5
tools/bazel_adb_run.py                           +8   −8
tools/build-standalone-lib.sh                    +4   −4
tools/common.py                                  +6   −6
tools/converter.py                              +17  −47
tools/device.py                                 +27  −34
tools/sh_commands.py                             +8 −250
docs/user_guide/advanced_usage.rst

@@ -114,69 +114,60 @@ Advanced usage
--------------

There are three common advanced use cases:

  - run your model on the embedded device (ARM Linux)
  - converting model to C++ code.
  - tuning GPU kernels for a specific SoC.

Run your model on the embedded device (ARM Linux)
-------------------------------------------------

The way to run your model on ARM Linux is nearly the same as with Android, except that you need to specify a device config file.

.. code:: bash

    python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --device_yml=/path/to/devices.yml

There are two steps to do before running:

1. Configure login without a password.

   MACE uses ssh to connect to the embedded device; copy your public key to the device's ``$HOME/.ssh/authorized_keys`` with the below command.

   .. code:: bash

       cat ~/.ssh/id_rsa.pub | ssh -q {user}@{ip} "cat >> ~/.ssh/authorized_keys"

2. Write your own device yaml configuration file.

   * **Example**

     Here is a device yaml config demo.

     .. literalinclude:: devices/demo_device_nanopi.yml
         :language: yaml

   * **Configuration**

     The detailed explanation is listed in the below table.

     .. list-table::
         :header-rows: 1

         * - Options
           - Usage
         * - target_abis
           - Device supported ABIs; you can get them via the ``dpkg --print-architecture`` and
             ``dpkg --print-foreign-architectures`` commands. If more than one ABI is supported,
             separate them by commas.
         * - target_socs
           - Device SoC; you can get it from the device manual. We haven't found a way to get it in the shell.
         * - models
           - Device model's full name; you can get it via the ``lshw`` command (a third-party package, install it via your package manager) and read its product value.
         * - address
           - Since we use ssh to connect to the device, an IP address is required.
         * - username
           - Login username, required.

MACE reads this yaml config via the ``--device_yml`` argument; the default value is ``devices.yml``. When the yaml config file is not found, MACE treats it as there being no available ARM Linux device, prints a message, and continues on other devices such as a plugged-in Android phone.

.. code:: bash

    # specify device yaml config file via --device_yml argument or put the file under the working directory
    python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --device_yml=/path/to/devices.yml

Convert model(s) to C++ code
----------------------------
...
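The options in the table above fit together as a minimal ``devices.yml``. The sketch below is illustrative only: the structure mirrors the documented options, but the device name and all values are hypothetical, not taken from the commit.

```yaml
# devices.yml — a minimal sketch (all values are placeholders)
devices:
  nanopi:
    target_abis: [arm64-v8a, armhf]   # from dpkg --print-architecture / --print-foreign-architectures
    target_socs: RK3399               # from the device manual
    models: NanoPi M4                 # product value reported by lshw
    address: 10.0.0.0                 # ssh address
    username: user                    # ssh login user
```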
docs/user_guide/basic_usage.rst

@@ -246,13 +246,14 @@ to run and validate your model.

    # Test model run time
    python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --round=100

    # Validate the correctness by comparing the results against the
    # original model and framework, measured with cosine distance for similarity.
    python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --validate

    # If you want to run model on specified arm linux device, you should put device config file in the working directory or run with flag `--device_yml`
    python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --device_yml=/path/to/devices.yml

* **benchmark**

    benchmark and profile the model.
...
docs/user_guide/devices/demo_device_nanopi.yml

@@ -12,12 +12,9 @@ devices:
       address: 10.0.0.0
       # login username
       username: user
-      # login password, is required when you can login into device without password
-      password: 1234567
   raspberry:
       target_abis: [armv7l]
       target_socs: BCM2837
       models: Raspberry Pi 3 Model B Plus Rev 1.3
       address: 10.0.0.1
       username: user
-      password: 123456
mace/core/runtime/cpu/cpu_runtime.cc

@@ -42,7 +42,7 @@ struct CPUFreq {
 };

 namespace {

 #if defined(__ANDROID__)
 int GetCPUCount() {
   int cpu_count = 0;
   std::string cpu_sys_conf = "/proc/cpuinfo";
...
@@ -69,10 +69,8 @@ int GetCPUCount() {
   VLOG(2) << "CPU cores: " << cpu_count;
   return cpu_count;
 }
 #endif

 int GetCPUMaxFreq(std::vector<float> *max_freqs) {
 #if defined(__ANDROID__)
   int cpu_count = GetCPUCount();
   for (int cpu_id = 0; cpu_id < cpu_count; ++cpu_id) {
     std::string cpuinfo_max_freq_sys_conf = MakeString(
...
@@ -94,34 +92,6 @@ int GetCPUMaxFreq(std::vector<float> *max_freqs) {
     }
     f.close();
   }
-#else
-  std::string cpu_sys_conf = "/proc/cpuinfo";
-  std::ifstream f(cpu_sys_conf);
-  if (!f.is_open()) {
-    LOG(ERROR) << "failed to open " << cpu_sys_conf;
-    return -1;
-  }
-  std::string line;
-  const std::string freq_key = "cpu MHz";
-  while (std::getline(f, line)) {
-    if (line.size() >= freq_key.size() &&
-        line.compare(0, freq_key.size(), freq_key) == 0) {
-      size_t pos = line.find(":");
-      if (pos != std::string::npos) {
-        std::string freq_str = line.substr(pos + 1);
-        float freq = atof(freq_str.c_str());
-        max_freqs->push_back(freq);
-      }
-    }
-  }
-  if (f.bad()) {
-    LOG(ERROR) << "failed to read " << cpu_sys_conf;
-  }
-  if (!f.eof()) {
-    LOG(ERROR) << "failed to read end of " << cpu_sys_conf;
-  }
-  f.close();
 #endif

   for (float freq : *max_freqs) {
     VLOG(2) << "CPU freq: " << freq;
...
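The removed non-Android branch above scanned ``/proc/cpuinfo`` for lines whose key is ``cpu MHz`` and collected one frequency per match. The same parsing can be sketched in Python for illustration (a rough equivalent, not part of MACE):

```python
def parse_cpu_mhz(cpuinfo_text):
    """Collect one frequency (MHz) per 'cpu MHz' line, mirroring the
    removed C++ loop that prefix-matched the 'cpu MHz' key."""
    freqs = []
    key = "cpu MHz"
    for line in cpuinfo_text.splitlines():
        if line.startswith(key):
            _, _, value = line.partition(":")
            if value:
                freqs.append(float(value))
    return freqs

sample = ("processor\t: 0\ncpu MHz\t\t: 1800.000\n"
          "processor\t: 1\ncpu MHz\t\t: 1400.000\n")
print(parse_cpu_mhz(sample))  # → [1800.0, 1400.0]
```

Note that on many ARM boards ``/proc/cpuinfo`` does not report ``cpu MHz`` at all, which may be why the branch was dropped.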
mace/python/tools/memory_optimizer.py (deleted, 100644 → 0)

# Copyright 2018 Xiaomi, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys
import operator
import six
from six.moves import reduce
from mace.proto import mace_pb2
from mace.python.tools.converter_tool import base_converter as cvt
from mace.python.tools.converter_tool.base_converter import DeviceType
from mace.python.tools.converter_tool.base_converter import ConverterUtil
from mace.python.tools.converter_tool.base_converter import MaceKeyword
from mace.python.tools.convert_util import calculate_image_shape
from mace.python.tools.convert_util import OpenCLBufferType


def MemoryTypeToStr(mem_type):
    if mem_type == mace_pb2.CPU_BUFFER:
        return 'CPU_BUFFER'
    elif mem_type == mace_pb2.GPU_BUFFER:
        return 'GPU_BUFFER'
    elif mem_type == mace_pb2.GPU_IMAGE:
        return 'GPU_IMAGE'
    else:
        return 'UNKNOWN'


class MemoryBlock(object):
    def __init__(self, mem_type, block):
        self._mem_type = mem_type
        self._block = block

    @property
    def mem_type(self):
        return self._mem_type

    @property
    def block(self):
        return self._block


class MemoryOptimizer(object):
    def __init__(self, net_def):
        self.net_def = net_def
        self.idle_mem = set()
        self.op_mem = {}  # op_name->mem_id
        self.mem_block = {}  # mem_id->[size] or mem_id->[x, y]
        self.total_mem_count = 0
        self.input_ref_counter = {}
        self.mem_ref_counter = {}

        ocl_mem_type_arg = ConverterUtil.get_arg(
            net_def, MaceKeyword.mace_opencl_mem_type)
        self.cl_mem_type = ocl_mem_type_arg.i if ocl_mem_type_arg is not None \
            else None

        consumers = {}
        for op in net_def.op:
            if not self.op_need_optimize_memory(op):
                continue
            for ipt in op.input:
                if ipt not in consumers:
                    consumers[ipt] = []
                consumers[ipt].append(op)
        # only ref op's output tensor
        for op in net_def.op:
            if not self.op_need_optimize_memory(op):
                continue
            for output in op.output:
                tensor_name = output
                if tensor_name in consumers:
                    self.input_ref_counter[tensor_name] = \
                        len(consumers[tensor_name])
                else:
                    self.input_ref_counter[tensor_name] = 0

    def op_need_optimize_memory(self, op):
        return True

    def get_op_mem_block(self, op_type, output_shape, output_type):
        data_type_size = 4
        if output_type == mace_pb2.DT_UINT8:
            data_type_size = 1
        return MemoryBlock(
            mace_pb2.CPU_BUFFER,
            [reduce(operator.mul, output_shape, 1) * data_type_size])

    def mem_size(self, memory_block):
        return memory_block.block[0]

    def sub_mem_block(self, mem_block1, mem_block2):
        return self.mem_size(mem_block1) - self.mem_size(mem_block2)

    def resize_mem_block(self, old_mem_block, op_mem_block):
        return MemoryBlock(
            old_mem_block.mem_type,
            [max(old_mem_block.block[0], op_mem_block.block[0])])

    def add_net_mem_blocks(self):
        for mem in self.mem_block:
            arena = self.net_def.mem_arena
            block = arena.mem_block.add()
            block.mem_id = mem
            block.device_type = DeviceType.CPU.value
            block.mem_type = self.mem_block[mem].mem_type
            block.x = self.mem_block[mem].block[0]
            block.y = 1

    def get_total_origin_mem_size(self):
        origin_mem_size = 0
        for op in self.net_def.op:
            if not self.op_need_optimize_memory(op):
                continue
            origin_mem_size += reduce(operator.mul,
                                      op.output_shape[0].dims, 1)
        return origin_mem_size

    def get_total_optimized_mem_size(self):
        optimized_mem_size = 0
        for mem in self.mem_block:
            print(mem, MemoryTypeToStr(self.mem_block[mem].mem_type),
                  self.mem_block[mem].block)
            optimized_mem_size += self.mem_size(self.mem_block[mem])
        return optimized_mem_size

    @staticmethod
    def is_memory_reuse_op(op):
        return op.type == 'Reshape' or op.type == 'Identity' \
            or op.type == 'Squeeze' or op.type == 'ExpandDims'

    def optimize(self):
        for op in self.net_def.op:
            if not self.op_need_optimize_memory(op):
                continue
            if not op.output_shape:
                six.print_("WARNING: There is no output shape information to "
                           "do memory optimization. %s (%s)"
                           % (op.name, op.type),
                           file=sys.stderr)
                return
            if len(op.output_shape) != len(op.output):
                six.print_('WARNING: the number of output shape is '
                           'not equal to the number of output.',
                           file=sys.stderr)
                return
            for i in range(len(op.output)):
                if self.is_memory_reuse_op(op):
                    # make these ops reuse memory of input tensor
                    mem_id = self.op_mem.get(op.input[0], -1)
                else:
                    output_type = mace_pb2.DT_FLOAT
                    for arg in op.arg:
                        if arg.name == 'T':
                            output_type = arg.i
                    if len(op.output_type) > i:
                        output_type = op.output_type[i]
                    op_mem_block = self.get_op_mem_block(
                        op.type, op.output_shape[i].dims, output_type)
                    mem_id = -1
                    if len(self.idle_mem) > 0:
                        best_mem_add_size = six.MAXSIZE
                        best_mem_waste_size = six.MAXSIZE
                        for mid in self.idle_mem:
                            old_mem_block = self.mem_block[mid]
                            if old_mem_block.mem_type != op_mem_block.mem_type:
                                continue
                            new_mem_block = self.resize_mem_block(
                                old_mem_block, op_mem_block)
                            add_mem_size = self.sub_mem_block(
                                new_mem_block, old_mem_block)
                            waste_mem_size = self.sub_mem_block(
                                new_mem_block, op_mem_block)
                            # minimize add_mem_size; if best_mem_add_size is 0,
                            # then minimize waste_mem_size
                            if (best_mem_add_size > 0
                                    and add_mem_size < best_mem_add_size) \
                                    or (best_mem_add_size == 0
                                        and waste_mem_size <
                                        best_mem_waste_size):
                                best_mem_id = mid
                                best_mem_add_size = add_mem_size
                                best_mem_waste_size = waste_mem_size
                                best_mem_block = new_mem_block

                        # if add mem size < op mem size, then reuse it
                        if best_mem_add_size <= self.mem_size(op_mem_block):
                            self.mem_block[best_mem_id] = best_mem_block
                            mem_id = best_mem_id
                            self.idle_mem.remove(mem_id)

                    if mem_id == -1:
                        mem_id = self.total_mem_count
                        self.total_mem_count += 1
                        self.mem_block[mem_id] = op_mem_block

                if mem_id != -1:
                    op.mem_id.extend([mem_id])
                    self.op_mem[op.output[i]] = mem_id
                    if mem_id not in self.mem_ref_counter:
                        self.mem_ref_counter[mem_id] = 1
                    else:
                        self.mem_ref_counter[mem_id] += 1

            # de-ref input tensor mem
            for idx in six.moves.range(len(op.input)):
                ipt = op.input[idx]
                if ipt in self.input_ref_counter:
                    self.input_ref_counter[ipt] -= 1
                    if self.input_ref_counter[ipt] == 0 \
                            and ipt in self.op_mem:
                        mem_id = self.op_mem[ipt]
                        self.mem_ref_counter[mem_id] -= 1
                        if self.mem_ref_counter[mem_id] == 0:
                            self.idle_mem.add(self.op_mem[ipt])
                    elif self.input_ref_counter[ipt] < 0:
                        raise Exception('ref count is less than 0')

        self.add_net_mem_blocks()

        print("total op: %d" % len(self.net_def.op))
        print("origin mem: %d, optimized mem: %d"
              % (self.get_total_origin_mem_size(),
                 self.get_total_optimized_mem_size()))


class GPUMemoryOptimizer(MemoryOptimizer):
    def op_need_optimize_memory(self, op):
        if op.type == MaceKeyword.mace_buffer_transform:
            for arg in op.arg:
                if arg.name == 'mode' and arg.i == 0:
                    return False
        return op.type != MaceKeyword.mace_buffer_inverse_transform

    def get_op_image_mem_block(self, op_type, output_shape):
        if op_type == 'WinogradTransform' or op_type == 'MatMul':
            buffer_shape = list(output_shape) + [1]
            mem_block = MemoryBlock(
                mace_pb2.GPU_IMAGE,
                calculate_image_shape(OpenCLBufferType.IN_OUT_HEIGHT,
                                      buffer_shape))
        elif op_type in ['Shape', 'InferConv2dShape', 'StridedSlice',
                         'Stack', 'ScalarMath']:
            if len(output_shape) == 1:
                mem_block = MemoryBlock(mace_pb2.CPU_BUFFER,
                                        [output_shape[0], 1])
            elif len(output_shape) == 0:
                mem_block = MemoryBlock(mace_pb2.CPU_BUFFER, [1, 1])
            else:
                raise Exception('%s output shape dim size is not 0 or 1.'
                                % op_type)
        else:
            if len(output_shape) == 2:
                # only support fc/softmax
                buffer_shape = [output_shape[0], output_shape[1]]
            elif len(output_shape) == 4:
                buffer_shape = output_shape
            else:
                raise Exception('%s output shape dim size is not 2 or 4.'
                                % op_type)
            mem_block = MemoryBlock(
                mace_pb2.GPU_IMAGE,
                calculate_image_shape(OpenCLBufferType.IN_OUT_CHANNEL,
                                      buffer_shape))
        return mem_block

    def get_op_buffer_mem_block(self, output_shape):
        return MemoryBlock(mace_pb2.GPU_BUFFER,
                           [reduce(operator.mul, output_shape, 1), 1])

    def get_op_mem_block(self, op_type, output_shape, output_type):
        if self.cl_mem_type == mace_pb2.GPU_IMAGE:
            return self.get_op_image_mem_block(op_type, output_shape)
        else:
            return self.get_op_buffer_mem_block(output_shape)

    def mem_size(self, memory_block):
        if memory_block.mem_type == mace_pb2.GPU_IMAGE:
            return memory_block.block[0] * memory_block.block[1] * 4
        else:
            return memory_block.block[0]

    def resize_mem_block(self, old_mem_block, op_mem_block):
        resize_mem_block = MemoryBlock(
            old_mem_block.mem_type,
            [max(old_mem_block.block[0], op_mem_block.block[0]),
             max(old_mem_block.block[1], op_mem_block.block[1])])
        return resize_mem_block

    def add_net_mem_blocks(self):
        max_image_size_x = 0
        max_image_size_y = 0
        for mem in self.mem_block:
            arena = self.net_def.mem_arena
            block = arena.mem_block.add()
            block.mem_id = mem
            block.device_type = DeviceType.GPU.value
            block.mem_type = self.mem_block[mem].mem_type
            block.x = self.mem_block[mem].block[0]
            block.y = self.mem_block[mem].block[1]
            if self.mem_block[mem].mem_type == mace_pb2.GPU_IMAGE:
                max_image_size_x = max(max_image_size_x, block.x)
                max_image_size_y = max(max_image_size_y, block.y)

        if self.cl_mem_type == mace_pb2.GPU_IMAGE:
            # Update OpenCL max image size
            net_ocl_max_img_size_arg = None
            for arg in self.net_def.arg:
                if arg.name == cvt.MaceKeyword.mace_opencl_max_image_size:
                    net_ocl_max_img_size_arg = arg
                    max_image_size_x = max(arg.ints[0], max_image_size_x)
                    max_image_size_y = max(arg.ints[1], max_image_size_y)
                    break
            if net_ocl_max_img_size_arg is None:
                net_ocl_max_img_size_arg = self.net_def.arg.add()
                net_ocl_max_img_size_arg.name = \
                    cvt.MaceKeyword.mace_opencl_max_image_size
            net_ocl_max_img_size_arg.ints[:] = [max_image_size_x,
                                                max_image_size_y]


def optimize_gpu_memory(net_def):
    mem_optimizer = GPUMemoryOptimizer(net_def)
    mem_optimizer.optimize()


def optimize_cpu_memory(net_def):
    mem_optimizer = MemoryOptimizer(net_def)
    mem_optimizer.optimize()
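The heart of the deleted optimizer is a greedy buffer-reuse loop: pick the idle block that needs the least growth, otherwise allocate a new one, and return blocks to the idle pool when their last consumer has run. A condensed, self-contained sketch of that idea (simplified lifetime model, not the original MACE API):

```python
def assign_blocks(ops):
    """ops: list of (name, size, last_inputs) in topological order, where
    last_inputs are names of earlier outputs consumed for the last time
    by this op. Greedily reuses idle blocks, growing them when needed,
    like MemoryOptimizer.resize_mem_block."""
    blocks = []   # block_id -> current size
    idle = set()  # block_ids free for reuse
    owner = {}    # tensor name -> block_id
    for name, size, last_inputs in ops:
        # pick the idle block whose growth cost is smallest
        best, best_grow = None, None
        for b in idle:
            grow = max(0, size - blocks[b])
            if best is None or grow < best_grow:
                best, best_grow = b, grow
        if best is not None:
            blocks[best] = max(blocks[best], size)  # grow in place
            idle.remove(best)
            owner[name] = best
        else:
            owner[name] = len(blocks)               # fresh allocation
            blocks.append(size)
        # inputs consumed for the last time release their blocks
        for ipt in last_inputs:
            idle.add(owner[ipt])
    return owner, blocks
```

For a chain a(4) → b(6) → c(5), c reuses a's block grown to 5, so two blocks (5 and 6) replace three totalling 15 — the same "origin mem vs optimized mem" saving the deleted script printed.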
tools/bazel.rc

 # Partially borrowed from tensorflow tools/bazel.rc
 # By default, we don't distinct target and host platfroms.
 # When doing cross compilation, use --config=cross_compile to distinct them.
-build --distinct_host_configuration=false
-build:cross_compile --distinct_host_configuration=true
 build --verbose_failures
 build --copt=-std=c++11
...
@@ -17,12 +15,12 @@ build --copt=-DMACE_USE_NNLIB_CAF
 build:symbol_hidden --copt=-fvisibility=hidden

 # Usage example: bazel build --config android
-build:android --config=cross_compile
+build:android --distinct_host_configuration=true
 build:android --crosstool_top=//external:android/crosstool
 build:android --host_crosstool_top=@bazel_tools//tools/cpp:toolchain

 # Usage example: bazel build --config arm_linux_gnueabihf
-build:arm_linux_gnueabihf --config=cross_compile
+build:arm_linux_gnueabihf --distinct_host_configuration=true
 build:arm_linux_gnueabihf --crosstool_top=//tools/arm_compiler:toolchain
 build:arm_linux_gnueabihf --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
 build:arm_linux_gnueabihf --cpu=armeabi-v7a
...
@@ -34,7 +32,7 @@ build:arm_linux_gnueabihf --copt -Wno-sequence-point
 build:arm_linux_gnueabihf --copt -Wno-implicit-fallthrough

 # Usage example: bazel build --config aarch64_linux_gnu
-build:aarch64_linux_gnu --config=cross_compile
+build:aarch64_linux_gnu --distinct_host_configuration=true
 build:aarch64_linux_gnu --crosstool_top=//tools/aarch64_compiler:toolchain
 build:aarch64_linux_gnu --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
 build:aarch64_linux_gnu --cpu=aarch64
...
tools/bazel_adb_run.py

@@ -52,13 +52,13 @@ def ops_benchmark_stdout_processor(stdout, dev, abi):
             metrics["%s.input_mb_per_sec" % parts[0]] = parts[3]
             metrics["%s.gmacc_per_sec" % parts[0]] = parts[4]

-    platform = dev[YAMLKeyword.target_socs]
-    model = dev[YAMLKeyword.models]
-    tags = {"ro.board.platform": platform,
-            "ro.product.model": model,
-            "abi": abi}
+    # platform = dev[YAMLKeyword.target_socs]
+    # model = dev[YAMLKeyword.device_name]
+    # tags = {
+    #     "ro.board.platform": platform,
+    #     "ro.product.model": model,
+    #     "abi": abi
+    # }
     # sh_commands.falcon_push_metrics(server,
     #     metrics, tags=tags, endpoint="mace_ops_benchmark")
...
@@ -99,7 +99,7 @@ def parse_args():
     parser.add_argument(
         "--stdout_processor",
         type=str,
-        default="stdout_processor",
+        default="unittest_stdout_processor",
         help="Stdout processing function, default: stdout_processor")
     parser.add_argument(
         "--enable_neon",
...
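The first hunk above keys two whitespace-separated columns of the ops-benchmark output by the benchmark name in column 0. Roughly (the column meanings are inferred from the metric names, and the sample line is made up):

```python
def collect_metrics(stdout_line):
    """Split one benchmark output line and key selected columns by the
    benchmark name in column 0, as the surrounding code does."""
    parts = stdout_line.split()
    return {
        "%s.input_mb_per_sec" % parts[0]: parts[3],
        "%s.gmacc_per_sec" % parts[0]: parts[4],
    }

# hypothetical benchmark line: name, time, iterations, MB/s, GMACC/s
metrics = collect_metrics("BM_CONV_3x3 100 2000 12.5 3.4")
print(metrics)
```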
tools/build-standalone-lib.sh

@@ -45,11 +45,11 @@ bazel build --config android --config optimization mace/libmace:libmace_dynamic
 cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/arm64-v8a/cpu_gpu/

 echo "build shared lib for arm_linux_gnueabihf + cpu_gpu"
-bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true
+bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true --define quantize=true
 cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/arm_linux_gnueabihf/cpu_gpu/

 echo "build shared lib for aarch64_linux_gnu + cpu_gpu"
-bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true
+bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true --define quantize=true
 cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/aarch64_linux_gnu/cpu_gpu/

 if [[ "$OSTYPE" != "darwin"* ]]; then
...
@@ -73,11 +73,11 @@ bazel build --config android --config optimization mace/libmace:libmace_static -
 cp bazel-genfiles/mace/libmace/libmace.a $LIB_DIR/arm64-v8a/cpu_gpu/

 echo "build static lib for arm_linux_gnueabihf + cpu_gpu"
-bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true
+bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true --define quantize=true
 cp bazel-genfiles/mace/libmace/libmace.a $LIB_DIR/arm_linux_gnueabihf/cpu_gpu/

 echo "build static lib for aarch64_linux_gnu + cpu_gpu"
-bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true
+bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true --define quantize=true
 cp bazel-genfiles/mace/libmace/libmace.a $LIB_DIR/aarch64_linux_gnu/cpu_gpu/

 if [[ "$OSTYPE" != "darwin"* ]]; then
...
tools/common.py

@@ -240,7 +240,7 @@ def get_model_files(model_file_path,
 def get_opencl_binary_output_path(library_name, target_abi, device):
     target_soc = device.target_socs
-    device_model = device.models
+    device_name = device.device_name
     return '%s/%s/%s/%s/%s_%s.%s.%s.bin' % \
            (BUILD_OUTPUT_DIR,
             library_name,
...
@@ -248,13 +248,13 @@ def get_opencl_binary_output_path(library_name, target_abi, device):
             target_abi,
             library_name,
             OUTPUT_OPENCL_BINARY_FILE_NAME,
-            device_model,
+            device_name,
             target_soc)


 def get_opencl_parameter_output_path(library_name, target_abi, device):
     target_soc = device.target_socs
-    device_model = device.models
+    device_name = device.device_name
     return '%s/%s/%s/%s/%s_%s.%s.%s.bin' % \
            (BUILD_OUTPUT_DIR,
             library_name,
...
@@ -262,7 +262,7 @@ def get_opencl_parameter_output_path(library_name, target_abi, device):
             target_abi,
             library_name,
             OUTPUT_OPENCL_PARAMETER_FILE_NAME,
-            device_model,
+            device_name,
             target_soc)
...
@@ -271,7 +271,7 @@ def get_build_model_dirs(library_name,
                          target_abi,
                          device,
                          model_file_path):
-    models = device.models
+    device_name = device.device_name
     target_socs = device.target_socs
     model_path_digest = md5sum(model_file_path)
     model_output_base_dir = '{}/{}/{}/{}/{}'.format(
...
@@ -287,7 +287,7 @@ def get_build_model_dirs(library_name,
     else:
         model_output_dir = '{}/{}_{}/{}'.format(
             model_output_base_dir,
-            models,
+            device_name,
             target_socs,
             target_abi)
...
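The rename feeds the same 8-slot path template. A quick sketch of how the OpenCL binary path is assembled; note the constants are placeholders and the middle path component is elided ("...") in the hunk above, so "opencl" below is purely illustrative:

```python
# Placeholder constants; the real values live in tools/common.py.
BUILD_OUTPUT_DIR = "builds"
OUTPUT_OPENCL_BINARY_FILE_NAME = "compiled_opencl_kernel"


def opencl_binary_path(library_name, target_abi, device_name, target_soc):
    # Same template as get_opencl_binary_output_path after the
    # device.models -> device.device_name rename; the "opencl" segment
    # stands in for the elided component.
    return '%s/%s/%s/%s/%s_%s.%s.%s.bin' % (
        BUILD_OUTPUT_DIR, library_name, "opencl", target_abi,
        library_name, OUTPUT_OPENCL_BINARY_FILE_NAME,
        device_name, target_soc)


print(opencl_binary_path("mobilenet", "armhf", "Nanopi", "RK3399"))
```

The point of the change is that the device's name, not its model string, now distinguishes per-device artifacts on disk.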
tools/converter.py

@@ -111,6 +111,13 @@ class DefaultValues(object):
     gpu_priority_hint = 3,


+class ValidationThreshold(object):
+    cpu_threshold = 0.999,
+    gpu_threshold = 0.995,
+    hexagon_threshold = 0.930,
+    cpu_quantize_threshold = 0.980,


 CPP_KEYWORDS = [
     'alignas', 'alignof', 'and', 'and_eq', 'asm', 'atomic_cancel',
     'atomic_commit', 'atomic_noexcept', 'auto', 'bitand', 'bitor',
...
@@ -435,10 +442,11 @@ def format_model_config(flags):
                        'similarity threshold must be a dict.')
         threshold_dict = {
-            DeviceType.CPU: 0.999,
-            DeviceType.GPU: 0.995,
-            DeviceType.HEXAGON: 0.930,
-            DeviceType.CPU + "_QUANTIZE": 0.980,
+            DeviceType.CPU: ValidationThreshold.cpu_threshold,
+            DeviceType.GPU: ValidationThreshold.gpu_threshold,
+            DeviceType.HEXAGON: ValidationThreshold.hexagon_threshold,
+            DeviceType.CPU + "_QUANTIZE":
+                ValidationThreshold.cpu_quantize_threshold,
         }
         for k, v in six.iteritems(validation_threshold):
             if k.upper() == 'DSP':
...
@@ -838,39 +846,6 @@ def build_mace_run(configs, target_abi, toolchain, enable_openmp,
         mace_lib_type == MACELibType.dynamic)


-def build_quantize_stat(configs):
-    library_name = configs[YAMLKeyword.library_name]
-
-    build_tmp_binary_dir = get_build_binary_dir(library_name, ABIType.host)
-    if os.path.exists(build_tmp_binary_dir):
-        sh.rm("-rf", build_tmp_binary_dir)
-    os.makedirs(build_tmp_binary_dir)
-
-    quantize_stat_target = QUANTIZE_STAT_TARGET
-    build_arg = ""
-    six.print_(configs[YAMLKeyword.model_graph_format])
-    if configs[YAMLKeyword.model_graph_format] == ModelFormat.code:
-        mace_check(os.path.exists(ENGINE_CODEGEN_DIR),
-                   ModuleName.RUN,
-                   "You should convert model first.")
-        build_arg = "--per_file_copt=mace/tools/quantization/quantize_stat.cc@-DMODEL_GRAPH_FORMAT_CODE"  # noqa
-
-    sh_commands.bazel_build(
-        quantize_stat_target,
-        abi=ABIType.host,
-        toolchain=flags.toolchain,
-        enable_openmp=True,
-        symbol_hidden=True,
-        extra_args=build_arg)
-
-    quantize_stat_filepath = build_tmp_binary_dir + "/quantize_stat"
-    if os.path.exists(quantize_stat_filepath):
-        sh.rm("-rf", quantize_stat_filepath)
-    sh.cp("-f", "bazel-bin/mace/tools/quantization/quantize_stat",
-          build_tmp_binary_dir)
-
-
 def build_example(configs, target_abi, toolchain,
                   enable_openmp, mace_lib_type):
     library_name = configs[YAMLKeyword.library_name]
...
@@ -951,10 +926,8 @@ def run_mace(flags):
         clear_build_dirs(configs[YAMLKeyword.library_name])

     target_socs = configs[YAMLKeyword.target_socs]
-    if not target_socs or ALL_SOC_TAG in target_socs:
-        device_list = DeviceManager.list_devices(flags.device_yml)
-    else:
-        device_list = DeviceManager.list_devices(flags.device_yml)
+    device_list = DeviceManager.list_devices(flags.device_yml)
+    if target_socs and ALL_SOC_TAG not in target_socs:
         device_list = [dev for dev in device_list
                        if dev[YAMLKeyword.target_socs].lower()
                        in target_socs]
     for target_abi in configs[YAMLKeyword.target_abis]:
...
@@ -1042,13 +1015,10 @@ def benchmark_model(flags):
         clear_build_dirs(configs[YAMLKeyword.library_name])

     target_socs = configs[YAMLKeyword.target_socs]
-    if not target_socs or ALL_SOC_TAG in target_socs:
-        device_list = DeviceManager.list_devices(flags.device_yml)
-        # target_socs = sh_commands.adb_get_all_socs()
-    else:
-        device_list = DeviceManager.list_devices(flags.device_yml)
+    device_list = DeviceManager.list_devices(flags.device_yml)
+    if target_socs and ALL_SOC_TAG not in target_socs:
         device_list = [dev for dev in device_list
-                       if dev[YAMLKeyword.target_socs] in target_socs]
+                       if dev[YAMLKeyword.target_socs].lower()
+                       in target_socs]
     for target_abi in configs[YAMLKeyword.target_abis]:
         # build benchmark_model binary
...
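The simplified branches above keep a single `list_devices` call and filter afterwards. The filter itself reduces to a list comprehension (a sketch with stubbed device dicts; the real tag constant's value is assumed):

```python
ALL_SOC_TAG = "all"  # assumed value of the tag constant


def filter_devices(device_list, target_socs):
    """Mirror run_mace's post-change logic: no requested SoCs (or the
    'all' tag) keeps every device; otherwise keep lowercased matches."""
    if target_socs and ALL_SOC_TAG not in target_socs:
        return [dev for dev in device_list
                if dev["target_socs"].lower() in target_socs]
    return device_list


devices = [{"target_socs": "SDM845"}, {"target_socs": "BCM2837"}]
print(filter_devices(devices, ["sdm845"]))  # keeps only the SDM845 device
```

Lowercasing the device-side value (added to `benchmark_model` too) makes the comparison case-insensitive against user-supplied SoC names.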
tools/device.py
浏览文件 @
12c4dace
...
...
@@ -37,8 +37,8 @@ class DeviceWrapper:
:type device_dict: Device
:param device_dict: a key-value dict that holds the device information,
which attribute has:
target_abis, target_socs, models, system, address
password
, username
device_name, target_abis, target_socs, system,
address
, username
"""
diff
=
set
(
device_dict
.
keys
())
-
set
(
YAMLKeyword
.
__dict__
.
keys
())
if
len
(
diff
)
>
0
:
...
...
@@ -111,6 +111,7 @@ class DeviceWrapper:
def
push
(
self
,
src_path
,
dst_path
):
mace_check
(
os
.
path
.
exists
(
src_path
),
"Device"
,
'{} not found'
.
format
(
src_path
))
six
.
print_
(
"Push %s to %s"
%
(
src_path
,
dst_path
))
if
self
.
system
==
SystemType
.
android
:
sh_commands
.
adb_push
(
src_path
,
dst_path
,
self
.
address
)
elif
self
.
system
==
SystemType
.
arm_linux
:
...
...
@@ -129,6 +130,7 @@ class DeviceWrapper:
dst_file
=
"%s/%s"
%
(
dst_path
,
file_name
)
if
os
.
path
.
exists
(
dst_file
):
sh
.
rm
(
'-f'
,
dst_file
)
six
.
print_
(
"Pull %s to %s"
%
(
src_path
,
dst_path
))
if
self
.
system
==
SystemType
.
android
:
sh_commands
.
adb_pull
(
src_file
,
dst_file
,
self
.
address
)
...
...
@@ -138,7 +140,6 @@ class DeviceWrapper:
self
.
address
,
src_file
),
dst_file
)
print
(
"pull file "
,
src_path
,
dst_path
)
except
sh
.
ErrorReturnCode_1
as
e
:
six
.
print_
(
"Pull Failed !"
,
file
=
sys
.
stderr
)
raise
e
...
...
@@ -256,10 +257,13 @@ class DeviceWrapper:
if
model_graph_format
==
ModelFormat
.
file
:
mace_model_phone_path
=
"%s/%s.pb"
%
(
self
.
data_dir
,
model_tag
)
self
.
push
(
mace_model_path
,
mace_model_phone_path
)
self
.
push
(
mace_model_path
,
mace_model_phone_path
)
if
link_dynamic
:
self
.
push
(
libmace_dynamic_library_path
,
self
.
data_dir
)
if
self
.
system
==
SystemType
.
android
:
sh_commands
.
push_depended_so_libs
(
libmace_dynamic_library_path
,
abi
,
self
.
data_dir
,
self
.
address
)
self
.
push
(
"%s/%s"
%
(
target_dir
,
target_name
),
self
.
data_dir
)
stdout_buff
=
[]
...
...
@@ -430,14 +434,11 @@ class DeviceWrapper:
             configs[YAMLKeyword.model_graph_format],
             configs[YAMLKeyword.model_data_format],
             target_abi)
-        if target_abi == ABIType.host:
-            device_model = ABIType.host
-        else:
-            device_model = self.models
         if target_abi != ABIType.host:
             self.clear_data_dir()
         MaceLogger.header(StringFormatter.block(
-            'Run model {} on {}'.format(model_name, device_model)))
+            'Run model {} on {}'.format(model_name, self.device_name)))
         model_config = configs[YAMLKeyword.models][model_name]
         model_runtime = model_config[YAMLKeyword.runtime]
...
@@ -631,7 +632,7 @@ class DeviceWrapper:
             data_str = '{model_name},{device_name},{soc},{abi},{device_type},' \
                        '{init},{warmup},{run_avg},{tuned}\n' \
                 .format(model_name=model_name,
-                        device_name=self.models,
+                        device_name=self.device_name,
                         soc=self.target_socs,
                         abi=target_abi,
                         device_type=device_type,
...
@@ -671,7 +672,7 @@ class DeviceWrapper:
         mace_model_path = ''
         if model_graph_format == ModelFormat.file:
             mace_model_path = '%s/%s.pb' % (mace_model_dir, model_tag)
-        if abi == 'host':
+        if abi == ABIType.host:
             libmace_dynamic_lib_dir_path = \
                 os.path.dirname(libmace_dynamic_library_path)
             p = subprocess.Popen(
...
@@ -719,6 +720,10 @@ class DeviceWrapper:
                 self.push(mace_model_path, mace_model_device_path)
             if link_dynamic:
                 self.push(libmace_dynamic_library_path, self.data_dir)
+                if self.system == SystemType.android:
+                    sh_commands.push_depended_so_libs(
+                        libmace_dynamic_library_path, abi,
+                        self.data_dir, self.address)
             self.rm('%s/%s' % (self.data_dir, benchmark_binary_name))
             self.push('%s/%s' % (benchmark_binary_dir, benchmark_binary_name),
                       self.data_dir)
...
@@ -761,19 +766,11 @@ class DeviceWrapper:
         os.remove(tmp_cmd_file)
         if self.system == SystemType.android:
-            sh.adb('-s', self.address, 'shell', 'sh', cmd_file_path,
-                   _fg=True)
+            sh.adb('-s', self.address, 'shell', 'sh', cmd_file_path, _fg=True)
         elif self.system == SystemType.arm_linux:
             sh.ssh('%s@%s' % (self.username, self.address),
                    'sh', cmd_file_path, _fg=True)
         self.rm(cmd_file_path)
         six.print_('Benchmark done!\n')
...
@@ -804,13 +801,10 @@ class DeviceWrapper:
             configs[YAMLKeyword.model_graph_format],
             configs[YAMLKeyword.model_data_format],
             target_abi)
-        if target_abi == ABIType.host:
-            device_name = ABIType.host
-        else:
-            device_name = self.models
         MaceLogger.header(StringFormatter.block(
-            'Benchmark model %s on %s' % (model_name, device_name)))
+            'Benchmark model %s on %s' % (model_name, self.device_name)))
         model_config = configs[YAMLKeyword.models][model_name]
         model_runtime = model_config[YAMLKeyword.runtime]
         subgraphs = model_config[YAMLKeyword.subgraphs]
...
@@ -885,7 +879,7 @@ class DeviceWrapper:
         print('Trying to lock device %s' % self.address)
         with self.lock():
             print('Run on device: %s, %s, %s' %
-                  (self.address, self.target_socs, self.models))
+                  (self.address, self.target_socs, self.device_name))
             self.rm(self.data_dir)
             self.exec_command('mkdir -p %s' % self.data_dir)
             self.push(host_bin_full_path, device_bin_full_path)
...
@@ -949,11 +943,11 @@ class DeviceManager:
         for adb in adb_list:
             prop = sh_commands.adb_getprop_by_serialno(adb[0])
             android = {
-                YAMLKeyword.device_name: adb[1],
+                YAMLKeyword.device_name:
+                    prop['ro.product.model'].replace(' ', ''),
                 YAMLKeyword.target_abis:
                     prop['ro.product.cpu.abilist'].split(','),
                 YAMLKeyword.target_socs: prop['ro.board.platform'],
                 YAMLKeyword.models: prop['ro.product.model'].replace(' ', '_'),
                 YAMLKeyword.system: SystemType.android,
                 YAMLKeyword.address: adb[0],
                 YAMLKeyword.username: '',
...
@@ -968,9 +962,9 @@ class DeviceManager:
         devices = devices['devices']
         device_list = []
         for name, dev in six.iteritems(devices):
-            dev[YAMLKeyword.device_name] = name
+            dev[YAMLKeyword.device_name] = \
+                dev[YAMLKeyword.models].replace(' ', '')
             dev[YAMLKeyword.system] = SystemType.arm_linux
             dev[YAMLKeyword.models] = dev[YAMLKeyword.models].replace(' ', '_')
             device_list.append(dev)
         return device_list
...
@@ -992,7 +986,6 @@ class DeviceManager:
             YAMLKeyword.target_abis: [ABIType.host],
             YAMLKeyword.target_socs: '',
             YAMLKeyword.system: SystemType.host,
-            YAMLKeyword.models: None,
             YAMLKeyword.address: None,
         }
...
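The DeviceWrapper changes above route every file transfer through either adb (Android, addressed by serial number) or scp/ssh (ARM Linux, addressed by user and IP). A minimal standalone sketch of that dispatch follows; the SystemType values mirror the diff, but `build_push_command` is a hypothetical helper for illustration, not part of MACE:

```python
# Sketch of the transport dispatch behind DeviceWrapper.push.
# SystemType values mirror the MACE diff; build_push_command is hypothetical.
class SystemType(object):
    android = 'android'
    arm_linux = 'arm_linux'


def build_push_command(system, src_path, dst_path, address, username=''):
    """Return the argv list that would copy src_path onto the device."""
    if system == SystemType.android:
        # Android devices are reached through adb, keyed by serial number.
        return ['adb', '-s', address, 'push', src_path, dst_path]
    elif system == SystemType.arm_linux:
        # ARM Linux boards are reached over ssh, so scp does the copy.
        return ['scp', src_path, '%s@%s:%s' % (username, address, dst_path)]
    raise ValueError('unsupported system: %s' % system)


print(build_push_command(SystemType.arm_linux, 'mobilenet.pb',
                         '/tmp/mace_run', '192.168.1.2', 'pi'))
```

Keeping the transport choice behind one method is what lets the rest of DeviceWrapper stay identical for phones and ARM Linux boards.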
tools/sh_commands.py  View file @ 12c4dace
...
@@ -20,7 +20,6 @@ import os
 import re
 import sh
-import struct
 import subprocess
 import sys
 import time
 import platform
...
@@ -28,10 +27,6 @@ import platform
 import six

 import common
 from common import ModelFormat
-from common import ABIType
-from common import SystemType
-from common import YAMLKeyword
-from common import abi_to_internal

 sys.path.insert(0, "mace/python/tools")
...
@@ -179,99 +174,16 @@ def adb_get_all_socs():
 def adb_push(src_path, dst_path, serialno):
     six.print_("Push %s to %s" % (src_path, dst_path))
     sh.adb("-s", serialno, "push", src_path, dst_path)


 def adb_pull(src_path, dst_path, serialno):
     six.print_("Pull %s to %s" % (src_path, dst_path))
     try:
         sh.adb("-s", serialno, "pull", src_path, dst_path)
     except Exception as e:
         six.print_("Error msg: %s" % e, file=sys.stderr)


-def adb_run(abi,
-            serialno,
-            host_bin_path,
-            bin_name,
-            args="",
-            opencl_profiling=True,
-            vlog_level=0,
-            device_bin_path="/data/local/tmp/mace",
-            out_of_range_check=True,
-            address_sanitizer=False,
-            simpleperf=False):
-    host_bin_full_path = "%s/%s" % (host_bin_path, bin_name)
-    device_bin_full_path = "%s/%s" % (device_bin_path, bin_name)
-    props = adb_getprop_by_serialno(serialno)
-    six.print_(
-        "=====================================================================")
-    six.print_("Trying to lock device %s" % serialno)
-    with device_lock(serialno):
-        six.print_("Run on device: %s, %s, %s" %
-                   (serialno, props["ro.board.platform"],
-                    props["ro.product.model"]))
-        sh.adb("-s", serialno, "shell", "rm -rf %s" % device_bin_path)
-        sh.adb("-s", serialno, "shell", "mkdir -p %s" % device_bin_path)
-        adb_push(host_bin_full_path, device_bin_full_path, serialno)
-        ld_preload = ""
-        if address_sanitizer:
-            adb_push(find_asan_rt_library(abi), device_bin_path, serialno)
-            ld_preload = "LD_PRELOAD=%s/%s" % (device_bin_path,
-                                               asan_rt_library_names(abi))
-        opencl_profiling = 1 if opencl_profiling else 0
-        out_of_range_check = 1 if out_of_range_check else 0
-        six.print_("Run %s" % device_bin_full_path)
-        stdout_buff = []
-        process_output = make_output_processor(stdout_buff)
-        if simpleperf:
-            adb_push(find_simpleperf_library(abi), device_bin_path, serialno)
-            simpleperf_cmd = "%s/simpleperf" % device_bin_path
-            sh.adb(
-                "-s",
-                serialno,
-                "shell",
-                ld_preload,
-                "MACE_OUT_OF_RANGE_CHECK=%d" % out_of_range_check,
-                "MACE_OPENCL_PROFILING=%d" % opencl_profiling,
-                "MACE_CPP_MIN_VLOG_LEVEL=%d" % vlog_level,
-                simpleperf_cmd,
-                "stat",
-                "--group",
-                "raw-l1-dcache,raw-l1-dcache-refill",
-                "--group",
-                "raw-l2-dcache,raw-l2-dcache-refill",
-                "--group",
-                "raw-l1-dtlb,raw-l1-dtlb-refill",
-                "--group",
-                "raw-l2-dtlb,raw-l2-dtlb-refill",
-                device_bin_full_path,
-                args,
-                _tty_in=True,
-                _out=process_output,
-                _err_to_out=True)
-        else:
-            sh.adb(
-                "-s",
-                serialno,
-                "shell",
-                ld_preload,
-                "MACE_OUT_OF_RANGE_CHECK=%d" % out_of_range_check,
-                "MACE_OPENCL_PROFILING=%d" % opencl_profiling,
-                "MACE_CPP_MIN_VLOG_LEVEL=%d" % vlog_level,
-                device_bin_full_path,
-                args,
-                _tty_in=True,
-                _out=process_output,
-                _err_to_out=True)
-        return "".join(stdout_buff)


 ################################
 # Toolchain
 ################################
...
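The removed `adb_run` controlled runtime behavior by prefixing `MACE_*` environment variables onto the command executed via `adb shell`. A compact sketch of that assembly step, assuming the variable names from the diff above (`format_run_command` itself is a hypothetical helper):

```python
# Sketch of the env-prefixed shell string adb_run handed to `adb shell`.
# The MACE_* variable names come from the diff; the helper is hypothetical.
def format_run_command(bin_path, args='', opencl_profiling=True,
                       out_of_range_check=True, vlog_level=0):
    """Build the single shell string executed on the device."""
    env = [
        'MACE_OUT_OF_RANGE_CHECK=%d' % (1 if out_of_range_check else 0),
        'MACE_OPENCL_PROFILING=%d' % (1 if opencl_profiling else 0),
        'MACE_CPP_MIN_VLOG_LEVEL=%d' % vlog_level,
    ]
    return ' '.join(env + [bin_path, args]).strip()


print(format_run_command('/data/local/tmp/mace/mace_run', '--help'))
```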
@@ -433,15 +345,6 @@ def gen_mace_engine_factory_source(model_tags,
     six.print_("Generate mace engine creator source done!\n")


-def pull_file_from_device(serial_num, file_path, file_name, output_dir):
-    if not os.path.exists(output_dir):
-        sh.mkdir("-p", output_dir)
-    output_path = "%s/%s" % (output_dir, file_path)
-    if os.path.exists(output_path):
-        sh.rm('-rf', output_path)
-    adb_pull(file_path + '/' + file_name, output_dir, serial_num)


 def merge_opencl_binaries(binaries_dirs,
                           cl_compiled_program_file_name,
                           output_file_path):
...
@@ -690,19 +593,17 @@ def push_depended_so_libs(libmace_dynamic_library_path,
                           abi,
                           phone_data_dir,
                           serialno):
     dep_so_libs = sh.bash(os.environ["ANDROID_NDK_HOME"] + "/ndk-depends",
                           libmace_dynamic_library_path)
     src_file = ""
     for dep in split_stdout(dep_so_libs):
         if dep == "libgnustl_shared.so":
-            adb_push("%s/sources/cxx-stl/gnu-libstdc++/4.9/libs/%s/libgnustl_shared.so"  # noqa
-                     % (os.environ["ANDROID_NDK_HOME"], abi),
-                     phone_data_dir, serialno)
+            src_file = "%s/sources/cxx-stl/gnu-libstdc++/4.9/libs/" \
+                       "%s/libgnustl_shared.so" \
+                       % (os.environ["ANDROID_NDK_HOME"], abi)
         elif dep == "libc++_shared.so":
-            adb_push("%s/sources/cxx-stl/llvm-libc++/libs/%s/libc++_shared.so"  # noqa
-                     % (os.environ["ANDROID_NDK_HOME"], abi),
-                     phone_data_dir, serialno)
+            src_file = "%s/sources/cxx-stl/llvm-libc++/libs/" \
+                       "%s/libc++_shared.so" % (os.environ["ANDROID_NDK_HOME"],
+                                                abi)

+    print("push %s to %s" % (src_file, phone_data_dir))
+    adb_push(src_file, phone_data_dir, serialno)


 def validate_model(abi,
...
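In the `push_depended_so_libs` hunk above, each shared-library dependency reported by `ndk-depends` is mapped to its location inside the NDK before being pushed. The NDK sub-paths below come straight from the diff; `stl_runtime_path` is a hypothetical helper that isolates just that lookup:

```python
# Sketch of the STL-runtime lookup in push_depended_so_libs.
# NDK sub-paths mirror the diff; stl_runtime_path is hypothetical.
_STL_DIRS = {
    'libgnustl_shared.so': 'sources/cxx-stl/gnu-libstdc++/4.9/libs',
    'libc++_shared.so': 'sources/cxx-stl/llvm-libc++/libs',
}


def stl_runtime_path(dep, ndk_home, abi):
    """Return the NDK path of a shared STL runtime, or None if unknown."""
    subdir = _STL_DIRS.get(dep)
    if subdir is None:
        return None
    return '%s/%s/%s/%s' % (ndk_home, subdir, abi, dep)


print(stl_runtime_path('libc++_shared.so', '/opt/ndk', 'arm64-v8a'))
```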
@@ -861,149 +762,6 @@ def packaging_lib(libmace_output_dir, project_name):
 ################################
 # benchmark
 ################################
-def benchmark_model(abi,
-                    serialno,
-                    benchmark_binary_dir,
-                    benchmark_binary_name,
-                    vlog_level,
-                    embed_model_data,
-                    model_output_dir,
-                    mace_model_dir,
-                    input_nodes,
-                    output_nodes,
-                    input_shapes,
-                    output_shapes,
-                    model_tag,
-                    device_type,
-                    phone_data_dir,
-                    model_graph_format,
-                    opencl_binary_file,
-                    opencl_parameter_file,
-                    libmace_dynamic_library_path,
-                    omp_num_threads=-1,
-                    cpu_affinity_policy=1,
-                    gpu_perf_hint=3,
-                    gpu_priority_hint=3,
-                    input_file_name="model_input",
-                    link_dynamic=False):
-    six.print_("* Benchmark for %s" % model_tag)
-
-    mace_model_path = ""
-    if model_graph_format == ModelFormat.file:
-        mace_model_path = "%s/%s.pb" % (mace_model_dir, model_tag)
-    if abi == "host":
-        libmace_dynamic_lib_dir_path = \
-            os.path.dirname(libmace_dynamic_library_path)
-        p = subprocess.Popen(
-            [
-                "env",
-                "LD_LIBRARY_PATH=%s" % libmace_dynamic_lib_dir_path,
-                "MACE_CPP_MIN_VLOG_LEVEL=%s" % vlog_level,
-                "%s/%s" % (benchmark_binary_dir, benchmark_binary_name),
-                "--model_name=%s" % model_tag,
-                "--input_node=%s" % ",".join(input_nodes),
-                "--output_node=%s" % ",".join(output_nodes),
-                "--input_shape=%s" % ":".join(input_shapes),
-                "--output_shape=%s" % ":".join(output_shapes),
-                "--input_file=%s/%s" % (model_output_dir, input_file_name),
-                "--model_data_file=%s/%s.data" % (mace_model_dir, model_tag),
-                "--device=%s" % device_type,
-                "--omp_num_threads=%s" % omp_num_threads,
-                "--cpu_affinity_policy=%s" % cpu_affinity_policy,
-                "--gpu_perf_hint=%s" % gpu_perf_hint,
-                "--gpu_priority_hint=%s" % gpu_priority_hint,
-                "--model_file=%s" % mace_model_path,
-            ])
-        p.wait()
-    else:
-        sh.adb("-s", serialno, "shell", "mkdir", "-p", phone_data_dir)
-        internal_storage_dir = create_internal_storage_dir(
-            serialno, phone_data_dir)
-
-        for input_name in input_nodes:
-            formatted_name = common.formatted_file_name(input_file_name,
-                                                        input_name)
-            adb_push("%s/%s" % (model_output_dir, formatted_name),
-                     phone_data_dir, serialno)
-        if not embed_model_data:
-            adb_push("%s/%s.data" % (mace_model_dir, model_tag),
-                     phone_data_dir, serialno)
-        if device_type == common.DeviceType.GPU:
-            if os.path.exists(opencl_binary_file):
-                adb_push(opencl_binary_file, phone_data_dir, serialno)
-            if os.path.exists(opencl_parameter_file):
-                adb_push(opencl_parameter_file, phone_data_dir, serialno)
-        mace_model_phone_path = ""
-        if model_graph_format == ModelFormat.file:
-            mace_model_phone_path = "%s/%s.pb" % (phone_data_dir, model_tag)
-            adb_push(mace_model_path, mace_model_phone_path, serialno)
-        if link_dynamic:
-            adb_push(libmace_dynamic_library_path, phone_data_dir, serialno)
-            push_depended_so_libs(libmace_dynamic_library_path, abi,
-                                  phone_data_dir, serialno)
-        adb_push("%s/%s" % (benchmark_binary_dir, benchmark_binary_name),
-                 phone_data_dir, serialno)
-
-        adb_cmd = [
-            "LD_LIBRARY_PATH=%s" % phone_data_dir,
-            "MACE_CPP_MIN_VLOG_LEVEL=%s" % vlog_level,
-            "MACE_RUN_PARAMETER_PATH=%s/mace_run.config" % phone_data_dir,
-            "MACE_INTERNAL_STORAGE_PATH=%s" % internal_storage_dir,
-            "MACE_OPENCL_PROFILING=1",
-            "%s/%s" % (phone_data_dir, benchmark_binary_name),
-            "--model_name=%s" % model_tag,
-            "--input_node=%s" % ",".join(input_nodes),
-            "--output_node=%s" % ",".join(output_nodes),
-            "--input_shape=%s" % ":".join(input_shapes),
-            "--output_shape=%s" % ":".join(output_shapes),
-            "--input_file=%s/%s" % (phone_data_dir, input_file_name),
-            "--model_data_file=%s/%s.data" % (phone_data_dir, model_tag),
-            "--device=%s" % device_type,
-            "--omp_num_threads=%s" % omp_num_threads,
-            "--cpu_affinity_policy=%s" % cpu_affinity_policy,
-            "--gpu_perf_hint=%s" % gpu_perf_hint,
-            "--gpu_priority_hint=%s" % gpu_priority_hint,
-            "--model_file=%s" % mace_model_phone_path,
-            "--opencl_binary_file=%s/%s" %
-            (phone_data_dir, os.path.basename(opencl_binary_file)),
-            "--opencl_parameter_file=%s/%s" %
-            (phone_data_dir, os.path.basename(opencl_parameter_file)),
-        ]
-        adb_cmd = ' '.join(adb_cmd)
-        cmd_file_name = "%s-%s-%s" % ('cmd_file', model_tag, str(time.time()))
-        adb_cmd_file = "%s/%s" % (phone_data_dir, cmd_file_name)
-        tmp_cmd_file = "%s/%s" % ('/tmp', cmd_file_name)
-        with open(tmp_cmd_file, 'w') as cmd_file:
-            cmd_file.write(adb_cmd)
-        adb_push(tmp_cmd_file, adb_cmd_file, serialno)
-        os.remove(tmp_cmd_file)
-
-        sh.adb("-s", serialno, "shell", "sh", adb_cmd_file, _fg=True)
-        sh.adb("-s", serialno, "shell", "rm", adb_cmd_file, _fg=True)
-
-    six.print_("Benchmark done!\n")


 def build_run_throughput_test(abi,
                               serialno,
                               vlog_level,
...
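The removed `benchmark_model` built one long list of `--flag=value` strings, joined it into a shell command, and shipped it to the device as a cmd file. A compact sketch of that flag assembly, using a subset of the flags from the diff (`make_benchmark_flags` is a hypothetical helper):

```python
# Sketch of the flag-list assembly in the removed benchmark_model.
# Flag names come from the diff; make_benchmark_flags is hypothetical.
def make_benchmark_flags(model_tag, input_nodes, output_nodes,
                         input_shapes, device_type):
    """Return the command-line flags for one benchmark invocation."""
    return [
        '--model_name=%s' % model_tag,
        '--input_node=%s' % ','.join(input_nodes),      # nodes comma-joined
        '--output_node=%s' % ','.join(output_nodes),
        '--input_shape=%s' % ':'.join(input_shapes),    # shapes colon-joined
        '--device=%s' % device_type,
    ]


flags = make_benchmark_flags('mobilenet_v1', ['input'], ['output'],
                             ['1,224,224,3'], 'GPU')
print(' '.join(flags))
```

Moving this logic onto DeviceWrapper (as this commit does) lets the same flag list be run over adb or ssh without duplication.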