Commit 12c4dace authored by liuqi

Update the document about usage of ARM Linux

Parent 51b14100
......@@ -114,69 +114,60 @@ Advanced usage
--------------
There are three common advanced use cases:
- run your model on the embedded device (ARM Linux)
- convert model(s) to C++ code.
- tune GPU kernels for a specific SoC.
Run your model on the embedded device (ARM Linux)
--------------------------------------------------

The way to run your model on ARM Linux is nearly the same as on Android, except that you need to specify a device YAML config file.

.. code:: bash

    python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --device_yml=/path/to/devices.yml

MACE reads this YAML config via the ``--device_yml`` argument; the default value is ``devices.yml``. When the YAML config file is not found, MACE assumes that no ARM Linux device is available, prints a message, and continues on other devices such as a plugged-in Android phone.

There are two steps to do before running:

1. Configure login without a password.

   MACE uses SSH to connect to the embedded device, so you should copy your public key to the device with the command below.

   .. code:: bash

       cat ~/.ssh/id_rsa.pub | ssh -q {user}@{ip} "cat >> ~/.ssh/authorized_keys"

2. Write your own device YAML configuration file.

   * **Example**

     Here is a device YAML config demo.

     .. literalinclude:: devices/demo_device_nanopi.yml
        :language: yaml

   * **Configuration**

     The detailed explanation is listed in the table below.

     .. list-table::
        :header-rows: 1

        * - Options
          - Usage
        * - target_abis
          - The ABIs the device supports; get them via the ``dpkg --print-architecture`` and
            ``dpkg --print-foreign-architectures`` commands. If more than one ABI is supported,
            separate them by commas.
        * - target_socs
          - The device SoC; get it from the device manual. We haven't found a way to get it in the shell.
        * - models
          - The device model's full name; get it via the ``lshw`` command (a third-party package,
            installable via your package manager) and read its product value.
        * - address
          - Since we use SSH to connect to the device, the IP address is required.
        * - username
          - Login username, required.

.. note::

    Some command tools:

    .. code:: bash

        # Specify the device yaml config file via the --device_yml argument, or put the file under the working directory.
        python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --device_yml=/path/to/devices.yml
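An illustrative ``devices.yml`` entry, modeled on the Raspberry Pi example shipped in this change (the values are for illustration only; substitute your own device's data):

.. code:: yaml

    devices:
      raspberry:
        # ABIs from `dpkg --print-architecture` on the device
        target_abis: [armv7l]
        # SoC name from the device manual
        target_socs: BCM2837
        # full model name, as reported by lshw's product field
        models: Raspberry Pi 3 Model B Plus Rev 1.3
        # IP address used for the SSH connection
        address: 10.0.0.1
        # SSH login username (passwordless login must be configured)
        username: user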
Convert model(s) to C++ code
--------------------------------
......
......@@ -246,13 +246,14 @@ to run and validate your model.
# Test model run time
python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --round=100
# If you want to run the model on a specified ARM Linux device, put the device config file in the working directory or run with the flag `--device_yml`
python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --device_yml=/path/to/devices.yml --example
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --validate
# If you want to run the model on a specified ARM Linux device, put the device config file in the working directory or run with the flag `--device_yml`
python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --device_yml=/path/to/devices.yml
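The ``--validate`` step above scores similarity between the device output and the original framework's output with cosine distance. Stripped to its essence, the metric that is thresholded (e.g. 0.999 for CPU) is just the following; this is a minimal illustration, not MACE's actual validation code:

```python
def cosine_similarity(a, b):
    # Cosine similarity of two flat float vectors: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)
```

Identical outputs give a similarity of 1.0; validation passes when the similarity exceeds the per-device-type threshold.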
* **benchmark**
benchmark and profile the model.
......
......@@ -12,12 +12,9 @@ devices:
address: 10.0.0.0
# login username
username: user
# login password, required when you cannot log in to the device without a password
password: 1234567
raspberry:
target_abis: [armv7l]
target_socs: BCM2837
models: Raspberry Pi 3 Model B Plus Rev 1.3
address: 10.0.0.1
username: user
password: 123456
......@@ -42,7 +42,7 @@ struct CPUFreq {
};
namespace {
#if defined(__ANDROID__)
int GetCPUCount() {
int cpu_count = 0;
std::string cpu_sys_conf = "/proc/cpuinfo";
......@@ -69,10 +69,8 @@ int GetCPUCount() {
VLOG(2) << "CPU cores: " << cpu_count;
return cpu_count;
}
#endif
int GetCPUMaxFreq(std::vector<float> *max_freqs) {
#if defined(__ANDROID__)
int cpu_count = GetCPUCount();
for (int cpu_id = 0; cpu_id < cpu_count; ++cpu_id) {
std::string cpuinfo_max_freq_sys_conf = MakeString(
......@@ -94,34 +92,6 @@ int GetCPUMaxFreq(std::vector<float> *max_freqs) {
}
f.close();
}
#else
std::string cpu_sys_conf = "/proc/cpuinfo";
std::ifstream f(cpu_sys_conf);
if (!f.is_open()) {
LOG(ERROR) << "failed to open " << cpu_sys_conf;
return -1;
}
std::string line;
const std::string freq_key = "cpu MHz";
while (std::getline(f, line)) {
if (line.size() >= freq_key.size()
&& line.compare(0, freq_key.size(), freq_key) == 0) {
size_t pos = line.find(":");
if (pos != std::string::npos) {
std::string freq_str = line.substr(pos + 1);
float freq = atof(freq_str.c_str());
max_freqs->push_back(freq);
}
}
}
if (f.bad()) {
LOG(ERROR) << "failed to read " << cpu_sys_conf;
}
if (!f.eof()) {
LOG(ERROR) << "failed to read end of " << cpu_sys_conf;
}
f.close();
#endif
for (float freq : *max_freqs) {
VLOG(2) << "CPU freq: " << freq;
......
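The non-Android fallback removed above scanned ``/proc/cpuinfo`` for ``cpu MHz`` entries, one per core. For reference, the same parsing can be sketched in Python (the function name is illustrative):

```python
def parse_cpu_mhz(cpuinfo_text):
    # Collect one frequency (in MHz) per "cpu MHz" line, mirroring the
    # removed C++ fallback that read /proc/cpuinfo line by line.
    freqs = []
    for line in cpuinfo_text.splitlines():
        if line.startswith("cpu MHz"):
            _, _, value = line.partition(":")
            freqs.append(float(value))
    return freqs
```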
# Copyright 2018 Xiaomi, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import operator
import six
from six.moves import reduce
from mace.proto import mace_pb2
from mace.python.tools.converter_tool import base_converter as cvt
from mace.python.tools.converter_tool.base_converter import DeviceType
from mace.python.tools.converter_tool.base_converter import ConverterUtil
from mace.python.tools.converter_tool.base_converter import MaceKeyword
from mace.python.tools.convert_util import calculate_image_shape
from mace.python.tools.convert_util import OpenCLBufferType
def MemoryTypeToStr(mem_type):
if mem_type == mace_pb2.CPU_BUFFER:
return 'CPU_BUFFER'
elif mem_type == mace_pb2.GPU_BUFFER:
return 'GPU_BUFFER'
elif mem_type == mace_pb2.GPU_IMAGE:
return 'GPU_IMAGE'
else:
return 'UNKNOWN'
class MemoryBlock(object):
def __init__(self, mem_type, block):
self._mem_type = mem_type
self._block = block
@property
def mem_type(self):
return self._mem_type
@property
def block(self):
return self._block
class MemoryOptimizer(object):
def __init__(self, net_def):
self.net_def = net_def
self.idle_mem = set()
self.op_mem = {} # op_name->mem_id
self.mem_block = {} # mem_id->[size] or mem_id->[x, y]
self.total_mem_count = 0
self.input_ref_counter = {}
self.mem_ref_counter = {}
ocl_mem_type_arg = ConverterUtil.get_arg(
net_def, MaceKeyword.mace_opencl_mem_type)
self.cl_mem_type = ocl_mem_type_arg.i if ocl_mem_type_arg is not None \
else None
consumers = {}
for op in net_def.op:
if not self.op_need_optimize_memory(op):
continue
for ipt in op.input:
if ipt not in consumers:
consumers[ipt] = []
consumers[ipt].append(op)
# only ref op's output tensor
for op in net_def.op:
if not self.op_need_optimize_memory(op):
continue
for output in op.output:
tensor_name = output
if tensor_name in consumers:
self.input_ref_counter[tensor_name] = \
len(consumers[tensor_name])
else:
self.input_ref_counter[tensor_name] = 0
def op_need_optimize_memory(self, op):
return True
def get_op_mem_block(self, op_type, output_shape, output_type):
data_type_size = 4
if output_type == mace_pb2.DT_UINT8:
data_type_size = 1
return MemoryBlock(mace_pb2.CPU_BUFFER,
[reduce(operator.mul, output_shape, 1) *
data_type_size])
def mem_size(self, memory_block):
return memory_block.block[0]
def sub_mem_block(self, mem_block1, mem_block2):
return self.mem_size(mem_block1) - self.mem_size(mem_block2)
def resize_mem_block(self, old_mem_block, op_mem_block):
return MemoryBlock(
old_mem_block.mem_type,
[max(old_mem_block.block[0], op_mem_block.block[0])])
def add_net_mem_blocks(self):
for mem in self.mem_block:
arena = self.net_def.mem_arena
block = arena.mem_block.add()
block.mem_id = mem
block.device_type = DeviceType.CPU.value
block.mem_type = self.mem_block[mem].mem_type
block.x = self.mem_block[mem].block[0]
block.y = 1
def get_total_origin_mem_size(self):
origin_mem_size = 0
for op in self.net_def.op:
if not self.op_need_optimize_memory(op):
continue
origin_mem_size += reduce(operator.mul,
op.output_shape[0].dims,
1)
return origin_mem_size
def get_total_optimized_mem_size(self):
optimized_mem_size = 0
for mem in self.mem_block:
print(mem, MemoryTypeToStr(self.mem_block[mem].mem_type),
self.mem_block[mem].block)
optimized_mem_size += self.mem_size(self.mem_block[mem])
return optimized_mem_size
@staticmethod
def is_memory_reuse_op(op):
return op.type == 'Reshape' or op.type == 'Identity' \
or op.type == 'Squeeze' or op.type == 'ExpandDims'
def optimize(self):
for op in self.net_def.op:
if not self.op_need_optimize_memory(op):
continue
if not op.output_shape:
six.print_("WARNING: There is no output shape information to "
"do memory optimization. %s (%s)" %
(op.name, op.type), file=sys.stderr)
return
if len(op.output_shape) != len(op.output):
six.print_('WARNING: the number of output shapes is '
'not equal to the number of outputs.',
file=sys.stderr)
return
for i in range(len(op.output)):
if self.is_memory_reuse_op(op):
# make these ops reuse memory of input tensor
mem_id = self.op_mem.get(op.input[0], -1)
else:
output_type = mace_pb2.DT_FLOAT
for arg in op.arg:
if arg.name == 'T':
output_type = arg.i
if len(op.output_type) > i:
output_type = op.output_type[i]
op_mem_block = self.get_op_mem_block(
op.type,
op.output_shape[i].dims,
output_type)
mem_id = -1
if len(self.idle_mem) > 0:
best_mem_add_size = six.MAXSIZE
best_mem_waste_size = six.MAXSIZE
for mid in self.idle_mem:
old_mem_block = self.mem_block[mid]
if old_mem_block.mem_type != op_mem_block.mem_type:
continue
new_mem_block = self.resize_mem_block(
old_mem_block, op_mem_block)
add_mem_size = self.sub_mem_block(new_mem_block,
old_mem_block)
waste_mem_size = self.sub_mem_block(new_mem_block,
op_mem_block)
# minimize add_mem_size; if best_mem_add_size is 0,
# then minimize waste_mem_size
if (best_mem_add_size > 0 and
add_mem_size < best_mem_add_size) \
or (best_mem_add_size == 0 and
waste_mem_size < best_mem_waste_size):
best_mem_id = mid
best_mem_add_size = add_mem_size
best_mem_waste_size = waste_mem_size
best_mem_block = new_mem_block
# if add mem size < op mem size, then reuse it
if best_mem_add_size <= self.mem_size(op_mem_block):
self.mem_block[best_mem_id] = best_mem_block
mem_id = best_mem_id
self.idle_mem.remove(mem_id)
if mem_id == -1:
mem_id = self.total_mem_count
self.total_mem_count += 1
self.mem_block[mem_id] = op_mem_block
if mem_id != -1:
op.mem_id.extend([mem_id])
self.op_mem[op.output[i]] = mem_id
if mem_id not in self.mem_ref_counter:
self.mem_ref_counter[mem_id] = 1
else:
self.mem_ref_counter[mem_id] += 1
# de-ref input tensor mem
for idx in six.moves.range(len(op.input)):
ipt = op.input[idx]
if ipt in self.input_ref_counter:
self.input_ref_counter[ipt] -= 1
if self.input_ref_counter[ipt] == 0 \
and ipt in self.op_mem:
mem_id = self.op_mem[ipt]
self.mem_ref_counter[mem_id] -= 1
if self.mem_ref_counter[mem_id] == 0:
self.idle_mem.add(self.op_mem[ipt])
elif self.input_ref_counter[ipt] < 0:
raise Exception('ref count is less than 0')
self.add_net_mem_blocks()
print("total op: %d" % len(self.net_def.op))
print("origin mem: %d, optimized mem: %d" % (
self.get_total_origin_mem_size(),
self.get_total_optimized_mem_size()))
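`MemoryOptimizer.optimize` above frees a block when its consumer ref-count reaches zero and prefers reusing the idle block that needs the least enlargement. Stripped of the MACE specifics, the greedy reuse can be sketched as follows; this is a simplified single-output toy, not the production tie-breaking logic:

```python
def greedy_reuse(ops):
    # ops: list of (name, inputs, out_size) in topological order.
    # Returns {op_name: mem_id} and {mem_id: size}, reusing idle blocks
    # whose required growth is smallest, as the optimizer above does.
    consumers = {}
    for name, inputs, _ in ops:
        for ipt in inputs:
            consumers[ipt] = consumers.get(ipt, 0) + 1
    op_mem, blocks, idle = {}, {}, set()
    for name, inputs, out_size in ops:
        # Pick the idle block needing the least enlargement, if any.
        best = min(idle, key=lambda m: max(out_size - blocks[m], 0),
                   default=None)
        if best is not None:
            blocks[best] = max(blocks[best], out_size)
            idle.discard(best)
            mem_id = best
        else:
            mem_id = len(blocks)
            blocks[mem_id] = out_size
        op_mem[name] = mem_id
        # De-ref inputs after allocation, so an op never reuses its own input.
        for ipt in inputs:
            consumers[ipt] -= 1
            if consumers[ipt] == 0 and ipt in op_mem:
                idle.add(op_mem[ipt])
    return op_mem, blocks
```

On the chain a -> b -> c (all size 10), c reuses a's block, so two blocks of total size 20 suffice instead of three.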
class GPUMemoryOptimizer(MemoryOptimizer):
def op_need_optimize_memory(self, op):
if op.type == MaceKeyword.mace_buffer_transform:
for arg in op.arg:
if arg.name == 'mode' and arg.i == 0:
return False
return op.type != MaceKeyword.mace_buffer_inverse_transform
def get_op_image_mem_block(self, op_type, output_shape):
if op_type == 'WinogradTransform' or op_type == 'MatMul':
buffer_shape = list(output_shape) + [1]
mem_block = MemoryBlock(
mace_pb2.GPU_IMAGE,
calculate_image_shape(OpenCLBufferType.IN_OUT_HEIGHT,
buffer_shape))
elif op_type in ['Shape',
'InferConv2dShape',
'StridedSlice',
'Stack',
'ScalarMath']:
if len(output_shape) == 1:
mem_block = MemoryBlock(mace_pb2.CPU_BUFFER,
[output_shape[0], 1])
elif len(output_shape) == 0:
mem_block = MemoryBlock(mace_pb2.CPU_BUFFER,
[1, 1])
else:
raise Exception('%s output shape dim size is not 0 or 1.' %
op_type)
else:
if len(output_shape) == 2: # only support fc/softmax
buffer_shape = [output_shape[0], output_shape[1]]
elif len(output_shape) == 4:
buffer_shape = output_shape
else:
raise Exception('%s output shape dim size is not 2 or 4.' %
op_type)
mem_block = MemoryBlock(
mace_pb2.GPU_IMAGE,
calculate_image_shape(OpenCLBufferType.IN_OUT_CHANNEL,
buffer_shape))
return mem_block
def get_op_buffer_mem_block(self, output_shape):
return MemoryBlock(mace_pb2.GPU_BUFFER,
[reduce(operator.mul, output_shape, 1), 1])
def get_op_mem_block(self, op_type, output_shape, output_type):
if self.cl_mem_type == mace_pb2.GPU_IMAGE:
return self.get_op_image_mem_block(op_type, output_shape)
else:
return self.get_op_buffer_mem_block(output_shape)
def mem_size(self, memory_block):
if memory_block.mem_type == mace_pb2.GPU_IMAGE:
return memory_block.block[0] * memory_block.block[1] * 4
else:
return memory_block.block[0]
def resize_mem_block(self, old_mem_block, op_mem_block):
resize_mem_block = MemoryBlock(
old_mem_block.mem_type,
[
max(old_mem_block.block[0], op_mem_block.block[0]),
max(old_mem_block.block[1], op_mem_block.block[1])
])
return resize_mem_block
def add_net_mem_blocks(self):
max_image_size_x = 0
max_image_size_y = 0
for mem in self.mem_block:
arena = self.net_def.mem_arena
block = arena.mem_block.add()
block.mem_id = mem
block.device_type = DeviceType.GPU.value
block.mem_type = self.mem_block[mem].mem_type
block.x = self.mem_block[mem].block[0]
block.y = self.mem_block[mem].block[1]
if self.mem_block[mem].mem_type == mace_pb2.GPU_IMAGE:
max_image_size_x = max(max_image_size_x, block.x)
max_image_size_y = max(max_image_size_y, block.y)
if self.cl_mem_type == mace_pb2.GPU_IMAGE:
# Update OpenCL max image size
net_ocl_max_img_size_arg = None
for arg in self.net_def.arg:
if arg.name == cvt.MaceKeyword.mace_opencl_max_image_size:
net_ocl_max_img_size_arg = arg
max_image_size_x = max(arg.ints[0], max_image_size_x)
max_image_size_y = max(arg.ints[1], max_image_size_y)
break
if net_ocl_max_img_size_arg is None:
net_ocl_max_img_size_arg = self.net_def.arg.add()
net_ocl_max_img_size_arg.name = \
cvt.MaceKeyword.mace_opencl_max_image_size
net_ocl_max_img_size_arg.ints[:] = [max_image_size_x,
max_image_size_y]
def optimize_gpu_memory(net_def):
mem_optimizer = GPUMemoryOptimizer(net_def)
mem_optimizer.optimize()
def optimize_cpu_memory(net_def):
mem_optimizer = MemoryOptimizer(net_def)
mem_optimizer.optimize()
# Partially borrowed from tensorflow tools/bazel.rc
# By default, we don't distinguish target and host platforms.
# When doing cross compilation, use --config=cross_compile to distinguish them.
build --distinct_host_configuration=false
build:cross_compile --distinct_host_configuration=true
build --verbose_failures
build --copt=-std=c++11
......@@ -17,12 +15,12 @@ build --copt=-DMACE_USE_NNLIB_CAF
build:symbol_hidden --copt=-fvisibility=hidden
# Usage example: bazel build --config android
build:android --config=cross_compile
build:android --distinct_host_configuration=true
build:android --crosstool_top=//external:android/crosstool
build:android --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
# Usage example: bazel build --config arm_linux_gnueabihf
build:arm_linux_gnueabihf --config=cross_compile
build:arm_linux_gnueabihf --distinct_host_configuration=true
build:arm_linux_gnueabihf --crosstool_top=//tools/arm_compiler:toolchain
build:arm_linux_gnueabihf --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
build:arm_linux_gnueabihf --cpu=armeabi-v7a
......@@ -34,7 +32,7 @@ build:arm_linux_gnueabihf --copt -Wno-sequence-point
build:arm_linux_gnueabihf --copt -Wno-implicit-fallthrough
# Usage example: bazel build --config aarch64_linux_gnu
build:aarch64_linux_gnu --config=cross_compile
build:aarch64_linux_gnu --distinct_host_configuration=true
build:aarch64_linux_gnu --crosstool_top=//tools/aarch64_compiler:toolchain
build:aarch64_linux_gnu --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
build:aarch64_linux_gnu --cpu=aarch64
......
......@@ -52,13 +52,13 @@ def ops_benchmark_stdout_processor(stdout, dev, abi):
metrics["%s.input_mb_per_sec" % parts[0]] = parts[3]
metrics["%s.gmacc_per_sec" % parts[0]] = parts[4]
platform = dev[YAMLKeyword.target_socs]
model = dev[YAMLKeyword.models]
tags = {
"ro.board.platform": platform,
"ro.product.model": model,
"abi": abi
}
# platform = dev[YAMLKeyword.target_socs]
# model = dev[YAMLKeyword.device_name]
# tags = {
# "ro.board.platform": platform,
# "ro.product.model": model,
# "abi": abi
# }
# sh_commands.falcon_push_metrics(server,
# metrics, tags=tags, endpoint="mace_ops_benchmark")
......@@ -99,7 +99,7 @@ def parse_args():
parser.add_argument(
"--stdout_processor",
type=str,
default="stdout_processor",
default="unittest_stdout_processor",
help="Stdout processing function, default: unittest_stdout_processor")
parser.add_argument(
"--enable_neon",
......
......@@ -45,11 +45,11 @@ bazel build --config android --config optimization mace/libmace:libmace_dynamic
cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/arm64-v8a/cpu_gpu/
echo "build shared lib for arm_linux_gnueabihf + cpu_gpu"
bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true
bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true --define quantize=true
cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/arm_linux_gnueabihf/cpu_gpu/
echo "build shared lib for aarch64_linux_gnu + cpu_gpu"
bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true
bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true --define quantize=true
cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/aarch64_linux_gnu/cpu_gpu/
if [[ "$OSTYPE" != "darwin"* ]];then
......@@ -73,11 +73,11 @@ bazel build --config android --config optimization mace/libmace:libmace_static -
cp bazel-genfiles/mace/libmace/libmace.a $LIB_DIR/arm64-v8a/cpu_gpu/
echo "build static lib for arm_linux_gnueabihf + cpu_gpu"
bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true
bazel build --config arm_linux_gnueabihf --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true --define quantize=true
cp bazel-genfiles/mace/libmace/libmace.a $LIB_DIR/arm_linux_gnueabihf/cpu_gpu/
echo "build static lib for aarch64_linux_gnu + cpu_gpu"
bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true
bazel build --config aarch64_linux_gnu --config optimization mace/libmace:libmace_static --config symbol_hidden --define neon=true --define openmp=true --define opencl=true --define quantize=true
cp bazel-genfiles/mace/libmace/libmace.a $LIB_DIR/aarch64_linux_gnu/cpu_gpu/
if [[ "$OSTYPE" != "darwin"* ]];then
......
......@@ -240,7 +240,7 @@ def get_model_files(model_file_path,
def get_opencl_binary_output_path(library_name, target_abi, device):
target_soc = device.target_socs
device_model = device.models
device_name = device.device_name
return '%s/%s/%s/%s/%s_%s.%s.%s.bin' % \
(BUILD_OUTPUT_DIR,
library_name,
......@@ -248,13 +248,13 @@ def get_opencl_binary_output_path(library_name, target_abi, device):
target_abi,
library_name,
OUTPUT_OPENCL_BINARY_FILE_NAME,
device_model,
device_name,
target_soc)
def get_opencl_parameter_output_path(library_name, target_abi, device):
target_soc = device.target_socs
device_model = device.models
device_name = device.device_name
return '%s/%s/%s/%s/%s_%s.%s.%s.bin' % \
(BUILD_OUTPUT_DIR,
library_name,
......@@ -262,7 +262,7 @@ def get_opencl_parameter_output_path(library_name, target_abi, device):
target_abi,
library_name,
OUTPUT_OPENCL_PARAMETER_FILE_NAME,
device_model,
device_name,
target_soc)
......@@ -271,7 +271,7 @@ def get_build_model_dirs(library_name,
target_abi,
device,
model_file_path):
models = device.models
device_name = device.device_name
target_socs = device.target_socs
model_path_digest = md5sum(model_file_path)
model_output_base_dir = '{}/{}/{}/{}/{}'.format(
......@@ -287,7 +287,7 @@ def get_build_model_dirs(library_name,
else:
model_output_dir = '{}/{}_{}/{}'.format(
model_output_base_dir,
models,
device_name,
target_socs,
target_abi
)
......
......@@ -111,6 +111,13 @@ class DefaultValues(object):
gpu_priority_hint = 3
class ValidationThreshold(object):
cpu_threshold = 0.999
gpu_threshold = 0.995
hexagon_threshold = 0.930
cpu_quantize_threshold = 0.980
CPP_KEYWORDS = [
'alignas', 'alignof', 'and', 'and_eq', 'asm', 'atomic_cancel',
'atomic_commit', 'atomic_noexcept', 'auto', 'bitand', 'bitor',
......@@ -435,10 +442,11 @@ def format_model_config(flags):
'similarity threshold must be a dict.')
threshold_dict = {
DeviceType.CPU: 0.999,
DeviceType.GPU: 0.995,
DeviceType.HEXAGON: 0.930,
DeviceType.CPU + "_QUANTIZE": 0.980,
DeviceType.CPU: ValidationThreshold.cpu_threshold,
DeviceType.GPU: ValidationThreshold.gpu_threshold,
DeviceType.HEXAGON: ValidationThreshold.hexagon_threshold,
DeviceType.CPU + "_QUANTIZE":
ValidationThreshold.cpu_quantize_threshold,
}
for k, v in six.iteritems(validation_threshold):
if k.upper() == 'DSP':
......@@ -838,39 +846,6 @@ def build_mace_run(configs, target_abi, toolchain, enable_openmp,
mace_lib_type == MACELibType.dynamic)
def build_quantize_stat(configs):
library_name = configs[YAMLKeyword.library_name]
build_tmp_binary_dir = get_build_binary_dir(library_name, ABIType.host)
if os.path.exists(build_tmp_binary_dir):
sh.rm("-rf", build_tmp_binary_dir)
os.makedirs(build_tmp_binary_dir)
quantize_stat_target = QUANTIZE_STAT_TARGET
build_arg = ""
six.print_(configs[YAMLKeyword.model_graph_format])
if configs[YAMLKeyword.model_graph_format] == ModelFormat.code:
mace_check(os.path.exists(ENGINE_CODEGEN_DIR),
ModuleName.RUN,
"You should convert model first.")
build_arg = "--per_file_copt=mace/tools/quantization/quantize_stat.cc@-DMODEL_GRAPH_FORMAT_CODE" # noqa
sh_commands.bazel_build(
quantize_stat_target,
abi=ABIType.host,
toolchain=flags.toolchain,
enable_openmp=True,
symbol_hidden=True,
extra_args=build_arg
)
quantize_stat_filepath = build_tmp_binary_dir + "/quantize_stat"
if os.path.exists(quantize_stat_filepath):
sh.rm("-rf", quantize_stat_filepath)
sh.cp("-f", "bazel-bin/mace/tools/quantization/quantize_stat",
build_tmp_binary_dir)
def build_example(configs, target_abi, toolchain,
enable_openmp, mace_lib_type):
library_name = configs[YAMLKeyword.library_name]
......@@ -951,10 +926,8 @@ def run_mace(flags):
clear_build_dirs(configs[YAMLKeyword.library_name])
target_socs = configs[YAMLKeyword.target_socs]
if not target_socs or ALL_SOC_TAG in target_socs:
device_list = DeviceManager.list_devices(flags.device_yml)
else:
device_list = DeviceManager.list_devices(flags.device_yml)
device_list = DeviceManager.list_devices(flags.device_yml)
if target_socs and ALL_SOC_TAG not in target_socs:
device_list = [dev for dev in device_list
if dev[YAMLKeyword.target_socs].lower() in target_socs]
for target_abi in configs[YAMLKeyword.target_abis]:
......@@ -1042,13 +1015,10 @@ def benchmark_model(flags):
clear_build_dirs(configs[YAMLKeyword.library_name])
target_socs = configs[YAMLKeyword.target_socs]
if not target_socs or ALL_SOC_TAG in target_socs:
device_list = DeviceManager.list_devices(flags.device_yml)
# target_socs = sh_commands.adb_get_all_socs()
else:
device_list = DeviceManager.list_devices(flags.device_yml)
device_list = DeviceManager.list_devices(flags.device_yml)
if target_socs and ALL_SOC_TAG not in target_socs:
device_list = [dev for dev in device_list
if dev[YAMLKeyword.target_socs] in target_socs]
if dev[YAMLKeyword.target_socs].lower() in target_socs]
for target_abi in configs[YAMLKeyword.target_abis]:
# build benchmark_model binary
......
......@@ -37,8 +37,8 @@ class DeviceWrapper:
:type device_dict: Device
:param device_dict: a key-value dict that holds the device information,
whose attributes include:
target_abis, target_socs, models, system, address
password, username
device_name, target_abis, target_socs, system,
address, username
"""
diff = set(device_dict.keys()) - set(YAMLKeyword.__dict__.keys())
if len(diff) > 0:
......@@ -111,6 +111,7 @@ class DeviceWrapper:
def push(self, src_path, dst_path):
mace_check(os.path.exists(src_path), "Device",
'{} not found'.format(src_path))
six.print_("Push %s to %s" % (src_path, dst_path))
if self.system == SystemType.android:
sh_commands.adb_push(src_path, dst_path, self.address)
elif self.system == SystemType.arm_linux:
......@@ -129,6 +130,7 @@ class DeviceWrapper:
dst_file = "%s/%s" % (dst_path, file_name)
if os.path.exists(dst_file):
sh.rm('-f', dst_file)
six.print_("Pull %s to %s" % (src_path, dst_path))
if self.system == SystemType.android:
sh_commands.adb_pull(
src_file, dst_file, self.address)
......@@ -138,7 +140,6 @@ class DeviceWrapper:
self.address,
src_file),
dst_file)
print("pull file ", src_path, dst_path)
except sh.ErrorReturnCode_1 as e:
six.print_("Pull Failed !", file=sys.stderr)
raise e
......@@ -256,10 +257,13 @@ class DeviceWrapper:
if model_graph_format == ModelFormat.file:
mace_model_phone_path = "%s/%s.pb" % (self.data_dir,
model_tag)
self.push(mace_model_path,
mace_model_phone_path)
self.push(mace_model_path, mace_model_phone_path)
if link_dynamic:
self.push(libmace_dynamic_library_path, self.data_dir)
if self.system == SystemType.android:
sh_commands.push_depended_so_libs(
libmace_dynamic_library_path, abi, self.data_dir,
self.address)
self.push("%s/%s" % (target_dir, target_name), self.data_dir)
stdout_buff = []
......@@ -430,14 +434,11 @@ class DeviceWrapper:
configs[YAMLKeyword.model_graph_format],
configs[YAMLKeyword.model_data_format],
target_abi)
if target_abi == ABIType.host:
device_model = ABIType.host
else:
device_model = self.models
if target_abi != ABIType.host:
self.clear_data_dir()
MaceLogger.header(
StringFormatter.block(
'Run model {} on {}'.format(model_name, device_model)))
'Run model {} on {}'.format(model_name, self.device_name)))
model_config = configs[YAMLKeyword.models][model_name]
model_runtime = model_config[YAMLKeyword.runtime]
......@@ -631,7 +632,7 @@ class DeviceWrapper:
data_str = '{model_name},{device_name},{soc},{abi},{device_type},' \
'{init},{warmup},{run_avg},{tuned}\n'.format(
model_name=model_name,
device_name=self.models,
device_name=self.device_name,
soc=self.target_socs,
abi=target_abi,
device_type=device_type,
......@@ -671,7 +672,7 @@ class DeviceWrapper:
mace_model_path = ''
if model_graph_format == ModelFormat.file:
mace_model_path = '%s/%s.pb' % (mace_model_dir, model_tag)
if abi == 'host':
if abi == ABIType.host:
libmace_dynamic_lib_dir_path = \
os.path.dirname(libmace_dynamic_library_path)
p = subprocess.Popen(
......@@ -719,6 +720,10 @@ class DeviceWrapper:
self.push(mace_model_path, mace_model_device_path)
if link_dynamic:
self.push(libmace_dynamic_library_path, self.data_dir)
if self.system == SystemType.android:
sh_commands.push_depended_so_libs(
libmace_dynamic_library_path, abi, self.data_dir,
self.address)
self.rm('%s/%s' % (self.data_dir, benchmark_binary_name))
self.push('%s/%s' % (benchmark_binary_dir, benchmark_binary_name),
self.data_dir)
......@@ -761,19 +766,11 @@ class DeviceWrapper:
os.remove(tmp_cmd_file)
if self.system == SystemType.android:
sh.adb(
'-s',
self.address,
'shell',
'sh',
cmd_file_path,
_fg=True
)
sh.adb('-s', self.address, 'shell', 'sh', cmd_file_path,
_fg=True)
elif self.system == SystemType.arm_linux:
sh.ssh('%s@%s' % (self.username, self.address),
'sh',
cmd_file_path,
_fg=True)
'sh', cmd_file_path, _fg=True)
self.rm(cmd_file_path)
six.print_('Benchmark done! \n')
......@@ -804,13 +801,10 @@ class DeviceWrapper:
configs[YAMLKeyword.model_graph_format],
configs[YAMLKeyword.model_data_format],
target_abi)
if target_abi == ABIType.host:
device_name = ABIType.host
else:
device_name = self.models
MaceLogger.header(
StringFormatter.block(
'Benchmark model %s on %s' % (model_name, device_name)))
'Benchmark model %s on %s' % (model_name,
self.device_name)))
model_config = configs[YAMLKeyword.models][model_name]
model_runtime = model_config[YAMLKeyword.runtime]
subgraphs = model_config[YAMLKeyword.subgraphs]
......@@ -885,7 +879,7 @@ class DeviceWrapper:
print('Trying to lock device %s' % self.address)
with self.lock():
print('Run on device: %s, %s, %s' %
(self.address, self.target_socs, self.models))
(self.address, self.target_socs, self.device_name))
self.rm(self.data_dir)
self.exec_command('mkdir -p %s' % self.data_dir)
self.push(host_bin_full_path, device_bin_full_path)
......@@ -949,11 +943,11 @@ class DeviceManager:
for adb in adb_list:
prop = sh_commands.adb_getprop_by_serialno(adb[0])
android = {
YAMLKeyword.device_name: adb[1],
YAMLKeyword.device_name:
prop['ro.product.model'].replace(' ', ''),
YAMLKeyword.target_abis:
prop['ro.product.cpu.abilist'].split(','),
YAMLKeyword.target_socs: prop['ro.board.platform'],
YAMLKeyword.models: prop['ro.product.model'].replace(' ', '_'),
YAMLKeyword.system: SystemType.android,
YAMLKeyword.address: adb[0],
YAMLKeyword.username: '',
......@@ -968,9 +962,9 @@ class DeviceManager:
devices = devices['devices']
device_list = []
for name, dev in six.iteritems(devices):
dev[YAMLKeyword.device_name] = name
dev[YAMLKeyword.device_name] = \
dev[YAMLKeyword.models].replace(' ', '')
dev[YAMLKeyword.system] = SystemType.arm_linux
dev[YAMLKeyword.models] = dev[YAMLKeyword.models].replace(' ', '_')
device_list.append(dev)
return device_list
......@@ -992,7 +986,6 @@ class DeviceManager:
YAMLKeyword.target_abis: [ABIType.host],
YAMLKeyword.target_socs: '',
YAMLKeyword.system: SystemType.host,
YAMLKeyword.models: None,
YAMLKeyword.address: None,
}
......
@@ -20,7 +0,6 @@ import os
import re
import sh
import struct
import subprocess
import sys
import time
import platform
@@ -28,10 +27,6 @@ import platform
import six
import common
from common import ModelFormat
from common import ABIType
from common import SystemType
from common import YAMLKeyword
from common import abi_to_internal
sys.path.insert(0, "mace/python/tools")
@@ -179,99 +174,16 @@ def adb_get_all_socs():
def adb_push(src_path, dst_path, serialno):
six.print_("Push %s to %s" % (src_path, dst_path))
sh.adb("-s", serialno, "push", src_path, dst_path)
def adb_pull(src_path, dst_path, serialno):
six.print_("Pull %s to %s" % (src_path, dst_path))
try:
sh.adb("-s", serialno, "pull", src_path, dst_path)
except Exception as e:
six.print_("Error msg: %s" % e, file=sys.stderr)
def adb_run(abi,
serialno,
host_bin_path,
bin_name,
args="",
opencl_profiling=True,
vlog_level=0,
device_bin_path="/data/local/tmp/mace",
out_of_range_check=True,
address_sanitizer=False,
simpleperf=False):
host_bin_full_path = "%s/%s" % (host_bin_path, bin_name)
device_bin_full_path = "%s/%s" % (device_bin_path, bin_name)
props = adb_getprop_by_serialno(serialno)
six.print_(
"====================================================================="
)
six.print_("Trying to lock device %s" % serialno)
with device_lock(serialno):
six.print_("Run on device: %s, %s, %s" %
(serialno, props["ro.board.platform"],
props["ro.product.model"]))
sh.adb("-s", serialno, "shell", "rm -rf %s" % device_bin_path)
sh.adb("-s", serialno, "shell", "mkdir -p %s" % device_bin_path)
adb_push(host_bin_full_path, device_bin_full_path, serialno)
ld_preload = ""
if address_sanitizer:
adb_push(find_asan_rt_library(abi), device_bin_path, serialno)
ld_preload = "LD_PRELOAD=%s/%s" % (device_bin_path,
asan_rt_library_names(abi)),
opencl_profiling = 1 if opencl_profiling else 0
out_of_range_check = 1 if out_of_range_check else 0
six.print_("Run %s" % device_bin_full_path)
stdout_buff = []
process_output = make_output_processor(stdout_buff)
if simpleperf:
adb_push(find_simpleperf_library(abi), device_bin_path, serialno)
simpleperf_cmd = "%s/simpleperf" % device_bin_path
sh.adb(
"-s",
serialno,
"shell",
ld_preload,
"MACE_OUT_OF_RANGE_CHECK=%d" % out_of_range_check,
"MACE_OPENCL_PROFILING=%d" % opencl_profiling,
"MACE_CPP_MIN_VLOG_LEVEL=%d" % vlog_level,
simpleperf_cmd,
"stat",
"--group",
"raw-l1-dcache,raw-l1-dcache-refill",
"--group",
"raw-l2-dcache,raw-l2-dcache-refill",
"--group",
"raw-l1-dtlb,raw-l1-dtlb-refill",
"--group",
"raw-l2-dtlb,raw-l2-dtlb-refill",
device_bin_full_path,
args,
_tty_in=True,
_out=process_output,
_err_to_out=True)
else:
sh.adb(
"-s",
serialno,
"shell",
ld_preload,
"MACE_OUT_OF_RANGE_CHECK=%d" % out_of_range_check,
"MACE_OPENCL_PROFILING=%d" % opencl_profiling,
"MACE_CPP_MIN_VLOG_LEVEL=%d" % vlog_level,
device_bin_full_path,
args,
_tty_in=True,
_out=process_output,
_err_to_out=True)
return "".join(stdout_buff)
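`adb_run` above prefixes the on-device command with a set of `MACE_*` environment variables. A self-contained sketch of that flag assembly (the helper name `build_env` is made up for illustration; the variable names mirror the ones used above):

```python
# Sketch of how adb_run assembles the environment prefix for the
# on-device command; boolean options are flattened to 0/1.
def build_env(opencl_profiling=True, out_of_range_check=True, vlog_level=0):
    return [
        "MACE_OUT_OF_RANGE_CHECK=%d" % (1 if out_of_range_check else 0),
        "MACE_OPENCL_PROFILING=%d" % (1 if opencl_profiling else 0),
        "MACE_CPP_MIN_VLOG_LEVEL=%d" % vlog_level,
    ]

# The pieces are joined into one shell command line before `adb shell`.
print(" ".join(build_env(opencl_profiling=False)))
```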
################################
# Toolchain
################################
@@ -433,15 +345,6 @@ def gen_mace_engine_factory_source(model_tags,
six.print_("Generate mace engine creator source done!\n")
def pull_file_from_device(serial_num, file_path, file_name, output_dir):
if not os.path.exists(output_dir):
sh.mkdir("-p", output_dir)
output_path = "%s/%s" % (output_dir, file_path)
if os.path.exists(output_path):
sh.rm('-rf', output_path)
adb_pull(file_path + '/' + file_name, output_dir, serial_num)
def merge_opencl_binaries(binaries_dirs,
cl_compiled_program_file_name,
output_file_path):
@@ -690,19 +593,17 @@ def push_depended_so_libs(libmace_dynamic_library_path,
abi, phone_data_dir, serialno):
dep_so_libs = sh.bash(os.environ["ANDROID_NDK_HOME"] + "/ndk-depends",
libmace_dynamic_library_path)
src_file = ""
for dep in split_stdout(dep_so_libs):
if dep == "libgnustl_shared.so":
adb_push(
"%s/sources/cxx-stl/gnu-libstdc++/4.9/libs/%s/libgnustl_shared.so" # noqa
% (os.environ["ANDROID_NDK_HOME"], abi),
phone_data_dir,
serialno)
src_file = "%s/sources/cxx-stl/gnu-libstdc++/4.9/libs/" \
"%s/libgnustl_shared.so"\
% (os.environ["ANDROID_NDK_HOME"], abi)
elif dep == "libc++_shared.so":
adb_push(
"%s/sources/cxx-stl/llvm-libc++/libs/%s/libc++_shared.so" # noqa
% (os.environ["ANDROID_NDK_HOME"], abi),
phone_data_dir,
serialno)
src_file = "%s/sources/cxx-stl/llvm-libc++/libs/" \
"%s/libc++_shared.so" % (os.environ["ANDROID_NDK_HOME"], abi)
print("push %s to %s" % (src_file, phone_data_dir))
adb_push(src_file, phone_data_dir, serialno)
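The refactor in `push_depended_so_libs` computes the source path once and pushes it after the branch, instead of calling `adb_push` in each branch. A sketch of that lookup as a pure function (the helper name and the `/opt/ndk` value are assumptions for illustration; the NDK-relative paths are taken from the code above):

```python
# Sketch of the refactored dependency lookup: map a depended .so name to
# its path under the NDK, then push once.
def depended_so_path(dep, abi, ndk_home):
    if dep == "libgnustl_shared.so":
        return ("%s/sources/cxx-stl/gnu-libstdc++/4.9/libs/%s/"
                "libgnustl_shared.so" % (ndk_home, abi))
    elif dep == "libc++_shared.so":
        return ("%s/sources/cxx-stl/llvm-libc++/libs/%s/"
                "libc++_shared.so" % (ndk_home, abi))
    return ""

print(depended_so_path("libc++_shared.so", "arm64-v8a", "/opt/ndk"))
# -> /opt/ndk/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++_shared.so
```

Centralizing the push also lets the function log a single "push %s to %s" message regardless of which runtime library was matched.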
def validate_model(abi,
@@ -861,149 +762,6 @@ def packaging_lib(libmace_output_dir, project_name):
################################
# benchmark
################################
def benchmark_model(abi,
serialno,
benchmark_binary_dir,
benchmark_binary_name,
vlog_level,
embed_model_data,
model_output_dir,
mace_model_dir,
input_nodes,
output_nodes,
input_shapes,
output_shapes,
model_tag,
device_type,
phone_data_dir,
model_graph_format,
opencl_binary_file,
opencl_parameter_file,
libmace_dynamic_library_path,
omp_num_threads=-1,
cpu_affinity_policy=1,
gpu_perf_hint=3,
gpu_priority_hint=3,
input_file_name="model_input",
link_dynamic=False):
six.print_("* Benchmark for %s" % model_tag)
mace_model_path = ""
if model_graph_format == ModelFormat.file:
mace_model_path = "%s/%s.pb" % (mace_model_dir, model_tag)
if abi == "host":
libmace_dynamic_lib_dir_path = \
os.path.dirname(libmace_dynamic_library_path)
p = subprocess.Popen(
[
"env",
"LD_LIBRARY_PATH=%s" % libmace_dynamic_lib_dir_path,
"MACE_CPP_MIN_VLOG_LEVEL=%s" % vlog_level,
"%s/%s" % (benchmark_binary_dir, benchmark_binary_name),
"--model_name=%s" % model_tag,
"--input_node=%s" % ",".join(input_nodes),
"--output_node=%s" % ",".join(output_nodes),
"--input_shape=%s" % ":".join(input_shapes),
"--output_shape=%s" % ":".join(output_shapes),
"--input_file=%s/%s" % (model_output_dir, input_file_name),
"--model_data_file=%s/%s.data" % (mace_model_dir, model_tag),
"--device=%s" % device_type,
"--omp_num_threads=%s" % omp_num_threads,
"--cpu_affinity_policy=%s" % cpu_affinity_policy,
"--gpu_perf_hint=%s" % gpu_perf_hint,
"--gpu_priority_hint=%s" % gpu_priority_hint,
"--model_file=%s" % mace_model_path,
])
p.wait()
else:
sh.adb("-s", serialno, "shell", "mkdir", "-p", phone_data_dir)
internal_storage_dir = create_internal_storage_dir(
serialno, phone_data_dir)
for input_name in input_nodes:
formatted_name = common.formatted_file_name(input_file_name,
input_name)
adb_push("%s/%s" % (model_output_dir, formatted_name),
phone_data_dir, serialno)
if not embed_model_data:
adb_push("%s/%s.data" % (mace_model_dir, model_tag),
phone_data_dir, serialno)
if device_type == common.DeviceType.GPU:
if os.path.exists(opencl_binary_file):
adb_push(opencl_binary_file, phone_data_dir, serialno)
if os.path.exists(opencl_parameter_file):
adb_push(opencl_parameter_file, phone_data_dir, serialno)
mace_model_phone_path = ""
if model_graph_format == ModelFormat.file:
mace_model_phone_path = "%s/%s.pb" % (phone_data_dir, model_tag)
adb_push(mace_model_path,
mace_model_phone_path,
serialno)
if link_dynamic:
adb_push(libmace_dynamic_library_path, phone_data_dir,
serialno)
push_depended_so_libs(libmace_dynamic_library_path, abi,
phone_data_dir, serialno)
adb_push("%s/%s" % (benchmark_binary_dir, benchmark_binary_name),
phone_data_dir,
serialno)
adb_cmd = [
"LD_LIBRARY_PATH=%s" % phone_data_dir,
"MACE_CPP_MIN_VLOG_LEVEL=%s" % vlog_level,
"MACE_RUN_PARAMETER_PATH=%s/mace_run.config" %
phone_data_dir,
"MACE_INTERNAL_STORAGE_PATH=%s" % internal_storage_dir,
"MACE_OPENCL_PROFILING=1",
"%s/%s" % (phone_data_dir, benchmark_binary_name),
"--model_name=%s" % model_tag,
"--input_node=%s" % ",".join(input_nodes),
"--output_node=%s" % ",".join(output_nodes),
"--input_shape=%s" % ":".join(input_shapes),
"--output_shape=%s" % ":".join(output_shapes),
"--input_file=%s/%s" % (phone_data_dir, input_file_name),
"--model_data_file=%s/%s.data" % (phone_data_dir, model_tag),
"--device=%s" % device_type,
"--omp_num_threads=%s" % omp_num_threads,
"--cpu_affinity_policy=%s" % cpu_affinity_policy,
"--gpu_perf_hint=%s" % gpu_perf_hint,
"--gpu_priority_hint=%s" % gpu_priority_hint,
"--model_file=%s" % mace_model_phone_path,
"--opencl_binary_file=%s/%s" %
(phone_data_dir, os.path.basename(opencl_binary_file)),
"--opencl_parameter_file=%s/%s" %
(phone_data_dir, os.path.basename(opencl_parameter_file)),
]
adb_cmd = ' '.join(adb_cmd)
cmd_file_name = "%s-%s-%s" % ('cmd_file', model_tag, str(time.time()))
adb_cmd_file = "%s/%s" % (phone_data_dir, cmd_file_name)
tmp_cmd_file = "%s/%s" % ('/tmp', cmd_file_name)
with open(tmp_cmd_file, 'w') as cmd_file:
cmd_file.write(adb_cmd)
adb_push(tmp_cmd_file, adb_cmd_file, serialno)
os.remove(tmp_cmd_file)
sh.adb(
"-s",
serialno,
"shell",
"sh",
adb_cmd_file,
_fg=True)
sh.adb(
"-s",
serialno,
"shell",
"rm",
adb_cmd_file,
_fg=True)
six.print_("Benchmark done!\n")
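`benchmark_model` above builds the on-device command as a list of `LD_LIBRARY_PATH`/flag strings, joins it, writes it to a temp file, and runs that file via `adb shell sh`. A trimmed sketch of the flag assembly (the helper name and sample values are made up for illustration; only a subset of the real flags is shown):

```python
# Sketch of the command-line assembly used by benchmark_model before the
# joined string is written to a cmd file and executed on the device.
def build_benchmark_cmd(phone_data_dir, binary_name, model_tag,
                        input_nodes, output_nodes):
    cmd = [
        "LD_LIBRARY_PATH=%s" % phone_data_dir,
        "%s/%s" % (phone_data_dir, binary_name),
        "--model_name=%s" % model_tag,
        "--input_node=%s" % ",".join(input_nodes),
        "--output_node=%s" % ",".join(output_nodes),
    ]
    return " ".join(cmd)

print(build_benchmark_cmd("/data/local/tmp/mace", "benchmark_model",
                          "mobilenet", ["input"], ["output"]))
```

Writing the command to a file and executing it with `sh` sidesteps shell-quoting problems with long argument lists passed directly through `adb shell`.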
def build_run_throughput_test(abi,
serialno,
vlog_level,
......