Commit 81865af1
Authored May 22, 2018 by yejianwu

    Merge branch 'master' of v9.git.n.xiaomi.com:deep-computing/mace into load_model_in_pb

Parents: 931a005c, ddfac7b5
Showing 24 changed files with 1133 additions and 487 deletions (+1133 -487):

README.md                                                   +18   -10
RELEASE.md                                                   +4    -2
docker/Dockerfile                                            +0    -2
docker/gitlab-runner/Dockerfile                              +0    -2
docs/getting_started/mace-arch.png                           +0    -0
mace/kernels/arm/conv_2d_neon.h                             +12    -0
mace/kernels/arm/conv_2d_neon_15x1.cc                      +163    -0
mace/kernels/arm/conv_2d_neon_1x15.cc                      +149    -0
mace/kernels/arm/conv_winograd.cc                          +319  -282
mace/kernels/conv_2d.h                                      +36   -14
mace/kernels/softmax.h                                      +26   -34
mace/kernels/transpose.h                                   +105   -22
mace/ops/conv_2d_benchmark.cc                                +7    -0
mace/ops/conv_2d_test.cc                                     +6    -0
mace/ops/transpose_benchmark.cc                              +3    -0
mace/ops/transpose_test.cc                                  +37    -2
mace/python/tools/converter.py                              +18    -9
mace/python/tools/converter_tool/base_converter.py          +10    -8
mace/python/tools/converter_tool/tensorflow_converter.py   +137   -88
mace/python/tools/converter_tool/transformer.py             +62    -0
mace/python/tools/source_converter_lib.py                    +0    -1
mace/python/tools/tensor_util.py                            +11    -9
tools/mace_tools.py                                          +7    -1
tools/sh_commands.py                                         +3    -1
README.md  (+18 -10)

@@ -12,12 +12,17 @@
 mobile heterogeneous computing platforms. The design is focused on the following
 targets:
 * Performance
-  * The runtime is highly optimized with NEON, OpenCL and HVX. Except for the
-    inference speed, the initialization speed is also intensively optimized.
+  * The runtime is highly optimized with NEON, OpenCL and Hexagon, and
+    [Winograd algorithm](https://arxiv.org/abs/1509.09308) is introduced to
+    speed up the convolution operations. Except for the inference speed, the
+    initialization speed is also intensively optimized.
 * Power consumption
-  * Chip dependent power options are included as advanced API.
+  * Chip dependent power options like big.LITTLE scheduling, Adreno GPU hints are
+    included as advanced API.
 * Memory usage and library footprint
   * Graph level memory allocation optimization and buffer reuse is supported.
     The core library tries to keep minium external dependencies to keep the
     library footprint small.
 * Model protection
   * Model protection is one the highest priority feature from the beginning of
     the design. Various techniques are introduced like coverting models to C++
@@ -28,31 +33,34 @@ targets:
 archetectures with limited performance.

 ## Getting Started
 * [Introduction](docs/getting_started/introduction.rst)
 * [How to build](docs/getting_started/how_to_build.rst)
 * [Create a model deployment file](docs/getting_started/create_a_model_deployment.rst)

 ## Performance
-[MiAI Model Zoo](http://v9.git.n.xiaomi.com/deep-computing/mace-models) contains
-several common neural networks models and built daily against several mobile
+[MiAI Compute Engine Model Zoo](http://v9.git.n.xiaomi.com/deep-computing/mace-models) contains
+several common neural networks models and built daily against a list of mobile
 phones. The benchmark result can be found in the CI result page.

 ## Communication
 * GitHub issues: bug reports, usage issues, feature requests
-* Gitter or Slack:
-* QQ群:
+* Gitter:
+* QQ群: 756046893

 ## Contributing
 Any kind of contributions are welcome. For bug reports, feature requests,
 please just open an issue without any hesitance. For code contributions, it's
 strongly suggested to open an issue for discussion first. For more details,
-please refer to [this guide](docs).
+please refer to [the contribution guide](docs/development/contributing.md).

 ## License
 [Apache License 2.0](LICENSE).

 ## Acknowledgement
-*MiAI Compute Engine* depends on several open source projects located in
+MiAI Compute Engine depends on several open source projects located in
 [third_party](mace/third_party) directory. Particularly, we learned a lot from
 the following projects during the development:
-* [nnlib](https://source.codeaurora.org/quic/hexagon_nn/nnlib): the DSP runtime
+* [Qualcomm Hexagon NN Offload Framework](https://source.codeaurora.org/quic/hexagon_nn/nnlib): the Hexagon
+  DSP runtime depends on this library.
 * [TensorFlow](https://github.com/tensorflow/tensorflow),
   [Caffe](https://github.com/BVLC/caffe),
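The Performance bullet above cites the Winograd algorithm. As a minimal sketch of why it saves work, here is the textbook 1-D transform F(2,3), not anything from the MACE sources: two outputs of a 3-tap convolution cost four multiplications instead of six.

#include <cstdio>

// Illustrative-only Winograd F(2,3): computes two outputs of a 3-tap
// filter g over inputs d with 4 multiplies (direct evaluation needs 6).
void winograd_f2_3(const float d[4], const float g[3], float y[2]) {
  const float m1 = (d[0] - d[2]) * g[0];
  const float m2 = (d[1] + d[2]) * 0.5f * (g[0] + g[1] + g[2]);
  const float m3 = (d[2] - d[1]) * 0.5f * (g[0] - g[1] + g[2]);
  const float m4 = (d[1] - d[3]) * g[2];
  y[0] = m1 + m2 + m3;  // == d0*g0 + d1*g1 + d2*g2
  y[1] = m2 - m3 - m4;  // == d1*g0 + d2*g1 + d3*g2
}

int main() {
  const float d[4] = {1, 2, 3, 4};
  const float g[3] = {0.5f, 0.25f, 0.125f};
  float y[2];
  winograd_f2_3(d, g, y);
  std::printf("%f %f\n", y[0], y[1]);  // prints 1.375000 2.250000
  return 0;
}

The 2-D variant used for 3x3 convolutions composes this transform along both axes, which is where the large multiply savings for conv layers come from.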
RELEASE.md  (+4 -2)

@@ -5,10 +5,12 @@ v0.6.0 (2018-04-04)
 ------
 1. Change mace header interfaces, only including necessary methods.

+v0.6.3 (2018-05-21)
+------
+1. support `float` data_type when running in gpu
+
 v0.7.0 (2018-05-18)
 ------
 1. Change interface that report error type
 2. Improve cpu performace
 3. Merge cpu/gpu engine to one
\ No newline at end of file
docker/Dockerfile  (+0 -2)

 FROM ubuntu:16.04

-# Update source
-# Looks like mirrors.163.com does not work in xiaomi network
 # RUN sed -i 's/http:\/\/archive\.ubuntu\.com\/ubuntu\//http:\/\/mirrors\.163\.com\/ubuntu\//g' /etc/apt/sources.list
 RUN apt-get update -y

 ## Basic tools
docker/gitlab-runner/Dockerfile  (+0 -2)

 FROM cr.d.xiaomi.net/mace/mace-dev:latest

-# Update source
-# Looks like mirrors.163.com does not work in xiaomi network
 # RUN sed -i 's/http:\/\/archive\.ubuntu\.com\/ubuntu\//http:\/\/mirrors\.163\.com\/ubuntu\//g' /etc/apt/sources.list
 RUN apt-get update -y

 # Install gitlab runner
docs/getting_started/mace-arch.png  (binary)

Image replaced: 18.2 KB (at 931a005c) → 18.0 KB (at 81865af1).
mace/kernels/arm/conv_2d_neon.h  (+12 -0)

@@ -65,6 +65,18 @@ extern void Conv2dNeonK7x7S3(const float *input,
                              const index_t *out_shape,
                              float *output);

+extern void Conv2dNeonK1x15S1(const float *input,
+                              const float *filter,
+                              const index_t *in_shape,
+                              const index_t *out_shape,
+                              float *output);
+
+extern void Conv2dNeonK15x1S1(const float *input,
+                              const float *filter,
+                              const index_t *in_shape,
+                              const index_t *out_shape,
+                              float *output);
+
 }  // namespace kernels
 }  // namespace mace
mace/kernels/arm/conv_2d_neon_15x1.cc  (new file, mode 100644, +163)

// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
#endif

#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/utils/utils.h"

namespace mace {
namespace kernels {

inline void Conv2dCPUK15x1Calc(const float *in_ptr,
                               const float *filter_ptr,
                               const index_t in_width,
                               const index_t in_channels,
                               const index_t out_height,
                               const index_t out_width,
                               const index_t w,
                               const index_t tile_width,
                               const index_t out_image_size,
                               float *out_ptr,
                               const index_t io,
                               const int stride) {
  for (index_t ih = 0; ih < out_height; ++ih) {
    for (index_t iw = 0; iw < tile_width && w + iw < out_width; ++iw) {
      for (int i = 0; i < 15; ++i) {
        for (int j = 0; j < 1; ++j) {
          out_ptr[io * out_image_size + ih * out_width + w + iw] +=
              in_ptr[(ih * stride + i) * in_width + ((w + iw) * stride + j)] *
              filter_ptr[io * in_channels * 15 + i * 1 + j];
        }
      }
    }
  }
}

// Ho = 4, Wo = 1, Co = 1
void Conv2dNeonK15x1S1(const float *input,
                       const float *filter,
                       const index_t *in_shape,
                       const index_t *out_shape,
                       float *output) {
  const index_t in_image_size = in_shape[2] * in_shape[3];
  const index_t out_image_size = out_shape[2] * out_shape[3];
  const index_t in_batch_size = in_shape[1] * in_image_size;
  const index_t out_batch_size = out_shape[1] * out_image_size;
  const index_t tile_width =
      out_shape[1] < 4 ? RoundUpDiv4(out_shape[3]) : out_shape[3];

#pragma omp parallel for collapse(3)
  for (index_t b = 0; b < out_shape[0]; ++b) {
    for (index_t m = 0; m < out_shape[1]; ++m) {
      for (index_t w = 0; w < out_shape[3]; w += tile_width) {
        const index_t out_height = out_shape[2];
        const index_t out_width = out_shape[3];
        const index_t in_channels = in_shape[1];
        const index_t in_width = in_shape[3];
        float *out_ptr_base = output + b * out_batch_size + m * out_image_size;
        for (index_t c = 0; c < in_channels; ++c) {
          const float *in_ptr_base =
              input + b * in_batch_size + c * in_image_size;
          const float *filter_ptr = filter + m * in_channels * 15 + c * 15;
#if defined(MACE_ENABLE_NEON) && !defined(__aarch64__)
          /* load filter (1 outch x 1 height x 4 width) */
          float32x4_t vf0, vf1, vf2, vf3;
          vf0 = vld1q_f32(filter_ptr);
          vf1 = vld1q_f32(filter_ptr + 4);
          vf2 = vld1q_f32(filter_ptr + 8);
          vf3 = vld1q_f32(filter_ptr + 11);

          for (index_t h = 0; h + 3 < out_height; h += 4) {
            for (index_t wt = 0; wt < tile_width && w + wt < out_width; ++wt) {
              // load output
              index_t out_offset = h * out_width + w + wt;
              // output (1 outch x 1 height x 4 width): vo_outch_height
              float32x4_t vo = {out_ptr_base[out_offset],
                                out_ptr_base[out_offset + out_width],
                                out_ptr_base[out_offset + 2 * out_width],
                                out_ptr_base[out_offset + 3 * out_width]};
              // input offset
              index_t in_offset = h * in_width + w + wt;
              // input (3 slide)
              float32x4_t vi0 = {in_ptr_base[in_offset],
                                 in_ptr_base[in_offset + in_width],
                                 in_ptr_base[in_offset + 2 * in_width],
                                 in_ptr_base[in_offset + 3 * in_width]};
              float32x4_t vi4 = {in_ptr_base[in_offset + 4 * in_width],
                                 in_ptr_base[in_offset + 5 * in_width],
                                 in_ptr_base[in_offset + 6 * in_width],
                                 in_ptr_base[in_offset + 7 * in_width]};
              float32x4_t vi8 = {in_ptr_base[in_offset + 8 * in_width],
                                 in_ptr_base[in_offset + 9 * in_width],
                                 in_ptr_base[in_offset + 10 * in_width],
                                 in_ptr_base[in_offset + 11 * in_width]};
              float32x4_t vi12 = {in_ptr_base[in_offset + 12 * in_width],
                                  in_ptr_base[in_offset + 13 * in_width],
                                  in_ptr_base[in_offset + 14 * in_width],
                                  in_ptr_base[in_offset + 15 * in_width]};
              float32x4_t vi16 = {in_ptr_base[in_offset + 16 * in_width],
                                  in_ptr_base[in_offset + 17 * in_width]};
              float32x4_t vi1 = vextq_f32(vi0, vi4, 1);
              float32x4_t vi2 = vextq_f32(vi0, vi4, 2);
              float32x4_t vi3 = vextq_f32(vi0, vi4, 3);
              float32x4_t vi5 = vextq_f32(vi4, vi8, 1);
              float32x4_t vi6 = vextq_f32(vi4, vi8, 2);
              float32x4_t vi7 = vextq_f32(vi4, vi8, 3);
              float32x4_t vi9 = vextq_f32(vi8, vi12, 1);
              float32x4_t vi10 = vextq_f32(vi8, vi12, 2);
              float32x4_t vi11 = vextq_f32(vi8, vi12, 3);
              float32x4_t vi13 = vextq_f32(vi12, vi16, 1);
              float32x4_t vi14 = vextq_f32(vi12, vi16, 2);

              vo = vmlaq_lane_f32(vo, vi0, vget_low_f32(vf0), 0);
              vo = vmlaq_lane_f32(vo, vi1, vget_low_f32(vf0), 1);
              vo = vmlaq_lane_f32(vo, vi2, vget_high_f32(vf0), 0);
              vo = vmlaq_lane_f32(vo, vi3, vget_high_f32(vf0), 1);
              vo = vmlaq_lane_f32(vo, vi4, vget_low_f32(vf1), 0);
              vo = vmlaq_lane_f32(vo, vi5, vget_low_f32(vf1), 1);
              vo = vmlaq_lane_f32(vo, vi6, vget_high_f32(vf1), 0);
              vo = vmlaq_lane_f32(vo, vi7, vget_high_f32(vf1), 1);
              vo = vmlaq_lane_f32(vo, vi8, vget_low_f32(vf2), 0);
              vo = vmlaq_lane_f32(vo, vi9, vget_low_f32(vf2), 1);
              vo = vmlaq_lane_f32(vo, vi10, vget_high_f32(vf2), 0);
              vo = vmlaq_lane_f32(vo, vi11, vget_high_f32(vf2), 1);
              vo = vmlaq_lane_f32(vo, vi12, vget_low_f32(vf3), 1);
              vo = vmlaq_lane_f32(vo, vi13, vget_high_f32(vf3), 0);
              vo = vmlaq_lane_f32(vo, vi14, vget_high_f32(vf3), 1);

              out_ptr_base[out_offset] = vo[0];
              out_ptr_base[out_offset + out_width] = vo[1];
              out_ptr_base[out_offset + 2 * out_width] = vo[2];
              out_ptr_base[out_offset + 3 * out_width] = vo[3];
            }  // wt
          }  // h
#else
          Conv2dCPUK15x1Calc(in_ptr_base, filter_ptr, in_width, in_channels,
                             out_height, out_width, w, tile_width,
                             out_image_size, out_ptr_base, 0, 1);
#endif
        }  // c
      }  // w
    }  // m
  }  // b
}

}  // namespace kernels
}  // namespace mace
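Both new kernels derive their shifted input windows with `vextq_f32` instead of reloading from memory. A scalar sketch of what that intrinsic computes may help when reading the blocks above; the helper below is illustrative only and is not part of MACE:

#include <cstdio>

// vextq_f32(a, b, n) conceptually concatenates a and b, drops the first
// n lanes, and keeps the next four. The kernels use this to derive the
// windows vi1..vi14 from the five aligned loads vi0, vi4, vi8, vi12, vi16.
void ext_f32x4(const float a[4], const float b[4], int n, float out[4]) {
  const float cat[8] = {a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]};
  for (int i = 0; i < 4; ++i) out[i] = cat[n + i];
}

int main() {
  const float a[4] = {0, 1, 2, 3}, b[4] = {4, 5, 6, 7};
  float w[4];
  ext_f32x4(a, b, 1, w);  // w = {1, 2, 3, 4}: the window shifted by one
  std::printf("%g %g %g %g\n", w[0], w[1], w[2], w[3]);
  return 0;
}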
mace/kernels/arm/conv_2d_neon_1x15.cc  (new file, mode 100644, +149)

// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
#endif

#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/utils/utils.h"
#include "mace/utils/logging.h"

namespace mace {
namespace kernels {

inline void Conv2dCPUK1x15Calc(const float *in_ptr,
                               const float *filter_ptr,
                               const index_t in_width,
                               const index_t in_channels,
                               const index_t out_height,
                               const index_t h,
                               const index_t tile_height,
                               const index_t out_width,
                               const index_t out_image_size,
                               float *out_ptr,
                               const index_t io,
                               const int stride) {
  for (index_t ih = 0; ih < tile_height && h + ih < out_height; ++ih) {
    for (index_t iw = 0; iw < out_width; ++iw) {
      for (int i = 0; i < 1; ++i) {
        for (int j = 0; j < 15; ++j) {
          out_ptr[io * out_image_size + (h + ih) * out_width + iw] +=
              in_ptr[((h + ih) * stride + i) * in_width + (iw * stride + j)] *
              filter_ptr[io * in_channels * 15 + i * 15 + j];
        }
      }
    }
  }
}

// Ho = 1, Wo = 4, Co = 1
void Conv2dNeonK1x15S1(const float *input,
                       const float *filter,
                       const index_t *in_shape,
                       const index_t *out_shape,
                       float *output) {
  const index_t in_image_size = in_shape[2] * in_shape[3];
  const index_t out_image_size = out_shape[2] * out_shape[3];
  const index_t in_batch_size = in_shape[1] * in_image_size;
  const index_t out_batch_size = out_shape[1] * out_image_size;
  const index_t tile_height =
      out_shape[1] < 4 ? RoundUpDiv4(out_shape[2]) : out_shape[2];

#pragma omp parallel for collapse(3)
  for (index_t b = 0; b < out_shape[0]; ++b) {
    for (index_t m = 0; m < out_shape[1]; ++m) {
      for (index_t h = 0; h < out_shape[2]; h += tile_height) {
        const index_t out_height = out_shape[2];
        const index_t out_width = out_shape[3];
        const index_t in_channels = in_shape[1];
        const index_t in_width = in_shape[3];
        float *out_ptr_base = output + b * out_batch_size + m * out_image_size;
        for (index_t c = 0; c < in_channels; ++c) {
          const float *in_ptr_base =
              input + b * in_batch_size + c * in_image_size;
          const float *filter_ptr = filter + m * in_channels * 15 + c * 15;
#if defined(MACE_ENABLE_NEON) && !defined(__aarch64__)
          /* load filter (1 outch x 4 height x 1 width) */
          float32x4_t vf0, vf1, vf2, vf3;
          vf0 = vld1q_f32(filter_ptr);
          vf1 = vld1q_f32(filter_ptr + 4);
          vf2 = vld1q_f32(filter_ptr + 8);
          vf3 = vld1q_f32(filter_ptr + 11);

          for (index_t ht = 0; ht < tile_height && h + ht < out_height; ++ht) {
            for (index_t w = 0; w + 3 < out_width; w += 4) {
              // output (1 outch x 1 height x 4 width): vo_outch_height
              float32x4_t vo;
              // load output
              index_t out_offset = (h + ht) * out_width + w;
              vo = vld1q_f32(out_ptr_base + out_offset);

              // input (3 slide)
              float32x4_t vi0, vi1, vi2, vi3, vi4, vi5, vi6, vi7, vi8, vi9,
                  vi10, vi11, vi12, vi13, vi14, vi16;
              // input offset
              index_t in_offset = (h + ht) * in_width + w;
              // load input
              vi0 = vld1q_f32(in_ptr_base + in_offset);
              vi4 = vld1q_f32(in_ptr_base + in_offset + 4);
              vi8 = vld1q_f32(in_ptr_base + in_offset + 8);
              vi12 = vld1q_f32(in_ptr_base + in_offset + 12);
              vi16 = vld1q_f32(in_ptr_base + in_offset + 16);
              vi1 = vextq_f32(vi0, vi4, 1);
              vi2 = vextq_f32(vi0, vi4, 2);
              vi3 = vextq_f32(vi0, vi4, 3);
              vi5 = vextq_f32(vi4, vi8, 1);
              vi6 = vextq_f32(vi4, vi8, 2);
              vi7 = vextq_f32(vi4, vi8, 3);
              vi9 = vextq_f32(vi8, vi12, 1);
              vi10 = vextq_f32(vi8, vi12, 2);
              vi11 = vextq_f32(vi8, vi12, 3);
              vi13 = vextq_f32(vi12, vi16, 1);
              vi14 = vextq_f32(vi12, vi16, 2);

              vo = vmlaq_lane_f32(vo, vi0, vget_low_f32(vf0), 0);
              vo = vmlaq_lane_f32(vo, vi1, vget_low_f32(vf0), 1);
              vo = vmlaq_lane_f32(vo, vi2, vget_high_f32(vf0), 0);
              vo = vmlaq_lane_f32(vo, vi3, vget_high_f32(vf0), 1);
              vo = vmlaq_lane_f32(vo, vi4, vget_low_f32(vf1), 0);
              vo = vmlaq_lane_f32(vo, vi5, vget_low_f32(vf1), 1);
              vo = vmlaq_lane_f32(vo, vi6, vget_high_f32(vf1), 0);
              vo = vmlaq_lane_f32(vo, vi7, vget_high_f32(vf1), 1);
              vo = vmlaq_lane_f32(vo, vi8, vget_low_f32(vf2), 0);
              vo = vmlaq_lane_f32(vo, vi9, vget_low_f32(vf2), 1);
              vo = vmlaq_lane_f32(vo, vi10, vget_high_f32(vf2), 0);
              vo = vmlaq_lane_f32(vo, vi11, vget_high_f32(vf2), 1);
              vo = vmlaq_lane_f32(vo, vi12, vget_low_f32(vf3), 1);
              vo = vmlaq_lane_f32(vo, vi13, vget_high_f32(vf3), 0);
              vo = vmlaq_lane_f32(vo, vi14, vget_high_f32(vf3), 1);

              vst1q_f32(out_ptr_base + out_offset, vo);
            }  // w
          }  // ht
#else
          Conv2dCPUK1x15Calc(in_ptr_base, filter_ptr, in_width, in_channels,
                             out_height, h, tile_height, out_width,
                             out_image_size, out_ptr_base, 0, 1);
#endif
        }  // c
      }  // h
    }  // m
  }  // b
}

}  // namespace kernels
}  // namespace mace
mace/kernels/arm/conv_winograd.cc  (+319 -282)

This diff is collapsed in the original view (click to expand there).
mace/kernels/conv_2d.h  (+36 -14)

@@ -363,6 +363,10 @@ struct Conv2dFunctor<DeviceType::CPU, float> : Conv2dFunctorBase {
         && stride_h == 2 && stride_w == 2 && dilation_h == 1 && dilation_w == 1;
     bool use_neon_7x7_s3 = filter_h == 7 && filter_w == 7
         && stride_h == 3 && stride_w == 3 && dilation_h == 1 && dilation_w == 1;
+    bool use_neon_1x15_s1 = filter_h == 1 && filter_w == 15
+        && stride_h == 1 && stride_w == 1 && dilation_h == 1 && dilation_w == 1;
+    bool use_neon_15x1_s1 = filter_h == 15 && filter_w == 1
+        && stride_h == 1 && stride_w == 1 && dilation_h == 1 && dilation_w == 1;

     std::vector<index_t> transformed_input_shape;
     std::vector<index_t> transformed_output_shape;
@@ -402,24 +406,26 @@ struct Conv2dFunctor<DeviceType::CPU, float> : Conv2dFunctorBase {
                                        tile_count});
       transformed_filter_shape.insert(transformed_filter_shape.end(),
                                       {in_tile_area, channels, input_channels});
-    } else if (use_neon_3x3_s1) {
-      extra_output_height = RoundUp<index_t>(height, 2);
-      extra_input_height =
-          std::max(padded_input_height, extra_output_height + 2);
-      extra_output_width = RoundUp<index_t>(width, 4);
-      extra_input_width =
-          std::max(padded_input_width, extra_output_width + 2);
-      if (extra_input_height != padded_input_height) {
-        pad_bottom += (extra_input_height - padded_input_height);
-      }
-      if (extra_input_width != padded_input_width) {
-        pad_right += (extra_input_width - padded_input_width);
-      }
-    } else if (!use_neon_1x1_s1) {
-      extra_output_height = height;
+    } else {
+      index_t tile_h, tile_w;
+      if (use_neon_1x1_s1) {
+        tile_h = 1;
+        tile_w = 1;
+      } else if (use_neon_3x3_s1) {
+        tile_h = 2;
+        tile_w = 4;
+      } else if (use_neon_15x1_s1) {
+        tile_h = 4;
+        tile_w = 1;
+      } else {
+        tile_h = 1;
+        tile_w = 4;
+      }
+      extra_output_height = RoundUp<index_t>(height, tile_h);
       extra_input_height =
           std::max(padded_input_height, (extra_output_height - 1) * stride_h
               + (filter_h - 1) * dilation_h + 1);
-      extra_output_width = RoundUp<index_t>(width, 4);
+      extra_output_width = RoundUp<index_t>(width, tile_w);
       extra_input_width =
           std::max(padded_input_width, (extra_output_width - 1) * stride_w
               + (filter_w - 1) * dilation_w + 1);
@@ -584,6 +590,22 @@ struct Conv2dFunctor<DeviceType::CPU, float> : Conv2dFunctorBase {
                       extra_output_shape,
                       pad_output);
       };
+    } else if (use_neon_1x15_s1) {
+      conv_func = [=](const float *pad_input, float *pad_output) {
+        Conv2dNeonK1x15S1(pad_input,
+                          filter_data,
+                          extra_input_shape,
+                          extra_output_shape,
+                          pad_output);
+      };
+    } else if (use_neon_15x1_s1) {
+      conv_func = [=](const float *pad_input, float *pad_output) {
+        Conv2dNeonK15x1S1(pad_input,
+                          filter_data,
+                          extra_input_shape,
+                          extra_output_shape,
+                          pad_output);
+      };
     } else {
       conv_func = [=](const float *pad_input, float *pad_output) {
         Conv2dGeneral(pad_input,
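The rewritten branch sizes the padded input from the tile-rounded output using the standard dilated-convolution extent formula visible above. A small sketch under the same definitions; the helper names here are mine, not MACE's:

#include <algorithm>
#include <cstdio>

// Producing `out` outputs along one axis with the given stride, filter
// size, and dilation consumes this many input elements (the expression
// used for extra_input_height/width above).
long RequiredInputExtent(long out, long stride, long filter, long dilation) {
  return (out - 1) * stride + (filter - 1) * dilation + 1;
}

// Round `v` up to a multiple of `tile`, mirroring RoundUp<index_t>.
long RoundUpTo(long v, long tile) { return (v + tile - 1) / tile * tile; }

int main() {
  // e.g. the 15x1 S1 kernel tiled 4 output rows at a time over a
  // 10-row output:
  long extra_out_h = RoundUpTo(10, 4);                           // 12
  long extra_in_h = RequiredInputExtent(extra_out_h, 1, 15, 1);  // 26
  std::printf("%ld %ld\n", extra_out_h, extra_in_h);
  return 0;
}

Any difference between this required extent and the already-padded input is what gets added to pad_bottom/pad_right, so the NEON kernels can always process whole tiles.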
mace/kernels/softmax.h  (+26 -34)

@@ -43,6 +43,7 @@ struct SoftmaxFunctor<DeviceType::CPU, float> {
     const index_t batch = input->dim(0);
     const index_t class_count = input->dim(1);
     const index_t class_size = input->dim(2) * input->dim(3);
+    const index_t batch_size = class_count * class_size;

     Tensor::MappingGuard input_guard(input);
     Tensor::MappingGuard output_guard(output);
@@ -50,46 +51,37 @@ struct SoftmaxFunctor<DeviceType::CPU, float> {
     float *output_data = output->mutable_data<float>();

     for (index_t b = 0; b < batch; ++b) {
-      std::vector<float> max_val(class_size,
-                                 std::numeric_limits<float>::lowest());
-      std::vector<float> sum_val(class_size, 0.f);
-
-      // calculate max for each class
-      for (index_t c = 0; c < class_count; ++c) {
-        const float *input_ptr =
-            input_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          max_val[k] = std::max(max_val[k], input_ptr[k]);
-        }
-      }
-
-      // calculate data - max for each class
-      for (index_t c = 0; c < class_count; ++c) {
-        const float *input_ptr =
-            input_data + (b * class_count + c) * class_size;
-        float *output_ptr = output_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          output_ptr[k] = ::exp(input_ptr[k] - max_val[k]);
-        }
-      }
-
-      // calculate sum for each class
-      for (index_t c = 0; c < class_count; ++c) {
-        float *output_ptr = output_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          sum_val[k] += output_ptr[k];
-        }
-      }
-
-      // calculate (data - max) / sum for each class
-      for (index_t c = 0; c < class_count; ++c) {
-        float *output_ptr = output_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          output_ptr[k] /= sum_val[k];
-        }
-      }
+#pragma omp parallel for
+      for (index_t k = 0; k < class_size; ++k) {
+        const float *input_ptr = input_data + b * batch_size + k;
+        float *output_ptr = output_data + b * batch_size + k;
+
+        float max_val = std::numeric_limits<float>::lowest();
+        index_t channel_offset = 0;
+        for (index_t c = 0; c < class_count; ++c) {
+          float data = input_ptr[channel_offset];
+          if (data > max_val) {
+            max_val = data;
+          }
+          channel_offset += class_size;
+        }
+
+        channel_offset = 0;
+        float sum = 0;
+        for (index_t c = 0; c < class_count; ++c) {
+          float exp_value = ::exp(input_ptr[channel_offset] - max_val);
+          sum += exp_value;
+          output_ptr[channel_offset] = exp_value;
+          channel_offset += class_size;
+        }
+
+        channel_offset = 0;
+        for (index_t c = 0; c < class_count; ++c) {
+          output_ptr[channel_offset] /= sum;
+          channel_offset += class_size;
+        }
+      }  // k
     }  // b
   }
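The rewrite replaces four class-wise passes with per-pixel scratch vectors by a single parallel pass per spatial position that strides across channels, so the running max and sum stay in scalars. A standalone sketch of that NCHW access pattern; this toy function is mine, not the MACE functor:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <limits>

// Softmax over the channel axis of one NCHW image, with the same loop
// order as the rewritten functor: outer loop over spatial positions k,
// inner strided loops over channels.
void SoftmaxNCHW(const float *in, float *out, int channels, int spatial) {
  for (int k = 0; k < spatial; ++k) {
    float max_val = std::numeric_limits<float>::lowest();
    for (int c = 0; c < channels; ++c)
      max_val = std::max(max_val, in[c * spatial + k]);
    float sum = 0.f;
    for (int c = 0; c < channels; ++c) {
      const float e = std::exp(in[c * spatial + k] - max_val);
      out[c * spatial + k] = e;
      sum += e;
    }
    for (int c = 0; c < channels; ++c) out[c * spatial + k] /= sum;
  }
}

int main() {
  const float in[6] = {1, 2, 3, 4, 5, 6};  // 3 channels x 2 positions
  float out[6];
  SoftmaxNCHW(in, out, 3, 2);
  // The three channel probabilities at position 0 sum to 1:
  std::printf("%f %f %f\n", out[0], out[2], out[4]);
  return 0;
}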
mace/kernels/transpose.h  (+105 -22)

@@ -15,6 +15,10 @@
 #ifndef MACE_KERNELS_TRANSPOSE_H_
 #define MACE_KERNELS_TRANSPOSE_H_

+#if defined(MACE_ENABLE_NEON)
+#include <arm_neon.h>
+#endif
+
 #include <vector>

 #include "mace/core/future.h"
@@ -25,6 +29,65 @@
 namespace mace {
 namespace kernels {

+static void TransposeNHWCToNCHWC3(const float *input,
+                                  float *output,
+                                  const index_t height,
+                                  const index_t width) {
+  index_t image_size = height * width;
+
+#pragma omp parallel for
+  for (index_t h = 0; h < height; ++h) {
+    index_t in_offset = h * width * 3;
+    index_t out_offset = h * width;
+
+    index_t w;
+    for (w = 0; w + 3 < width; w += 4) {
+      float32x4x3_t vi = vld3q_f32(input + in_offset);
+      vst1q_f32(output + out_offset, vi.val[0]);
+      vst1q_f32(output + out_offset + image_size, vi.val[1]);
+      vst1q_f32(output + out_offset + image_size * 2, vi.val[2]);
+
+      in_offset += 12;
+      out_offset += 4;
+    }
+    for (; w < width; ++w) {
+      for (index_t c = 0; c < 3; ++c) {
+        output[h * width + image_size * c + w] =
+            input[h * width * 3 + w * 3 + c];
+      }
+    }
+  }
+}
+
+static void TransposeNCHWToNHWCC2(const float *input,
+                                  float *output,
+                                  const index_t height,
+                                  const index_t width) {
+  index_t image_size = height * width;
+
+#pragma omp parallel for
+  for (index_t h = 0; h < height; ++h) {
+    index_t in_offset = h * width;
+    index_t out_offset = h * width * 2;
+
+    index_t w;
+    for (w = 0; w + 3 < width; w += 4) {
+      float32x4_t vi0 = vld1q_f32(input + in_offset);
+      float32x4_t vi1 = vld1q_f32(input + in_offset + image_size);
+      float32x4x2_t vi = {vi0, vi1};
+      vst2q_f32(output + out_offset, vi);
+      in_offset += 4;
+      out_offset += 8;
+    }
+    for (; w < width; ++w) {
+      for (index_t c = 0; c < 2; ++c) {
+        output[h * width * 2 + w * 2 + c] =
+            input[h * width + image_size * c + w];
+      }
+    }
+  }
+}
+
 template <DeviceType D, typename T>
 struct TransposeFunctor {
   explicit TransposeFunctor(const std::vector<int> &dims) : dims_(dims) {}
@@ -48,28 +111,48 @@ struct TransposeFunctor {
         }
       }
     } else if (input->dim_size() == 4) {
-      std::vector<index_t> in_stride{
-          input_shape[1] * input_shape[2] * input_shape[3],
-          input_shape[2] * input_shape[3], input_shape[3], 1};
-      std::vector<index_t> out_stride{
-          output_shape[1] * output_shape[2] * output_shape[3],
-          output_shape[2] * output_shape[3], output_shape[3], 1};
-
-      std::vector<index_t> idim(4, 0);
-      std::vector<index_t> odim(4, 0);
-      for (odim[0] = 0; odim[0] < output_shape[0]; ++odim[0]) {
-        for (odim[1] = 0; odim[1] < output_shape[1]; ++odim[1]) {
-          for (odim[2] = 0; odim[2] < output_shape[2]; ++odim[2]) {
-            for (odim[3] = 0; odim[3] < output_shape[3]; ++odim[3]) {
-              idim[dims_[0]] = odim[0];
-              idim[dims_[1]] = odim[1];
-              idim[dims_[2]] = odim[2];
-              idim[dims_[3]] = odim[3];
-
-              output_data[odim[0] * out_stride[0] + odim[1] * out_stride[1] +
-                          odim[2] * out_stride[2] + odim[3]] =
-                  input_data[idim[0] * in_stride[0] + idim[1] * in_stride[1] +
-                             idim[2] * in_stride[2] + idim[3]];
-            }
-          }
-        }
-      }
+      std::vector<int> transpose_order_from_NHWC_to_NCHW{0, 3, 1, 2};
+      std::vector<int> transpose_order_from_NCHW_to_NHWC{0, 2, 3, 1};
+      index_t batch_size = input->dim(1) * input->dim(2) * input->dim(3);
+
+      if (dims_ == transpose_order_from_NHWC_to_NCHW && input->dim(3) == 3) {
+        for (index_t b = 0; b < input->dim(0); ++b) {
+          TransposeNHWCToNCHWC3(input_data + b * batch_size,
+                                output_data + b * batch_size,
+                                input->dim(1),
+                                input->dim(2));
+        }
+      } else if (dims_ == transpose_order_from_NCHW_to_NHWC
+                 && input->dim(1) == 2) {
+        for (index_t b = 0; b < input->dim(0); ++b) {
+          TransposeNCHWToNHWCC2(input_data + b * batch_size,
+                                output_data + b * batch_size,
+                                input->dim(2),
+                                input->dim(3));
+        }
+      } else {
+        std::vector<index_t> in_stride{
+            input_shape[1] * input_shape[2] * input_shape[3],
+            input_shape[2] * input_shape[3], input_shape[3], 1};
+        std::vector<index_t> out_stride{
+            output_shape[1] * output_shape[2] * output_shape[3],
+            output_shape[2] * output_shape[3], output_shape[3], 1};
+
+        std::vector<index_t> idim(4, 0);
+        std::vector<index_t> odim(4, 0);
+        for (odim[0] = 0; odim[0] < output_shape[0]; ++odim[0]) {
+          for (odim[1] = 0; odim[1] < output_shape[1]; ++odim[1]) {
+            for (odim[2] = 0; odim[2] < output_shape[2]; ++odim[2]) {
+              for (odim[3] = 0; odim[3] < output_shape[3]; ++odim[3]) {
+                idim[dims_[0]] = odim[0];
+                idim[dims_[1]] = odim[1];
+                idim[dims_[2]] = odim[2];
+                idim[dims_[3]] = odim[3];
+
+                output_data[odim[0] * out_stride[0] + odim[1] * out_stride[1] +
+                            odim[2] * out_stride[2] + odim[3]] =
+                    input_data[idim[0] * in_stride[0] + idim[1] * in_stride[1] +
+                               idim[2] * in_stride[2] + idim[3]];
+              }
+            }
+          }
+        }
+      }
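The fallback branch permutes a 4-D tensor by mapping every output coordinate back through `dims_` and per-dimension strides. A compact, self-contained sketch of the same index arithmetic; this is a hypothetical helper, not MACE code:

#include <cstdio>
#include <vector>

// Generic 4-D transpose by index remapping: output axis i takes input
// axis dims[i], so for each output coordinate we set idim[dims[i]] =
// odim[i] and read the input through its strides.
void Transpose4D(const float *in, const std::vector<long> &in_shape,
                 const std::vector<int> &dims, float *out) {
  std::vector<long> out_shape(4), is(4), os(4);
  for (int i = 0; i < 4; ++i) out_shape[i] = in_shape[dims[i]];
  is[3] = os[3] = 1;
  for (int i = 2; i >= 0; --i) {
    is[i] = is[i + 1] * in_shape[i + 1];
    os[i] = os[i + 1] * out_shape[i + 1];
  }
  long idim[4];
  for (long o0 = 0; o0 < out_shape[0]; ++o0)
    for (long o1 = 0; o1 < out_shape[1]; ++o1)
      for (long o2 = 0; o2 < out_shape[2]; ++o2)
        for (long o3 = 0; o3 < out_shape[3]; ++o3) {
          const long odim[4] = {o0, o1, o2, o3};
          for (int i = 0; i < 4; ++i) idim[dims[i]] = odim[i];
          out[o0 * os[0] + o1 * os[1] + o2 * os[2] + o3] =
              in[idim[0] * is[0] + idim[1] * is[1] + idim[2] * is[2] +
                 idim[3]];
        }
}

int main() {
  // 1x2x2x3 NHWC -> 1x3x2x2 NCHW with dims {0, 3, 1, 2}.
  std::vector<long> shape = {1, 2, 2, 3};
  float in[12], out[12];
  for (int i = 0; i < 12; ++i) in[i] = static_cast<float>(i);
  Transpose4D(in, shape, {0, 3, 1, 2}, out);
  std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // 0 3 6 9
  return 0;
}

The new NEON fast paths above simply special-case the two permutations (and channel counts) that dominate in practice, falling back to this generic walk otherwise.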
mace/ops/conv_2d_benchmark.cc  (+7 -0)

@@ -165,6 +165,13 @@ BM_CONV_2D(1, 32, 256, 256, 3, 3, 1, 4, VALID, 32);
 BM_CONV_2D(1, 128, 56, 56, 1, 1, 1, 1, SAME, 128);
 BM_CONV_2D(1, 1024, 7, 7, 1, 1, 1, 1, SAME, 1024);
 BM_CONV_2D(64, 32, 34, 34, 3, 3, 1, 1, VALID, 32);
 BM_CONV_2D(1, 32, 34, 34, 3, 3, 1, 1, VALID, 32);
+BM_CONV_2D(1, 32, 256, 256, 1, 15, 1, 1, SAME, 2);
+BM_CONV_2D(1, 32, 256, 256, 15, 1, 1, 1, SAME, 2);
+BM_CONV_2D(1, 64, 64, 64, 15, 1, 1, 1, SAME, 2);

 }  // namespace test
 }  // namespace ops
 }  // namespace mace
mace/ops/conv_2d_test.cc  (+6 -0)

@@ -779,11 +779,17 @@ TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv3x3S12) {

 TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv15x1S12) {
   TestHalfComplexConvNxNS12<DeviceType::GPU>({32, 32}, {15, 1, 256, 2},
                                              {1, 1});
   TestHalfComplexConvNxNS12<DeviceType::GPU>({64, 64}, {15, 1, 64, 2},
                                              {1, 1});
   TestHalfComplexConvNxNS12<DeviceType::GPU>({256, 256}, {15, 1, 32, 2},
                                              {1, 1});
 }

+TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv1x15S12) {
+  TestHalfComplexConvNxNS12<DeviceType::GPU>({32, 32}, {1, 15, 256, 2},
+                                             {1, 1});
+  TestHalfComplexConvNxNS12<DeviceType::GPU>({256, 256}, {1, 15, 32, 2},
+                                             {1, 1});
+}
+
 TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv7x75S12) {
mace/ops/transpose_benchmark.cc  (+3 -0)

@@ -83,6 +83,9 @@ void TransposeBenchmark(int iters,
 #define BM_TRANSPOSE4D(N, C, H, W, D0, D1, D2, D3) \
   BM_TRANSPOSE4D_MACRO(N, C, H, W, D0, D1, D2, D3, float, CPU);

 BM_TRANSPOSE4D(1, 512, 512, 3, 0, 3, 1, 2);
+BM_TRANSPOSE4D(1, 2, 512, 512, 0, 2, 3, 1);
+BM_TRANSPOSE4D(1, 64, 64, 512, 0, 3, 1, 2);
+BM_TRANSPOSE4D(1, 512, 64, 64, 0, 2, 3, 1);
 BM_TRANSPOSE2D(128, 128);
mace/ops/transpose_test.cc  (+37 -2)

@@ -37,16 +37,51 @@ void TransposeNCHWTest(const std::vector<index_t> &input_shape) {
   // Run on cpu
   net.RunOp();

-  net.FillNHWCInputToNCHWInput<DeviceType::CPU, float>("InputNCHW", "Input");
+  net.TransformDataFormat<DeviceType::CPU, float>(
+      "Input", DataFormat::NHWC, "InputNCHW", DataFormat::NCHW);

   ExpectTensorNear<float>(*net.GetOutput("InputNCHW"),
                           *net.GetOutput("Output"));
 }

+void TransposeNHWCTest(const std::vector<index_t> &input_shape) {
+  // Construct graph
+  OpsTestNet net;
+
+  // Add input data
+  net.AddRandomInput<CPU, float>("Input", input_shape);
+
+  OpDefBuilder("Transpose", "TransposeNHWCTest")
+      .Input("Input")
+      .Output("Output")
+      .AddIntsArg("dims", {0, 2, 3, 1})
+      .Finalize(net.NewOperatorDef());
+
+  // Run on cpu
+  net.RunOp();
+
+  net.TransformDataFormat<DeviceType::CPU, float>(
+      "Input", DataFormat::NCHW, "InputNHWC", DataFormat::NHWC);
+
+  ExpectTensorNear<float>(*net.GetOutput("InputNHWC"),
+                          *net.GetOutput("Output"));
+}
+
 }  // namespace

-TEST_F(TransposeOpTest, NCHW) {
+TEST_F(TransposeOpTest, NHWC_to_NCHW) {
   TransposeNCHWTest({3, 64, 64, 128});
   TransposeNCHWTest({1, 64, 48, 128});
+  TransposeNCHWTest({1, 512, 512, 3});
+  TransposeNCHWTest({2, 512, 512, 3});
 }

+TEST_F(TransposeOpTest, NCHW_to_NHWC) {
+  TransposeNHWCTest({1, 2, 512, 512});
+  TransposeNHWCTest({1, 3, 512, 512});
+  TransposeNHWCTest({2, 2, 512, 512});
+}
+
 TEST_F(TransposeOpTest, Rank2) {
浏览文件 @
81865af1
...
...
@@ -40,11 +40,6 @@ FLAGS = None
device_type_map
=
{
'cpu'
:
cvt
.
DeviceType
.
CPU
.
value
,
'gpu'
:
cvt
.
DeviceType
.
GPU
.
value
,
'dsp'
:
cvt
.
DeviceType
.
HEXAGON
.
value
}
device_data_type_map
=
{
cvt
.
DeviceType
.
CPU
.
value
:
mace_pb2
.
DT_FLOAT
,
cvt
.
DeviceType
.
GPU
.
value
:
mace_pb2
.
DT_HALF
,
cvt
.
DeviceType
.
HEXAGON
.
value
:
mace_pb2
.
DT_UINT8
}
def
file_checksum
(
fname
):
...
...
@@ -129,6 +124,17 @@ def main(unused_args):
FLAGS
.
weight_file
)
output_graph_def
=
converter
.
run
()
if
FLAGS
.
gpu_data_type
==
'half'
:
gpu_data_type
=
mace_pb2
.
DT_HALF
else
:
gpu_data_type
=
mace_pb2
.
DT_FLOAT
device_data_type_map
=
{
cvt
.
DeviceType
.
CPU
.
value
:
mace_pb2
.
DT_FLOAT
,
cvt
.
DeviceType
.
GPU
.
value
:
gpu_data_type
,
cvt
.
DeviceType
.
HEXAGON
.
value
:
mace_pb2
.
DT_UINT8
}
print
(
"Transform model to one that can better run on device"
)
if
not
FLAGS
.
runtime
:
cpu_graph_def
=
copy
.
deepcopy
(
output_graph_def
)
...
...
@@ -180,7 +186,7 @@ def main(unused_args):
tensor_util
.
rename_tensor
(
output_graph_def
)
tensor_infos
,
model_data
=
tensor_util
.
get_tensor_info_and_model_data
(
output_graph_def
,
FLAGS
.
runtime
)
output_graph_def
,
FLAGS
.
runtime
,
FLAGS
.
gpu_data_type
)
source_converter_lib
.
convert_to_source
(
output_graph_def
,
model_checksum
,
weight_checksum
,
FLAGS
.
template
,
...
...
@@ -194,7 +200,10 @@ def main(unused_args):
f
.
write
(
bytearray
(
model_data
))
if
FLAGS
.
model_load_type
==
'pb'
:
tensor_util
.
del_tensor_data
(
output_graph_def
,
FLAGS
.
runtime
)
tensor_util
.
del_tensor_data
(
output_graph_def
,
FLAGS
.
runtime
,
FLAGS
.
gpu_data_type
)
tensor_util
.
update_tensor_data_type
(
output_graph_def
,
FLAGS
.
runtime
,
FLAGS
.
gpu_data_type
)
with
open
(
FLAGS
.
pb_output
,
"wb"
)
as
f
:
f
.
write
(
output_graph_def
.
SerializeToString
())
# with open(FLAGS.pb_output + '_txt', "wb") as f:
...
...
@@ -253,8 +262,6 @@ def parse_args():
help
=
"e.g., input_node"
)
parser
.
add_argument
(
"--output_node"
,
type
=
str
,
default
=
"softmax"
,
help
=
"e.g., softmax"
)
parser
.
add_argument
(
"--output_type"
,
type
=
str
,
default
=
"pb"
,
help
=
"output type: source/pb"
)
parser
.
add_argument
(
"--template"
,
type
=
str
,
default
=
""
,
help
=
"template path"
)
parser
.
add_argument
(
...
...
@@ -293,6 +300,8 @@ def parse_args():
default
=
"source"
,
help
=
"[source|pb] Load models in generated `source` code"
+
"or `pb` file."
)
parser
.
add_argument
(
"--gpu_data_type"
,
type
=
str
,
default
=
"half"
,
help
=
"half/float"
)
return
parser
.
parse_known_args
()
...
...
mace/python/tools/converter_tool/base_converter.py  (+10 -8)

@@ -153,14 +153,15 @@ class TransformerRule(Enum):
     TRANSFORM_GPU_WINOGRAD = 8
     TRANSFORM_ADD_TO_BIASADD = 9
     FOLD_BIASADD = 10
-    FOLD_ACTIVATION = 11
-    TRANSPOSE_FILTERS = 12
-    RESHAPE_FC_WEIGHT = 13
-    TRANSPOSE_DATA_FORMAT = 14
-    TRANSFORM_GLOBAL_CONV_TO_FC = 15
-    TRANSFORM_BUFFER_IMAGE = 16
-    ADD_DEVICE_AND_DATA_TYPE = 17
-    SORT_BY_EXECUTION = 18
+    FLATTEN_ATROUS_CONV = 11
+    FOLD_ACTIVATION = 12
+    TRANSPOSE_FILTERS = 13
+    RESHAPE_FC_WEIGHT = 14
+    TRANSPOSE_DATA_FORMAT = 15
+    TRANSFORM_GLOBAL_CONV_TO_FC = 16
+    TRANSFORM_BUFFER_IMAGE = 17
+    ADD_DEVICE_AND_DATA_TYPE = 18
+    SORT_BY_EXECUTION = 19

 class ConverterInterface(object):
@@ -218,6 +219,7 @@ class ConverterOption(object):
             TransformerRule.TRANSFORM_GPU_WINOGRAD,
             TransformerRule.TRANSFORM_ADD_TO_BIASADD,
             TransformerRule.FOLD_BIASADD,
+            TransformerRule.FLATTEN_ATROUS_CONV,
             TransformerRule.FOLD_ACTIVATION,
             TransformerRule.TRANSPOSE_FILTERS,
             TransformerRule.TRANSPOSE_DATA_FORMAT,
mace/python/tools/converter_tool/tensorflow_converter.py  (+137 -88)

@@ -16,6 +16,7 @@
 import math
 import numpy as np
 import tensorflow as tf
+from enum import Enum

 from mace.proto import mace_pb2
 from mace.python.tools.converter_tool import base_converter
@@ -41,6 +42,50 @@ tf_epsilon_str = 'epsilon'
 tf_align_corners = 'align_corners'
 tf_block_size = 'block_size'

+TFSupportedOps = [
+    'Conv2D',
+    'DepthwiseConv2dNative',
+    'Conv2DBackpropInput',
+    'BiasAdd',
+    'Add',
+    'Sub',
+    'Mul',
+    'Div',
+    'Min',
+    'Max',
+    'Neg',
+    'Abs',
+    'RealDiv',
+    'SquaredDifference',
+    'Pow',
+    'Relu',
+    'Relu6',
+    'Tanh',
+    'Sigmoid',
+    'FusedBatchNorm',
+    'AvgPool',
+    'MaxPool',
+    'Squeeze',
+    'MatMul',
+    'Identity',
+    'Reshape',
+    'Shape',
+    'Transpose',
+    'Softmax',
+    'ResizeBilinear',
+    'Placeholder',
+    'SpaceToBatchND',
+    'BatchToSpaceND',
+    'DepthToSpace',
+    'SpaceToDepth',
+    'Pad',
+    'ConcatV2',
+    'Mean',
+    'Const',
+]
+
+TFOpType = Enum('TFOpType', [(op, op) for op in TFSupportedOps], type=str)

 class TensorflowConverter(base_converter.ConverterInterface):
     """A class for convert tensorflow frozen model to mace model.
@@ -53,71 +98,70 @@ class TensorflowConverter(base_converter.ConverterInterface):
         'FULL': PaddingMode.FULL
     }
     pooling_type_mode = {
-        'AvgPool': PoolingType.AVG,
-        'MaxPool': PoolingType.MAX
+        TFOpType.AvgPool.name: PoolingType.AVG,
+        TFOpType.MaxPool.name: PoolingType.MAX
     }
     eltwise_type = {
-        'Add': EltwiseType.SUM,
-        'Sub': EltwiseType.SUB,
-        'Mul': EltwiseType.PROD,
-        'Div': EltwiseType.DIV,
-        'Min': EltwiseType.MIN,
-        'Max': EltwiseType.MAX,
-        'Neg': EltwiseType.NEG,
-        'Abs': EltwiseType.ABS,
-        'RealDiv': EltwiseType.DIV,
-        'SquaredDifference': EltwiseType.SQR_DIFF,
-        'Pow': EltwiseType.POW
+        TFOpType.Add.name: EltwiseType.SUM,
+        TFOpType.Sub.name: EltwiseType.SUB,
+        TFOpType.Mul.name: EltwiseType.PROD,
+        TFOpType.Div.name: EltwiseType.DIV,
+        TFOpType.Min.name: EltwiseType.MIN,
+        TFOpType.Max.name: EltwiseType.MAX,
+        TFOpType.Neg.name: EltwiseType.NEG,
+        TFOpType.Abs.name: EltwiseType.ABS,
+        TFOpType.RealDiv.name: EltwiseType.DIV,
+        TFOpType.SquaredDifference.name: EltwiseType.SQR_DIFF,
+        TFOpType.Pow.name: EltwiseType.POW
     }
     activation_type = {
-        'Relu': ActivationType.RELU,
-        'Relu6': ActivationType.RELUX,
-        'Tanh': ActivationType.TANH,
-        'Sigmoid': ActivationType.SIGMOID
+        TFOpType.Relu.name: ActivationType.RELU,
+        TFOpType.Relu6.name: ActivationType.RELUX,
+        TFOpType.Tanh.name: ActivationType.TANH,
+        TFOpType.Sigmoid.name: ActivationType.SIGMOID
     }

     def __init__(self, option, src_model_file):
         self._op_converters = {
-            'Conv2D': self.convert_conv2d,
-            'DepthwiseConv2dNative': self.convert_conv2d,
-            'Conv2DBackpropInput': self.convert_conv2d,
-            'BiasAdd': self.convert_biasadd,
-            'Add': self.convert_add,
-            'Sub': self.convert_elementwise,
-            'Mul': self.convert_elementwise,
-            'Div': self.convert_elementwise,
-            'Min': self.convert_elementwise,
-            'Max': self.convert_elementwise,
-            'Neg': self.convert_elementwise,
-            'Abs': self.convert_elementwise,
-            'RealDiv': self.convert_elementwise,
-            'SquaredDifference': self.convert_elementwise,
-            'Pow': self.convert_elementwise,
-            'Relu': self.convert_activation,
-            'Relu6': self.convert_activation,
-            'Tanh': self.convert_activation,
-            'Sigmoid': self.convert_activation,
-            'FusedBatchNorm': self.convert_fused_batchnorm,
-            'AvgPool': self.convert_pooling,
-            'MaxPool': self.convert_pooling,
-            'Squeeze': self.convert_identity,
-            'MatMul': self.convert_matmul,
-            'Identity': self.convert_identity,
-            'Reshape': self.convert_reshape,
-            'Shape': self.convert_nop,
-            'Transpose': self.convert_transpose,
-            'Softmax': self.convert_softmax,
-            'ResizeBilinear': self.convert_resize_bilinear,
-            'Placeholder': self.convert_nop,
-            'SpaceToBatchND': self.convert_space_batch,
-            'BatchToSpaceND': self.convert_space_batch,
-            'DepthToSpace': self.convert_space_depth,
-            'SpaceToDepth': self.convert_space_depth,
-            'Pad': self.convert_pad,
-            'ConcatV2': self.convert_concat,
-            'Mean': self.convert_mean,
-            # Const converter_tool should be placed at the end
-            'Const': self.convert_tensor,
+            TFOpType.Conv2D.name: self.convert_conv2d,
+            TFOpType.DepthwiseConv2dNative.name: self.convert_conv2d,
+            TFOpType.Conv2DBackpropInput.name: self.convert_conv2d,
+            TFOpType.BiasAdd.name: self.convert_biasadd,
+            TFOpType.Add.name: self.convert_add,
+            TFOpType.Sub.name: self.convert_elementwise,
+            TFOpType.Mul.name: self.convert_elementwise,
+            TFOpType.Div.name: self.convert_elementwise,
+            TFOpType.Min.name: self.convert_elementwise,
+            TFOpType.Max.name: self.convert_elementwise,
+            TFOpType.Neg.name: self.convert_elementwise,
+            TFOpType.Abs.name: self.convert_elementwise,
+            TFOpType.RealDiv.name: self.convert_elementwise,
+            TFOpType.SquaredDifference.name: self.convert_elementwise,
+            TFOpType.Pow.name: self.convert_elementwise,
+            TFOpType.Relu.name: self.convert_activation,
+            TFOpType.Relu6.name: self.convert_activation,
+            TFOpType.Tanh.name: self.convert_activation,
+            TFOpType.Sigmoid.name: self.convert_activation,
+            TFOpType.FusedBatchNorm.name: self.convert_fused_batchnorm,
+            TFOpType.AvgPool.name: self.convert_pooling,
+            TFOpType.MaxPool.name: self.convert_pooling,
+            TFOpType.Squeeze.name: self.convert_identity,
+            TFOpType.MatMul.name: self.convert_matmul,
+            TFOpType.Identity.name: self.convert_identity,
+            TFOpType.Reshape.name: self.convert_reshape,
+            TFOpType.Shape.name: self.convert_nop,
+            TFOpType.Transpose.name: self.convert_transpose,
+            TFOpType.Softmax.name: self.convert_softmax,
+            TFOpType.ResizeBilinear.name: self.convert_resize_bilinear,
+            TFOpType.Placeholder.name: self.convert_nop,
+            TFOpType.SpaceToBatchND.name: self.convert_space_batch,
+            TFOpType.BatchToSpaceND.name: self.convert_space_batch,
+            TFOpType.DepthToSpace.name: self.convert_space_depth,
+            TFOpType.SpaceToDepth.name: self.convert_space_depth,
+            TFOpType.Pad.name: self.convert_pad,
+            TFOpType.ConcatV2.name: self.convert_concat,
+            TFOpType.Mean.name: self.convert_mean,
+            TFOpType.Const.name: self.convert_nop,
         }
         self._option = option
         self._mace_net_def = mace_pb2.NetDef()
@@ -180,24 +224,29 @@ class TensorflowConverter(base_converter.ConverterInterface):
                        "Mace does not support tensorflow op type %s yet"
                        % tf_op.type)
             self._op_converters[tf_op.type](tf_op)
+        self.convert_tensors()

-    def convert_tensor(self, tf_op):
-        output_name = tf_op.outputs[0].name
-        if output_name not in self._skip_tensor:
-            tensor = self._mace_net_def.tensors.add()
-            tensor.name = tf_op.outputs[0].name
-
-            tf_tensor = tf_op.outputs[0].eval()
-            tensor.dims.extend(list(tf_tensor.shape))
-
-            tf_dt = tf_op.get_attr('dtype')
-            if tf_dt == tf.float32:
-                tensor.data_type = mace_pb2.DT_FLOAT
-                tensor.float_data.extend(tf_tensor.astype(np.float32).flat)
-            elif tf_dt == tf.int32:
-                tensor.data_type = mace_pb2.DT_INT32
-                tensor.int32_data.extend(tf_tensor.astype(np.int32).flat)
-            else:
-                mace_check(False, "Not supported tensor type: %s" % tf_dt.name)
+    def convert_tensors(self):
+        for tf_op in self._tf_graph.get_operations():
+            if tf_op.type != TFOpType.Const.name:
+                continue
+            output_name = tf_op.outputs[0].name
+            if output_name not in self._skip_tensor:
+                tensor = self._mace_net_def.tensors.add()
+                tensor.name = tf_op.outputs[0].name
+
+                tf_tensor = tf_op.outputs[0].eval()
+                tensor.dims.extend(list(tf_tensor.shape))
+
+                tf_dt = tf_op.get_attr('dtype')
+                if tf_dt == tf.float32:
+                    tensor.data_type = mace_pb2.DT_FLOAT
+                    tensor.float_data.extend(
+                        tf_tensor.astype(np.float32).flat)
+                elif tf_dt == tf.int32:
+                    tensor.data_type = mace_pb2.DT_INT32
+                    tensor.int32_data.extend(
+                        tf_tensor.astype(np.int32).flat)
+                else:
+                    mace_check(False,
+                               "Not supported tensor type: %s" % tf_dt.name)

     def add_tensor(self, name, shape, data_type, value):
         tensor = self._mace_net_def.tensors.add()
@@ -229,9 +278,9 @@ class TensorflowConverter(base_converter.ConverterInterface):
     def convert_conv2d(self, tf_op):
         op = self.convert_general_op(tf_op)
-        if tf_op.type == 'DepthwiseConv2dNative':
+        if tf_op.type == TFOpType.DepthwiseConv2dNative.name:
             op.type = MaceOp.DepthwiseConv2d.name
-        elif tf_op.type == 'Conv2DBackpropInput':
+        elif tf_op.type == TFOpType.Conv2DBackpropInput.name:
             op.type = MaceOp.Deconv2D.name
         else:
             op.type = MaceOp.Conv2D.name
@@ -274,7 +323,7 @@ class TensorflowConverter(base_converter.ConverterInterface):
             type_arg.name = MaceKeyword.mace_activation_type_str
             type_arg.s = self.activation_type[tf_op.type].name

-        if tf_op.type == 'Relu6':
+        if tf_op.type == TFOpType.Relu6.name:
             limit_arg = op.arg.add()
             limit_arg.name = MaceKeyword.mace_activation_max_limit_str
             limit_arg.f = 6.0
@@ -335,7 +384,7 @@ class TensorflowConverter(base_converter.ConverterInterface):
         size_arg.name = MaceKeyword.mace_resize_size_str
         size_value = tf_op.inputs[1].eval().astype(np.int32)
         size_arg.ints.extend(size_value)
-        self._skip_tensor.update(tf_op.inputs[1].name)
+        self._skip_tensor.add(tf_op.inputs[1].name)
         align_corners_arg = op.arg.add()
         align_corners_arg.name = MaceKeyword.mace_align_corners_str
         align_corners_arg.i = tf_op.get_attr(tf_align_corners)
@@ -357,7 +406,7 @@ class TensorflowConverter(base_converter.ConverterInterface):
         size_arg.ints.extend(size_value)

         crops_or_paddings_arg = op.arg.add()
-        if op.type == 'BatchToSpaceND':
+        if op.type == TFOpType.BatchToSpaceND.name:
             op.type = MaceOp.BatchToSpaceND.name
             crops_or_paddings_arg.name = \
                 MaceKeyword.mace_batch_to_space_crops_str
@@ -367,12 +416,12 @@ class TensorflowConverter(base_converter.ConverterInterface):
         crops_or_paddings_value = tf_op.inputs[2].eval().astype(
             np.int32).flat
         crops_or_paddings_arg.ints.extend(crops_or_paddings_value)

-        self._skip_tensor.update(tf_op.inputs[1].name)
-        self._skip_tensor.update(tf_op.inputs[2].name)
+        self._skip_tensor.add(tf_op.inputs[1].name)
+        self._skip_tensor.add(tf_op.inputs[2].name)

     def convert_space_depth(self, tf_op):
         op = self.convert_general_op(tf_op)
-        if op.type == 'SpaceToDepth':
+        if op.type == TFOpType.SpaceToDepth.name:
             op.type = MaceOp.SpaceToDepth.name
         else:
             op.type = MaceOp.DepthToSpace.name
@@ -390,14 +439,14 @@ class TensorflowConverter(base_converter.ConverterInterface):
         paddings_arg.name = MaceKeyword.mace_paddings_str
         paddings_value = tf_op.inputs[1].eval().astype(np.int32).flat
         paddings_arg.ints.extend(paddings_value)
-        self._skip_tensor.update(tf_op.inputs[1].name)
+        self._skip_tensor.add(tf_op.inputs[1].name)

         if len(tf_op.inputs) == 3:
             constant_value_arg = op.arg.add()
             constant_value_arg.name = MaceKeyword.mace_constant_value_str
             constant_value = tf_op.inputs[2].eval().astype(np.int32).flat[0]
             constant_value_arg.i = constant_value
-            self._skip_tensor.update(tf_op.inputs[2].name)
+            self._skip_tensor.add(tf_op.inputs[2].name)

     def convert_concat(self, tf_op):
         op = self.convert_general_op(tf_op)
@@ -412,7 +461,7 @@ class TensorflowConverter(base_converter.ConverterInterface):
         mace_check(axis == 3, "only support concat at channel dimension")

-        self._skip_tensor.update(tf_op.inputs[-1].name)
+        self._skip_tensor.add(tf_op.inputs[-1].name)

     def convert_matmul(self, tf_op):
         op = self.convert_general_op(tf_op)
@@ -426,13 +475,13 @@ class TensorflowConverter(base_converter.ConverterInterface):
         shape_arg = op.arg.add()
         shape_arg.name = MaceKeyword.mace_shape_str
         shape_value = []
-        if tf_op.inputs[1].op.type == 'Const':
+        if tf_op.inputs[1].op.type == TFOpType.Const.name:
             shape_value = list(tf_op.inputs[1].eval().astype(np.int32))
             for i in xrange(len(shape_value)):
                 if shape_value[i] == -1:
                     shape_value[i] = 1
-            self._skip_tensor.update(tf_op.inputs[-1].name)
-        elif tf_op.inputs[1].op.type == 'Shape':
+            self._skip_tensor.add(tf_op.inputs[-1].name)
+        elif tf_op.inputs[1].op.type == TFOpType.Shape.name:
            shape_value = list(tf_op.inputs[1].op.inputs[0].shape.as_list())

         shape_arg.ints.extend(shape_value)
mace/python/tools/converter_tool/transformer.py  (+62 -0)

@@ -66,6 +66,8 @@ class Transformer(base_converter.ConverterInterface):
                 TransformerRule.TRANSFORM_ADD_TO_BIASADD,
                 TransformerRule.FOLD_BIASADD,
+                TransformerRule.FLATTEN_ATROUS_CONV,
                 TransformerRule.FOLD_ACTIVATION,
                 TransformerRule.TRANSPOSE_FILTERS,
                 TransformerRule.TRANSPOSE_DATA_FORMAT,
                 TransformerRule.TRANSFORM_GLOBAL_CONV_TO_FC,
@@ -93,6 +95,7 @@ class Transformer(base_converter.ConverterInterface):
             TransformerRule.TRANSFORM_ADD_TO_BIASADD:
                 self.transform_add_to_biasadd,
             TransformerRule.FOLD_BIASADD: self.fold_biasadd,
+            TransformerRule.FLATTEN_ATROUS_CONV: self.flatten_atrous_conv,
             TransformerRule.FOLD_ACTIVATION: self.fold_activation,
             TransformerRule.TRANSPOSE_FILTERS: self.transpose_filters,
             TransformerRule.TRANSPOSE_DATA_FORMAT: self.transpose_data_format,
@@ -616,6 +619,65 @@ class Transformer(base_converter.ConverterInterface):
         return False

+    def flatten_atrous_conv(self):
+        if self._option.device != DeviceType.GPU.value:
+            return
+
+        net = self._model
+        for op in net.op:
+            if (op.type == MaceOp.SpaceToBatchND.name
+                    and len(self._consumers.get(op.output[0], [])) == 1):
+                conv_op = self._consumers.get(op.output[0])[0]
+                if (conv_op.type == MaceOp.Conv2D.name
+                        or conv_op.type == MaceOp.DepthwiseConv2d.name) \
+                        and len(self._consumers.get(conv_op.output[0], [])) == 1:  # noqa
+                    b2s_op = self._consumers.get(conv_op.output[0])[0]
+                    if b2s_op.type == MaceOp.BatchToSpaceND.name:
+                        print "Flatten atrous convolution"
+                        # Add args.
+                        padding_arg_values = ConverterUtil.get_arg(
+                            op, MaceKeyword.mace_paddings_str).ints
+                        blocks_arg_values = ConverterUtil.get_arg(
+                            b2s_op,
+                            MaceKeyword.mace_space_batch_block_shape_str).ints
+                        dilation_arg = ConverterUtil.get_arg(
+                            conv_op, MaceKeyword.mace_dilations_str)
+                        if dilation_arg is None:
+                            dilation_arg = conv_op.arg.add()
+                        dilation_arg.name = MaceKeyword.mace_dilations_str
+                        dilation_arg.ints[:] = blocks_arg_values
+
+                        padding_arg = ConverterUtil.get_arg(
+                            conv_op, MaceKeyword.mace_padding_str)
+                        if padding_arg is None:
+                            padding_arg = conv_op.arg.add()
+                        padding_arg.name = MaceKeyword.mace_padding_str
+                        if len(padding_arg_values) > 0 \
+                                and padding_arg_values[0] > 0:
+                            padding_arg.i = PaddingMode.SAME.value
+                        else:
+                            padding_arg.i = PaddingMode.VALID.value
+
+                        strides_arg = ConverterUtil.get_arg(
+                            conv_op, MaceKeyword.mace_strides_str)
+                        if strides_arg is None:
+                            strides_arg = conv_op.arg.add()
+                        strides_arg.name = MaceKeyword.mace_strides_str
+                        strides_arg.ints[:] = [1, 1]
+
+                        # update output shape
+                        conv_op.output_shape[0].dims[:] = \
+                            b2s_op.output_shape[0].dims[:]
+
+                        self.safe_remove_node(op, None)
+                        self.safe_remove_node(b2s_op, conv_op)
+                        return True
+        return False

     def fold_activation(self):
         net = self._model
         for op in net.op:
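The new `flatten_atrous_conv` pass relies on the standard equivalence between a SpaceToBatchND, stride-1 convolution, BatchToSpaceND chain and a dilated (atrous) convolution. In 1-D notation (mine, not from the source), a rate-$r$ dilated convolution computes

$$y[i] = \sum_{j=0}^{k-1} w[j]\, x[i + r\, j],$$

so a $k$-tap kernel covers an effective extent of $k_{\mathrm{eff}} = k + (k-1)(r-1)$ input samples. Rearranging the input with block shape $r$ lets a stride-1, dilation-1 convolution produce the same values, which is why the pass can delete the two space/batch ops, copy the block shape into the conv op's `dilations` argument, and reset its strides to $[1, 1]$.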
mace/python/tools/source_converter_lib.py  (+0 -1)

@@ -27,7 +27,6 @@ def convert_to_source(net_def, model_checksum, weight_checksum, template_dir,
                       obfuscate, model_tag, output, runtime, embed_model_data,
                       winograd_conv, model_load_type, tensor_infos,
                       model_data):
     # Capture our current directory
     print template_dir
mace/python/tools/tensor_util.py  (+11 -9)

@@ -105,11 +105,11 @@ def rename_tensor(net_def):
 class TensorInfo:
-    def __init__(self, id, t, runtime):
+    def __init__(self, id, t, runtime, gpu_data_type):
         self.id = id
         self.data_type = mace_pb2.DataType.Name(t.data_type)
         if t.data_type == mace_pb2.DT_FLOAT:
-            if runtime == 'gpu':
+            if runtime == 'gpu' and gpu_data_type == 'half':
                 self.data_type = mace_pb2.DT_HALF
                 self.data = bytearray(
                     np.array(t.float_data).astype(np.float16).tobytes())
@@ -127,13 +127,13 @@ class TensorInfo:
             raise Exception('Tensor data type %s not supported' % t.data_type)

-def get_tensor_info_and_model_data(net_def, runtime):
+def get_tensor_info_and_model_data(net_def, runtime, gpu_data_type):
     model_data = []
     offset = 0
     counter = 0
     tensor_infos = []
     for t in net_def.tensors:
-        tensor_info = TensorInfo(counter, t, runtime)
+        tensor_info = TensorInfo(counter, t, runtime, gpu_data_type)
         tensor_infos.append(tensor_info)
         # align
         if tensor_info.data_type != 'DT_UINT8' and offset % 4 != 0:
@@ -156,15 +156,17 @@ def get_tensor_info_and_model_data(net_def, runtime):
     return tensor_infos, model_data

-def del_tensor_data(net_def, runtime):
+def del_tensor_data(net_def, runtime, gpu_data_type):
     for t in net_def.tensors:
         if t.data_type == mace_pb2.DT_FLOAT:
             del t.float_data[:]
-            if runtime == 'gpu':
-                t.data_type = mace_pb2.DT_HALF
-            else:
-                t.data_type = mace_pb2.DT_FLOAT
         elif t.data_type == mace_pb2.DT_INT32:
             del t.int32_data[:]
         elif t.data_type == mace_pb2.DT_UINT8:
             del t.int32_data[:]

+def update_tensor_data_type(net_def, runtime, gpu_data_type):
+    for t in net_def.tensors:
+        if t.data_type == mace_pb2.DT_FLOAT and runtime == 'gpu' \
+                and gpu_data_type == 'half':
+            t.data_type = mace_pb2.DT_HALF
tools/mace_tools.py  (+7 -1)

@@ -538,6 +538,11 @@ def parse_args():
         default="source",
         help="[source|pb] Load models in generated `source` code" +
              "or `pb` file.")
+    parser.add_argument(
+        "--gpu_data_type",
+        type=str,
+        default="half",
+        help="[half | float].")
     return parser.parse_known_args()
@@ -809,7 +814,8 @@ def main(unused_args):
                 model_config["fast_conv"],
                 model_config["obfuscate"],
                 model_output_base_dir,
-                FLAGS.model_load_type)
+                FLAGS.model_load_type,
+                FLAGS.gpu_data_type)

     for target_abi in configs["target_abis"]:
         for target_soc in target_socs:
tools/sh_commands.py  (+3 -1)

@@ -470,7 +470,8 @@ def gen_model_code(model_codegen_dir,
                    fast_conv,
                    obfuscate,
                    model_output_dir,
-                   model_load_type):
+                   model_load_type,
+                   gpu_data_type):
     print("* Genearte model code")
     bazel_build_common("//mace/python/tools:converter")
@@ -499,6 +500,7 @@ def gen_model_code(model_codegen_dir,
         "--codegen_output=%s/model.cc" % model_codegen_dir,
         "--pb_output=%s/%s.pb" % (model_output_dir, model_tag),
         "--model_load_type=%s" % model_load_type,
+        "--gpu_data_type=%s" % gpu_data_type,
         _out=process_output,
         _bg=True,
         _err_to_out=True)