Commit 81865af1 authored by yejianwu

Merge branch 'master' of v9.git.n.xiaomi.com:deep-computing/mace into load_model_in_pb

@@ -12,12 +12,17 @@
mobile heterogeneous computing platforms. The design is focused on the following
targets:
* Performance
-  * The runtime is highly optimized with NEON, OpenCL and HVX. Except for the
-    inference speed, the initialization speed is also intensively optimized.
+  * The runtime is highly optimized with NEON, OpenCL and Hexagon, and the
+    [Winograd algorithm](https://arxiv.org/abs/1509.09308) is introduced to
+    speed up the convolution operations. Besides the inference speed, the
+    initialization speed is also intensively optimized.
* Power consumption
-  * Chip dependent power options are included as advanced APIs.
+  * Chip dependent power options like big.LITTLE scheduling and Adreno GPU
+    hints are included as advanced APIs.
* Memory usage and library footprint
  * Graph level memory allocation optimization and buffer reuse are supported.
+    The core library tries to keep minimal external dependencies to keep the
+    library footprint small.
* Model protection
  * Model protection is one of the highest priority features from the beginning
    of the design. Various techniques are introduced, like converting models to C++
@@ -28,31 +33,34 @@ targets:
architectures with limited performance.

## Getting Started
+* [Introduction](docs/getting_started/introduction.rst)
+* [How to build](docs/getting_started/how_to_build.rst)
+* [Create a model deployment file](docs/getting_started/create_a_model_deployment.rst)

## Performance
-[MiAI Model Zoo](http://v9.git.n.xiaomi.com/deep-computing/mace-models) contains
-several common neural networks models and built daily against several mobile
-phones. The benchmark result can be found in the CI result page.
+[MiAI Compute Engine Model Zoo](http://v9.git.n.xiaomi.com/deep-computing/mace-models)
+contains several common neural network models, which are built daily against a
+list of mobile phones. The benchmark results can be found in the CI result page.

## Communication
* GitHub issues: bug reports, usage issues, feature requests
-* Gitter or Slack:
-* QQ群:
+* Gitter:
+* QQ group: 756046893

## Contributing
Any kind of contribution is welcome. For bug reports and feature requests,
please just open an issue without any hesitation. For code contributions, it's
strongly suggested to open an issue for discussion first. For more details,
-please refer to [this guide](docs).
+please refer to [the contribution guide](docs/development/contributing.md).

## License
[Apache License 2.0](LICENSE).

## Acknowledgement
-*MiAI Compute Engine* depends on several open source projects located in
+MiAI Compute Engine depends on several open source projects located in the
[third_party](mace/third_party) directory. Particularly, we learned a lot from
the following projects during the development:
-* [nnlib](https://source.codeaurora.org/quic/hexagon_nn/nnlib): the DSP runtime
-  depends on this library.
+* [Qualcomm Hexagon NN Offload Framework](https://source.codeaurora.org/quic/hexagon_nn/nnlib):
+  the Hexagon DSP runtime depends on this library.
* [TensorFlow](https://github.com/tensorflow/tensorflow),
  [Caffe](https://github.com/BVLC/caffe),
......
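As background for the Winograd claim in the Performance bullet above: the cited paper's 1-D minimal filtering instance F(2, 3) computes two outputs of a 3-tap filter with four multiplications instead of six. A sketch of the standard transforms (notation from Lavin and Gray, arXiv:1509.09308; the 2-D variant used for 3x3 convolutions nests the same matrices):

$$ Y = A^T \left[ (G g) \odot (B^T d) \right], \qquad
B^T = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}, \quad
G = \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}, \quad
A^T = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix} $$

where $d$ is a 4-element input tile, $g$ is the 3-tap filter, and $\odot$ is element-wise multiplication.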
@@ -5,10 +5,12 @@ v0.6.0 (2018-04-04)
------
1. Change mace header interfaces, only including necessary methods.

+v0.6.3 (2018-05-21)
+------
+1. Support `float` data_type when running on GPU.
+
v0.7.0 (2018-05-18)
------
1. Change interfaces that report error types.
2. Improve CPU performance.
3. Merge the CPU/GPU engines into one.
\ No newline at end of file
FROM ubuntu:16.04

# Update source
+# Looks like mirrors.163.com does not work in the Xiaomi network
+# RUN sed -i 's/http:\/\/archive\.ubuntu\.com\/ubuntu\//http:\/\/mirrors\.163\.com\/ubuntu\//g' /etc/apt/sources.list
RUN apt-get update -y

## Basic tools
......
FROM cr.d.xiaomi.net/mace/mace-dev:latest

# Update source
+# Looks like mirrors.163.com does not work in the Xiaomi network
+# RUN sed -i 's/http:\/\/archive\.ubuntu\.com\/ubuntu\//http:\/\/mirrors\.163\.com\/ubuntu\//g' /etc/apt/sources.list
RUN apt-get update -y

# Install gitlab runner
......
docs/getting_started/mace-arch.png: image updated (18.2 KB -> 18.0 KB)
@@ -65,6 +65,18 @@ extern void Conv2dNeonK7x7S3(const float *input,
                             const index_t *out_shape,
                             float *output);
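// Note: in_shape/out_shape here are NCHW extents {N, C, H, W} (inferred from
// how the kernel definitions below index them: [1] = channels, [2] = height,
// [3] = width).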
extern void Conv2dNeonK1x15S1(const float *input,
const float *filter,
const index_t *in_shape,
const index_t *out_shape,
float *output);
extern void Conv2dNeonK15x1S1(const float *input,
const float *filter,
const index_t *in_shape,
const index_t *out_shape,
float *output);
} // namespace kernels
} // namespace mace
......
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/utils/utils.h"
namespace mace {
namespace kernels {
inline void Conv2dCPUK15x1Calc(const float *in_ptr,
const float *filter_ptr,
const index_t in_width,
const index_t in_channels,
const index_t out_height,
const index_t out_width,
const index_t w,
const index_t tile_width,
const index_t out_image_size,
float *out_ptr,
const index_t io,
const int stride) {
for (index_t ih = 0; ih < out_height; ++ih) {
for (index_t iw = 0; iw < tile_width && w + iw < out_width; ++iw) {
for (int i = 0; i < 15; ++i) {
for (int j = 0; j < 1; ++j) {
out_ptr[io * out_image_size + ih * out_width + w + iw]
+= in_ptr[(ih * stride + i) * in_width + ((w + iw) * stride + j)]
* filter_ptr[io * in_channels * 15 + i * 1 + j];
}
}
}
}
}
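The scalar fallback above is the direct form of the 15x1 convolution; with stride 1 (as this kernel is dispatched), each output element reduces to

$$ \text{out}[h, w] \mathrel{+}= \sum_{i=0}^{14} \text{in}[h + i,\, w] \cdot f[i] $$

and the NEON path below computes the same sum for four output rows at once.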
// Ho = 4, Wo = 1, Co = 1
void Conv2dNeonK15x1S1(const float *input,
const float *filter,
const index_t *in_shape,
const index_t *out_shape,
float *output) {
const index_t in_image_size = in_shape[2] * in_shape[3];
const index_t out_image_size = out_shape[2] * out_shape[3];
const index_t in_batch_size = in_shape[1] * in_image_size;
const index_t out_batch_size = out_shape[1] * out_image_size;
const index_t tile_width =
out_shape[1] < 4 ? RoundUpDiv4(out_shape[3]) : out_shape[3];
#pragma omp parallel for collapse(3)
for (index_t b = 0; b < out_shape[0]; ++b) {
for (index_t m = 0; m < out_shape[1]; ++m) {
for (index_t w = 0; w < out_shape[3]; w += tile_width) {
const index_t out_height = out_shape[2];
const index_t out_width = out_shape[3];
const index_t in_channels = in_shape[1];
const index_t in_width = in_shape[3];
float *out_ptr_base =
output + b * out_batch_size + m * out_image_size;
for (index_t c = 0; c < in_channels; ++c) {
const float *in_ptr_base =
input + b * in_batch_size + c * in_image_size;
const float *filter_ptr = filter + m * in_channels * 15 + c * 15;
#if defined(MACE_ENABLE_NEON) && !defined(__aarch64__)
/* load filter (1 outch x 15 height x 1 width, in four quad registers) */
float32x4_t vf0, vf1, vf2, vf3;
vf0 = vld1q_f32(filter_ptr);
vf1 = vld1q_f32(filter_ptr + 4);
vf2 = vld1q_f32(filter_ptr + 8);
vf3 = vld1q_f32(filter_ptr + 11);  // overlaps vf2: holds taps 11-14
for (index_t h = 0; h + 3 < out_height; h += 4) {
for (index_t wt = 0; wt < tile_width && w + wt < out_width; ++wt) {
// load output
index_t out_offset = h * out_width + w + wt;
// output (1 outch x 4 height x 1 width): vo
float32x4_t vo = {out_ptr_base[out_offset],
out_ptr_base[out_offset + out_width],
out_ptr_base[out_offset + 2 * out_width],
out_ptr_base[out_offset + 3 * out_width]};
// input offset
index_t in_offset = h * in_width + w + wt;
// input (vertical sliding windows over 18 rows of this column)
float32x4_t vi0 = {in_ptr_base[in_offset],
in_ptr_base[in_offset + in_width],
in_ptr_base[in_offset + 2 * in_width],
in_ptr_base[in_offset + 3 * in_width]};
float32x4_t vi4 = {in_ptr_base[in_offset + 4 * in_width],
in_ptr_base[in_offset + 5 * in_width],
in_ptr_base[in_offset + 6 * in_width],
in_ptr_base[in_offset + 7 * in_width]};
float32x4_t vi8 = {in_ptr_base[in_offset + 8 * in_width],
in_ptr_base[in_offset + 9 * in_width],
in_ptr_base[in_offset + 10 * in_width],
in_ptr_base[in_offset + 11 * in_width]};
float32x4_t vi12 = {in_ptr_base[in_offset + 12 * in_width],
in_ptr_base[in_offset + 13 * in_width],
in_ptr_base[in_offset + 14 * in_width],
in_ptr_base[in_offset + 15 * in_width]};
float32x4_t vi16 = {in_ptr_base[in_offset + 16 * in_width],
in_ptr_base[in_offset + 17 * in_width]};
float32x4_t vi1 = vextq_f32(vi0, vi4, 1);
float32x4_t vi2 = vextq_f32(vi0, vi4, 2);
float32x4_t vi3 = vextq_f32(vi0, vi4, 3);
float32x4_t vi5 = vextq_f32(vi4, vi8, 1);
float32x4_t vi6 = vextq_f32(vi4, vi8, 2);
float32x4_t vi7 = vextq_f32(vi4, vi8, 3);
float32x4_t vi9 = vextq_f32(vi8, vi12, 1);
float32x4_t vi10 = vextq_f32(vi8, vi12, 2);
float32x4_t vi11 = vextq_f32(vi8, vi12, 3);
float32x4_t vi13 = vextq_f32(vi12, vi16, 1);
float32x4_t vi14 = vextq_f32(vi12, vi16, 2);
vo = vmlaq_lane_f32(vo, vi0, vget_low_f32(vf0), 0);
vo = vmlaq_lane_f32(vo, vi1, vget_low_f32(vf0), 1);
vo = vmlaq_lane_f32(vo, vi2, vget_high_f32(vf0), 0);
vo = vmlaq_lane_f32(vo, vi3, vget_high_f32(vf0), 1);
vo = vmlaq_lane_f32(vo, vi4, vget_low_f32(vf1), 0);
vo = vmlaq_lane_f32(vo, vi5, vget_low_f32(vf1), 1);
vo = vmlaq_lane_f32(vo, vi6, vget_high_f32(vf1), 0);
vo = vmlaq_lane_f32(vo, vi7, vget_high_f32(vf1), 1);
vo = vmlaq_lane_f32(vo, vi8, vget_low_f32(vf2), 0);
vo = vmlaq_lane_f32(vo, vi9, vget_low_f32(vf2), 1);
vo = vmlaq_lane_f32(vo, vi10, vget_high_f32(vf2), 0);
vo = vmlaq_lane_f32(vo, vi11, vget_high_f32(vf2), 1);
vo = vmlaq_lane_f32(vo, vi12, vget_low_f32(vf3), 1);
vo = vmlaq_lane_f32(vo, vi13, vget_high_f32(vf3), 0);
vo = vmlaq_lane_f32(vo, vi14, vget_high_f32(vf3), 1);
out_ptr_base[out_offset] = vo[0];
out_ptr_base[out_offset + out_width] = vo[1];
out_ptr_base[out_offset + 2 * out_width] = vo[2];
out_ptr_base[out_offset + 3 * out_width] = vo[3];
} // wt
} // h
#else
Conv2dCPUK15x1Calc(in_ptr_base, filter_ptr, in_width, in_channels,
out_height, out_width, w, tile_width,
out_image_size, out_ptr_base, 0, 1);
#endif
} // c
} // w
} // m
} // b
}
} // namespace kernels
} // namespace mace
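The NEON path above synthesizes the 15 shifted input windows (vi1, vi2, vi3, vi5, ...) from five loads using vextq_f32 instead of issuing 15 overlapping loads. A minimal, self-contained illustration of that intrinsic (hypothetical snippet, not part of the commit; builds only for an ARM/NEON target):

#include <arm_neon.h>
#include <cstdio>

int main() {
  const float data[8] = {0.f, 1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f};
  float32x4_t lo = vld1q_f32(data);      // lanes {0, 1, 2, 3}
  float32x4_t hi = vld1q_f32(data + 4);  // lanes {4, 5, 6, 7}
  // vextq_f32(lo, hi, n) treats lo:hi as one 8-lane vector and extracts
  // 4 lanes starting at lane n, i.e. the input window shifted by n.
  float32x4_t w1 = vextq_f32(lo, hi, 1);  // {1, 2, 3, 4}
  float32x4_t w3 = vextq_f32(lo, hi, 3);  // {3, 4, 5, 6}
  float out1[4], out3[4];
  vst1q_f32(out1, w1);
  vst1q_f32(out3, w3);
  printf("w1 = %g %g %g %g\n", out1[0], out1[1], out1[2], out1[3]);
  printf("w3 = %g %g %g %g\n", out3[0], out3[1], out3[2], out3[3]);
  return 0;
}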
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
#endif
#include "mace/kernels/arm/conv_2d_neon.h"
#include "mace/utils/utils.h"
#include "mace/utils/logging.h"
namespace mace {
namespace kernels {
inline void Conv2dCPUK1x15Calc(const float *in_ptr,
const float *filter_ptr,
const index_t in_width,
const index_t in_channels,
const index_t out_height,
const index_t h,
const index_t tile_height,
const index_t out_width,
const index_t out_image_size,
float *out_ptr,
const index_t io,
const int stride) {
for (index_t ih = 0; ih < tile_height && h + ih < out_height; ++ih) {
for (index_t iw = 0; iw < out_width; ++iw) {
for (int i = 0; i < 1; ++i) {
for (int j = 0; j < 15; ++j) {
out_ptr[io * out_image_size + (h + ih) * out_width + iw]
+= in_ptr[((h + ih) * stride + i) * in_width + (iw * stride + j)]
* filter_ptr[io * in_channels * 15 + i * 15 + j];
}
}
}
}
}
// Ho = 1, Wo = 4, Co = 1
void Conv2dNeonK1x15S1(const float *input,
const float *filter,
const index_t *in_shape,
const index_t *out_shape,
float *output) {
const index_t in_image_size = in_shape[2] * in_shape[3];
const index_t out_image_size = out_shape[2] * out_shape[3];
const index_t in_batch_size = in_shape[1] * in_image_size;
const index_t out_batch_size = out_shape[1] * out_image_size;
const index_t tile_height =
out_shape[1] < 4 ? RoundUpDiv4(out_shape[2]) : out_shape[2];
#pragma omp parallel for collapse(3)
for (index_t b = 0; b < out_shape[0]; ++b) {
for (index_t m = 0; m < out_shape[1]; ++m) {
for (index_t h = 0; h < out_shape[2]; h += tile_height) {
const index_t out_height = out_shape[2];
const index_t out_width = out_shape[3];
const index_t in_channels = in_shape[1];
const index_t in_width = in_shape[3];
float *out_ptr_base =
output + b * out_batch_size + m * out_image_size;
for (index_t c = 0; c < in_channels; ++c) {
const float *in_ptr_base =
input + b * in_batch_size + c * in_image_size;
const float *filter_ptr = filter + m * in_channels * 15 + c * 15;
#if defined(MACE_ENABLE_NEON) && !defined(__aarch64__)
/* load filter (1 outch x 1 height x 15 width, in four quad registers) */
float32x4_t vf0, vf1, vf2, vf3;
vf0 = vld1q_f32(filter_ptr);
vf1 = vld1q_f32(filter_ptr + 4);
vf2 = vld1q_f32(filter_ptr + 8);
vf3 = vld1q_f32(filter_ptr + 11);  // overlaps vf2: holds taps 11-14
for (index_t ht = 0; ht < tile_height && h + ht < out_height; ++ht) {
for (index_t w = 0; w + 3 < out_width; w += 4) {
// output (1 outch x 1 height x 4 width): vo_outch_height
float32x4_t vo;
// load output
index_t out_offset = (h + ht) * out_width + w;
vo = vld1q_f32(out_ptr_base + out_offset);
// input (horizontal sliding windows over 18 columns of this row)
float32x4_t vi0, vi1, vi2, vi3, vi4, vi5, vi6, vi7, vi8, vi9,
vi10, vi11, vi12, vi13, vi14, vi16;
// input offset
index_t in_offset = (h + ht) * in_width + w;
// load input
vi0 = vld1q_f32(in_ptr_base + in_offset);
vi4 = vld1q_f32(in_ptr_base + in_offset + 4);
vi8 = vld1q_f32(in_ptr_base + in_offset + 8);
vi12 = vld1q_f32(in_ptr_base + in_offset + 12);
vi16 = vld1q_f32(in_ptr_base + in_offset + 16);
vi1 = vextq_f32(vi0, vi4, 1);
vi2 = vextq_f32(vi0, vi4, 2);
vi3 = vextq_f32(vi0, vi4, 3);
vi5 = vextq_f32(vi4, vi8, 1);
vi6 = vextq_f32(vi4, vi8, 2);
vi7 = vextq_f32(vi4, vi8, 3);
vi9 = vextq_f32(vi8, vi12, 1);
vi10 = vextq_f32(vi8, vi12, 2);
vi11 = vextq_f32(vi8, vi12, 3);
vi13 = vextq_f32(vi12, vi16, 1);
vi14 = vextq_f32(vi12, vi16, 2);
vo = vmlaq_lane_f32(vo, vi0, vget_low_f32(vf0), 0);
vo = vmlaq_lane_f32(vo, vi1, vget_low_f32(vf0), 1);
vo = vmlaq_lane_f32(vo, vi2, vget_high_f32(vf0), 0);
vo = vmlaq_lane_f32(vo, vi3, vget_high_f32(vf0), 1);
vo = vmlaq_lane_f32(vo, vi4, vget_low_f32(vf1), 0);
vo = vmlaq_lane_f32(vo, vi5, vget_low_f32(vf1), 1);
vo = vmlaq_lane_f32(vo, vi6, vget_high_f32(vf1), 0);
vo = vmlaq_lane_f32(vo, vi7, vget_high_f32(vf1), 1);
vo = vmlaq_lane_f32(vo, vi8, vget_low_f32(vf2), 0);
vo = vmlaq_lane_f32(vo, vi9, vget_low_f32(vf2), 1);
vo = vmlaq_lane_f32(vo, vi10, vget_high_f32(vf2), 0);
vo = vmlaq_lane_f32(vo, vi11, vget_high_f32(vf2), 1);
vo = vmlaq_lane_f32(vo, vi12, vget_low_f32(vf3), 1);
vo = vmlaq_lane_f32(vo, vi13, vget_high_f32(vf3), 0);
vo = vmlaq_lane_f32(vo, vi14, vget_high_f32(vf3), 1);
vst1q_f32(out_ptr_base + out_offset, vo);
} // w
} // ht
#else
Conv2dCPUK1x15Calc(in_ptr_base, filter_ptr, in_width, in_channels,
out_height, h, tile_height, out_width,
out_image_size, out_ptr_base, 0, 1);
#endif
} // c
} // h
} // m
} // b
}
} // namespace kernels
} // namespace mace
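A note on the five input loads (vi0, vi4, vi8, vi12, vi16) shared by both kernels above: producing four outputs along the filtered axis with a 15-tap, stride-1, dilation-1 filter touches

$$ W_{\text{in}} = (W_{\text{out}} - 1)\, s + (K - 1)\, d + 1 = (4 - 1) + (15 - 1) + 1 = 18 $$

consecutive input values, so five 4-lane registers (20 lanes) are loaded and the last two lanes of vi16 go unused (the 15x1 kernel accordingly initializes only two of them).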
This diff is collapsed.
@@ -363,6 +363,10 @@ struct Conv2dFunctor<DeviceType::CPU, float> : Conv2dFunctorBase {
      && stride_h == 2 && stride_w == 2 && dilation_h == 1 && dilation_w == 1;
  bool use_neon_7x7_s3 = filter_h == 7 && filter_w == 7
      && stride_h == 3 && stride_w == 3 && dilation_h == 1 && dilation_w == 1;
bool use_neon_1x15_s1 = filter_h == 1 && filter_w == 15
&& stride_h == 1 && stride_w == 1 && dilation_h == 1 && dilation_w == 1;
bool use_neon_15x1_s1 = filter_h == 15 && filter_w == 1
&& stride_h == 1 && stride_w == 1 && dilation_h == 1 && dilation_w == 1;
  std::vector<index_t> transformed_input_shape;
  std::vector<index_t> transformed_output_shape;
@@ -402,24 +406,26 @@ struct Conv2dFunctor<DeviceType::CPU, float> : Conv2dFunctorBase {
                                       tile_count});
    transformed_filter_shape.insert(transformed_filter_shape.end(),
                                    {in_tile_area, channels, input_channels});
-  } else if (use_neon_3x3_s1) {
-    extra_output_height = RoundUp<index_t>(height, 2);
-    extra_input_height =
-        std::max(padded_input_height, extra_output_height + 2);
-    extra_output_width = RoundUp<index_t>(width, 4);
-    extra_input_width = std::max(padded_input_width, extra_output_width + 2);
-    if (extra_input_height != padded_input_height) {
-      pad_bottom += (extra_input_height - padded_input_height);
-    }
-    if (extra_input_width != padded_input_width) {
-      pad_right += (extra_input_width - padded_input_width);
-    }
-  } else if (!use_neon_1x1_s1) {
-    extra_output_height = height;
+  } else {
+    index_t tile_h, tile_w;
+    if (use_neon_1x1_s1) {
+      tile_h = 1;
+      tile_w = 1;
+    } else if (use_neon_3x3_s1) {
+      tile_h = 2;
+      tile_w = 4;
+    } else if (use_neon_15x1_s1) {
+      tile_h = 4;
+      tile_w = 1;
+    } else {
+      tile_h = 1;
+      tile_w = 4;
+    }
+    extra_output_height = RoundUp<index_t>(height, tile_h);
    extra_input_height =
        std::max(padded_input_height, (extra_output_height - 1) * stride_h
            + (filter_h - 1) * dilation_h + 1);
-    extra_output_width = RoundUp<index_t>(width, 4);
+    extra_output_width = RoundUp<index_t>(width, tile_w);
    extra_input_width =
        std::max(padded_input_width, (extra_output_width - 1) * stride_w
            + (filter_w - 1) * dilation_w + 1);
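The generalized tiling above sizes the padded extents directly from the convolution arithmetic; per axis,

$$ I_{\text{extra}} = \max\bigl(I_{\text{padded}},\ (O_{\text{extra}} - 1)\, s + (K - 1)\, d + 1\bigr). $$

For instance (illustrative numbers), the 15x1 S1 kernel with tile_h = 4 and output height 63 rounds extra_output_height up to 64 and therefore needs at least (64 - 1) * 1 + (15 - 1) * 1 + 1 = 78 input rows.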
@@ -584,6 +590,22 @@ struct Conv2dFunctor<DeviceType::CPU, float> : Conv2dFunctorBase {
                      extra_output_shape,
                      pad_output);
      };
} else if (use_neon_1x15_s1) {
conv_func = [=](const float *pad_input, float *pad_output) {
Conv2dNeonK1x15S1(pad_input,
filter_data,
extra_input_shape,
extra_output_shape,
pad_output);
};
} else if (use_neon_15x1_s1) {
conv_func = [=](const float *pad_input, float *pad_output) {
Conv2dNeonK15x1S1(pad_input,
filter_data,
extra_input_shape,
extra_output_shape,
pad_output);
};
    } else {
      conv_func = [=](const float *pad_input, float *pad_output) {
        Conv2dGeneral(pad_input,
......
@@ -43,6 +43,7 @@ struct SoftmaxFunctor<DeviceType::CPU, float> {
    const index_t batch = input->dim(0);
    const index_t class_count = input->dim(1);
    const index_t class_size = input->dim(2) * input->dim(3);
const index_t batch_size = class_count * class_size;
    Tensor::MappingGuard input_guard(input);
    Tensor::MappingGuard output_guard(output);
@@ -50,46 +51,37 @@ struct SoftmaxFunctor<DeviceType::CPU, float> {
    float *output_data = output->mutable_data<float>();
    for (index_t b = 0; b < batch; ++b) {
-      std::vector<float>
-          max_val(class_size, std::numeric_limits<float>::lowest());
-      std::vector<float> sum_val(class_size, 0.f);
-
-      // calculate max for each class
-      for (index_t c = 0; c < class_count; ++c) {
-        const float
-            *input_ptr = input_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          max_val[k] = std::max(max_val[k], input_ptr[k]);
-        }
-      }
-
-      // calculate data - max for each class
#pragma omp parallel for
-      for (index_t c = 0; c < class_count; ++c) {
-        const float
-            *input_ptr = input_data + (b * class_count + c) * class_size;
-        float *output_ptr = output_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          output_ptr[k] = ::exp(input_ptr[k] - max_val[k]);
-        }
-      }
-
-      // calculate sum for each class
-      for (index_t c = 0; c < class_count; ++c) {
-        float *output_ptr = output_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          sum_val[k] += output_ptr[k];
-        }
-      }
-
-      // calculate (data - max) / sum for each class
-      for (index_t c = 0; c < class_count; ++c) {
-        float *output_ptr = output_data + (b * class_count + c) * class_size;
-        for (index_t k = 0; k < class_size; ++k) {
-          output_ptr[k] /= sum_val[k];
-        }
-      }
-    }
+      for (index_t k = 0; k < class_size; ++k) {
+        const float *input_ptr = input_data + b * batch_size + k;
+        float *output_ptr = output_data + b * batch_size + k;
+
+        float max_val = std::numeric_limits<float>::lowest();
+        index_t channel_offset = 0;
+        for (index_t c = 0; c < class_count; ++c) {
+          float data = input_ptr[channel_offset];
+          if (data > max_val) {
+            max_val = data;
+          }
+          channel_offset += class_size;
+        }
+
+        channel_offset = 0;
+        float sum = 0;
+        for (index_t c = 0; c < class_count; ++c) {
+          float exp_value = ::exp(input_ptr[channel_offset] - max_val);
+          sum += exp_value;
+          output_ptr[channel_offset] = exp_value;
+          channel_offset += class_size;
+        }
+
+        channel_offset = 0;
+        for (index_t c = 0; c < class_count; ++c) {
+          output_ptr[channel_offset] /= sum;
+          channel_offset += class_size;
+        }
+      }  // k
+    }  // b
};
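The rewrite above swaps the loop nest so each spatial position k walks its own channel strip (entries class_size apart in NCHW) and the OpenMP loop parallelizes over positions. It computes the usual numerically stable form in three passes:

$$ m = \max_{c} x_c, \qquad y_c = \frac{e^{x_c - m}}{\sum_{j} e^{x_j - m}}. $$

Subtracting m does not change the result but keeps every exponent non-positive, avoiding overflow in exp.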
......
@@ -15,6 +15,10 @@
#ifndef MACE_KERNELS_TRANSPOSE_H_
#define MACE_KERNELS_TRANSPOSE_H_
#if defined(MACE_ENABLE_NEON)
#include <arm_neon.h>
#endif
#include <vector>

#include "mace/core/future.h"
@@ -25,6 +29,65 @@
namespace mace {
namespace kernels {
static void TransposeNHWCToNCHWC3(const float *input,
float *output,
const index_t height,
const index_t width) {
index_t image_size = height * width;
#pragma omp parallel for
for (index_t h = 0; h < height; ++h) {
index_t in_offset = h * width * 3;
index_t out_offset = h * width;
index_t w;
for (w = 0; w + 3 < width; w += 4) {
float32x4x3_t vi = vld3q_f32(input + in_offset);
vst1q_f32(output + out_offset, vi.val[0]);
vst1q_f32(output + out_offset + image_size, vi.val[1]);
vst1q_f32(output + out_offset + image_size * 2, vi.val[2]);
in_offset += 12;
out_offset += 4;
}
for (; w < width; ++w) {
for (index_t c = 0; c < 3; ++c) {
output[h * width + image_size * c + w] =
input[h * width * 3 + w * 3 + c];
}
}
}
}
static void TransposeNCHWToNHWCC2(const float *input,
float *output,
const index_t height,
const index_t width) {
index_t image_size = height * width;
#pragma omp parallel for
for (index_t h = 0; h < height; ++h) {
index_t in_offset = h * width;
index_t out_offset = h * width * 2;
index_t w;
for (w = 0; w + 3 < width; w += 4) {
float32x4_t vi0 = vld1q_f32(input + in_offset);
float32x4_t vi1 = vld1q_f32(input + in_offset + image_size);
float32x4x2_t vi = {vi0, vi1};
vst2q_f32(output + out_offset, vi);
in_offset += 4;
out_offset += 8;
}
for (; w < width; ++w) {
for (index_t c = 0; c < 2; ++c) {
output[h * width * 2 + w * 2 + c] =
input[h * width + image_size * c + w];
}
}
}
}
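As a sanity check on the index math in TransposeNHWCToNCHWC3, here is a self-contained scalar re-derivation (hypothetical harness, not part of the commit) on a 2x2 image with 3 channels:

#include <cstdio>

// Scalar reference of the NHWC -> NCHW transform above (C = 3).
void TransposeNHWCToNCHWC3Ref(const float *in, float *out,
                              int height, int width) {
  const int image_size = height * width;
  for (int h = 0; h < height; ++h)
    for (int w = 0; w < width; ++w)
      for (int c = 0; c < 3; ++c)
        out[c * image_size + h * width + w] = in[(h * width + w) * 3 + c];
}

int main() {
  // NHWC input, H = W = 2: pixel k holds {3k, 3k+1, 3k+2}.
  const float nhwc[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
  float nchw[12];
  TransposeNHWCToNCHWC3Ref(nhwc, nchw, 2, 2);
  // Expect plane 0: 0 3 6 9, plane 1: 1 4 7 10, plane 2: 2 5 8 11.
  for (int i = 0; i < 12; ++i) printf("%g ", nchw[i]);
  printf("\n");
  return 0;
}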
template<DeviceType D, typename T>
struct TransposeFunctor {
  explicit TransposeFunctor(const std::vector<int> &dims) : dims_(dims) {}
@@ -48,28 +111,48 @@ struct TransposeFunctor {
      }
    }
    } else if (input->dim_size() == 4) {
-      std::vector<index_t>
-          in_stride{input_shape[1] * input_shape[2] * input_shape[3],
-                    input_shape[2] * input_shape[3], input_shape[3], 1};
-      std::vector<index_t>
-          out_stride{output_shape[1] * output_shape[2] * output_shape[3],
-                     output_shape[2] * output_shape[3], output_shape[3], 1};
-
-      std::vector<index_t> idim(4, 0);
-      std::vector<index_t> odim(4, 0);
-      for (odim[0] = 0; odim[0] < output_shape[0]; ++odim[0]) {
-        for (odim[1] = 0; odim[1] < output_shape[1]; ++odim[1]) {
-          for (odim[2] = 0; odim[2] < output_shape[2]; ++odim[2]) {
-            for (odim[3] = 0; odim[3] < output_shape[3]; ++odim[3]) {
-              idim[dims_[0]] = odim[0];
-              idim[dims_[1]] = odim[1];
-              idim[dims_[2]] = odim[2];
-              idim[dims_[3]] = odim[3];
-
-              output_data[odim[0] * out_stride[0] + odim[1] * out_stride[1]
-                  + odim[2] * out_stride[2] + odim[3]] =
-                  input_data[idim[0] * in_stride[0] + idim[1] * in_stride[1]
-                      + idim[2] * in_stride[2] + idim[3]];
+      std::vector<int> transpose_order_from_NHWC_to_NCHW{0, 3, 1, 2};
+      std::vector<int> transpose_order_from_NCHW_to_NHWC{0, 2, 3, 1};
+      index_t batch_size = input->dim(1) * input->dim(2) * input->dim(3);
+      if (dims_ == transpose_order_from_NHWC_to_NCHW && input->dim(3) == 3) {
+        for (index_t b = 0; b < input->dim(0); ++b) {
+          TransposeNHWCToNCHWC3(input_data + b * batch_size,
+                                output_data + b * batch_size,
+                                input->dim(1),
+                                input->dim(2));
+        }
+      } else if (dims_ == transpose_order_from_NCHW_to_NHWC
+          && input->dim(1) == 2) {
+        for (index_t b = 0; b < input->dim(0); ++b) {
+          TransposeNCHWToNHWCC2(input_data + b * batch_size,
+                                output_data + b * batch_size,
+                                input->dim(2),
+                                input->dim(3));
+        }
+      } else {
+        std::vector<index_t>
+            in_stride{input_shape[1] * input_shape[2] * input_shape[3],
+                      input_shape[2] * input_shape[3], input_shape[3], 1};
+        std::vector<index_t>
+            out_stride{output_shape[1] * output_shape[2] * output_shape[3],
+                       output_shape[2] * output_shape[3], output_shape[3], 1};
+
+        std::vector<index_t> idim(4, 0);
+        std::vector<index_t> odim(4, 0);
+        for (odim[0] = 0; odim[0] < output_shape[0]; ++odim[0]) {
+          for (odim[1] = 0; odim[1] < output_shape[1]; ++odim[1]) {
+            for (odim[2] = 0; odim[2] < output_shape[2]; ++odim[2]) {
+              for (odim[3] = 0; odim[3] < output_shape[3]; ++odim[3]) {
+                idim[dims_[0]] = odim[0];
+                idim[dims_[1]] = odim[1];
+                idim[dims_[2]] = odim[2];
+                idim[dims_[3]] = odim[3];
+
+                output_data[odim[0] * out_stride[0] + odim[1] * out_stride[1]
+                    + odim[2] * out_stride[2] + odim[3]] =
+                    input_data[idim[0] * in_stride[0] + idim[1] * in_stride[1]
+                        + idim[2] * in_stride[2] + idim[3]];
+              }
            }
          }
        }
+      }
    }
......
@@ -165,6 +165,13 @@ BM_CONV_2D(1, 32, 256, 256, 3, 3, 1, 4, VALID, 32);
BM_CONV_2D(1, 128, 56, 56, 1, 1, 1, 1, SAME, 128);
BM_CONV_2D(1, 1024, 7, 7, 1, 1, 1, 1, SAME, 1024);
BM_CONV_2D(64, 32, 34, 34, 3, 3, 1, 1, VALID, 32);
BM_CONV_2D(1, 32, 34, 34, 3, 3, 1, 1, VALID, 32);
BM_CONV_2D(1, 32, 256, 256, 1, 15, 1, 1, SAME, 2);
BM_CONV_2D(1, 32, 256, 256, 15, 1, 1, 1, SAME, 2);
BM_CONV_2D(1, 64, 64, 64, 15, 1, 1, 1, SAME, 2);
} // namespace test
} // namespace ops
} // namespace mace
@@ -779,11 +779,17 @@ TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv3x3S12) {

TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv15x1S12) {
  TestHalfComplexConvNxNS12<DeviceType::GPU>({32, 32}, {15, 1, 256, 2},
                                             {1, 1});
TestHalfComplexConvNxNS12<DeviceType::GPU>({64, 64}, {15, 1, 64, 2},
{1, 1});
TestHalfComplexConvNxNS12<DeviceType::GPU>({256, 256}, {15, 1, 32, 2},
{1, 1});
}

TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv1x15S12) {
  TestHalfComplexConvNxNS12<DeviceType::GPU>({32, 32}, {1, 15, 256, 2},
                                             {1, 1});
TestHalfComplexConvNxNS12<DeviceType::GPU>({256, 256}, {1, 15, 32, 2},
{1, 1});
}

TEST_F(Conv2dOpTest, OPENCLHalfAlignedConv7x75S12) {
......
@@ -83,6 +83,9 @@ void TransposeBenchmark(int iters,
#define BM_TRANSPOSE4D(N, C, H, W, D0, D1, D2, D3)                 \
  BM_TRANSPOSE4D_MACRO(N, C, H, W, D0, D1, D2, D3, float, CPU);
BM_TRANSPOSE4D(1, 512, 512, 3, 0, 3, 1, 2);
BM_TRANSPOSE4D(1, 2, 512, 512, 0, 2, 3, 1);
BM_TRANSPOSE4D(1, 64, 64, 512, 0, 3, 1, 2);
BM_TRANSPOSE4D(1, 512, 64, 64, 0, 2, 3, 1);
BM_TRANSPOSE2D(128, 128);
......
@@ -37,16 +37,51 @@ void TransposeNCHWTest(const std::vector<index_t> &input_shape) {
  // Run on cpu
  net.RunOp();

-  net.FillNHWCInputToNCHWInput<DeviceType::CPU, float>("InputNCHW", "Input");
+  net.TransformDataFormat<DeviceType::CPU, float>("Input",
+                                                  DataFormat::NHWC,
+                                                  "InputNCHW",
+                                                  DataFormat::NCHW);

  ExpectTensorNear<float>(*net.GetOutput("InputNCHW"),
                          *net.GetOutput("Output"));
}
void TransposeNHWCTest(const std::vector<index_t> &input_shape) {
// Construct graph
OpsTestNet net;
// Add input data
net.AddRandomInput<CPU, float>("Input", input_shape);
OpDefBuilder("Transpose", "TransposeNHWCTest")
.Input("Input")
.Output("Output")
.AddIntsArg("dims", {0, 2, 3, 1})
.Finalize(net.NewOperatorDef());
// Run on cpu
net.RunOp();
net.TransformDataFormat<DeviceType::CPU, float>("Input",
DataFormat::NCHW,
"InputNHWC",
DataFormat::NHWC);
ExpectTensorNear<float>(*net.GetOutput("InputNHWC"),
*net.GetOutput("Output"));
}
} // namespace

-TEST_F(TransposeOpTest, NCHW) {
+TEST_F(TransposeOpTest, NHWC_to_NCHW) {
  TransposeNCHWTest({3, 64, 64, 128});
  TransposeNCHWTest({1, 64, 48, 128});
TransposeNCHWTest({1, 512, 512, 3});
TransposeNCHWTest({2, 512, 512, 3});
}
TEST_F(TransposeOpTest, NCHW_to_NHWC) {
TransposeNHWCTest({1, 2, 512, 512});
TransposeNHWCTest({1, 3, 512, 512});
TransposeNHWCTest({2, 2, 512, 512});
} }
TEST_F(TransposeOpTest, Rank2) {
......
@@ -40,11 +40,6 @@ FLAGS = None
device_type_map = {'cpu': cvt.DeviceType.CPU.value,
                   'gpu': cvt.DeviceType.GPU.value,
                   'dsp': cvt.DeviceType.HEXAGON.value}
-device_data_type_map = {
-    cvt.DeviceType.CPU.value: mace_pb2.DT_FLOAT,
-    cvt.DeviceType.GPU.value: mace_pb2.DT_HALF,
-    cvt.DeviceType.HEXAGON.value: mace_pb2.DT_UINT8
-}


def file_checksum(fname):
@@ -129,6 +124,17 @@ def main(unused_args):
                                          FLAGS.weight_file)
        output_graph_def = converter.run()

+        if FLAGS.gpu_data_type == 'half':
+            gpu_data_type = mace_pb2.DT_HALF
+        else:
+            gpu_data_type = mace_pb2.DT_FLOAT
+        device_data_type_map = {
+            cvt.DeviceType.CPU.value: mace_pb2.DT_FLOAT,
+            cvt.DeviceType.GPU.value: gpu_data_type,
+            cvt.DeviceType.HEXAGON.value: mace_pb2.DT_UINT8
+        }
+
        print("Transform model to one that can better run on device")
        if not FLAGS.runtime:
            cpu_graph_def = copy.deepcopy(output_graph_def)
@@ -180,7 +186,7 @@ def main(unused_args):
        tensor_util.rename_tensor(output_graph_def)
        tensor_infos, model_data = tensor_util.get_tensor_info_and_model_data(
-            output_graph_def, FLAGS.runtime)
+            output_graph_def, FLAGS.runtime, FLAGS.gpu_data_type)

        source_converter_lib.convert_to_source(
            output_graph_def, model_checksum, weight_checksum, FLAGS.template,
@@ -194,7 +200,10 @@ def main(unused_args):
            f.write(bytearray(model_data))
    if FLAGS.model_load_type == 'pb':
-        tensor_util.del_tensor_data(output_graph_def, FLAGS.runtime)
+        tensor_util.del_tensor_data(
+            output_graph_def, FLAGS.runtime, FLAGS.gpu_data_type)
+        tensor_util.update_tensor_data_type(
+            output_graph_def, FLAGS.runtime, FLAGS.gpu_data_type)
        with open(FLAGS.pb_output, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        # with open(FLAGS.pb_output + '_txt', "wb") as f:
@@ -253,8 +262,6 @@ def parse_args():
        help="e.g., input_node")
    parser.add_argument(
        "--output_node", type=str, default="softmax", help="e.g., softmax")
-    parser.add_argument(
-        "--output_type", type=str, default="pb", help="output type: source/pb")
    parser.add_argument(
        "--template", type=str, default="", help="template path")
    parser.add_argument(
@@ -293,6 +300,8 @@ def parse_args():
        default="source",
        help="[source|pb] Load models in generated `source` code" +
             "or `pb` file.")
+    parser.add_argument(
+        "--gpu_data_type", type=str, default="half", help="half/float")

    return parser.parse_known_args()
......
@@ -153,14 +153,15 @@ class TransformerRule(Enum):
    TRANSFORM_GPU_WINOGRAD = 8
    TRANSFORM_ADD_TO_BIASADD = 9
    FOLD_BIASADD = 10
-    FOLD_ACTIVATION = 11
-    TRANSPOSE_FILTERS = 12
-    RESHAPE_FC_WEIGHT = 13
-    TRANSPOSE_DATA_FORMAT = 14
-    TRANSFORM_GLOBAL_CONV_TO_FC = 15
-    TRANSFORM_BUFFER_IMAGE = 16
-    ADD_DEVICE_AND_DATA_TYPE = 17
-    SORT_BY_EXECUTION = 18
+    FLATTEN_ATROUS_CONV = 11
+    FOLD_ACTIVATION = 12
+    TRANSPOSE_FILTERS = 13
+    RESHAPE_FC_WEIGHT = 14
+    TRANSPOSE_DATA_FORMAT = 15
+    TRANSFORM_GLOBAL_CONV_TO_FC = 16
+    TRANSFORM_BUFFER_IMAGE = 17
+    ADD_DEVICE_AND_DATA_TYPE = 18
+    SORT_BY_EXECUTION = 19


class ConverterInterface(object):
@@ -218,6 +219,7 @@ class ConverterOption(object):
            TransformerRule.TRANSFORM_GPU_WINOGRAD,
            TransformerRule.TRANSFORM_ADD_TO_BIASADD,
            TransformerRule.FOLD_BIASADD,
+            TransformerRule.FLATTEN_ATROUS_CONV,
            TransformerRule.FOLD_ACTIVATION,
            TransformerRule.TRANSPOSE_FILTERS,
            TransformerRule.TRANSPOSE_DATA_FORMAT,
......
@@ -16,6 +16,7 @@
import math
import numpy as np
import tensorflow as tf
+from enum import Enum

from mace.proto import mace_pb2
from mace.python.tools.converter_tool import base_converter
@@ -41,6 +42,50 @@ tf_epsilon_str = 'epsilon'
tf_align_corners = 'align_corners'
tf_block_size = 'block_size'
TFSupportedOps = [
'Conv2D',
'DepthwiseConv2dNative',
'Conv2DBackpropInput',
'BiasAdd',
'Add',
'Sub',
'Mul',
'Div',
'Min',
'Max',
'Neg',
'Abs',
'RealDiv',
'SquaredDifference',
'Pow',
'Relu',
'Relu6',
'Tanh',
'Sigmoid',
'FusedBatchNorm',
'AvgPool',
'MaxPool',
'Squeeze',
'MatMul',
'Identity',
'Reshape',
'Shape',
'Transpose',
'Softmax',
'ResizeBilinear',
'Placeholder',
'SpaceToBatchND',
'BatchToSpaceND',
'DepthToSpace',
'SpaceToDepth',
'Pad',
'ConcatV2',
'Mean',
'Const',
]
TFOpType = Enum('TFOpType', [(op, op) for op in TFSupportedOps], type=str)
class TensorflowConverter(base_converter.ConverterInterface):
    """A class for converting a tensorflow frozen model to a mace model.
@@ -53,71 +98,70 @@ class TensorflowConverter(base_converter.ConverterInterface):
        'FULL': PaddingMode.FULL
    }
    pooling_type_mode = {
        TFOpType.AvgPool.name: PoolingType.AVG,
        TFOpType.MaxPool.name: PoolingType.MAX
    }

    eltwise_type = {
        TFOpType.Add.name: EltwiseType.SUM,
        TFOpType.Sub.name: EltwiseType.SUB,
        TFOpType.Mul.name: EltwiseType.PROD,
        TFOpType.Div.name: EltwiseType.DIV,
        TFOpType.Min.name: EltwiseType.MIN,
        TFOpType.Max.name: EltwiseType.MAX,
        TFOpType.Neg.name: EltwiseType.NEG,
        TFOpType.Abs.name: EltwiseType.ABS,
        TFOpType.RealDiv.name: EltwiseType.DIV,
        TFOpType.SquaredDifference.name: EltwiseType.SQR_DIFF,
        TFOpType.Pow.name: EltwiseType.POW
    }

    activation_type = {
        TFOpType.Relu.name: ActivationType.RELU,
        TFOpType.Relu6.name: ActivationType.RELUX,
        TFOpType.Tanh.name: ActivationType.TANH,
        TFOpType.Sigmoid.name: ActivationType.SIGMOID
    }

    def __init__(self, option, src_model_file):
        self._op_converters = {
            TFOpType.Conv2D.name: self.convert_conv2d,
            TFOpType.DepthwiseConv2dNative.name: self.convert_conv2d,
            TFOpType.Conv2DBackpropInput.name: self.convert_conv2d,
            TFOpType.BiasAdd.name: self.convert_biasadd,
            TFOpType.Add.name: self.convert_add,
            TFOpType.Sub.name: self.convert_elementwise,
            TFOpType.Mul.name: self.convert_elementwise,
            TFOpType.Div.name: self.convert_elementwise,
            TFOpType.Min.name: self.convert_elementwise,
            TFOpType.Max.name: self.convert_elementwise,
            TFOpType.Neg.name: self.convert_elementwise,
            TFOpType.Abs.name: self.convert_elementwise,
            TFOpType.RealDiv.name: self.convert_elementwise,
            TFOpType.SquaredDifference.name: self.convert_elementwise,
            TFOpType.Pow.name: self.convert_elementwise,
            TFOpType.Relu.name: self.convert_activation,
            TFOpType.Relu6.name: self.convert_activation,
            TFOpType.Tanh.name: self.convert_activation,
            TFOpType.Sigmoid.name: self.convert_activation,
            TFOpType.FusedBatchNorm.name: self.convert_fused_batchnorm,
            TFOpType.AvgPool.name: self.convert_pooling,
            TFOpType.MaxPool.name: self.convert_pooling,
            TFOpType.Squeeze.name: self.convert_identity,
            TFOpType.MatMul.name: self.convert_matmul,
            TFOpType.Identity.name: self.convert_identity,
            TFOpType.Reshape.name: self.convert_reshape,
            TFOpType.Shape.name: self.convert_nop,
            TFOpType.Transpose.name: self.convert_transpose,
            TFOpType.Softmax.name: self.convert_softmax,
            TFOpType.ResizeBilinear.name: self.convert_resize_bilinear,
            TFOpType.Placeholder.name: self.convert_nop,
            TFOpType.SpaceToBatchND.name: self.convert_space_batch,
            TFOpType.BatchToSpaceND.name: self.convert_space_batch,
            TFOpType.DepthToSpace.name: self.convert_space_depth,
            TFOpType.SpaceToDepth.name: self.convert_space_depth,
            TFOpType.Pad.name: self.convert_pad,
            TFOpType.ConcatV2.name: self.convert_concat,
            TFOpType.Mean.name: self.convert_mean,
            TFOpType.Const.name: self.convert_nop,
        }
        self._option = option
        self._mace_net_def = mace_pb2.NetDef()
@@ -180,24 +224,29 @@ class TensorflowConverter(base_converter.ConverterInterface):
                       "Mace does not support tensorflow op type %s yet"
                       % tf_op.type)
            self._op_converters[tf_op.type](tf_op)
+        self.convert_tensors()

-    def convert_tensor(self, tf_op):
-        output_name = tf_op.outputs[0].name
-        if output_name not in self._skip_tensor:
-            tensor = self._mace_net_def.tensors.add()
-            tensor.name = tf_op.outputs[0].name
-            tf_tensor = tf_op.outputs[0].eval()
-            tensor.dims.extend(list(tf_tensor.shape))
-
-            tf_dt = tf_op.get_attr('dtype')
-            if tf_dt == tf.float32:
-                tensor.data_type = mace_pb2.DT_FLOAT
-                tensor.float_data.extend(tf_tensor.astype(np.float32).flat)
-            elif tf_dt == tf.int32:
-                tensor.data_type = mace_pb2.DT_INT32
-                tensor.int32_data.extend(tf_tensor.astype(np.int32).flat)
-            else:
-                mace_check(False, "Not supported tensor type: %s" % tf_dt.name)
+    def convert_tensors(self):
+        for tf_op in self._tf_graph.get_operations():
+            if tf_op.type != TFOpType.Const.name:
+                continue
+            output_name = tf_op.outputs[0].name
+            if output_name not in self._skip_tensor:
+                tensor = self._mace_net_def.tensors.add()
+                tensor.name = tf_op.outputs[0].name
+                tf_tensor = tf_op.outputs[0].eval()
+                tensor.dims.extend(list(tf_tensor.shape))
+
+                tf_dt = tf_op.get_attr('dtype')
+                if tf_dt == tf.float32:
+                    tensor.data_type = mace_pb2.DT_FLOAT
+                    tensor.float_data.extend(tf_tensor.astype(np.float32).flat)
+                elif tf_dt == tf.int32:
+                    tensor.data_type = mace_pb2.DT_INT32
+                    tensor.int32_data.extend(tf_tensor.astype(np.int32).flat)
+                else:
+                    mace_check(False,
+                               "Not supported tensor type: %s" % tf_dt.name)
    def add_tensor(self, name, shape, data_type, value):
        tensor = self._mace_net_def.tensors.add()
@@ -229,9 +278,9 @@ class TensorflowConverter(base_converter.ConverterInterface):

    def convert_conv2d(self, tf_op):
        op = self.convert_general_op(tf_op)
-        if tf_op.type == 'DepthwiseConv2dNative':
+        if tf_op.type == TFOpType.DepthwiseConv2dNative.name:
            op.type = MaceOp.DepthwiseConv2d.name
-        elif tf_op.type == 'Conv2DBackpropInput':
+        elif tf_op.type == TFOpType.Conv2DBackpropInput.name:
            op.type = MaceOp.Deconv2D.name
        else:
            op.type = MaceOp.Conv2D.name
@@ -274,7 +323,7 @@ class TensorflowConverter(base_converter.ConverterInterface):
        type_arg.name = MaceKeyword.mace_activation_type_str
        type_arg.s = self.activation_type[tf_op.type].name

-        if tf_op.type == 'Relu6':
+        if tf_op.type == TFOpType.Relu6.name:
            limit_arg = op.arg.add()
            limit_arg.name = MaceKeyword.mace_activation_max_limit_str
            limit_arg.f = 6.0
@@ -335,7 +384,7 @@ class TensorflowConverter(base_converter.ConverterInterface):
        size_arg.name = MaceKeyword.mace_resize_size_str
        size_value = tf_op.inputs[1].eval().astype(np.int32)
        size_arg.ints.extend(size_value)
-        self._skip_tensor.update(tf_op.inputs[1].name)
+        self._skip_tensor.add(tf_op.inputs[1].name)
        align_corners_arg = op.arg.add()
        align_corners_arg.name = MaceKeyword.mace_align_corners_str
        align_corners_arg.i = tf_op.get_attr(tf_align_corners)
@@ -357,7 +406,7 @@ class TensorflowConverter(base_converter.ConverterInterface):
        size_arg.ints.extend(size_value)

        crops_or_paddings_arg = op.arg.add()
-        if op.type == 'BatchToSpaceND':
+        if op.type == TFOpType.BatchToSpaceND.name:
            op.type = MaceOp.BatchToSpaceND.name
            crops_or_paddings_arg.name = \
                MaceKeyword.mace_batch_to_space_crops_str
@@ -367,12 +416,12 @@ class TensorflowConverter(base_converter.ConverterInterface):
        crops_or_paddings_value = tf_op.inputs[2].eval().astype(np.int32).flat
        crops_or_paddings_arg.ints.extend(crops_or_paddings_value)

-        self._skip_tensor.update(tf_op.inputs[1].name)
-        self._skip_tensor.update(tf_op.inputs[2].name)
+        self._skip_tensor.add(tf_op.inputs[1].name)
+        self._skip_tensor.add(tf_op.inputs[2].name)

    def convert_space_depth(self, tf_op):
        op = self.convert_general_op(tf_op)
-        if op.type == 'SpaceToDepth':
+        if op.type == TFOpType.SpaceToDepth.name:
            op.type = MaceOp.SpaceToDepth.name
        else:
            op.type = MaceOp.DepthToSpace.name
@@ -390,14 +439,14 @@ class TensorflowConverter(base_converter.ConverterInterface):
        paddings_arg.name = MaceKeyword.mace_paddings_str
        paddings_value = tf_op.inputs[1].eval().astype(np.int32).flat
        paddings_arg.ints.extend(paddings_value)
-        self._skip_tensor.update(tf_op.inputs[1].name)
+        self._skip_tensor.add(tf_op.inputs[1].name)

        if len(tf_op.inputs) == 3:
            constant_value_arg = op.arg.add()
            constant_value_arg.name = MaceKeyword.mace_constant_value_str
            constant_value = tf_op.inputs[2].eval().astype(np.int32).flat[0]
            constant_value_arg.i = constant_value
-            self._skip_tensor.update(tf_op.inputs[2].name)
+            self._skip_tensor.add(tf_op.inputs[2].name)

    def convert_concat(self, tf_op):
        op = self.convert_general_op(tf_op)
@@ -412,7 +461,7 @@ class TensorflowConverter(base_converter.ConverterInterface):

        mace_check(axis == 3, "only support concat at channel dimension")

-        self._skip_tensor.update(tf_op.inputs[-1].name)
+        self._skip_tensor.add(tf_op.inputs[-1].name)

    def convert_matmul(self, tf_op):
        op = self.convert_general_op(tf_op)
@@ -426,13 +475,13 @@ class TensorflowConverter(base_converter.ConverterInterface):
        shape_arg = op.arg.add()
        shape_arg.name = MaceKeyword.mace_shape_str
        shape_value = []
-        if tf_op.inputs[1].op.type == 'Const':
+        if tf_op.inputs[1].op.type == TFOpType.Const.name:
            shape_value = list(tf_op.inputs[1].eval().astype(np.int32))
            for i in xrange(len(shape_value)):
                if shape_value[i] == -1:
                    shape_value[i] = 1
-            self._skip_tensor.update(tf_op.inputs[-1].name)
-        elif tf_op.inputs[1].op.type == 'Shape':
+            self._skip_tensor.add(tf_op.inputs[-1].name)
+        elif tf_op.inputs[1].op.type == TFOpType.Shape.name:
            shape_value = list(tf_op.inputs[1].op.inputs[0].shape.as_list())
        shape_arg.ints.extend(shape_value)
......
@@ -66,6 +66,8 @@ class Transformer(base_converter.ConverterInterface):
            TransformerRule.TRANSFORM_ADD_TO_BIASADD,
            TransformerRule.FOLD_BIASADD,
            TransformerRule.FOLD_ACTIVATION,
+            TransformerRule.FLATTEN_ATROUS_CONV,
+            TransformerRule.FOLD_ACTIVATION,
            TransformerRule.TRANSPOSE_FILTERS,
            TransformerRule.TRANSPOSE_DATA_FORMAT,
            TransformerRule.TRANSFORM_GLOBAL_CONV_TO_FC,
@@ -93,6 +95,7 @@ class Transformer(base_converter.ConverterInterface):
            TransformerRule.TRANSFORM_ADD_TO_BIASADD:
                self.transform_add_to_biasadd,
            TransformerRule.FOLD_BIASADD: self.fold_biasadd,
+            TransformerRule.FLATTEN_ATROUS_CONV: self.flatten_atrous_conv,
            TransformerRule.FOLD_ACTIVATION: self.fold_activation,
            TransformerRule.TRANSPOSE_FILTERS: self.transpose_filters,
            TransformerRule.TRANSPOSE_DATA_FORMAT: self.transpose_data_format,
@@ -616,6 +619,65 @@ class Transformer(base_converter.ConverterInterface):

        return False
def flatten_atrous_conv(self):
if self._option.device != DeviceType.GPU.value:
return
net = self._model
for op in net.op:
if (op.type == MaceOp.SpaceToBatchND.name
and len(self._consumers.get(op.output[0], [])) == 1):
conv_op = self._consumers.get(op.output[0])[0]
if (conv_op.type == MaceOp.Conv2D.name
or conv_op.type == MaceOp.DepthwiseConv2d.name) \
and len(self._consumers.get(conv_op.output[0], [])) == 1: # noqa
b2s_op = self._consumers.get(conv_op.output[0])[0]
if b2s_op.type == MaceOp.BatchToSpaceND.name:
print "Flatten atrous convolution"
# Add args.
padding_arg_values = ConverterUtil.get_arg(
op,
MaceKeyword.mace_paddings_str).ints
blocks_arg_values = ConverterUtil.get_arg(
b2s_op,
MaceKeyword.mace_space_batch_block_shape_str).ints
dilation_arg = ConverterUtil.get_arg(
conv_op,
MaceKeyword.mace_dilations_str)
if dilation_arg is None:
dilation_arg = conv_op.arg.add()
dilation_arg.name = MaceKeyword.mace_dilations_str
dilation_arg.ints[:] = blocks_arg_values
padding_arg = ConverterUtil.get_arg(
conv_op,
MaceKeyword.mace_padding_str)
if padding_arg is None:
padding_arg = conv_op.arg.add()
padding_arg.name = MaceKeyword.mace_padding_str
if len(padding_arg_values) > 0 \
and padding_arg_values[0] > 0:
padding_arg.i = PaddingMode.SAME.value
else:
padding_arg.i = PaddingMode.VALID.value
strides_arg = ConverterUtil.get_arg(
conv_op,
MaceKeyword.mace_strides_str)
if strides_arg is None:
strides_arg = conv_op.arg.add()
strides_arg.name = MaceKeyword.mace_strides_str
strides_arg.ints[:] = [1, 1]
# update output shape
conv_op.output_shape[0].dims[:] = \
b2s_op.output_shape[0].dims[:]
self.safe_remove_node(op, None)
self.safe_remove_node(b2s_op, conv_op)
return True
return False
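For context on why this flattening is valid (the identity TensorFlow itself uses to implement atrous convolution): SpaceToBatchND with block shape r, a stride-1 convolution, and a closing BatchToSpaceND jointly compute a dilated convolution,

$$ (F *_{r} X)[i] = \sum_{k} F[k] \cdot X[i + r k], $$

so the three ops can be folded into the single Conv2D with the dilations arg set to the block shape and padding derived from the SpaceToBatchND paddings, as the code above does.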
    def fold_activation(self):
        net = self._model
        for op in net.op:
......
@@ -27,7 +27,6 @@ def convert_to_source(net_def, model_checksum, weight_checksum, template_dir,
                      obfuscate, model_tag, output, runtime, embed_model_data,
                      winograd_conv, model_load_type, tensor_infos,
                      model_data):
-
    # Capture our current directory
    print template_dir
......
@@ -105,11 +105,11 @@ def rename_tensor(net_def):

class TensorInfo:
-    def __init__(self, id, t, runtime):
+    def __init__(self, id, t, runtime, gpu_data_type):
        self.id = id
        self.data_type = mace_pb2.DataType.Name(t.data_type)
        if t.data_type == mace_pb2.DT_FLOAT:
-            if runtime == 'gpu':
+            if runtime == 'gpu' and gpu_data_type == 'half':
                self.data_type = mace_pb2.DT_HALF
                self.data = bytearray(
                    np.array(t.float_data).astype(np.float16).tobytes())
@@ -127,13 +127,13 @@ class TensorInfo:
        raise Exception('Tensor data type %s not supported' % t.data_type)


-def get_tensor_info_and_model_data(net_def, runtime):
+def get_tensor_info_and_model_data(net_def, runtime, gpu_data_type):
    model_data = []
    offset = 0
    counter = 0
    tensor_infos = []
    for t in net_def.tensors:
-        tensor_info = TensorInfo(counter, t, runtime)
+        tensor_info = TensorInfo(counter, t, runtime, gpu_data_type)
        tensor_infos.append(tensor_info)
        # align
        if tensor_info.data_type != 'DT_UINT8' and offset % 4 != 0:
@@ -156,15 +156,17 @@ def get_tensor_info_and_model_data(net_def, runtime):
    return tensor_infos, model_data


-def del_tensor_data(net_def, runtime):
+def del_tensor_data(net_def, runtime, gpu_data_type):
    for t in net_def.tensors:
        if t.data_type == mace_pb2.DT_FLOAT:
            del t.float_data[:]
-            if runtime == 'gpu':
-                t.data_type = mace_pb2.DT_HALF
-            else:
-                t.data_type = mace_pb2.DT_FLOAT
        elif t.data_type == mace_pb2.DT_INT32:
            del t.int32_data[:]
        elif t.data_type == mace_pb2.DT_UINT8:
            del t.int32_data[:]
def update_tensor_data_type(net_def, runtime, gpu_data_type):
for t in net_def.tensors:
if t.data_type == mace_pb2.DT_FLOAT and runtime == 'gpu' \
and gpu_data_type == 'half':
t.data_type = mace_pb2.DT_HALF
@@ -538,6 +538,11 @@ def parse_args():
        default="source",
        help="[source|pb] Load models in generated `source` code" +
             "or `pb` file.")
parser.add_argument(
"--gpu_data_type",
type=str,
default="half",
help="[half | float].")
    return parser.parse_known_args()

@@ -809,7 +814,8 @@ def main(unused_args):
                model_config["fast_conv"],
                model_config["obfuscate"],
                model_output_base_dir,
-                FLAGS.model_load_type)
+                FLAGS.model_load_type,
+                FLAGS.gpu_data_type)
        for target_abi in configs["target_abis"]:
            for target_soc in target_socs:
......
@@ -470,7 +470,8 @@ def gen_model_code(model_codegen_dir,
                   fast_conv,
                   obfuscate,
                   model_output_dir,
-                   model_load_type):
+                   model_load_type,
+                   gpu_data_type):
    print("* Generate model code")
    bazel_build_common("//mace/python/tools:converter")
@@ -499,6 +500,7 @@ def gen_model_code(model_codegen_dir,
            "--codegen_output=%s/model.cc" % model_codegen_dir,
            "--pb_output=%s/%s.pb" % (model_output_dir, model_tag),
            "--model_load_type=%s" % model_load_type,
"--gpu_data_type=%s" % gpu_data_type,
            _out=process_output,
            _bg=True,
            _err_to_out=True)
......