Commit 7a79a9cf authored by 李寅

Merge branch 'add-onnx' into 'master'

add onnx

See merge request !902
@@ -82,7 +82,8 @@ the following projects during the development:
[Caffe](https://github.com/BVLC/caffe),
[SNPE](https://developer.qualcomm.com/software/snapdragon-neural-processing-engine-ai),
[ARM ComputeLibrary](https://github.com/ARM-software/ComputeLibrary),
[ncnn](https://github.com/Tencent/ncnn),
[ONNX](https://github.com/onnx/onnx) and many others: we learned many best
practices from these projects.
Finally, we also thank the Qualcomm, Pinecone and MediaTek engineering teams for
......
@@ -64,6 +64,9 @@ Optional dependencies
* - FileLock
  - pip install -I filelock==3.0.0
  - Required when running on Android
* - ONNX
  - pip install onnx
  - Required for ONNX model conversion
.. note::
......
@@ -72,3 +72,9 @@ Install Caffe (Optional)
-------------------------
Please follow the installation instructions of `Caffe <http://caffe.berkeleyvision.org/installation.html>`__.

Install ONNX (Optional)
-------------------------
Please follow the installation instructions of `ONNX <https://github.com/onnx/onnx#source>`__.
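After installing, you can sanity-check the package and a model file directly from Python (a minimal sketch; ``model.onnx`` is a placeholder path):

.. code:: python

   import onnx

   # Load the model and run ONNX's built-in structural checks.
   model = onnx.load("model.onnx")
   onnx.checker.check_model(model)

   # Print a human-readable summary of the graph.
   print(onnx.helper.printable_graph(model.graph))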
@@ -18,8 +18,7 @@ MACE Model
~~~~~~~~~~
MACE defines a customized model format which is similar to
Caffe2. The MACE model can be converted from exported TensorFlow, Caffe or ONNX models.

MACE Interpreter
~~~~~~~~~~~~~~~~~
@@ -50,7 +49,7 @@ Build MACE dynamic or static libraries.

3. Convert model
~~~~~~~~~~~~~~~~~~
Convert a TensorFlow, Caffe or ONNX model to a MACE model.

4.1. Deploy
~~~~~~~~~~~~~~~~~~
@@ -86,7 +85,7 @@ MACE covers the common mobile compute devices (CPU, GPU and DSP) and provides
MACE Model
~~~~~~~~~~~~~~~~~~
MACE defines its own model format (similar to Caffe2); the tools MACE provides can convert models in Caffe/TensorFlow/ONNX format
into MACE models.

MACE Interpreter
@@ -118,7 +117,7 @@ The CPU/GPU/DSP Runtimes hold the operator implementations for each compute device.

3. Convert model
~~~~~~~~~~~~~~~~~~
Convert a TensorFlow, Caffe or ONNX model into a MACE model.

4.1. Deploy
~~~~~~~~~~~~~~~~~~
......
@@ -78,6 +78,8 @@ in one deployment file.
  - [optional] Specify Numpy validation inputs. When not provided, [-1, 1] random values will be used.
* - validation_threshold
  - [optional] Specify the similarity threshold for validation. A dict with keys in 'CPU', 'GPU' and/or 'HEXAGON' and values <= 1.0.
* - backend
  - The ONNX backend framework for validation; one of [tensorflow, caffe2, pytorch]. Defaults to tensorflow.
* - runtime
  - The running device, one of [cpu, gpu, dsp, cpu_gpu]. cpu_gpu contains both CPU and GPU model definitions so you can run the model on both CPU and GPU.
* - data_type
......
@@ -114,6 +114,19 @@ MACE now supports models from TensorFlow and Caffe (more frameworks will be supp

      # Upgrade caffemodel
      $CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel

- ONNX

   Prepare your ONNX model.onnx file.

   Use the `ONNX Optimizer Tool <https://github.com/XiaoMi/mace/tree/master/tools/onnx_optimizer.py>`__ to optimize your model for inference.
   This tool improves inference efficiency, much like the `Graph Transform Tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__ does in TensorFlow.

   .. code:: bash

      # Optimize your model
      python MACE_ROOT/tools/onnx_optimizer.py model.onnx model_opt.onnx
===========================================
2. Create a deployment file for your model
@@ -137,6 +150,12 @@ Modify one of them and use it for your own case.
   .. literalinclude:: models/demo_models_caffe.yml
      :language: yaml

- ONNX

   .. literalinclude:: models/demo_models_onnx.yml
      :language: yaml

More details about the model deployment file are in :doc:`advanced_usage`.

======================
......
# The name of library
library_name: mobilenet
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
  mobilenet_v1: # model tag, which will be used in model loading and must be unique
    platform: onnx
    # path to your onnx model file. Supports local path, http:// and https://
    model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
    # sha256_checksum of your model's onnx file.
    # use this command to get the sha256_checksum: sha256sum path/to/your/model/file
    model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
    # define your model's interface
    # if there are multiple inputs or outputs, write like below:
    # subgraphs:
    #   - input_tensors:
    #       - input0
    #       - input1
    #     input_shapes:
    #       - 1,224,224,3
    #       - 1,224,224,3
    #     output_tensors:
    #       - output0
    #       - output1
    #     output_shapes:
    #       - 1,1001
    #       - 1,1001
    subgraphs:
      - input_tensors:
          - input
        input_shapes:
          - 1,224,224,3
        output_tensors:
          - MobilenetV1/Predictions/Reshape_1
        output_shapes:
          - 1,1001
    # ONNX backend framework for validation. Supports pytorch/caffe2/tensorflow. Default is tensorflow.
    backend: tensorflow
    # cpu, gpu or cpu+gpu
    runtime: cpu+gpu
    winograd: 0
\ No newline at end of file
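The ``model_sha256_checksum`` field must match the model file exactly. Besides the ``sha256sum`` command mentioned in the comments above, the same digest can be computed in Python (a minimal sketch; the file name is a placeholder):

.. code:: python

   import hashlib

   def sha256_checksum(path, block_size=1 << 20):
       # Stream the file so large models need not fit in memory.
       h = hashlib.sha256()
       with open(path, "rb") as f:
           for chunk in iter(lambda: f.read(block_size), b""):
               h.update(chunk)
       return h.hexdigest()

   print(sha256_checksum("mobilenet-v1-1.0.pb"))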
@@ -32,7 +32,8 @@ enum ActivationType {
  RELUX = 2,
  PRELU = 3,
  TANH = 4,
  SIGMOID = 5,
  LEAKYRELU = 6,
};

inline ActivationType StringToActivationType(const std::string type) {
@@ -48,6 +49,8 @@ inline ActivationType StringToActivationType(const std::string type) {
    return ActivationType::SIGMOID;
  } else if (type == "NOOP") {
    return ActivationType::NOOP;
  } else if (type == "LEAKYRELU") {
    return ActivationType::LEAKYRELU;
  } else {
    LOG(FATAL) << "Unknown activation type: " << type;
  }
@@ -90,6 +93,13 @@ void DoActivation(const T *input_ptr,
        output_ptr[i] = 1 / (1 + std::exp(-input_ptr[i]));
      }
      break;
    case LEAKYRELU:
#pragma omp parallel for schedule(runtime)
      for (index_t i = 0; i < size; ++i) {
        output_ptr[i] = std::max(input_ptr[i],
                                 static_cast<T>(0)) * relux_max_limit;
      }
      break;
    default:
      LOG(FATAL) << "Unknown activation type: " << type;
  }
@@ -122,6 +132,9 @@ inline void DoActivation(const float *input_ptr,
        output_ptr[i] = 1 / (1 + std::exp(-input_ptr[i]));
      }
      break;
    case LEAKYRELU:
      LeakyReluNeon(input_ptr, relux_max_limit, size, output_ptr);
      break;
    default:
      LOG(FATAL) << "Unknown activation type: " << type;
  }
......
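Note that the ``LEAKYRELU`` branch above reuses ``relux_max_limit`` as the leak coefficient and computes ``max(x, 0) * alpha``, which zeroes negative inputs; this differs from the conventional LeakyReLU ``x if x > 0 else alpha * x``. A numpy sketch contrasting what the code above computes with the conventional definition:

.. code:: python

   import numpy as np

   def as_implemented(x, alpha):
       # Mirrors the diff: negatives become 0, positives are scaled by alpha.
       return np.maximum(x, 0.0) * alpha

   def conventional_leaky_relu(x, alpha):
       return np.where(x > 0, x, alpha * x)

   x = np.array([-2.0, 0.0, 3.0])
   print(as_implemented(x, 0.1))           # [0.  0.  0.3]
   print(conventional_leaky_relu(x, 0.1))  # [-0.2  0.   3. ]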
@@ -27,18 +27,29 @@ template <DeviceType D, class T>
class ArgMaxOp : public Operation {
 public:
  explicit ArgMaxOp(OpConstructContext *context)
      : Operation(context),
        axis_(Operation::GetOptionalArg<int>("axis", 0)),
        keep_dims_(Operation::GetOptionalArg<bool>("keepdims", true)),
        argmin_(Operation::GetOptionalArg<bool>("argmin", false)) {}

  MaceStatus Run(OpContext *context) override {
    MACE_UNUSED(context);
    const Tensor *input = this->Input(0);
    const Tensor *axis = this->InputSize() == 2 ?
                         this->Input(1) : nullptr;
    Tensor *output = this->Output(0);
    MACE_CHECK(keep_dims_, "Mace only supports keep_dims ArgMax.");
    MACE_CHECK(input->dim_size() > 0, "ArgMax input should not be a scalar");
    int axis_value = 0;
    if (axis != nullptr) {
      MACE_CHECK(axis->dim_size() == 0,
                 "Mace argmax only supports scalar axis");
      Tensor::MappingGuard axis_guard(axis);
      axis_value = axis->data<int32_t>()[0];
    } else {
      axis_value = axis_;
    }
    if (axis_value < 0) {
      axis_value += input->dim_size();
    }
@@ -59,22 +70,43 @@ class ArgMaxOp : public Operation {
    index_t outer_size = output->size();
    index_t inner_size = input->dim(axis_value);

    if (argmin_) {
#pragma omp parallel for schedule(runtime)
      for (index_t i = 0; i < outer_size; ++i) {
        int idx = 0;
        T min_value = std::numeric_limits<T>::max();
        const T *input_ptr = input_data + i * inner_size;
        for (index_t j = 0; j < inner_size; ++j) {
          if (input_ptr[j] < min_value) {
            min_value = input_ptr[j];
            idx = j;
          }
        }
        output_data[i] = idx;
      }
    } else {
#pragma omp parallel for schedule(runtime)
      for (index_t i = 0; i < outer_size; ++i) {
        int idx = 0;
        T max_value = std::numeric_limits<T>::lowest();
        const T *input_ptr = input_data + i * inner_size;
        for (index_t j = 0; j < inner_size; ++j) {
          if (input_ptr[j] > max_value) {
            max_value = input_ptr[j];
            idx = j;
          }
        }
        output_data[i] = idx;
      }
    }
    return MaceStatus::MACE_SUCCESS;
  }

 protected:
  const int axis_;
  bool keep_dims_;
  bool argmin_;
};
......
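With this change the reduction axis can come either from an optional second input tensor or from the ``axis`` argument (default 0), negative axes wrap around, and the new ``argmin`` argument selects the index of the minimum instead of the maximum. A numpy sketch of the intended semantics:

.. code:: python

   import numpy as np

   x = np.array([[3.0, 1.0, 2.0],
                 [0.5, 4.0, -1.0]])

   # Negative axes wrap, as in the C++ above: axis_value += input->dim_size().
   print(np.argmax(x, axis=-1))  # [0 1]
   print(np.argmin(x, axis=-1))  # [1 2]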
@@ -67,5 +67,29 @@ void ReluxNeon(const float *input, const float limit,
#endif
}

void LeakyReluNeon(const float *input, const float alpha,
                   const index_t size, float *output) {
#if defined(MACE_ENABLE_NEON)
  float32x4_t vzero = vdupq_n_f32(0.f);
  float32x4_t valpha = vdupq_n_f32(alpha);
#pragma omp parallel for schedule(runtime)
  for (index_t i = 0; i <= size - 4; i += 4) {
    float32x4_t v = vld1q_f32(input + i);
    v = vmaxq_f32(v, vzero);
    v = vmulq_f32(v, valpha);
    vst1q_f32(output + i, v);
  }
  // remain
  for (index_t i = (size >> 2) << 2; i < size; ++i) {
    output[i] = std::max(input[i], 0.f) * alpha;
  }
#else
#pragma omp parallel for schedule(runtime)
  for (index_t i = 0; i < size; ++i) {
    output[i] = std::max(input[i], 0.f) * alpha;
  }
#endif
}

}  // namespace ops
}  // namespace mace
@@ -25,6 +25,9 @@ void ReluNeon(const float *input, const index_t size, float *output);
void ReluxNeon(const float *input, const float limit,
               const index_t size, float *output);

void LeakyReluNeon(const float *input, const float alpha,
                   const index_t size, float *output);

}  // namespace ops
}  // namespace mace
......
@@ -43,6 +43,7 @@ class PoolingKernel : public OpenCLPoolingKernel {
      const Padding &padding_type,
      const std::vector<int> &padding_data,
      const int *dilations,
      const RoundType round_type,
      Tensor *output) override;

 private:
@@ -62,6 +63,7 @@ MaceStatus PoolingKernel<T>::Compute(
    const Padding &padding_type,
    const std::vector<int> &padding_data,
    const int *dilations,
    const RoundType round_type,
    Tensor *output) {
  MACE_CHECK(dilations[0] == 1 && dilations[1] == 1)
      << "Pooling opencl kernel not support dilation yet";
@@ -82,7 +84,7 @@ MaceStatus PoolingKernel<T>::Compute(
  } else {
    paddings = padding_data;
    CalcOutputSize(input->shape().data(), filter_shape.data(),
                   padding_data.data(), dilations, strides, round_type,
                   output_shape.data());
  }
......
@@ -102,6 +102,9 @@ inline DATA_TYPE4 do_activation(DATA_TYPE4 in,
#endif
#ifdef USE_SIGMOID
  out = do_sigmoid(in);
#endif
#ifdef USE_LEAKYRELU
  out = fmax(in, (DATA_TYPE)0) * relux_max_limit;
#endif
  return out;
}
......
#include <common.h>

__kernel void reduce(OUT_OF_RANGE_PARAMS
                     GLOBAL_WORK_GROUP_SIZE_DIM3
                     __read_only image2d_t input,
                     __local float4 *group_sum,
                     __private const int group_size,
                     __private const int partial_len,
                     __private const int remain_index,
                     __private const int batch,
                     __private const int in_height,
                     __private const int in_width,
                     __private const float image_size_reciprocal,
                     __private const int channel_blocks,
                     __write_only image2d_t output) {
  const int i = get_local_id(0);
  const int j = get_local_id(1);
  const int k = get_global_id(2);
@@ -22,12 +22,22 @@ __kernel void reduce_mean(OUT_OF_RANGE_PARAMS
    return;
#endif
  const int dim0_size = get_local_size(0);
  const int index = mad24(j, dim0_size, i);
  const int b = k / channel_blocks;
  const int ch = mad24(b, -channel_blocks, k);

  DATA_TYPE4 in;
#if REDUCE_TYPE == 1
  float4 tmp = (float4){MAXFLOAT, MAXFLOAT, MAXFLOAT, MAXFLOAT};
#elif REDUCE_TYPE == 2
  float4 tmp = (float4){-MAXFLOAT, -MAXFLOAT, -MAXFLOAT, -MAXFLOAT};
#elif REDUCE_TYPE == 3
  float4 tmp = (float4){1, 1, 1, 1};
#else
  float4 tmp = (float4){0, 0, 0, 0};
#endif
  const int valid_part_len = select(partial_len,
                                    partial_len - 1,
                                    remain_index > 0 && index >= remain_index);
@@ -43,19 +53,51 @@ __kernel void reduce_mean(OUT_OF_RANGE_PARAMS
    int pos_x = mad24(ch, in_width, w_id);
    int pos_y = mad24(b, in_height, h_id);
    in = READ_IMAGET(input, SAMPLER, (int2)(pos_x, pos_y));
// MIN
#if REDUCE_TYPE == 1
    tmp = fmin(tmp, in);
// MAX
#elif REDUCE_TYPE == 2
    tmp = fmax(tmp, in);
// PROD
#elif REDUCE_TYPE == 3
    tmp = tmp * in;
// MEAN
#else
    tmp = tmp + in;
#endif
  }
#if REDUCE_TYPE == 0
  tmp = tmp * image_size_reciprocal;
#endif
  group_sum[index] = tmp;

#ifdef NON_QUALCOMM_ADRENO
  barrier(CLK_LOCAL_MEM_FENCE);
#endif

  if (i == 0 && j == 0) {
#if REDUCE_TYPE == 1
    DATA_TYPE4 out = (DATA_TYPE4){MAXFLOAT, MAXFLOAT, MAXFLOAT, MAXFLOAT};
#elif REDUCE_TYPE == 2
    DATA_TYPE4 out = (DATA_TYPE4){-MAXFLOAT, -MAXFLOAT, -MAXFLOAT, -MAXFLOAT};
#elif REDUCE_TYPE == 3
    DATA_TYPE4 out = (DATA_TYPE4){1, 1, 1, 1};
#else
    DATA_TYPE4 out = (DATA_TYPE4){0, 0, 0, 0};
#endif
#pragma unroll
    for (int l = 0; l < group_size; ++l) {
#if REDUCE_TYPE == 1
      out = fmin(out, group_sum[l]);
#elif REDUCE_TYPE == 2
      out = fmax(out, group_sum[l]);
#elif REDUCE_TYPE == 3
      out = out * group_sum[l];
#else
      out = out + group_sum[l];
#endif
    }
    WRITE_IMAGET(output, (int2)(ch, b), out);
  }
......
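The kernel is now compiled once per reduction type via the ``-DREDUCE_TYPE=<n>`` build option set on the host side (see the ``ReduceKernel`` changes below): 1 selects MIN, 2 MAX, 3 PROD, and anything else (0) MEAN, in which case each work-item scales its partial sum by ``image_size_reciprocal`` before the final combine. A numpy reference for the spatial reductions this GPU path supports (axes 1 and 2 of an NHWC tensor):

.. code:: python

   import numpy as np

   x = np.random.rand(2, 4, 4, 8).astype(np.float32)  # NHWC

   # REDUCE_TYPE: 1 = MIN, 2 = MAX, 3 = PROD, otherwise MEAN.
   reference = {
       0: x.mean(axis=(1, 2), keepdims=True),
       1: x.min(axis=(1, 2), keepdims=True),
       2: x.max(axis=(1, 2), keepdims=True),
       3: x.prod(axis=(1, 2), keepdims=True),
   }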
@@ -99,6 +99,10 @@ MaceStatus ActivationKernel<T>::Compute(
      tuning_key_prefix_ = "sigmoid_opencl_kernel";
      built_options.emplace("-DUSE_SIGMOID");
      break;
    case LEAKYRELU:
      tuning_key_prefix_ = "leakyrelu_opencl_kernel";
      built_options.emplace("-DUSE_LEAKYRELU");
      break;
    default:
      LOG(FATAL) << "Unknown activation type: " << activation_;
  }
......
@@ -69,6 +69,7 @@ class PoolingKernel : public OpenCLPoolingKernel {
      const Padding &padding_type,
      const std::vector<int> &padding_data,
      const int *dilations,
      const RoundType round_type,
      Tensor *output) override;

 private:
@@ -87,6 +88,7 @@ MaceStatus PoolingKernel<T>::Compute(
    const Padding &padding_type,
    const std::vector<int> &padding_data,
    const int *dilations,
    const RoundType round_type,
    Tensor *output) {
  MACE_CHECK(dilations[0] == 1 && dilations[1] == 1)
      << "Pooling opencl kernel not support dilation yet";
@@ -103,7 +105,7 @@ MaceStatus PoolingKernel<T>::Compute(
  } else {
    paddings = padding_data;
    CalcOutputSize(input->shape().data(), filter_shape.data(),
                   padding_data.data(), dilations, strides, round_type,
                   output_shape.data());
  }
......
@@ -11,10 +11,10 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef MACE_OPS_OPENCL_IMAGE_REDUCE_H_
#define MACE_OPS_OPENCL_IMAGE_REDUCE_H_

#include "mace/ops/opencl/reduce.h"

#include <memory>
#include <set>
@@ -24,6 +24,7 @@
#include "mace/core/op_context.h"
#include "mace/core/tensor.h"
#include "mace/ops/opencl/helper.h"
#include "mace/ops/reduce.h"

namespace mace {
namespace ops {
@@ -31,11 +32,12 @@ namespace opencl {
namespace image {

template <typename T>
class ReduceKernel : public OpenCLReduceKernel {
 public:
  ReduceKernel(ReduceType type,
               const std::vector<int> axis,
               const bool keep_dims)
      : reduce_type_(type), axis_(axis), keep_dims_(keep_dims) {}

  MaceStatus Compute(
      OpContext *context,
@@ -43,6 +45,7 @@ class ReduceMeanKernel : public OpenCLReduceMeanKernel {
      Tensor *output) override;

 private:
  ReduceType reduce_type_;
  const std::vector<int> axis_;
  bool keep_dims_;
  cl::Kernel kernel_;
@@ -51,16 +54,16 @@ class ReduceMeanKernel : public OpenCLReduceMeanKernel {
};

template <typename T>
MaceStatus ReduceKernel<T>::Compute(
    OpContext *context,
    const Tensor *input,
    Tensor *output) {
  MACE_CHECK_NOTNULL(input);
  MACE_CHECK(keep_dims_, "reduce mean gpu only support keep dims.");
  MACE_CHECK(input->dim_size() == 4,
             "reduce gpu only support 4-dim input");
  MACE_CHECK(axis_.size() == 2 && axis_[0] == 1 && axis_[1] == 2,
             "reduce gpu only support 1,2-axis reduce");

  index_t batch = input->dim(0);
  const index_t in_height = input->dim(1);
  const index_t in_width = input->dim(2);
@@ -84,14 +87,15 @@ MaceStatus ReduceMeanKernel<T>::Compute(
  std::set<std::string> built_options;
  MACE_OUT_OF_RANGE_CONFIG;
  MACE_NON_UNIFORM_WG_CONFIG;
  std::string kernel_name = MACE_OBFUSCATE_SYMBOL("reduce");
  built_options.emplace("-Dreduce=" + kernel_name);
  built_options.emplace("-DDATA_TYPE=" + DtToUpCompatibleCLDt(dt));
  built_options.emplace("-DCMD_DATA_TYPE=" + DtToUpCompatibleCLCMDDt(dt));
  built_options.emplace(MakeString("-DREDUCE_TYPE=", reduce_type_));
  if (runtime->gpu_type() != GPUType::QUALCOMM_ADRENO) {
    built_options.emplace("-DNON_QUALCOMM_ADRENO");
  }
  MACE_RETURN_IF_ERROR(runtime->BuildKernel("reduce",
                                            kernel_name,
                                            built_options,
                                            &kernel_));
@@ -170,4 +174,4 @@ MaceStatus ReduceMeanKernel<T>::Compute(
}  // namespace ops
}  // namespace mace

#endif  // MACE_OPS_OPENCL_IMAGE_REDUCE_H_
@@ -36,6 +36,7 @@ class OpenCLPoolingKernel {
      const Padding &padding_type,
      const std::vector<int> &padding_data,
      const int *dilations,
      const RoundType round_type,
      Tensor *output) = 0;
  MACE_EMPTY_VIRTUAL_DESTRUCTOR(OpenCLPoolingKernel);
};
......
@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef MACE_OPS_OPENCL_REDUCE_H_
#define MACE_OPS_OPENCL_REDUCE_H_

#include "mace/public/mace.h"
#include "mace/utils/utils.h"
@@ -24,16 +24,16 @@ class OpContext;
class Tensor;

namespace ops {
class OpenCLReduceKernel {
 public:
  virtual MaceStatus Compute(
      OpContext *context,
      const Tensor *input,
      Tensor *output) = 0;
  MACE_EMPTY_VIRTUAL_DESTRUCTOR(OpenCLReduceKernel);
};
}  // namespace ops
}  // namespace mace

#endif  // MACE_OPS_OPENCL_REDUCE_H_
@@ -44,7 +44,7 @@ extern void RegisterLocalResponseNorm(OpRegistryBase *op_registry);
extern void RegisterMatMul(OpRegistryBase *op_registry);
extern void RegisterPad(OpRegistryBase *op_registry);
extern void RegisterPooling(OpRegistryBase *op_registry);
extern void RegisterReduce(OpRegistryBase *op_registry);
extern void RegisterReshape(OpRegistryBase *op_registry);
extern void RegisterResizeBicubic(OpRegistryBase *op_registry);
extern void RegisterResizeBilinear(OpRegistryBase *op_registry);
@@ -102,7 +102,7 @@ OpRegistry::OpRegistry() : OpRegistryBase() {
  ops::RegisterMatMul(this);
  ops::RegisterPad(this);
  ops::RegisterPooling(this);
  ops::RegisterReduce(this);
  ops::RegisterReshape(this);
  ops::RegisterResizeBicubic(this);
  ops::RegisterResizeBilinear(this);
......
@@ -43,11 +43,14 @@ class PoolingOpBase : public ConvPool2dOpBase {
        kernels_(Operation::GetRepeatedArgs<int>("kernels")),
        pooling_type_(
            static_cast<PoolingType>(Operation::GetOptionalArg<int>(
                "pooling_type", static_cast<int>(AVG)))),
        round_type_(static_cast<RoundType>(Operation::GetOptionalArg<int>(
            "round_mode", static_cast<int>(CEIL)))) {}

 protected:
  std::vector<int> kernels_;
  PoolingType pooling_type_;
  RoundType round_type_;

  MACE_OP_INPUT_TAGS(INPUT);
  MACE_OP_OUTPUT_TAGS(OUTPUT);
@@ -82,7 +85,7 @@ class PoolingOp<DeviceType::CPU, float> : public PoolingOpBase {
                     paddings_.data(),
                     dilations_.data(),
                     strides_.data(),
                     round_type_,
                     output_shape.data());
    }
    MACE_RETURN_IF_ERROR(output_tensor->Resize(output_shape));
@@ -255,7 +258,7 @@ class PoolingOp<DeviceType::CPU, uint8_t> : public PoolingOpBase {
                     paddings_.data(),
                     dilations_.data(),
                     strides_.data(),
                     round_type_,
                     output_shape.data());
    }
    MACE_RETURN_IF_ERROR(output_tensor->Resize(output_shape));
@@ -442,7 +445,7 @@ class PoolingOp<DeviceType::GPU, T> : public PoolingOpBase {
    return kernel_->Compute(context, input, pooling_type_, kernels_.data(),
                            strides_.data(), padding_type_, paddings_,
                            dilations_.data(), round_type_, output);
  }

 private:
......
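The new ``round_mode`` argument (FLOOR = 0, CEIL = 1, defaulting to CEIL, the previously hard-coded behavior) controls how the pooled output size is rounded; ONNX pooling defaults to floor rounding, which is why the converter needs this knob. A sketch of the usual output-size arithmetic, assuming the standard ``(in + pad_total - kernel) / stride + 1`` formula rather than MACE's exact ``CalcOutputSize`` internals:

.. code:: python

   import math

   def pooled_size(in_size, kernel, stride, pad_total, round_mode="ceil"):
       rounding = math.ceil if round_mode == "ceil" else math.floor
       return int(rounding((in_size + pad_total - kernel) / stride)) + 1

   print(pooled_size(112, 3, 2, 0, "ceil"))   # 56
   print(pooled_size(112, 3, 2, 0, "floor"))  # 55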
This diff has been collapsed.
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_OPS_REDUCE_H_
#define MACE_OPS_REDUCE_H_
namespace mace {
enum ReduceType {
// SUM = 0,
MEAN = 0,
MIN = 1,
MAX = 2,
PROD = 3,
// SUM_SQR = 4,
// SQR_MEAN = 5,
};
} // namespace mace
#endif // MACE_OPS_REDUCE_H_
@@ -21,7 +21,7 @@ namespace test {
namespace {

template <DeviceType D, typename T>
void Reduce(int iters, int batch, int channels,
            int height, int width) {
  mace::testing::StopTiming();
@@ -34,7 +34,7 @@ void ReduceMean(int iters, int batch, int channels,
    net.AddRandomInput<D, T>("Input", {batch, channels, height, width});
  }

  OpDefBuilder("Reduce", "ReduceBM")
      .Input("Input")
      .AddIntsArg("axis", axis)
      .Output("OutputImage")
@@ -55,30 +55,30 @@ void ReduceMean(int iters, int batch, int channels,
  }
}
}  // namespace

#define MACE_BM_REDUCE_MACRO(N, C, H, W, TYPE, DEVICE)               \
  static void                                                        \
      MACE_BM_REDUCE_##N##_##C##_##H##_##W##_##TYPE##_##DEVICE(      \
          int iters) {                                               \
    const int64_t tot = static_cast<int64_t>(iters) * N * C * H * W; \
    mace::testing::MaccProcessed(tot);                               \
    mace::testing::BytesProcessed(tot *(sizeof(TYPE)));              \
    Reduce<DEVICE, TYPE>(iters, N, C, H, W);                         \
  }                                                                  \
  MACE_BENCHMARK(                                                    \
      MACE_BM_REDUCE_##N##_##C##_##H##_##W##_##TYPE##_##DEVICE)

#define MACE_BM_REDUCE(N, C, H, W)              \
  MACE_BM_REDUCE_MACRO(N, C, H, W, float, GPU); \
  MACE_BM_REDUCE_MACRO(N, C, H, W, half, GPU);  \
  MACE_BM_REDUCE_MACRO(N, C, H, W, float, CPU);

MACE_BM_REDUCE(1, 1, 512, 512);
MACE_BM_REDUCE(4, 3, 128, 128);
MACE_BM_REDUCE(4, 1, 512, 512);
MACE_BM_REDUCE(16, 32, 112, 112);
MACE_BM_REDUCE(8, 64, 256, 256);
MACE_BM_REDUCE(1, 32, 480, 640);

}  // namespace test
......
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <algorithm>
#include <memory>
#include <vector>
#include "mace/core/future.h"
#include "mace/core/operator.h"
#include "mace/core/tensor.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/ops/opencl/image/reduce_mean.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace ops {
class ReduceMeanOpBase : public Operation {
public:
explicit ReduceMeanOpBase(OpConstructContext *context)
: Operation(context),
axis_(Operation::GetRepeatedArgs<int>("axis")),
keep_dims_(Operation::GetOptionalArg<bool>("keepdims", false)) {
}
protected:
inline void Validate() {
const Tensor *input = this->Input(0);
const int left = static_cast<int>(input->dim_size() * -1);
const int right = static_cast<int>(input->dim_size());
if (axis_.size()) {
for (unsigned int i = 0; i < axis_.size(); ++i) {
MACE_CHECK(axis_[i] > left && axis_[i] < right, "Axis is out of range.");
}
}
}
protected:
std::vector<int> axis_;
bool keep_dims_;
};
template <DeviceType D, class T>
class ReduceMeanOp;
template <typename T>
class ReduceMeanOp<DeviceType::CPU, T> : public ReduceMeanOpBase {
public:
explicit ReduceMeanOp(OpConstructContext *context)
: ReduceMeanOpBase(context) {
}
MaceStatus Run(OpContext *context) override {
MACE_UNUSED(context);
Validate();
const Tensor *input = this->Input(0);
Tensor *output = this->Output(0);
Simplify(input);
output->Resize(out_shape_);
Compute(input, output);
return MaceStatus::MACE_SUCCESS;
}
private:
void Simplify(const Tensor *input) {
std::vector<bool> bitmap(static_cast<uint32_t>(input->dim_size()), false);
if (axis_.size() == 0) {
for (int i = 0; i < input->dim_size(); ++i) {
bitmap[i] = true;
}
} else {
for (unsigned int i = 0; i < axis_.size(); ++i) {
int index = axis_[i] >= 0 ?
axis_[i] :
axis_[i] + input->dim_size();
// axis format is NHWC
if (input->dim_size() == 4) {
if (index == 1) index = 2;
else if (index == 2) index = 3;
else if (index == 3) index = 1;
}
bitmap[index] = true;
}
}
out_shape_.clear();
for (unsigned int i = 0; i < input->dim_size(); ++i) {
if (!bitmap[i]) {
out_shape_.push_back(input->dim(i));
} else if (keep_dims_) {
out_shape_.push_back(1);
}
}
data_reshape_.clear();
unsigned int dim_index = 0;
for (; dim_index < input->dim_size(); ++dim_index) {
if (input->dim(dim_index) != 1) break;
}
if (dim_index >= input->dim_size()) {
reduce_first_axis_ = true;
} else {
reduce_first_axis_ = bitmap[dim_index];
data_reshape_.push_back(input->dim(dim_index));
++dim_index;
for (; dim_index < input->dim_size(); ++dim_index) {
const int n = input->dim(dim_index);
if (n == 1) {
bitmap[dim_index] = bitmap[dim_index - 1];
}
if (bitmap[dim_index-1] != bitmap[dim_index]) {
data_reshape_.push_back(n);
} else {
data_reshape_.back() *= n;
}
}
}
}
void Compute(const Tensor *input, Tensor *output) {
Tensor::MappingGuard input_mapper(input);
const T *input_ptr = input->data<T>();
Tensor::MappingGuard output_map(output);
T *output_ptr = output->mutable_data<T>();
memset(output_ptr, 0, output->size() * sizeof(T));
switch (data_reshape_.size()) {
case 1:
if (reduce_first_axis_) {
T sum = 0;
for (int i = 0; i < data_reshape_[0]; ++i) {
sum = sum + input_ptr[i];
}
output_ptr[0] = sum / data_reshape_[0];
} else {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
output_ptr[i] = input_ptr[i];
}
}
break;
case 2:
if (reduce_first_axis_) {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[1]; ++i) {
for (int j = 0; j < data_reshape_[0]; ++j) {
output_ptr[i] += input_ptr[j * data_reshape_[1] + i];
}
output_ptr[i] /= data_reshape_[0];
}
} else {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
for (int j = 0; j < data_reshape_[1]; ++j) {
output_ptr[i] += input_ptr[i * data_reshape_[1] + j];
}
output_ptr[i] /= data_reshape_[1];
}
}
break;
case 3:
if (reduce_first_axis_) {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[1]; ++i) {
for (int j = 0; j < data_reshape_[2]; ++j) {
for (int k = 0; k < data_reshape_[0]; ++k) {
output_ptr[i] +=
input_ptr[(k * data_reshape_[1] + i) * data_reshape_[2]
+ j];
}
}
output_ptr[i] /= (data_reshape_[0] * data_reshape_[2]);
}
} else {
#pragma omp parallel for collapse(2) schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
for (int j = 0; j < data_reshape_[2]; ++j) {
for (int k = 0; k < data_reshape_[1]; ++k) {
output_ptr[i * data_reshape_[2] + j] +=
input_ptr[(i * data_reshape_[1] + k) * data_reshape_[2]
+ j];
}
output_ptr[i * data_reshape_[2] + j] /= data_reshape_[1];
}
}
}
break;
case 4:
if (reduce_first_axis_) {
#pragma omp parallel for collapse(2) schedule(runtime)
for (int i = 0; i < data_reshape_[1]; ++i) {
for (int j = 0; j < data_reshape_[3]; ++j) {
for (int k = 0; k < data_reshape_[2]; ++k) {
for (int t = 0; t < data_reshape_[0]; ++t) {
output_ptr[i * data_reshape_[3] + j] +=
input_ptr[((t * data_reshape_[1] + i) *
data_reshape_[2] + k)*data_reshape_[3] + j];
}
}
output_ptr[i * data_reshape_[3] + j] /=
(data_reshape_[0] * data_reshape_[2]);
}
}
} else {
#pragma omp parallel for collapse(2) schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
for (int j = 0; j < data_reshape_[2]; ++j) {
for (int k = 0; k < data_reshape_[1]; ++k) {
for (int t = 0; t < data_reshape_[3]; ++t) {
output_ptr[i * data_reshape_[2] + j] +=
input_ptr[((i * data_reshape_[1] + k) *
data_reshape_[2] + j)*data_reshape_[3] + t];
}
}
output_ptr[i * data_reshape_[2] + j] /=
(data_reshape_[1] * data_reshape_[3]);
}
}
}
break;
default:
MACE_CHECK(false, "not implemented in mace")
<< "data reshape size: " << data_reshape_.size()
<< ", reduce first axis: " << reduce_first_axis_;
break;
}
}
private:
bool reduce_first_axis_;
std::vector<int> data_reshape_;
std::vector<index_t> out_shape_;
};
#ifdef MACE_ENABLE_OPENCL
template <typename T>
class ReduceMeanOp<DeviceType::GPU, T> : public ReduceMeanOpBase {
public:
explicit ReduceMeanOp(OpConstructContext *context)
: ReduceMeanOpBase(context) {
if (context->device()->gpu_runtime()->UseImageMemory()) {
kernel_.reset(new opencl::image::ReduceMeanKernel<T>(axis_, keep_dims_));
} else {
MACE_NOT_IMPLEMENTED;
}
}
MaceStatus Run(OpContext *context) override {
Validate();
const Tensor *input = this->Input(0);
Tensor *output = this->Output(0);
return kernel_->Compute(context, input, output);
}
private:
std::unique_ptr<OpenCLReduceMeanKernel> kernel_;
};
#endif // MACE_ENABLE_OPENCL
void RegisterReduceMean(OpRegistryBase *op_registry) {
MACE_REGISTER_OP(op_registry, "ReduceMean", ReduceMeanOp,
DeviceType::CPU, float);
#ifdef MACE_ENABLE_OPENCL
MACE_REGISTER_OP(op_registry, "ReduceMean", ReduceMeanOp,
DeviceType::GPU, float);
MACE_REGISTER_OP(op_registry, "ReduceMean", ReduceMeanOp,
DeviceType::GPU, half);
#endif // MACE_ENABLE_OPENCL
}
} // namespace ops
} // namespace mace
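The ``Simplify`` step above collapses adjacent dimensions that are either all reduced or all kept, so an arbitrary N-D mean boils down to the handful of loop shapes handled in ``Compute``. A Python sketch of the collapsing idea (ignoring the NCHW axis remapping and the size-1 dimension handling in the C++):

.. code:: python

   def simplify(shape, axes):
       # Mark reduced dimensions; an empty axis list means reduce everything.
       bitmap = [not axes] * len(shape)
       for a in axes:
           bitmap[a if a >= 0 else a + len(shape)] = True
       # Collapse runs of dimensions that share the same reduced/kept flag.
       data_reshape, flags = [], []
       for d, f in zip(shape, bitmap):
           if flags and flags[-1] == f:
               data_reshape[-1] *= d
           else:
               data_reshape.append(d)
               flags.append(f)
       return data_reshape, flags

   # Keep dim 0, reduce the merged middle dims, keep the last dim.
   print(simplify([2, 2, 3, 4], [1, 2]))  # ([2, 6, 4], [False, True, False])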
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/ops/ops_test_util.h"
namespace mace {
namespace ops {
namespace test {
class ReduceMeanOpTest : public OpsTestBase {};
namespace {
template <DeviceType D>
void Simple(const std::vector<index_t> &input_shape,
const std::vector<float> &input,
const std::vector<int> &axis,
const std::vector<index_t> &output_shape,
const std::vector<float> &output,
const bool keepdims = true) {
// Construct graph
OpsTestNet net;
// Add input data
net.AddInputFromArray<D, float>("Input", input_shape, input);
if (D == DeviceType::CPU) {
net.TransformDataFormat<D, float>("Input", NHWC, "InputNCHW", NCHW);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("InputNCHW")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", keepdims ? 1 : 0)
.Output("OutputNCHW")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
net.TransformDataFormat<D, float>("OutputNCHW", NCHW, "Output", NHWC);
} else {
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("Input")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", keepdims ? 1 : 0)
.Output("Output")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
}
auto expected = net.CreateTensor<float>(output_shape, output);
ExpectTensorNear<float>(*expected, *net.GetOutput("Output"), 1e-5, 1e-3);
}
template <DeviceType D>
void Simple3D(const std::vector<index_t> &input_shape,
const std::vector<float> &input,
const std::vector<int> &axis,
const std::vector<index_t> &output_shape,
const std::vector<float> &output,
const bool keepdims = true) {
// Construct graph
OpsTestNet net;
// Add input data
net.AddInputFromArray<D, float>("Input", input_shape, input);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("Input")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", keepdims ? 1 : 0)
.Output("Output")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
auto expected = net.CreateTensor<float>(output_shape, output);
ExpectTensorNear<float>(*expected, *net.GetOutput("Output"), 1e-5, 1e-3);
}
template <DeviceType D>
void Simple12Test() {
Simple<D>({2, 2, 3, 4},
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23},
{1, 2},
{2, 1, 1, 4},
{10, 11, 12, 13,
10, 11, 12, 13});
}
template <DeviceType D>
void Simple1Axis() {
Simple<D>({2, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23,
0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1},
{2, 1, 3, 4},
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{-3},
{1, 1, 3, 4},
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{2},
{1, 2, 1, 4},
{4, 5, 6, 7, 16, 17, 18, 19});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{-1},
{1, 2, 3, 1},
{1.5, 5.5, 9.5, 13.5, 17.5, 21.5});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{1},
{1, 1, 3, 3},
{9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{-2},
{1, 3, 1, 3},
{3, 4, 5, 12, 13, 14, 21, 22, 23});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{3},
{1, 3, 3, 1},
{1, 4, 7, 10, 13, 16, 19, 22, 25});
}
template <DeviceType D>
void Simple2Axis() {
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1},
{1, 1, 3, 4},
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 2},
{1, 2, 1, 4},
{4, 5, 6, 7, 16, 17, 18, 19});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1, 3},
{1, 1, 3, 1},
{7.5, 11.5, 15.5});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{1, 2},
{1, 1, 1, 3},
{12, 13, 14});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 1},
{1, 1, 3, 3},
{9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{2, 3},
{1, 3, 1, 1},
{4, 13, 22});
}
template <DeviceType D>
void Simple2Axis3D() {
Simple3D<D>({2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1},
{1, 1, 4},
{10, 11, 12, 13});
Simple3D<D>({2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1, 2},
{2, 1, 1},
{5.5, 17.5});
}
template <DeviceType D>
void Simple3Axis() {
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1, 2, 3},
{1, 1, 1, 1},
{11.5});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 2, 3},
{1, 2, 1, 1},
{5.5, 17.5});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1, 3},
{1, 1, 3, 1},
{7.5, 11.5, 15.5});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1, 2},
{1, 1, 1, 4},
{10, 11, 12, 13});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{1, 2, 3},
{1, 1, 1, 1},
{13});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 2, 3},
{1, 3, 1, 1},
{4, 13, 22});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 1, 3},
{1, 1, 3, 1},
{10, 13, 16});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 1, 2},
{1, 1, 1, 3},
{12, 13, 14});
}
} // namespace
TEST_F(ReduceMeanOpTest, CPUSimple12) {
Simple12Test<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, GPUSimple12) {
Simple12Test<DeviceType::GPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple1Axis) {
Simple1Axis<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple2Axis) {
Simple2Axis<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple2Axis3D) {
Simple2Axis3D<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple3Axis) {
Simple3Axis<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimpleReduceDims) {
Simple3D<CPU>({2, 3, 4},
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23},
{0, 1},
{4},
{10, 11, 12, 13},
false);
}
namespace {
template <DeviceType D, typename T>
void RandomTest(const std::vector<index_t> &input_shape,
const std::vector<int> &axis) {
testing::internal::LogToStderr();
srand(time(NULL));
// Construct graph
OpsTestNet net;
// Add input data
net.AddRandomInput<D, float>("Input", input_shape);
net.TransformDataFormat<DeviceType::CPU, float>("Input", NHWC, "InputNCHW",
NCHW);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("InputNCHW")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", 1)
.Output("OutputNCHW")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp();
net.TransformDataFormat<DeviceType::CPU, float>("OutputNCHW", NCHW,
"Output", NHWC);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("Input")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", 1)
.Output("OPENCLOutput")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
if (DataTypeToEnum<T>::value == DT_FLOAT) {
ExpectTensorNear<float>(*net.GetTensor("Output"),
*net.GetOutput("OPENCLOutput"), 1e-5, 1e-4);
} else {
ExpectTensorNear<float>(*net.GetTensor("Output"),
*net.GetOutput("OPENCLOutput"), 1e-2, 1e-2);
}
}
} // namespace
TEST_F(ReduceMeanOpTest, GPURandomFloat) {
RandomTest<DeviceType::GPU, float>({4, 64, 64, 3}, {1, 2});
RandomTest<DeviceType::GPU, float>({2, 64, 64, 4}, {1, 2});
RandomTest<DeviceType::GPU, float>({8, 128, 128, 64}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 640, 480, 64}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 480, 640, 32}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 512, 512, 16}, {1, 2});
RandomTest<DeviceType::GPU, float>({8, 117, 87, 33}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 619, 450, 61}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 511, 561, 11}, {1, 2});
}
TEST_F(ReduceMeanOpTest, GPURandomHalf) {
RandomTest<DeviceType::GPU, half>({4, 64, 64, 3}, {1, 2});
RandomTest<DeviceType::GPU, half>({2, 64, 64, 4}, {1, 2});
RandomTest<DeviceType::GPU, half>({8, 128, 128, 64}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 640, 480, 64}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 480, 640, 32}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 512, 512, 16}, {1, 2});
RandomTest<DeviceType::GPU, half>({8, 117, 87, 33}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 619, 450, 61}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 511, 561, 11}, {1, 2});
}
} // namespace test
} // namespace ops
} // namespace mace
This diff has been collapsed.
@@ -36,6 +36,7 @@ class ReshapeOp : public Operation {
    int unknown_idx = -1;
    index_t product = 1;
    std::vector<index_t> out_shape;
    index_t n = 0;

    for (int i = 0; i < num_dims; ++i) {
      if (shape_data[i] == -1) {
@@ -45,8 +46,15 @@ class ReshapeOp : public Operation {
      } else {
        MACE_CHECK(shape_data[i] >= 0, "Shape must be non-negative: ",
                   shape_data[i]);
        if (shape_data[i] == 0) {
          MACE_CHECK(i < input->dim_size(),
                     "dims:0 out of input dims' range.");
          n = input->dim(i);
        } else {
          n = shape_data[i];
        }
        out_shape.push_back(n);
        product *= n;
      }
    }
......
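The new branch follows the ONNX ``Reshape`` convention: a 0 in the target shape copies the corresponding input dimension, while -1 still marks the single dimension to be inferred. A Python sketch of the resulting shape resolution:

.. code:: python

   import numpy as np

   def resolve_reshape(input_shape, shape_data):
       # 0 copies the corresponding input dimension; -1 is inferred.
       out = [input_shape[i] if d == 0 else d
              for i, d in enumerate(shape_data)]
       if -1 in out:
           known = int(np.prod([d for d in out if d != -1]))
           out[out.index(-1)] = int(np.prod(input_shape)) // known
       return out

   print(resolve_reshape([1, 224, 224, 3], [0, -1]))  # [1, 150528]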
@@ -13,6 +13,7 @@ py_library(
        "converter_tool/base_converter.py",
        "converter_tool/caffe_converter.py",
        "converter_tool/hexagon_converter.py",
        "converter_tool/onnx_converter.py",
        "converter_tool/shape_inference.py",
        "converter_tool/tensorflow_converter.py",
        "converter_tool/tf_dsp_converter.py",
......
@@ -101,7 +101,7 @@ def main(unused_args):
                   file=sys.stderr)
        sys.exit(-1)

    if FLAGS.platform not in ['tensorflow', 'caffe', 'onnx']:
        six.print_("platform %s is not supported." % FLAGS.platform,
                   file=sys.stderr)
        sys.exit(-1)
@@ -188,6 +188,9 @@ def main(unused_args):
        converter = caffe_converter.CaffeConverter(option,
                                                   FLAGS.model_file,
                                                   FLAGS.weight_file)
    elif FLAGS.platform == 'onnx':
        from mace.python.tools.converter_tool import onnx_converter
        converter = onnx_converter.OnnxConverter(option, FLAGS.model_file)
    else:
        six.print_("MACE does not support platform %s yet." % FLAGS.platform,
                   file=sys.stderr)
@@ -231,6 +234,7 @@ def parse_args():
        type=str,
        default="",
        help="TensorFlow \'GraphDef\' file to load, "
             "ONNX model file (.onnx) to load, "
             "Caffe prototxt file to load.")
    parser.add_argument(
        "--weight_file", type=str, default="", help="Caffe data file to load.")
@@ -300,7 +304,10 @@ def parse_args():
    parser.add_argument(
        "--check_shape", type=str, default="", help="check shape.")
    parser.add_argument(
        "--platform",
        type=str,
        default="tensorflow",
        help="tensorflow/caffe/onnx")
    parser.add_argument(
        "--embed_model_data",
        type=str2bool,
......
@@ -37,11 +37,14 @@ class FilterFormat(Enum):
    OHWI = 103


# SAME_LOWER: if the amount of padding to be added is odd,
# the extra padding is added to the right or bottom.
class PaddingMode(Enum):
    VALID = 0
    SAME = 1
    FULL = 2
    SAME_LOWER = 3
    NA = 4


class PoolingType(Enum):
@@ -49,6 +52,11 @@ class PoolingType(Enum):
    MAX = 2


class RoundMode(Enum):
    FLOOR = 0
    CEIL = 1


class ActivationType(Enum):
    NOOP = 0
    RELU = 1
@@ -56,6 +64,7 @@ class ActivationType(Enum):
    PRELU = 3
    TANH = 4
    SIGMOID = 5
    LEAKYRELU = 6


class EltwiseType(Enum):
@@ -72,9 +81,17 @@ class EltwiseType(Enum):
    EQUAL = 10


class ReduceType(Enum):
    MEAN = 0
    MIN = 1
    MAX = 2
    PROD = 3


class FrameworkType(Enum):
    TENSORFLOW = 0
    CAFFE = 1
    ONNX = 2


MaceSupportedOps = [
@@ -108,7 +125,7 @@ MaceSupportedOps = [
    'Pooling',
    'Proposal',
    'Quantize',
    'Reduce',
    'Reshape',
    'ResizeBicubic',
    'ResizeBilinear',
@@ -184,6 +201,10 @@ class MaceKeyword(object):
    mace_group_str = "group"
    mace_wino_arg_str = "wino_block_size"
    mace_quantize_flag_arg_str = "quantize_flag"
    mace_epsilon_str = 'epsilon'
    mace_reduce_type_str = 'reduce_type'
    mace_argmin_str = 'argmin'
    mace_round_mode_str = 'round_mode'


class TransformerRule(Enum):
......
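For reference, the ONNX specification computes the total SAME padding along one dimension as max((ceil(in / stride) - 1) * stride + kernel - in, 0); when that total is odd, SAME_UPPER places the extra pixel at the end and SAME_LOWER at the beginning. A small sketch of that arithmetic (illustrative helper, not MACE code):

```python
import math

def same_padding(in_size, kernel, stride, lower=False):
    # Total padding so that out_size == ceil(in_size / stride).
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    small, big = pad_total // 2, pad_total - pad_total // 2
    # Per the ONNX spec, SAME_LOWER puts the odd extra at the beginning.
    return (big, small) if lower else (small, big)

assert same_padding(6, 3, 2) == (0, 1)               # SAME / SAME_UPPER
assert same_padding(6, 3, 2, lower=True) == (1, 0)   # SAME_LOWER
```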
This diff is collapsed (likely the new converter_tool/onnx_converter.py, too large to display inline).
@@ -26,6 +26,7 @@ from mace.python.tools.converter_tool.base_converter import PaddingMode
 from mace.python.tools.converter_tool.base_converter import ActivationType
 from mace.python.tools.converter_tool.base_converter import EltwiseType
 from mace.python.tools.converter_tool.base_converter import FrameworkType
+from mace.python.tools.converter_tool.base_converter import ReduceType
 from mace.python.tools.converter_tool.base_converter import DataFormat
 from mace.python.tools.converter_tool.base_converter import FilterFormat
 from mace.python.tools.converter_tool.base_converter import MaceOp
@@ -465,15 +466,6 @@ class TensorflowConverter(base_converter.ConverterInterface):
                    "Mace only supports dilation == 1 conv2d_transpose.")
         mace_check(len(tf_op.inputs) >= 3,
                    "deconv should have (>=) 3 inputs.")
-        output_shape_arg = op.arg.add()
-        output_shape_arg.name = MaceKeyword.mace_output_shape_str
-        # if tf_op.inputs[0].op.type == TFOpType.Const.name:
-        #     output_shape_value = \
-        #         tf_op.inputs[0].eval().astype(np.int32).flat
-        #     output_shape_arg.ints.extend(output_shape_value)
-        # else:
-        #     output_shape_value = {}
-        #     output_shape_arg.ints.extend(output_shape_value)
         del op.input[:]
         op.input.extend([tf_op.inputs[2].name,
                          tf_op.inputs[1].name,
@@ -810,7 +802,12 @@ class TensorflowConverter(base_converter.ConverterInterface):
         op = self.convert_general_op(tf_op)
         del op.input[1:]
-        op.type = MaceOp.ReduceMean.name
+        op.type = MaceOp.Reduce.name
+        reduce_type_arg = op.arg.add()
+        reduce_type_arg.name = MaceKeyword.mace_reduce_type_str
+        reduce_type_arg.i = ReduceType.MEAN.value
         axis_arg = op.arg.add()
         axis_arg.name = MaceKeyword.mace_axis_str
         if len(tf_op.inputs) > 1:
...
@@ -352,21 +352,26 @@ class Transformer(base_converter.ConverterInterface):
             if elttype == EltwiseType.SQR_DIFF.value and\
                     self.consumer_count(op.output[0]) == 1:
                 consumer_op = self._consumers[op.output[0]][0]
-                axis = ConverterUtil.get_arg(
-                    consumer_op,
-                    MaceKeyword.mace_axis_str).ints
-                keep_dims = ConverterUtil.get_arg(
-                    consumer_op,
-                    MaceKeyword.mace_keepdims_str).i
-                if consumer_op.type == MaceOp.ReduceMean.name and\
-                        len(consumer_op.input) == 1 and \
-                        axis[0] == 1 and axis[1] == 2 and keep_dims != 0:
-                    print("Fold SquaredDiff ReduceMean: %s" % op.name)
-                    op.type = MaceOp.SqrDiffMean.name
-                    op.output[0] = consumer_op.output[0]
-                    self.replace_quantize_info(op, consumer_op)
-                    self.safe_remove_node(consumer_op, op)
-                    return True
+                if consumer_op.type == MaceOp.Reduce.name:
+                    axis = ConverterUtil.get_arg(
+                        consumer_op,
+                        MaceKeyword.mace_axis_str).ints
+                    keep_dims = ConverterUtil.get_arg(
+                        consumer_op,
+                        MaceKeyword.mace_keepdims_str).i
+                    reduce_type = ConverterUtil.get_arg(
+                        consumer_op,
+                        MaceKeyword.mace_reduce_type_str).i
+                    if reduce_type == ReduceType.MEAN.value and\
+                            len(consumer_op.input) == 1 and\
+                            axis[0] == 1 and axis[1] == 2 and\
+                            keep_dims > 0:
+                        print("Fold SquaredDiff Reduce: %s" % op.name)
+                        op.type = MaceOp.SqrDiffMean.name
+                        op.output[0] = consumer_op.output[0]
+                        self.replace_quantize_info(op, consumer_op)
+                        self.safe_remove_node(consumer_op, op)
+                        return True

         return False
@@ -1005,13 +1010,13 @@ class Transformer(base_converter.ConverterInterface):
                                'only support squeeze at at [2, 3]')
                     arg.ints[:] = [1, 2]
-            elif op.type == MaceOp.ReduceMean.name:
+            elif op.type == MaceOp.Reduce.name:
                 for arg in op.arg:
                     if arg.name == MaceKeyword.mace_axis_str:
                         if ConverterUtil.data_format(
                                 op) == DataFormat.NCHW \
                                 and self._target_data_format == DataFormat.NHWC:  # noqa
-                            print("Transpose reduce mean args: %s(%s)"
+                            print("Transpose reduce args: %s(%s)"
                                   % (op.name, op.type))
                             reduce_axises = list(arg.ints)
                             new_axises = []
...
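The fold above rewrites a SquaredDifference op whose only consumer is a Reduce(MEAN) over axes [1, 2] (H and W in NHWC) with keep_dims set into the single fused SqrDiffMean op, which computes the same value in one pass. A quick numpy check of the equivalence (shapes illustrative):

```python
import numpy as np

x = np.random.rand(1, 4, 4, 8).astype(np.float32)  # NHWC feature map
y = np.random.rand(1, 4, 4, 8).astype(np.float32)

# SquaredDifference followed by Reduce(MEAN, axis=[1, 2], keep_dims) ...
unfused = np.mean(np.square(x - y), axis=(1, 2), keepdims=True)

# ... equals what the single fused SqrDiffMean op produces.
fused = ((x - y) ** 2).mean(axis=(1, 2), keepdims=True)

assert np.allclose(unfused, fused) and fused.shape == (1, 1, 1, 8)
```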
@@ -48,7 +48,7 @@ def _opencl_encrypt_kernel_impl(repository_ctx):
     unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/pad.cl"))
     unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/pooling.cl"))
     unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/pooling_buffer.cl"))
-    unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/reduce_mean.cl"))
+    unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/reduce.cl"))
     unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/resize_bicubic.cl"))
     unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/resize_bilinear.cl"))
     unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/split.cl"))
...
@@ -362,6 +362,7 @@ class YAMLKeyword(object):
     validation_threshold = 'validation_threshold'
     graph_optimize_options = 'graph_optimize_options'  # internal use for now
     cl_mem_type = 'cl_mem_type'
+    backend = 'backend'


 ################################
...
@@ -55,6 +55,7 @@ ModelFormatStrs = [
 PlatformTypeStrs = [
     "tensorflow",
     "caffe",
+    "onnx",
 ]
 PlatformType = Enum('PlatformType', [(ele, ele) for ele in PlatformTypeStrs],
                     type=str)
@@ -469,6 +470,10 @@ def format_model_config(flags):
             else:
                 subgraph[YAMLKeyword.validation_inputs_data] = \
                     validation_inputs_data
+
+            onnx_backend = subgraph.get(
+                YAMLKeyword.backend, "tensorflow")
+            subgraph[YAMLKeyword.backend] = onnx_backend
             input_ranges = subgraph.get(
                 YAMLKeyword.input_ranges, [])
             if not isinstance(input_ranges, list):
...
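format_model_config() now defaults the per-subgraph `backend` to "tensorflow", and DeviceWrapper (next hunk) forwards it to validation. A hypothetical deployment-file fragment (key names follow MACE's YAML conventions; the model name, path and shapes are illustrative):

```yaml
# Sketch of an ONNX deployment entry; only backend is new in this MR.
models:
  mobilenet_v2_onnx:
    platform: onnx
    model_file_path: path/to/mobilenet_v2.onnx
    subgraphs:
      - input_tensors: [input]
        input_shapes: [1,224,224,3]
        output_tensors: [output]
        output_shapes: [1,1001]
        backend: tensorflow   # or caffe2 / pytorch, see validate.py below
```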
@@ -572,7 +572,8 @@ class DeviceWrapper:
                         YAMLKeyword.input_data_types],
                     caffe_env=flags.caffe_env,
                     validation_threshold=subgraphs[0][
-                        YAMLKeyword.validation_threshold][validate_type]
+                        YAMLKeyword.validation_threshold][validate_type],
+                    backend=subgraphs[0][YAMLKeyword.backend]
                 )
             if flags.report and flags.round > 0:
                 tuned = is_tuned and device_type == DeviceType.GPU
...
# Copyright 2018 Xiaomi, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys

import onnx
from onnx import optimizer

# Usage: python onnx_optimizer.py model.onnx model_opt.onnx


def main():
    if len(sys.argv) != 3:
        print("Usage: python onnx_optimizer.py model.onnx model_opt.onnx")
        sys.exit(0)
    in_path = sys.argv[1]
    out_path = sys.argv[2]
    original_model = onnx.load(in_path)
    print("Start optimizing ONNX model for inference:")
    passes = ['eliminate_identity',
              'fuse_consecutive_squeezes',
              'fuse_consecutive_transposes',
              'eliminate_nop_pad',
              'eliminate_nop_transpose',
              'eliminate_unused_initializer',
              'extract_constant_to_initializer',
              'fuse_add_bias_into_conv',
              'fuse_bn_into_conv',
              'fuse_transpose_into_gemm']
    for i in range(len(passes)):
        print("%d. %s" % (i, passes[i]))
    optimized_model = optimizer.optimize(original_model, passes)
    onnx.save_model(optimized_model, out_path)
    print("Optimize Finished!")
    print("Please check new model in: %s" % out_path)


if __name__ == '__main__':
    main()
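The optimizer script is meant to be run once before conversion, so the converter sees a graph with identities removed and batch-norm/bias folded into convolutions. Following the usage line in the script (model names illustrative):

```sh
python onnx_optimizer.py mobilenet_v2.onnx mobilenet_v2_opt.onnx
```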
@@ -621,7 +621,8 @@ def validate_model(abi,
                    caffe_env,
                    input_file_name="model_input",
                    output_file_name="model_out",
-                   validation_threshold=0.9):
+                   validation_threshold=0.9,
+                   backend="tensorflow"):
     six.print_("* Validate with %s" % platform)
     if abi != "host":
         for output_name in output_nodes:
@@ -638,7 +639,14 @@ def validate_model(abi,
                  "%s/%s" % (model_output_dir, output_file_name), device_type,
                  ":".join(input_shapes), ":".join(output_shapes),
                  ",".join(input_nodes), ",".join(output_nodes),
-                 validation_threshold, ",".join(input_data_types))
+                 validation_threshold, ",".join(input_data_types), backend)
+    elif platform == "onnx":
+        validate(platform, model_file_path, "",
+                 "%s/%s" % (model_output_dir, input_file_name),
+                 "%s/%s" % (model_output_dir, output_file_name), device_type,
+                 ":".join(input_shapes), ":".join(output_shapes),
+                 ",".join(input_nodes), ",".join(output_nodes),
+                 validation_threshold, ",".join(input_data_types), backend)
     elif platform == "caffe":
         image_name = "mace-caffe:latest"
         container_name = "mace_caffe_validator"
@@ -654,7 +662,7 @@ def validate_model(abi,
                      device_type,
                      ":".join(input_shapes), ":".join(output_shapes),
                      ",".join(input_nodes), ",".join(output_nodes),
                     validation_threshold, ",".join(input_data_types), backend)
         elif caffe_env == common.CaffeEnvType.DOCKER:
             docker_image_id = sh.docker("images", "-q", image_name)
             if not docker_image_id:
@@ -720,6 +728,7 @@ def validate_model(abi,
                 "--output_shape=%s" % ":".join(output_shapes),
                 "--validation_threshold=%f" % validation_threshold,
                 "--input_data_type=%s" % ",".join(input_data_types),
+                "--backend=%s" % backend,
                 _fg=True)
             six.print_("Validation done!\n")
...
@@ -21,6 +21,10 @@ import re

 import common

+import onnx
+from onnx import helper
+from onnx import TensorProto
+
 # Validation Flow:
 # 1. Generate input data
 # 2. Use mace_run to run model on phone.
@@ -190,9 +194,64 @@ def validate_caffe_model(platform, device_type, model_file, input_file,
                        value, validation_threshold)


+def validate_onnx_model(platform, device_type, model_file, input_file,
+                        mace_out_file, input_names, input_shapes,
+                        output_names, output_shapes, validation_threshold,
+                        input_data_types, backend):
+    if backend == "tensorflow":
+        from onnx_tf.backend import prepare
+        print("validate on onnx tensorflow backend.")
+    elif backend == "caffe2" or backend == "pytorch":
+        from caffe2.python.onnx.backend import prepare
+        print("validate on onnx caffe2 backend.")
+    else:
+        common.MaceLogger.error(
+            VALIDATION_MODULE,
+            "onnx backend framework '" + backend + "' is invalid.")
+    if not os.path.isfile(model_file):
+        common.MaceLogger.error(
+            VALIDATION_MODULE,
+            "Input graph file '" + model_file + "' does not exist!")
+    model = onnx.load(model_file)
+    input_dict = {}
+    for i in range(len(input_names)):
+        input_value = load_data(common.formatted_file_name(input_file,
+                                                           input_names[i]),
+                                input_data_types[i])
+        input_value = input_value.reshape(input_shapes[i]).transpose((0, 3, 1,
+                                                                      2))
+        input_dict[input_names[i]] = input_value
+    onnx_outputs = []
+    for i in range(len(output_names)):
+        out_shape = output_shapes[i]
+        if len(out_shape) == 4:
+            out_shape[1], out_shape[2], out_shape[3] = \
+                out_shape[3], out_shape[1], out_shape[2]
+        onnx_outputs.append(
+            helper.make_tensor_value_info(output_names[i],
+                                          TensorProto.FLOAT,
+                                          out_shape))
+    model.graph.output.extend(onnx_outputs)
+    rep = prepare(model)
+    output_values = rep.run(input_dict)
+
+    for i in range(len(output_names)):
+        out_name = output_names[i]
+        value = output_values[out_name].flatten()
+        out_shape = output_shapes[i]
+        if len(out_shape) == 4:
+            value = value.reshape(out_shape).transpose((0, 2, 3, 1))
+        output_file_name = common.formatted_file_name(mace_out_file,
+                                                      output_names[i])
+        mace_out_value = load_data(output_file_name)
+        compare_output(platform, device_type, output_names[i],
+                       mace_out_value, value,
+                       validation_threshold)
+
+
 def validate(platform, model_file, weight_file, input_file, mace_out_file,
              device_type, input_shape, output_shape, input_node, output_node,
-             validation_threshold, input_data_type):
+             validation_threshold, input_data_type, backend):
     input_names = [name for name in input_node.split(',')]
     input_shape_strs = [shape for shape in input_shape.split(':')]
     input_shapes = [[int(x) for x in shape.split(',')]
@@ -217,6 +276,15 @@ def validate(platform, model_file, weight_file, input_file, mace_out_file,
                              mace_out_file, weight_file, input_names,
                              input_shapes, output_names, output_shapes,
                              validation_threshold)
+    elif platform == 'onnx':
+        output_shape_strs = [shape for shape in output_shape.split(':')]
+        output_shapes = [[int(x) for x in shape.split(',')]
+                         for shape in output_shape_strs]
+        validate_onnx_model(platform, device_type, model_file, input_file,
+                            mace_out_file, input_names, input_shapes,
+                            output_names, output_shapes,
+                            validation_threshold,
+                            input_data_types, backend)


 def parse_args():
@@ -259,6 +327,11 @@ def parse_args():
     parser.add_argument(
         "--validation_threshold", type=float, default=0.995,
         help="validation similarity threshold")
+    parser.add_argument(
+        "--backend",
+        type=str,
+        default="tensorflow",
+        help="onnx backend framework")
     return parser.parse_known_args()


@@ -276,4 +349,5 @@ if __name__ == '__main__':
         FLAGS.input_node,
         FLAGS.output_node,
         FLAGS.validation_threshold,
-        FLAGS.input_data_type)
+        FLAGS.input_data_type,
+        FLAGS.backend)
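Note the layout handling in validate_onnx_model: MACE dumps tensors in NHWC, while ONNX backends expect NCHW inputs and produce NCHW outputs, hence the `transpose((0, 3, 1, 2))` on the way in and `transpose((0, 2, 3, 1))` on the way out. A quick numpy check that the two transposes are inverses:

```python
import numpy as np

nhwc = np.arange(24, dtype=np.float32).reshape(1, 2, 3, 4)  # N,H,W,C
nchw = nhwc.transpose((0, 3, 1, 2))   # what gets fed to the ONNX backend
back = nchw.transpose((0, 2, 3, 1))   # back to NHWC for comparison

assert np.array_equal(nhwc, back)
```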