Commit 7a79a9cf authored by 李寅

Merge branch 'add-onnx' into 'master'

add onnx

See merge request !902
......@@ -82,7 +82,8 @@ the following projects during the development:
[Caffe](https://github.com/BVLC/caffe),
[SNPE](https://developer.qualcomm.com/software/snapdragon-neural-processing-engine-ai),
[ARM ComputeLibrary](https://github.com/ARM-software/ComputeLibrary),
[ncnn](https://github.com/Tencent/ncnn),
[ONNX](https://github.com/onnx/onnx) and many others: we learned many best
practices from these projects.
Finally, we also thank the Qualcomm, Pinecone and MediaTek engineering teams for
......
......@@ -64,6 +64,9 @@ Optional dependencies
* - FileLock
- pip install -I filelock==3.0.0
- Required when running on Android
* - ONNX
- pip install onnx
- Required for converting ONNX models
.. note::
......
......@@ -72,3 +72,9 @@ Install Caffe (Optional)
-------------------------
Please follow the installation instructions of `Caffe <http://caffe.berkeleyvision.org/installation.html>`__.
Install ONNX (Optional)
-------------------------
Please follow the installation instructions of `ONNX <https://github.com/onnx/onnx#source>`__.
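After installing, a quick sanity check confirms the package imports and can validate a model (a minimal sketch; ``model.onnx`` is a placeholder path):

.. code:: python

    import onnx

    print(onnx.__version__)

    # Load a model and run ONNX's structural checker; raises if the graph is malformed.
    model = onnx.load("model.onnx")
    onnx.checker.check_model(model)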
......@@ -18,8 +18,7 @@ MACE Model
~~~~~~~~~~
MACE defines a customized model format which is similar to
Caffe2. The MACE model can be converted from models exported by TensorFlow, Caffe or ONNX.
MACE Interpreter
~~~~~~~~~~~~~~~~~
......@@ -50,7 +49,7 @@ Build MACE dynamic or static libraries.
3. Convert model
~~~~~~~~~~~~~~~~~~
Convert a TensorFlow, Caffe or ONNX model to a MACE model.
4.1. Deploy
~~~~~~~~~~~~~~~~~~
......@@ -86,7 +85,7 @@ MACE covers common mobile computing devices (CPU, GPU and DSP) and provides
MACE Model
~~~~~~~~~~~~~~~~~~
MACE defines its own model format (similar to Caffe2). Models in Caffe/TensorFlow/ONNX format
can be converted into MACE models with the tools MACE provides.
MACE Interpreter
......@@ -118,7 +117,7 @@ The CPU/GPU/DSP Runtimes hold the operator implementations for each computing device.
3. Convert model
~~~~~~~~~~~~~~~~~~
Convert TensorFlow, Caffe or ONNX models to MACE models.
4.1. Deploy
~~~~~~~~~~~~~~~~~~
......
......@@ -78,6 +78,8 @@ in one deployment file.
- [optional] Specify Numpy validation inputs. When not provided, [-1, 1] random values will be used.
* - validation_threshold
- [optional] Specify the similarity threshold for validation. A dict with key in 'CPU', 'GPU' and/or 'HEXAGON' and value <= 1.0.
* - backend
- [optional] The ONNX backend framework used for validation, one of [tensorflow, caffe2, pytorch]. Defaults to tensorflow.
* - runtime
- The running device, one of [cpu, gpu, dsp, cpu_gpu]. cpu_gpu contains CPU and GPU model definition so you can run the model on both CPU and GPU.
* - data_type
......
......@@ -114,6 +114,19 @@ MACE now supports models from TensorFlow and Caffe (more frameworks will be supp
# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
- ONNX
Prepare your ONNX model.onnx file.
Use the `ONNX Optimizer Tool <https://github.com/XiaoMi/mace/tree/master/tools/onnx_optimizer.py>`__ to optimize your model for inference.
This tool improves inference efficiency in the same way the `Graph Transform Tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__
does for TensorFlow.
.. code:: bash
# Optimize your model
python MACE_ROOT/tools/onnx_optimizer.py model.onnx model_opt.onnx
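For reference, the tool is essentially a thin wrapper around ``onnx.optimizer``; a minimal sketch of the same flow (only a subset of the passes is shown here, the full list lives in the script):

.. code:: python

    import onnx
    from onnx import optimizer

    model = onnx.load("model.onnx")
    # Two of the inference-oriented passes the tool applies.
    optimized = optimizer.optimize(model, ["eliminate_identity",
                                           "fuse_bn_into_conv"])
    onnx.save_model(optimized, "model_opt.onnx")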
===========================================
2. Create a deployment file for your model
......@@ -137,6 +150,12 @@ Modify one of them and use it for your own case.
.. literalinclude:: models/demo_models_caffe.yml
:language: yaml
- ONNX
.. literalinclude:: models/demo_models_onnx.yml
:language: yaml
More details about model deployment file are in :doc:`advanced_usage`.
======================
......
# The name of library
library_name: mobilenet
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
mobilenet_v1: # model tag, which will be used in model loading and must be specific.
platform: onnx
# path to your onnx model file. Support local path, http:// and https://
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
# sha256_checksum of your model's onnx file.
# use this command to get the sha256_checksum: sha256sum path/to/your/model/file
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
# define your model's interface
# if there are multiple inputs or outputs, write them like below:
# subgraphs:
# - input_tensors:
# - input0
# - input1
# input_shapes:
# - 1,224,224,3
# - 1,224,224,3
# output_tensors:
# - output0
# - output1
# output_shapes:
# - 1,1001
# - 1,1001
subgraphs:
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV1/Predictions/Reshape_1
output_shapes:
- 1,1001
# onnx backend framework for validation. Supports tensorflow/caffe2/pytorch. Default is tensorflow.
backend: tensorflow
# cpu, gpu or cpu+gpu
runtime: cpu+gpu
winograd: 0
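The ``model_sha256_checksum`` above can be produced with ``sha256sum``, or with a short Python equivalent if that command is unavailable (a sketch; the path is a placeholder):

.. code:: python

    import hashlib

    def file_sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    print(file_sha256("path/to/your/model/file"))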
......@@ -32,7 +32,8 @@ enum ActivationType {
RELUX = 2,
PRELU = 3,
TANH = 4,
SIGMOID = 5
SIGMOID = 5,
LEAKYRELU = 6,
};
inline ActivationType StringToActivationType(const std::string type) {
......@@ -48,6 +49,8 @@ inline ActivationType StringToActivationType(const std::string type) {
return ActivationType::SIGMOID;
} else if (type == "NOOP") {
return ActivationType::NOOP;
} else if (type == "LEAKYRELU") {
return ActivationType::LEAKYRELU;
} else {
LOG(FATAL) << "Unknown activation type: " << type;
}
......@@ -90,6 +93,13 @@ void DoActivation(const T *input_ptr,
output_ptr[i] = 1 / (1 + std::exp(-input_ptr[i]));
}
break;
case LEAKYRELU:
#pragma omp parallel for schedule(runtime)
  for (index_t i = 0; i < size; ++i) {
    // relux_max_limit doubles as the leaky slope (alpha) here:
    // leaky_relu(x) = max(x, 0) + alpha * min(x, 0)
    output_ptr[i] = std::max(input_ptr[i], static_cast<T>(0))
        + relux_max_limit * std::min(input_ptr[i], static_cast<T>(0));
  }
  break;
default:
LOG(FATAL) << "Unknown activation type: " << type;
}
......@@ -122,6 +132,9 @@ inline void DoActivation(const float *input_ptr,
output_ptr[i] = 1 / (1 + std::exp(-input_ptr[i]));
}
break;
case LEAKYRELU:
LeakyReluNeon(input_ptr, relux_max_limit, size, output_ptr);
break;
default:
LOG(FATAL) << "Unknown activation type: " << type;
}
......
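For reference, the leaky ReLU semantics the kernels above implement can be cross-checked against a small NumPy sketch; ``alpha`` plays the role MACE overloads onto ``relux_max_limit``:

.. code:: python

    import numpy as np

    def leaky_relu(x, alpha=0.1):
        # Positive values pass through; negative values are scaled by alpha.
        return np.maximum(x, 0.0) + alpha * np.minimum(x, 0.0)

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(leaky_relu(x))  # [-0.2  -0.05  0.    1.5 ]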
......@@ -27,18 +27,29 @@ template <DeviceType D, class T>
class ArgMaxOp : public Operation {
public:
explicit ArgMaxOp(OpConstructContext *context)
: Operation(context) {}
: Operation(context),
axis_(Operation::GetOptionalArg<int>("axis", 0)),
keep_dims_(Operation::GetOptionalArg<bool>("keepdims", true)),
argmin_(Operation::GetOptionalArg<bool>("argmin", false)) {}
MaceStatus Run(OpContext *context) override {
MACE_UNUSED(context);
const Tensor *input = this->Input(0);
const Tensor *axis = this->InputSize() == 2 ?
this->Input(1) : nullptr;
Tensor *output = this->Output(0);
MACE_CHECK(keep_dims_, "Mace only supports keep_dims ArgMax.");
MACE_CHECK(input->dim_size() > 0, "ArgMax input should not be a scalar");
int axis_value = 0;
if (axis != nullptr) {
MACE_CHECK(axis->dim_size() == 0,
"Mace argmax only supports scalar axis");
Tensor::MappingGuard axis_guard(axis);
axis_value = axis->data<int32_t>()[0];
} else {
axis_value = axis_;
}
if (axis_value < 0) {
axis_value += input->dim_size();
}
......@@ -59,22 +70,43 @@ class ArgMaxOp : public Operation {
index_t outer_size = output->size();
index_t inner_size = input->dim(axis_value);
if (argmin_) {
#pragma omp parallel for schedule(runtime)
  for (index_t i = 0; i < outer_size; ++i) {
    int idx = 0;
    T min_value = std::numeric_limits<T>::max();
    const T *input_ptr = input_data + i * inner_size;
    for (index_t j = 0; j < inner_size; ++j) {
      if (input_ptr[j] < min_value) {
        min_value = input_ptr[j];
        idx = j;
      }
    }
    output_data[i] = idx;
  }
} else {
#pragma omp parallel for schedule(runtime)
  for (index_t i = 0; i < outer_size; ++i) {
    int idx = 0;
    T max_value = std::numeric_limits<T>::lowest();
    const T *input_ptr = input_data + i * inner_size;
    for (index_t j = 0; j < inner_size; ++j) {
      if (input_ptr[j] > max_value) {
        max_value = input_ptr[j];
        idx = j;
      }
    }
    output_data[i] = idx;
  }
}
return MaceStatus::MACE_SUCCESS;
}
protected:
const int axis_;
bool keep_dims_;
bool argmin_;
};
......
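The semantics of the extended op (optional ``axis`` input or attribute, negative-axis wrapping, and the new ``argmin`` flag) match this NumPy sketch:

.. code:: python

    import numpy as np

    def arg_max_op(x, axis=0, argmin=False):
        # Negative axes wrap, mirroring `axis_value += input->dim_size()`.
        if axis < 0:
            axis += x.ndim
        return np.argmin(x, axis=axis) if argmin else np.argmax(x, axis=axis)

    x = np.array([[3, 1, 2], [0, 5, 4]])
    print(arg_max_op(x, axis=1))               # [0 1]
    print(arg_max_op(x, axis=1, argmin=True))  # [1 0]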
......@@ -67,5 +67,29 @@ void ReluxNeon(const float *input, const float limit,
#endif
}
void LeakyReluNeon(const float *input, const float alpha,
                   const index_t size, float *output) {
  // leaky_relu(x) = max(x, 0) + alpha * min(x, 0)
#if defined(MACE_ENABLE_NEON)
  float32x4_t vzero = vdupq_n_f32(0.f);
  float32x4_t valpha = vdupq_n_f32(alpha);
#pragma omp parallel for schedule(runtime)
  for (index_t i = 0; i <= size - 4; i += 4) {
    float32x4_t v = vld1q_f32(input + i);
    float32x4_t pos = vmaxq_f32(v, vzero);
    float32x4_t neg = vminq_f32(v, vzero);
    v = vmlaq_f32(pos, neg, valpha);  // pos + neg * alpha
    vst1q_f32(output + i, v);
  }
  // remain
  for (index_t i = (size >> 2) << 2; i < size; ++i) {
    output[i] = std::max(input[i], 0.f) + alpha * std::min(input[i], 0.f);
  }
#else
#pragma omp parallel for schedule(runtime)
  for (index_t i = 0; i < size; ++i) {
    output[i] = std::max(input[i], 0.f) + alpha * std::min(input[i], 0.f);
  }
#endif
}
} // namespace ops
} // namespace mace
......@@ -25,6 +25,9 @@ void ReluNeon(const float *input, const index_t size, float *output);
void ReluxNeon(const float *input, const float limit,
const index_t size, float *output);
void LeakyReluNeon(const float *input, const float alpha,
const index_t size, float *output);
} // namespace ops
} // namespace mace
......
......@@ -43,6 +43,7 @@ class PoolingKernel : public OpenCLPoolingKernel {
const Padding &padding_type,
const std::vector<int> &padding_data,
const int *dilations,
const RoundType round_type,
Tensor *output) override;
private:
......@@ -62,6 +63,7 @@ MaceStatus PoolingKernel<T>::Compute(
const Padding &padding_type,
const std::vector<int> &padding_data,
const int *dilations,
const RoundType round_type,
Tensor *output) {
MACE_CHECK(dilations[0] == 1 && dilations[1] == 1)
<< "Pooling opencl kernel not support dilation yet";
......@@ -82,7 +84,7 @@ MaceStatus PoolingKernel<T>::Compute(
} else {
paddings = padding_data;
CalcOutputSize(input->shape().data(), filter_shape.data(),
padding_data.data(), dilations, strides, round_type,
output_shape.data());
}
......
......@@ -102,6 +102,9 @@ inline DATA_TYPE4 do_activation(DATA_TYPE4 in,
#endif
#ifdef USE_SIGMOID
out = do_sigmoid(in);
#endif
#ifdef USE_LEAKYRELU
out = fmax(in, (DATA_TYPE)0) + relux_max_limit * fmin(in, (DATA_TYPE)0);
#endif
return out;
}
......
#include <common.h>
__kernel void reduce(OUT_OF_RANGE_PARAMS
GLOBAL_WORK_GROUP_SIZE_DIM3
__read_only image2d_t input,
__local float4 *group_sum,
__private const int group_size,
__private const int partial_len,
__private const int remain_index,
__private const int batch,
__private const int in_height,
__private const int in_width,
__private const float image_size_reciprocal,
__private const int channel_blocks,
__write_only image2d_t output) {
const int i = get_local_id(0);
const int j = get_local_id(1);
const int k = get_global_id(2);
......@@ -22,12 +22,22 @@ __kernel void reduce_mean(OUT_OF_RANGE_PARAMS
return;
#endif
const int dim0_size = get_local_size(0);
const int index = mad24(j, dim0_size, i);
const int b = k / channel_blocks;
const int ch = mad24(b, -channel_blocks, k);
DATA_TYPE4 in;
#if REDUCE_TYPE == 1
float4 tmp = (float4){MAXFLOAT, MAXFLOAT, MAXFLOAT, MAXFLOAT};
#elif REDUCE_TYPE == 2
float4 tmp = (float4){-MAXFLOAT, -MAXFLOAT, -MAXFLOAT, -MAXFLOAT};
#elif REDUCE_TYPE == 3
float4 tmp = (float4){1, 1, 1, 1};
#else
float4 tmp = (float4){0, 0, 0, 0};
#endif
const int valid_part_len = select(partial_len,
partial_len - 1,
remain_index > 0 && index >= remain_index);
......@@ -43,19 +53,51 @@ __kernel void reduce_mean(OUT_OF_RANGE_PARAMS
int pos_x = mad24(ch, in_width, w_id);
int pos_y = mad24(b, in_height, h_id);
in = READ_IMAGET(input, SAMPLER, (int2)(pos_x, pos_y));
// MIN
#if REDUCE_TYPE == 1
tmp = fmin(tmp, in);
// MAX
#elif REDUCE_TYPE == 2
tmp = fmax(tmp, in);
// PROD
#elif REDUCE_TYPE == 3
tmp = tmp * in;
// MEAN
#else
tmp = tmp + in;
#endif
}
#if REDUCE_TYPE == 0
tmp = tmp * image_size_reciprocal;
#endif
group_sum[index] = tmp;
#ifdef NON_QUALCOMM_ADRENO
barrier(CLK_LOCAL_MEM_FENCE);
#endif
if (i == 0 && j == 0) {
#if REDUCE_TYPE == 1
DATA_TYPE4 out = (DATA_TYPE4){MAXFLOAT, MAXFLOAT, MAXFLOAT, MAXFLOAT};
#elif REDUCE_TYPE == 2
DATA_TYPE4 out = (DATA_TYPE4){-MAXFLOAT, -MAXFLOAT, -MAXFLOAT, -MAXFLOAT};
#elif REDUCE_TYPE == 3
DATA_TYPE4 out = (DATA_TYPE4){1, 1, 1, 1};
#else
DATA_TYPE4 out = (DATA_TYPE4){0, 0, 0, 0};
#endif
#pragma unroll
for (int l = 0; l < group_size; ++l) {
#if REDUCE_TYPE == 1
out = fmin(out, group_sum[l]);
#elif REDUCE_TYPE == 2
out = fmax(out, group_sum[l]);
#elif REDUCE_TYPE == 3
out = out * group_sum[l];
#else
out = out + group_sum[l];
#endif
}
WRITE_IMAGET(output, (int2)(ch, b), out);
}
......
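The identity value chosen for ``tmp`` matches each reduction (+inf for MIN, -inf for MAX, 1 for PROD, 0 for MEAN), and the kernel combines per-thread partials in a second stage. A Python sketch of the same scheme, assuming the ``REDUCE_TYPE`` codes used above (0=MEAN, 1=MIN, 2=MAX, 3=PROD):

.. code:: python

    import numpy as np

    IDENTITY = {0: 0.0, 1: np.inf, 2: -np.inf, 3: 1.0}
    COMBINE = {0: np.add, 1: np.minimum, 2: np.maximum, 3: np.multiply}

    def two_stage_reduce(values, reduce_type, num_groups=4):
        combine = COMBINE[reduce_type]
        partials = np.full(num_groups, IDENTITY[reduce_type])
        for i, v in enumerate(values):       # stage 1: per-group partials
            g = i % num_groups
            partials[g] = combine(partials[g], v)
        out = IDENTITY[reduce_type]
        for p in partials:                   # stage 2: combine the partials
            out = combine(out, p)
        return out / len(values) if reduce_type == 0 else out

    vals = np.array([4.0, 1.0, 3.0, 2.0])
    print([two_stage_reduce(vals, t) for t in range(4)])  # [2.5, 1.0, 4.0, 24.0]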
......@@ -99,6 +99,10 @@ MaceStatus ActivationKernel<T>::Compute(
tuning_key_prefix_ = "sigmoid_opencl_kernel";
built_options.emplace("-DUSE_SIGMOID");
break;
case LEAKYRELU:
tuning_key_prefix_ = "leakyrelu_opencl_kernel";
built_options.emplace("-DUSE_LEAKYRELU");
break;
default:
LOG(FATAL) << "Unknown activation type: " << activation_;
}
......
......@@ -69,6 +69,7 @@ class PoolingKernel : public OpenCLPoolingKernel {
const Padding &padding_type,
const std::vector<int> &padding_data,
const int *dilations,
const RoundType round_type,
Tensor *output) override;
private:
......@@ -87,6 +88,7 @@ MaceStatus PoolingKernel<T>::Compute(
const Padding &padding_type,
const std::vector<int> &padding_data,
const int *dilations,
const RoundType round_type,
Tensor *output) {
MACE_CHECK(dilations[0] == 1 && dilations[1] == 1)
<< "Pooling opencl kernel not support dilation yet";
......@@ -103,7 +105,7 @@ MaceStatus PoolingKernel<T>::Compute(
} else {
paddings = padding_data;
CalcOutputSize(input->shape().data(), filter_shape.data(),
padding_data.data(), dilations, strides, round_type,
output_shape.data());
}
......
......@@ -11,10 +11,10 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_OPS_OPENCL_IMAGE_REDUCE_H_
#define MACE_OPS_OPENCL_IMAGE_REDUCE_H_
#include "mace/ops/opencl/reduce_mean.h"
#include "mace/ops/opencl/reduce.h"
#include <memory>
#include <set>
......@@ -24,6 +24,7 @@
#include "mace/core/op_context.h"
#include "mace/core/tensor.h"
#include "mace/ops/opencl/helper.h"
#include "mace/ops/reduce.h"
namespace mace {
namespace ops {
......@@ -31,11 +32,12 @@ namespace opencl {
namespace image {
template <typename T>
class ReduceKernel : public OpenCLReduceKernel {
public:
ReduceKernel(ReduceType type,
const std::vector<int> axis,
const bool keep_dims)
: reduce_type_(type), axis_(axis), keep_dims_(keep_dims) {}
MaceStatus Compute(
OpContext *context,
......@@ -43,6 +45,7 @@ class ReduceMeanKernel : public OpenCLReduceMeanKernel {
Tensor *output) override;
private:
ReduceType reduce_type_;
const std::vector<int> axis_;
bool keep_dims_;
cl::Kernel kernel_;
......@@ -51,16 +54,16 @@ class ReduceMeanKernel : public OpenCLReduceMeanKernel {
};
template <typename T>
MaceStatus ReduceKernel<T>::Compute(
OpContext *context,
const Tensor *input,
Tensor *output) {
MACE_CHECK_NOTNULL(input);
MACE_CHECK(keep_dims_, "reduce mean gpu only support keep dims.");
MACE_CHECK(input->dim_size() == 4,
"reduce mean gpu only support 4-dim input");
"reduce gpu only support 4-dim input");
MACE_CHECK(axis_.size() == 2 && axis_[0] == 1 && axis_[1] == 2,
"reduce mean gpu only support 1,2-axis reduce");
"reduce gpu only support 1,2-axis reduce");
index_t batch = input->dim(0);
const index_t in_height = input->dim(1);
const index_t in_width = input->dim(2);
......@@ -84,14 +87,15 @@ MaceStatus ReduceMeanKernel<T>::Compute(
std::set<std::string> built_options;
MACE_OUT_OF_RANGE_CONFIG;
MACE_NON_UNIFORM_WG_CONFIG;
std::string kernel_name = MACE_OBFUSCATE_SYMBOL("reduce_mean");
built_options.emplace("-Dreduce_mean=" + kernel_name);
std::string kernel_name = MACE_OBFUSCATE_SYMBOL("reduce");
built_options.emplace("-Dreduce=" + kernel_name);
built_options.emplace("-DDATA_TYPE=" + DtToUpCompatibleCLDt(dt));
built_options.emplace("-DCMD_DATA_TYPE=" + DtToUpCompatibleCLCMDDt(dt));
built_options.emplace(MakeString("-DREDUCE_TYPE=", reduce_type_));
if (runtime->gpu_type() != GPUType::QUALCOMM_ADRENO) {
built_options.emplace("-DNON_QUALCOMM_ADRENO");
}
MACE_RETURN_IF_ERROR(runtime->BuildKernel("reduce_mean",
MACE_RETURN_IF_ERROR(runtime->BuildKernel("reduce",
kernel_name,
built_options,
&kernel_));
......@@ -170,4 +174,4 @@ MaceStatus ReduceMeanKernel<T>::Compute(
} // namespace ops
} // namespace mace
#endif // MACE_OPS_OPENCL_IMAGE_REDUCE_H_
......@@ -36,6 +36,7 @@ class OpenCLPoolingKernel {
const Padding &padding_type,
const std::vector<int> &padding_data,
const int *dilations,
const RoundType round_type,
Tensor *output) = 0;
MACE_EMPTY_VIRTUAL_DESTRUCTOR(OpenCLPoolingKernel);
};
......
......@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_OPS_OPENCL_REDUCE_H_
#define MACE_OPS_OPENCL_REDUCE_H_
#include "mace/public/mace.h"
#include "mace/utils/utils.h"
......@@ -24,16 +24,16 @@ class OpContext;
class Tensor;
namespace ops {
class OpenCLReduceKernel {
public:
virtual MaceStatus Compute(
OpContext *context,
const Tensor *input,
Tensor *output) = 0;
MACE_EMPTY_VIRTUAL_DESTRUCTOR(OpenCLReduceKernel);
};
} // namespace ops
} // namespace mace
#endif // MACE_OPS_OPENCL_REDUCE_H_
......@@ -44,7 +44,7 @@ extern void RegisterLocalResponseNorm(OpRegistryBase *op_registry);
extern void RegisterMatMul(OpRegistryBase *op_registry);
extern void RegisterPad(OpRegistryBase *op_registry);
extern void RegisterPooling(OpRegistryBase *op_registry);
extern void RegisterReduce(OpRegistryBase *op_registry);
extern void RegisterReshape(OpRegistryBase *op_registry);
extern void RegisterResizeBicubic(OpRegistryBase *op_registry);
extern void RegisterResizeBilinear(OpRegistryBase *op_registry);
......@@ -102,7 +102,7 @@ OpRegistry::OpRegistry() : OpRegistryBase() {
ops::RegisterMatMul(this);
ops::RegisterPad(this);
ops::RegisterPooling(this);
ops::RegisterReduce(this);
ops::RegisterReshape(this);
ops::RegisterResizeBicubic(this);
ops::RegisterResizeBilinear(this);
......
......@@ -43,11 +43,14 @@ class PoolingOpBase : public ConvPool2dOpBase {
kernels_(Operation::GetRepeatedArgs<int>("kernels")),
pooling_type_(
static_cast<PoolingType>(Operation::GetOptionalArg<int>(
"pooling_type", static_cast<int>(AVG)))) {}
"pooling_type", static_cast<int>(AVG)))),
round_type_(static_cast<RoundType>(Operation::GetOptionalArg<int>(
"round_mode", static_cast<int>(CEIL)))) {}
protected:
std::vector<int> kernels_;
PoolingType pooling_type_;
RoundType round_type_;
MACE_OP_INPUT_TAGS(INPUT);
MACE_OP_OUTPUT_TAGS(OUTPUT);
......@@ -82,7 +85,7 @@ class PoolingOp<DeviceType::CPU, float> : public PoolingOpBase {
paddings_.data(),
dilations_.data(),
strides_.data(),
round_type_,
output_shape.data());
}
MACE_RETURN_IF_ERROR(output_tensor->Resize(output_shape));
......@@ -255,7 +258,7 @@ class PoolingOp<DeviceType::CPU, uint8_t> : public PoolingOpBase {
paddings_.data(),
dilations_.data(),
strides_.data(),
round_type_,
output_shape.data());
}
MACE_RETURN_IF_ERROR(output_tensor->Resize(output_shape));
......@@ -442,7 +445,7 @@ class PoolingOp<DeviceType::GPU, T> : public PoolingOpBase {
return kernel_->Compute(context, input, pooling_type_, kernels_.data(),
strides_.data(), padding_type_, paddings_,
dilations_.data(), round_type_, output);
}
private:
......
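The new ``round_type`` plumbing matters because frameworks disagree on pooling output sizes: TensorFlow-style CEIL rounding can yield one more output element than ONNX's default FLOOR. A sketch of the two rules (simplified to a single spatial dimension, with total padding folded into ``pad``):

.. code:: python

    import math

    def pooled_output_size(in_size, kernel, stride, pad, ceil_mode):
        extent = in_size + pad - kernel
        rounded = math.ceil(extent / stride) if ceil_mode \
            else math.floor(extent / stride)
        return rounded + 1

    print(pooled_output_size(7, 2, 2, 0, ceil_mode=True))   # 4 (CEIL)
    print(pooled_output_size(7, 2, 2, 0, ceil_mode=False))  # 3 (FLOOR)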
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MACE_OPS_REDUCE_H_
#define MACE_OPS_REDUCE_H_
namespace mace {
enum ReduceType {
// SUM = 0,
MEAN = 0,
MIN = 1,
MAX = 2,
PROD = 3,
// SUM_SQR = 4,
// SQR_MEAN = 5,
};
} // namespace mace
#endif // MACE_OPS_REDUCE_H_
......@@ -21,7 +21,7 @@ namespace test {
namespace {
template <DeviceType D, typename T>
void Reduce(int iters, int batch, int channels,
int height, int width) {
mace::testing::StopTiming();
......@@ -34,7 +34,7 @@ void ReduceMean(int iters, int batch, int channels,
net.AddRandomInput<D, T>("Input", {batch, channels, height, width});
}
OpDefBuilder("ReduceMean", "ReduceMeanBM")
OpDefBuilder("Reduce", "ReduceBM")
.Input("Input")
.AddIntsArg("axis", axis)
.Output("OutputImage")
......@@ -55,30 +55,30 @@ void ReduceMean(int iters, int batch, int channels,
}
} // namespace
#define MACE_BM_REDUCE_MACRO(N, C, H, W, TYPE, DEVICE) \
static void \
MACE_BM_REDUCE_##N##_##C##_##H##_##W##_##TYPE##_##DEVICE(\
int iters) { \
const int64_t tot = static_cast<int64_t>(iters) * N * C * H * W; \
mace::testing::MaccProcessed(tot); \
mace::testing::BytesProcessed(tot *(sizeof(TYPE))); \
Reduce<DEVICE, TYPE>(iters, N, C, H, W); \
} \
MACE_BENCHMARK( \
MACE_BM_REDUCE_##N##_##C##_##H##_##W##_##TYPE##_##DEVICE)
#define MACE_BM_REDUCE(N, C, H, W) \
MACE_BM_REDUCE_MACRO(N, C, H, W, float, GPU); \
MACE_BM_REDUCE_MACRO(N, C, H, W, half, GPU); \
MACE_BM_REDUCE_MACRO(N, C, H, W, float, CPU);
MACE_BM_REDUCE(1, 1, 512, 512);
MACE_BM_REDUCE(4, 3, 128, 128);
MACE_BM_REDUCE(4, 1, 512, 512);
MACE_BM_REDUCE(16, 32, 112, 112);
MACE_BM_REDUCE(8, 64, 256, 256);
MACE_BM_REDUCE(1, 32, 480, 640);
} // namespace test
......
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <algorithm>
#include <memory>
#include <vector>
#include "mace/core/future.h"
#include "mace/core/operator.h"
#include "mace/core/tensor.h"
#ifdef MACE_ENABLE_OPENCL
#include "mace/ops/opencl/image/reduce_mean.h"
#endif // MACE_ENABLE_OPENCL
namespace mace {
namespace ops {
class ReduceMeanOpBase : public Operation {
public:
explicit ReduceMeanOpBase(OpConstructContext *context)
: Operation(context),
axis_(Operation::GetRepeatedArgs<int>("axis")),
keep_dims_(Operation::GetOptionalArg<bool>("keepdims", false)) {
}
protected:
inline void Validate() {
const Tensor *input = this->Input(0);
const int left = static_cast<int>(input->dim_size() * -1);
const int right = static_cast<int>(input->dim_size());
if (axis_.size()) {
for (unsigned int i = 0; i < axis_.size(); ++i) {
MACE_CHECK(axis_[i] > left && axis_[i] < right, "Axis out of range.");
}
}
}
protected:
std::vector<int> axis_;
bool keep_dims_;
};
template <DeviceType D, class T>
class ReduceMeanOp;
template <typename T>
class ReduceMeanOp<DeviceType::CPU, T> : public ReduceMeanOpBase {
public:
explicit ReduceMeanOp(OpConstructContext *context)
: ReduceMeanOpBase(context) {
}
MaceStatus Run(OpContext *context) override {
MACE_UNUSED(context);
Validate();
const Tensor *input = this->Input(0);
Tensor *output = this->Output(0);
Simplify(input);
output->Resize(out_shape_);
Compute(input, output);
return MaceStatus::MACE_SUCCESS;
}
private:
void Simplify(const Tensor *input) {
std::vector<bool> bitmap(static_cast<uint32_t>(input->dim_size()), false);
if (axis_.size() == 0) {
for (int i = 0; i < input->dim_size(); ++i) {
bitmap[i] = true;
}
} else {
for (unsigned int i = 0; i < axis_.size(); ++i) {
int index = axis_[i] >= 0 ?
axis_[i] :
axis_[i] + input->dim_size();
// axis format is NHWC
if (input->dim_size() == 4) {
if (index == 1) index = 2;
else if (index == 2) index = 3;
else if (index == 3) index = 1;
}
bitmap[index] = true;
}
}
out_shape_.clear();
for (unsigned int i = 0; i < input->dim_size(); ++i) {
if (!bitmap[i]) {
out_shape_.push_back(input->dim(i));
} else if (keep_dims_) {
out_shape_.push_back(1);
}
}
data_reshape_.clear();
unsigned int dim_index = 0;
for (; dim_index < input->dim_size(); ++dim_index) {
if (input->dim(dim_index) != 1) break;
}
if (dim_index >= input->dim_size()) {
reduce_first_axis_ = true;
} else {
reduce_first_axis_ = bitmap[dim_index];
data_reshape_.push_back(input->dim(dim_index));
++dim_index;
for (; dim_index < input->dim_size(); ++dim_index) {
const int n = input->dim(dim_index);
if (n == 1) {
bitmap[dim_index] = bitmap[dim_index - 1];
}
if (bitmap[dim_index-1] != bitmap[dim_index]) {
data_reshape_.push_back(n);
} else {
data_reshape_.back() *= n;
}
}
}
}
void Compute(const Tensor *input, Tensor *output) {
Tensor::MappingGuard input_mapper(input);
const T *input_ptr = input->data<T>();
Tensor::MappingGuard output_map(output);
T *output_ptr = output->mutable_data<T>();
memset(output_ptr, 0, output->size() * sizeof(T));
switch (data_reshape_.size()) {
case 1:
if (reduce_first_axis_) {
T sum = 0;
for (int i = 0; i < data_reshape_[0]; ++i) {
sum = sum + input_ptr[i];
}
output_ptr[0] = sum / data_reshape_[0];
} else {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
output_ptr[i] = input_ptr[i];
}
}
break;
case 2:
if (reduce_first_axis_) {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[1]; ++i) {
for (int j = 0; j < data_reshape_[0]; ++j) {
output_ptr[i] += input_ptr[j * data_reshape_[1] + i];
}
output_ptr[i] /= data_reshape_[0];
}
} else {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
for (int j = 0; j < data_reshape_[1]; ++j) {
output_ptr[i] += input_ptr[i * data_reshape_[1] + j];
}
output_ptr[i] /= data_reshape_[1];
}
}
break;
case 3:
if (reduce_first_axis_) {
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < data_reshape_[1]; ++i) {
for (int j = 0; j < data_reshape_[2]; ++j) {
for (int k = 0; k < data_reshape_[0]; ++k) {
output_ptr[i] +=
input_ptr[(k * data_reshape_[1] + i) * data_reshape_[2]
+ j];
}
}
output_ptr[i] /= (data_reshape_[0] * data_reshape_[2]);
}
} else {
#pragma omp parallel for collapse(2) schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
for (int j = 0; j < data_reshape_[2]; ++j) {
for (int k = 0; k < data_reshape_[1]; ++k) {
output_ptr[i * data_reshape_[2] + j] +=
input_ptr[(i * data_reshape_[1] + k) * data_reshape_[2]
+ j];
}
output_ptr[i * data_reshape_[2] + j] /= data_reshape_[1];
}
}
}
break;
case 4:
if (reduce_first_axis_) {
#pragma omp parallel for collapse(2) schedule(runtime)
for (int i = 0; i < data_reshape_[1]; ++i) {
for (int j = 0; j < data_reshape_[3]; ++j) {
for (int k = 0; k < data_reshape_[2]; ++k) {
for (int t = 0; t < data_reshape_[0]; ++t) {
output_ptr[i * data_reshape_[3] + j] +=
input_ptr[((t * data_reshape_[1] + i) *
data_reshape_[2] + k)*data_reshape_[3] + j];
}
}
output_ptr[i * data_reshape_[3] + j] /=
(data_reshape_[0] * data_reshape_[2]);
}
}
} else {
#pragma omp parallel for collapse(2) schedule(runtime)
for (int i = 0; i < data_reshape_[0]; ++i) {
for (int j = 0; j < data_reshape_[2]; ++j) {
for (int k = 0; k < data_reshape_[1]; ++k) {
for (int t = 0; t < data_reshape_[3]; ++t) {
output_ptr[i * data_reshape_[2] + j] +=
input_ptr[((i * data_reshape_[1] + k) *
data_reshape_[2] + j)*data_reshape_[3] + t];
}
}
output_ptr[i * data_reshape_[2] + j] /=
(data_reshape_[1] * data_reshape_[3]);
}
}
}
break;
default:
MACE_CHECK(false, "not implemented in mace")
<< "data reshape size" << data_reshape_.size()
<< "reduce first axis:" << reduce_first_axis_;
break;
}
}
private:
bool reduce_first_axis_;
std::vector<int> data_reshape_;
std::vector<index_t> out_shape_;
};
#ifdef MACE_ENABLE_OPENCL
template <typename T>
class ReduceMeanOp<DeviceType::GPU, T> : public ReduceMeanOpBase {
public:
explicit ReduceMeanOp(OpConstructContext *context)
: ReduceMeanOpBase(context) {
if (context->device()->gpu_runtime()->UseImageMemory()) {
kernel_.reset(new opencl::image::ReduceMeanKernel<T>(axis_, keep_dims_));
} else {
MACE_NOT_IMPLEMENTED;
}
}
MaceStatus Run(OpContext *context) override {
Validate();
const Tensor *input = this->Input(0);
Tensor *output = this->Output(0);
return kernel_->Compute(context, input, output);
}
private:
std::unique_ptr<OpenCLReduceMeanKernel> kernel_;
};
#endif // MACE_ENABLE_OPENCL
void RegisterReduceMean(OpRegistryBase *op_registry) {
MACE_REGISTER_OP(op_registry, "ReduceMean", ReduceMeanOp,
DeviceType::CPU, float);
#ifdef MACE_ENABLE_OPENCL
MACE_REGISTER_OP(op_registry, "ReduceMean", ReduceMeanOp,
DeviceType::GPU, float);
MACE_REGISTER_OP(op_registry, "ReduceMean", ReduceMeanOp,
DeviceType::GPU, half);
#endif // MACE_ENABLE_OPENCL
}
} // namespace ops
} // namespace mace
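The ``Simplify`` pass above reduces the general N-D mean to at most four collapsed dimensions by merging adjacent dims that are uniformly reduced or kept (size-1 dims inherit their neighbour's status). A simplified Python sketch of the bitmap-and-merge idea (it omits the leading size-1 handling and the NHWC axis remapping of the C++ code):

.. code:: python

    def simplify_reduce(shape, axes, keep_dims=False):
        bitmap = [False] * len(shape)
        for a in (axes or range(len(shape))):
            bitmap[a if a >= 0 else a + len(shape)] = True
        out_shape = [1 if bitmap[i] else d
                     for i, d in enumerate(shape)
                     if not bitmap[i] or keep_dims]
        data_reshape, flags = [], []
        for i, d in enumerate(shape):
            if d == 1 and flags:
                bitmap[i] = flags[-1]      # size-1 dims inherit the run's flag
            if flags and bitmap[i] == flags[-1]:
                data_reshape[-1] *= d      # merge into the previous run
            else:
                data_reshape.append(d)
                flags.append(bitmap[i])
        return out_shape, data_reshape, flags

    print(simplify_reduce([2, 3, 4, 5], [1, 2]))
    # ([2, 5], [2, 12, 5], [False, True, False])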
// Copyright 2018 Xiaomi, Inc. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mace/ops/ops_test_util.h"
namespace mace {
namespace ops {
namespace test {
class ReduceMeanOpTest : public OpsTestBase {};
namespace {
template <DeviceType D>
void Simple(const std::vector<index_t> &input_shape,
const std::vector<float> &input,
const std::vector<int> &axis,
const std::vector<index_t> &output_shape,
const std::vector<float> &output,
const bool keepdims = true) {
// Construct graph
OpsTestNet net;
// Add input data
net.AddInputFromArray<D, float>("Input", input_shape, input);
if (D == DeviceType::CPU) {
net.TransformDataFormat<D, float>("Input", NHWC, "InputNCHW", NCHW);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("InputNCHW")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", keepdims ? 1 : 0)
.Output("OutputNCHW")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
net.TransformDataFormat<D, float>("OutputNCHW", NCHW, "Output", NHWC);
} else {
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("Input")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", keepdims ? 1 : 0)
.Output("Output")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
}
auto expected = net.CreateTensor<float>(output_shape, output);
ExpectTensorNear<float>(*expected, *net.GetOutput("Output"), 1e-5, 1e-3);
}
template <DeviceType D>
void Simple3D(const std::vector<index_t> &input_shape,
const std::vector<float> &input,
const std::vector<int> &axis,
const std::vector<index_t> &output_shape,
const std::vector<float> &output,
const bool keepdims = true) {
// Construct graph
OpsTestNet net;
// Add input data
net.AddInputFromArray<D, float>("Input", input_shape, input);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("Input")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", keepdims ? 1 : 0)
.Output("Output")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
auto expected = net.CreateTensor<float>(output_shape, output);
ExpectTensorNear<float>(*expected, *net.GetOutput("Output"), 1e-5, 1e-3);
}
template <DeviceType D>
void Simple12Test() {
Simple<D>({2, 2, 3, 4},
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23},
{1, 2},
{2, 1, 1, 4},
{10, 11, 12, 13,
10, 11, 12, 13});
}
template <DeviceType D>
void Simple1Axis() {
Simple<D>({2, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23,
0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1},
{2, 1, 3, 4},
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{-3},
{1, 1, 3, 4},
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{2},
{1, 2, 1, 4},
{4, 5, 6, 7, 16, 17, 18, 19});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{-1},
{1, 2, 3, 1},
{1.5, 5.5, 9.5, 13.5, 17.5, 21.5});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{1},
{1, 1, 3, 3},
{9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{-2},
{1, 3, 1, 3},
{3, 4, 5, 12, 13, 14, 21, 22, 23});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{3},
{1, 3, 3, 1},
{1, 4, 7, 10, 13, 16, 19, 22, 25});
}
template <DeviceType D>
void Simple2Axis() {
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1},
{1, 1, 3, 4},
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 2},
{1, 2, 1, 4},
{4, 5, 6, 7, 16, 17, 18, 19});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1, 3},
{1, 1, 3, 1},
{7.5, 11.5, 15.5});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{1, 2},
{1, 1, 1, 3},
{12, 13, 14});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 1},
{1, 1, 3, 3},
{9, 10, 11, 12, 13, 14, 15, 16, 17});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{2, 3},
{1, 3, 1, 1},
{4, 13, 22});
}
template <DeviceType D>
void Simple2Axis3D() {
Simple3D<D>({2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1},
{1, 1, 4},
{10, 11, 12, 13});
Simple3D<D>({2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1, 2},
{2, 1, 1},
{5.5, 17.5});
}
template <DeviceType D>
void Simple3Axis() {
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{1, 2, 3},
{1, 1, 1, 1},
{11.5});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 2, 3},
{1, 2, 1, 1},
{5.5, 17.5});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1, 3},
{1, 1, 3, 1},
{7.5, 11.5, 15.5});
Simple<D>({1, 2, 3, 4},
{0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 14, 15,
16, 17, 18, 19,
20, 21, 22, 23},
{0, 1, 2},
{1, 1, 1, 4},
{10, 11, 12, 13});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{1, 2, 3},
{1, 1, 1, 1},
{13});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 2, 3},
{1, 3, 1, 1},
{4, 13, 22});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 1, 3},
{1, 1, 3, 1},
{10, 13, 16});
Simple<D>({1, 3, 3, 3},
{0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26},
{0, 1, 2},
{1, 1, 1, 3},
{12, 13, 14});
}
} // namespace
TEST_F(ReduceMeanOpTest, CPUSimple12) {
Simple12Test<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, GPUSimple12) {
Simple12Test<DeviceType::GPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple1Axis) {
Simple1Axis<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple2Axis) {
Simple2Axis<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple2Axis3D) {
Simple2Axis3D<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimple3Axis) {
Simple3Axis<DeviceType::CPU>();
}
TEST_F(ReduceMeanOpTest, CPUSimpleReduceDims) {
Simple3D<CPU>({2, 3, 4},
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23},
{0, 1},
{4},
{10, 11, 12, 13},
false);
}
namespace {
template <DeviceType D, typename T>
void RandomTest(const std::vector<index_t> &input_shape,
const std::vector<int> &axis) {
testing::internal::LogToStderr();
srand(time(NULL));
// Construct graph
OpsTestNet net;
// Add input data
net.AddRandomInput<D, float>("Input", input_shape);
net.TransformDataFormat<DeviceType::CPU, float>("Input", NHWC, "InputNCHW",
NCHW);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("InputNCHW")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", 1)
.Output("OutputNCHW")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp();
net.TransformDataFormat<DeviceType::CPU, float>("OutputNCHW", NCHW,
"Output", NHWC);
OpDefBuilder("ReduceMean", "ReduceMeanTest")
.Input("Input")
.AddIntsArg("axis", axis)
.AddIntArg("keepdims", 1)
.Output("OPENCLOutput")
.Finalize(net.NewOperatorDef());
// Run
net.RunOp(D);
if (DataTypeToEnum<T>::value == DT_FLOAT) {
ExpectTensorNear<float>(*net.GetTensor("Output"),
*net.GetOutput("OPENCLOutput"), 1e-5, 1e-4);
} else {
ExpectTensorNear<float>(*net.GetTensor("Output"),
*net.GetOutput("OPENCLOutput"), 1e-2, 1e-2);
}
}
} // namespace
TEST_F(ReduceMeanOpTest, GPURandomFloat) {
RandomTest<DeviceType::GPU, float>({4, 64, 64, 3}, {1, 2});
RandomTest<DeviceType::GPU, float>({2, 64, 64, 4}, {1, 2});
RandomTest<DeviceType::GPU, float>({8, 128, 128, 64}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 640, 480, 64}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 480, 640, 32}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 512, 512, 16}, {1, 2});
RandomTest<DeviceType::GPU, float>({8, 117, 87, 33}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 619, 450, 61}, {1, 2});
RandomTest<DeviceType::GPU, float>({1, 511, 561, 11}, {1, 2});
}
TEST_F(ReduceMeanOpTest, GPURandomHalf) {
RandomTest<DeviceType::GPU, half>({4, 64, 64, 3}, {1, 2});
RandomTest<DeviceType::GPU, half>({2, 64, 64, 4}, {1, 2});
RandomTest<DeviceType::GPU, half>({8, 128, 128, 64}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 640, 480, 64}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 480, 640, 32}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 512, 512, 16}, {1, 2});
RandomTest<DeviceType::GPU, half>({8, 117, 87, 33}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 619, 450, 61}, {1, 2});
RandomTest<DeviceType::GPU, half>({1, 511, 561, 11}, {1, 2});
}
} // namespace test
} // namespace ops
} // namespace mace
......@@ -36,6 +36,7 @@ class ReshapeOp : public Operation {
int unknown_idx = -1;
index_t product = 1;
std::vector<index_t> out_shape;
index_t n = 0;
for (int i = 0; i < num_dims; ++i) {
if (shape_data[i] == -1) {
......@@ -45,8 +46,15 @@ class ReshapeOp : public Operation {
} else {
MACE_CHECK(shape_data[i] >= 0, "Shape must be non-negative: ",
shape_data[i]);
out_shape.push_back(shape_data[i]);
product *= shape_data[i];
if (shape_data[i] == 0) {
MACE_CHECK(i < input->dim_size(),
           "Shape entry 0 at index ", i, " is out of the input dims' range.");
n = input->dim(i);
} else {
n = shape_data[i];
}
out_shape.push_back(n);
product *= n;
}
}
......
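The new zero branch follows ONNX Reshape semantics: a 0 in the target shape copies the input dimension at the same position, while -1 is still inferred from the remaining elements. A Python sketch of the resolution logic:

.. code:: python

    def resolve_reshape(in_shape, shape_data):
        out, unknown = [], -1
        for i, s in enumerate(shape_data):
            if s == -1:
                unknown = i
                out.append(1)            # placeholder, inferred below
            elif s == 0:
                out.append(in_shape[i])  # copy the input dim (ONNX semantics)
            else:
                out.append(s)
        if unknown >= 0:
            total = 1
            for d in in_shape:
                total *= d
            known = 1
            for d in out:
                known *= d
            out[unknown] = total // known
        return out

    print(resolve_reshape([2, 3, 4], [0, -1]))  # [2, 12]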
......@@ -13,6 +13,7 @@ py_library(
"converter_tool/base_converter.py",
"converter_tool/caffe_converter.py",
"converter_tool/hexagon_converter.py",
"converter_tool/onnx_converter.py",
"converter_tool/shape_inference.py",
"converter_tool/tensorflow_converter.py",
"converter_tool/tf_dsp_converter.py",
......
......@@ -101,7 +101,7 @@ def main(unused_args):
file=sys.stderr)
sys.exit(-1)
if FLAGS.platform not in ['tensorflow', 'caffe', 'onnx']:
six.print_("platform %s is not supported." % FLAGS.platform,
file=sys.stderr)
sys.exit(-1)
......@@ -188,6 +188,9 @@ def main(unused_args):
converter = caffe_converter.CaffeConverter(option,
FLAGS.model_file,
FLAGS.weight_file)
elif FLAGS.platform == 'onnx':
from mace.python.tools.converter_tool import onnx_converter
converter = onnx_converter.OnnxConverter(option, FLAGS.model_file)
else:
six.print_("Mace do not support platorm %s yet." % FLAGS.platform,
file=sys.stderr)
......@@ -231,6 +234,7 @@ def parse_args():
type=str,
default="",
help="TensorFlow \'GraphDef\' file to load, "
"Onnx model file .onnx to load, "
"Caffe prototxt file to load.")
parser.add_argument(
"--weight_file", type=str, default="", help="Caffe data file to load.")
......@@ -300,7 +304,10 @@ def parse_args():
parser.add_argument(
"--check_shape", type=str, default="", help="check shape.")
parser.add_argument(
"--platform", type=str, default="tensorflow", help="tensorflow/caffe")
"--platform",
type=str,
default="tensorflow",
help="tensorflow/caffe/onnx")
parser.add_argument(
"--embed_model_data",
type=str2bool,
......
......@@ -37,11 +37,14 @@ class FilterFormat(Enum):
OHWI = 103
# SAME_LOWER: if the amount of padding to add is odd,
# the extra padding goes to the right or bottom
class PaddingMode(Enum):
VALID = 0
SAME = 1
FULL = 2
SAME_LOWER = 3
NA = 4
class PoolingType(Enum):
......@@ -49,6 +52,11 @@ class PoolingType(Enum):
MAX = 2
class RoundMode(Enum):
FLOOR = 0
CEIL = 1
class ActivationType(Enum):
NOOP = 0
RELU = 1
......@@ -56,6 +64,7 @@ class ActivationType(Enum):
PRELU = 3
TANH = 4
SIGMOID = 5
LEAKYRELU = 6
class EltwiseType(Enum):
......@@ -72,9 +81,17 @@ class EltwiseType(Enum):
EQUAL = 10
class ReduceType(Enum):
MEAN = 0
MIN = 1
MAX = 2
PROD = 3
class FrameworkType(Enum):
TENSORFLOW = 0
CAFFE = 1
ONNX = 2
MaceSupportedOps = [
......@@ -108,7 +125,7 @@ MaceSupportedOps = [
'Pooling',
'Proposal',
'Quantize',
'Reduce',
'Reshape',
'ResizeBicubic',
'ResizeBilinear',
......@@ -184,6 +201,10 @@ class MaceKeyword(object):
mace_group_str = "group"
mace_wino_arg_str = "wino_block_size"
mace_quantize_flag_arg_str = "quantize_flag"
mace_epsilon_str = 'epsilon'
mace_reduce_type_str = 'reduce_type'
mace_argmin_str = 'argmin'
mace_round_mode_str = 'round_mode'
class TransformerRule(Enum):
......
......@@ -26,6 +26,7 @@ from mace.python.tools.converter_tool.base_converter import PaddingMode
from mace.python.tools.converter_tool.base_converter import ActivationType
from mace.python.tools.converter_tool.base_converter import EltwiseType
from mace.python.tools.converter_tool.base_converter import FrameworkType
from mace.python.tools.converter_tool.base_converter import ReduceType
from mace.python.tools.converter_tool.base_converter import DataFormat
from mace.python.tools.converter_tool.base_converter import FilterFormat
from mace.python.tools.converter_tool.base_converter import MaceOp
......@@ -465,15 +466,6 @@ class TensorflowConverter(base_converter.ConverterInterface):
"Mace only supports dilation == 1 conv2d_transpose.")
mace_check(len(tf_op.inputs) >= 3,
"deconv should have (>=) 3 inputs.")
output_shape_arg = op.arg.add()
output_shape_arg.name = MaceKeyword.mace_output_shape_str
# if tf_op.inputs[0].op.type == TFOpType.Const.name:
# output_shape_value = \
# tf_op.inputs[0].eval().astype(np.int32).flat
# output_shape_arg.ints.extend(output_shape_value)
# else:
# output_shape_value = {}
# output_shape_arg.ints.extend(output_shape_value)
del op.input[:]
op.input.extend([tf_op.inputs[2].name,
tf_op.inputs[1].name,
......@@ -810,7 +802,12 @@ class TensorflowConverter(base_converter.ConverterInterface):
op = self.convert_general_op(tf_op)
del op.input[1:]
op.type = MaceOp.Reduce.name
reduce_type_arg = op.arg.add()
reduce_type_arg.name = MaceKeyword.mace_reduce_type_str
reduce_type_arg.i = ReduceType.MEAN.value
axis_arg = op.arg.add()
axis_arg.name = MaceKeyword.mace_axis_str
if len(tf_op.inputs) > 1:
......
......@@ -352,21 +352,26 @@ class Transformer(base_converter.ConverterInterface):
if elttype == EltwiseType.SQR_DIFF.value and\
self.consumer_count(op.output[0]) == 1:
consumer_op = self._consumers[op.output[0]][0]
if consumer_op.type == MaceOp.Reduce.name:
axis = ConverterUtil.get_arg(
consumer_op,
MaceKeyword.mace_axis_str).ints
keep_dims = ConverterUtil.get_arg(
consumer_op,
MaceKeyword.mace_keepdims_str).i
reduce_type = ConverterUtil.get_arg(
consumer_op,
MaceKeyword.mace_reduce_type_str).i
if reduce_type == ReduceType.MEAN.value and\
len(consumer_op.input) == 1 and\
axis[0] == 1 and axis[1] == 2 and\
keep_dims > 0:
print("Fold SquaredDiff Reduce: %s" % op.name)
op.type = MaceOp.SqrDiffMean.name
op.output[0] = consumer_op.output[0]
self.replace_quantize_info(op, consumer_op)
self.safe_remove_node(consumer_op, op)
return True
return False
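For reference, the fused ``SqrDiffMean`` produced by this fold computes a per-channel mean squared difference over the spatial axes (NHWC axes 1 and 2, keeping dims), as in this NumPy sketch:

.. code:: python

    import numpy as np

    x = np.random.rand(2, 4, 4, 3).astype(np.float32)
    y = np.random.rand(2, 1, 1, 3).astype(np.float32)
    # SquaredDifference followed by Reduce(MEAN, axis=[1, 2], keepdims), in one op:
    out = np.mean(np.square(x - y), axis=(1, 2), keepdims=True)
    print(out.shape)  # (2, 1, 1, 3)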
......@@ -1005,13 +1010,13 @@ class Transformer(base_converter.ConverterInterface):
'only support squeeze at at [2, 3]')
arg.ints[:] = [1, 2]
elif op.type == MaceOp.Reduce.name:
for arg in op.arg:
if arg.name == MaceKeyword.mace_axis_str:
if ConverterUtil.data_format(
op) == DataFormat.NCHW \
and self._target_data_format == DataFormat.NHWC: # noqa
print("Transpose reduce mean args: %s(%s)"
print("Transpose reduce args: %s(%s)"
% (op.name, op.type))
reduce_axises = list(arg.ints)
new_axises = []
......
......@@ -48,7 +48,7 @@ def _opencl_encrypt_kernel_impl(repository_ctx):
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/pad.cl"))
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/pooling.cl"))
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/pooling_buffer.cl"))
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/reduce_mean.cl"))
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/reduce.cl"))
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/resize_bicubic.cl"))
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/resize_bilinear.cl"))
unused_var = repository_ctx.path(Label("//:mace/ops/opencl/cl/split.cl"))
......
......@@ -362,6 +362,7 @@ class YAMLKeyword(object):
validation_threshold = 'validation_threshold'
graph_optimize_options = 'graph_optimize_options' # internal use for now
cl_mem_type = 'cl_mem_type'
backend = 'backend'
################################
......
......@@ -55,6 +55,7 @@ ModelFormatStrs = [
PlatformTypeStrs = [
"tensorflow",
"caffe",
"onnx",
]
PlatformType = Enum('PlatformType', [(ele, ele) for ele in PlatformTypeStrs],
type=str)
......@@ -469,6 +470,10 @@ def format_model_config(flags):
else:
subgraph[YAMLKeyword.validation_inputs_data] = \
validation_inputs_data
onnx_backend = subgraph.get(
YAMLKeyword.backend, "tensorflow")
subgraph[YAMLKeyword.backend] = onnx_backend
input_ranges = subgraph.get(
YAMLKeyword.input_ranges, [])
if not isinstance(input_ranges, list):
......
......@@ -572,7 +572,8 @@ class DeviceWrapper:
YAMLKeyword.input_data_types],
caffe_env=flags.caffe_env,
validation_threshold=subgraphs[0][
YAMLKeyword.validation_threshold][validate_type],
backend=subgraphs[0][YAMLKeyword.backend]
)
if flags.report and flags.round > 0:
tuned = is_tuned and device_type == DeviceType.GPU
......
# Copyright 2018 Xiaomi, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function

import sys

import onnx
from onnx import optimizer
# Usage: python onnx_optimizer.py model.onnx model_opt.onnx
def main():
if len(sys.argv) != 3:
print "Usage: python onnx_optimizer.py model.onnx model_opt.onnx"
sys.exit(0)
in_path = sys.argv[1]
out_path = sys.argv[2]
original_model = onnx.load(in_path)
print "Start optimize ONNX model for inference:"
passes = ['eliminate_identity',
'fuse_consecutive_squeezes',
'fuse_consecutive_transposes',
'eliminate_nop_pad',
'eliminate_nop_transpose',
'eliminate_unused_initializer',
'extract_constant_to_initializer',
'fuse_add_bias_into_conv',
'fuse_bn_into_conv',
'fuse_transpose_into_gemm']
for i, opt_pass in enumerate(passes):
    print(i, ".", opt_pass)
optimized_model = optimizer.optimize(original_model, passes)
onnx.save_model(optimized_model, out_path)
print "Optimize Finished!"
print "Please check new model in:", out_path
if __name__ == '__main__':
main()
......@@ -621,7 +621,8 @@ def validate_model(abi,
caffe_env,
input_file_name="model_input",
output_file_name="model_out",
validation_threshold=0.9,
backend="tensorflow"):
six.print_("* Validate with %s" % platform)
if abi != "host":
for output_name in output_nodes:
......@@ -638,7 +639,14 @@ def validate_model(abi,
"%s/%s" % (model_output_dir, output_file_name), device_type,
":".join(input_shapes), ":".join(output_shapes),
",".join(input_nodes), ",".join(output_nodes),
validation_threshold, ",".join(input_data_types))
validation_threshold, ",".join(input_data_types), backend)
elif platform == "onnx":
validate(platform, model_file_path, "",
"%s/%s" % (model_output_dir, input_file_name),
"%s/%s" % (model_output_dir, output_file_name), device_type,
":".join(input_shapes), ":".join(output_shapes),
",".join(input_nodes), ",".join(output_nodes),
validation_threshold, ",".join(input_data_types), backend)
elif platform == "caffe":
image_name = "mace-caffe:latest"
container_name = "mace_caffe_validator"
......@@ -654,7 +662,7 @@ def validate_model(abi,
device_type,
":".join(input_shapes), ":".join(output_shapes),
",".join(input_nodes), ",".join(output_nodes),
validation_threshold, ",".join(input_data_types))
validation_threshold, ",".join(input_data_types), backend)
elif caffe_env == common.CaffeEnvType.DOCKER:
docker_image_id = sh.docker("images", "-q", image_name)
if not docker_image_id:
......@@ -720,6 +728,7 @@ def validate_model(abi,
"--output_shape=%s" % ":".join(output_shapes),
"--validation_threshold=%f" % validation_threshold,
"--input_data_type=%s" % ",".join(input_data_types),
"--backend=%s" % ",".join(backend),
_fg=True)
six.print_("Validation done!\n")
......
......@@ -21,6 +21,10 @@ import re
import common
import onnx
from onnx import helper
from onnx import TensorProto
# Validation Flow:
# 1. Generate input data
# 2. Use mace_run to run model on phone.
......@@ -190,9 +194,64 @@ def validate_caffe_model(platform, device_type, model_file, input_file,
value, validation_threshold)
def validate_onnx_model(platform, device_type, model_file, input_file,
mace_out_file, input_names, input_shapes,
output_names, output_shapes, validation_threshold,
input_data_types, backend):
if backend == "tensorflow":
from onnx_tf.backend import prepare
print "valivate on onnx tensorflow backend."
elif backend == "caffe2" or backend == "pytorch":
from caffe2.python.onnx.backend import prepare
print "valivate on onnx caffe2 backend."
else:
common.MaceLogger.error(
VALIDATION_MODULE,
"onnx backend framwork '" + backend + "' is invalid.")
if not os.path.isfile(model_file):
common.MaceLogger.error(
VALIDATION_MODULE,
"Input graph file '" + model_file + "' does not exist!")
model = onnx.load(model_file)
input_dict = {}
for i in range(len(input_names)):
input_value = load_data(common.formatted_file_name(input_file,
input_names[i]),
input_data_types[i])
input_value = input_value.reshape(input_shapes[i]).transpose((0, 3, 1,
2))
input_dict[input_names[i]] = input_value
onnx_outputs = []
for i in range(len(output_names)):
out_shape = output_shapes[i]
if len(out_shape) == 4:
out_shape[1], out_shape[2], out_shape[3] = \
out_shape[3], out_shape[1], out_shape[2]
onnx_outputs.append(
helper.make_tensor_value_info(output_names[i],
TensorProto.FLOAT,
out_shape))
model.graph.output.extend(onnx_outputs)
rep = prepare(model)
output_values = rep.run(input_dict)
for i in range(len(output_names)):
out_name = output_names[i]
value = output_values[out_name].flatten()
out_shape = output_shapes[i]
if len(out_shape) == 4:
value = value.reshape(out_shape).transpose((0, 2, 3, 1))
output_file_name = common.formatted_file_name(mace_out_file,
output_names[i])
mace_out_value = load_data(output_file_name)
compare_output(platform, device_type, output_names[i],
mace_out_value, value,
validation_threshold)
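The transposes above bridge MACE's NHWC data files and the NCHW layout ONNX backends expect; a round-trip sanity check of the two permutations:

.. code:: python

    import numpy as np

    nhwc = np.arange(2 * 4 * 5 * 3, dtype=np.float32).reshape(2, 4, 5, 3)
    nchw = nhwc.transpose((0, 3, 1, 2))   # feed the ONNX backend
    back = nchw.transpose((0, 2, 3, 1))   # back to NHWC for comparison
    assert np.array_equal(nhwc, back)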
def validate(platform, model_file, weight_file, input_file, mace_out_file,
device_type, input_shape, output_shape, input_node, output_node,
validation_threshold, input_data_type, backend):
input_names = [name for name in input_node.split(',')]
input_shape_strs = [shape for shape in input_shape.split(':')]
input_shapes = [[int(x) for x in shape.split(',')]
......@@ -217,6 +276,15 @@ def validate(platform, model_file, weight_file, input_file, mace_out_file,
mace_out_file, weight_file, input_names,
input_shapes, output_names, output_shapes,
validation_threshold)
elif platform == 'onnx':
output_shape_strs = [shape for shape in output_shape.split(':')]
output_shapes = [[int(x) for x in shape.split(',')]
for shape in output_shape_strs]
validate_onnx_model(platform, device_type, model_file, input_file,
mace_out_file, input_names, input_shapes,
output_names, output_shapes,
validation_threshold,
input_data_types, backend)
def parse_args():
......@@ -259,6 +327,11 @@ def parse_args():
parser.add_argument(
"--validation_threshold", type=float, default=0.995,
help="validation similarity threshold")
parser.add_argument(
"--backend",
type=str,
default="tensorflow",
help="onnx backend framwork")
return parser.parse_known_args()
......@@ -276,4 +349,5 @@ if __name__ == '__main__':
FLAGS.input_node,
FLAGS.output_node,
FLAGS.validation_threshold,
FLAGS.input_data_type,
FLAGS.backend)